Cloud Platform Staff Site Reliability Engineer

4 weeks ago


Zilliz | Redwood City, California, United States | Full time
About Zilliz

Zilliz is a fast-growing startup that develops the industry's leading vector database for enterprise-grade AI. Founded by the engineers behind Milvus, the world's most popular open-source vector database, the company builds next-generation database technologies to help organizations quickly create AI applications. Our mission is to democratize AI by simplifying data management for AI applications and making vector databases accessible to every organization.

Job Responsibilities
  • Work at the intersection of development and site reliability, creating SRE tools and systems, as well as supporting existing infrastructure and platforms.
  • Ensure the reliability, availability, and performance of Zilliz's distributed database systems.
  • Develop and implement strategies for monitoring, incident management, and disaster recovery.
  • Automate system operations and maintenance tasks to improve efficiency and reduce manual intervention.
  • Design and build tools to manage and monitor infrastructure, ensuring scalability and robustness.
  • Collaborate with software engineers to enhance system reliability, scalability, and performance.
  • Maintain and improve the CI/CD pipeline to ensure smooth and rapid deployment of changes.
  • Actively contribute to the Milvus open-source community, focusing on improving reliability and operational efficiency.
Requirements
  • 4+ years of experience in site reliability engineering or similar roles with a focus on cloud-native systems.
  • Proficiency in programming or scripting languages such as Python, Go, or Java.
  • Strong knowledge of container and orchestration technologies such as Docker and Kubernetes.
  • Expertise with cloud platforms such as AWS, GCP, or Azure, and their respective monitoring and management tools.
  • Experience with infrastructure as code tools such as Terraform or Ansible.
  • Familiarity with CI/CD tools such as Jenkins, GitLab CI, or Argo.
  • Proven ability to troubleshoot complex distributed systems and resolve issues promptly.
  • Bachelor's degree or above in computer science, software engineering, or other relevant disciplines.
  • Ability to thrive in a fast-paced startup environment and handle multiple projects simultaneously.
Benefits
  • Competitive compensation (cash + equity)
  • Regular bonus and equity refresh opportunities
  • Medical, dental, and vision insurance
  • Paid time off, including vacation, sick leave, and global reset/wellbeing days
  • Generous 401(k) and regional retirement plans

Compensation Range: $160,000-$230,000 USD

Zilliz is committed to building an inclusive and diverse workforce. We are an Equal Opportunity Employer and welcome people from all backgrounds, experiences, abilities, and perspectives. All qualified applicants will receive consideration for employment regardless of race, color, national origin, religion, sexual orientation, gender, gender identity, age, physical disability, or length of time spent unemployed.


