Current jobs related to Cloud Platform Site Reliability Engineer - Redwood City, California - Zilliz


  • Redwood City, California, United States Zilliz Full time

    About Zilliz
    Zilliz is a pioneering startup that specializes in developing cutting-edge vector database technologies for enterprise-grade AI applications. As the company behind the world's most popular open-source vector database, Milvus, Zilliz is committed to simplifying data management for AI applications and making vector databases accessible to every...


  • Redwood City, California, United States Zilliz Full time

    About Zilliz
    Zilliz is a fast-growing startup that develops the industry's leading vector database for enterprise-grade AI. Founded by the engineers behind Milvus, the world's most popular open-source vector database, the company builds next-generation database technologies to help organizations quickly create AI applications. Our...


  • Redwood City, California, United States Oracle Full time

    Transforming Cloud Application Development
    The Oracle Fusion Applications group is designing and building the next-generation deployment platform for its suite of software products. We focus on transforming how software developers and DevOps engineers build cloud applications for enterprise reliability and scalability. As a Principal Site Reliability Engineer,...


  • Redwood City, California, United States Box Full time

    About Box
    Box is the market leader for Cloud Content Management, empowering businesses to accelerate their digital transformation. Our mission is to power how the world works together, and we're seeking a talented Senior Software Engineer to join our Site Reliability Engineering team.
    Job Summary
    We're looking for a highly skilled Senior Software Engineer to...


  • Redwood City, California, United States Box Full time

    Transforming the Way the World Works Together
    At Box, we're revolutionizing Cloud Content Management, and we need a talented Senior Software Engineer, Site Reliability Engineering to join our team. As a key member of our SRE organization, you'll play a crucial role in bringing AI to our content cloud, ensuring the reliability and scalability of our...


  • Redwood City, California, United States Moloco Full time

    About Moloco
    Moloco is a pioneering machine learning company that empowers organizations to unlock the full value of their unique first-party data, revolutionizing the traditional path to performance advertising. By harnessing the power of cutting-edge machine learning technologies, we play a unique and visible role in shaping the digital economy, allowing...


  • Redwood City, California, United States Box Full time

    Transform the Future of Content Management
    At Box, we're revolutionizing the way organizations work with content. As a Senior Engineering Manager, Site Reliability Operations, you'll play a critical role in ensuring the seamless operation of our cloud infrastructure. Join our team and be part of shaping the future of content management.
    Key...


  • Foster City, California, United States Omega Solutions Inc Full time

    Job Title: Site Reliability Engineer
    At Omega Solutions Inc, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the availability, scalability, and performance of our critical platforms and applications.
    Key Responsibilities:
    • 8+ years of experience in Site Reliability...


  • Redwood City, California, United States Karius Full time

    About Karius
    Karius is a venture-backed life science startup that is revolutionizing the way pathogens and other microbes are observed throughout the body. By unlocking the information present in microbial cell-free DNA, we're helping doctors quickly solve their most challenging cases, providing industry partners with access to the microbial landscape to...


  • Redwood City, California, United States Stanford University Full time

    Job Title: Service Reliability Engineer
    Stanford University is seeking an experienced Service Reliability Engineer to join the Enterprise Technology team. This role will support the implementation, maintenance, and upkeep of on-premise and cloud systems.
    Key Responsibilities:
    Deploy and manage highly available hybrid systems across on-premise and cloud platforms...


  • Foster City, California, United States Zoox Full time

    About the Role
    Zoox is seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our infrastructure team, you will be responsible for ensuring the uptime and reliability of our autonomous vehicle fleet's critical systems.
    Key Responsibilities
    Design and implement scalable and fault-tolerant systems for our autonomous vehicle...


  • Redwood City, California, United States Bear Robotics, Inc. Full time

    Job Title: Software Engineer - Platform
    Job Summary:
    We are seeking a skilled Software Engineer to join our team at Bear Robotics, Inc. As a key member of our team, you will design and develop a scalable and secure API to enable our software's integration with third-party services. Your expertise will contribute to our fast-paced startup environment by...


  • Redwood City, California, United States C3 AI Full time

    About the Role
    C3 AI is seeking a highly skilled Senior Software Engineer to join our rapidly growing Data org within the Platform Engineering department. As a key member of our team, you will design, develop, and maintain various features in a highly scalable and extensible AI/ML platform for large-scale applications, involving data science, distributed...


  • Redwood City, California, United States Bear Robotics, Inc. Full time

    Job Title: Software Engineer - Platform
    Department: Software Engineering
    Job Level: L4
    FLSA: Exempt
    Job Summary:
    Bear Robotics, Inc. is seeking a skilled Software Engineer to design and develop a scalable and secure API for integrating our software with third-party services. As a key member of our team, you will play a crucial role in enhancing our product's...


  • Foster City, California, United States Bayone Full time

    Job Description
    As a Site Reliability Engineer at Bayone, you will be responsible for ensuring the smooth operation of our large production service. This includes:
    Key Responsibilities
    Service Maintenance: Perform regular host OS upgrades, Docker image upgrades, and SSL certificate upgrades to ensure the service remains up-to-date and secure.
    Metrics and...


  • Redwood City, California, United States C3 AI Full time

    Senior Software Engineer, Platform
    C3 AI is seeking a highly skilled Senior Software Engineer to join the Platform Engineering department. As a key member of the team, you will design, develop, and maintain various features in a highly scalable and extensible AI/ML platform for large-scale applications. You will work on high-value technologies at the...


  • Redwood City, California, United States Box Full time

    About Box
    Box is the world's leading Content Cloud, trusted by over 115,000 organizations worldwide, including nearly 70% of the Fortune 500. Our mission is to bring intelligence to content management and empower our customers to transform workflows across their organizations.
    Our Team
    The Service Mesh Team at Box is responsible for building and expanding our...


  • Redwood City, California, United States Karius, Inc. Full time

    About Karius
    Karius is a pioneering life sciences company that is revolutionizing the way pathogens and other microbes are observed throughout the body. By unlocking the information present in microbial cell-free DNA, we're empowering doctors to quickly solve their most challenging cases, providing industry partners with access to the microbial landscape to...


  • Foster City, California, United States Zoox Full time

    About the Role
    Zoox is seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our infrastructure team, you will be responsible for designing, implementing, and maintaining the systems that support our autonomous vehicle fleet.
    Key Responsibilities
    Design and implement scalable, fault-tolerant systems to support our autonomous...


  • Redwood City, California, United States Stanford University Full time

    Job Summary
    Stanford University is seeking an experienced Service Reliability Engineer to join its Enterprise Technology team. The successful candidate will be responsible for deploying and managing highly available hybrid systems on-premise and in the cloud, focusing on Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) offerings.
    Key...

Cloud Platform Site Reliability Engineer

2 months ago


Redwood City, California, United States Zilliz Full time
About Zilliz

Zilliz is a pioneering startup that specializes in developing cutting-edge vector database technologies for enterprise-grade AI applications.

As the company behind the world's most popular open-source vector database, Milvus, Zilliz is committed to simplifying data management for AI applications and making vector databases accessible to every organization.

What You Will Do:
  1. Work at the intersection of development and site reliability, creating SRE tools and systems, as well as supporting existing infrastructure and platforms.
  2. Ensure the reliability, availability, and performance of Zilliz's distributed database systems.
  3. Develop and implement strategies for monitoring, incident management, and disaster recovery.
  4. Automate system operations and maintenance tasks to improve efficiency and reduce manual intervention.
  5. Design and build tools to manage and monitor infrastructure, ensuring scalability and robustness.
  6. Collaborate with software engineers to enhance system reliability, scalability, and performance.
  7. Maintain and improve the CI/CD pipeline to ensure smooth and rapid deployment of changes.
  8. Actively contribute to the Milvus open-source community, focusing on improving reliability and operational efficiency.
What We Are Looking For:
  • 4+ years of experience in site reliability engineering or similar roles with a focus on cloud-native systems.
  • Proficiency in programming languages such as Python, Go, or Java.
  • Strong knowledge of containerization and container orchestration technologies such as Docker and Kubernetes.
  • Expertise with cloud platforms such as AWS, GCP, or Azure, and their respective monitoring and management tools.
  • Experience with infrastructure as code tools such as Terraform or Ansible.
  • Familiarity with CI/CD tools such as Jenkins, GitLab CI, or Argo.
  • Proven ability to troubleshoot complex distributed systems and resolve issues promptly.
  • Bachelor's degree or above in computer science, software engineering, or other relevant disciplines.
  • Ability to thrive in a fast-paced startup environment and handle multiple projects simultaneously.
Benefits:
  • Competitive compensation (cash + equity)
  • Regular bonus and equity refresh opportunities
  • Medical, dental, and vision insurance
  • Paid time off, including vacation, sick leave, and global reset/wellbeing days
  • Generous 401(k) and regional retirement plans

Zilliz is committed to building an inclusive and diverse workforce. We are an Equal Opportunity Employer and welcome people from all backgrounds, experiences, abilities, and perspectives.