Staff Site Reliability Engineer Cloud Platform

2 months ago


California, United States Zilliz Full time

What you will do:

• Work at the intersection of development and site reliability, creating SRE tools and systems and supporting existing infrastructure and platforms.
• Ensure the reliability, availability, and performance of Zilliz’s distributed database systems.
• Develop and implement strategies for monitoring, incident management, and disaster recovery.
• Automate system operations and maintenance tasks to improve efficiency and reduce manual intervention.
• Design and build tools to manage and monitor infrastructure, ensuring scalability and robustness.
• Collaborate with software engineers to enhance system reliability, scalability, and performance.
• Maintain and improve the CI/CD pipeline to ensure smooth and rapid deployment of changes.
• Actively contribute to the Milvus open-source community, focusing on improving reliability and operational efficiency.

What we are looking for:

• 4+ years of experience in site reliability engineering or similar roles with a focus on cloud-native systems.
• Proficiency in programming or scripting languages such as Python, Go, or Java.
• Strong knowledge of container technologies such as Docker and container orchestration with Kubernetes.
• Expertise with cloud platforms such as AWS, GCP, or Azure, and their respective monitoring and management tools.
• Experience with infrastructure-as-code tools such as Terraform or Ansible.
• Familiarity with CI/CD tools such as Jenkins, GitLab CI, or Argo.
• Proven ability to troubleshoot complex distributed systems and resolve issues promptly.
• Bachelor’s degree or above in computer science, software engineering, or other relevant disciplines.
• Ability to thrive in a fast-paced startup environment and handle multiple projects simultaneously.






  • California, Missouri, United States Bitwarden Inc. Full time

    About Bitwarden: Bitwarden empowers organizations, developers, and individuals to securely manage and share sensitive information. With a transparent, open-source approach to password management, secrets management, and innovations in passwordless and passkey technologies, Bitwarden simplifies the implementation of robust security practices across all online...


  • California, United States JobBoard.io Full time

    By making evidence the heart of security, we help customers stay ahead of ever-changing cyber-attacks. Corelight is a cybersecurity company that transforms network and cloud activity into evidence. Evidence that elite defenders use to proactively hunt for threats, accelerate response to cyber incidents, gain complete network visibility, and create powerful...


  • California, Missouri, United States Bitwarden Inc. Full time

    About Bitwarden: Bitwarden empowers organizations, developers, and individuals to securely manage and share sensitive information. With a transparent, open-source approach to password management, secrets management, and innovations in passwordless and passkey solutions, Bitwarden facilitates the extension of robust security practices across all online...


  • California, Missouri, United States Insight Global Full time

    Position Overview: A leading media organization is in search of a dedicated team of Site Reliability Engineers to enhance their streaming services. This role demands extensive expertise in cloud technologies, particularly AWS, alongside proficiency in Kubernetes, Terraform, and Python. Key Responsibilities: - Demonstrate robust experience as a Site...


  • California, United States Life Science People Full time

    Life Science People is collaborating with a well-capitalized organization in search of a talented Systems Software Engineer to enhance a dedicated team of engineers. This team is at the forefront of integrating laboratory automation, high-throughput assays, and machine learning, transforming biological discovery into a digital format. Their primary focus is...


  • California, United States Job Board Full time

    By making evidence the heart of security, we help customers stay ahead of ever-changing cyber-attacks. Corelight is a cybersecurity company that transforms network and cloud activity into evidence. Evidence that elite defenders use to proactively hunt for threats, accelerate response to cyber incidents, gain complete network visibility, and create powerful...


  • California, Missouri, United States Open Systems Technologies Full time

    Company Overview: Open Systems Technologies is a leading financial services firm focused on innovative technology solutions. Position Summary: We are seeking a skilled Cloud Infrastructure Engineer to enhance our team. Compensation: $150-200k. Key Responsibilities: • Architect, deploy, and oversee AWS cloud environments utilizing Terraform and CloudFormation. •...


  • San Jose, California, United States Hireio, Inc. Full time

    About the Company: Hireio, Inc. is a leading technology company that specializes in short-form mobile video hosting services. With over 1.3 billion mobile downloads in the United States and 2 billion worldwide, we have established ourselves as a leader in the industry. About the Team: Our Data Infrastructure team is a pioneer in innovation, seamlessly merging...


  • California, United States CTI Full time

    Job Description: Who we are: CTI is a leading software, systems, and operational support corporation, specializing in providing user-focused technologies for military and other security applications. We are dedicated to engineering solutions on open, government-owned platforms to ensure the right capabilities are employed on the battlefield. We are...


  • California, United States Sequoia Full time

    About Sequoia: For over two decades, Sequoia has been dedicated to enhancing the employee experience for people-centric organizations. Our mission is rooted in the belief that prioritizing employee welfare leads to superior business results. Our team is committed to empowering clients through strategic guidance, exceptional service, and the innovative Sequoia...


  • California, United States Bayside Solutions Full time

    Kubernetes Site Reliability Engineer. W2 Contract. Salary Range: $124,800 - $145,600 per year. Location: Cupertino, CA - Hybrid Role. Position Overview: As a Kubernetes Site Reliability Engineer, you will play a crucial role in ensuring the reliability and performance of our cloud-based systems. Your primary responsibility will be to maintain high availability,...


  • California, United States Bayside Solutions Full time

    Kubernetes Site Reliability Engineer. W2 Contract. Salary Range: $124,800 - $145,600 per year. Location: Cupertino, CA - Hybrid Role. Job Overview: As a Kubernetes Site Reliability Engineer, you will play a crucial role in managing essential cloud infrastructure to ensure uninterrupted service, facilitate seamless scaling, and enable the deployment of innovative...


  • Los Gatos, California, United States Netflix Full time

    "At Netflix, we strive to bring joy to people across the world through amazing stories. As we grow internationally, we are continually enhancing our cloud-based infrastructure to improve our performance, scalability, and reliability.The SRE team's goal is to ensure customer joy by successfully managing risk and minimizing impact across Netflix. We do this...


  • California, Missouri, United States Insight Global Full time

    Position Title: Site Reliability Engineer (AWS/Kubernetes/Python/Terraform). Job Overview: A leading media organization is in search of skilled Site Reliability Engineers to enhance their streaming operations. This role demands extensive expertise in AWS, Kubernetes, Terraform, and Python, contributing to a permanent role within the company. Key...


  • California, United States Bayside Solutions Full time

    Kubernetes Site Reliability Engineer. W2 Contract. Salary Range: $124,800 - $145,600 per year. Location: Cupertino, CA - Hybrid Role. Position Overview: The role involves overseeing essential cloud infrastructures to ensure uninterrupted service, facilitate seamless scaling, and enable the development of new applications and services. We seek a driven engineer who is...


  • California, United States CTI Full time

    Job Description: CTI is seeking a Senior Platform Engineer to join our dynamic engineering team. As a Senior Platform Engineer, you will collaborate with cross-functional teams to design, develop, and maintain scalable platforms that support our growing technology infrastructure. You will have the opportunity to work on innovative projects,...


  • California, United States Storm3 Full time

    Position: Senior Platform Engineer. Compensation: Base Salary: 165K-200K. Company Overview: Storm3 is a pioneering Series A HealthTech startup focused on enhancing mental and emotional well-being through innovative technology. Role Overview: We are in search of a Senior Platform Engineer to contribute to the evolution of a scientifically-driven XR application...


  • California, United States Storm3 Full time

    Position: Senior Platform Engineer. Compensation: $165,000 - $200,000. Company Overview: Storm3 is a pioneering Series A startup in the HealthTech sector. We are in search of a Senior Platform Engineer to enhance a scientifically-driven XR application aimed at promoting mental and emotional wellness, known as "Inner Fitness." This role is crucial for a skilled...


  • California, United States Storm3 Full time

    Senior Platform Engineer. Compensation: 165K-200K. Company Overview: Series A HealthTech Startup. Storm3 is on the lookout for a Senior Platform Engineer to enhance a science-driven XR application aimed at promoting mental and emotional wellness, known as "Inner Fitness." We are searching for a seasoned professional to become part of our dynamic team, focusing on...