Kubernetes Reliability Engineer

1 week ago


California, United States Bayside Solutions Full time

Kubernetes Site Reliability Engineer

W2 Contract

Salary Range: $124,800 - $145,600 per year

Location: Cupertino, CA - Hybrid Role

Position Overview:

As a Kubernetes Site Reliability Engineer, you will play a crucial role in ensuring the reliability and performance of our cloud-based systems. Your primary responsibilities will be to maintain high availability, facilitate seamless scaling, and support the deployment of new applications and services. We are looking for a dedicated engineer who values quality, precision, and operational excellence, with a focus on enhancing system stability and scalability.

Key Responsibilities:

  • Oversee mission-critical cloud environments to ensure uninterrupted service delivery.
  • Implement best practices for system monitoring and performance optimization.
  • Collaborate with cross-functional teams to troubleshoot and resolve issues across the software stack.

Qualifications:

  • Certified Kubernetes Administrator (CKA) certification is mandatory.
  • Proven experience managing production K8s clusters with over 100 nodes.
  • Extensive expertise in Kubernetes and its ecosystem.
  • Familiarity with monitoring tools such as Prometheus, Alertmanager, and Grafana.
  • Strong understanding of Linux operating systems and distributions.
  • Exceptional communication skills to effectively convey technical concepts.

Preferred Skills:

Experience in Site Reliability Engineering (SRE), Kubernetes (K8s), CKA, Prometheus, Alertmanager, Grafana, and Linux is highly desirable.

Bayside Solutions, Inc. may collect your personal information during the position application process. Please reference Bayside Solutions, Inc.'s CCPA Privacy Policy at



  • California, United States Bayside Solutions Full time

    Kubernetes Site Reliability Engineer W2 Contract Salary Range: $124,800 - $145,600 per year Location: Cupertino, CA - Hybrid Role Job Overview: As a Kubernetes Site Reliability Engineer, you will play a crucial role in managing essential cloud infrastructure to ensure uninterrupted service, facilitate seamless scaling, and enable the deployment of innovative...


  • California, United States Bayside Solutions Full time

    Kubernetes Site Reliability Engineer W2 Contract Salary Range: $124,800 - $145,600 per year Location: Cupertino, CA - Hybrid Role Position Overview: The role involves overseeing essential cloud infrastructures to ensure uninterrupted service, facilitate seamless scaling, and enable the development of new applications and services. We seek a driven engineer who is...


  • California, United States Bayside Solutions Full time

    Kubernetes Site Reliability Engineer W2 Contract Salary Range: $124,800 - $145,600 per year Location: Cupertino, CA - Hybrid Role Position Overview: The role involves overseeing essential cloud infrastructure to ensure uninterrupted service, facilitate seamless scaling, and enable the development of new applications and services. We seek a driven engineer who is...


  • California, United States Bayside Solutions Full time

    Kubernetes Site Reliability Engineer W2 Contract Salary Range: $124,800 - $145,600 per year Location: Cupertino, CA - Hybrid Role Position Overview: The role involves overseeing essential cloud infrastructures to ensure continuous availability, facilitate seamless scaling, and support the development of new applications and services. We are seeking a driven...


  • California, United States Bayside Solutions Full time

    Kubernetes Site Reliability Engineer W2 Contract Salary Range: $124,800 - $145,600 per year Location: Cupertino, CA - Hybrid Role Position Overview: The primary responsibility of this role is to oversee critical cloud infrastructure, ensuring consistent uptime, facilitating seamless scaling, and enabling the development of new applications and services. We are...


  • California, United States Insight Global Full time

    Site Reliability Engineer (AWS/Kubernetes/Python/Terraform) Post Date: Jul 02, 2024 Location: ZIP/Postal Code 90067 Job Type: Permanent Category: Software Engineering Pay Rate: $95k - $212k (estimate) Job Description A media company is seeking a team of SREs to join their streaming team. The role requires strong experience in AWS, Kubernetes, Terraform, and...


  • California, Missouri, United States Insight Global Full time

    Position Title: Site Reliability Engineer (AWS/Kubernetes/Python/Terraform) Job Overview: A leading media organization is in search of skilled Site Reliability Engineers to enhance their streaming operations. This role demands extensive expertise in AWS, Kubernetes, Terraform, and Python, contributing to a permanent role within the company. Key...


  • California, Missouri, United States Insight Global Full time

    Position Overview: A leading media organization is in search of a dedicated team of Site Reliability Engineers to enhance their streaming services. This role demands extensive expertise in cloud technologies, particularly AWS, alongside proficiency in Kubernetes, Terraform, and Python. Key Responsibilities: - Demonstrate robust experience as a Site...


  • California, United States Zilliz Full time

    What you will do: Work at the intersection of development and site reliability. Creating SRE tools and systems, as well as supporting existing infrastructure and platforms. Ensure the reliability, availability, and performance of Zilliz’s distributed database systems. Develop and implement strategies for monitoring, incident management, and disaster...


  • California, United States Zilliz Full time

    What you will do: Work at the intersection of development and site reliability. Creating SRE tools and systems, as well as supporting existing infrastructure and platforms. Ensure the reliability, availability, and performance of Zilliz’s distributed database systems. Develop and implement strategies for monitoring, incident management, and disaster...


  • San Jose, California, United States Hireio, Inc. Full time

    About the Company: Hireio, Inc. is a leading technology company that specializes in short-form mobile video hosting services. With over 1.3 billion mobile downloads in the United States and 2 billion worldwide, we have established ourselves as a leader in the industry. About the Team: Our Data Infrastructure team is a pioneer in innovation, seamlessly merging...


  • California, Missouri, United States Insight Global Full time

    Position Overview: We are seeking an experienced Infrastructure Reliability Specialist to join our dynamic team. This role is crucial for maintaining the reliability and performance of our systems in a fast-paced environment. Key Responsibilities: Manage and optimize cloud infrastructure using AWS services, including EKS and IAM. Implement and maintain Kubernetes...


  • San Jose, California, United States Hireio, Inc. Full time

    About Us: Hireio, Inc. is a leading video editing solution provider that aims to make content creation easier and more engaging. With a strong focus on innovation and customer satisfaction, we have established ourselves as a top player in the industry. Job Description: We are seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key...


  • San Jose, California, United States Hireio, Inc. Full time

    About Hireio, Inc.: Hireio, Inc. stands at the forefront of the mobile video landscape, recognized as a premier platform for short-form video content. As a leading Unicorn startup, we have achieved remarkable milestones, including over 1.3 billion mobile downloads in the United States and 2 billion globally. With a robust user base of 1.5 billion monthly...


  • California, Missouri, United States Amazon Full time

    Position Overview: The Reliability and Maintainability Engineer plays a crucial role in ensuring the performance and longevity of systems within Amazon's Kuiper Government Solutions (KGS). This position is focused on enhancing the reliability, availability, and maintainability of both space-based and terrestrial systems. Key Responsibilities: Lead the RAM...


  • California, United States Moonhub Full time

    Join Moonhub as a Lead Security Infrastructure Engineer. We are seeking a skilled infrastructure engineer with a deep interest in the realms of cryptocurrency and blockchain technology. If you excel as a protocol engineer, we encourage you to consider this opportunity. About the Position: In your role as a Lead Security Infrastructure Engineer at Moonhub, your...


  • California, United States Job Board Full time

    By making evidence the heart of security, we help customers stay ahead of ever-changing cyber-attacks. Corelight is a cybersecurity company that transforms network and cloud activity into evidence. Evidence that elite defenders use to proactively hunt for threats, accelerate response to cyber incidents, gain complete network visibility, and create powerful...


  • California, United States Mphasis Full time

    Job Description: Mphasis is seeking a highly skilled Senior Apache Solr Engineer - AWS Expert to join our team. As a key member of our search infrastructure team, you will be responsible for designing, building, and managing scalable and reliable search solutions using Apache Solr on AWS. Key Responsibilities: Search Index Management: Design and implement...


  • Sacramento, California, United States Two95 International Inc. Full time

    Position: Reliability Engineering Manager Location: Remote Type: Fulltime Salary: Competitive PRIMARY RESPONSIBILITIES: The Reliability Engineering Manager will ensure that reliability strategies are integrated into overarching IT objectives and that performance expectations are clearly articulated. The manager will collaborate with both business and IT...


  • Sacramento, California, United States Two95 International Inc. Full time

    Job Summary: We are seeking a highly skilled Site Reliability Engineering Manager to join our team at Two95 International Inc. as a key member of our IT department. The successful candidate will be responsible for ensuring the reliability and efficiency of our IT systems and infrastructure. Key Responsibilities: Reliability Program Development: Work with the...