Cloud Operations Engineer

6 days ago


San Diego, California, United States Platform Science Full time
About the Role

We are seeking a highly skilled Senior Cloud Reliability Engineer to join our team at Platform Science. As a key member of our cloud operations team, you will be responsible for ensuring the reliability and performance of our cloud-based services.

Key Responsibilities
  • Develop and Enhance CI/CD Pipelines: Design and implement Continuous Integration/Continuous Deployment (CI/CD) pipelines to streamline our software development lifecycle.
  • Maintain Helm Charts: Keep Helm charts accurate and efficient for deploying and managing applications.
  • Establish Observability Solutions: Implement standardized observability solutions that help development teams manage their applications efficiently.
  • Lead Reliability Efforts: Drive the achievement of uptime goals and mentor colleagues in SRE best practices.
  • Conduct Production Readiness Reviews: Work with teams to identify and establish Service Level Indicators and Service Level Objectives (SLIs/SLOs).
  • Design Software Solutions: Develop software solutions to address operational challenges effectively and improve system stability and reliability.
  • On-Call Duties: Provide expert support to development teams for mission-critical applications in production environments.
  • Improve Resiliency: Use chaos engineering to improve the resiliency of applications and systems.
Requirements
  • 5+ Years of SRE Experience: Possess hands-on experience in SRE or Platform Engineering roles.
  • Automation Expertise: Demonstrated expertise with automation technologies like Jenkins, ArgoCD, or similar.
  • Kubernetes Experience: Experience with Kubernetes (2+ years), Helm, and Docker within production environments.
  • Software Development Lifecycle: Proficiency with current software development lifecycle (SDLC) concepts and best practices, CI/CD pipelines, and test-driven development.
  • AWS Experience: Production experience with AWS, including EKS, IAM, autoscaling, networking, and load balancing/request routing.
  • Programming Skills: Proficient in Python, Bash, Node.js, and/or Go.
  • Distributed Tracing: Proficient with distributed tracing methodologies and observability tools such as Prometheus, ELK, or Datadog.
  • Documentation and Knowledge-Sharing: Strong emphasis on documentation and fostering knowledge-sharing practices within the team and organization.
  • Mentoring and Training: Track record of successfully training and mentoring engineers.
  • Cloud Optimization: Proven expertise in optimizing performance and managing costs within cloud environments.
  • SLI/SLO Concepts: Sound understanding of SLI/SLO concepts and adherence to SRE best practices.
  • Education: Bachelor's degree in Computer Science or a related field.


  • San Diego, California, United States ServiceNow Full time

    About the Role: We are seeking a highly skilled Cloud Operations Engineer to join our team at ServiceNow. As a key member of our System Administration team, you will be responsible for the administration and operations of our global cloud infrastructure. Key Responsibilities: Contribute to Configuration Management and Infrastructure as Code for our global private...


  • San Diego, California, United States Matphil Technologies, Inc. Full time

    Cloud Operations Engineer at Matphil Technologies, Inc. About: Matphil Technologies, Inc. specializes in cloud-based solutions for asset management and service assurance across various sectors, including life sciences and precision instrument calibration. We foster a dynamic work atmosphere that promotes innovation and creative problem-solving. Key...


  • San Diego, California, United States Catapult Solutions Group Full time

    Job Title: DevOps Engineer - API Gateway Initiative. Department: Engineering. Location: Remote. Role Type: Full-Time. About Catapult Solutions Group: Catapult Solutions Group is a prominent technology firm recognized for its innovative software offerings and robust technology frameworks. Operating globally, the organization specializes in developing state-of-the-art...


  • San Diego, California, United States Booz Allen Hamilton Full time

    Position Overview: As a Cloud Infrastructure Engineer, you will play a pivotal role in leveraging cloud technologies to enhance our clients' IT frameworks. Your expertise will guide organizations in effectively utilizing cloud resources to meet their operational goals. Join our dedicated team of professionals as we work to ensure the safety of our national...


  • San Diego, California, United States Yoh Full time

    Job Summary: Yoh, a Day & Zimmermann company, is seeking a highly skilled Reliability Engineer - Cloud Infrastructure to join our team. As a key member of our cloud operations team, you will be responsible for designing, building, and operating scalable and secure cloud infrastructure to support our clients' business needs. Key Responsibilities: Design and...


  • San Diego, California, United States Computer Technologies Consultants Full time

    Computer Technologies Consultants (CTC) is on the lookout for a skilled Cloud Security Operations Engineer to contribute to the development of automated deployment pipelines for cloud infrastructures and assist in the automation of build, testing, and release processes. The chosen candidate will play a crucial role in our team, focusing on promoting and...


  • San Diego, California, United States CTC Full time

    Position Overview: Computer Technologies Consultants (CTC) is in search of a skilled Cloud Security Operations Engineer to contribute to the creation of automated deployment pipelines tailored for cloud infrastructures. This role is pivotal in our mission to enhance user engagement with Analytic Modeling capabilities, ultimately bolstering our analytic...


  • San Diego, California, United States Platform Science Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Platform Science. As a key member of our cloud operations team, you will be responsible for ensuring the reliability and performance of our cloud-based platform. Key Responsibilities: Develop and enhance Continuous Integration/Continuous Deployment (CI/CD)...


  • San Diego, California, United States Cloud Analytics Technologies LLC Full time

    Job Summary: We are seeking a highly skilled Network Consulting Engineer to join our team at Cloud Analytics Technologies LLC. As a Network Consulting Engineer, you will be responsible for designing and implementing cloud-based network solutions for our clients. Key Responsibilities: Design and implement cloud-based network architectures. Collaborate with...


  • San Diego, California, United States Insight Global Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Insight Global. As a key member of our SRE team, you will play a critical role in ensuring the high availability and performance of our database infrastructure. Key Responsibilities: Collaborate with cross-functional teams to design and implement scalable and...


  • San Francisco, California, United States Promote Project Full time

    About the Role: We are seeking a highly skilled Cloud Operations Engineer to join our team at Promote Project. As a Cloud Operations Engineer, you will play a critical role in designing, building, and maintaining our cloud infrastructure to ensure high availability, scalability, and performance. Key Responsibilities: Design and implement cloud infrastructure...


  • San Francisco, California, United States ViralMoment Full time

    Job Overview Position: Site Reliability Engineer Location: Remote About ViralMoment: ViralMoment stands at the forefront of AI-driven social media analysis, focusing on the evaluation of social videos to uncover trending themes and deliver actionable insights to brands and agencies. Our objective is to equip our clients with state-of-the-art AI solutions to...


  • San Diego, California, United States Amazon Full time

    About Amazon EC2: Amazon Elastic Compute Cloud (Amazon EC2) stands at the forefront of cloud technology, delivering scalable and dependable solutions to enterprises globally. Core Features: Amazon EC2 encompasses a variety of functionalities including Amazon Elastic Block Store (Amazon EBS) volumes, virtual private clouds (VPCs), security groups, elastic IP...


  • San Diego, California, United States SAIC Full time

    SAIC stands as a leading integrator in Joint All Domain Command and Control (JADC2), specializing in the modernization of traditional command and control systems into advanced hybrid cloud infrastructures that enhance decision-making and boost mission effectiveness. This initiative is part of a larger strategy aimed at delivering a tactical operational...


  • San Diego, California, United States Brain Corp Full time

    Vacancy at Brain Corp: Brain Corp is an innovative AI company based in San Diego, California, USA. We focus on developing cutting-edge technology for the robotics industry with the mission to improve real-world operations. Our AI solutions play a crucial role in helping retailers maintain optimal product placement and cleanliness on shelves at competitive...


  • San Jose, California, United States Amaze Systems Inc. Full time

    Job Overview. Position: Cloud Operations Engineer. Location: Various locations (Onsite). Candidate Requirements: Must have experience from Google. Key Competencies: DevOps methodologies, Site Reliability Engineering (SRE), Google Cloud Platform (GCP), Kubernetes orchestration, Containerization with Docker. Experience Level: Maximum of 9 years. Contact: Annu Tiwari | Senior...


  • San Jose, California, United States Amaze Systems Inc. Full time

    Job Overview. Position: Cloud Operations Engineer. Location: Remote options available. Candidate Requirements: Previous experience at Google preferred. Key Competencies: DevOps methodologies, Site Reliability Engineering (SRE), Google Cloud Platform (GCP), Kubernetes orchestration, Containerization with Docker. Experience Level: Maximum of 9 years. Contact: Annu Tiwari | Senior...


  • San Jose, California, United States Amaze Systems Inc. Full time

    Job Overview. Position: Cloud Operations Engineer. Location: Remote (with potential onsite requirements). Preferred Background: Candidates with previous experience at Google. Key Competencies: DevOps methodologies, Site Reliability Engineering (SRE), Google Cloud Platform (GCP), Kubernetes orchestration, Containerization with Docker. Experience Level: Maximum of 9...


  • San Jose, California, United States Amaze Systems Inc. Full time

    Job Overview. Position: Cloud Operations Engineer. Location: Multiple locations available (Onsite). Eligibility: Candidates with prior experience at Google are preferred. Required Skills: DevOps methodologies, Site Reliability Engineering (SRE), Google Cloud Platform (GCP), Kubernetes orchestration, Containerization with Docker. Experience: Maximum of 9 years in relevant...