Cloud Reliability Engineer

3 weeks ago


Bellevue, Washington, United States | PeopleConnect | Full time
Job Overview

We are seeking a highly skilled Senior Site Reliability Engineer to join our team at PeopleConnect, Inc. This role will be responsible for designing, implementing, and maintaining the infrastructure and systems necessary to support our applications and services.

The ideal candidate will have a strong background in cloud technologies, automation, and performance optimization, along with excellent troubleshooting skills and experience in root cause analysis.

This is a hybrid position requiring 2-3 days per week in our Bellevue, WA office. Local candidates are encouraged to apply. Please note that we are not able to offer visa sponsorship, visa transfer, or corp-to-corp arrangements.

Key Responsibilities:
  1. Cloud Strategy and Architecture:
  • Provide thought leadership, mentorship, and technical vision related to site reliability, DevOps, and a 'cloud-first' culture.
  • Analyze and implement cloud services to meet business goals, focusing on cost optimization, efficiency, and scalability.
  • Drive orchestration efforts for cloud services, design self-service aspects, and stay updated with emerging cloud technologies.
  2. Infrastructure Automation and Design:
  • Collaborate on designing, building, and maintaining scalable infrastructure across cloud and on-prem environments.
  • Automate provisioning and configuration using tools like Terraform, Terragrunt, and Puppet.
  • Develop automation scripts, maintain CI/CD pipelines, and plan for scalability and capacity, conducting load testing as needed.
  3. Reliability and Performance Engineering:
  • Ensure system reliability, availability, and performance through monitoring, alerting, and incident response.
  • Implement and manage SLOs/SLIs to meet reliability standards.
  • Identify and address performance bottlenecks across the infrastructure and application stack.
  • Build and maintain observability solutions (e.g., monitoring, logging, and tracing) and improve system health dashboards.
  4. Security and Compliance:
  • Implement security measures for cloud-native applications and ensure compliance with industry standards (SOC2, PCI, etc.).
  • Collaborate with security teams to audit and monitor systems, continuously updating security configurations and dashboards.
  5. Incident Management and Root Cause Analysis:
  • Participate in on-call rotations to provide 24/7 support for the production environment.
  • Lead incident response activities and perform root cause analysis to prevent recurring incidents.
  • Conduct and document post-incident retrospectives (postmortems) to drive continuous improvement.
  • Create and maintain runbooks and operational documentation for continuous improvement.
  • Proactively test system resilience through Chaos Engineering experiments and failure injection.
  6. Disaster Recovery and Business Continuity:
  • Design and test disaster recovery (DR) and business continuity strategies, ensuring backup and failover mechanisms are effective.
  7. Cost Management and Financial Optimization:
  • Monitor cloud usage and implement financial optimization practices (FinOps) to control infrastructure costs.
  • Collaborate with stakeholders to drive financial efficiency.
  8. Collaboration, Knowledge Sharing, and Communication:
  • Collaborate across teams to ensure alignment and effective project implementation.
  • Communicate during incidents and changes, providing transparency to stakeholders.
  • Mentor and share knowledge with team members to foster a collaborative and continuous learning environment.
  • Maintain comprehensive documentation of system configurations, processes, and best practices.

Qualifications:

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • 5+ years of experience as a Site Reliability Engineer or in a similar role, working with highly available production environments.
  • Proficiency in AWS and containerization technologies like Kubernetes and Docker.
  • Strong experience with Infrastructure as Code (IaC) using Terraform, with automation scripting skills in Python, Bash/Shell, or Go.
  • Deep knowledge of Linux/Unix systems and networking fundamentals (e.g., TCP/IP, DNS, HTTP, VPN).
  • Experience with monitoring and observability tools (e.g., Datadog, Prometheus, Grafana) and incident management.
  • Familiarity with CI/CD pipelines, preferably using tools like GitLab, and strong knowledge of DevOps practices.
  • Excellent troubleshooting skills, with experience in performance optimization and root cause analysis.
  • Strong communication and collaboration skills.
  • Bonus skills: experience with Rundeck, Java, Spring Framework, Terragrunt, Puppet, Vector, Loki, VictoriaMetrics, and additional cloud platforms (e.g., GCP, Azure), as well as relevant certifications such as AWS Solutions Architect or Certified Kubernetes Administrator (CKA).

Estimated Salary Range: $152,700 - $190,600


