Cloud Reliability Engineer
3 weeks ago
We are seeking a highly skilled Senior Site Reliability Engineer to join our team at PeopleConnect, Inc. This role will be responsible for designing, implementing, and maintaining the infrastructure and systems necessary to support our applications and services.
The ideal candidate will have a strong background in cloud technologies and automation, with excellent troubleshooting skills and experience in performance optimization and root cause analysis.
This is a hybrid position requiring 2-3 days per week in the office in Bellevue, WA. Local candidates are encouraged to apply. Please note that we are not able to offer visa sponsorship, visa transfer, or corp-to-corp arrangements.
Key Responsibilities:
- Cloud Strategy and Architecture:
- Provide thought leadership, mentorship, and technical vision related to site reliability, DevOps, and a 'cloud-first' culture.
- Analyze and implement cloud services to meet business goals, focusing on cost optimizations, efficiencies, and scalability.
- Drive orchestration efforts for cloud services, design self-service aspects, and stay updated with emerging cloud technologies.
- Infrastructure Automation and Design:
- Collaborate on designing, building, and maintaining scalable infrastructure across cloud and on-prem environments.
- Automate provisioning and configuration using tools like Terraform, Terragrunt, and Puppet.
- Develop automation scripts, maintain CI/CD pipelines, and plan for scalability and capacity, conducting load testing as needed.
- Reliability and Performance Engineering:
- Ensure system reliability, availability, and performance through monitoring, alerting, and incident response.
- Implement and manage SLOs/SLIs to meet reliability standards.
- Identify and address performance bottlenecks across the infrastructure and application stack.
- Build and maintain observability solutions (e.g., monitoring, logging, and tracing) and improve system health dashboards.
- Security and Compliance:
- Implement security measures for cloud-native applications and ensure compliance with industry standards (SOC 2, PCI, etc.).
- Collaborate with security teams to audit and monitor systems, continuously updating security configurations and dashboards.
- Incident Management and Root Cause Analysis:
- Participate in on-call rotations to provide 24/7 support for production environments.
- Lead incident response activities and perform root cause analysis to prevent recurring incidents.
- Conduct and document post-incident retrospectives (postmortems) to drive continuous improvement.
- Create and maintain runbooks and operational documentation.
- Proactively test system resilience through Chaos Engineering experiments and failure injection.
- Disaster Recovery and Business Continuity:
- Design and test disaster recovery (DR) and business continuity strategies, ensuring backup and failover mechanisms are effective.
- Cost Management and Financial Optimization:
- Monitor cloud usage and implement financial optimization practices (FinOps) to control infrastructure costs.
- Collaborate with stakeholders to drive financial efficiency.
- Collaboration, Knowledge Sharing, and Communication:
- Collaborate across teams to ensure alignment and effective project implementation.
- Communicate during incidents and changes, providing transparency to stakeholders.
- Mentor and share knowledge with team members to foster a collaborative and continuous learning environment.
- Maintain comprehensive documentation of system configurations, processes, and best practices.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent experience.
- 5+ years of experience as a Site Reliability Engineer or in a similar role, working with highly available production environments.
- Proficiency in AWS and containerization technologies like Kubernetes and Docker.
- Strong experience with Infrastructure as Code (IaC) using Terraform, with automation scripting skills in Python, Bash/Shell, or Go.
- Deep knowledge of Linux/Unix systems and networking fundamentals (e.g., TCP/IP, DNS, HTTP, VPN).
- Experience with monitoring and observability tools (e.g., Datadog, Prometheus, Grafana) and incident management.
- Familiarity with CI/CD pipelines, preferably using tools like GitLab, and strong knowledge of DevOps practices.
- Excellent troubleshooting skills, with experience in performance optimization and root cause analysis.
- Strong communication and collaboration skills.
- Bonus skills: experience with Rundeck, Java, Spring Framework, Terragrunt, Puppet, Vector, Loki, VictoriaMetrics, and additional cloud platforms (e.g., GCP, Azure), as well as relevant certifications such as AWS Solutions Architect or Certified Kubernetes Administrator (CKA).
Estimated Salary Range: $152,700 - $190,600
-
Cloud Reliability and Performance Engineer
3 weeks ago
Bellevue, Washington, United States PeopleConnect Full time
Job Responsibilities
Cloud Strategy and Architecture: Provide thought leadership, mentorship, and technical vision related to site reliability, DevOps, and a 'cloud-first' culture. Analyze and implement cloud services to meet business goals, focusing on cost optimizations, efficiencies, and scalability.
Infrastructure Automation and Design: Collaborate on...
-
Cloud Reliability Specialist
4 weeks ago
Bellevue, Washington, United States Omni Inclusive Full time
Omni Inclusive
We are seeking a highly skilled Cloud Reliability Specialist to join our team. As a key member of our cloud reliability team, you will be responsible for designing and implementing reliable cloud infrastructure solutions that meet the needs of our business.
Salary: $100,000 - $150,000 per year
About the Job
This is a challenging and rewarding role...
-
Cloud Reliability Specialist
2 weeks ago
Bellevue, Washington, United States PeopleConnect Full time
About the Role
This is an exciting opportunity to work with a leading online social platform. As a Senior Site Reliability Engineer, you will be responsible for providing thought leadership, mentorship, and technical vision related to site reliability, DevOps, and a 'cloud-first' culture.
You will analyze and implement cloud services to meet business goals,...
-
Reliability Engineering Expert
3 weeks ago
Bellevue, Washington, United States ZipRecruiter Full time
About the Role
We are seeking a highly skilled Reliability Engineering Expert to join our team at ZipRecruiter. As a key member of our engineering team, you will be responsible for designing, implementing, and maintaining the infrastructure and systems necessary to support our applications and services.
In this role, you will work closely with cross-functional...
-
Cloud Engineer
3 weeks ago
Bellevue, Washington, United States Amazon Full time
About the Role
We are seeking a highly skilled Cloud Engineer to join our team at Amazon. As a Cloud Engineer, you will be responsible for designing, developing, and maintaining cloud-based systems that meet the needs of our business.
The ideal candidate will have 5+ years of experience in systems design, software development, operations, automation, and...
-
Site Reliability Engineer Position
2 weeks ago
Bellevue, Washington, United States PeopleConnect Full time
Job Summary:
As a Senior Site Reliability Engineer at PeopleConnect, you will be responsible for designing and implementing the infrastructure and systems necessary to support our applications and services. Your expertise in cloud technologies, automation, and performance optimization will be key to the success of our engineering and operations efforts.
The...
-
Site Reliability Engineer Lead
4 weeks ago
Bellevue, Washington, United States People Connect USA Full time
About the Role
As a Sr Site Reliability Engineer, you will be part of a high-performing team that focuses on delivering scalable and reliable solutions. You will analyze and implement cloud services to meet business goals, focusing on cost optimizations, efficiencies, and scalability.
You will also collaborate on designing, building, and maintaining scalable...
-
Cloud Architect and Data Engineer
3 weeks ago
Bellevue, Washington, United States Statsig, Inc Full time
About the Opportunity
We are seeking a highly skilled Cloud Architect and Data Engineer to join our team at Statsig, Inc. in Bellevue, WA. As a key member of our data engineering team, you will be responsible for designing and implementing large-scale cloud-based data architectures using technologies such as BigQuery, Spark, and modern cloud platforms.
Your...
-
Cloud Engineering Expert
3 weeks ago
Bellevue, Washington, United States Snowflake Computing Full time
Snowflake Computing is a fast-growing company that requires talented individuals to help accelerate its growth.
We are seeking a highly skilled Senior Software Engineer - Cloud Engineering to join our team. The ideal candidate will have at least 6 years of experience in building and supporting mission-critical services and infrastructure in a SaaS...
-
Data Engineer for Cloud Infrastructure
3 weeks ago
Bellevue, Washington, United States Statsig, Inc Full time
About the Role
We are seeking a skilled Data Engineer for Cloud Infrastructure to join our team at Statsig, Inc. in Bellevue, WA. As a key member of our data engineering team, you will play a vital role in ensuring the scalability, reliability, and efficiency of our data pipelines and computations.
In this role, you will be responsible for designing,...
-
Cloud-Native Architecture Engineer
2 weeks ago
Bellevue, Washington, United States iSpot Full time
About the Opportunity
iSpot is a leading company in the ad tech industry, and we are seeking a Cloud-Native Architecture Engineer to join our team. As a key member of our engineering organization, you will play a critical role in shaping the technical direction of our data measurement platform.
The ideal candidate will have a strong background in cloud-native...
-
Reliability Architect
3 weeks ago
Bellevue, Washington, United States People Connect USA Full time
Job Description:
We are seeking an experienced Sr Site Reliability Engineer to join our team at PeopleConnect USA. The successful candidate will be responsible for designing, implementing, and maintaining the infrastructure and systems necessary to support our applications and services.
The ideal candidate will have a strong background in cloud technologies,...
-
Data Cloud Infrastructure Engineer
3 weeks ago
Bellevue, Washington, United States Snowflake Computing Full time
Unlock the Power of the Data Cloud
Snowflake Computing is revolutionizing the way businesses interact with data. As a Senior Cloud Architect, you will play a critical role in shaping the architecture and design of our Data Cloud platform.
This is an exciting opportunity to work with cutting-edge cloud technology and make a significant impact on our...
-
Cloud Engineer
3 weeks ago
Bellevue, Washington, United States Amazon Full time
About the Job
We are seeking a highly skilled Cloud Engineer to join our team at Amazon. As a Cloud Engineer, you will be responsible for designing and building large-scale cloud infrastructure that meets the needs of our customers.
Job Responsibilities
- Design and implement scalable and secure cloud architectures
- Collaborate with cross-functional teams to ensure...
-
Cloud Engineering Lead
4 weeks ago
Bellevue, Washington, United States DevSelect Full time
Job Title: Cloud Engineering Lead
We are seeking a skilled Cloud Engineering Lead to join our team at DevSelect. As a key member of our organization, you will be responsible for leading the development and implementation of large-scale cloud-based solutions on AWS.
The ideal candidate will have extensive experience in cloud engineering, with a strong...
-
Cloud Data Engineer
3 weeks ago
Bellevue, Washington, United States Databricks Full time
About the Role
We are seeking an experienced Cloud Data Engineer to join our Money team at Databricks. As a key member of our engineering organization, you will play a critical role in designing and managing our data platforms, ensuring they meet the needs of our customers.
Key Responsibilities
- Design and implement scalable data pipelines to process large...
-
Cloud-Scale Log Analytics Engineer
3 weeks ago
Bellevue, Washington, United States Amazon Full time
As a Cloud-Scale Log Analytics Engineer at Amazon, you will have the opportunity to design and build a cloud-scale log analytics and search platform. This platform will enable customers to manage and derive insights from vast volumes of data in the cloud.
AWS OpenSearch Service makes it easy to deploy, operate, and scale Elasticsearch for log analytics,...
-
Cloud Engineer
3 weeks ago
Bellevue, Washington, United States T-Mobile Full time
Job Overview
Senior Engineer responsible for certifying telecommunications applications on the Magenta Cloud Platform. This includes supporting integration into cloud and orchestration platforms, collaborating with application owners, and verifying the cloud platform meets each application requirement. Key responsibilities include MCP Orchestration and...
-
T-Mobile Cloud Hardware Engineer Lead
3 weeks ago
Bellevue, Washington, United States T-Mobile Full time
About the Role
This is an exciting opportunity to join T-Mobile's cloud team as a Hardware Engineer Lead. You will oversee the hardware design strategy, in-rack power, and thermal requirements. Your recommendations will drive new products and overall hardware architecture approach by partnering with senior-level engineers running their workloads and...
-
Cloud Security Architect
3 weeks ago
Bellevue, Washington, United States Snowflake Computing Full time
Job Title: Cloud Security Architect
We are looking for a Cloud Security Architect to join our team at Snowflake Computing. This is an exciting opportunity to lead highly impactful initiatives and design highly available, reliable, and secured distributed services.
As a Cloud Security Architect, you will:
- Lead the development of secure cloud-based...