Senior Site Reliability Engineer/DevOps

2 weeks ago

San Francisco, CA, United States Cypress HCM Full time

Site Reliability Engineer
As a Site Reliability Engineer (Contractor), you will be a hands-on contributor, focused on supporting and improving the reliability of our AWS cloud infrastructure. You will apply core SRE principles to automate operational tasks, monitor system health, and participate in incident response. This role is execution-focused, supporting the senior team in ensuring our services are available and performant while maintaining an awareness of healthcare compliance standards.
Extension of the contract beyond 3 months will be on an as-needed basis.

Key Responsibilities (Contract Deliverables)
Cloud Infrastructure Execution & Maintenance:
Assist in deploying, configuring, and maintaining highly available AWS infrastructure components.
Support the operations and maintenance of existing containerized applications running on platforms like Kubernetes and Docker.
AWS Networking Support:
Assist the senior team in maintaining the security and efficiency of our core AWS network topologies. Perform basic configuration and maintenance of cloud load balancing (ALB/NLB) and DNS services (Route 53).
Support the enforcement of network security policies and assist in routine network troubleshooting for connectivity or latency issues.
Site Reliability & Automation:
Identify and automate routine manual operational tasks using scripting (Python/Bash) to improve efficiency and reduce toil.
Participate in blameless post-mortems and incident response efforts, focusing on documenting steps taken and assisting with root cause analysis.
Assist in managing and verifying CI/CD pipelines to ensure safe and efficient software releases.
Support compliance efforts by ensuring operational procedures align with security best practices and regulatory requirements (HIPAA, SOC 2).

8+ years of hands-on experience in a DevOps, Cloud Engineering, or SRE-focused technical role.
~ Solid practical experience with AWS cloud provider services.
~ Hands-on experience with Linux administration and troubleshooting.
~ Proficiency in scripting languages such as Python and Bash for automation.
~ Experience with monitoring systems such as Prometheus, Grafana, and AWS CloudWatch.
~ Experience writing SQL queries against PostgreSQL databases.
~ Working understanding of core networking principles (VPC, routing, load balancing).
~ Basic understanding of security and compliance requirements in a regulated industry. Relevant certifications such as AWS Certified SysOps Administrator or AWS Certified Developer.
Compensation: $72 - $90 per hour

Senior / Staff Site Reliability Engineer (SRE)

3 weeks ago

San Francisco, United States DevOps projects Full time

2025-10-25 Senior / Staff Site Reliability Engineer (SRE) Fluidstack is building GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more. Our team is small, highly motivated, and focused on providing a world-class supercomputing experience. We put our customers first in...
Senior Site Reliability Engineer/DevOps

2 weeks ago

San Francisco, CA, United States ConductorOne Full time

ConductorOne is the first AI-native identity security platform that protects every identity: human, non-human, and AI. With powerful automation, platform-level AI, and out-of-the-box connectors, it centralizes access visibility, enforces fine-grained controls, enables just-in-time access, and automates user access reviews across all apps. We’re building...
Site Reliability Engineer

3 weeks ago

San Francisco, United States DevOps projects Full time

Lambda is the #1 GPU Cloud for ML/AI teams training, fine-tuning and inferencing AI models, where engineers can easily, securely and affordably build, test and deploy AI products at scale. Lambda’s product portfolio includes on-prem GPU systems, hosted GPUs across public & private clouds and managed inference services, servicing...
Senior Site Reliability Engineer/DevOps

3 weeks ago

San Francisco, CA, United States ConductorOne Full time

We’re a hyper-creative, fast-moving team building the future of identity security. Human, non-human, and AI identity counts are exploding. ConductorOne is the answer: an AI-native platform that automates identity security at scale. We’re building something iconic, with a team of Conductors who own problems, raise the bar, and obsess about our...
Software Engineer, Site Reliability

3 weeks ago

San Francisco, United States DevOps projects Full time

Why Harvey: Harvey is a secure AI platform for legal and professional services that augments productivity and automates complex workflows. Harvey uses algorithms with...
Site Reliability Engineer — AI Cloud

3 weeks ago

San Francisco, United States DevOps projects Full time

A leading tech company in San Francisco is looking for a Site Reliability Engineer to enhance system reliability and performance. This role requires 7+ years of experience in Site Reliability Engineering or DevOps, alongside strong skills in Python, Go, and monitoring tools. You will be part of a collaborative team driving improvements across cloud APIs and...
Site Reliability Engineer

4 weeks ago

Berkeley, CA, United States DevOps projects Full time

About the Company: LMArena is an engineering-first startup redefining how the world evaluates large language models. Created in 2023 by UC Berkeley researchers, our neutral, community-driven benchmarking platform attracts over...
Mid-Level Site Reliability/ DevOps Engineer

3 weeks ago

San Francisco, CA, United States Jobright.ai Full time

Jobright is an AI-powered career platform that helps job...
Site Reliability Engineer

4 weeks ago

San Francisco, CA, United States DevOps projects Full time

About HappyRobot: HappyRobot is a platform to build and deploy AI workers that automate communication. Our AI workers connect to any system or data source to handle phone calls, email, messages… We target...
Mid-Level Site Reliability/ DevOps Engineer

2 weeks ago

San Francisco, United States Jobright.ai Full time

Jobright is an AI-powered career platform that helps job...
