Staff/Principal Site Reliability Engineer

3 weeks ago

San Francisco, United States Veza Full time

We are seeking an exceptional Staff/Principal Site Reliability Engineer to lead critical infrastructure initiatives and drive innovation across our organization. You'll architect scalable solutions, navigate complex technical challenges independently, and deliver results under tight deadlines in a fast-paced environment. You will work cross-functionally alongside builders who have helped shape the success of companies such as Google, Okta, AWS, and Snowflake.

Strategic Leadership & Technical Execution
- Lead enterprise-wide reliability and infrastructure projects across multiple teams with high autonomy
- Navigate ambiguous problem spaces and deliver innovative solutions under tight deadlines
- Architect and deploy solutions for Cloud Prem and SaaS customers at scale
- Drive technical innovation and establish SRE best practices across the organization
- Respond to critical incidents, lead root cause analysis, and implement long-term resolutions
- Develop automation solutions to streamline operations and reduce manual workload
- Participate in the on-call rotation and ensure effective incident handoff and documentation

Cross-Functional Collaboration & Communication
- Partner with Engineering, Product, and Customer Success teams to align reliability goals with business objectives
- Communicate complex technical concepts effectively to technical and non-technical audiences, including executives
- Influence technical decisions across teams through thought leadership and demonstrated expertise
- Build consensus and drive adoption of new tools, processes, and architectural patterns

Customer-Facing Technical Leadership
- Provide tier 2/3 technical support to enterprise customers for complex troubleshooting
- Work directly with customer technical teams to resolve deployment, configuration, and integration challenges
- Conduct technical onboarding and provide expert guidance on platform architecture and best practices
- Create customer-facing documentation, troubleshooting guides, and runbooks
- Lead customer calls and technical discussions as a trusted advisor

Team Development
- Mentor SRE and engineering team members, elevating technical capabilities
- Foster a culture of reliability, operational excellence, and continuous improvement

You have:

Required Experience
- BS degree in Computer Science or related field (or equivalent practical experience)
- 7+ years in Site Reliability Engineering, DevOps, or Infrastructure Engineering
- Proven track record leading large-scale, cross-team infrastructure projects from conception to production
- Demonstrated ability to work autonomously on ambiguous projects with tight deadlines

Technical Expertise
- 5+ years with AWS (VPC, EC2, RDS, EKS, CloudFormation) and cloud automation
- Expert-level experience with Kubernetes, Helm, Linux, and Terraform
- Strong experience with the GitOps model, distributed version control, and CI/CD pipelines
- Proficiency with monitoring tools (Prometheus, Grafana, Datadog)
- Strong programming/scripting skills (Python, Go, Bash) for automation
- Deep understanding of distributed systems, microservices, and reliability patterns
- Experience with Bazel and CueLang a plus

Leadership & Communication
- Exceptional ability to articulate complex technical concepts to diverse audiences
- Track record of driving technical change across organizational boundaries
- Successfully delivered multiple complex projects under tight deadlines
- Strong customer service orientation with patience and empathy

Work Style
- Thrives in ambiguous environments and makes progress without perfect information
- Hands-on, "can do" attitude with bias for action
- Low ego and high intellectual curiosity
- Comfortable working across time zones
- Self-motivated with strong ownership mentality

Compensation Disclosure
$184,000 - $240,000 USD
Compensation depends on skills, qualifications, experience, and work location. Variable compensation such as commission is not included.
Our Culture
- Ownership Mindset
- Act with Integrity
- Guardians of our Customers
- Opinionated Humility
- Build Trust, Earn Trust

Veza is proud to be an equal opportunity employer. We are committed to equal employment opportunities regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, or other applicable legally protected characteristics. We also consider qualified applicants according to applicable federal, state, and local laws. If a candidate with a disability requires an accommodation during the recruitment process, please email recruiting@veza.com.

Staff/Principal Site Reliability Engineer

2 weeks ago

San Francisco, CA, United States Veza Full time

We are seeking an exceptional Staff/Principal Site Reliability Engineer to lead critical infrastructure initiatives and drive innovation across our organization. You will work cross-functionally alongside builders who have helped shape the success of companies such as Google, Okta, AWS, and Snowflake. ...
Principal Site Reliability Engineer

2 weeks ago

San Francisco, United States Harrison Clarke Full time

Harrison Clarke are working with several high-profile companies that are seeking a Principal Site Reliability Engineer (SRE) to lead the design, implementation, and scaling of the infrastructure and systems that support their products. The ideal candidate should have extensive experience in designing highly scalable infrastructure, building systems, and...
Principal Site Reliability Engineer

6 days ago

San Francisco, California, United States Harrison Clarke Full time $120,000 - $180,000 per year

Harrison Clarke are working with several high-profile companies that are seeking a Principal Site Reliability Engineer (SRE) to lead the design, implementation, and scaling of the infrastructure and systems that support their products. The ideal candidate should have extensive experience in designing highly scalable infrastructure, building systems, and...
Staff/Lead/Senior/Principal Site Reliability Engineer

6 days ago

San Francisco, California, United States Relevance AI Full time $180,000 - $300,000 per year

Location: San Francisco, USA (Hybrid 3 days/week). About Us: At Relevance AI, our mission is to empower anyone to delegate work to the AI workforce. We're building a new category of AI automation, enabling teams to create and deploy intelligent AI agents that replicate human-quality work, decision-making, and collaboration at scale. We're scaling fast backed by...
Staff Engineer, Site Reliability

3 weeks ago

San Francisco, United States Zapier Full time

About Zapier: Zapier is building a platform to help millions of businesses globally scale with automation and AI. Our mission is to make automation work for everyone by delivering products that delight our customers. You'll collaborate with brilliant people, use the latest tools, and leverage the flexibility of remote work. Your work will directly fuel our...
Senior / Principal Site Reliability Engineer

2 weeks ago

San Francisco, United States Datacrunch Full time

Imagine a future where everyone has instant, low-cost access to intelligence. We’re building a fully featured European AI cloud - with everything one needs to train, experiment with, and deploy AI models. In addition, our GPUs run on 100% renewable energy. We’re ambitious, curious, and gutsy doers. We practice a low hierarchy across the company and high...
Staff Site Reliability Engineer

6 days ago

San Francisco, California, United States Heartflow Full time $185,750 - $250,922 per year

Heartflow is a medical technology company advancing the diagnosis and management of coronary artery disease, the #1 cause of death worldwide, using cutting-edge technology. The flagship product—an AI-driven, non-invasive cardiac test supported by the ACC/AHA Chest Pain Guidelines called the Heartflow FFRCT Analysis—provides a color-coded, 3D model of a...
Principal Site Reliability Engineer

2 weeks ago

San Francisco, CA, United States Harrison Clarke Full time

Harrison Clarke are working with several high-profile companies that are seeking a Principal Site Reliability Engineer (SRE) to lead the design, implementation, and scaling of the infrastructure and systems that support their products. The ideal candidate should have extensive experience in designing highly scalable infrastructure, building systems, and...
Principal Site Reliability Operations Engineer

22 hours ago

San Mateo, United States Roblox Full time

Principal Site Reliability Operations Engineer. San Mateo, CA, United States. Engineering. ID: 5649. Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators. At Roblox, we're building the tools and platform...
Senior Staff Site Reliability Engineer

6 days ago

San Francisco, United States WEX Full time

About the Team & Role: We are looking for a highly motivated and high-potential Senior Staff Site Reliability Engineer (SRE) to join our team as a senior technical leader, driving transformational change and delivering significant business impact across WEX's platform ecosystem. This is a truly exciting moment to be part of the SRE organization at WEX. Our...