Senior Site Reliability Engineer/DevOps

2 weeks ago


Palo Alto, CA, United States black.ai Full time

Quantum computing holds the promise of humanity’s mastery over the natural world, but only if we can build a real quantum computer. PsiQuantum is on a mission to build the first real, useful quantum computers, capable of delivering the world‑changing applications that the technology has long promised. We know that means building a system with roughly 1 million qubits that supports fault‑tolerant error correction within a scalable architecture and a data center footprint.
By harnessing the laws of quantum physics, quantum computers can provide exponential performance increases over today’s most powerful supercomputers, offering the potential for extraordinary advances across a broad range of industries including climate, energy, healthcare, pharmaceuticals, finance, agriculture, transportation, materials design, and many more.
PsiQuantum has determined the fastest path to delivering a useful quantum computer, years earlier than the rest of the industry. Our architecture is based on silicon photonics, which lets us produce our components at Tier‑1 semiconductor fabs such as GlobalFoundries, where we leverage the same high‑volume semiconductor manufacturing processes that already produce billions of chips for telecom and consumer electronics applications. We also benefit from the quantum‑mechanical fact that photons don’t feel heat or electromagnetic interference, allowing us to take advantage of existing cryogenic cooling systems and industry‑standard fiber connectivity.
In 2024, PsiQuantum announced two government‑funded projects to support the build‑out of our first Quantum Data Centers and utility‑scale quantum computers in Brisbane, Australia, and Chicago, Illinois. Both projects are backed by nations that understand quantum computing’s potential impact and the need to scale this technology to unlock that potential. We won’t just be building the hardware, but also the fault‑tolerant quantum applications that will provide industry‑transforming results.
Quantum computing is not just an evolution of the decades‑old advancement in compute power.
Join the OS/Platform team as a Site Reliability Engineer (SRE) and keep our services healthy, observable, and fast. Partnering with the Platform Engineering group, you’ll own the day‑to‑day operation of our monitoring stack (Grafana, Prometheus, Loki, and Tempo), crafting dashboards that surface golden signals and drive real‑time insight. You’ll codify reliability through SLIs/SLOs, automate runbooks in Python, and lead incident response to maintain world‑class uptime across both on‑prem and AWS environments.
Build and maintain Grafana dashboards that visualize golden signals (latency, traffic, errors, saturation) for engineers and stakeholders.
Operate and tune our observability pipeline (Prometheus, Loki, Tempo) to ensure scalable, low‑latency telemetry ingestion and alerting.
Develop automation and self‑service tooling in Python/Bash to streamline alerts, runbooks, and operational tasks.
Collaborate with Platform and Product teams on capacity planning, performance testing, and change management.
Improve CI/CD health checks and release safety nets within GitLab.
Contribute to infrastructure as code (Terraform, Ansible) for monitoring stack deployments and upgrades.
Bachelor’s Degree or higher in Computer Science, Engineering or other related technical field.
5+ years in an SRE, DevOps, or Production Engineering role supporting distributed systems in production.
Solid scripting/automation skills in Python and Bash; familiarity with GitLab CI pipelines.
Working knowledge of AWS services, networking fundamentals, and load balancing.
Experience running incident response and writing actionable post‑mortems.
Familiarity with Infrastructure as Code (Terraform, Ansible) and configuration management.
Comfortable acting as a generalist across infrastructure, application, and data layers.
PsiQuantum does not unlawfully discriminate on the basis of race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), gender identity, gender expression, national origin, ancestry, citizenship, age, physical or mental disability, military or veteran status, marital status, domestic partner status, sexual orientation, genetic information, or any other basis protected by applicable laws.
Note: PsiQuantum will only reach out to you using an official PsiQuantum email address and will never ask you for bank account information as part of the interview process.
The ranges below reflect the target ranges for a new hire base salary. Actual compensation may vary outside of these ranges and is dependent on various factors including but not limited to a candidate’s qualifications including relevant education and training, competencies, experience, geographic location, and business needs. Full time roles are eligible for equity and benefits.


