Product SRE Site Reliability Engineer

4 weeks ago


San Francisco, United States Convergenz Full time

Product SRE

  • Location: Must be in the Bay Area, CA or Seattle and willing to go into the office a few times a week
  • On-Call Expectation: 24/7 every 4-8 weeks
  • Primary: Application-level Debugging & Troubleshooting, Linux Admin, Cloud Native Monitoring & Admin, Shell Scripting, Development Experience, Communication Skills, Python
  • Secondary: e.g., Messaging, Caching, Docker & Kubernetes, Development in Golang or C/C++
  • Role: SRE
    • Linux Admin (System Administration & Network Configuration)
    • Debugging & Troubleshooting (Application and Infrastructure) production performance issues
    • Experience working as an SRE on end-to-end debugging and triage is a big plus
    • Ability to code in Python (Golang preferred)
    • Kubernetes Administration
    • CI/CD Tooling & DevOps Automation

    Knowledge of containers: cgroups, namespaces, overlay volumes, etc.
    Scripting skills are required: Bash/Python

    Debugging/Troubleshooting skills on application/infrastructure/Linux levels


  • San Francisco, United States OpenAI Full time

    Site Reliability Engineer, Research Platform (SRE). Reliable services are what enable OpenAI to train the best AI models in the world and to bring the promise of safe, effective AI to the world. The SRE team in research is responsible for defining, measuring, and improving the reliability of the research platform. The SRE team works closely with the...


  • San Francisco, United States Apollo Solutions Full time

    Principal Site Reliability Engineer (SRE). Apollo Solutions have proudly partnered with a Series E SaaS organization based in San Francisco. They have recently employed a highly respected CEO who has spent his career successfully scaling multiple start-ups with large exit events including a $1 billion+ IPO. We are looking for a Principal SRE based in San...


  • San Jose, United States Equifax Full time

    Site Reliability Engineering (SRE) at Equifax is a discipline that combines software and systems engineering for building and running large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations while adhering to Equifax engineering principles. SREs in our team take...


  • San Francisco, United States OpenAI Full time

    About the team: Reliable services are what enable OpenAI to train the best AI models in the world and to bring the promise of safe, effective AI to the world. The SRE team in research is responsible for defining, measuring, and improving the reliability of the research platform. The SRE team works closely with the supercomputing and hardware health teams...


  • San Ramon, United States The LaSalle Group Full time

    LaSalle Network has partnered with a well-established software provider that's based in San Ramon, CA, who's in need of a well-rounded, Site Reliability Engineer (SRE) - Grafana Observability - with a strong background in Grafana and related tools such as Prometheus and Telegraf. The ideal candidate will play a crucial role in accelerating the transition of...


  • San Francisco, CA, United States Apollo Solutions Full time

    Site Reliability Engineer. Apollo Solutions have partnered with a groundbreaking artificial intelligence business that is making major developments in how we use AI/ML for gaming/security. They are working closely with government contracts as well as gaming console companies and are now searching for an SRE to join their growing team. The Site Reliability...


  • San Ramon, United States LaSalle Network Full time

    LaSalle Network has partnered with a well-established software provider that's based in San Ramon, CA, who's in need of a well-rounded, Site Reliability Engineer (SRE) - Grafana Observability - with a strong background in Grafana and related tools such as Prometheus and Telegraf. The ideal candidate will play a crucial role in accelerating the transition of...


  • San Ramon, United States LaSalle Network Full time

    LaSalle Network has partnered with a well-established software provider that’s based in San Ramon, CA, who’s in need of a well-rounded, Site Reliability Engineer (SRE) – Grafana Observability – with a strong background in Grafana and related tools such as Prometheus and Telegraf. The ideal candidate will play a crucial role in accelerating the...


  • San Francisco, United States Apollo Solutions Full time

    Principal Site Reliability Engineer. Apollo Solutions have partnered with a groundbreaking Fintech start-up backed by top-tier venture capital. They are looking to significantly disrupt how we view, store and invest our personal finances and have already made significant waves in the industry. The Principal Site Reliability Engineer will be working closely...


  • San Francisco, United States Talkdesk Full time

    At Talkdesk, we are courageous innovators focused on helping organizations around the world create better customer experiences. Our AI-powered cloud contact center solutions optimize our customers’ most critical customer service processes. We are recognized as a Contact Center as a Service (CCaaS) leader by influential research organizations including...