Product SRE Site Reliability Engineer
4 weeks ago
Product SRE
- Location: Must be in the Bay Area, CA or Seattle and willing to go into the office a few times a week
- On-Call Expectation: 24/7 every 4-8 weeks
- Primary: Application-level Debugging & Troubleshooting, Linux Admin, Cloud Native Monitoring & Admin, Shell Scripting, Development Experience, Communication Skills, Python
- Secondary: e.g., Messaging, Caching, Docker & Kubernetes, Development in Golang or C/C++
Skills for the SRE role (primary and secondary):
• Linux Admin (System Administration & Network Configuration)
• Debugging & Troubleshooting production performance issues (Application and Infrastructure)
• Experience as an SRE handling end-to-end debugging and triage is a big plus
• Ability to code in Python (Golang preferred)
• Kubernetes Administration
• CI/CD Tooling & DevOps Automation
Knowledge of containers: cgroup, namespace, overlay volumes, etc.
Scripting skills are required: bash/python
Debugging/Troubleshooting skills on application/infrastructure/Linux levels
-
San Francisco, United States OpenAI Full time
Site Reliability Engineer, Research Platform, SRE. Reliable services are what enable OpenAI to train the best AI models in the world and to bring the promise of safe, effective AI to the world. The SRE team in research is responsible for defining, measuring, and improving the reliability of the research platform. The SRE team works closely with the...
-
Principal Site Reliability Engineer
4 weeks ago
San Francisco, United States Apollo Solutions Full time
Principal Site Reliability Engineer (SRE). Apollo Solutions have proudly partnered with a Series E SaaS organization based in San Francisco. They have recently employed a highly respected CEO who has spent his career successfully scaling multiple start-ups with large exit events including a $1 billion+ IPO. We are looking for a Principal SRE based in San...
-
Principal Site Reliability Engineer
3 weeks ago
San Francisco, United States Apollo Solutions Full time
Principal Site Reliability Engineer (SRE). Apollo Solutions have proudly partnered with a Series E SaaS organization based in San Francisco. They have recently employed a highly respected CEO who has spent his career successfully scaling multiple start-ups with large exit events including a $1 billion+ IPO. We are looking for a Principal SRE based in San...
-
Site Reliability Engineer
3 weeks ago
San Jose, United States Equifax Full time
Site Reliability Engineering (SRE) at Equifax is a discipline that combines software and systems engineering for building and running large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations while adhering to Equifax engineering principles. SREs in our team take...
-
San Francisco, United States OpenAI Full time
About the team: Reliable services are what enable OpenAI to train the best AI models in the world and to bring the promise of safe, effective AI to the world. The SRE team in research is responsible for defining, measuring, and improving the reliability of the research platform. The SRE team works closely with the supercomputing and hardware health teams...
-
Site Reliability Engineer
1 month ago
San Ramon, United States The LaSalle Group Full time
LaSalle Network has partnered with a well-established software provider based in San Ramon, CA, that is in need of a well-rounded Site Reliability Engineer (SRE) - Grafana Observability - with a strong background in Grafana and related tools such as Prometheus and Telegraf. The ideal candidate will play a crucial role in accelerating the transition of...
-
Site Reliability Engineer
3 weeks ago
San Ramon, United States The LaSalle Group Full time
LaSalle Network has partnered with a well-established software provider based in San Ramon, CA, that is in need of a well-rounded Site Reliability Engineer (SRE) - Grafana Observability - with a strong background in Grafana and related tools such as Prometheus and Telegraf. The ideal candidate will play a crucial role in accelerating the transition of...
-
Site Reliability Engineer
4 weeks ago
San Francisco, CA, United States Apollo Solutions Full time
Site Reliability Engineer. Apollo Solutions have partnered with a groundbreaking artificial intelligence business that is making major developments in how we use AI/ML for gaming/security. They are working closely with government contracts as well as gaming console companies and are now searching for an SRE to join their growing team. The Site Reliability...
-
Site Reliability Engineer
1 month ago
San Ramon, United States LaSalle Network Full time
LaSalle Network has partnered with a well-established software provider based in San Ramon, CA, that is in need of a well-rounded Site Reliability Engineer (SRE) - Grafana Observability - with a strong background in Grafana and related tools such as Prometheus and Telegraf. The ideal candidate will play a crucial role in accelerating the transition of...
-
Site Reliability Engineer
4 weeks ago
San Ramon, United States LaSalle Network Full time
LaSalle Network has partnered with a well-established software provider based in San Ramon, CA, that is in need of a well-rounded Site Reliability Engineer (SRE) - Grafana Observability - with a strong background in Grafana and related tools such as Prometheus and Telegraf. The ideal candidate will play a crucial role in accelerating the...
-
Site Reliability Engineer
2 weeks ago
San Ramon, United States LaSalle Network Full time
LaSalle Network has partnered with a well-established software provider based in San Ramon, CA, that is in need of a well-rounded Site Reliability Engineer (SRE) - Grafana Observability - with a strong background in Grafana and related tools such as Prometheus and Telegraf. The ideal candidate will play a crucial role in accelerating the...
-
Site Reliability Engineer
7 days ago
San Ramon, United States LaSalle Network Full time
LaSalle Network has partnered with a well-established software provider based in San Ramon, CA, that is in need of a well-rounded Site Reliability Engineer (SRE) - Grafana Observability - with a strong background in Grafana and related tools such as Prometheus and Telegraf. The ideal candidate will play a crucial role in accelerating the transition of...
-
Principal Site Reliability Engineer
1 month ago
San Francisco, United States Apollo Solutions Full time
Principal Site Reliability Engineer. Apollo Solutions have partnered with a groundbreaking fintech start-up backed by top-tier venture capital. They are looking to significantly disrupt how we view, store and invest our personal finances and have already made significant waves in the industry. The Principal Site Reliability Engineer will be working closely...
-
Site Reliability Engineer
3 weeks ago
San Francisco, United States Talkdesk Full time
At Talkdesk, we are courageous innovators focused on helping organizations around the world create better customer experiences. Our AI-powered cloud contact center solutions optimize our customers’ most critical customer service processes. We are recognized as a Contact Center as a Service (CCaaS) leader by influential research organizations including...