Site Reliability Engineer
17 hours ago
Site Reliability Engineer (SRE)
Location: San Francisco Bay Area
Role Overview:
We are seeking a highly skilled Site Reliability Engineer (SRE) to join a dynamic team at a rapidly growing technology company. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems, while implementing automation and optimizing cloud infrastructure. This role offers the opportunity to work with cutting-edge AI/ML technologies, leveraging them to solve complex challenges in cloud infrastructure management and performance optimization.
Key Responsibilities:
- System Reliability & Performance: Design, implement, and maintain scalable systems, ensuring high availability, performance, and disaster recovery across production environments.
- Automation & Tool Development: Develop automation tools to streamline operations, improve system reliability, and reduce manual interventions.
- Cloud Infrastructure Management: Create and manage cloud instances (e.g., dev, staging, production) using AWS, GCP, or Azure, optimizing infrastructure performance and cost.
- Integration of AI/ML Models: Collaborate with engineering teams to integrate machine learning models into production environments, ensuring that these models scale efficiently and perform optimally.
- Incident Management: Respond to and resolve incidents, minimizing downtime and ensuring quick recovery. Lead post-incident reviews and implement preventive measures.
- Continuous Improvement: Identify areas of improvement and drive initiatives to enhance system reliability, performance, and security.
- Security & Compliance: Ensure that infrastructure and applications adhere to security best practices and compliance standards.
Qualifications:
- Educational Background: Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- Experience: Proven experience as a Site Reliability Engineer or in a similar role within a SaaS environment, managing and optimizing cloud infrastructure (preferably AWS, GCP, or Azure), and familiarity with integrating AI and machine learning technologies.
- Technical Skills:
  - Proficiency in programming and scripting languages such as Python, Go, or Bash.
  - Experience with containerization and orchestration tools like Docker and Kubernetes.
  - Solid understanding of networking, security, and performance optimization practices.
  - Knowledge of CI/CD pipelines and DevOps practices to ensure smooth development and deployment cycles.
- Problem-Solving: Strong analytical and problem-solving skills with attention to detail.
- Collaboration & Communication: Excellent interpersonal skills, with the ability to work collaboratively in cross-functional teams and communicate technical concepts clearly.
Benefits:
- Competitive Salary: Attractive compensation package, including equity options.
- Health & Wellness: Comprehensive health, dental, and vision insurance, along with other benefits.
- Work Environment: A collaborative and innovative work environment within a growing company.
- Growth Opportunities: Opportunities for career growth, professional development, and a chance to shape the future of the company’s technology and infrastructure.
-
EVONA | Site Reliability Engineer
7 days ago
San Francisco Bay Area, United States EVONA Full time Site Reliability Engineer (SRE) Location: San Francisco Bay Area Role Overview: We are seeking a highly skilled Site Reliability Engineer (SRE) to join a dynamic team at a rapidly growing technology company. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems, while implementing automation...
-
Site Reliability Engineer
19 hours ago
San Francisco, United States EVONA Full time Site Reliability Engineer (SRE) Location: San Francisco Bay Area Role Overview: We are seeking a highly skilled Site Reliability Engineer (SRE) to join a dynamic team at a rapidly growing technology company. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems, while implementing automation...
-
Site Reliability Engineer
5 days ago
San Francisco, United States Arbitrum Full time Our mission is to bring blockchain to a billion people. The Alchemy Platform is a world class developer platform designed to make building on the blockchain easy. We've built leading infrastructure in the space, powering over $105 billion in transactions for tens of millions of users in 99% of countries worldwide. The Alchemy team draws from decades of deep...
-
Site Reliability Engineer
5 days ago
San Francisco, United States BaseTen Labs, Inc. Full time ABOUT BASETEN: We're a growing team of builders backed by top-tier investors including IVP, Spark Capital, and Sarah Guo at Conviction. ML teams at enterprises and category-defining AI-native companies like Descript, Bland, and Patreon use Baseten to power their core production workloads with best-in-class performance, security, and reliability. While we've...
-
Sr Site Reliability Engineer
5 days ago
San Francisco, United States Hulu Full time Job Posting Title: Sr Site Reliability Engineer Req ID: 10109036 Job Description: Job Summary: Our Performance and Reliability teams are leading the improvements, optimization, and availability of applications across the Disney organization and business units, taking a consultative approach to Reliability Engineering by supporting, educating, mentoring, and...
-
Site Reliability Engineer
7 days ago
San Francisco, United States Resource Informatics Group Full time Job Title: Site Reliability Engineer. Work Location: San Francisco, CA (Hybrid after showing successful engagement). Duration: 18+ months. Most important skills: 10 years of Oracle database administration experience in large production environments; hands-on database skills, especially around database and system troubleshooting and administration; GoldenGate setup,...
-
Site Reliability Engineer
6 days ago
San Francisco, United States ESL FACEIT GROUP Full time At EFG (ESL FACEIT Group) we create worlds beyond gameplay where players and fans become community. We pride ourselves in having a corporate social responsibility which is that “IT’S NOT GG (Good Game), UNTIL IT’S GG FOR ALL”. We are passionate about the culture we foster that ultimately helps to create and shape the world of esports, gaming...
-
Site Reliability Engineering Lead
1 day ago
San Francisco, California, United States MongoDB Full time MongoDB empowers innovators to create, transform, and disrupt industries by unleashing the power of software and data. Our developer data platform, MongoDB Atlas, is a globally distributed, multi-cloud database available in over 115 regions across AWS, Google Cloud, and Microsoft Azure. Job Overview: We are seeking an experienced Site Reliability Engineer (SRE)...
-
Site Reliability Engineer
8 hours ago
San Francisco, United States Asystem Full time Particle is a startup based in the San Francisco Bay Area. We are seeking candidates who are self-starters, adaptable, and flexible in a startup environment. As a team of veteran technologists from Twitter, Tesla, Periscope, and more, we are developing a next-generation news platform to redefine your daily intake of news. We value active engagement in...
-
Site Reliability Engineer Specialist
3 days ago
San Francisco, California, United States Federal Reserve Bank of San Francisco Full time Company Overview: The Federal Reserve Bank of San Francisco is a leading financial institution dedicated to fostering an inclusive economy that benefits everyone. We're seeking talented individuals like you to join our dynamic team and contribute to our mission. Job Description: We are looking for a Site Reliability Engineer to play a crucial role in maintaining...
-
Site Reliability Engineer
4 days ago
San Diego, United States TALENT Software Services Full time Are you an experienced Site Reliability Engineer with a desire to excel? If so, then Talent Software Services may have the job for you! Our client is seeking an experienced Site Reliability Engineer to work at their company in San Diego, CA. Position Summary: It is an exciting time to be part of client's CICD and Cloud Site Reliability Engineering (SRE) team....
-
Site Reliability Engineer
3 days ago
San Diego, United States TALENT Software Services Full time Are you an experienced Site Reliability Engineer with a desire to excel? If so, then Talent Software Services may have the job for you! Our client is seeking an experienced Site Reliability Engineer to work at their company in San Diego, CA. Position Summary: It is an exciting time to be part of client's CICD and Cloud Site Reliability Engineering (SRE) team....
-
Site Reliability Engineer II
6 days ago
San Francisco, United States Navient Full time Our mission is to make higher education accessible and affordable for everyone. We empower students with financial support and supercharge their ability to pay down their debt, so they can get on the right financial track, fast. We build tools that help people feel in control of their financial future, including: Private student loans - low rates,...
-
Site Reliability Engineering Leader
4 days ago
San Francisco, California, United States Federal Reserve Bank of San Francisco Full time Job Description Summary: This role requires a strong background in software development, system administration, and cloud computing. The successful candidate will have experience with automated deployments, containerization, and microservices architecture. The Sr. Site Reliability Engineer will work closely with the engineering team to design, deploy, and...
-
Site Reliability Engineer
5 days ago
San Francisco, United States Stefanini, Inc Full time Join us to co-create solutions for a better future! Job Details: Information Technology. Site Reliability Engineer, San Francisco, CA. Posted: 12/27/2024. Job ID#: 59294. Job Category: Information Technology. Position Type: Full Time. Duration: Long-Term. Stefanini Group is hiring! Stefanini is looking for a Site Reliability Engineer in San Francisco, CA (Hybrid). For quick...
-
Site Reliability Engineer II
7 days ago
San Francisco, United States Earnest Full time Our mission is to make higher education accessible and affordable for everyone. We empower students with financial support and supercharge their ability to pay down their debt, so they can get on the right financial track, fast. We build tools that help people feel in control of their financial future, including: Private student loans - low rates,...
-
Lead DevOps/Site Reliability Engineer
7 days ago
San Francisco, United States Saxon Global Full time Lead DevOps/Site Reliability Engineer. Looking for a resource more senior in the DevOps space, with a leaning toward site reliability engineering. Docker containers, Kubernetes automation. Mostly focused on automation and on current pain points around deployment reliability in their data engineering processes. SRE who can go beyond the memory, what kind of...
-
Site Reliability Engineer
5 days ago
San Francisco, United States Mistral AI Full time About Mistral: At Mistral AI, we are a tight-knit, nimble team dedicated to bringing our cutting-edge AI technology to the world. Our mission is to make AI ubiquitous and open. We are creative, low-ego, team-spirited, and have been passionate about AI for years. We hire people who thrive in competitive environments, because they find them more fun to work in....
-
Site Reliability Engineer
2 days ago
San Diego, United States Motion Recruitment Full time Our client, a global media/entertainment company, is looking for a Site Reliability Engineer to join their team in San Diego, CA! Pay: $80-90/hour. Hybrid. This is a 6-month contract open to conversion or extension! As the Site Reliability Engineer you will be part of the CICD and Cloud SRE team supporting the heart of PlayStation Network to make sure...