Site Reliability Engineer

17 hours ago


San Francisco Bay Area, United States EVONA Full time

Site Reliability Engineer (SRE)

Location: San Francisco Bay Area

Role Overview:

We are seeking a highly skilled Site Reliability Engineer (SRE) to join a dynamic team at a rapidly growing technology company. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems, while implementing automation and optimizing cloud infrastructure. This role offers the opportunity to work with cutting-edge AI/ML technologies, leveraging them to solve complex challenges in cloud infrastructure management and performance optimization.

Key Responsibilities:

  • System Reliability & Performance: Design, implement, and maintain scalable systems, ensuring high availability, performance, and disaster recovery across production environments.
  • Automation & Tool Development: Develop automation tools to streamline operations, improve system reliability, and reduce manual interventions.
  • Cloud Infrastructure Management: Create and manage cloud instances (e.g., dev, staging, production) using AWS, GCP, or Azure, optimizing infrastructure performance and cost.
  • Integration of AI/ML Models: Collaborate with engineering teams to integrate machine learning models into production environments, ensuring that these models scale efficiently and perform optimally.
  • Incident Management: Respond to and resolve incidents, minimizing downtime and ensuring quick recovery. Lead post-incident reviews and implement preventive measures.
  • Continuous Improvement: Identify areas of improvement and drive initiatives to enhance system reliability, performance, and security.
  • Security & Compliance: Ensure that infrastructure and applications adhere to security best practices and compliance standards.

Qualifications:

  • Educational Background: Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • Experience: Proven experience as a Site Reliability Engineer or in a similar role within a SaaS environment, including managing and optimizing cloud infrastructure (preferably AWS, GCP, or Azure), plus familiarity with integrating AI and machine learning technologies.
  • Technical Skills:
      • Proficiency in programming and scripting languages such as Python, Go, or Bash.
      • Experience with containerization and orchestration tools like Docker and Kubernetes.
      • Solid understanding of networking, security, and performance optimization practices.
      • Knowledge of CI/CD pipelines and DevOps practices to ensure smooth development and deployment cycles.
  • Problem-Solving: Strong analytical and problem-solving skills with attention to detail.
  • Collaboration & Communication: Excellent interpersonal skills, with the ability to work collaboratively in cross-functional teams and communicate technical concepts clearly.

Benefits:

  • Competitive Salary: Attractive compensation package, including equity options.
  • Health & Wellness: Comprehensive health, dental, and vision insurance, along with other benefits.
  • Work Environment: A collaborative and innovative work environment within a growing company.
  • Growth Opportunities: Opportunities for career growth, professional development, and a chance to shape the future of the company’s technology and infrastructure.



  • San Francisco, United States Arbitrum Full time

    Our mission is to bring blockchain to a billion people. The Alchemy Platform is a world-class developer platform designed to make building on the blockchain easy. We've built leading infrastructure in the space, powering over $105 billion in transactions for tens of millions of users in 99% of countries worldwide. The Alchemy team draws from decades of deep...


  • San Francisco, United States BaseTen Labs, Inc. Full time

    ABOUT BASETEN: We're a growing team of builders backed by top-tier investors including IVP, Spark Capital, and Sarah Guo at Conviction. ML teams at enterprises and category-defining AI-native companies like Descript, Bland, and Patreon use Baseten to power their core production workloads with best-in-class performance, security, and reliability. While we've...


  • San Francisco, United States Hulu Full time

    Job Posting Title: Sr Site Reliability Engineer. Req ID: 10109036. Job Summary: Our Performance and Reliability teams are leading the improvements, optimization, and availability of applications across the Disney organization and business units, taking a consultative approach to Reliability Engineering by supporting, educating, mentoring, and...


  • San Francisco, United States Resource Informatics Group Full time

    Job Title: Site Reliability Engineer. Work Location: San Francisco, CA (Hybrid after showing successful engagement). Duration: 18+ months. Most important skills: 10 years of Oracle database administration experience on large production environment Database hands on skills especially around database and system troubleshooting and administration GoldenGate setup,...


  • San Francisco, United States ESL FACEIT GROUP Full time

    At EFG (ESL FACEIT Group) we create worlds beyond gameplay where players and fans become community. We pride ourselves in having a corporate social responsibility which is that “IT’S NOT GG (Good Game), UNTIL IT’S GG FOR ALL”. We are passionate about the culture we foster that ultimately helps to create and shape the world of esports, gaming...


  • San Francisco, California, United States MongoDB Full time

    MongoDB empowers innovators to create, transform, and disrupt industries by unleashing the power of software and data. Our developer data platform, MongoDB Atlas, is a globally distributed, multi-cloud database available in over 115 regions across AWS, Google Cloud, and Microsoft Azure. Job Overview: We are seeking an experienced Site Reliability Engineer (SRE)...


  • San Francisco, United States Asystem Full time

    Particle is a startup based in the San Francisco Bay Area. We are seeking candidates who are self-starters, adaptable, and flexible in a startup environment. As a team of veteran technologists from Twitter, Tesla, Periscope, and more, we are developing a next-generation news platform to redefine your daily intake of news. We value active engagement in...


  • San Francisco, California, United States Federal Reserve Bank of San Francisco Full time

    Company Overview: The Federal Reserve Bank of San Francisco is a leading financial institution dedicated to fostering an inclusive economy that benefits everyone. We're seeking talented individuals like you to join our dynamic team and contribute to our mission. Job Description: We are looking for a Site Reliability Engineer to play a crucial role in maintaining...


  • San Diego, United States TALENT Software Services Full time

    Are you an experienced Site Reliability Engineer with a desire to excel? If so, then Talent Software Services may have the job for you! Our client is seeking an experienced Site Reliability Engineer to work at their company in San Diego, CA. Position Summary: It is an exciting time to be part of client's CICD and Cloud Site Reliability Engineering (SRE) team....


  • San Francisco, United States Navient Full time

    Our mission is to make higher education accessible and affordable for everyone. We empower students with financial support and supercharge their ability to pay down their debt, so they can get on the right financial track, fast. We build tools that help people feel in control of their financial future, including: Private student loans - low rates,...


  • San Francisco, California, United States Federal Reserve Bank of San Francisco Full time

    Job Description Summary: This role requires a strong background in software development, system administration, and cloud computing. The successful candidate will have experience with automated deployments, containerization, and microservices architecture. The Sr. Site Reliability Engineer will work closely with the engineering team to design, deploy, and...


  • San Francisco, United States Stefanini, Inc Full time

    Join us to co-create solutions for a better future! Job Details: Information Technology. Site Reliability Engineer, San Francisco, CA. Posted: 12/27/2024. Job ID#: 59294. Job Category: Information Technology. Position Type: Full Time. Duration: Long-Term. Stefanini Group is hiring! Stefanini is looking for a Site Reliability Engineer in San Francisco, CA (Hybrid). For quick...


  • San Francisco, United States Earnest Full time

    Our mission is to make higher education accessible and affordable for everyone. We empower students with financial support and supercharge their ability to pay down their debt, so they can get on the right financial track, fast. We build tools that help people feel in control of their financial future, including: Private student loans - low rates,...


  • San Francisco, United States Saxon Global Full time

    Lead DevOps/Site Reliability Engineer. Looking for a resource more senior in the DevOps space, with a leaning toward site reliability engineering. Docker containers, Kubernetes automation. Mostly focused on automation and on current pain points around deployment reliability in their data engineering processes. SRE who can go beyond the memory, what kind of...


  • San Francisco, United States Mistral AI Full time

    About Mistral: At Mistral AI, we are a tight-knit, nimble team dedicated to bringing our cutting-edge AI technology to the world. Our mission is to make AI ubiquitous and open. We are creative, low-ego, team-spirited, and have been passionate about AI for years. We hire people who thrive in competitive environments, because they find them more fun to work in....


  • San Diego, United States Motion Recruitment Full time

    Our client, a global media/entertainment company, is looking for a Site Reliability Engineer to join their team in San Diego, CA! Pay: $80-90/hour. Hybrid. ***This is a 6 Month Contract Open to Conversion OR Extension!*** As the Site Reliability Engineer you will be part of the CICD and Cloud SRE team supporting the heart of PlayStation Network to make sure...