System Reliability Manager

3 weeks ago


San Francisco, California, United States Unreal Gigs Full time

Job Title: System Reliability Manager


Company Overview:

We're a forward-thinking company that values expertise and teamwork.

Salary: $130,000 per year

Job Description:

We're looking for a seasoned System Reliability Manager to oversee the reliability and scalability of our cloud infrastructure.

Key Responsibilities:

  • Cross-Functional Collaboration: Lead cross-functional teams to identify and resolve performance bottlenecks and ensure high availability and reliability of systems.
  • Containerization and Orchestration: Ensure that services are deployed in a scalable, fault-tolerant manner, enabling rapid iteration and delivery of software.
  • Disaster Recovery and Backups: Plan and implement disaster recovery strategies, ensuring data backups and failover systems are in place.

Required Skills and Qualifications:

The ideal candidate will have experience working with cloud platforms (AWS, GCP, or Azure) and deploying infrastructure using Infrastructure as Code tools like Terraform, Ansible, or CloudFormation.

Educational Requirements:

  • Bachelor's degree in Computer Science, IT, or a related field.

Benefits:

  • Life and disability insurance, including short-term and long-term coverage.
  • Tuition reimbursement and career advancement programs.

About Unreal Gigs:

We foster a collaborative environment, prioritizing employee satisfaction and well-being through comprehensive benefits and growth opportunities.



  • San Francisco, California, United States Anrok, Inc Full time

    Anrok, Inc seeks highly skilled individuals to fill the Reliable System Specialist position, focusing on designing, building, and operating the systems that support our product and the engineers who build it. As a key member of our team, you will be responsible for taking ownership of Anrok's reliability, security, scalability, and performance. To excel in...


  • San Francisco, California, United States ESL FACEIT Group Full time

    At ESL FACEIT Group, we strive to create immersive experiences that bring players and fans together. Our corporate social responsibility is centered around the idea of "GG for all," where everyone has an equal chance to succeed. About Us: We're passionate about cultivating a culture that supports the growth of esports, gaming tournaments, leagues, events, and...


  • San Francisco, California, United States WEX, Inc. Full time

    About the Role: The WEX Site Reliability Engineering team is seeking a technical leader with expertise in designing, implementing, and managing complex systems at scale. This Senior Staff SRE will work closely with engineering teams to ensure that our systems are reliable, performant, and secure. Key Responsibilities: Technical Leadership: Provide guidance and...


  • San Francisco, California, United States Cloudflare Inc Full time

    About Us: Scaling with Cloudflare. At Cloudflare, we're scaling rapidly, and we need talented engineers to help us keep up. As a system reliability engineer, you'll play a critical role in ensuring the stability and performance of our global network. We protect and accelerate any internet application online without adding hardware, installing software, or...


  • San Francisco, California, United States YO HR CONSULTANCY Full time

    Job Overview: As a Site Reliability Engineer at YO HR CONSULTANCY, you will be responsible for ensuring the reliability and scalability of our cloud infrastructure. This role involves working with various technologies such as Docker, Kubernetes, Ansible, and Python to design, deploy, and maintain highly available systems. Responsibilities: • Extensive...


  • San Francisco, California, United States Orb Full time

    Revolutionizing Billing Infrastructure: At Orb, we're on a mission to transform the way businesses bill and manage their revenue. By leveraging cutting-edge technology, we enable companies to automate their billing processes and adapt pricing strategies with ease. Our approach prioritizes collaboration, focus, and kindness, fostering a culture that values...


  • San Francisco, California, United States Mistral AI Full time

    About Us: Mistral AI is a dynamic startup focused on bringing cutting-edge AI technology to the world. Our goal is to make AI ubiquitous and open, and we're looking for talented individuals to join our team. We value creativity, teamwork, and a passion for AI. Job Description: We're seeking a Highly Available Infrastructure Expert to ensure the reliability,...


  • San Jose, California, United States Tik Tok Full time

    Job Overview: The Reliability Systems Architect will be responsible for designing, developing, and operating large-scale distributed systems that meet the needs of our users. This role requires expertise in software development and infrastructure operations, with a focus on scalability, reliability, and efficiency. Skill Requirements: Programming skills in C,...


  • San Francisco, California, United States Oven Full time

    About Our Company: Bun, an open-source JavaScript tooling company, seeks to make programming more accessible. Backed by significant investments from top investors in Silicon Valley, we've gained recognition as one of the top GitHub repositories, boasting a vibrant community of over 33,000 Discord members. As part of our team, you'll play a crucial role in...


  • San Francisco, California, United States Anthropic Full time

    Job Title: Machine Learning Systems Engineer. Company Overview: At Anthropic, we aim to create reliable, interpretable, and steerable AI systems that benefit society as a whole. Our team of researchers, engineers, and experts collaborates to achieve this goal. Job Description: You will be responsible for designing and implementing critical algorithms and...


  • San Jose, California, United States MILLENNIUMSOFT Full time

    Job Summary: As a Reliability Engineer II, you will play a proactive role in the Research & Development Hardware Engineering group. You will provide hands-on, analytical, and reliability expertise to support engineering programs associated with the development of complex life science instruments. In this collaborative team environment, you will work on...


  • San Diego, California, United States TASC Full time

    About TASC: TASC is a leading provider of advanced engineering services, working on revolutionary systems in air and space that impact people's lives around the world. Our team has the incredible opportunity to work on cutting-edge projects that preserve freedom and democracy and advance human discovery and our understanding of the universe. Role Overview: The...


  • San Francisco, California, United States Gridware Full time

    About Gridware: Gridware is a pioneering company that develops cutting-edge technologies to enhance and protect the electrical grid, which forms the backbone of our modern society. Our mission is to ensure the reliability and safety of this critical infrastructure. We are headquartered in the Bay Area, California, and backed by top climate-tech and Silicon...


  • San Francisco, California, United States Tbwa ChiatDay Inc Full time

    Staff Software Engineer - Compute Reliability and Efficiency. About Reddit: Steve Huffman and Alexis Ohanian founded Reddit in 2005 as a platform for users to share and discuss content. Today, Reddit is home to thousands of active communities where users can engage in open and authentic conversations. With more than 100,000 active communities and approximately 97M+ daily active...


  • San Francisco, California, United States ABM Full time

    Job Overview: ABM, a leading provider of integrated facility solutions, is seeking a skilled Mechanical Systems Manager to oversee the maintenance and repair of mechanical and electrical systems within a property. The ideal candidate will have a strong background in operations management and be able to effectively supervise staff to achieve maximum system...

  • Reliability Expert

    11 hours ago


    San Francisco, California, United States OpenAI Full time

    At OpenAI, we're pushing the boundaries of artificial intelligence to benefit all of humanity. We're dedicated to ensuring that our systems scale with reliability and performance in mind. About the Role: We're seeking experienced engineers to join our Applied Engineering team, where you'll work across research, engineering, product, and design to bring our...


  • San Diego, California, United States Booz Allen Hamilton Full time

    At Booz Allen, we prioritize innovation and technical excellence. As a Reliability Systems Specialist, you will be instrumental in shaping the future of undersea systems for the Navy. The Opportunity: We are seeking an experienced engineer to lead reliability analysis and develop robust system designs. Your ability to combine technical skills with big-picture...


  • San Francisco, California, United States Cloudflare, Inc. Full time

    We are Cloudflare, a highly ambitious and large-scale technology company with a soul. Our mission is to help build a better Internet by protecting the free and open Internet. As a key member of our team, you will play a crucial role in building and operating our Edge platform running in over 320 cities across more than 120 countries. This is an exceptional...


  • San Francisco, California, United States Early Warning® Full time

    At Early Warning, we have a rich history of powering and protecting the U.S. financial system with cutting-edge solutions like Zelle and Paze℠. As a trusted partner in payments, we collaborate with thousands of institutions to increase access to financial services and safeguard transactions for millions of consumers and businesses. This position is part of...


  • San Diego, California, United States Prolim Full time

    About Us: Prolim is a leading provider of digital transformation solutions, helping clients achieve their goals through innovation and collaboration. We are seeking a Reliability Engineer 3 to join our team based out of San Diego, CA. As a key member of our Advanced Networking Systems Operating Unit (ANS OU), you will play a crucial role in ensuring the...