Enterprise Site Reliability Engineer

1 week ago


San Francisco, United States OpenAI Full time

OpenAI’s IT organization supports the mission of deploying artificial general intelligence (AGI) for the benefit of all. Our team is committed to providing seamless technological support and solutions to ensure that all OpenAI employees are well-equipped and connected. This enables them to contribute effectively towards our AI research, corporate operations and product initiatives.

About the Role

As an Enterprise Site Reliability Engineer, you are tasked with enhancing our Enterprise IT services through the use of OpenAI's models, automation, and development of internal tools. This includes automating internal identity and access management systems, designing applications and dashboards for internal users, and optimizing workflows. In addition, the role involves creating and managing intelligent Slack bots and integrations.

In this role, you will:

  1. Architect scalable systems and deliver full lifecycle application development, including requirement gathering, planning, building, deploying, and maintaining software solutions.
  2. Manage AzureAD IAM, SAML, OAUTH, and SCIM systems.
  3. Handle identity and permissions on major cloud providers (Azure, AWS, GCP).
  4. Design and implement role-based access control (RBAC) systems.
  5. Mitigate common security vulnerabilities in Identity and Access Management.
  6. Use Python for scripting and infrastructure automation tools such as Ansible, Chef, Puppet, and Terraform.
  7. Utilize OpenAI's LLMs for internal application development and workflow optimization.
  8. Create and manage Slack bots and integrations.
  9. Manage multiple projects simultaneously, ensuring attention to detail and effective task prioritization.

You may thrive in this role if you have:

  1. Proven application development skills with experience in reliable systems.
  2. Experience with building Web APIs and services using Python (including async), Flask, or FastAPI.
  3. In-depth knowledge of AzureAD IAM, SAML, OAUTH, and SCIM.
  4. Advanced scripting skills in Python and familiarity with infrastructure automation tools.
  5. Understanding of identity and permissions on major cloud providers (Azure, AWS, GCP).
  6. Experience with role-based access control (RBAC) systems design and implementation.
  7. Understanding of common security vulnerabilities in Identity and Access Management and strategies to mitigate them.
  8. Proven ability to manage projects effectively, pay close attention to detail, and prioritize tasks.
  9. Excellent communication skills and ability to engage effectively with both technical and non-technical stakeholders.
  10. Experience with developing and managing Slack bots and integrations.

This is a hybrid role and will require 3 days a week in our San Francisco office.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

For US Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

#J-18808-Ljbffr

  • San Francisco, United States OpenAI Full time

    Site Reliability Engineer, Enterprise IAMOpenAI's IT organization supports the mission of deploying artificial general intelligence (AGI) for the benefit of all. Our team is committed to providing seamless technological support and solutions to ensure that all OpenAI employees are well–equipped and connected. This enables them to contribute effectively...


  • San Francisco, United States OpenAI Full time

    Site Reliability Engineer, Enterprise IAMOpenAI’s IT organization supports the mission of deploying artificial general intelligence (AGI) for the benefit of all. Our team is committed to providing seamless technological support and solutions to ensure that all OpenAI employees are well-equipped and connected. This enables them to contribute effectively...


  • San Francisco, California, United States Swish Analytics Full time

    {"h1": "Site Reliability Engineer at Swish Analytics"} Swish Analytics is a sports analytics and betting startup that's revolutionizing the industry with cutting-edge predictive data products. We're on a mission to make oddsmaking a challenge rooted in engineering, mathematics, and sports betting expertise, not intuition. We're looking for a team-oriented...


  • San Francisco, United States WEX Full time

    The WEX Site Reliability Engineering (SRE) team is seeking an entry-level Site Reliability Engineer Level 1 who is passionate about learning and growing in the field of software development and solutions focused on observability, incident response, reliability and performance, operational excellence, and compliance. The team will be part of the Benefits...


  • San Francisco, California, United States Outdefine Full time

    About the JobWe are seeking a highly skilled Site Reliability Engineer to join our team at Outdefine. As a key member of our engineering team, you will be responsible for ensuring the reliability, scalability, and performance of our ecommerce platform.Key ResponsibilitiesDesign and implement scalable and highly available cloud infrastructure using Kubernetes...


  • San Francisco, California, United States Roman Health Pharmacy LLC Full time

    About the RoleWe are seeking a highly skilled Site Reliability Engineer to join our team at Xero. As a key member of our Reliability Enablement team, you will play a critical role in ensuring the reliability and performance of our systems.Key ResponsibilitiesInvestigate operational surprises and support teams in post-incident activitiesConduct in-depth...


  • San Francisco, United States Ellation, Inc. Full time

    Who We AreWe‘re a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our...


  • San Francisco, United States Ellation, Inc. Full time

    Who We AreWe‘re a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our...


  • San Francisco, California, United States WEX Full time

    Job SummaryThe WEX Site Reliability Engineering team is seeking a highly motivated and quick-learning individual to join our team as a Site Reliability Engineer Level 1. As a key member of our team, you will be responsible for ensuring the reliability, performance, and security of our systems.Key Responsibilities:Actively participate in training and...


  • san francisco, United States New York Technology Partners Full time

    Must Have's in the order of preference.Typical Java/J2EE experience between 6 and 10 yearsApplication Production Support(SRE - Site Reliability Engineering) with 3+ years - Preferably in e-commerce domainHands-on experience in any of the UI Frameworks(AngularJS, VueJS etc) - 1+ years


  • San Francisco, United States New York Technology Partners Full time

    Must Have's in the order of preference.Typical Java/J2EE experience between 6 and 10 yearsApplication Production Support(SRE - Site Reliability Engineering) with 3+ years - Preferably in e-commerce domainHands-on experience in any of the UI Frameworks(AngularJS, VueJS etc) - 1+ years


  • San Francisco, United States Focal Systems Full time

    Location: San Francisco - hybrid (1-2 days per week)Salary: $165-175k + stock Company Description Focal Systems is the industry leader in retail AI solutions. We are a Silicon Valley based startup that has more than doubled in size every year since inception. We are a Deep Learning first company. Our mission is to automate and optimize brick and mortar...


  • San Francisco, California, United States Arbitrum Inc Full time

    Reliability EngineerAt Arbitrum Inc, we're on a mission to bring blockchain to a billion people. Our developer platform is designed to make building on the blockchain easy, and we're looking for a skilled Reliability Engineer to join our Infrastructure team.As a Reliability Engineer, you'll collaborate with our engineering team to design, deploy, and...


  • San Francisco, California, United States Aitopics Full time

    About the RoleWe are seeking a highly skilled Staff Site Reliability Engineer to join our Data Engineering team. As a key member of our team, you will be responsible for maintaining and enhancing the reliability of our data infrastructure.Your work will directly impact the availability and performance of our data services, enabling the organization to make...


  • San Francisco, California, United States Hinge Health Full time

    About the RoleHinge Health is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our platform, including automation, logging, monitoring, and alerting.You will thrive in a collaborative environment, have excellent communication skills, and be...


  • San Francisco, California, United States Tampa Gardens Senior Living Full time

    About the RoleWe are seeking a highly skilled Senior Site Reliability Engineer to join our Cloud Infrastructure Team. As a key member of our team, you will be responsible for deploying, managing, optimizing, and upgrading the systems that run Sight Machine software.You will work closely with our Development Engineering team to ensure the stability,...


  • San Francisco, United States Focal Systems Full time

    Location: San Francisco - hybrid (1-2 days per week)Salary: $170-190k + stockCompany DescriptionFocal Systems is the industry leader in retail AI solutions. We are a Silicon Valley based startup that has more than doubled in size every year since inception. We are a Deep Learning first company. Our mission is to automate and optimize brick and mortar retail...


  • San Francisco, California, United States Zilliz Full time

    Job Title: Cloud Platform Staff Site Reliability EngineerWe are seeking a highly skilled Cloud Platform Staff Site Reliability Engineer to join our team at Zilliz. As a key member of our SRE team, you will be responsible for ensuring the reliability, availability, and performance of our distributed database systems.Key Responsibilities:Design and build tools...


  • San Jose, United States EVONA Full time

    Site Reliability Engineer (SRE)Location: San Francisco Bay AreaRole Overview:We are seeking a highly skilled Site Reliability Engineer (SRE) to join a dynamic team at a rapidly growing technology company. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems, while implementing automation...


  • San Francisco, United States WEX, Inc. Full time

    About the RoleThe WEX Site Reliability Engineering (SRE) team is seeking a Senior Staff SRE who is passionate about developing software and solutions focused on observability, incident response, reliability and performance, operational excellence, and compliance. The team will be part of the Benefits Reliability organization which supports our internal...