Observability Engineer

Found in: Talent US C2 - 2 weeks ago


San Francisco, United States OpenAI Full time

Join the engineering teams that bring OpenAI’s ideas safely to the world

The Applied Engineering team works across research, engineering, product, and design to bring OpenAI’s technology to consumers and businesses. We seek to learn from deployment and distribute the benefits of AI, while ensuring that this powerful tool is used responsibly and safely. Safety is more important to us than unfettered growth.

About the Role

As OpenAI continues to grow, we are looking for experienced, problem-solving engineers to ensure our systems scale. Our success depends on our ability to quickly iterate on products while also ensuring that they are performant and reliable.

You will work in a deeply iterative, collaborative, fast-paced environment to bring our technology to millions of users around the world, and ensure it’s delivered with safety and reliability in mind.

Successful candidates will play a crucial role in ensuring the observability, reliability, scalability, and performance of our systems as we continue to expand. As an Observability Engineer, you will be at the forefront of maintaining and enhancing the stability, scalability, and performance of our rapidly evolving infrastructure. You will work closely with cross-functional teams, including software engineers, product managers, and data scientists, to build and maintain resilient systems that can handle our growing user base and workload. This role requires a blend of technical expertise, strategic thinking, and effective communication to ensure that systems are not only operational but also optimized and aligned with business goals.

In this role, you will:

Develop and maintain systems that allow for effective monitoring, logging, and tracing of software applications. This includes choosing appropriate tools and technologies, setting up dashboards, and ensuring the scalability and reliability of the observability infrastructure.

Develop and integrate tools for logging, monitoring, and alerting to enhance visibility into system performance. Ensure compatibility and efficiency across various platforms and services.

Collaborate with different engineering teams to integrate observability practices into their workflows.

Regularly analyze system performance and identify areas for improvement. This involves working closely with other engineering teams to understand their needs and challenges and providing insights and solutions for better system performance.

Stay up to date with the latest trends in observability, logging, monitoring, and cloud technologies. Introduce innovative solutions and best practices to improve system observability and reliability, and experiment with new tools and practices to enhance the observability landscape.

Participate in strategic planning for the technology roadmap, including scalability, cost-effectiveness, and risk management considerations related to observability infrastructure.

Create comprehensive documentation for observability systems and processes. Prepare reports and insights for management regarding system performance and reliability.

You might thrive in this role if you:

Have a track record of building, operating, and accelerating observability systems at scale that empower your fellow engineers.

Enjoy seeking out and addressing bottlenecks and areas for performance improvement in our systems.

Utilize Infrastructure as Code (IaC) principles to automate infrastructure provisioning and configuration management.

Are experienced in collaborating with cross-functional teams to ensure that reliability and scalability are considered in the design and development of new features and services.

Help create a diverse, equitable, and inclusive culture that makes all feel welcome while enabling radical candor and the challenging of groupthink.

Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.

Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done.

Qualifications:

Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).

Proven proficiency in monitoring tools (e.g., DataDog, Prometheus, Grafana, ELK stack) and cloud platforms (e.g., AWS, Azure, GCP).

Strong background in software engineering, with proficiency in relevant programming and scripting languages (e.g., Python, Java, Go).

Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).

Excellent communication skills are crucial for this role, as it involves interfacing with various stakeholders, presenting findings and plans, and documenting systems and processes.

Experience with microservices architecture and service mesh technologies.

Strong understanding of distributed systems, networking, and database technologies.

Excellent problem-solving skills and ability to work in a fast-paced environment.

This role is exclusively based in our San Francisco HQ. We offer relocation assistance to new employees.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. 



  • San Francisco, United States Cloudflare Inc Full time

    Available Locations: Hybrid - Austin, Champaign, San Francisco, Washington DC or Remote - US. About the Role: Cloudflare runs one of the largest networks in the world, and the telemetry we gather from that network is broad in variety and deep in cardinality. We’re looking for a Product Manager who can help build an observability platform that enables us to...


  • San Francisco, United States Caldera Full time

    Senior Infrastructure Engineer, Security We're looking for an incredible senior engineer to help us build the future of blockchain scalability. This is an ideal opportunity for an engineer who is already passionate about tackling problems in blockchain scalability, or looking to break into the blockchain engineering space. If you're looking to work in a...

  • Senior iOS Engineer

    Found in: beBee jobs US - 2 weeks ago


    San Francisco, California, United States Observant Full time

    We're looking for a Senior iOS Engineer to build new features, improve performance, build out our in-app tooling/infrastructure, and prototype green-field projects. Observant focuses on Automotive, Machine Learning, Transportation, and B2B · SaaS · Mobile · Artificial Intelligence / Machine Learning. Their company has offices in San Francisco, Phoenix,...

  • Manager, Software Engineering

    Found in: Jooble US O C2 - 2 weeks ago


    San Francisco, CA, United States Figma Full time

    Server Platform is at the core of Figma’s infrastructure foundation! This area owns the foundational layer of infrastructure that underpins all other infrastructure and application services at Figma. The Foundation team under Server Platform is Infrastructure's infrastructure. This team enables the efficient operation and rapid development of reliable...


  • San Francisco, United States PSG Global Solutions Full time

    Description We're looking for an Electronics Testing Engineer/Technician, working in the Pharmaceuticals and Medical Products industry at 269 E Grand Ave, South San Francisco, California, 94080, United States. Our team combines expertise in Biology, Chemistry, Physics, Medicine, Engineering, Computer Science, and more to create interventions that exponentially...

  • Engineering Manager, Search

    Found in: Jooble US O C2 - 3 days ago


    San Francisco, CA, United States Sentry Full time

    About the role Sentry.io provides developer-first observability to over 4 million developers, and the Search & Storage team enables that mission by building and running a scalable and reliable data platform, capable of handling billions of events and metrics. We use open source technologies like ClickHouse, Kafka, and Kubernetes to help us deliver on our...

  • Director of Engineering

    Found in: beBee jobs US - 2 weeks ago


    San Francisco, California, United States Spencer Ogden Full time

    This company develops, invests, and delivers solar photovoltaic (PV) and battery energy storage systems (BESS) projects through Engineering, Procurement, Construction (EPC) services in North America. The company brings a decade of global leadership in solar PV to deliver utility-scale power generation plants customized for local and regional energy markets....


  • San Francisco, United States Bubble Group, Inc. Full time

    Bubble empowers businesses and entrepreneurs around the world to build software and apps without writing any code or having to think about infrastructure. We have created a rich visual programming language running on commodity cloud infrastructure, making technology accessible and user friendly and allowing users to bring their visions to life quickly. What...

  • Infrastructure Engineer

    Found in: beBee jobs US - 2 weeks ago


    San Francisco, California, United States Resemble AI Full time

    About the company We're taking Generative Voice AI to a new level. We create high-quality synthetic voices that capture human emotion. Creatives of all kinds rely on Resemble's immersive voice engine to rapidly accelerate the development of new voice-centric experiences without losing the flexibility and humanness of speech. Resemble AI supercharges your...

  • Software Engineer, Infrastructure

    Found in: Talent US C2 - 1 week ago


    San Francisco, United States OpenAI Full time

    About the Team The Applied Engineering team works across research, engineering, product, and design to bring OpenAI’s technology to consumers and businesses. We seek to learn from deployment and distribute the benefits of AI, while ensuring that this powerful tool is used responsibly and safely. Safety is more important to us than unfettered growth. ...


  • San Francisco, United States CareerBuilder Full time

    Bubble empowers businesses and entrepreneurs around the world to build software and apps without writing any code or having to think about infrastructure. We have created a rich visual programming language running on commodity cloud infrastructure, making technology accessible and user friendly and allowing users to bring their visions to life quickly. What...


  • San Francisco, United States CareerBuilder Full time

    My client is seeking a highly motivated Junior Environmental Engineer/Geologist to join our San Francisco office and contribute to our continued success. At this employee-owned firm, you will have the opportunity to work on a wide range of challenging projects, from groundwater assessment and remediation to environmental impact assessments. Your expertise...

  • Sr. Software Engineer, Backend

    Found in: beBee jobs US - 2 weeks ago


    San Francisco, California, United States hims & hers Full time

    About the job: Hims & Hers is seeking an experienced Sr. Software Engineer to help us build the platform used to reliably fulfill customer orders and prescriptions, at scale. As a member of the growing Fulfillment and Pharmacy Engineering Backend team, you will help define, build, test, deploy, and support the platform that delivers self-service capabilities...

  • Hardware Systems Engineer

    Found in: Jooble US O C2 - 3 days ago


    San Francisco, CA, United States Cloudflare Full time

    About the department Cloudflare’s Infrastructure group is responsible for building our global network. Our Hardware Engineering team helps research, develop, test, and deploy new equipment enabling 20% of the world’s internet traffic to be served smoothly. Deployed across 285 cities in 100+ countries, the hardware we select helps improve the security,...


  • San Francisco, United States Unreal Gigs Full time

    Job Description About Us: We are a mission-driven payments company focused on helping merchants accept government benefits through a unified API. With over 42 million Americans relying on government assistance for purchasing essentials, such as groceries, we aim to streamline the process and empower merchants to serve this segment effectively....

  • General Engineering Tech I

    Found in: Jooble US O C2 - 2 weeks ago


    San Francisco, CA, United States Bubble Group, Inc. Full time

    Bubble empowers businesses and entrepreneurs around the world to build software and apps without writing any code or having to think about infrastructure. We have created a rich visual programming language running on commodity cloud infrastructure, making technology accessible and user friendly and allowing users to bring their visions to life quickly. ...

  • Geotechnical Project Engineer

    Found in: Jooble US O C2 - 2 weeks ago


    San Francisco, CA, United States Engeo, Inc. Full time

    Description Position Summary: While working alongside peers and ENGEO’s technical leaders, you will gain experience working on a wide variety of challenging and world-renowned projects. You will be encouraged and supported to manage increasingly complex and geotechnically challenging projects. A Day in the Life: “Ever-changing! In a given week I...

  • San Diego, United States Icon Utility Services Full time

    Job Description ICON Utility Services is seeking qualified Field Safety Observers for civil, gas, electrical and pipeline opportunities. Our Field Safety Observer will be deployed to various locations throughout our utility client's service territory, which is mainly San Diego County. FSO's will conduct safety observations on our...