Observability Engineer

3 weeks ago


San Francisco, California, United States | University of California | Full time
Job Summary

An Observability Engineer within the Incident Command team plays a critical role in monitoring, evaluating, and optimizing the performance and health of IT systems and applications. This position is pivotal in ensuring that the IT infrastructure operates efficiently and is capable of handling emerging issues swiftly and effectively.

Key Responsibilities
  • Develop and maintain monitoring tools and dashboards that provide real-time insights into the operational status of IT systems.
  • Collect and evaluate metrics, logs, and traces to proactively detect, diagnose, and resolve performance bottlenecks or anomalies before they escalate into more significant incidents.
  • Partner closely with other IT and incident management teams to enhance incident response strategies.
  • Improve the observability framework by integrating advanced analytics and machine learning techniques to predict potential system failures and automate response processes.
Requirements
  • Bachelor's degree in computer science, engineering, or a related field.
  • 5-7 years of experience in information technology or IT Service Management/Customer.
  • Expertise in using advanced monitoring and observability tools such as Datadog, Spectrum, Prometheus, Grafana, Splunk, or New Relic.
  • Advanced ability to analyze and interpret complex data from various sources to diagnose issues and understand system behaviors.
  • Skilled in responding to and managing incidents efficiently, minimizing downtime and ensuring quick resolution of issues.
About UCSF

The University of California, San Francisco (UCSF) is a leading academic healthcare organization, home to groundbreaking discoveries, world-class education, and exceptional healthcare services. Infrastructure Services (IS) is the backbone of UCSF's technology infrastructure, providing the technical services that enable the organization's academic, medical, and research missions.

The Incident Command team within Infrastructure Services operates as a critical support system for the community of medical and health researchers. This team is dedicated to ensuring seamless access to essential IT resources, thereby enabling continuous and vital research work that has a profound impact on human health and well-being.

UCSF is a diverse community made up of people with many skills and talents. We seek candidates whose work experience or community service has prepared them to contribute to our commitment to professionalism, respect, integrity, diversity, and excellence, also known as our PRIDE values.
