Principal Site Reliability Engineer

1 month ago


Seattle, United States Oracle Full time

OCI Incident Response is the first line of defense for maintaining the high availability of Oracle’s cloud. We make customer-impacting events shorter, less frequent, and less impactful by providing large-scale incident management. We are front-and-center in driving down event duration by using our operational experience, knowledge of standard processes, and ability to develop tools to automate incident management.

Cloud Engineering Infrastructure Development

We are looking for a Site Reliability Engineer to join OCI’s Major Incident Management team.

This role is part of a globally distributed team responsible for detecting, triaging, and mitigating OCI service-impacting events as quickly as possible. You will join one of these regional teams and be responsible for minimizing the downtime of OCI services. You will achieve this by delivering excellent major incident management and by architecting systems with high scalability, performance, and security that prevent incidents from occurring. You will partner with other development teams to continuously improve the incident management process. You will provide technical leadership to a team of engineers and participate in architecture and design reviews with senior technical leaders and architects across the company.


Responsibilities:


Oracle’s cloud is innovative and constantly evolving. When it experiences issues, your team will respond within minutes to mitigate customer impact. This experience will expose you to the inner workings of OCI’s systems and organizations. You will interact with and influence leaders from across the Oracle business and will drive broad cross-organization programs that iteratively improve OCI-wide service availability. We are an agile team with significant impact. If you want to be part of a fast-moving team breaking new ground, we would like to speak with you.

Basic Qualifications:

  • Bachelor’s degree or higher in Computer Science or a related field.
  • 5+ years of software development experience.
  • Extensive experience with major incident management in a cloud-based environment.
  • Experience with at least one modern object-oriented programming language, such as Java or C++.
  • Proven track record of shipping large, complex, scalable systems and applications in an agile environment.
  • Experience with professional software engineering standard processes such as Agile project management, coding standards, code reviews, source control management, build processes, testing, and operations.

Preferred Qualifications:

  • Strong analytical and problem-solving skills.
  • Strong leadership, project planning, communication, and execution skills.
  • Ability to handle multiple competing priorities in a fast-paced environment.
  • Ability to communicate clearly with technical and non-technical collaborators at all levels.
  • Confidence to drive and manage large conference calls.
  • Experience with distributed service-oriented architectures.
