Principal Site Reliability Engineer, Sovereign Cloud Operations

4 weeks ago


Reston, Virginia, United States Oracle Full time
Job Description

We are seeking a highly skilled Principal Site Reliability Engineer to join our Sovereign Cloud Operations team. As a key member of our team, you will be responsible for ensuring the reliability and availability of our sovereign cloud production systems and driving automation and tooling enhancements for our operators.

This role requires a strong technical leader who can work closely with the Oracle Cloud Infrastructure service team and our Operability Improvement organization to implement and maintain a high level of system hygiene, and to identify and address potential issues that could degrade the experience of our cloud customers.

The ideal candidate will be passionate about operations, willing to take ownership of our systems' performance, and comfortable working in a fast-paced environment. You must be a strong collaborator who develops solid partnerships across the business to foster positive outcomes for our customers.

Primary Responsibilities:

  • Serve as a technical leader for OCI cloud services across the operations teams servicing sovereign realms.
  • Deep dive into complex customer issues and assist customer support, sovereign cloud operators, and customer account managers in resolving them.
  • Decompose operational issues impacting sovereign cloud operators' efficiency and help facilitate solutions.
  • Collaborate with the Operability Improvement organization to drive tooling and automation to improve change safety and reduce operator toil.
  • Deliver rapid ad hoc solutions (e.g., scripting/coding) that provide near-term operational improvements as a stop-gap measure while long-term solutions are developed.

Qualifications:

  • U.S. Citizenship Required.
  • Bachelor's degree or higher in Computer Science or a related field.
  • 10+ years of SRE/DevOps experience (operations-focused).
  • Experience operating services in one of the major clouds, such as AWS, OCI, or Azure.
  • Knowledge of, or experience, working with government clients to deliver IT services.
  • Strong knowledge of cloud infrastructure, distributed systems, and network architecture.
  • Proven track record of supporting large, complex, scalable systems/applications in an agile environment.
  • Knowledge of change management, continuous integration, and deployment best practices.
  • Strong problem-solving and troubleshooting skills, with the ability to analyze complex systems and identify areas for improvement.
  • Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.
  • Proficiency in scripting or programming languages like Python, Go, or Bash.
  • Experience with automation and configuration management tools like Terraform, Ansible, or Chef.
  • Familiarity with monitoring and alerting tools such as Prometheus or Grafana.


  • Reston, Virginia, United States Oracle Full time

    Job Description: We are seeking a highly skilled Principal Site Reliability Engineer to join our Sovereign Cloud Operations team. This role is responsible for ensuring the reliability and availability of our sovereign cloud production systems and driving automation and tooling enhancements for our operators. The ideal candidate will work closely with the Oracle...


  • Reston, Virginia, United States Oracle Full time

    Job Description: Oracle is seeking a highly skilled Senior Principal Site Reliability Engineer to join our team. As a key member of our Cloud Infrastructure Operations team, you will be responsible for ensuring the reliability and performance of our cloud-based services. You will work closely with our development teams to design and deliver critical...


  • Reston, Virginia, United States ScienceLogic Full time

    About the Role: We are seeking a highly skilled Principal Cloud Security Engineer to join our team at ScienceLogic. As a key member of our Site Reliability Engineering team, you will be responsible for designing, deploying, and maintaining our cloud infrastructure used for running our revenue-generating SaaS product line. Key Responsibilities: Enhance our SaaS...

  • Cloud Engineer

    3 weeks ago


    Reston, Virginia, United States Microsoft Full time

    About the Role: We are seeking a highly skilled Cloud Engineer to join our team at Microsoft. As a Cloud Engineer, you will be responsible for designing, developing, and delivering software engineering solutions to serve and protect O365 government clouds. Key Responsibilities: Own deployment, availability, reliability, performance, and customer escalation...


  • Reston, Virginia, United States Intelligent Waves LLC Full time

    Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Intelligent Waves LLC. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and efficiency of our cloud-based systems. Key Responsibilities: Design and implement resilient infrastructure to support cloud migration. Develop...


  • Reston, Virginia, United States ScienceLogic Full time

    About the Role: We are seeking a highly skilled Principal Site Reliability Engineer to join our team at ScienceLogic. As a key member of our Site Reliability Engineering team, you will be responsible for designing, deploying, and maintaining our cloud infrastructure used for running our revenue-generating SaaS product line. Key Responsibilities: Enhance our SaaS...


  • Reston, Virginia, United States WideNet Consulting Group Full time

    Job Title: Site Reliability Engineer. About the Role: We are seeking an experienced Site Reliability Engineer to join our team at WideNet Consulting Group. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Monitor and analyze system performance...


  • Reston, Virginia, United States Intelligent Waves Full time

    Job Description: Intelligent Waves is seeking a highly skilled Site Reliability Engineer to join our team in Reston, VA. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud-based systems. Key Responsibilities: Design and implement scalable and reliable cloud-based systems. Collaborate with...


  • Reston, Virginia, United States Intelligent Waves Full time

    Job Overview: Intelligent Waves is seeking a highly skilled Site Reliability Engineer-DevOps Cloud professional to join our team in Reston, VA. As a key member of our team, you will work with us to automate tasks and provide innovative solutions toward cloud migration. Responsibilities: As a Site Reliability Engineer, you will be responsible for building a...


  • Reston, Virginia, United States Palo Alto Networks Full time

    Job Description: Palo Alto Networks is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, building, maintaining, and scaling production services and server farms within our FedRAMP SASE product portfolio. You will work closely with our development teams to ensure...


  • Reston, Virginia, United States Blue Sky Innovative Solutions Full time

    Job Title: Site Reliability Engineer. Blue Sky Innovative Solutions is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud infrastructure. Your expertise in Red Hat Linux Automation and DevOps practices will be essential in...


  • Reston, Virginia, United States Palo Alto Networks Full time

    Job Description: Palo Alto Networks is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, building, and maintaining scalable and reliable infrastructure for our FedRAMP SASE product portfolio. Key Responsibilities: Design and implement scalable and reliable infrastructure...


  • Reston, Virginia, United States Big Cloud Full time

    Unlock the Power of AI with Big Cloud. Are you a seasoned Principal Firmware Engineer with a passion for AI and a track record of delivering high-quality firmware solutions? Big Cloud, a leading global semiconductor company, is seeking an experienced Senior Principal Firmware Engineer to join our team. As a key member of our AI processor development team, you...


  • Reston, Virginia, United States Microsoft Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineering Manager to join our team at Microsoft. As a key member of our Site Reliability Engineering team, you will be responsible for providing technical leadership to a team of highly passionate and skilled engineers. Key Responsibilities: Recruit, on-board, and grow a team of Software...


  • Reston, Virginia, United States Microsoft Corporation Full time

    Job Description: Microsoft Corporation is seeking a highly skilled Senior Cloud Systems Engineer to join our Azure Silver and Sovereign Team as part of the Cloud Transfer Service (CTS) team. The Azure Cloud Transfer Service enables secure access and transfer between enclaves and supports other transfer and access types enabling a wide set of capabilities...


  • Reston, Virginia, United States Oracle Full time

    Job Description: Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop designs, architectures, standards, and methods for large-scale distributed systems....


  • Reston, Virginia, United States Oracle Full time

    About the Role: We are seeking a highly skilled Site Reliability Developer 4 to join our team at Oracle Cloud Infrastructure (OCI). As a key member of our Cloud Platform organization, you will be responsible for designing and delivering mission-critical infrastructure services that are highly available, scalable, and secure. Key Responsibilities: Work with the...


  • Reston, Virginia, United States Intelligent Waves Full time

    Job Summary: Intelligent Waves is seeking a highly skilled Site Reliability Engineer to join our team in Reston, VA. As a key member of our cloud migration team, you will be responsible for designing and implementing resilient and efficient cloud-native solutions using Kubernetes, Ansible, and AWS. Key Responsibilities: Design and implement cloud-native...


  • Reston, Virginia, United States Eliassen Group Full time

    On-site requirements in Salt Lake City, UT. We are seeking a Senior Site Reliability and Support Specialist to join our support organization. In this role, the resource will serve as a production support and SRE specialist for supporting Wealth and Brokerage Business Unit Infrastructure and Applications. The team comes with a diverse technological background...


  • Reston, Virginia, United States Insight Investment Full time

    About This Role: We are seeking a highly skilled Sovereign Investment Analyst to join our Emerging Markets team in New York. As a key member of the team, you will be responsible for monitoring markets across Latin America and providing independent sovereign analysis. Main Responsibilities: Conduct daily analysis of economic, financial, and political developments...