Principal Site Reliability Engineer, Sovereign Cloud Operations

3 weeks ago


Reston, Virginia, United States Oracle Full time
Job Description

We are seeking a highly skilled Principal Site Reliability Engineer to join our Sovereign Cloud Operations team.

This role is responsible for ensuring the reliability and availability of our sovereign cloud production systems and driving automation and tooling enhancements for our operators.

The ideal candidate will work closely with the Oracle Cloud Infrastructure (OCI) service team and our Operability Improvement organization to maintain a high standard of system hygiene and to identify and address potential issues that could degrade our cloud customers' experience.

We are looking for a candidate who is passionate about operations and willing to take ownership of our systems' performance.

The candidate should be comfortable working in a fast-paced environment and able to quickly identify and address issues.

Strong collaboration and communication skills are essential for building solid partnerships across the business that deliver positive outcomes for our customers.

Experience with cloud infrastructure architecture and how its components interact is required to succeed in this role.

Primary Responsibilities:
  • Serve as a technical leader for OCI cloud services across the operations teams servicing sovereign realms.
  • Deep dive into complex customer issues and assist customer support, sovereign cloud operators, and customer account managers in resolving them.
  • Decompose operational issues that affect sovereign cloud operators' efficiency and help drive solutions.
  • Collaborate with the Operability Improvement organization to drive tooling and automation to improve change safety and reduce operator toil.
  • Provide rapid ad hoc solutions (e.g., scripts or code) that deliver near-term operational improvements as stop-gap measures while long-term solutions are developed.

Qualifications:
  • U.S. Citizenship Required.
  • Bachelor's degree or higher in Computer Science or a related field.
  • 10+ years of SRE/DevOps experience (operations-focused).
  • Experience operating services on a major cloud platform such as AWS, OCI, or Azure.
  • Knowledge of, or experience with, delivering IT services to government clients.
  • Strong knowledge of cloud infrastructure, distributed systems, and network architecture.
  • Proven track record of supporting large, complex, scalable systems/applications in an agile environment.
  • Knowledge of change management, continuous integration, and deployment best practices.
  • Strong problem-solving and troubleshooting skills, with the ability to analyze complex systems and identify areas for improvement.
  • Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.
  • Proficiency in scripting or programming languages like Python, Go, or Bash.
  • Experience with automation and configuration management tools like Terraform, Ansible, or Chef.
  • Familiarity with monitoring and alerting tools such as Prometheus or Grafana.


  • Reston, Virginia, United States Microsoft Corporation Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Microsoft Corporation. As a key member of our Office 365 government cloud team, you will be responsible for designing, developing, and delivering software engineering solutions to serve and protect our O365 government clouds. Key Responsibilities: Own deployment,...


  • Reston, Virginia, United States Microsoft Full time

    Transforming the Future of Cloud Services: At Microsoft, we're committed to being cloud-first, and we're looking for talented Site Reliability Engineers to help us shape the future of cloud services. As a Site Reliability Engineer, you'll play a critical role in designing and implementing scenarios for our customers, ensuring the reliability and scalability of...


  • Reston, Virginia, United States Microsoft Corporation Full time

    Job Title: Site Reliability Engineer. At Microsoft, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud services. Key Responsibilities: Design, develop, and deliver software engineering solutions to serve and...


  • Reston, Virginia, United States Oracle Full time

    Job Description: Oracle is seeking a highly skilled Senior Principal Site Reliability Engineer to join our team. As a key member of our Cloud Infrastructure Operations team, you will be responsible for ensuring the reliability and performance of our cloud-based services. You will work closely with our development teams to design and deliver critical...


  • Reston, Virginia, United States Oracle Full time

    Job Description: Oracle is seeking a highly skilled Senior Principal Site Reliability Developer to join our team. As a key member of our Advanced Operations team, you will be responsible for ensuring the reliability and performance of our cloud infrastructure. Responsibilities: Design and deliver critical infrastructure components with a focus on security,...


  • Reston, Virginia, United States ScienceLogic Full time

    Job Title: Principal Site Reliability Engineer. At ScienceLogic, we're leading the charge towards truly autonomous enterprises. Our cutting-edge platform harnesses the power of automation and generative AI to revolutionize how businesses manage and optimize their IT operations. About the Role: We're seeking a Principal Site Reliability Engineer to join our team....


  • Reston, Virginia, United States ScienceLogic Full time

    About the Role: We are seeking a highly skilled Principal Cloud Security Engineer to join our team at ScienceLogic. As a key member of our Site Reliability Engineering team, you will be responsible for designing, deploying, and maintaining our cloud infrastructure used for running our revenue-generating SaaS product line. Key Responsibilities: Enhance our SaaS...

  • Cloud Engineer

    2 weeks ago


    Reston, Virginia, United States Microsoft Full time

    About the Role: We are seeking a highly skilled Cloud Engineer to join our team at Microsoft. As a Cloud Engineer, you will be responsible for designing, developing, and delivering software engineering solutions to serve and protect O365 government clouds. Key Responsibilities: Own deployment, availability, reliability, performance, and customer escalation...


  • Reston, Virginia, United States ScienceLogic Full time

    About the Role: We are seeking a highly skilled Principal Site Reliability Engineer to join our team at ScienceLogic. As a key member of our SRE team, you will be responsible for designing, deploying, and maintaining our cloud infrastructure used for running our revenue-generating SaaS product line. Key Responsibilities: Enhance our SaaS infrastructure security...


  • Reston, Virginia, United States Intelligent Waves LLC Full time

    Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Intelligent Waves LLC. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and efficiency of our cloud-based systems. Key Responsibilities: Design and implement resilient infrastructure to support cloud migration. Develop...


  • Reston, Virginia, United States Blue Sky Innovative Solutions Full time

    Job Title: Site Reliability Engineer. Blue Sky Innovative Solutions is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud infrastructure. You will be responsible for designing, implementing, and maintaining our cloud...


  • Reston, Virginia, United States ScienceLogic Full time

    About the Role: We are seeking a highly skilled Principal Site Reliability Engineer to join our team at ScienceLogic. As a key member of our Site Reliability Engineering team, you will be responsible for designing, deploying, and maintaining our cloud infrastructure used for running our revenue-generating SaaS product line. Key Responsibilities: Enhance our SaaS...


  • Reston, Virginia, United States WideNet Consulting Group Full time

    Job Title: Site Reliability Engineer. About the Role: We are seeking an experienced Site Reliability Engineer to join our team at WideNet Consulting Group. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Monitor and analyze system performance...


  • Reston, Virginia, United States Intelligent Waves Full time

    Job Description: Intelligent Waves is seeking a highly skilled Site Reliability Engineer to join our team in Reston, VA. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud-based systems. Key Responsibilities: Design and implement scalable and reliable cloud-based systems. Collaborate with...


  • Reston, Virginia, United States Insight Global Full time

    Job Title: Site Reliability Engineer. NVIDIA is seeking a seasoned Site Reliability Engineer to join its Infrastructure, Planning and Processes organization. This is an ON PREM data center role that requires experience with Linux operating systems and an understanding of Kubernetes. Key Responsibilities: Design and implement reliable systems and processes to...


  • Reston, Virginia, United States Microsoft Corporation Full time

    Transforming the Future of Cloud Services: At Microsoft, we're committed to being cloud-first, and we're looking for talented Site Reliability Engineers to help design and implement scenarios for our customers. As a Site Reliability Engineer, you'll play a critical role in shaping the future of cloud services and ensuring the reliability and scalability of our...


  • Reston, Virginia, United States Intelligent Waves Full time

    Job Overview: Intelligent Waves is seeking a highly skilled Site Reliability Engineer-DevOps Cloud professional to join our team in Reston, VA. As a key member of our team, you will work with us to automate tasks and provide innovative solutions toward cloud migration. Responsibilities: As a Site Reliability Engineer, you will be responsible for building a...


  • Reston, Virginia, United States Palo Alto Networks Full time

    Job Description: Palo Alto Networks is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, building, maintaining, and scaling production services and server farms within our FedRAMP SASE product portfolio. You will work closely with our development teams to ensure...