Site Reliability Engineer

3 hours ago


Redmond, Washington, United States Microsoft Full time $84,200 - $165,200
Overview
The IDEAS organization's mission is to unlock the power of data to deliver actionable insights and personalized experiences at scale, thereby driving usage, engagement, and revenue across Microsoft 365, Azure, Windows, and more. As part of the team, you'll collaborate with teams company-wide, from product engineers to data scientists, using cutting-edge technology (big data platforms, cloud analytics, AI Copilots) to solve complex problems.

Specifically, as a Site Reliability Engineer, you will help drive automation, incident response, and data-driven improvements to ensure our services meet stringent reliability and performance goals. You'll collaborate across engineering teams, contribute to live site operations, and help shape the future of our systems at scale while ensuring that they are secure and compliant.

Come build the data future at Microsoft. Joining the IDEAS organization means joining a team that is transforming how Microsoft harnesses data, and in the process, empowering customers and partners with smarter, AI-infused experiences. It's not just a job – it's a chance to lead a data revolution from within. If you're excited by the idea of turning an enterprise's data into insights, intelligence, and impact, consider applying for the Microsoft IDEAS organization.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities
  • Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health, responding to incidents within SLA timelines, and driving post-incident learnings.
  • Develop, enhance, and maintain automation for deployment, operations, and incident mitigation to improve service reliability and reduce manual intervention.
  • Instrument services for observability, collect and analyze telemetry and health metrics, and use data-driven insights to guide reliability and performance improvements.
  • Collaborate closely with engineering partners and stakeholders to align goals, share operational insights, and deliver user-centric solutions.
  • Apply engineering best practices for development, scaling, and operational excellence to meet performance and customer requirements.
  • Ensure compliance with security, privacy, and accessibility standards throughout service onboarding and operations.
  • Stay current with industry trends and internal tools to continuously improve reliability, performance, and observability at scale.


Qualifications

Required Qualifications:

  • Associate's Degree in Computer Science, Information Technology, or related field
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field
    • OR equivalent experience.

Other Requirements:
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

  • Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
    • OR equivalent experience. 
  • 1+ year(s) experience in automating root cause analysis and mitigation of incidents.
  • 1+ year(s) experience with automation, live site operations, and incident response in large-scale cloud or distributed systems.
  • Proven experience coding in at least one programming or scripting language including, but not limited to, C#, Java, Python, or PowerShell.
  • Experience using analytical and problem-solving skills, telemetry, and data to drive operational decisions.
  • Proven experience using communication and collaboration skills to work effectively across teams.


Site Reliability Engineering IC2 - The typical base pay range for this role across the U.S. is USD $84,200 - $165,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $109,000 - $180,400 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.


Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.