Service Reliability Engineer

4 weeks ago


Seattle, Washington, United States Apple Full time

The Service Reliability Engineer role in Apple Services Engineering requires a mix of strategic engineering and design along with hands-on technical work. This SRE will configure, tune, and fix multi-tiered systems to achieve optimal application performance, stability, and availability. We manage jobs and applications on bare-metal and cloud computing platforms to deliver data processing for many of Apple's global products. Our teams work with exabytes of data, petabytes of memory, and tens of thousands of jobs to deliver predictable, performant data analytics that power features in Apple Music, TV+, App Store, and other world-class products. If you love designing and running systems and infrastructure that will impact millions of users, then this is the place for you.

Key Responsibilities:

  • Configure, tune, and fix multi-tiered systems to achieve optimal application performance, stability, and availability.
  • Manage jobs as well as applications on bare-metal and cloud computing platforms.
  • Deliver data processing for many of Apple's global products.
  • Work with exabytes of data, petabytes of memory, and tens of thousands of jobs.
  • Deliver predictable, performant data analytics that power features in Apple Music, TV+, App Store, and other world-class products.

Requirements:

  • BS degree in computer science or an equivalent field with 5+ years of experience, MS degree with 3+ years of experience, or equivalent.
  • At least 5 years in a Service Reliability Engineering (SRE), DevOps, or infrastructure-focused role.
  • 5+ years of running services in a large-scale *nix environment.
  • Understanding of SRE principles and goals along with prior on-call experience.
  • The ability to design, author, and release code in any language (Go, Python, Ruby, or Java would be a plus).
  • Deep understanding of and experience with one or more of the following: Hadoop, Spark, Flink, Kubernetes, AWS.

Preferred Qualifications:

  • Fast learner with excellent analytical problem-solving and interpersonal skills.
  • Experience supporting Java applications.
  • Experience using monitoring and logging solutions like Splunk, Grafana, etc.
  • Familiarity with DNS, HTTP, message queues, queueing theory, RPC frameworks, and datastores.
  • Experience working with geographically distributed teams and implementing high-level projects and migrations.
  • Strong communication skills and ability to deliver results on time with high quality.


  • Seattle, Washington, United States PMI Full time

    About the Role: Stanley, a HAVI company, is experiencing rapid growth and is seeking an experienced Engineering Manager for Service Reliability to join our team. As a key member of our software development team, you will play a crucial role in shaping and optimizing our software development and deployment processes. Key Responsibilities: Develop and implement...


  • Seattle, Washington, United States Blue Origin Full time

    Reliability Engineer Opportunity at Blue Origin: We are seeking a highly skilled Reliability Engineer to join our team at Blue Origin. As a key member of our Engines business unit, you will be responsible for ensuring the reliability and safety of our engines and propulsion systems. Your primary focus will be on identifying factors that drive engine reliability...

  • Reliability Engineer

    4 weeks ago


    Seattle, Washington, United States Blue Origin Full time

    Reliability Engineer - Engines & Avionics: At Blue Origin, we're pushing the boundaries of space exploration and development. As a Reliability Engineer - Engines & Avionics, you'll play a critical role in ensuring the reliability and safety of our engines and avionics systems. Key Responsibilities: Identify and mitigate reliability risks in engine and avionics...


  • Seattle, Washington, United States Oracle Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Oracle. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure. You will work closely with our development teams to design, implement, and operate large-scale distributed...


  • Seattle, Washington, United States Oracle Full time

    About the Role: Oracle is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, develop, and deploy software to improve the availability, scalability, and efficiency of...


  • Seattle, Washington, United States Tik Tok Full time

    About the Role: This is a Site Reliability Engineer position, focusing on the data pipeline reliability for the Video Platform team in USDS. Data SREs monitor data and keep production batch and real-time processing jobs up and running with the highest level of availability, ensuring our users have the freshest, complete, and correct data...


  • Seattle, Washington, United States Tik Tok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Data Platform Team at TikTok. As a key member of our team, you will be responsible for designing, building, and operating large-scale, massively distributed services and infrastructures. Key Responsibilities: Design and implement reliable, scalable, and robust big data systems...


  • Seattle, Washington, United States HireIO Inc Full time

    Job Summary: At HireIO Inc, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the availability, scalability, and reliability of our Ads systems. This includes designing, analyzing, and troubleshooting large-scale distributed systems, as well as developing tools and...


  • Seattle, Washington, United States Sogeti Full time

    Job Title: Site Reliability Engineer. About the Role: We are seeking an experienced Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our cloud-based infrastructure. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure using Azure or...


  • Seattle, Washington, United States Tik Tok Full time

    Job Summary: At TikTok, we're seeking a skilled Site Reliability Engineer to join our Edge Services team. As a Site Reliability Engineer, you will be responsible for architecting and implementing solutions that enable both internal and external customers to harness the power of TikTok's content delivery network. You will contribute to data pipelines, tools,...


  • Seattle, Washington, United States Diverse Lynx Full time

    Job Title: Sr. Site Reliability Engineer. Location: Remote. Duration: 12+ months contract. Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will be responsible for ensuring the availability, reliability, and performance of our applications and services. You will work...


  • Seattle, Washington, United States Hireio, Inc. Full time

    Job Overview: Hireio, Inc. is seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our Ads systems team, you will be responsible for ensuring the reliability, scalability, and operability of our services. Key Responsibilities: Design and implement scalable and reliable systems architecture. Collaborate with cross-functional teams...

  • Reliability Engineer

    4 weeks ago


    Seattle, Washington, United States Amazon Full time

    About the Role: As a Reliability Engineer - Hardware Expert at Amazon, you will be responsible for ensuring the reliability of our cloud infrastructure. This involves working closely with cross-functional teams to design, develop, and test hardware components that meet the highest standards of quality and reliability. Key Responsibilities: Collaborate with...


  • Seattle, Washington, United States DAT Freight Solutions Full time

    About DAT Freight Solutions: DAT Freight Solutions is a leading provider of transportation management software and services. We are seeking a highly skilled Site Reliability Engineering Lead to join our team. The successful candidate will be responsible for leading major technical initiatives and mentoring engineers to enhance their skills. They will work...


  • Seattle, Washington, United States Apple Full time

    Role Summary: At Apple, we're looking for talented Site Reliability Engineers to join our Apple Services Engineering team. As a Site Reliability Engineer, you'll play a critical role in ensuring the scalability, availability, and performance of our services, including iCloud, iTunes, Siri, and Maps. You'll work closely with our development teams to design,...


  • Seattle, Washington, United States Apple Full time

    Role Overview: As a Site Reliability Engineering Manager at Apple, you will be responsible for leading a team that provides the platform for mission-critical cloud systems to maintain constant uptime, scale seamlessly, and allow for new applications and services to flourish. Key Responsibilities: Establish SRE practices for a private cloud service to accelerate...


  • Seattle, Washington, United States Apple Full time

    Senior Site Reliability Engineer: Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. This is a hands-on role to establish SRE practices for a private cloud service to accelerate our...


  • Seattle, Washington, United States Apple Full time

    Job Description: The Apple Services Engineering team is seeking a highly skilled Site Reliability Engineering Leader to lead our Security SRE team. As a key member of our Infrastructure organization, you will be responsible for overseeing critical security infrastructure services and improving their reliability, observability, and manageability. You will...


  • Seattle, Washington, United States F5 Networks Full time

    Job Summary: F5 Networks is seeking a highly skilled Site Reliability Engineer III to join our team. As a Site Reliability Engineer III, you will be responsible for ensuring the reliability, availability, and scalability of critical systems and SaaS platforms. Key Responsibilities: Apply modern engineering principles and practices to operational functions and...


  • Seattle, Washington, United States Apple Full time

    Job Summary: The Apple Services Engineering team is seeking a highly skilled Site Reliability Engineering Leader to lead our security-focused SRE team. As a Site Reliability Engineering Leader, you will be responsible for designing, engineering, and running systems and infrastructure that ensure the highest quality Apple Services experience for our customers....