Senior Reliability Engineer

2 months ago


Austin, Texas, United States Amazon Full time


As a Senior Reliability Engineer, you will play a pivotal role in ensuring the operational excellence of Amazon's data centers globally. Your expertise will be essential in conducting thorough evaluations and providing insightful feedback on the design aspects across various engineering disciplines.

In addition to your design responsibilities, you will collaborate closely with operations, security teams, field engineering, and construction management to establish effective processes and innovative procedures aimed at enhancing system reliability.


Your role will involve assessing the implications of data center technologies and features to adapt to the dynamic needs of our customers as we scale our infrastructure.


Key Qualifications:

  • Demonstrate strong engineering judgment and the ability to make informed recommendations in uncertain situations.
  • Exhibit a detail-oriented and data-driven mindset.
  • Have a proven track record in managing engineering projects and collaborating with consultants.
  • Cultivate trust and foster relationships with diverse stakeholders, including Operations, Commissioning, Construction, and Design teams.
  • Show a willingness to engage in fieldwork to gain firsthand insights.

In this role, you will engage with various teams responsible for all facets of data center operations. You will prioritize your tasks to enhance data center reliability, focusing on the most impactful actions. Your responsibilities will require a global perspective on all initiatives.

At AWS Infrastructure Services, we oversee the design, planning, delivery, and operation of AWS's global infrastructure. We are the backbone that ensures the cloud remains operational and efficient.

Our team supports all AWS data centers, managing the servers, storage, networking, power, and cooling systems that guarantee our customers continuous access to the innovations they depend on.

We tackle complex challenges with numerous variables affecting the supply chain, and we seek talented individuals eager to contribute to our mission.


Joining our diverse team means collaborating with software, hardware, and network engineers, supply chain experts, security professionals, operations managers, and other essential roles.

You'll work alongside colleagues across AWS to uphold the highest standards of safety and security while delivering seemingly limitless capacity at the most competitive costs for our customers.

Our inclusive culture encourages bold ideas and empowers you to see them through to fruition.

Responsibilities of the Senior Reliability Engineer include:

  • Conducting audits and peer reviews of data center infrastructure engineering designs with a focus on reliability.
  • Performing engineering analyses of past reliability incidents.
  • Providing technical oversight for global Center of Excellence (CoE) initiatives.
  • Developing predictive reliability models.
  • Reviewing and overseeing reliability performance metrics.
  • Conducting multi-cause categorical analyses for reliability events.
  • Overseeing and reviewing Failure Mode and Effect Analysis (FMEA) studies.
  • Initiating and leading reliability projects that significantly impact infrastructure design and implementation.
  • Establishing standards for customer coordination and repeatable processes related to engineering, testing, construction, commissioning, and best practices. Driving process improvements across the organization to enhance reliability and meet customer expectations.
  • Providing technical oversight for the Regional Electrical and Mechanical Basis of Design (BOD), construction documentation, Statements of Work (SoWs), procurement initiatives, supplier management, commissioning scripts, operation and maintenance manuals, and all relevant products and processes that influence reliability.
  • Acting as a technical advisor for AWS data center electrical, mechanical, structural, site, civil, security, network, fire detection, and suppression systems as they relate to enhancing reliability.
  • Collaborating with internal teams to comprehend customer reliability requirements.
  • Offering technical oversight and review for LSE/CSE CoE.

About the Team

Why AWS

Amazon Web Services (AWS) is the most comprehensive and widely adopted cloud platform in the world.

We pioneered cloud computing and continue to innovate, which is why customers from successful startups to Global 500 companies trust our extensive suite of products and services to drive their businesses.

Diverse Experiences

Amazon values diverse experiences.

We encourage candidates to apply even if they do not meet all the preferred qualifications and skills outlined in the job description.

If your career is just beginning, has not followed a traditional path, or includes alternative experiences, we welcome your application.

Work/Life Balance

We prioritize work-life harmony.

Success at work should not come at the expense of your personal life, which is why we promote flexibility as part of our workplace culture.

When we feel supported both at work and home, we can achieve remarkable outcomes in the cloud.

Inclusive Team Culture

At AWS, we embrace a culture of learning and curiosity. Our employee-led affinity groups foster an inclusive environment that celebrates our differences.

Ongoing events and learning opportunities, such as our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to embrace our uniqueness.

Mentorship and Career Growth

We continuously strive to raise our performance standards as we aim to become Earth's Best Employer.

Here, you will find abundant knowledge-sharing, mentorship, and career advancement resources to help you grow into a well-rounded professional.

Basic Qualifications

Bachelor's Degree in Electrical or Mechanical Engineering or equivalent experience.

A minimum of 3 years of experience collaborating with cross-functional teams in critical facilities.

At least 10 years of experience with mission-critical facilities, including:

  • Knowledge of uninterruptible power supplies, diesel generators, electrical switchgear, power distribution units, and automatic/static transfer switches.
  • Understanding of chillers, cooling towers, direct and indirect evaporative cooling, and variable speed drives and fan systems.
  • Familiarity with building codes and regulations, including Life Safety, IBC, NFPA, NEC, NESC, and OSHA.

Direct experience in the design, construction, operation, or maintenance of data centers.

Ability to research new designs, technologies, construction methods, and innovative operational procedures for data center equipment and facilities.

Capability to critically audit and provide customer-representative feedback on design concepts throughout exploration, development, deployment/construction, and operations.


Willingness to think creatively and innovatively to enhance reliability through improved quality, dependability, and maintainability.


Ability to perform complex business case analyses to justify technical decisions and present findings to management during high-level reviews.

Excellent communication skills, attention to detail, and commitment to maintaining high-quality standards.

Preferred Qualifications

Strong organizational skills with the ability to prioritize and meet deadlines and budgets.

Experience utilizing various web-based and software tools for data analysis and visualization.

Direct experience in the design, construction, operation, and maintenance of mission-critical facilities, particularly data centers.

Experience as a resident engineer or hands-on design consultant in the field.

Knowledge of building codes and regulations, including Life Safety, IBC, NFPA, NEC, NESC, and OSHA.

Ability to read, interpret, and create construction drawings, specifications, and submittal documents.

Ability to carry design concepts through exploration, development, and into deployment/mass production.

Ability to research new designs, technologies, construction methods, and innovative operational procedures for data center equipment and facilities.

Ability to critically audit and provide customer-representative feedback on design concepts throughout exploration, development, deployment/construction, and operations.


Willingness to think creatively and innovatively to enhance reliability through improved quality, dependability, and maintainability.


Ability to perform complex business case analyses to justify technical decisions and present findings to management during high-level reviews.

Excellent communication and writing skills, attention to detail, and commitment to maintaining high-quality standards.

In-depth understanding of both mechanical and electrical equipment/design related to data centers (including but not limited to: uninterruptible power supplies, diesel generators, electrical switchgear, power distribution units, variable frequency drives, automatic/static transfer switches, chillers [air-cooled and water-cooled], pumps, cooling towers, heat exchangers, air handlers, economizers, etc.).

Experience with EPMS/SCADA/BMS control systems (software and/or hardware).

Registered Professional Engineer.

Advanced degree in engineering, business, or a related field.

Experience with large-scale technical operations or large-scale compute facilities.

Amazon is committed to fostering a diverse and inclusive workplace.

We are an equal opportunity employer and do not discriminate based on race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.

For individuals with disabilities who would like to request an accommodation, please visit our website for more information.