Site Reliability Engineer

4 days ago


Austin, United States Nomi Health Full time

We are seeking a Site Reliability Engineer (SRE) to join our team in Austin, TX. You will play a pivotal role in ensuring the reliability, performance, and scalability of our services. You will collaborate with cross-functional teams to design, implement, and manage infrastructure that is robust and resilient. Your focus will be on developing and refining processes that enhance our operational efficiency while leveraging your technical expertise to address complex challenges.

How you will make an impact

    • Infrastructure as Code (IaC): Design, implement, and manage infrastructure using Terraform, ensuring all resources are defined and managed through code.
    • Cloud Management: Deploy, manage, and optimize services primarily on AWS (experience with other cloud providers is also valuable).
    • Monitoring and Observability: Utilize Datadog to set up comprehensive monitoring and logging, ensuring the visibility of system performance and health.
    • Incident Management and Response: Lead incident management processes using PagerDuty, ensuring swift and effective resolution of issues to minimize downtime.
    • Process Improvement: Develop and refine operational processes to enhance reliability, performance, and scalability of services. Implement automation and tooling to streamline workflows.
    • Collaboration: Work closely with development and operations teams to implement best practices in system architecture and deployment activities.
    • Performance Tuning: Identify and address performance bottlenecks in infrastructure and applications, ensuring optimal performance.
    • Documentation: Create and maintain comprehensive documentation for infrastructure, processes, and incident response procedures.
    • On-Call Rotation: Eventually participate in on-call rotations, providing reliable support and response to critical issues outside of regular business hours.

What we are looking for

    • This is a hybrid role based in our Austin office, requiring in-person attendance three days a week: Tuesday, Wednesday, and Thursday.
    • 3-5 years of relevant experience in DevOps, Infrastructure Engineering, or similar roles.
    • Proficient in Terraform and cloud services (AWS or similar).
    • Experience with monitoring/logging tools (e.g., Datadog, ELK Stack, Prometheus) and incident management (e.g., PagerDuty).
    • Familiar with container orchestration (Kubernetes) and Infrastructure as Code (IaC) tools (e.g., Ansible, Chef).
    • Knowledge of CI/CD pipelines and related tools.
    • Strong focus on automation to improve efficiency and reliability.
    • Solid understanding of incident management best practices.
    • Excellent problem-solving, communication, and collaboration skills.
    • Ability to work effectively under pressure during incidents and drive solutions quickly.


  • Austin, United States TEACHER RETIREMENT SYSTEM Full time

    The Site Reliability Engineer (Microsoft Exchange) Associate assists in maintaining the reliability, scalability, and performance of TRS's IT infrastructure. The incumbent will assist in supporting the management of a hybrid Exchange environment, integrating Proofpoint as the Email Gateway, and using PowerShell scripts for automation. This position will work...


  • Austin, United States Farm Credit Bank of Texas Full time

    Job Description: Who we are: Farm Credit Bank of Texas is a $38.2 billion wholesale bank that has been financing agriculture and rural America for over 100 years. Headquartered in Austin, Texas, we provide funding and services to rural lending associations in five states, and we are active in the nation's capital markets. While you may not be familiar with...


  • Austin, Texas, United States Electric Reliability Council of Texas Full time

    Job Summary: We are seeking a highly skilled Reliability & Compliance Solutions Engineer to join our team at the Electric Reliability Council of Texas. This role will play a critical part in ensuring the reliability and compliance of our operations, working closely with subject matter experts to meet or exceed performance requirements. Main...


  • Austin, United States CV Library Full time

    Job Description: As a part of the Product Reliability Engineering (PRE) Organization of VISA, you will be responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. In this role, your time will be split between operations/on-call duties and developing systems and software that help...


  • Austin, Texas, United States Jabil Full time

    About the Role: Jabil is seeking an experienced Site Reliability Engineering Lead to contribute to the transformative growth within our Intelligent Infrastructure division. The Site Reliability Lead Engineer plays a vital role in ensuring the quality and reliability of the test network infrastructure of the Intelligent Infrastructures factories on a global...


  • Austin, Texas, United States The Electric Reliability Council of Texas (ERCOT) Full time

    We are seeking a talented Grid Reliability and Compliance Engineer to join our team at The Electric Reliability Council of Texas (ERCOT). As a key member of our team, you will be responsible for ensuring that ERCOT ISO meets or exceeds its reliability performance requirements. Your primary responsibilities will include monitoring and reporting ERCOT ISO and...


  • Austin, United States Terminal Industries Full time

    About Us Terminal builds software that digitizes, indexes, and automates the yard, leveraging best-in-class machine learning. Our platform provides warehouse operators with the intelligence needed to optimize their usage of trucks, trailers, chassis, containers and personnel. These are the fundamental operating assets of commerce - and represent the last...


  • Austin, United States CV Library Full time

    Job Description: We’re looking for a Staff Site Reliability Engineer to join Procore’s Project Execution Group. In this role, you’ll lead, collaborate, partner and develop solutions to maintain the health of the core platform. The goal is to ensure the chosen design and architecture is highly available, performant and reliable as this team is directly...


  • Austin, Texas, United States Unreal Gigs Full time

    Job Description: We are seeking a skilled Senior Manager of DevOps and Site Reliability to join our team at Unreal Gigs. This role is responsible for leading the development, maintenance, and enhancement of our user-facing application and internal tools. About Us: We are a fully remote engineering team that values collaboration, innovation, and continuous...


  • Austin, United States Visa Full time

    Company Description Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure...


  • Austin, Texas, United States AutoRABIT Holding Inc. Full time

    About AutoRABIT: AutoRABIT is a hyper-growth SaaS software company and the leading provider of Salesforce DevSecOps platform for regulated industries such as financial institutions, insurance, and healthcare. About the Role: As a Senior Site Reliability/DevOps Engineer at AutoRABIT, you will play a critical role in developing, scaling, and operating our cloud...


  • Austin, United States AutoRABIT Holding Inc. Full time

    About AutoRABIT: AutoRABIT is a hyper-growth SaaS software company and the leading provider of Salesforce DevSecOps platform for regulated industries such as financial institutions, insurance, and healthcare. AutoRABIT solutions enable developers to automate their daily tasks to be more productive and increase the release velocity for their development team,...


  • Austin, United States Visa Full time

    Company Description: Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure...


  • Austin, Texas, United States Appko, Inc. Full time

    Job Overview: We are looking for an experienced Site Reliability Engineer to join our team at Appko, Inc. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure and applications. The ideal candidate will have a strong background in DevOps, cloud computing, and software engineering,...


  • Austin, Texas, United States Electric Reliability Council of Texas Full time

    Job Overview: At the Electric Reliability Council of Texas, we are seeking a highly skilled Power System Engineer to join our team. As a key member of our organization, you will play a crucial role in ensuring the reliable operation of the electric power grid. Key Responsibilities: Perform complex engineering studies, including power flow, voltage security, and...


  • Austin, United States Charles Schwab Full time

    Position Type: Regular. Your opportunity: At Schwab, you are empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us “challenge the status quo” and transform the finance industry together. As a Principal Site Reliability Engineer for Schwab's Technology Solutions organization, you will be responsible...


  • Austin, Texas, United States Apple Full time

    Job Title: Staff Site Reliability Engineer, Kubernetes ASE. About the Role: This is a pivotal position in our Service Engineering team at Apple, where you will play a key role in shaping the future of our products and services. As an SRE, you will be responsible for supporting and scaling cloud services for thousands of development and operations engineers. Key...


  • Austin, Texas, United States Teacher Retirement System of Texas Full time

    The TRS is seeking a highly skilled Microsoft Exchange Engineer to join our team in a hybrid position. As a key member of our IT staff, you will be responsible for designing, implementing, and maintaining the reliability, scalability, and performance of our IT infrastructure. The ideal candidate will have a strong background in Microsoft Exchange...