Site Reliability Engineer

4 weeks ago


Austin, United States JobRialto Full time
Description:

The Client Site Reliability team is responsible for the operations and infrastructure of all consumer-facing production systems and developer-facing systems at Client Games, including NBA Client game services, customer-facing account services, and websites. This team handles systems and services spanning multiple datacenters, both terrestrial and cloud-based.

What We Need:

We are looking for an expert engineer who is passionate about building multi-datacenter infrastructure and services. Strong systems and problem-solving skills are required as we develop solutions for game studios and support data centers around the world alongside a group of outstanding engineers. In this role, you will collaborate with network engineers, systems architects, and development staff to support our gamers and the needs of the business.

What You Will Do

Build and operate highly resilient systems in a global multi-datacenter and cloud environment serving game and consumer services

Develop tools for the management and automation of the systems and service infrastructure

Define and implement standards that will impact systems, services, and multiple software environments

Diagnose and resolve technical issues from both internal and external customers and drive improvements to prevent them from recurring

Participate in Site Reliability Engineering's on-call rotation

Who We Believe Will Be an Outstanding Fit

You are eager to work in a fast-paced environment with other highly skilled engineers who are passionate about service availability and health

You are excited by the idea of building data center infrastructure services from greenfield design through implementation

Required Qualifications

6+ years of demonstrated influence across one or more teams on large-scale projects that drive impact and improvement across the organization

6+ years of experience in an SRE role for online services in a multi-region, multi-cloud environment with specific experience in reliability and resiliency

6+ years of developing tools to automate processes or augment off-the-shelf tool functionality

6+ years of AWS and/or GCP cloud experience running highly elastic, mission-critical workloads

6+ years of coding experience in at least one of Python, Ruby, Java, or Go, and a good understanding of source code management

6+ years of experience using Infrastructure as Code tools like Terraform, Pulumi, or others

Extensive knowledge of software build, test, and deploy processes using Git, Jenkins, Puppet, Ansible, Docker/containers, and Kubernetes

Experience with system analysis and troubleshooting

Ability to serve as a mentor to junior engineers and provide technical leadership to the organization

Bonus Points

Prior hands-on experience running large-scale multiplayer video games

Experience designing and crafting software for systems and network automation

Debugging, code optimization, and routine task automation skills

Demonstrated ability to decompose sophisticated problems and to engage in lateral investigations

Must Haves:

3 to 5 years of experience with Kubernetes, Datadog, cloud services, and large-scale systems on AWS and GCP, with minor Azure exposure

Cluster experience with GKE, EKS, AKS (very small footprint), and self-built ("home-strung") on-premises clusters

Experience performing consistent upgrades across all clusters and clouds

Education: Bachelor's degree

Additional client information:

  • Austin, United States Virtu Financial Full time

    Virtu is a leading financial firm that leverages cutting edge technology to deliver liquidity to the global markets and innovative, transparent trading solutions to our clients. As a market maker, Virtu provides deep liquidity that helps to create more efficient markets around the world. Our market structure expertise, broad diversification, and execution...


  • Austin, United States ClickHouse Full time

    We are committed to providing our customers with reliable and secure services so we are building out our newly formed Site Reliability Engineering team. As one of the first joiners to our Reliability Engineering Team at ClickHouse, you will be responsible for building and leading processes to ensure the reliability, availability, scalability, and performance...


  • Austin, United States Zenoss Careers Full time

    Description: We are hiring a Site Reliability Engineer to support, configure, and build our SaaS offerings. You will be troubleshooting and administering multiple environments including performance and quality of disaster recovery. You will support all aspects of the technical infrastructure by troubleshooting system configuration, installation, and other...


  • Austin, Texas, United States Procore Technologies Full time

    Job Description: What if you could use your technology skills to develop a product that impacts the way communities’ hospitals, homes, sports stadiums, and schools across the world are built? Construction impacts the lives of nearly everyone in the world, and yet it’s also one of the world’s least digitized industries. That’s why we’re looking for...


  • Austin, United States Frontline Education Full time

    Location Requirements: This role is hybrid, based at one of our offices: Austin, Naperville, or Wayne. Overview: We are looking for an outgoing and dynamic Site Reliability Engineer to manage the successful operation and support of Frontline application environments. This position is responsible...


  • Austin, Texas, United States Visa Full time

    Job Description: As part of the Product Reliability Engineering (PRE) organization of Visa, you will be responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. In this role, your time will be split between operations/on-call duties and developing systems and software that...


  • Austin, United States Oracle Full time

    Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop designs, architectures, standards, and methods for large-scale distributed systems. Facilitate...


  • Austin, United States Adva IT Services, Inc. Full time

    Job Summary: We are looking for an operations engineer to join the Crypto Services SRE team. The Crypto Services SRE team is responsible for systems and services that support a vast number of both Apple's internal services as well as services that Apple users directly use. As an Operations Engineer, you will play a crucial role in helping ensure our systems...


  • Austin, United States VeeAR Projects Inc. Full time

    Position: Site Operations Engineer. Location: Austin, TX (Hybrid). Duration: 12+ months contract with possible extension. Job Description: We are looking for an operations engineer to join the Crypto Services SRE team. The Crypto Services SRE team is responsible for systems and services that support a vast number of both Apple’s internal services as well as...


  • Austin, United States Veear Full time

    Position: Site Operations Engineer. Location: Austin, TX (Hybrid). Duration: 12+ months contract with possible extension. Job Description: We are looking for an operations engineer to join the Crypto Services SRE team. The Crypto Services SRE team is responsible for systems and services that support a vast number of both Apple's internal services as well as...

  • Austin, United States Beth Page tech Full time

    Role: Site Operations Engineer. Location: Austin, TX / Santa Clara, CA. Job Summary: We are looking for an operations engineer to join the Crypto Services SRE team. The Crypto Services SRE team is responsible for systems and services that support a vast number of both Apple's internal services as well as services that Apple users...


  • Austin, United States Zyxware Technologies Full time

    Title: Site Operations Engineer (W2 only). Location: AST or SCV (AST = Austin, TX and SCV = Santa Clara Valley). Duration: 6 months. Job Summary: We are looking for an operations engineer to join the Crypto Services SRE team. The Crypto Services SRE team is responsible for systems and services that support a vast number of both internal services as well as services...


  • Austin, United States Oracle Full time

    Work with Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the critically important stack, with focus on...


  • Austin, Texas, United States NinjaOne Full time

    Senior Database Reliability Engineer (DBRE) About the Role At NinjaOne we are passionate about building unified IT solutions that simplify the way IT organizations work. We are currently looking for a Senior Database Reliability Engineer (DBRE) to join our SRE team in the Platform Engineering organization and help us scale our products to millions of...


  • Austin, United States Sumo Logic Full time

    Location: Ideally Austin, TX. We will, however, also look at 100% remote talent based elsewhere in the USA and Canada. Summary of role Own availability, the most important product feature, by continually striving for sustained operational excellence of Sumo's planet-scale observability and security products. Work with your global SRE team to optimize...


  • Austin, United States Netspend Full time

    About the Company: Ouro is dedicated to delivering financial empowerment to millions of Americans, leveraging a proprietary payments technology platform that fuels its fintech product innovations. From prepaid, credit and debit account solutions, to digital account and money movement services, Ouro has a broad suite of products and technologies that deliver...


  • Austin, United States Pinnacle Group Full time

    SRE with Java Development, AWS. Day 1 onsite, Austin, TX. Hybrid: 3 days/week. Duration: long-term contract. Job details: We are actively looking for an SRE engineer with a strong Java development and AWS background. Minimum 8 to 12 years of experience needed. Pay range: $65/hr to $70/hr. The specific compensation for this position will be determined by a number of factors,...


  • Austin, United States Pinnacle Group, Inc. Full time

    SRE with Java Development, AWS. Day 1 onsite, Austin, TX. Hybrid: 3 days/week. Duration: long-term contract. Job details: We are actively looking for an SRE engineer with a strong Java development and AWS background. Minimum 8 to 12 years of experience needed. Pay range: $65/hr to $70/hr. The specific compensation for this position will be determined by a number of factors,...