Site Reliability Engineer, Lead

1 month ago


Madison, United States TekStream Solutions Full time

Overview

Our client is a remote-first company with team members across the globe, offering a SaaS-based Learning Management System powering the world's leading education programs. Our client helps large brands and fast-moving companies increase revenue, improve customer retention, and decrease support costs through external education. The platform includes all the tools an organization needs to create, manage, track, and improve highly personalized learning experiences for customers, partners, and employees.

Successful Candidate

SaaS experience

Experienced and able to thrive in a small-medium high-growth environment

Invested in upskilling, learning new tech

Deeply curious, creative, and innovative

Flexible in working hours/ability to collaborate in different time zones

Lead Site Reliability Engineer

The Lead Site Reliability Engineer has a pivotal role at the forefront of our engineering operations, responsible for guiding the Platform Team toward achieving exceptional standards of reliability, performance, and stability across all our applications. The successful candidate will possess deep expertise in these core areas and will be instrumental in defining and implementing industry-leading practices. As a key leader, this role will not only shape the strategic direction of our platform operations but also establish the benchmarks and processes by which our engineering excellence is measured.

Responsibilities

Lead the SRE Team, setting clear goals and priorities in line with business objectives.

In collaboration with the department Director, develop and execute strategies that enhance technological capabilities across the company.

Ensure all platforms and systems operate smoothly and remain highly available, scalable, and fault-tolerant.

Implement best practices for continuous monitoring, preventive maintenance, and rapid response.

Continuously assess system performance, identify bottlenecks, and make data-driven recommendations for infrastructure enhancements.

Ensure that developers have access to the best tools and platforms, so they can code efficiently and understand how their applications perform.

Educate the rest of engineering about best practices for writing performant code and troubleshoot problematic areas.

Develop and refine incident management protocols.

Lead efforts to troubleshoot and resolve high-impact issues, minimizing downtime and preventing future occurrences.

Work closely with other engineering teams and departments to understand their needs and ensure platform initiatives support overall company goals.

Monitor virtual infrastructure and be part of a 24x7 on-call rotation to respond to alerts.

Requirements

8+ years of experience as a software engineer

5+ years of experience working with Ruby on Rails

Proven experience leading SRE teams

3+ years of experience working in infrastructure and operations

Expertise with SQL databases such as PostgreSQL

Experience with Cloud computing (Amazon Web Services and/or Google Cloud)

Ability to dig into unfamiliar code bases

Ability to document solutions and train operational teams on supportability

A sense of comfort working in a team-oriented and collaborative environment

Can communicate clearly and seek help and support proactively

Takes ownership of tasks and leads them to completion

Desired Experience

Experience in developing solutions using server automation tools such as Ansible

Experience writing and maintaining CI/CD pipelines and services

Education

Bachelor’s degree in Computer Science or related technical field



