Site Reliability Engineering Director

2 weeks ago


Newton, United States Bright Horizons Full time

The Director of Site Reliability Engineering (SRE) will play a pivotal role in ensuring the seamless and reliable operation of consumer- and customer-facing digital infrastructure across our lines of business. This leadership position involves overseeing a team of skilled SRE professionals and collaborating closely with cross-functional teams to enhance the performance, scalability, and reliability of complex systems and applications. The Director of SRE is responsible for developing and implementing strategies to optimize our technology's reliability and uptime, managing incident response, and ensuring consistent use of best practices in automation, monitoring, and incident management. This role requires a deep understanding of cloud technologies, distributed systems, DevOps, software engineering, automation/scripting, observability, and app support/monitoring, as well as a proactive approach to preventing and mitigating potential issues. The Director of SRE must also foster a culture of innovation, continuous improvement, and collaboration within the team to meet the organization's evolving needs and deliver a superior digital experience to users.

What you will be doing:

  • Strategy and Planning: Develop and implement a comprehensive strategy for site reliability, encompassing scalability, performance, and reliability improvements. Align SRE objectives with overall business goals and technology roadmaps. Foster a spirit of continuous improvement within the SRE function and position it to advance organizational objectives.
  • Leadership and Team Management: Provide strong leadership to the Site Reliability Engineering (SRE) team, fostering a culture of collaboration, innovation, and continuous improvement. Recruit, mentor, and develop a high-performing team of SRE professionals. Instill a can-do attitude in the team from the outset, combined with a passion for automation and engineering excellence.
  • Operational Excellence: Oversee day-to-day operations of the SRE team, ensuring the reliability and availability of digital infrastructure. Establish and enforce best practices for incident response, monitoring, automation, and system reliability. Do so by incorporating tools and technologies that create a 360-degree view of SRE efficiency across areas including, but not limited to, DevOps, App Support, Monitoring, Incident Management, Observability, Network/Infra/InfoSec, and Enterprise Architecture.
  • Collaboration: Collaborate with teams across our lines of business, including development, DevOps, App Support, Monitoring, Network/Infra/InfoSec, and Enterprise Architecture, to drive a unified approach to site reliability that optimizes the work of all those teams and improves time-to-market for their respective objectives. Foster strong relationships with leadership and partnering delivery organizations to align SRE efforts with organizational goals.
  • Monitoring and Alerting: Implement robust monitoring and alerting systems to proactively identify potential issues, analyze system performance, and facilitate quick response to incidents.
  • Automation and Efficiency: Drive the development and implementation of automation solutions to streamline processes, reduce manual interventions, and enhance the overall efficiency of the product engineering and SRE teams.
  • System Capacity Planning: Work closely with infrastructure and architecture teams to conduct capacity planning, ensuring that systems can handle current and future demand. Anticipate growth and scalability requirements.
  • Incident Management: Establish and oversee effective SRE-focused incident response processes, ensure timely incident resolution, and conduct post-mortems to identify root causes and implement preventive measures.

What we hope you will bring to this role:

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • A minimum of 10 years of experience, including at least 3 years in the SRE or DevOps field, with a proven track record of progressively increasing responsibilities and leadership roles.
  • Demonstrated ability to think strategically and develop a vision for site reliability engineering aligned with the organization's business objectives.
  • Strong leadership and people management skills, including experience leading and developing high-performing teams.
  • A "can-do" attitude, combined with a deep belief that everything can be automated and systems must always be functional.
  • Strong experience with and understanding of software engineering, scripting, build/deployment pipelines, Infrastructure as Code, and SLAs/SLOs/SLIs.
  • Strong understanding of cloud computing platforms (Azure required, Google Cloud a plus), including lift-and-shift environments (VMs, etc.) and cloud-native setups (AKS, serverless, etc.).
  • Strong understanding of and experience with automation tools and programming/scripting/declarative languages (e.g., C#, PowerShell, Python, Bash, Terraform, JavaScript) to develop and implement automated system reliability and performance solutions.
  • Strong understanding of observability, monitoring, and alerting tools (e.g., Azure Application Insights, Datadog, Splunk) and the ability to design and implement effective monitoring strategies.
  • Technical leadership skills, including technical collaboration and communication, problem-solving, and project management, to lead the SRE team in delivering its objectives.
  • Preference may be given to candidates with relevant certifications demonstrating cloud and reliability engineering expertise.

