Site Reliability Engineering Director

1 month ago


Newton, United States Bright Horizons Full time

The Director of Site Reliability Engineering (SRE) will play a pivotal role in ensuring the seamless and reliable operation of consumer- and customer-facing digital infrastructure across our lines of business. This leadership position involves overseeing a team of skilled SRE professionals and collaborating closely with cross-functional teams to enhance the performance, scalability, and reliability of complex systems and applications. The Director of SRE is responsible for developing and implementing strategies to optimize the reliability and uptime of our technology, managing incident response, and ensuring consistent use of best practices in automation, monitoring, and incident management. This role requires a deep understanding of cloud technologies, distributed systems, DevOps, software engineering, automation and scripting, observability, and application support and monitoring, along with a proactive approach to preventing and mitigating potential issues. The Director of SRE must also foster a culture of innovation, continuous improvement, and collaboration within the team to meet the organization's evolving needs and deliver a superior digital experience to users.


What you will be doing:


  • Strategy and Planning: Develop and implement a comprehensive strategy for site reliability, encompassing scalability, performance, and reliability improvements. Align SRE objectives with overall business goals and technology roadmaps. Foster a spirit of continuous improvement within the SRE function and position it to advance organizational objectives.
  • Leadership and Team Management: Provide strong leadership to the Site Reliability Engineering (SRE) team, fostering a culture of collaboration, innovation, and continuous improvement. Recruit, mentor, and develop a high-performing team of SRE professionals. Instill a “can do” attitude in the team from the outset, combined with a passion for automation and engineering excellence.
  • Operational Excellence: Oversee day-to-day operations of the SRE team, ensuring the reliability and availability of digital infrastructure. Establish and enforce best practices for incident response, monitoring, automation, and system reliability. Do so by incorporating tools and technologies that create a 360-degree view of SRE efficiency, including but not limited to DevOps, App Support, Monitoring, Incident Management, Observability, Network/Infra/InfoSec, and Enterprise Architecture.
  • Collaboration: Collaborate with teams across our lines of business, including development, DevOps, App Support, Monitoring, Network/Infra/InfoSec, and Enterprise Architecture, to drive a unified approach to site reliability that optimizes the work of all those teams and improves time-to-market for their respective objectives. Foster strong relationships with leadership and partner delivery organizations to align SRE efforts with organizational goals.
  • Monitoring and Alerting: Implement robust monitoring and alerting systems to proactively identify potential issues, analyze system performance, and facilitate quick response to incidents.
  • Automation and Efficiency: Drive the development and implementation of automation solutions to streamline processes, reduce manual interventions, and enhance the overall efficiency of the product engineering and SRE teams.
  • System Capacity Planning: Work closely with infrastructure and architecture teams to conduct capacity planning, ensuring that systems can handle current and future demand. Anticipate growth and scalability requirements.
  • Incident Management: Establish and oversee effective SRE-focused incident response processes, ensure timely incident resolution, and conduct post-mortems to identify root causes and implement preventive measures.


What we hope you will bring to this role:

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • A minimum of 10 years of experience, including at least 3 years in the SRE or DevOps field, with a proven track record of progressively increasing responsibilities and leadership roles.
  • Demonstrated ability to think strategically and develop a vision for site reliability engineering aligned with the organization's business objectives.
  • Strong leadership and people management skills, including experience leading and developing high-performing teams.
  • A "can do" attitude is necessary, combined with a deep belief that everything can be automated and systems must always be functional.
  • Strong experience with and understanding of software engineering, scripting, build/deployment pipelines, Infrastructure as Code, and SLAs/SLOs/SLIs.
  • Strong understanding of cloud computing platforms (Azure required, Google Cloud a plus), including lift-and-shift environments (VMs, etc.) and cloud-native setups (AKS, serverless, etc.).
  • Strong understanding of and experience with automation tools and programming/scripting/declarative languages (e.g., C#, PowerShell, Python, Bash, Terraform, JavaScript) to develop and implement automated system reliability and performance solutions.
  • Strong understanding of observability, monitoring, and alerting tools (e.g., Azure Application Insights, Datadog, Splunk) and the ability to design and implement effective monitoring strategies.
  • Technical leadership skills, including technical collaboration/communication, problem-solving, and project management, are needed to lead the SRE team in delivering its objectives.
  • Preference may be given to candidates with relevant certifications demonstrating cloud and reliability engineering expertise.



