Site Reliability Engineering Director

2 weeks ago


Newton MA, United States Bright Horizons Full time

The Director of Site Reliability Engineering (SRE) will play a pivotal role in ensuring the seamless and reliable operation of consumer and customer-facing digital infrastructure across our lines of business. This leadership position involves overseeing a team of skilled SRE professionals and collaborating closely with cross-functional teams to enhance the performance, scalability, and reliability of complex systems and applications. The Director of SRE is responsible for developing and implementing strategies to optimize our technology's reliability and uptime, managing incident response, and ensuring consistent use of best practices in automation, monitoring, and incident management. This role requires a deep understanding of cloud technologies, distributed systems, DevOps, Software Engineering, Automation / Scripting, Observability, and App Support / Monitoring, as well as a proactive approach to preventing and mitigating potential issues. The Director of SRE must also foster a culture of innovation, continuous improvement, and collaboration within the team to meet the organization's evolving needs and deliver a superior digital experience to users.

What you will be doing:

  • Strategy and Planning: Develop and implement a comprehensive strategy for site reliability, encompassing scalability, performance, and reliability improvements. Align SRE objectives with overall business goals and technology roadmaps. Foster a spirit of continuous improvement within the SRE team and position the team to advance organizational objectives.
  • Leadership and Team Management: Provide strong leadership to the Site Reliability Engineering (SRE) team, fostering a culture of collaboration, innovation, and continuous improvement. Recruit, mentor, and develop a high-performing team of SRE professionals. Instill a can-do attitude in the team from the start, combined with a passion for automation and engineering excellence.
  • Operational Excellence: Oversee day-to-day operations of the SRE team, ensuring the reliability and availability of digital infrastructure. Establish and enforce best practices for incident response, monitoring, automation, and system reliability. Do so by incorporating tools and technologies that create a 360-degree view of SRE efficiency, including but not limited to DevOps, App Support, Monitoring, Incident Management, Observability, Network/Infra/InfoSec, and Enterprise Architecture.
  • Collaboration: Collaborate with teams across our lines of business, including development, DevOps, App Support, Monitoring, Network/Infra/InfoSec, and Enterprise Architecture, to drive a unified approach to site reliability that optimizes the work of all those teams and improves time-to-market for their respective objectives. Foster strong relationships with leadership and partnering delivery organizations to align SRE efforts with organizational goals.
  • Monitoring and Alerting: Implement robust monitoring and alerting systems to proactively identify potential issues, analyze system performance, and facilitate quick response to incidents.
  • Automation and Efficiency: Drive the development and implementation of automation solutions to streamline processes, reduce manual interventions, and enhance the overall efficiency of the product engineering and SRE teams.
  • System Capacity Planning: Work closely with infrastructure and architecture teams to conduct capacity planning, ensuring that systems can handle current and future demand. Anticipate growth and scalability requirements.
  • Incident Management: Establish and oversee effective SRE-focused incident response processes, ensure timely incident resolution, and conduct post-mortems to identify root causes and implement preventive measures.

What we hope you will bring to this role:

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • A minimum of 10 years of experience, including at least 3 years in the SRE or DevOps field, with a proven track record of progressively increasing responsibilities and leadership roles.
  • Demonstrated ability to think strategically and develop a vision for site reliability engineering aligned with the organization's business objectives.
  • Strong leadership and people management skills, including experience leading and developing high-performing teams.
  • A 'can do' attitude is necessary, combined with a deep belief that everything can be automated and systems must always be functional.
  • Strong experience with and understanding of software engineering, scripting, build/deployment pipelines, Infrastructure as Code, and SLAs/SLOs/SLIs.
  • Strong understanding of cloud computing platforms (Azure required, Google Cloud a plus), including lift-and-shift environments (VMs, etc.) and cloud-native setups (AKS, serverless, etc.).
  • Strong understanding of and experience with automation tools and programming/scripting/declarative languages (e.g., C#, PowerShell, Python, Bash, Terraform, JavaScript) to develop and implement automated system reliability and performance solutions.
  • Strong understanding of observability, monitoring, and alerting tools (e.g., Azure Application Insights, Datadog, Splunk) and the ability to design and implement effective monitoring strategies.
  • Technical leadership skills, including technical collaboration/communication, problem-solving, and project management, are needed to lead the SRE team in delivering its objectives.
  • Preference may be given to candidates with relevant certifications demonstrating cloud and reliability engineering expertise.
