Lead Site Reliability Engineer

4 weeks ago


Atlanta, Georgia, United States Bose Full time

At Bose, we're passionate about making sound matter. Our Information Technology team is dedicated to delivering valuable and reliable business and technology solutions. We're seeking a Lead Site Reliability Engineer to join our team and lead the way in ensuring the reliability and performance of our systems.

Key Responsibilities:
  • Lead and mentor a team of Site Reliability Engineers, providing guidance and support to ensure the team's success.
  • Foster a culture of collaboration, continuous improvement, and innovation within the team.
  • Define and communicate clear goals and objectives for the SRE team, aligning with overall business objectives.
  • Develop and execute strategies to improve system reliability, availability, and performance.
  • Drive the adoption of best practices and standards for SRE across the organization.
  • Participate in and lead strategic planning for capacity management, disaster recovery, and infrastructure investments.
  • Lead post-incident reviews to identify root causes and implement preventive measures.
  • Develop and enforce incident response procedures and runbooks.
  • Collaborate with engineering and architecture teams to design scalable and resilient system architectures.
  • Optimize system performance and reliability through proactive monitoring, tuning, and enhancements.
  • Evaluate and implement new technologies and tools to improve system capabilities and efficiency.
  • Drive the automation of operational processes to improve efficiency and reduce manual intervention.
  • Oversee the development and maintenance of tools for deployment, monitoring, and configuration management.
  • Promote the use of Infrastructure-as-Code (IaC) and Continuous Integration/Continuous Deployment (CI/CD) practices.
  • Lead efforts in capacity planning to ensure infrastructure can support current and future business needs.
  • Design and implement scaling strategies to handle variations in demand and growth.
  • Monitor and optimize resource utilization to balance performance and cost-effectiveness.
  • Work closely with cross-functional teams, including development, operations, and product management, to ensure alignment on reliability and performance goals.
  • Communicate effectively about system status, performance metrics, and ongoing improvements to stakeholders.
  • Provide technical guidance and support to other teams as needed.
  • Ensure thorough documentation of systems, processes, and procedures.
  • Create and maintain operational runbooks, knowledge base articles, and training materials.
  • Share knowledge and best practices with the team and organization through training sessions and workshops.
Requirements:
  • Advanced proficiency in scripting and programming languages such as Python, Go, Bash, or Java.
  • Extensive experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog).
  • In-depth knowledge of containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Strong familiarity with cloud platforms (e.g., AWS, Azure, Google Cloud).
  • Expertise in configuration management and Infrastructure-as-Code tools (e.g., Terraform, Ansible).
  • Strong understanding of networking, distributed systems, and databases.
  • Proven ability to lead and manage technical teams effectively.
  • Excellent problem-solving, analytical, and communication skills.
Experience Requirements:
  • 5+ years of experience in Site Reliability Engineering, Systems Engineering, or a related role, including at least 2 years in a leadership or management capacity.
Education/Certification Requirements:
  • Bachelor's degree in Computer Science, Engineering, or a related field. An advanced degree or relevant certifications (e.g., AWS Certified DevOps Engineer - Professional, Google Cloud Professional Cloud DevOps Engineer) are preferred.

Bose is an equal opportunity employer that is committed to inclusion and diversity. We evaluate qualified applicants without regard to race, color, religion, sex, sexual orientation, gender identity, genetic information, national origin, age, disability, veteran status, or any other legally protected characteristics.

Please note that the company's pay transparency information is available at Bose Pay Transparency. We are committed to working with and providing reasonable accommodations to individuals with disabilities. If you need a reasonable accommodation because of a disability for any part of the application or employment process, please send an e-mail to [email protected] and let us know the nature of your request and your contact information.



  • Atlanta, Georgia, United States Navtech Full time

    Job Title: Site Reliability Engineer. Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Navtech. As a Site Reliability Engineer, you will be responsible for ensuring the availability, scalability, and performance of our production systems. Key Responsibilities: Provide L4 technical support for production 24x7. Design and...


  • Atlanta, Georgia, United States Geotab Full time

    Job Title: Site Reliability Engineer. Geotab is a global leader in IoT and connected transportation, and we're seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and performance of our cloud-based infrastructure. You will work closely with our development team to...


  • Atlanta, Georgia, United States Ditto Job Board Full time

    Job Title: Site Reliability Engineer. At Ditto, we're on a mission to unleash the full power of edge devices by removing all the plumbing required to build amazing applications. As a Site Reliability Engineer, you'll play a critical role in helping us achieve this goal. About the Role: We're seeking a highly skilled Site Reliability Engineer to join our Federal...


  • Atlanta, Georgia, United States Jonas Software UK Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Jonas Software UK. As a key member of our technical operations team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly...


  • Atlanta, Georgia, United States JobRialto Full time

    Job Summary: The Site Reliability Engineer is responsible for ensuring the availability, scalability, and performance of critical services and systems. This role requires expertise in OpenShift and CloudFormation, along with a deep understanding of site reliability principles, container technologies, monitoring tools, and automation. Key Responsibilities: Ensure...


  • Atlanta, Georgia, United States Microsoft Corporation Full time

    We are seeking a highly skilled Senior Site Reliability Engineer to join our Windows Servicing and Delivery team at Microsoft Corporation. The ideal candidate will have a strong background in software engineering, network engineering, or systems administration, with a proven track record of delivering high-quality solutions that meet customer needs. As a...


  • Atlanta, Georgia, United States Geotab Full time

    About Geotab: Geotab is a global leader in IoT and connected transportation, certified as a Great Place to Work™. We are a company of diverse and talented individuals who work together to help businesses grow and succeed, and increase the safety and sustainability of our communities. Our team is growing, and we're looking for people who follow their passion,...


  • Atlanta, Georgia, United States Diverse Lynx Full time

    Role Overview: We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a key member of our technical team, you will be responsible for ensuring the reliability and scalability of our cloud-based systems. Key Responsibilities: Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into...


  • Atlanta, Georgia, United States Resource Informatics Group Inc Full time

    Job Overview: We are seeking a highly skilled Site Reliability Engineer to join our team at Resource Informatics Group Inc. As a key member of our SRE team, you will be responsible for designing and implementing automated solutions to ensure the reliability and scalability of our applications. Key Responsibilities: Design and implement automated pipelines and...


  • Atlanta, Georgia, United States Della Infotech Full time

    Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Della Infotech. As a key member of our DevOps team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure using AWS...


  • Atlanta, Georgia, United States STORD Full time

    About the Role: Stord is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our SRE team, you will be responsible for designing and implementing scalable, efficient, and secure infrastructure and platform solutions. You will collaborate with cross-functional teams to deliver high-quality products and services to our...


  • Atlanta, Georgia, United States Kobiton Full time

    About the Role: Kobiton is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, performance, and scalability of our systems and services. You will work closely with development and operations teams to build and maintain robust infrastructure, automate...


  • Atlanta, Georgia, United States Now100 Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Now100. As a Site Reliability Engineer, you will be responsible for building and supporting the platform/application infrastructure of one of the largest retailers in the world. Key Responsibilities: Maintain high site uptime/availability while embracing rapid change...


  • Atlanta, Georgia, United States Microsoft Corporation Full time

    Job Summary: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Microsoft Corporation. As a key member of our Windows Servicing and Delivery team, you will be responsible for ensuring the reliability and performance of our product offerings, including Windows client, Windows Update, and Windows Autopatch. Key Responsibilities...


  • Atlanta, Georgia, United States Capgemini Engineering Full time

    Job Title: Site Reliability Engineer. At Capgemini Engineering, we're seeking a seasoned Site Reliability Engineer to join our Trade Distribution System (TDS) software development team. As a key member of our team, you'll be responsible for advancing and enhancing reliability practices, with a strong focus on testing, monitoring, and maintaining system...


  • Atlanta, Georgia, United States Cynet Systems Full time

    Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Cynet Systems. The ideal candidate will have a strong background in application development, architecture, and consulting, with a proven track record of performing assessments and providing roadmaps with project plans. The successful candidate will have a good...


  • Atlanta, Georgia, United States Now100 Full time

    Job Title: Site Reliability Engineer. Now100 is seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our product team, you will be responsible for building and growing the skillsets of junior engineers while maintaining high site uptime and availability. Key Responsibilities: Design and implement scalable and reliable...


  • Atlanta, Georgia, United States Now100 Full time

    Job Title: Site Reliability Engineer - Cloud Infrastructure Specialist. Company Overview: Now100 is a leading provider of technology solutions, committed to delivering exceptional results for our clients. We match thoroughly vetted resources to contract, contract-to-hire, and permanent positions in all industries. Job Description: We are seeking a highly...


  • Atlanta, Georgia, United States Microsoft Corporation Full time

    About the Role: Microsoft Corporation is seeking a highly skilled Senior Site Reliability Engineering Manager to lead the delivery of critical features in Office 365 government cloud offerings. As a key member of the Office 365 team, you will be responsible for combining your passion for quality, reliability, and creativity to drive evolution in the continuous...


  • Atlanta, Georgia, United States UKG (Ultimate Kronos Group) Full time

    About the Role: As a Principal Site Reliability Engineer at UKG, you will play a critical role in ensuring the reliability and efficiency of our cloud-based services. You will be responsible for designing, implementing, and maintaining scalable and highly available systems, as well as developing software solutions to enhance our service delivery processes. You...