Lead Site Reliability Engineer

4 weeks ago


BJ's Club Support Center, Marlborough, MA, United States | BJ's Wholesale Club | Full time

Join our team of more than 34,000 team members, supporting our members and communities in our Club Support Center, 235+ clubs and eight distribution centers. BJ’s Wholesale Club offers a collaborative and inclusive environment where all team members can learn, grow and be their authentic selves. Together, we’re committed to providing outstanding service and convenience to our members, helping them save on the products and services they need for their families and homes.

The Benefits of working at BJ’s

•        BJ’s pays weekly

•        Eligible for free BJ's Inner Circle and Supplemental membership(s)*

•        Generous time off programs to support busy lifestyles* 

                      o Vacation, Personal, Holiday, Sick, Bereavement Leave, Jury Duty

•        Benefit plans for your changing needs*

                      o Three medical plans**, Health Savings Account (HSA), two dental plans, vision plan, flexible spending account

•        401(k) plan with company match (must be at least 18 years old)

*eligibility requirements vary by position

**medical plans vary by location

As a Lead Site Reliability Engineer, you will be responsible for designing, building, monitoring, and continuously improving our ecommerce platform's infrastructure and processes. Leveraging your expertise in observability tools such as New Relic and Scalyr/Splunk, along with bash and Python scripting, you will play a pivotal role in ensuring the reliability and performance of our Java microservices-based architecture.

Key Responsibilities:

  • Design and manage Java-based microservices, bash scripts, Redis, and high-availability architectures, strictly adhering to Site Reliability Engineering (SRE) principles.
  • Thrive in high-pressure environments, working swiftly and reliably to maintain system integrity and meet service level objectives (SLOs), as measured by service level indicators (SLIs).
  • Proactively identify and address potential issues before they impact operations, using observability tools such as New Relic and Scalyr/Splunk along with bash and Python scripts.
  • Lead initiatives to improve current systems and deliver new solutions alongside a fast-paced, mission-driven team, with a focus on implementing SRE best practices.
  • Conduct thorough root-cause analyses (RCAs) of production incidents and write high-quality RCA reports, applying SRE methodologies to prevent recurrence.
  • Apply software engineering principles to solve operational problems and optimize system performance, with a specific focus on SRE-driven solutions.
  • Ensure the availability, latency, performance, efficiency, and security of our infrastructure, adhering rigorously to SRE principles and best practices.
  • Design and maintain robust production monitoring systems to ensure timely detection and resolution of issues, following SRE guidelines for effective monitoring and alerting.
  • Utilize a diverse array of tools to troubleshoot performance and stability issues effectively, employing SRE methodologies to identify and mitigate bottlenecks.
  • Evaluate and enhance application and environment security measures, integrating SRE-driven security practices into the development and deployment pipelines.
  • Provide support for globally distributed, multi-cloud (public and/or private) environments, implementing SRE strategies for resilience and fault tolerance.
  • Automate repetitive tasks at scale to streamline operational workflows and enhance efficiency, focusing on the implementation of SRE-driven automation solutions.
  • Adhere to change management processes during implementations and utilize version control for application infrastructure, following SRE principles for reliable and auditable change management.
  • Foster an SRE mindset throughout the organization, promoting collaboration and shared responsibility for reliability and performance.

Qualifications:

  • Bachelor's Degree in Computer Science or related field, or foreign equivalent.
  • Demonstrated curiosity and self-drive to tackle complex challenges and drive change in a diverse organizational landscape.
  • Excellent written and verbal communication skills, with the ability to effectively communicate with engineering management, developers, and leadership.
  • Proven ability to adapt to new technologies and learn quickly.
  • Minimum of 5 years of experience in Site Reliability Engineering (SRE) or related roles.

Job Conditions:

  • Collaborate within a diverse and global team environment.
  • Participate in cross-training with other team members across different regions.
  • Rotate in an on-call schedule as required to ensure 24/7 availability and support for critical systems.

In accordance with the Pay Transparency requirements, the following represents a good faith estimate of the compensation range for this position. At BJ’s Wholesale Club, we carefully consider a wide range of non-discriminatory factors when determining salary. Actual salaries will vary depending on factors including but not limited to location, education, experience, and qualifications. The pay range for this position is starting from $109,000.00.
