Lead Site Reliability Engineer

1 month ago


BJ's Club Support Center, Marlborough, MA, United States BJ's Wholesale Club Full time

Join our team of more than 34,000 team members, supporting our members and communities in our Club Support Center, 235+ clubs and eight distribution centers. BJ’s Wholesale Club offers a collaborative and inclusive environment where all team members can learn, grow and be their authentic selves. Together, we’re committed to providing outstanding service and convenience to our members, helping them save on the products and services they need for their families and homes.

The Benefits of working at BJ’s

  • BJ’s pays weekly

  • Generous time off programs to support busy lifestyles*

      o Vacation, Personal, Holiday, Sick, Bereavement Leave, Jury Duty

  • Benefit plans for your changing needs*

      o Three medical plans**, Health Reimbursement Account (HRA), Health Savings Account (HSA), two dental plans, flexible spending

*eligibility requirements vary by position

**medical plans vary by location

As a Lead Site Reliability Engineer, you will be responsible for designing, building, monitoring, and continuously improving our ecommerce platform's infrastructure and processes. Leveraging your expertise in observability tools such as New Relic and Scalyr/Splunk, along with Bash and Python scripting, you will play a pivotal role in ensuring the reliability and performance of our Java microservices-based architecture.

Key Responsibilities:

  • Design and manage Java-based microservices, Bash scripts, Redis, and high-availability designs while strictly adhering to Site Reliability Engineering (SRE) principles.
  • Thrive in high-pressure environments, working swiftly and reliably to maintain system integrity and meet service level objectives (SLOs), as measured by service level indicators (SLIs).
  • Proactively identify and address potential issues before they impact operations, utilizing observability tools like New Relic and Scalyr/Splunk, along with Bash and Python scripts.
  • Lead initiatives to enhance current systems and implement innovative solutions in collaboration with a fast-paced, mission-driven team, focusing on the implementation of SRE best practices.
  • Conduct thorough root-cause analyses for production incidents and generate high-quality RCA reports, leveraging SRE methodologies to prevent recurrence.
  • Apply software engineering principles to rectify operational challenges and optimize system performance, with a specific focus on implementing SRE-driven solutions.
  • Ensure the availability, latency, performance, efficiency, and security of our infrastructure, adhering rigorously to SRE principles and best practices.
  • Design and maintain robust production monitoring systems to ensure timely detection and resolution of issues, following SRE guidelines for effective monitoring and alerting.
  • Utilize a diverse array of tools to troubleshoot performance and stability issues effectively, employing SRE methodologies to identify and mitigate bottlenecks.
  • Evaluate and enhance application and environment security measures, integrating SRE-driven security practices into the development and deployment pipelines.
  • Provide support for globally distributed, multi-cloud (public and/or private) environments, implementing SRE strategies for resilience and fault tolerance.
  • Automate repetitive tasks at scale to streamline operational workflows and enhance efficiency, focusing on the implementation of SRE-driven automation solutions.
  • Adhere to change management processes during implementations and utilize version control for application infrastructure, following SRE principles for reliable and auditable change management.
  • Foster an SRE mindset throughout the organization, promoting collaboration and shared responsibility for reliability and performance.

Qualifications:

  • Bachelor's Degree in Computer Science or related field, or foreign equivalent.
  • Demonstrated curiosity and self-drive to tackle complex challenges and drive change in a diverse organizational landscape.
  • Excellent written and verbal communication skills, with the ability to effectively communicate with engineering management, developers, and leadership.
  • Proven ability to adapt to new technologies and learn quickly.
  • Minimum of 5 years of experience in Site Reliability Engineering (SRE) or related roles.

Job Conditions:

  • Collaborate within a diverse and global team environment.
  • Participate in cross-training with other team members across different regions.
  • Rotate in an on-call schedule as required to ensure 24/7 availability and support for critical systems.

In accordance with the Pay Transparency requirements, the following represents a good faith estimate of the compensation range for this position. At BJ’s Wholesale Club, we carefully consider a wide range of non-discriminatory factors when determining salary. Actual salaries will vary depending on factors including but not limited to location, education, experience, and qualifications. The pay range for this position starts at $109,000.00.

  • Marlborough, United States BJ's Wholesale Club Full time

    BJ's Wholesale Club Lead Site Reliability Engineer, BJ's Club Support Center, Marlborough, Massachusetts. Join our team of more than 34,000 team members, supporting our members and communities in our Club Support Center, 235+ clubs and eight distribution centers. BJ’s Wholesale Club offers a collaborative and inclusive environment where all team...


  • Marlborough, United States BJ's Wholesale Club Full time

    Join our team of more than 34,000 team members, supporting our members and communities in our Club Support Center, 235+ clubs and eight distribution centers. BJ's Wholesale Club offers a collaborative and inclusive environment where all team members can learn, grow and be their authentic selves. Together, we're committed to providing outstanding service and...


  • Michigan Center, United States Diverse Lynx Full time

    Site Reliability Engineer (SRE) Lead, Public Sector Core Framework team (Remote). About the Role: Client is seeking a Lead Software Engineer to join our Public Sector Core Framework platform team and play a critical role as a Site Reliability Engineer (SRE) within our Azure/Kubernetes ecosystem. In this role, you will be responsible for ensuring the stability,...


  • Boston, MA, United States Dice Full time

    Dice is the leading career destination for tech experts at every stage of their careers. Our client, Motion Recruitment Partners, LLC, is seeking the following. Apply via Dice today! We are partnered with a dynamic startup poised to revolutionize data management, competing with established players. They are looking for a Senior Site Reliability to join...


  • Boston, MA, United States Biofourmis Full time

    Position Overview: Biofourmis is seeking a talented and experienced Site Reliability Engineer to join our dynamic global team. As a Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, scalability, and performance of our digital health platform. You will collaborate closely with cross-functional teams to design,...


  • Boston, MA, United States Soteriare Full time

    Locations: Merrimack, NH; Boston, MA. Time type: Full time. Posted 5 days ago. Job requisition ID: 2093756. Job Description: As a member of the TechOps SRE team, you'll work closely with our engineering partners to help enable and drive initiatives from design to implementation. This is a phenomenal opportunity to have a direct impact on the...


  • Boston, MA, United States Motion Recruitment Partners LLC Full time

    We are partnered with a dynamic startup poised to revolutionize data management, competing with established players. They are looking for a Senior Site Reliability to join their growing DevOps team to ensure the reliability and performance of their highly scalable systems. You will work closely with software engineers to automate tooling and migrate...


  • Marlborough, MA, United States Raytheon Careers Full time

    Date Posted: 2024-03-13 Country: United States of America Location: MA802: Marlborough, MA Building 1 1001 Boston Post Road Building 1, Marlborough, MA, 01752 USA Position Role Type: Hybrid Collins Aerospace is looking for a Principal Systems Specialty Engineer Reliability, Maintainability & Testability to join our team in Marlborough, MA. We are...

  • Reliability Engineer

    2 weeks ago


    Boston, MA, United States Sequoia Biotech Consulting Full time

    Responsibilities: The GxP Reliability Engineer will provide reliability engineering support for all facilities, utilities systems and equipment including analytical instrumentation, R&D lab support equipment and systems. This role will facilitate the deployment of Maintenance and Reliability Best Practices for new and existing equipment, facilities, and...

  • Reliability Engineer

    2 weeks ago


    Boston, MA, United States Takeda Pharmaceutical Company Ltd Full time

    The ideal candidate will be responsible for improving the reliability of equipment, utilities, critical systems and maintenance processes by applying the principles of Reliability Centered Maintenance. In this role, you will lead reliability-driven actions that require independent and/or collaborative judgment to support capital projects. You will be working...


  • Boston, MA, United States Intelletec Full time

    About the Position: We are looking for experienced engineers who understand AI systems, and are excited about becoming global leaders in a completely novel field. We need people that can work independently as part of a small team. You will be responsible for building the industry’s first end-to-end AI evaluation platform, starting with an offline...


  • Boston, MA, United States startus Full time

    What you’ll do: As a member of a small cross-functional squad, you’ll own a particular infrastructure challenge at Spotify. Design and document systems, including writing and reviewing code, to automate away problems within your squad’s domain. Undertake measured, methodical troubleshooting of complicated systems under pressure. Partake in an on-call...


  • Marlborough, United States Raytheon Full time

    Date Posted: 2024-03-13 Country: United States of America Location: MA802: Marlborough, MA Building 1 1001 Boston Post Road Building 1, Marlborough, MA, 01752 USA Position Role Type: Hybrid Collins Aerospace is looking for a Principal Systems Specialty Engineer Reliability, Maintainability & Testability to join our team in Marlborough, MA. We are seeking...


  • Boston, MA, United States Takeda Pharmaceutical Company Ltd Full time

    We are currently seeking a Reliability Engineer III to join our team. The ideal candidate will be responsible for improving the reliability of equipment, utilities, critical systems and maintenance processes by applying the principles of Reliability Centered Maintenance. In this role, you will lead reliability-driven actions that require independent and/or...

  • Reliability Engineer

    2 weeks ago


    Raynham Center, United States Actalent Full time

    Qualifications: A minimum of a bachelor's degree in Mechanical or Electrical Engineering. Statistical knowledge is essential to plan for proper statistical tests and test methods. TMV experience is desirable. Ability to perform statistical analysis and develop predictive statistical models is desirable. Experience working with/in an NPD environment. Experience...


  • Boston, MA, United States Alarm.com Full time

    Do you love working with the latest technologies? Excited about helping maintain, improve, and scale an environment that supports millions of customers and IoT devices? Passionate about code at scale? If the above holds true for you, then we would love to talk to you! Alarm.com is looking for a versatile Site Reliability Engineer to work on our Platform...


  • Andover, MA, United States Raytheon Careers Full time

    Date Posted: 2024-01-12 Country: United States of America Location: MA112: Andover MA 358 Lowell St Dukes 358 Lowell Street Dukes, Andover, MA, 01810 USA Position Role Type: Onsite At Raytheon, the foundation of everything we do is rooted in our values and a higher calling – to help our nation and allies defend freedoms and deter aggression. We bring...


  • Harvard Square, MA, United States Takeda Pharmaceutical Full time

    By clicking the “Apply” button, I understand that my employment application process with Takeda will commence and that the information I provide in my application will be processed in line with Takeda’s Privacy Notice and Terms of Use. I further attest that all information I submit in my employment application is true to the best of my...