Lead Site Reliability Engineer

2 months ago


West Columbia, United States BJ's Wholesale Club Full time

Join our team of more than 34,000 team members, supporting our members and communities in our Club Support Center, 235+ clubs and eight distribution centers. BJ’s Wholesale Club offers a collaborative and inclusive environment where all team members can learn, grow and be their authentic selves. Together, we’re committed to providing outstanding service and convenience to our members, helping them save on the products and services they need for their families and homes.

The Benefits Of Working At BJ’s

  • BJ’s pays weekly
  • Eligible for free BJ's Inner Circle and Supplemental membership(s)*
  • Generous time off programs to support busy lifestyles*: Vacation, Personal, Holiday, Sick, Bereavement Leave, Jury Duty
  • Benefit plans for your changing needs*: three medical plans**, Health Savings Account (HSA), two dental plans, vision plan, flexible spending

*Eligibility requirements vary by position. **Medical plans vary by location.

As a Lead Site Reliability Engineer, you will be responsible for designing, building, monitoring, and continuously improving our ecommerce platform's infrastructure and processes. Leveraging your expertise in observability tools such as New Relic, Scalyr/Splunk, bash scripts, and Python scripts, you will play a pivotal role in ensuring the reliability and performance of our Java microservices-based architecture.

Key Responsibilities:

  • Design and manage Java-based microservices, bash scripts, Redis, and high-availability designs while strictly adhering to Site Reliability Engineering (SRE) principles.
  • Thrive in high-pressure environments, working swiftly and reliably to maintain system integrity and meet service level objectives (SLOs), tracked through service level indicators (SLIs).
  • Proactively identify and address potential issues before they impact operations, using observability tools such as New Relic and Scalyr/Splunk along with bash and Python scripts.
  • Lead initiatives to enhance current systems and implement innovative solutions in collaboration with a fast-paced, mission-driven team, focusing on SRE best practices.
  • Conduct thorough root-cause analyses (RCAs) for production incidents and produce high-quality RCA reports, applying SRE methodologies to prevent recurrence.
  • Apply software engineering principles to resolve operational challenges and optimize system performance.
  • Ensure the availability, latency, performance, efficiency, and security of our infrastructure.
  • Design and maintain robust production monitoring and alerting systems to ensure timely detection and resolution of issues.
  • Use a diverse array of tools to troubleshoot performance and stability issues and to identify and mitigate bottlenecks.
  • Evaluate and enhance application and environment security measures, integrating security practices into the development and deployment pipelines.
  • Provide support for globally distributed, multi-cloud (public and/or private) environments, implementing strategies for resilience and fault tolerance.
  • Automate repetitive tasks at scale to streamline operational workflows and improve efficiency.
  • Adhere to change management processes during implementations and use version control for application infrastructure, so that changes are reliable and auditable.
  • Foster an SRE mindset throughout the organization, promoting collaboration and shared responsibility for reliability and performance.

Qualifications:

  • Bachelor's Degree in Computer Science or related field, or foreign equivalent.
  • Minimum of 5 years of experience in Site Reliability Engineering (SRE) or related roles.
  • Demonstrated curiosity and self-drive to tackle complex challenges and drive change in a diverse organizational landscape.
  • Excellent written and verbal communication skills, with the ability to effectively communicate with engineering management, developers, and leadership.
  • Proven ability to adapt to new technologies and learn quickly.

Job Conditions:

  • Collaborate within a diverse and global team environment.
  • Participate in cross-training with other team members across different regions.
  • Rotate in an on-call schedule as required to ensure 24/7 availability and support for critical systems.

In accordance with the Pay Transparency requirements, the following represents a good faith estimate of the compensation range for this position. At BJ’s Wholesale Club, we carefully consider a wide range of non-discriminatory factors when determining salary. Actual salaries will vary depending on factors including but not limited to location, education, experience, and qualifications. The pay range for this position starts at $109,000.00.


