Lead Site Reliability Engineer
1 month ago
Join our team of more than 34,000 team members, supporting our members and communities in our Club Support Center, 235+ clubs and eight distribution centers. BJ’s Wholesale Club offers a collaborative and inclusive environment where all team members can learn, grow and be their authentic selves. Together, we’re committed to providing outstanding service and convenience to our members, helping them save on the products and services they need for their families and homes.
The Benefits of working at BJ’s
• BJ’s pays weekly
• Generous time off programs to support busy lifestyles*
o Vacation, Personal, Holiday, Sick, Bereavement Leave, Jury Duty
• Benefit plans for your changing needs*
o Three medical plans**, Health Reimbursement Account (HRA), Health Savings Account (HSA), two dental plans, flexible spending
*eligibility requirements vary by position
**medical plans vary by location
As a Lead Site Reliability Engineer, you will be responsible for designing, building, monitoring, and continuously improving our ecommerce platform's infrastructure and processes. Drawing on your expertise with observability tools such as New Relic and Scalyr/Splunk, along with Bash and Python scripting, you will play a pivotal role in ensuring the reliability and performance of our Java microservices-based architecture.
Key Responsibilities:
- Design and manage Java-based microservices, Bash scripts, Redis, and high-availability architectures, strictly adhering to Site Reliability Engineering (SRE) principles.
- Thrive in high-pressure environments, working swiftly and reliably to maintain system integrity and meet service level objectives (SLOs) as measured by service level indicators (SLIs).
- Proactively identify and address potential issues before they impact operations, using observability tools like New Relic and Scalyr/Splunk along with Bash and Python scripts.
- Lead initiatives to enhance current systems and implement innovative solutions in collaboration with a fast-paced, mission-driven team, focusing on the implementation of SRE best practices.
- Conduct thorough root-cause analyses (RCAs) for production incidents and write high-quality RCA reports, applying SRE methodologies to prevent recurrence.
- Apply software engineering principles to rectify operational challenges and optimize system performance, with a specific focus on implementing SRE-driven solutions.
- Ensure the availability, latency, performance, efficiency, and security of our infrastructure, adhering rigorously to SRE principles and best practices.
- Design and maintain robust production monitoring systems to ensure timely detection and resolution of issues, following SRE guidelines for effective monitoring and alerting.
- Utilize a diverse array of tools to troubleshoot performance and stability issues effectively, employing SRE methodologies to identify and mitigate bottlenecks.
- Evaluate and enhance application and environment security measures, integrating SRE-driven security practices into the development and deployment pipelines.
- Provide support for globally distributed, multi-cloud (public and/or private) environments, implementing SRE strategies for resilience and fault tolerance.
- Automate repetitive tasks at scale to streamline operational workflows and enhance efficiency, focusing on the implementation of SRE-driven automation solutions.
- Adhere to change management processes during implementations and utilize version control for application infrastructure, following SRE principles for reliable and auditable change management.
- Foster an SRE mindset throughout the organization, promoting collaboration and shared responsibility for reliability and performance.
Qualifications:
- Bachelor's Degree in Computer Science or related field, or foreign equivalent.
- Demonstrated curiosity and self-drive to tackle complex challenges and drive change in a diverse organizational landscape.
- Excellent written and verbal communication skills, with the ability to effectively communicate with engineering management, developers, and leadership.
- Proven ability to adapt to new technologies and learn quickly.
- Minimum of 5 years of experience in Site Reliability Engineering (SRE) or related roles.
Job Conditions:
- Collaborate within a diverse and global team environment.
- Participate in cross-training with other team members across different regions.
- Participate in an on-call rotation as required to provide 24/7 availability and support for critical systems.
-
Site Reliability Engineer
14 hours ago
New York, United States Apollo Solutions Full time
Site Reliability Engineer - Web3
Apollo Solutions have partnered with an innovative web3 start-up backed by top-tier venture capital with a strong runway. They are looking to revolutionize the way we think about the application of web3 and have already made significant inroads into the gaming, entertainment and finance industries. In this role, you will...
-
Site Reliability Engineer
10 hours ago
New York, United States Apollo Solutions Full time
Site Reliability Engineer
Apollo Solutions have partnered with a groundbreaking artificial intelligence business that is making major developments in how we use AI/ML for gaming and security. They are working on government contracts as well as with gaming console companies and are now searching for an SRE to join their growing team. The Site Reliability...
-
Site Reliability Engineering Manager
2 weeks ago
New York, United States developrec Full time
SRE Lead/Manager | San Diego, CA | Full-time
Role Overview: As the Engineering Manager for Site Reliability, you'll lead the charge in transitioning to cloud-based solutions while ensuring the stability of our existing systems for our rapidly growing user base, currently standing at around one million. You'll spearhead our cloud infrastructure strategy...
-
Site Reliability Engineer
7 hours ago
New York, United States EVONA Full time
Join Our Client's Team as a Site Reliability Engineer (SRE)
Are you passionate about ensuring the reliability and stability of cutting-edge infrastructure? Do you thrive in collaborative environments where your ideas are valued and your contributions make a real impact? If so, we invite you to apply for the position of Site Reliability Engineer (SRE) with...
-
Site Reliability Engineer
1 week ago
New York, United States InterEx Group Full time
Senior Site Reliability Engineer
PRIMARY ACCOUNTABILITIES: Improve the reliability of mission-critical solutions, applications, and platforms; software development for enterprises; continuous improvement identification and implementation; manage risks and resolve issues that affect applications; lead efforts to troubleshoot and/or debug issues in any...
-
Site Reliability Engineer
3 weeks ago
New York, United States Unreal Gigs Full time
Job Summary: We are in search of a Site Reliability Engineer to join our tech startup specializing in infrastructure and authorization solutions. As a Site Reliability Engineer, you'll be pivotal in ensuring the reliability, availability, and performance of our systems. Your role will involve designing, implementing, and...
-
Site Reliability Engineer
2 weeks ago
New York, United States Unreal Gigs Full time
Job Summary: We are in search of a Site Reliability Engineer to join our tech startup specializing in infrastructure and authorization solutions. As a Site Reliability Engineer, you'll be pivotal in ensuring the reliability, availability, and performance of our systems. Your role will involve designing, implementing, and maintaining scalable infrastructure...
-
Site Reliability Engineer
2 weeks ago
New York, United States The Judge Group, LLC Full time
Contract: 6+ months | Hybrid: Riverwoods, IL | W2 ONLY - NO C2C
Job Responsibilities: Guide full-stack developers on the importance of SRE principles. Analyze, design, and deploy new functionality and enhancements with high quality (security, reliability, operations) to production. Build new and analyze current monitoring for applications for...
-
Lead Site Reliability Engineer
3 weeks ago
New Town, United States BJ's Wholesale Club Full time
Join our team of more than 34,000 team members, supporting our members and communities in our Club Support Center, 235+ clubs and eight distribution centers. BJ’s Wholesale Club offers a collaborative and inclusive environment where all team members can learn, grow and be their authentic selves. Together, we’re committed to providing outstanding service and...
-
Site Reliability Engineer
6 days ago
New York, United States Citadel Securities Americas Services LLC Full time
Site Reliability Engineer (Citadel Securities Americas Services LLC - New York, NY); Multiple positions available: Collaborate with cross-functional teams, including trading, quantitative, and software engineering teams, to support and enhance Citadel's core suite of trading applications with the latest, most cutting-edge technology in order to proactively...
-
Senior Site Reliability Engineer
5 days ago
New York, United States Mondrian Alpha Full time
A leading systematic multi-strat fund is seeking an experienced site reliability engineer to join a team of senior engineers focusing on various platforms throughout the business. SREs here combine software and systems engineering experience to build, maintain and improve the systems that power the company's investment strategies. The right candidate will come...
-
Senior Site Reliability Engineer
1 week ago
New York, United States InterEx Group Full time
ROLE: Senior Site Reliability Engineer
PRIMARY ACCOUNTABILITIES: Improve the reliability of mission-critical solutions, applications, and platforms; software development for enterprises; continuous improvement identification and implementation; manage risks and resolve issues that affect applications; lead efforts to troubleshoot and/or debug issues in...
-
Site Reliability Engineer
14 hours ago
New York, United States Old Mission Capital Full time
Old Mission is a global proprietary trading firm that leverages state-of-the-art technology and research to identify and execute profitable trading strategies across multiple asset classes around the world. Our offices in Chicago, New York, and London are all composed of naturally curious individuals who thrive in a team environment and constantly strive for...
-
Site Reliability Engineer
4 weeks ago
New York, United States Nationstaff Full time
About This Role
We are seeking a talented Site Reliability Engineer with experience in building and maintaining continuous integration, automating programmatic tasks, deploying applications, configuration management, and monitoring and maintaining the uptime of the platform. The Site Reliability Engineer will be an expert in Linux, is passionate about open...