Director, Site Reliability Engineering
4 weeks ago
The Director, Site Reliability Engineering (SRE) is a pivotal role in the technology infrastructure team, responsible for ensuring the highest levels of reliability, scalability, and performance. This leadership role will set the vision and strategic direction for a skilled SRE team, aligning with the strategic objectives of the IT Infrastructure team, and fostering a culture of continuous improvement and operational excellence. This role requires a deep understanding of cloud-based infrastructure services and technologies, distributed systems, product delivery platforms, DevOps, automation, and monitoring, along with a proactive approach to preventing and mitigating potential issues. The incumbent must also foster a culture of innovation and collaboration within a team of highly skilled engineers to meet the organization's evolving needs and deliver a superior digital experience to our product teams and customers.
This is a hybrid role, onsite twice a week at our Greensboro and Raleigh offices.
Leadership & Strategy
- Develop and implement a comprehensive SRE strategy that aligns with IT Infrastructure team, IT, and company objectives.
- Lead the SRE team, setting clear goals and expectations, and providing mentorship and career development opportunities.
- Collaborate with cross-functional teams to enhance system reliability and efficiency.
- Oversee systems related to the availability of our infrastructure ecosystem, including cloud services and internal tooling.
- Ensure the team has a deep understanding of, and expertise in, the system architecture, covering not only Kubernetes and OpenShift but the entire product delivery stack.
- Manage the SRE team, ensuring effective resource allocation and prioritization of POCs and initiatives.
- Drive the adoption of best practices in incident management and post-mortem analysis.
- Lead the response to high-impact infrastructure incidents, ensuring swift resolution and minimal disruption.
- Implement proactive monitoring and measures to prevent future incidents and improve system resilience.
- Articulate the value and accomplishments of the SRE team to stakeholders at all levels.
- Foster a transparent communication environment within the team and across the organization.
- Work closely with shared infrastructure services teams (including other SRE teams) within the corporation to establish a productive and transparent partnership and help establish consistent SRE and Infrastructure practices across the company.
Qualifications
- Proven expertise in large-scale complex system engineering and administration including cloud-based infrastructure in Microsoft Azure.
- Strong leadership skills with the ability to inspire and motivate a high-performing team.
- Excellent problem-solving abilities and a data-driven approach to decision-making.
- Technical leadership skills, including collaboration, technical problem-solving, and leading complex, mission-critical initiatives.
- In-depth understanding of Kubernetes concepts, components, and APIs, with hands-on experience orchestrating containerized applications using OpenShift (on-premises or in the cloud).
- Experience with OpenShift's value-added features, such as advanced CI/CD pipelines for containerized product delivery.
- Experience with GitHub, GitHub Actions, and/or Argo CD or similar technologies.
- Strong background working within an agile service delivery methodology focused on iterative service improvement.
- A bachelor's degree in Computer Science, Engineering, or related field; a master's degree is preferred.
- At least 10 years of experience in IT Infrastructure, system administration, or reliability engineering with a minimum of 5 years in a leadership role.
- A track record of managing complex infrastructure initiatives and leading incident response efforts.
Do you like solving complex business problems and working with talented colleagues, and do you have an innovative mindset? Arch may be a great fit for you. If this job isn't the right fit but you're interested in working for Arch, create a job alert: simply create an account and opt in to receive emails when we have job openings that meet your criteria. Join our talent community to share your preferences directly with Arch's Talent Acquisition team.