Reliability Engineering Leader
4 weeks ago
At Capital One, we're seeking a skilled Reliability Engineer to join our team. As a Reliability Engineer, you'll play a critical role in designing, developing, and implementing technical solutions to ensure the reliability and scalability of our systems.
Key Responsibilities:
- Collaborate with Agile teams to design, develop, test, implement, and support technical solutions using full-stack development tools and technologies
- Communicate Service Level Objective concepts to product partners and drive agreement on objectives
- Influence the strategic direction of the team, identifying and prioritizing opportunities to improve reliability
- Drive implementation of processes or solutions that improve reliability across multiple platforms
- Identify gaps in automation and develop strategic plans to drive solutions that reduce toil for the platform teams
- Work with other experts to arrive at optimal design and deployment configurations
- Establish standards that improve deployment and system reliability for integration pipelines, and recommend approaches for chaos testing a particular system
- Identify and create proactive, automated approaches for system reliability and alerting, and identify key performance indicators for a system, including adding, tuning, and maintaining alert configurations
- Understand business requirements for system reliability and translate them into implementations such as scaling, failover, timeouts, and health checks; work with development teams to test and improve system performance and reliability
Requirements:
- Bachelor's Degree
- At least 4 years of professional software engineering experience (Internship experience does not apply)
- At least 1 year of experience with cloud computing (AWS, Microsoft Azure, Google Cloud)
Preferred Qualifications:
- Master's Degree
- 7+ years of experience in at least one of the following: Java, Scala, Python, Go, or Node.js
- 2+ years of experience with AWS, GCP, Azure, or another cloud service
- 4+ years of experience in open source frameworks
- 1+ years of people management experience
- 2+ years of experience in Agile practices
- 2+ years of experience with blameless incident reviews and post-incident responses
- 2+ years of experience with secure coding practices
- 2+ years of experience in creating release documentation
- 2+ years of experience in logging technologies (log4j configuration, Splunk)
- 2+ years of experience in resilient system architecture patterns (Microservices Architecture, Layered Architecture, Event-Driven Architecture)
Capital One offers a comprehensive, competitive, and inclusive set of health, financial, and other benefits that support your total well-being. For more information, please visit the Capital One Careers website.
-
Site Reliability Engineer
4 weeks ago
Washington, Washington, D.C., United States Veterans Enterprise Technology Solutions Full time
Job Title: Site Reliability Engineer
Overview: We are seeking a highly skilled Site Reliability Engineer to join our team at Veterans Enterprise Technology Solutions. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure.
Responsibilities:
- Monitor and analyze...
-
Site Reliability Engineer
4 weeks ago
Washington, Washington, D.C., United States Ankura Full time
Job Summary: Ankura is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a pivotal role in ensuring the reliability and scalability of our cloud-based infrastructure.
Key Responsibilities: Design, deploy, and manage cloud infrastructure solutions using leading cloud platforms such as Azure, AWS,...
-
Site Reliability Engineer
4 weeks ago
Washington, Washington, D.C., United States MetroStar Corporation Full time
Job Title: Site Reliability Engineer
We are seeking a highly skilled Site Reliability Engineer to join our team at MetroStar Corporation. As a key member of our team, you will be responsible for driving improvements in observability, performance, and reliability of our systems.
Key Responsibilities: Monitor and analyze platform and containerized applications to...
-
Reliability Engineering Manager
4 weeks ago
Washington, Washington, D.C., United States Specialized Group Full time
Specialized Group is a leading quantitative hedge fund and financial technology firm that leverages advanced data science and machine learning to drive investment strategies and innovative solutions.
Our company culture is built on cutting-edge research and collaboration, attracting top talent passionate about solving complex problems with data-driven...
-
Senior Site Reliability Engineer
3 weeks ago
Washington, Washington, D.C., United States Verint Systems Full time
About the Role: Verint Systems is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and performance of our systems and services.
Key Responsibilities:
- Design and implement scalable and reliable systems and services
- Collaborate with cross-functional...
-
Site Reliability Engineer
4 weeks ago
Washington, Washington, D.C., United States Palantir Technologies Full time
About the Role
We're seeking a skilled Site Reliability Engineer to join our team at Palantir Technologies. As a Site Reliability Engineer, you will play a critical role in ensuring the availability, scalability, and reliability of our cloud and on-premises infrastructure.
Key Responsibilities
Maintain the availability of cloud and physical Linux servers that...
-
Site Reliability Engineer
4 weeks ago
Washington, Washington, D.C., United States VLink Inc Full time
Job Title: Site Reliability Engineer - Cloud Expert
Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at VLink Inc. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based systems.
Key Responsibilities: Design and implement scalable and reliable...
-
Site Reliability Engineer
4 weeks ago
Washington, Washington, D.C., United States Mount Indie Full time
Job Overview
Mt. Indie is seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our team, you will play a critical role in driving improvements in observability, performance, and reliability.
Key Responsibilities: Monitor and analyze platform and containerized applications to identify performance and availability risks and...
-
Site Reliability Engineer
3 weeks ago
Washington, Washington, D.C., United States Harbor Compliance Full time
About Harbor Compliance
Harbor Compliance is a leading provider of regulatory compliance solutions for businesses and nonprofits. We are committed to simplifying the regulatory challenges of our clients through innovative technology solutions.
Job Overview
The Site Reliability Engineer will play a critical role in ensuring the availability, scalability, and...
-
Site Reliability Engineer
3 weeks ago
Washington, Washington, D.C., United States Evolent Health Full time
About the Role: Evolent Health is seeking a highly skilled Site Reliability Engineer to join our Platform Engineering organization. As a member of this team, you will play a critical role in managing our large application suite and cloud infrastructure.
Key Responsibilities: Implement and manage observability solutions using OpenTelemetry to monitor and trace...
-
Electrical Engineering Leader
3 weeks ago
Washington, Washington, D.C., United States WSP Full time
Electrical Engineering Leader
WSP is seeking a highly skilled Electrical Engineering Leader to join our team in Washington DC. As a key member of our Transportation Team, you will be responsible for leading electrical engineering and design work for a range of facilities, including transit, industrial, and commercial structures.
Key Responsibilities: Conduct...
-
Site Reliability Manager
3 weeks ago
Washington, Washington, D.C., United States Karsun Solutions Full time
Site Reliability Manager
Karsun Solutions is seeking a highly skilled Site Reliability Manager to join our team. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our systems and services.
The Site Reliability Manager will lead a team of engineers in designing, implementing, and maintaining robust...
-
Site Reliability Engineer
3 weeks ago
Washington, Washington, D.C., United States Erias Ventures Full time
Job Summary
Erias Ventures is seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for ensuring the stability, scalability, and performance of our cloud-based infrastructure.
Key Responsibilities
Design, implement, and maintain scalable and highly available cloud infrastructure...
-
Site Reliability Engineer
4 weeks ago
Washington, Washington, D.C., United States Mount Indie Full time
Job Summary
As a Site Reliability Engineer at Mount Indie, you will play a critical role in ensuring the reliability, performance, and scalability of our cloud-based infrastructure. This is a unique opportunity to work with a talented team of engineers and contribute to the development of cutting-edge technology solutions.
Key Responsibilities
- Monitor and...
-
Site Reliability Engineer
3 weeks ago
Washington, Washington, D.C., United States Cape Full time
About Cape
Cape is a pioneering company in the field of privacy-centric telecommunications. Founded in 2022 by a team of experts from Palantir and Anduril, our mission is to revolutionize the way we think about mobile device security and data privacy.
We believe that personal privacy and national security interests are not mutually exclusive, and that strong...
-
Staff Site Reliability Engineer
4 weeks ago
Washington, Washington, D.C., United States Zscaler Full time
About Zscaler
Zscaler is a leading cloud security company that protects thousands of enterprise customers worldwide, including 40% of Fortune 500 companies. Our mission is to make the cloud a safe place to do business and provide a seamless experience for enterprise users.
Job Summary
We are seeking an experienced Staff Site Reliability Engineer (Federal) to...
-
Linux Systems Engineer
3 weeks ago
Washington, Washington, D.C., United States ST2 ManTech Advanced Systems Intl Full time
Secure Our Nation, Ignite Your Future with ST2 ManTech Advanced Systems Intl
Overview
ST2 ManTech Advanced Systems Intl is a dynamic and growing program seeking a motivated, career-oriented Linux Systems Engineer - Security and Reliability to join our team in Ft. Meade, MD or San Antonio, TX.
Job Description
This role involves providing support for...
-
Cloud Infrastructure Architect Leader
2 weeks ago
Washington, Washington, D.C., United States Oracle Full time
Overview
Oracle is a global technology company that provides enterprise cloud computing, software, and hardware solutions. As a leading provider of cloud services, we empower businesses to innovate and grow in a rapidly changing world.
About the Role
We are seeking a highly experienced Cloud Infrastructure Architect Leader to join our development team. As a...
-
Senior Lead Reliability Expert
3 weeks ago
Washington, Washington, D.C., United States Orsted Full time
Reliability and Compliance Expertise
At Ørsted, we're committed to delivering renewable energy reliably and in compliance with NERC regulations. As our Senior Lead Reliability Specialist, you'll be the lead technical authority for the Compliance and Reliability Americas team.
The team is comprised of experts in operational compliance, reliability, critical...
-
Channel Systems Engineer
4 weeks ago
Washington, Washington, D.C., United States Palo Alto Networks Full time
Job Title: Channel Systems Engineer
At Palo Alto Networks, we're committed to protecting our digital way of life. As a Channel Systems Engineer, you'll play a critical role in our mission by providing technical expertise and guidance to partners on their journey to becoming key Palo Alto Networks partners.
Job Summary: As a Channel Systems Engineer, you'll...