Reliability Engineering Leader

4 weeks ago


Washington, Washington, D.C., United States Capital One Full time

At Capital One, we're seeking a skilled Reliability Engineer to join our team. As a Reliability Engineer, you'll play a critical role in designing, developing, and implementing technical solutions to ensure the reliability and scalability of our systems.

Key Responsibilities:

  • Collaborate with Agile teams to design, develop, test, implement, and support technical solutions in full-stack development tools and technologies
  • Communicate Service Level Objective concepts to product partners and drive agreement on objectives
  • Influence the strategic direction of the team, identifying and prioritizing opportunities to improve reliability
  • Drive implementation of processes or solutions that improve reliability across multiple platforms
  • Identify gaps in automation and develop strategic plans to drive solutions that reduce toil for the platform teams
  • Work with other experts to arrive at optimal design and deployment configurations
  • Establish standards that improve deployment and system reliability for integration pipelines, and recommend approaches for chaos testing a particular system
  • Identify and create proactive, automated approaches to system reliability and alerting, and identify key performance indicators for a system, including adding, tuning, and maintaining alert configurations
  • Understand business requirements for system reliability and translate them into implementations such as scaling, failover, timeouts, and health checks, and work with development teams to test and improve system performance and reliability

Requirements:

  • Bachelor's Degree
  • At least 4 years of professional software engineering experience (internship experience does not apply)
  • At least 1 year of experience with cloud computing (AWS, Microsoft Azure, Google Cloud)

Preferred Qualifications:

  • Master's Degree
  • 7+ years of experience in at least one of the following: Java, Scala, Python, Go, or Node.js
  • 2+ years of experience with AWS, GCP, Azure, or another cloud service
  • 4+ years of experience in open source frameworks
  • 1+ years of people management experience
  • 2+ years of experience in Agile practices
  • 2+ years of experience with blameless incident reviews and post-incident responses
  • 2+ years of experience with secure coding practices
  • 2+ years of experience in creating release documentation
  • 2+ years of experience in logging technologies (log4j configuration, Splunk)
  • 2+ years of experience in resilient system architecture patterns (Microservices Architecture, Layered Architecture, Event-Driven Architecture)

Capital One offers a comprehensive, competitive, and inclusive set of health, financial, and other benefits that support your total well-being. For more information, please visit the Capital One Careers website.



  • Washington, Washington, D.C., United States Veterans Enterprise Technology Solutions Full time

    Job Title: Site Reliability Engineer. Overview: We are seeking a highly skilled Site Reliability Engineer to join our team at Veterans Enterprise Technology Solutions. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Responsibilities: • Monitor and analyze...


  • Washington, Washington, D.C., United States Ankura Full time

    Job Summary: Ankura is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a pivotal role in ensuring the reliability and scalability of our cloud-based infrastructure. Key Responsibilities: Design, deploy, and manage cloud infrastructure solutions using leading cloud platforms such as Azure, AWS,...


  • Washington, Washington, D.C., United States MetroStar Corporation Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at MetroStar Corporation. As a key member of our team, you will be responsible for driving improvements in observability, performance, and reliability of our systems. Key Responsibilities: Monitor and analyze platform and containerized applications to...


  • Washington, Washington, D.C., United States Specialized Group Full time

    Specialized Group is a leading quantitative hedge fund and financial technology firm that leverages advanced data science and machine learning to drive investment strategies and innovative solutions. Our company culture is built on cutting-edge research and collaboration, attracting top talent passionate about solving complex problems with data-driven...


  • Washington, Washington, D.C., United States Verint Systems Full time

    About the Role: Verint Systems is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and performance of our systems and services. Key Responsibilities: Design and implement scalable and reliable systems and services. Collaborate with cross-functional...


  • Washington, Washington, D.C., United States Palantir Technologies Full time

    About the Role: We're seeking a skilled Site Reliability Engineer to join our team at Palantir Technologies. As a Site Reliability Engineer, you will play a critical role in ensuring the availability, scalability, and reliability of our cloud and on-premises infrastructure. Key Responsibilities: Maintain the availability of cloud and physical Linux servers that...


  • Washington, Washington, D.C., United States VLink Inc Full time

    Job Title: Site Reliability Engineer - Cloud Expert. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at VLink Inc. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Design and implement scalable and reliable...


  • Washington, Washington, D.C., United States Mount Indie Full time

    Job Overview: Mt. Indie is seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our team, you will play a critical role in driving improvements in observability, performance, and reliability. Key Responsibilities: Monitor and analyze platform and containerized applications to identify performance and availability risks and...


  • Washington, Washington, D.C., United States Harbor Compliance Full time

    About Harbor Compliance: Harbor Compliance is a leading provider of regulatory compliance solutions for businesses and nonprofits. We are committed to simplifying the regulatory challenges of our clients through innovative technology solutions. Job Overview: The Site Reliability Engineer will play a critical role in ensuring the availability, scalability, and...


  • Washington, Washington, D.C., United States Evolent Health Full time

    About the Role: Evolent Health is seeking a highly skilled Site Reliability Engineer to join our Platform Engineering organization. As a member of this team, you will play a critical role in managing our large application suite and cloud infrastructure. Key Responsibilities: Implement and manage observability solutions using OpenTelemetry to monitor and trace...


  • Washington, Washington, D.C., United States WSP Full time

    Electrical Engineering Leader. WSP is seeking a highly skilled Electrical Engineering Leader to join our team in Washington DC. As a key member of our Transportation Team, you will be responsible for leading electrical engineering and design work for a range of facilities, including transit, industrial, and commercial structures. Key Responsibilities: Conduct...


  • Washington, Washington, D.C., United States Karsun Solutions Full time

    Site Reliability Manager. Karsun Solutions is seeking a highly skilled Site Reliability Manager to join our team. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our systems and services. The Site Reliability Manager will lead a team of engineers in designing, implementing, and maintaining robust...


  • Washington, Washington, D.C., United States Erias Ventures Full time

    Job Summary: Erias Ventures is seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for ensuring the stability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud infrastructure...


  • Washington, Washington, D.C., United States Mount Indie Full time

    Job Summary: As a Site Reliability Engineer at Mount Indie, you will play a critical role in ensuring the reliability, performance, and scalability of our cloud-based infrastructure. This is a unique opportunity to work with a talented team of engineers and contribute to the development of cutting-edge technology solutions. Key Responsibilities: * Monitor and...


  • Washington, Washington, D.C., United States Cape Full time

    About Cape: Cape is a pioneering company in the field of privacy-centric telecommunications. Founded in 2022 by a team of experts from Palantir and Anduril, our mission is to revolutionize the way we think about mobile device security and data privacy. We believe that personal privacy and national security interests are not mutually exclusive, and that strong...


  • Washington, Washington, D.C., United States Zscaler Full time

    About Zscaler: Zscaler is a leading cloud security company that protects thousands of enterprise customers worldwide, including 40% of Fortune 500 companies. Our mission is to make the cloud a safe place to do business and provide a seamless experience for enterprise users. Job Summary: We are seeking an experienced Staff Site Reliability Engineer (Federal) to...


  • Washington, Washington, D.C., United States ST2 ManTech Advanced Systems Intl Full time

    Secure Our Nation, Ignite Your Future with ST2 ManTech Advanced Systems Intl. Overview: ST2 ManTech Advanced Systems Intl is a dynamic and growing program seeking a motivated, career-oriented Linux Systems Engineer - Security and Reliability to join our team in Ft. Meade, MD or San Antonio, TX. Job Description: This role involves providing support for...


  • Washington, Washington, D.C., United States Oracle Full time

    Overview: Oracle is a global technology company that provides enterprise cloud computing, software, and hardware solutions. As a leading provider of cloud services, we empower businesses to innovate and grow in a rapidly changing world. About the Role: We are seeking a highly experienced Cloud Infrastructure Architect Leader to join our development team. As a...


  • Washington, Washington, D.C., United States Orsted Full time

    Reliability and Compliance Expertise: At Ørsted, we're committed to delivering renewable energy reliably and in compliance with NERC regulations. As our Senior Lead Reliability Specialist, you'll be the lead technical authority for the Compliance and Reliability Americas team. The team is comprised of experts in operational compliance, reliability, critical...


  • Washington, Washington, D.C., United States Palo Alto Networks Full time

    Job Title: Channel Systems Engineer. At Palo Alto Networks, we're committed to protecting our digital way of life. As a Channel Systems Engineer, you'll play a critical role in our mission by providing technical expertise and guidance to partners on their journey to becoming key Palo Alto Networks partners. Job Summary: As a Channel Systems Engineer, you'll...