Senior Site Reliability Engineer

3 weeks ago


Washington, United States Sparibis Full time
Location: 100% remote

Years of Experience: 10+ years

Education: Bachelor's degree

Work Authorization: United States Citizenship is required as part of the eligibility criteria to be able to obtain a security clearance.

Clearance: Applicants must be able to obtain and maintain a Public Trust security clearance.

Key Skills:
  • Must have experience serving as an SRE
  • Prior leadership experience, including leading a team
  • Deep understanding of SRE principles for highly scalable and reliable systems.
  • Configuration Management and Infrastructure as Code expertise
Responsibilities
  • Responsible for incident response, monitoring, alerting, triaging and closing of real problems
  • Ensure platform stability and availability
  • Responsible for metrics reporting and tracking, evaluating proper function, and supporting teams to enhance performance
  • Design and implement end-to-end continuous delivery pipelines.
  • Leverage extensive AWS cloud experience in a production environment (e.g., network, security, deployment, automation, serverless technologies).
  • Utilize a deep understanding of SRE principles for highly scalable and reliable systems.
  • Leverage extensive experience with Configuration Management and Infrastructure as Code.
  • Works with application teams to document application internal/external interface requirements for Development, Testing, Staging and Production environments
  • Works with application teams to ensure compliance with High Availability and Disaster Recovery concepts of operations.
  • Build service level requirements for SLAs
  • Implements middleware application specific requirements as needed
  • Implements migration efforts with application teams, including data migration
  • Serve as a thought leader for agile development teams.
  • Establish clarity of direction and a shared vision of success that is championed by team members, stakeholders, and product owners.
  • Build relationships, and work in collaboration with team members, stakeholders, product owners, and technical team leads.
  • Help enhance processes, communication, and delivery through new norms that improve how work is done - from discovery to delivery.
  • Provides technical guidance to application teams to take advantage of cloud technologies, and implement cloud infrastructure, as needed.
Qualifications
  • 10+ years of software engineering and DevOps experience
  • Bachelor's degree or higher required
  • Must be able to obtain and maintain a Public Trust security clearance
  • Must have experience supporting highly scalable and reliable systems by implementing and maintaining processes and tools
  • Incident response, monitoring performance and releases, alerting, and triaging expertise
  • ServiceNow, AWS Insight, Splunk, VictorOps, CloudWatch, New Relic, and Confluence expertise preferred
  • Experience in designing and implementing end-to-end continuous delivery pipelines.
  • Deep AWS cloud experience in a production environment (e.g., network, security, deployment, automation, serverless technologies).
  • Experience with and understanding of SRE principles for highly scalable and reliable systems.
  • Strong experience with Configuration Management and Infrastructure as Code.
  • Experience designing and implementing end-to-end CI/CD pipelines
  • AWS Cloud experience in a production environment (e.g., network, security, deployment, automation, serverless technologies)
  • Experience designing and building web application environments on AWS, including services such as EC2, S3, Lambda, ELB, ECS, etc.
  • Experience deploying cloud resources using IaC tools such as Terraform.
  • Experience with monitoring and logging tools such as CloudWatch, AppDynamics, and Splunk, including creating CloudWatch rules to capture application alerts and send notifications
  • Previous experience migrating application teams from on-prem to cloud infrastructure (AWS, Azure) preferred.
  • Experience with CI/CD frameworks (e.g., Jenkins, Docker, Ansible, Chef, Puppet, Git)
  • Experience with at least one automation or scripting language (e.g., Bash, Python, Shell, Perl)
  • Experience designing and building migrations of on-premises CIFS and NFS file shares to AWS storage services (S3, EFS, or FSx) using AWS DataSync and VPC endpoints.
  • Experience creating build plans for AWS deployments by listing compute resources, security groups, load balancers, target groups, NACLs, and all other components for various environments (Dev, TQA, Prod, etc.)
  • Experience maintaining and administering configuration management systems such as Enterprise GitHub.
  • Experience maintaining and administering software build systems such as Jenkins.
  • Experience maintaining and administering artifact repository systems such as Artifactory.
  • Ability to automate workflows through scripting or other technologies such as Ansible or Puppet.
  • Expertise in Agile and DevSecOps approaches


About Sparibis

Sparibis LLC is a professional solutions firm that clients rely on to access the best talent to drive their business success.

Sparibis is an equal opportunity employer that values diversity at all levels. All individuals, regardless of personal characteristics, are encouraged to apply.
