Site Reliability Manager

1 week ago


Washington, United States Karsun Solutions Full time

We are seeking a highly skilled and experienced Site Reliability Manager to join our team. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our systems and services, and will lead a team of engineers in designing, implementing, and maintaining robust infrastructure and automation solutions. The candidate must reside in the Washington, DC area and be available to work on site in downtown Washington, DC as required.

Responsibilities:
  • Lead a service delivery team of 8-20 people (service support specialists, DevSecOps engineers, and site reliability engineers).
  • Define and implement best practices for infrastructure as code, deployment automation, and monitoring.
  • Collaborate with cross-functional teams to design scalable and fault-tolerant architectures.
  • Develop and maintain service level objectives (SLOs) and key performance indicators (KPIs) to measure system reliability and performance.
  • Conduct post-mortems and root cause analyses for incidents, and implement preventive measures to mitigate future incidents.
  • Drive continuous improvement initiatives to enhance the reliability, scalability, and efficiency of our systems and services.
  • Mentor and coach team members to foster a culture of learning and innovation.

Required:
  • Bachelor's degree in Computer Science, Engineering, or a related field; Master's degree preferred.
  • 10+ years of experience in a similar role managing a team of site reliability engineers and delivering on the AWS cloud platform.
  • Proven track record of managing high-performance teams.
  • 5+ years of experience supporting operations and maintenance for fault-tolerant, self-healing, scalable, and highly available cloud-native applications in production.
  • Deep understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes).
  • Strong knowledge of infrastructure as code tools (e.g., Terraform, Ansible, ArgoCD) and CI/CD pipelines.
  • Experience with monitoring, logging, and observability tools such as Datadog, AWS CloudWatch, ELK, Prometheus, and Splunk.
  • Excellent communication and interpersonal skills, with the ability to collaborate effectively with cross-functional teams.
  • Strong problem-solving and analytical skills, with keen attention to detail.
  • Certifications such as AWS Certified DevOps Engineer or Google Professional Cloud DevOps Engineer are a plus.
  • Ability to obtain and maintain a Public Trust clearance.

Preferred:
  • Understanding of modern architectures (e.g., microservices, event-driven architecture (EDA)), with a cautious approach to overcomplexity and overengineering.
  • Experience with monitoring and metrics platforms (e.g., New Relic, Prometheus, InfluxDB, Grafana, Splunk).
  • Experience designing and operating distributed systems and cloud infrastructure at scale.

In accordance with pay transparency guidelines, the proposed salary range for this position is $140,000.00 to $180,000.00. Final salary will be determined based on factors such as relevant skills, experience, and certifications.

Find Your Next at Karsun Solutions and transform your career with the company transforming possible for the US Government. At Karsun, collaboration drives our community. We're committed to building an environment where team members from diverse backgrounds can innovate, learn, and grow with us. Here at Karsun, the only limit to your potential is the limit of your curiosity. And because we know well-being empowers us to thrive, we offer robust and comprehensive benefits, including:
  • Health, Life & Disability Insurance: medical, dental, life, and disability coverage is paid for by Karsun for full-time employees.
  • Paid Parental Leave
  • 401k Retirement Plan with pre-tax and post-tax Roth contribution offerings and immediate vesting, with a per-pay-period match
  • Generous time-off programs, including 11 paid holidays per year
  • Supplemental plans such as Vision, Pet Insurance, and a 529 Savings Plan
  • Employee Assistance Program with behavioral health, physical wellness, and financial advice
  • Employee Discounts & Perks
  • In-house Technical/Skills Training

Join Team Karsun and Find Your Next.

Karsun Solutions is an Equal Employment Opportunity (EEO) employer. It is the policy of the Company to provide equal employment opportunities to all qualified applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, protected veteran or disabled status, or genetic information. Karsun does not accept unsolicited resumes through or from search firms or staffing agencies. All unsolicited resumes will be considered the property of Karsun, and Karsun will not be obligated to pay a placement fee.



  • Washington, United States Karsun Solutions Full time

    About the Role: We are seeking a highly skilled and experienced Site Reliability Engineering Manager to join our team at Karsun Solutions. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our systems and services. Key Responsibilities: Lead a team of engineers in designing, implementing, and maintaining robust...


  • Washington, United States Varada Consulting Full time

    Site Reliability Engineer Job Location: Washington, DC; Hybrid Overview: Varada Consulting, LLC is seeking a full-time highly skilled and experienced Site Reliability Engineer (SRE) to join our team. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of our systems and applications through automation, monitoring, and...


  • Washington, United States Alldus Full time

    Our client is a Series A startup within the Generative AI space and they are hiring a Site Reliability Engineer to join the team. Backed by one of the leading venture capital firms in the industry, this is an exciting opportunity to join a SaaS company that is revolutionizing their industry. Responsibilities: As the Site Reliability Engineer, you will...


  • Washington, United States Cinder LLC Full time

    [Full Time] Site Reliability Engineer at Cinder (United States) Site Reliability Engineer Cinder United States Date Posted: 31 Oct, 2022 Work Location: Washington, DC, United States Salary Offered: $110 — $220 yearly Job Type: Full Time Experience Required: 1+ years Remote Work: Yes Stock Options: No Vacancies: 1 available About Cinder Cinder provides a...


  • Washington, United States StaffWorthy Inc. Full time

    We are a leading technology services provider with a rich history of assembling exceptional teams dedicated to delivering outstanding solutions. For over two decades, we have been committed to excellence, with a mission centered around our passion for our people and the value they deliver to our customers. Responsibilities Monitor platform and containerized...


  • Washington, United States TEKsystems Full time

    Job Summary: One of the largest financial institutions in Japan is seeking a highly skilled DevOps/Site Reliability Engineer to join a large-scale migration project. As a key member of the team, you will be responsible for designing and implementing the pipeline architecture for the migrations. This is an exciting opportunity to join a leading organization...


  • Washington, United States GitLab Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Gitaly team at GitLab. As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and reliability of our Gitaly service, which is responsible for storing and managing Git data for our users. Key Responsibilities: Work with peer SREs to...


  • Washington, United States Mount Indie Full time

    Job Description: As a Site Reliability Engineer (SRE), you'll continuously drive improvements in observability, performance, and reliability, with the goal to make an impact across the federal government. This role requires a current TS/SCI that has been obtained within the last 51 months and the ability to pass additional background...


  • Washington, United States Kansas Action for Children, Inc Full time

    at T-Mobile USA, Inc. in Overland Park, Kansas, United States Job Description: Be unstoppable with us! T-Mobile is synonymous with innovation, and you could be part of the team that disrupted an entire industry! We reinvented customer service, brought real 5G to the nation, and now we're shaping the future of technology in wireless and beyond. Our work is as...


  • Washington, United States System One Full time

    Site Reliability Engineer Work Location: 3 days onsite DC - JBAB, 2 days remote Clearance: Active TS/SCI with ability to clear PSD As a Site Reliability Engineer (SRE), you’ll continuously drive improvements in observability, performance, and reliability, with the goal to make an impact across the federal government. What You’ll Do Monitor platform and...


  • Washington, United States CruitZi, INC Full time

    Job Description: Our Client is currently hiring a full-time Sr. Site Reliability Engineer (SRE), who will play a vital role in continuously driving improvements in observability, performance, and reliability, aiming to make a substantial impact across the federal government. This role is Hybrid, requiring travel to downtown Washington, DC, at...


  • Washington, United States Kansas Action for Children, Inc Full time

    About the Role: We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Kansas Action for Children, Inc. in Overland Park, Kansas, United States. This is an exciting opportunity for a technical professional who is passionate about innovation and wants to be part of a team that is reshaping the future of technology in the wireless...


  • Washington, United States Veterans Enterprise Technology Solutions Full time

    Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Veterans Enterprise Technology Solutions. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, performance, and scalability of our systems and applications. Key Responsibilities: Monitor and analyze system performance to identify...


  • Washington, United States Veterans Enterprise Technology Solutions Full time

    Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Veterans Enterprise Technology Solutions. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, performance, and scalability of our infrastructure. Key Responsibilities: Monitor and Maintain Infrastructure: Continuously monitor our...


  • Washington, United States Varada Consulting Full time

    Site Reliability Engineer Job Location: Washington, DC; Hybrid This position is eligible for a $5,000 Sign-on Bonus and Relocation Assistance, if applicable. Overview: Varada Consulting, LLC is seeking a full-time highly skilled and experienced Site Reliability Engineer (SRE) to join our team. As an SRE, you will be responsible for ensuring the reliability,...


  • Washington, United States MetroStar Systems Full time

    $25k Sign-On Bonus for this role. As a Site Reliability Engineer (SRE), you'll continuously drive improvements in observability, performance, and reliability, with the goal to make an impact across the highest levels of government. If you think you can see yourself delivering our mission and pursuing our goals with us, then check out the job...


  • Washington, United States Palantir Technologies Full time

    About the Role: Palantir Technologies is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our systems and applications. Key Responsibilities: Collaborate with cross-functional teams to design, implement, and maintain...


  • Washington, United States Veterans Enterprise Technology Solutions Full time

    Overview: Staffing Pros, a division of VETS Inc., is recruiting for a full-time Site Reliability Engineer. This position will work a rotating hybrid schedule: 3 days onsite at JBAB and 2 days remote. An active Top Secret SCI clearance is required for this role. If you have additional questions not answered by the information contained within this posting,...