Senior Site Reliability Engineer
3 days ago
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in Washington.
The Senior Site Reliability Engineer will play a critical role in enhancing the reliability, scalability, and performance of large-scale distributed systems. This role blends infrastructure expertise with software development, focusing on automation, observability, and proactive risk mitigation. You will collaborate closely with engineering teams to optimize platforms, improve operational efficiency, and ensure high availability for millions of users. The ideal candidate thrives in high-traffic environments, contributes to open-source projects, and enjoys solving complex technical challenges while mentoring peers and improving team practices. This position offers the opportunity to make a tangible impact on highly visible systems and services.
Accountabilities:
- Collaborate with engineering teams to design, build, and maintain resilient, high-performance systems.
- Enhance infrastructure and platform services to support deployment, observability, and operational excellence.
- Develop automation tools to reduce manual tasks, mitigate risks, and improve engineering efficiency.
- Monitor, troubleshoot, and optimize network, system, and service-level performance.
- Participate in sustainable incident response, conducting blameless postmortems and implementing improvements.
- Contribute upstream to open-source projects and implement best practices for scalability and reliability.
- Share on-call responsibilities to ensure continuous system availability and performance.
Requirements:
- 5+ years of experience in Software Engineering, Site Reliability Engineering, or a development-focused DevOps role.
- Proficiency in one or more programming languages, preferably Go and Python.
- Experience with Kubernetes, cloud systems, and distributed systems development.
- Familiarity with observability and monitoring tools such as Prometheus, Thanos, Grafana, Vector, ClickHouse, OpenTelemetry (OTel), and Loki.
- Strong skills in debugging, optimizing code, and troubleshooting across applications, networking (TCP/IP), and systems.
- Solid working knowledge of Linux and containerization technologies.
- Excellent collaboration, communication, and problem-solving abilities.
Benefits:
- Comprehensive healthcare coverage including medical, dental, and vision.
- 401(k) program with employer matching.
- Home office setup support and remote workspace benefits.
- Personal and professional development funds.
- Flexible vacation policies and global wellness days.
- Paid parental leave and family planning support.
- Paid volunteer time off.
- Equity opportunities in the form of restricted stock units.
Why Apply Through Jobgether?
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best.
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
-
Senior Site Reliability
6 days ago
Washington, Washington, D.C., United States Canonical - Jobs Full time
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers,...
-
Site Reliability Engineer
2 weeks ago
Washington, Washington, D.C., United States Cyrad Solutions Full time $150,000 - $250,000 per year
Washington, D.C., USA
Full-time
Strategic Site Reliability Engineer: Global Network Orchestration Platform
The Opportunity: Design the core reliability platform for the final frontier of space Mesh networking. This is a strategic, high-impact mandate within a high-growth, fast-paced startup, building the next generation of software-defined networks for...
-
Senior Site Reliability Engineer
4 days ago
Washington, Washington, D.C., United States Anduril Industries Full time
If you are an experienced SRE who is passionate about building the platform that delivers capabilities to directly improve the experience of the warfighter, this is the job for you. Site Reliability Engineers (SRE) work with technical leaders and System Deployment Engineers to determine technical direction and deliver with thorough analysis, designs and...
-
Site Reliability Engineer
2 days ago
Washington, Washington, D.C., United States Anduril Industries Full time
Anduril Industries is a defense technology company with a mission to transform U.S. and allied military capabilities with advanced technology. By bringing the expertise, technology, and business model of the 21st century's most innovative companies to the defense industry, Anduril is changing how military systems are designed, built and sold. Anduril's...
-
Senior Reliability Engineer
1 week ago
Washington, Washington, D.C., United States Washington Metropolitan Area Transit Authority (WMATA) Full time
Job Description
THIS JOB OPENING IS BEING USED TO FILL CURRENT AND FUTURE VACANT POSITIONS.
Minimum Qualifications
Education
Bachelor's Degree in reliability, electrical, mechanical, or related engineering field.
In lieu of a Bachelor's degree, a High School Diploma or GED and four (4) years of experience in an electrical or mechanical maintenance role in the...
-
Lead Site Reliability Engineer
3 days ago
Washington, Washington, D.C., United States e-ca75-4acb-94ee-4750bc2fa55b Full time
About Bridge Defense. Bridge Defense is redefining how modern defense technology is delivered. Based in Washington, D.C., we are built for the dynamic mission environment facing the Department of Defense, the Intelligence Community, and federal law enforcement agencies. We provide full-spectrum national security solutions that combine secure infrastructure,...
-
Site Reliability Engineer, Platform Discovery
2 weeks ago
Washington, Washington, D.C., United States Anduril Industries Full time
ABOUT THE TEAM
The Platform Discovery team at Anduril is at the forefront of incubating and maturing high-potential, software-defined, AI-native offerings that meet the toughest, newest challenges across hardware, software, space, and cyber domains. We're the architects of mission autonomy and mesh networking, delivering scalable hardware solutions that meet...
-
Site Reliability Engineer
6 days ago
Washington, Washington, D.C., United States MetroStar Full time
As Site Reliability Engineer, you'll have strong GitLab expertise to support and enhance enterprise platforms. This role will focus primarily on GitLab while also maintaining Jira and Confluence. The ideal candidate will proactively drive improvements, implement automation, and advocate for broader adoption of GitLab across the organization. We know that you...
-
Site Reliability Engineer
2 weeks ago
Washington, Washington, D.C., United States Metrostar Systems Full time $120,000 - $150,000 per year
As Site Reliability Engineer, you'll have strong GitLab expertise to support and enhance enterprise platforms. This role will focus primarily on GitLab while also maintaining Jira and Confluence. The ideal candidate will proactively drive improvements, implement automation, and advocate for broader adoption of GitLab across the organization. We know that you...
-
Site Reliability Operations Analyst
2 weeks ago
Washington, Washington, D.C., United States Palantir Technologies Full time $93,000 - $160,000 per year
A World-Changing Company
Palantir builds the world's leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.
The Role
As a Site Reliability Operations Analyst you are...