Site Reliability Engineer
2 weeks ago
Twenty is seeking a Site Reliability Engineer for an on‑site position at Fort Meade, MD to ensure the reliability, performance, and availability of our mission‑critical cyber technologies that protect democracies worldwide. We're looking for someone with 5+ years of experience in site reliability engineering, DevOps, and cloud operations, with deep expertise in AWS, Docker containerization, and secure enclave environments.

In this role, you'll be the guardian of our AI‑powered graph database applications running in closed AWS environments, ensuring operational readiness for systems that process real‑time cyber operation data at machine speed. You'll monitor, troubleshoot, and optimize our containerized microservices architecture, implement robust monitoring and alerting systems, and serve as the critical link between our Arlington engineering teams and the operational requirements at Fort Meade. You'll join a world‑class product and engineering team that delivers mission‑critical solutions for U.S. national security, working in highly secure environments to maintain systems that operate at the speed of cyber warfare. If you're passionate about ensuring system reliability in high‑stakes environments while making a direct impact on national security, we want to talk to you.

About the Company

At Twenty, we're taking on one of the most critical challenges of our time: defending democracies in the digital age. We develop revolutionary technologies that operate at the intersection of cyber and electromagnetic domains, where the speed of operations exceeds human sensing and complexity transcends conventional boundaries. Our team doesn't just solve problems – we deliver game‑changing outcomes that directly impact national security. We're pragmatic optimists who understand that while our mission of protecting America and its allies is challenging, success is possible.
System Reliability & Operations
  Ensure availability and performance of AI‑powered cyber applications running in secure AWS enclaves at Fort Meade
  Monitor and maintain Docker containerized microservices architecture across development, staging, and production environments
  Implement and manage comprehensive monitoring, logging, and alerting systems to proactively identify and resolve issues before they impact operations

Infrastructure Management & Optimization
  Manage and optimize AWS infrastructure within closed enclave environments, ensuring compliance with government security requirements
  Automate deployment pipelines and infrastructure provisioning using Infrastructure as Code (IaC) principles
  Perform capacity planning and scaling operations to ensure systems can handle real‑time cyber operation data loads

  Lead incident response efforts for system outages or performance degradation, coordinating with Arlington engineering teams as needed
  Conduct root cause analysis for system failures and implement preventive measures to avoid recurrence
  Maintain detailed runbooks and documentation for operational procedures and emergency response protocols

  Serve as the primary technical liaison between Fort Meade operations and Twenty's Arlington engineering teams
  Work closely with government stakeholders to understand operational requirements and ensure system configurations meet mission needs
  Provide technical support and training to end users on system functionality and troubleshooting procedures

Qualifications

Technical Skills & Experience
  5+ years of professional experience in site reliability engineering, DevOps, or cloud operations
  Expert‑level proficiency with Amazon Web Services (AWS) including EC2, ECS, RDS, CloudWatch, and networking services
  Advanced experience with Docker containerization and container orchestration platforms
  Strong knowledge of Linux/Unix systems administration and command‑line tools
  Proficiency with Infrastructure as Code tools (Terraform, CloudFormation, or similar)
  Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, or similar)
  Knowledge of CI/CD pipelines and automated deployment practices
  Understanding of networking concepts, security groups, and VPC configurations in AWS environments

Security & Compliance
  Experience working in secure, air‑gapped, or enclave environments
  Understanding of government security requirements and compliance frameworks
  Knowledge of container security best practices and vulnerability management
  Familiarity with logging and auditing requirements for government systems

Operational Skills
  Strong troubleshooting and problem‑solving skills with ability to work under pressure
  Experience with incident management and on‑call responsibilities
  Proven ability to write clear technical documentation and runbooks
  Understanding of database administration and performance tuning (particularly graph databases like Neo4j)

Education
  Bachelor's degree in Computer Science, Information Technology, or related field, or equivalent practical experience

Security Requirements
  Must have a TS/SCI security clearance
  Ability to work on‑site at Fort Meade, MD with occasional travel to Arlington, VA

Distinguishing Qualifications
  Previous experience supporting mission‑critical systems in government or defense environments
  Background in cyber operations or intelligence systems support
  Experience with graph databases (Neo4j) and GraphQL APIs
  Knowledge of AI/ML system operations and monitoring
  Certifications in AWS, Kubernetes, or site reliability engineering
  Experience with NATS or other message queue systems

Additional Skills
  Experience with Agile development methodologies and cross‑functional collaboration
  Knowledge of performance testing and load testing methodologies
  Understanding of disaster recovery and business continuity planning
  Scripting experience in Python, Bash, or Go
  Experience with configuration management tools (Ansible, Chef, Puppet)
  Familiarity with service mesh technologies and microservices patterns
-
Site Reliability Engineer
5 days ago
Arlington, United States Saxon Global Full time Title: Site Reliability Engineer Location: Arlington, TX || Hybrid || Local only Duration: Contract-to-Hire || W2 Job Summary: We are looking for a strong Site Reliability Engineer with excellent hands-on experience in software development, Release Engineering, DevOps, SRE practices, and Azure Cloud. This role supports a major technology modernization project...
-
Site Reliability Engineer
5 days ago
Arlington, United States Prudent Technologies and Consulting, Inc. Full time Job Title: Site Reliability Engineer III Location: Arlington, TX – 2 days Onsite Contract to hire – Visa Independent consultants JOB SUMMARY Job Scope: Collaboration / Architecture / Development – Partnering with Architecture / Development Teams, Ensuring Applications Highly Available / Reliable / Performant at Global Scale Reliability Guidance –...
-
Site Reliability Engineer
22 hours ago
Arlington, United States Prudent Technologies and Consulting, Inc. Full time Job Title: Site Reliability Engineer III Location: Arlington, TX – 2 days Onsite Contract to hire – Visa Independent consultants JOB SUMMARY Job Scope: Collaboration / Architecture / Development – Partnering with Architecture / Development Teams, Ensuring Applications Highly Available / Reliable / Performant at Global Scale Reliability Guidance –...
-
Site Reliability Engineer
4 days ago
Arlington, TX, United States Diverse Lynx Full timeJob Title : Site Reliability Engineer (SRE) Location : Arlington, TX (Day-1 Onsite) Type : Contract Position Job Description: Responsible for building self-service capabilities through automation, creating and documenting standards for SRE practices, and conducting compliance reviews and remediation of Prisma Cloud and Azure Policy violations. Main...
-
Lead Site Reliability Engineer
5 days ago
Arlington, TX, United States Prudent Technologies and Consulting, Inc. Full time $120,000 - $180,000 per year Job Title: Lead Site Reliability Engineer (SRE) Location: Hybrid – Arlington, TX (2 days onsite / 3 days remote) Job Type: Contract-to-Hire (conversion expected within 6–12 months) Job Requirements: Programming / Scripting Background – Java / C# (.NET MVC / .NET Core) / Go | PowerShell / Bash Site Reliability Engineer – Identifying / Delivering Automation...
-
Site Reliability Engineer
22 hours ago
Arlington, TX, United States Prudent Technologies and Consulting, Inc. Full timeJob Title : Site Reliability Engineer III Location : Arlington, TX 2 days Onsite Contract to hire Visa Independent consultants JOB SUMMARY Job Scope: Collaboration / Architecture / Development Partnering with Architecture / Development Teams, Ensuring Applications Highly Available / Reliable / Performant at Global Scale Reliability Guidance Collaborating...
-
Lead Site Reliability Engineer
3 weeks ago
Arlington, United States Prudent Technologies and Consulting, Inc. Full time Lead Site Reliability Engineer (Hybrid – Arlington, TX) We are hiring a Lead Site Reliability Engineer (SRE) to help drive this transformation. This is a technical leadership role (no people management) where you will guide engineering teams in reliability, automation, observability, and cloud-native architecture using Azure. What You’ll Do Lead...
-
Site Reliability Engineer II
1 week ago
Arlington, TX, United States GM Financial Full time Job Description Why GMF Technology? Innovation isn't just a talking point at GM Financial, it's how we operate. From generative AI and cloud-native technologies to peer-led learning and hackathons, our tech teams are building real solutions that make a difference. We're committed to AI-powered transformation, using advanced machine learning and automation to...
-
Site Reliability Engineer I
16 hours ago
Arlington, United States GM Financial Full time Why GMF Technology? Innovation isn't just a talking point at GM Financial, it's how we operate. From generative AI and cloud-native technologies to peer-led learning and hackathons, our tech teams are building real solutions that make a difference. We're committed to AI-powered transformation, using advanced machine learning and automation to help us...
-
Senior Site Reliability Engineer
3 weeks ago
Arlington, United States Ten Mile Square Technologies Full time Company Description Ten Mile Square Technologies is a high-end technology consulting firm based in the Northern Virginia area. Our customers routinely call upon us to solve some of the largest scale and hardest problems in computer science and software development. If you have a solid grounding in software engineering, continuous delivery, and computer...