Senior Site Reliability Engineer
3 days ago
Role:
Northwood is looking for a Senior Site Reliability Engineer to architect and lead the monitoring and reliability systems that keep satellites connected to Earth. As we rapidly scale our ground station network across multiple continents, you'll design and build the observability infrastructure that ensures our space communications systems operate 24/7 for customers ranging from commercial satellite operators to national security missions.
This is a high-impact leadership role where you'll architect global-scale reliability platforms while mentoring junior engineers and establishing SRE practices across the organization. You'll work directly with our founding engineering team and department heads to define the monitoring, alerting, and deployment strategies that will scale with us from startup to enterprise. If you're excited about space technology and want to architect infrastructure that directly supports mission-critical satellite operations while building and leading technical teams, this role offers that opportunity.
Responsibilities:
- Architect and maintain an enterprise observability stack (Grafana, Prometheus, Loki, Vector, VictoriaMetrics) that monitors ground stations, satellite communications, and multi-region AWS infrastructure
- Design SRE practices, error budgets, and SLO/SLI frameworks for mission-critical satellite systems with 99.9%+ uptime requirements
- Build advanced AWS infrastructure with Terraform, implementing multi-region reliability, automated scaling, and disaster recovery for ground station operations
- Lead CI/CD pipeline architecture using GitLab and ArgoCD with advanced deployment strategies for mission-critical software releases
- Mentor junior engineers and establish reliability standards across the growing engineering organization
- Design comprehensive Kubernetes deployments with Helm, focusing on high availability and zero-downtime operations
- Lead incident response, conduct post-mortems, and drive systematic reliability improvements
Basic Qualifications:
- 5-8 years of production infrastructure and SRE experience with demonstrated leadership in reliability improvements and team mentorship
- Expert-level experience with Kubernetes, Docker, and container orchestration in large-scale production environments
- Strong background in infrastructure as code (Terraform) and advanced CI/CD practices with experience mentoring others on these technologies
- Advanced AWS experience including multi-region architectures, networking, security, and cost optimization, with demonstrated ability to architect complex cloud solutions
- Proven track record of leading technical projects from conception to production in fast-moving, high-growth environments
- Deep understanding of SRE principles, error budgets, SLOs/SLIs, and experience implementing reliability frameworks across engineering organizations
Preferred Qualifications:
- Production experience architecting and scaling observability tools (Vector, Loki, Grafana, Prometheus, VictoriaMetrics) in high-throughput environments
- Advanced experience with HashiCorp Vault, Okta, and enterprise identity/secrets management systems including policy design and implementation
- Previous experience scaling infrastructure and leading technical teams at high-growth companies (startup to 500+ employees)
- AWS Professional certification or equivalent demonstrated expertise with advanced cloud networking, security, and compliance frameworks
- Strong Linux system administration and networking expertise with experience troubleshooting complex distributed systems
- Background in aerospace, telecommunications, defense contracting, or other mission-critical, highly regulated industries
- Experience with ITAR, NIST, or other defense/aerospace compliance requirements
Compensation Range: $140K - $170K
-
Senior Site Reliability Engineer
1 week ago
Los Angeles, California, United States Rockwoods Inc Full time $200,000 - $250,000 per year
Job Title: Senior Site Reliability Engineer (SRE) – Healthcare Domain
Location: Pleasanton, CA (Onsite – 5 days/week)
Type: Contract
Industry: Healthcare / Medical Devices
Position Summary
We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to support launch readiness and post-launch operations for healthcare applications. This role requires...
-
Site Reliability Engineer
4 days ago
Los Angeles, California, United States Axiom Software Solutions Limited Full time $60,000 - $120,000 per year
Role: Site Reliability Engineer (SRE)
Location: Miami FL – Onsite
Position Type: Contract
Required Skills & Qualifications
• years of experience in Site Reliability Engineering, DevOps, or similar role.
• Strong experience with Linux/Unix systems administration and troubleshooting.
• Proficiency in at least one scripting or programming language...
-
Los Angeles, California, United States LTIMindtree Full time $120,000 - $180,000 per year
About Us:
LTIMindtree is a global technology consulting and digital solutions company that enables enterprises across industries to reimagine business models, accelerate innovation, and maximize growth by harnessing digital technologies. As a digital transformation partner to more than 700 clients, LTIMindtree brings extensive domain and technology expertise...
-
Site Reliability Engineer
7 days ago
Los Angeles, California, United States Longbridge Securities Full time $120,000 - $180,000 per year
About Us
Longbridge is a fast-growing online brokerage platform on a mission to make investing smarter, simpler, and more accessible for everyone. As part of our global expansion, we're looking for a hands-on Site Reliability Engineer (SRE) to design, scale, and safeguard the reliability of our next-generation financial platforms. This is a high-impact role...
-
Los Angeles, California, United States LTIMindtree Full time $200,000 - $250,000 per year
About Us:
LTIMindtree is a global technology consulting and digital solutions company that enables enterprises across industries to reimagine business models, accelerate innovation, and maximize growth by harnessing digital technologies. As a digital transformation partner to more than 700 clients, LTIMindtree brings extensive domain and technology expertise...
-
Senior Site Reliability Engineer
7 days ago
Los Angeles, California, United States Lambda Full time $150,000 - $250,000 per year
Lambda, The Superintelligence Cloud, builds Gigawatt-scale AI Factories for Training and Inference. Lambda's mission is to make compute as ubiquitous as electricity and give every person access to artificial intelligence. One person, one GPU. If you'd like to build the world's best deep learning cloud, join us. *Note: This position requires presence in our...
-
Senior Reliability Engineer
2 days ago
Los Angeles, California, United States ZEMLOCK LLC Full time
Where You Will Work
Our global headquarters is in Phoenix, Arizona. Several hundred employees support global operations in finance, human resources, information technology, planning and more from the main office, satellite offices or online. As a Hybrid employee, you'll engage in virtual collaboration as well as attend in-person meetings at our...
-
Reliability Engineer
4 days ago
Los Angeles, California, United States Northrop Grumman Full time
RELOCATION ASSISTANCE: Relocation assistance may be available
CLEARANCE TYPE: Secret
TRAVEL: Yes, 10% of the Time
Description
At Northrop Grumman, our employees have incredible opportunities to work on revolutionary systems that impact people's lives around the world today, and for generations to come. Our pioneering and inventive spirit has enabled us to be at...
-
Product Reliability Engineer
3 days ago
Los Angeles, California, United States Divergent Full time $106,000 - $168,300 per year
Divergent is a technology company that has architected, invented, built, and commercialized an end-to-end factory system called the Divergent Adaptive Production System (DAPS) that comprehensively uses machine learning to optimally engineer, additively manufacture, and flexibly assemble complex integrated vehicle structures and subsystems. Products created...