Sr. IT Linux Site Reliability Engineer

2 weeks ago

Town of Florida, United States SpaceX Full time

SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today SpaceX is actively developing the technologies to make this possible, with the ultimate goal of enabling human life on Mars.

SR. LINUX SITE RELIABILITY ENGINEER

SpaceX is looking for an experienced engineer with deep working knowledge of Kubernetes and related containerized technologies. This employee will be a member of the Information Technology Linux Infrastructure team and will provide expertise in Kubernetes design, maintenance, scaling and optimization in support of critical business functions. The ideal candidate will be flexible and flourish in a fast-paced and challenging environment. They should be a self-starter and self-motivated, and possess the ingenuity to excel in this position.

RESPONSIBILITIES:
Install, manage, scale and optimize Kubernetes and RKE clusters in production environments using Ansible, Terraform and adjacent technologies.
Work closely with other SpaceX engineers to gather requirements and to research, evaluate, design, plan, deploy, and support software platforms and related technologies running in Kubernetes, within a world-class environment that meets the needs of the demanding SpaceX engineering teams.
Build highly resilient, high-performance, scalable, and robust systems.
Exercise a high degree of personal responsibility for the processes, systems, and tools you create and manage, all in support of the goal of making humanity an interplanetary species.
Recommend, justify, and implement improvements using an accepted change control methodology.
Work within a diverse group and with internal business units to design and deliver creative solutions and to resolve problems in a timely and proactive manner.
Define, document and follow standards and best practices for systems design, testing, and implementation.
Foster an environment of collaboration and cross-training, upskilling the team in Kubernetes and developing peers into capable engineers.
Drive scripting, self-service and automation solutions that reduce administrative overhead and toil.
Participate in the on-call rotation to handle urgent after-hours work when necessary.

BASIC QUALIFICATIONS:
Bachelor's degree in Computer Science or a STEM discipline and 5+ years of systems engineering experience; OR 7+ years of systems engineering experience in lieu of a degree.
Experience deploying and supporting Linux servers in physical and virtualized environments (e.g., VMware, via automation).
Experience with the Linux shell, as well as configuring and extending Linux instances (e.g., kernel modules, cgroups, PKI, iptables, network interfaces).
Experience supporting and scaling containerized applications in Linux environments.
Experience using automation frameworks (e.g., Ansible, Terraform) to manage the provisioning and post-provisioning lifecycles of infrastructure and Kubernetes installations.

PREFERRED SKILLS AND EXPERIENCE:
Expertise in creating repeatable, reliable, scalable systems architectures with high availability, fault tolerance, performance tuning, monitoring, and statistics/metrics collection.
Expertise in source code version control tools such as Git and Subversion, and in collaborating on source code via pull requests and other Git-based workflows.
Strong understanding of Linux container runtimes.
Experience implementing configuration management, provisioning, and workflow automation solutions via Infrastructure as Code, CI/CD and GitOps (e.g., Ansible, AWX/Tower, Vagrant, Puppet, Redfish, Jenkins, cloud-init, ArgoCD).
Experience writing test automation to ensure backward compatibility of new features and changes to automation processes and Kubernetes deployments.
Experience with programming and scripting languages such as Python and Golang, developing software solutions and integrating with external systems to implement automation against RESTful API services.
Experience installing, configuring and troubleshooting Kubernetes internals, CNI, CRI and CSI plugins (e.g., Docker, CRI-O, Ceph, Cilium), load balancing (e.g., MetalLB), service mesh (e.g., Istio) and software-defined storage (e.g., rook-ceph) in cloud or on-premises environments.
Experience developing solutions using Kubernetes patterns to extend system functionality and solve custom use cases (e.g., webhooks, controllers, operators, sidecars).
Experience implementing proactive alerting/monitoring workflows and dashboards for Linux systems and Kubernetes deployments using Prometheus, Grafana, InfluxDB or similar technologies.
Experience with dynamic system configuration templating using Jinja, Jsonnet, YAML and Helm.

ADDITIONAL REQUIREMENTS:
Must be willing to work extended hours and weekends as needed.
Ability to pass an Air Force background check for Cape Canaveral.

ITAR REQUIREMENTS:
To conform to U.S. Government export regulations, applicant must be a (i) U.S. citizen or national, (ii) U.S. lawful, permanent resident (aka green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, or be eligible to obtain the required authorizations from the U.S. Department of State.

SpaceX is an Equal Opportunity Employer; employment with SpaceX is governed on the basis of merit, competence and qualifications and will not be influenced in any manner by race, color, religion, gender, national origin/ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability or any other legally protected status.

Applicants wishing to view a copy of SpaceX's Affirmative Action Plan for veterans and individuals with disabilities, or applicants requiring reasonable accommodation to the application/interview process, should reach out to …

Senior Linux SRE

2 weeks ago

Town of Florida, United States SpaceX Full time

A leading aerospace manufacturer is seeking a Sr. Linux Site Reliability Engineer in New York. The successful candidate will manage Kubernetes clusters, support critical systems, and collaborate with cross-functional teams. Ideal for self-starters with strong automation skills and a background in Linux system administration. Applicants should possess a...

Site Reliability Engineer

3 weeks ago

Town of Texas, United States Longbridge Securities Full time

Longbridge is a fast-growing online brokerage platform on a mission to make investing smarter, simpler, and more accessible for everyone. We are looking for a hands-on Site Reliability Engineer (SRE) to design, scale, and safeguard the reliability of our next-generation financial platforms. This is a high-impact role where you’ll partner closely...

Blockchain Site Reliability Engineer

3 weeks ago

Town of Texas, United States Medium Full time

Job Position: Blockchain Site Reliability Engineer. Location: Dallas, TX, USA (Remote Acceptable - USA Applicants Only). InfStones is an advanced, enterprise-grade Platform as a Service (PaaS) blockchain infrastructure provider trusted by the top blockchain companies in the world. InfStones’ AI-based infrastructure provides...

Site Reliability Engineer

3 weeks ago

Town of Florida, United States Optomi Full time

This range is provided by Optomi. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. Base pay range: $145,000.00/yr - $160,000.00/yr. Optomi, in partnership with a leading global media organization...

Sr. Site Reliability Engineer, 100% Remote Work

3 weeks ago

Town of Poland, United States PRIMUS Global Technologies Pvt Ltd Full time

Sr. Site Reliability Engineer, 100% Remote Work (Poland). 6-month contract-to-hire. Bill rate: $49.00/hr USD (from Apex to PRIMUS US); cannot go above this bill rate. Client: ABBYY. Interview process: 2 technical video interviews. IMP NOTE: Candidates must be in...

Site Reliability Engineer

3 weeks ago

City of Albany, United States Canonical Full time

Canonical is a leading provider of open source software and operating systems for the global enterprise and technology markets. Ubuntu, the company’s flagship platform, is widely used in breakthrough initiatives such as public cloud, data science, AI, engineering innovation, and IoT. With...

Site Reliability Engineer

3 weeks ago

Town of Florida, United States SS&C Technologies Full time

As a leading financial services and healthcare technology company based on revenue, SS&C is headquartered in Windsor, Connecticut, and has 27,000+ employees in 35 countries. Some 20,000 financial services and healthcare organizations, from the world’s largest companies to small and mid‑market firms, rely on SS&C for expertise, scale, and...

Platform - Site Reliability Engineer II (Networking)

3 weeks ago

Town of Florida, United States Elastic Full time

Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale, unleashing the potential of businesses and people. The Elastic Search AI Platform, used...

Site Reliability / GitOps Engineer

3 weeks ago

City of Albany, United States Canonical Full time

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering...

Site Reliability Engineer II

3 weeks ago

Town of Poland, United States Akamai Technologies GmbH Full time

Do you have a passion for cutting-edge technologies and tackling system problems? Are you a self-starting professional who thrives in a dynamic environment? Join our highly skilled Site Reliability team. Our team builds and delivers highly secure network security frameworks to protect our customers. We collaborate to create next-generation initiatives...
