Site Reliability Engineer

1 week ago


Washington, United States Palantir Technologies Full time
About the Role

We are seeking a highly skilled Site Reliability Engineer to join our team at Palantir Technologies. As a Site Reliability Engineer, you will play a critical role in building, operating, and maintaining high-performance, scalable, and reliable services for our production infrastructure, across both cloud and on-prem environments.

Key Responsibilities
  1. Maintain Infrastructure Uptime: Ensure the availability of cloud and physical Linux servers that power the Palantir platform in air-gapped production environments.
  2. Design and Deploy Infrastructure: Design, deploy, and operate infrastructure to support customer and product requirements via modern orchestration and monitoring platforms.
  3. Collaborate with Product Teams: Work closely with product teams on requirements and SLOs for deploying software into air-gapped environments.
  4. Troubleshoot and Solve Issues: Identify, troubleshoot, and solve network and systems issues.
  5. Automate Routine Tasks: Write scripts to automate routine operational tasks.
What We Value
  1. Active Security Clearance: Hold an active US security clearance, or be eligible and willing to obtain one.
  2. Troubleshooting Skills: Demonstrate confidence in troubleshooting complex systems issues independently using stack traces, observability tooling, and systems tools.
  3. Infrastructure Management: Comfort managing large-scale production systems, including configuration management, load balancing, monitoring and alerting infrastructure, and container orchestration.
  4. Continuous Learning: Demonstrate the ability to continuously learn and work independently, making decisions with minimal supervision while working in secure facilities.
  5. Container and Orchestration Experience: Experience with containers (Docker/Podman) and orchestration (OpenShift/Kubernetes) at scale is a plus.
  6. Preferred Certifications: DoD 8570 IAT Level II or greater (CISSP, Sec+), Unix/Linux Computing Environment (e.g., Linux+, RHCE).
  7. Scripting Skills: Proficiency with scripting in Python or Go is a plus.
Requirements
  1. Linux System Administration: 5+ years of experience with Linux system administration (RHEL or equivalent preferred).
  2. Cloud and Hardware Experience: Experience with cloud-based hosting platforms like AWS, Azure, or GCP and/or experience with hardware-based environments.
  3. Monitoring Systems: Familiarity with monitoring systems such as Prometheus and with writing health checks.


  • Washington, United States Cinder LLC Full time

    [Full Time] Site Reliability Engineer at Cinder (United States). Date Posted: 31 Oct, 2022. Work Location: Washington, DC, United States. Salary Offered: $110 - $220 yearly. Job Type: Full Time. Experience Required: 1+ years. Remote Work: Yes. Stock Options: No. Vacancies: 1 available. About Cinder: Cinder provides a...


  • Washington, United States Varada Consulting Full time

    Site Reliability Engineer. Job Location: Washington, DC; Hybrid. Overview: Varada Consulting, LLC is seeking a full-time, highly skilled and experienced Site Reliability Engineer (SRE) to join our team. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of our systems and applications through automation, monitoring, and...


  • Washington, United States Alldus Full time

    Our client is a Series A startup within the Generative AI space and they are hiring a Site Reliability Engineer to join the team. Backed by one of the leading venture capital firms in the industry, this is an exciting opportunity to join a SaaS company that is revolutionizing their industry. Responsibilities: As the Site Reliability Engineer, you will...


  • Washington, United States StaffWorthy Inc. Full time

    We are a leading technology services provider with a rich history of assembling exceptional teams dedicated to delivering outstanding solutions. For over two decades, we have been committed to excellence, with a mission centered around our passion for our people and the value they deliver to our customers. Responsibilities: Monitor platform and containerized...


  • Washington, United States System One Full time

    Site Reliability Engineer. Work Location: 3 days onsite DC - JBAB, 2 days remote. Clearance: Active TS/SCI with ability to clear PSD. As a Site Reliability Engineer (SRE), you’ll continuously drive improvements in observability, performance, and reliability, with the goal to make an impact across the federal government. What You’ll Do: Monitor platform and...


  • Washington, United States Kansas Action for Children, Inc Full time

    at T-Mobile USA, Inc. in Overland Park, Kansas, United States. Job Description: Be unstoppable with us! T-Mobile is synonymous with innovation, and you could be part of the team that disrupted an entire industry! We reinvented customer service, brought real 5G to the nation, and now we're shaping the future of technology in wireless and beyond. Our work is as...


  • Washington, United States Mount Indie Full time

    Job Description: As a Site Reliability Engineer (SRE), you'll continuously drive improvements in observability, performance, and reliability, with the goal to make an impact across the federal government. This role requires a current TS/SCI that has been obtained within the last 51 months and the ability to pass additional background...


  • Washington, United States CruitZi, INC Full time

    Job Description: Our client is currently hiring a full-time Sr. Site Reliability Engineer (SRE), who will play a vital role in continuously driving improvements in observability, performance, and reliability, aiming to make a substantial impact across the federal government. This role is hybrid, requiring travel to downtown Washington, DC, at...


  • Washington, United States Karsun Solutions Full time

    About the Role: We are seeking a highly skilled and experienced Site Reliability Engineering Manager to join our team at Karsun Solutions. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our systems and services. Key Responsibilities: Lead a team of engineers in designing, implementing, and maintaining robust...


  • Washington, United States Veterans Enterprise Technology Solutions Full time

    Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Veterans Enterprise Technology Solutions. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, performance, and scalability of our systems and applications. Key Responsibilities: Monitor and analyze system performance to identify...


  • Washington, United States TEKsystems Full time

    Job Summary: One of the largest financial institutions in Japan is seeking a highly skilled DevOps/Site Reliability Engineer to join a large-scale migration project. As a key member of the team, you will be responsible for designing and implementing the pipeline architecture for the migrations. This is an exciting opportunity to join a leading organization...


  • Washington, United States Veterans Enterprise Technology Solutions Full time

    Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Veterans Enterprise Technology Solutions. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, performance, and scalability of our infrastructure. Key Responsibilities: Monitor and Maintain Infrastructure: Continuously monitor our...


  • Washington, United States GitLab Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Gitaly team at GitLab. As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and reliability of our Gitaly service, which is responsible for storing and managing Git data for our users. Key Responsibilities: Work with peer SREs to...


  • Washington, United States Kansas Action for Children, Inc Full time

    About the Role: We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Kansas Action for Children, Inc. in Overland Park, Kansas, United States. This is an exciting opportunity for a technical professional who is passionate about innovation and wants to be part of a team that is reshaping the future of technology in the wireless...


  • Washington, United States Red Frog Solutions Full time

    Site Reliability Engineer - SRE - (TS/SCI). Full Time Perm. Washington D.C. (Hybrid - 3 days onsite, 2 days remote). $180K - $200K Salary Plus Competitive Benefits. As a Site Reliability Engineer (SRE), you will play a vital role in continuously driving improvements in observability, performance, and reliability, aiming to make a substantial impact across the...


  • Washington, United States Karsun Solutions Full time

    We are seeking a highly skilled and experienced Site Reliability Manager to join our team. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our systems and services. They will lead a team of engineers in designing, implementing, and maintaining robust infrastructure and automation solutions. The ideal...


  • Washington, United States TikTok Full time

    About the Role: TikTok is a leading destination for short-form mobile video, and our mission is to inspire creativity and bring joy. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our platform. Key Responsibilities: Collaborate with infrastructure, product, and platform engineering teams to operate and...


  • Washington, United States MetroStar Systems Full time

    $25k Sign-On Bonus for this role. As a Site Reliability Engineer (SRE), you'll continuously drive improvements in observability, performance, and reliability, with the goal to make an impact across the highest levels of government. If you think you can see yourself delivering our mission and pursuing our goals with us, then check out the job...