Site Reliability Engineer

2 days ago


Washington, Washington, D.C., United States Alldus Full time

Alldus is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based systems.

Key Responsibilities:
  • Perform root cause analysis to identify and resolve system or application issues in a timely and effective manner.
  • Design and implement automated tests to ensure system reliability and performance.
  • Build scalable and cost-effective observability patterns in Datadog or other monitoring providers.
  • Monitor and analyze SLIs to ensure adherence to SLAs and SLOs.
  • Collaborate with development and operations teams to improve system reliability and developer experience.
  • Develop and maintain monitoring and alerting systems to proactively address issues.
  • Implement best practices for incident management and disaster recovery.
  • Plan and implement capacity upgrades, ensuring scalability and performance.
  • Define, monitor, and manage SLAs, ensuring service levels meet or exceed expectations.
  • Ensure systems comply with security and regulatory requirements.
Requirements:
  • Experience in Kubernetes and Helm.
  • Expertise in observability and monitoring tools such as Prometheus, Grafana, Datadog, or ELK.
  • Experience in Azure cloud.
  • Strong understanding of microservices architecture, including Postgres and AI systems.
  • Expertise in automated testing frameworks and tools.
  • Experience with monitoring and analytics tools to track SLIs, SLAs, and SLOs.
  • Excellent problem-solving skills, attention to detail, and a tenacious attitude.
  • Proficiency in programming languages such as TypeScript and Python.
  • Strong scripting skills in Bash, PowerShell, or similar.
  • Understanding of networking principles and experience with network troubleshooting.
What We Offer:
  • Salary: $140k – $175k.
  • Stock options.
  • Benefits package.


  • Washington, Washington, D.C., United States MetroStar Corporation Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at MetroStar Corporation. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, performance, and scalability of our systems. Key Responsibilities: Monitor and analyze platform and containerized applications to...


  • Washington, Washington, D.C., United States System One Full time

    Job Title: Site Reliability Engineer. At System One, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you'll play a critical role in ensuring the reliability, performance, and scalability of our systems. Key Responsibilities: Monitor and analyze platform and containerized applications to identify...


  • Washington, Washington, D.C., United States Veterans Enterprise Technology Solutions Full time

    Job Title: Site Reliability Engineer. Overview: Veterans Enterprise Technology Solutions is seeking a highly skilled Site Reliability Engineer to join our team. This role will be responsible for ensuring the reliability and performance of our cloud-based infrastructure. The ideal candidate will have a strong understanding of SRE principles and experience with...


  • Washington, Washington, D.C., United States MetroStar Systems Full time

    Job Title: Site Reliability Engineer. At MetroStar Systems, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Monitor and analyze system performance to identify areas...


  • Washington, Washington, D.C., United States Varada Consulting, LLC Full time

    Job Title: Site Reliability Engineer. Varada Consulting, LLC is seeking a highly skilled and experienced Site Reliability Engineer to join our team. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of our systems and applications through automation, monitoring, and infrastructure improvements. Key...


  • Washington, Washington, D.C., United States MetroStar Systems Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at MetroStar Systems. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, performance, and scalability of our systems. Key Responsibilities: Monitor and analyze platform and containerized applications to identify...


  • Washington, Washington, D.C., United States Alldus Full time

    Site Reliability Engineer. Alldus is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our systems. Key Responsibilities: Perform root cause analysis to identify and resolve system or application issues in a timely and...


  • Washington, Washington, D.C., United States Veterans Enterprise Technology Solutions Full time

    Job Title: Site Reliability Engineer. Overview: Veterans Enterprise Technology Solutions is seeking a highly skilled Site Reliability Engineer to join our team. This role will involve working on a rotating hybrid schedule, with 3 days onsite at JBAB and 2 days remote. An Active Top Secret SCI clearance is required for this position. Responsibilities: Monitor and...


  • Washington, Washington, D.C., United States TikTok Full time

    About the Role: TikTok is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our software systems. Responsibilities: Work with infrastructure, product, and platform engineering teams to operate and deploy software platforms, capacity planning,...


  • Washington, Washington, D.C., United States CloudFit Software Full time

    Job Title: Site Reliability Engineer. CloudFit Software is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the quality, performance, and reliability of our CloudFit Managed Applications and Services systems. Key Responsibilities: Collaborate with cross-functional teams...


  • Washington, Washington, D.C., United States Cinder LLC Full time

    About Cinder LLC: Cinder LLC provides a cutting-edge investigation platform to protect the internet. Our software helps Trust and Safety teams at the world's most influential companies innovate and adapt quickly to emerging threats. Job Title: Site Reliability Engineer. We're seeking an experienced Site Reliability Engineer to lead the development and deployment...


  • Washington, Washington, D.C., United States MetroStar Corporation Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at MetroStar Corporation. As a key member of our team, you will be responsible for driving improvements in observability, performance, and reliability of our systems. Key Responsibilities: Monitor and analyze platform and containerized applications to...


  • Washington, Washington, D.C., United States Cinder LLC Full time

    About Cinder LLC: Cinder LLC is a cutting-edge investigation platform that protects the internet. Our software helps Trust and Safety teams at influential companies innovate and adapt quickly to emerging threats. We're seeking an experienced Site Reliability Engineer to lead the development and deployment of our robust infrastructure. Job...


  • Washington, Washington, D.C., United States Microsoft Full time

    Job Title: Site Reliability Engineer II. Microsoft is seeking a highly skilled Site Reliability Engineer II to join our team. As a Site Reliability Engineer II, you will be responsible for designing, developing, and delivering software engineering solutions to serve and protect O365 government clouds. Key Responsibilities: Design, develop, and deploy software...


  • Washington, Washington, D.C., United States Palantir Technologies Full time

    Job Title: Site Reliability Engineer. Job Summary: Palantir Technologies is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our systems and applications. Key Responsibilities: Collaborate with cross-functional teams...


  • Washington, Washington, D.C., United States Palantir Technologies Full time

    About the Role: We're looking for a skilled Site Reliability Engineer to join our team at Palantir Technologies. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Maintain the availability of cloud and physical Linux servers that power...


  • Washington, Washington, D.C., United States MetroStar Systems Full time

    Transforming Government Services with Reliability and Performance. As a Site Reliability Engineer at MetroStar Systems, you will play a pivotal role in driving improvements in observability, performance, and reliability across high-level government platforms. Your expertise will be instrumental in making a lasting impact. Key Responsibilities: Monitor and...


  • Washington, Washington, D.C., United States MetroStar Corporation Full time

    MetroStar Corporation is seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our organization, you will play a critical role in driving improvements in observability, performance, and reliability across our systems. Key Responsibilities: Monitor and analyze platform and containerized applications to identify...


  • Washington, Washington, D.C., United States DataRobot Full time

    Job Title: Director of Site Reliability Engineering. Job Summary: DataRobot is seeking a highly skilled and experienced Director of Site Reliability Engineering to lead our SRE team. As a key member of our engineering organization, you will be responsible for ensuring the reliability, scalability, and performance of our platform. Key Responsibilities: ...


  • Washington, Washington, D.C., United States Oracle Full time

    Job Description: Oracle Health Applications & Infrastructure (OHAI) is seeking a highly skilled Site Reliability Engineer to join its OHAI Platform & Production Engineering organization. This is a unique opportunity to work on a net new line of business, constructed with an entrepreneurial spirit that promotes an energetic and creative environment. As a Site...