Site Reliability Engineer

1 month ago


Washington, United States Alldus International Consulting Ltd Full time

Our client, a Series A startup in the Generative AI space, is hiring a Site Reliability Engineer to join the team. Backed by one of the leading venture capital firms in the industry, this is an exciting opportunity to join a SaaS company that is revolutionizing its industry.

Responsibilities:

  1. Perform root cause analysis to identify and resolve system or application issues in a timely and effective manner.
  2. Design and implement a broad range of automated tests to ensure system reliability and performance.
  3. Build scalable and cost-effective observability patterns in Datadog or other monitoring providers.
  4. Monitor and analyze SLIs to ensure adherence to SLAs and SLOs.
  5. Collaborate with development and operations teams to improve system reliability and developer experience.
  6. Develop and maintain monitoring and alerting systems to proactively address issues.
  7. Implement best practices for incident management and disaster recovery.
  8. Plan and implement capacity upgrades, ensuring scalability and performance.
  9. Define, monitor, and manage SLAs, ensuring service levels meet or exceed expectations.
  10. Ensure systems comply with security and regulatory requirements.

Skillset:

  1. Experience with Kubernetes and Helm.
  2. Expertise in observability and monitoring tools such as Prometheus, Grafana, Datadog, or ELK.
  3. Experience with the Azure cloud.
  4. Strong understanding of microservices architecture, including Postgres and AI systems.
  5. Expertise in automated testing frameworks and tools.
  6. Experience with monitoring and analytics tools to track SLIs, SLAs, and SLOs.
  7. Excellent problem-solving skills and attention to detail. Tenacious attitude.
  8. Proficiency in programming languages such as TypeScript and Python.
  9. Strong scripting skills in Bash, PowerShell, or similar.
  10. Understanding of networking principles and experience with network troubleshooting.

This is a full-time, remote position and is open only to US citizens due to potential security clearance requirements.

Benefits:

  1. Salary: $140k – $175k.
  2. Stock options.
  3. Benefits package.

Interested? Apply now using the link below or email your resume directly to matthew@alldus.com for consideration.


  • Washington, United States OpenAI Full time

    About the Team: Join the engineering teams that bring OpenAI's ideas safely to the world! The Applied Engineering team works across research, engineering, product, and design to bring OpenAI's technology to consumers and businesses. We seek to learn from deployment and distribute the benefits of AI, while ensuring that this powerful tool is used responsibly...


  • Washington, United States Harbor Compliance Full time

    Site Reliability Engineer - Full-time Remote. Advance Your Career with Cutting-Edge Infrastructure at Harbor Compliance. About Harbor Compliance: Harbor Compliance is committed to simplifying the regulatory challenges of businesses and nonprofits through innovative technology solutions. As we continue to grow, we seek a Site Reliability Engineer who is passionate...


  • Washington, DC, United States Alldus International Consulting Ltd Full time

    Our client is a Series A startup within the Generative AI space and they are hiring a Site Reliability Engineer to join the team. Backed by one of the leading venture capital firms in the industry, this is an exciting opportunity to join a SaaS company that is revolutionizing their industry. Responsibilities: As the Site Reliability Engineer, you will...


  • Washington, United States ZipRecruiter Full time

    Job Description: We are seeking a skilled Cloud Site Reliability Engineer who is legally authorized to work in the US. Do you have an interest in Infrastructure Engineering, software architecture design and cloud computing? SRE/Cloud Engineers are responsible for creating infrastructure designs and guiding the development and implementation of cloud...


  • Washington, United States ZipRecruiter Full time

    Job Description: We are seeking a skilled Cloud Site Reliability Engineer, legally authorized to work in the US. Do you have an interest in Infrastructure Engineering, software architecture design, and cloud computing? SRE/Cloud Engineers are responsible for creating infrastructure designs and guiding the development and implementation of cloud applications,...


  • Washington, United States Palantir Technologies Full time

    A World-Changing Company: Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more. The Role: We’re looking for Site Reliability Engineers who...


  • Washington, Washington, D.C., United States AlmrStaffing Full time

    Site Reliability Engineer - TS/SCI. Washington D.C. Full Time. IT. Mid Level. We are seeking a skilled Site Reliability Engineer (SRE) to drive continuous improvements in observability, performance, and reliability for our federal government client. The ideal candidate will ensure robust and reliable technology services, enhancing the overall customer experience. The...


  • Washington, United States Sparibis Full time

    About the Position: We are seeking an experienced Senior Site Reliability Engineer to join our team at Sparibis. As a key member of our technology group, you will be responsible for ensuring the stability and availability of our cloud-based systems. With a strong background in software engineering and DevOps, you will design and implement end-to-end continuous...


  • Washington, United States Palantir Technologies Full time

    A World-Changing Company: Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more. The Role: Palantir has been selected as the prime contractor...


  • Washington, United States Infoblox Full time

    At Infoblox, we're revolutionizing cloud-first networking and security services. As a Top 25 Cyber Security Company and one of Inc.'s Best Workplaces for 2020, our solutions empower organizations to deliver seamless network experiences. Our customers are among the largest enterprises worldwide, and we're seeking talented individuals to join our Incident...


  • Washington, United States Harbor Compliance Full time

    Harbor Compliance is a leading provider of innovative technology solutions for businesses and nonprofits. We are committed to simplifying regulatory challenges through cutting-edge infrastructure. About the Role: We are seeking an experienced Site Reliability Engineer to join our team. As a key member of our IT Services department, you will be responsible for...


  • Washington, United States Sparibis Full time

    Location: 100% remote. Experience: 10+ years. Education: Bachelor's degree. Work Authorization: United States citizenship is required as part of the eligibility criteria to be able to obtain a security clearance. Clearance: Applicants must be able to obtain and maintain a Public Trust security clearance. Key Skills: Must experience...


  • Washington, United States CoStar Realty Information, Inc. Full time

    Job Description: As a Senior Site Reliability Engineer at CoStar Realty Information, Inc., you will play a crucial role in improving the availability, reliability, and performance of our applications. Our team is responsible for ensuring that our software systems are scalable, secure, and efficient. If you have expertise in designing, analyzing,...

  • Reliability Engineer

    4 weeks ago


    Washington, United States Saint-Gobain Full time

    Consistent with CertainTeed Gypsum Vision, Mission, Values and Objectives, the Reliability Engineer identifies and quantifies Line 1 and Line 2 root cause failure(s), and drives permanent solutions to address systemic or chronic mechanical deficiencies to world class levels of safety, environmental impact, quality, service, and efficiency standards within...

  • Reliability Engineer

    4 weeks ago


    Washington, United States Saint Gobain Full time

    Consistent with CertainTeed Gypsum Vision, Mission, Values and Objectives, the Reliability Engineer identifies and quantifies Line 1 and Line 2 root cause failure(s), and drives permanent solutions to address systemic or chronic mechanical deficiencies to world class levels of safety, environmental impact, quality, service, and efficiency standards within...

  • Reliability Engineer

    4 weeks ago


    Washington, United States Saint Gobain Glass Full time

    Consistent with CertainTeed Gypsum Vision, Mission, Values and Objectives, the Reliability Engineer identifies and quantifies Line 1 and Line 2 root cause failure(s), and drives permanent solutions to address systemic or chronic mechanical deficiencies to world class levels of safety, environmental impact, quality, service, and efficiency standards within...

  • Reliability Engineer

    1 month ago


    Washington, United States Saint-Gobain Full time

    Consistent with CertainTeed Gypsum Vision, Mission, Values and Objectives, the Reliability Engineer identifies and quantifies Line 1 and Line 2 root cause failure(s), and drives permanent solutions to address systemic or chronic mechanical deficiencies to world class levels of safety, environmental impact, quality, service, and efficiency standards within...

  • Reliability Engineer

    1 month ago


    Washington, United States Northern Star Mining Services Limited Full time

    Ready to pursue your professional journey with Northern Star? As an ASX 50 global-scale gold miner, we have sizeable operations in Western Australia and Alaska. With unparalleled pathways for advancement and avenues for personal growth, we stand as Australia’s premier gold employer. Your journey starts here. At Northern Star, we live by our STARR Core...
