Site Reliability Engineer

2 hours ago


Washington DC United States Alldus International Consulting Ltd Full time

Our client is a Series A startup within the Generative AI space and they are hiring a Site Reliability Engineer to join the team. Backed by one of the leading venture capital firms in the industry, this is an exciting opportunity to join a SaaS company that is revolutionizing their industry.

Responsibilities:

  1. As the Site Reliability Engineer, you will perform root cause analysis to identify and resolve system or application issues in a timely and effective manner.
  2. You will design and implement a broad range of automated tests to ensure system reliability and performance.
  3. Building scalable and cost-effective observability patterns in Datadog or other monitoring providers.
  4. Monitor and analyze SLIs to ensure adherence to SLAs and SLOs.
  5. Collaborate with development and operations teams to improve system reliability and developer experience.
  6. Develop and maintain monitoring and alerting systems to proactively address issues.
  7. Implement best practices for incident management and disaster recovery.
  8. Plan and implement capacity upgrades, ensuring scalability and performance.
  9. Define, monitor, and manage SLAs, ensuring service levels meet or exceed expectations.
  10. Ensure systems comply with security and regulatory requirements.

Skillset:

  1. Experienced in Kubernetes and Helm.
  2. Expertise in observability and monitoring tools such as Prometheus, Grafana, Datadog, or Elk.
  3. Experience in Azure cloud.
  4. Strong understanding of microservices architecture, including Postgres and AI systems.
  5. Expertise in automated testing frameworks and tools.
  6. Experience with monitoring and analytics tools to track SLIs, SLAs, and SLOs.
  7. Excellent problem-solving skills and attention to detail. Tenacious attitude.
  8. Proficiency in programming languages such as TypeScript and Python.
  9. Strong scripting skills in Bash, PowerShell, or similar.
  10. Understanding of networking principles and experience with network troubleshooting.

This is a full-time, remote position and is only open to US Citizens due to potential security clearance requirements.

Benefits:

  1. Salary: $140k – $175k.
  2. Stock options.
  3. Benefits package.

Interested? Apply now in the link below or email your resume directly to for consideration.

44985

#J-18808-Ljbffr

  • Washington, United States Alldus International Consulting Ltd Full time

    Our client is a Series A startup within the Generative AI space and they are hiring a Site Reliability Engineer to join the team. Backed by one of the leading venture capital firms in the industry, this is an exciting opportunity to join a SaaS company that is revolutionizing their industry.Responsibilities:As the Site Reliability Engineer, you will perform...


  • Washington, United States Google Inc. Full time

    Site Reliability Engineering, Transformative Compute Site Reliability Engineeringcorporate_fare Google place Warsaw, PolandApplyMinimum Qualifications:Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.5 years of experience building and developing infrastructure, distributed systems or networks, or experience with...


  • Sunnyvale, CA, United States Natcast, Inc. Full time

    Natcast (short for The National Center for the Advancement of Semiconductor Technology) is a new, purpose-built, non-profit entity created to operate the National Semiconductor Technology Center (NSTC) consortium, established by the CHIPS Act of the U.S. government. Working at Natcast represents an opportunity to help extend America’s leadership in...


  • Washington, United States Harbor Compliance Full time

    Site Reliability Engineer - Full-time RemoteAdvance Your Career with Cutting-Edge Infrastructure at Harbor ComplianceAbout Harbor Compliance:Harbor Compliance is committed to simplifying the regulatory challenges of businesses and nonprofits through innovative technology solutions. As we continue to grow, we seek a Site Reliability Engineer who is passionate...


  • Washington, United States Harbor Compliance Full time

    Site Reliability Engineer - Full-time RemoteAdvance Your Career with Cutting-Edge Infrastructure at Harbor ComplianceAbout Harbor Compliance:Harbor Compliance is committed to simplifying the regulatory challenges of businesses and nonprofits through innovative technology solutions. As we continue to grow, we seek a Site Reliability Engineer who is passionate...


  • Annapolis Junction, MD, United States Maximus Full time

    General information Job Posting Title Site Reliability Engineer Date Wednesday, October 16, 2024 City Annapolis Junction State MD Country United States Working time Full-time Description & Requirements Maximus is seeking a Site Reliability Engineer to provide expertise to a federal client in support of their mission critical systems in defense of our...


  • Annapolis Junction, MD, United States Maximus Full time

    General information ...


  • Duluth, GA, United States BlueSky Resource Solutions Full time

    Job Title: Site Reliability Engineer – ObservabilityOverview:We are seeking a Site Reliability Engineer III to develop and maintain our observability platform. This role focuses on ensuring the reliability, performance, and scalability of microservices, Kubernetes clusters, and cloud infrastructure. You'll collaborate with cross-functional teams to deliver...


  • Miami, FL, United States Royal Caribbean Group Full time

    Site Reliability Engineer Journey with us! Combine your career goals and sense of adventure by joining our incredible team of employees at Royal Caribbean Group . We are proud to offer a competitive compensation and benefits package, and excellent career development opportunities, each offering unique ways to explore the world. We are proud to be the...


  • Fairfax, VA, United States Apex Systems Full time

    We are seeking talented professionals to join our successful and growing team in building the next-generation Continuous Diagnostics and Mitigation (CDM) Cyber data solution. The CDM Program is the Cybersecurity and Infrastructure Security Agency’s (CISA) dynamic approach to strengthening the cybersecurity of Federal networks and systems through better...


  • Redwood City, CA, United States C3 AI Full time

    We are looking for an Associate Site Reliability Engineer / Site Reliability Engineer to join our team at our HQ in Redwood City, CA. Responsibilities: Maximize system uptime and availability, ensuring functional and performance SLAs. Establish end-to-end monitoring and alerting on all critical aspects. Solve complex problems for critical services...


  • Newton, MA, United States Intelliswift Software Full time

    Title : Site Reliability EngineerLocation : Newton, MA HybridDuration : 6 MonthsPay rate : $38.73 per hour on W2We are seeking a skilled Site Reliability Engineer (SRE) Level 2 to join our dynamic team. The ideal candidate will have a strong technical background, excellent problem-solving skills, and a passion for enhancing system reliability and...


  • Portland, OR, United States Matlen Silver Full time

    Compensation: $70 - $75/HourHybrid: 2 Days Onsite Portland, OregonDomain: Retail/Supply ChainJob Title: Site Reliability EngineerPosition SummaryAs a Site Reliability Engineer/DevOps Engineer, you will be responsible for ensuring the availability, performance, and reliability of Fulfillment Technology solutions for our client to support omni-channel...


  • Indianapolis, IN, United States BCforward Full time

    Site Reliability EngineerBCforward is currently seeking a highly motivated Site Reliability Engineer for an opportunity in Remote!Position Title: Site Reliability EngineerLocation: RemoteAnticipated Start Date: 12/10/2024Please note this is the target date and is subject to change. BCforward will send official notice ahead of a confirmed start date.Expected...


  • Sunnyvale, CA, United States Apple Inc. Full time

    To view your favorites, sign in with your Apple Account. Imagine what you could do here. At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. The people here at Apple don’t just create products —...


  • Indianapolis, IN, United States BCforward Full time

    Site Reliability EngineerBCforward is currently seeking a highly motivated Site Reliability Engineer for an opportunity in Remote!Position Title: Site Reliability EngineerLocation: RemoteAnticipated Start Date: 12/10/2024Please note this is the target date and is subject to change. BCforward will send official notice ahead of a confirmed start date.Expected...


  • Sunnyvale, CA, United States Microsoft Full time

    There has never been a more exciting time to be working in healthcare at Microsoft. Our Health & Life Sciences Solutions organization is an interdisciplinary team of product managers, designers, engineers, and clinicians who are designing, developing and deploying next-generation healthcare solutions powered by the Microsoft Cloud for healthcare...


  • Miami, FL, United States INSPYR Solutions Full time

    Title: Site Reliability Engineer Make sure to apply quickly in order to maximise your chances of being considered for an interview Read the complete job description below. Location: Miami, FL Duration: 6+ months Compensation: $55.00 -60.00 Work Requirements: US Citizen, GC Holders or Authorized to Work in the U.S. Site Reliability...


  • Austin, TX, United States Sustainable Talent Full time

    Join Sustainable Talent as an Engineering Technician (Site Reliability Engineer) supporting Nvidia and their IPP Platform Group (Infrastructure, Planning and Process)! This is a W-2 full-time contract with openings in Hillsboro, OR and Austin, TX. We offer competitive pay $35-45/hourly based on factors like experience, education, location, etc. and provide...


  • Washington, DC, United States Palantir Technologies Full time

    Site Reliability Engineer - Security Infrastructure Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more. The Role Our products...