Site Reliability Engineer

3 weeks ago


San Francisco, United States Writemed Full time

About Us

Would you like to join one of the fastest-growing organizations with a goal of using the latest AI, GenAI, LLM, Cloud, and Digital Technologies to advance drug development and improve patient care pathways? WriteMed.AI helps Biopharma and Life Sciences companies reduce the time needed to write medical publications and regulatory paperwork.

Location: Atlanta, GA; Miami, FL; Cambridge, MA; San Francisco, CA; Towson, MD

Role Overview

Our technical team supports our customers’ missions with a spirit of innovation across all technologies, including AI, GenAI, LLM, Compute, Storage, Database, Big Data, Application-level Services, Networking, Serverless, Deployment, Security, and more. This is an opportunity to partner with our principal AI Architects, Data Scientists, and Engineers to maintain a robust and secure technical foundation for our customers, ranging from small Biotech companies to large Pharmaceutical firms.

Qualifications

  • Passionate about learning and evolving with current technological trends
  • Engineering degree or related technical discipline, or equivalent work experience
  • Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java)
  • Knowledge of Cloud-based applications and Containerization Technologies
  • Understanding of metric generation, log aggregation, time-series databases, and distributed tracing
  • Experience with industry-standard tools such as Terraform and Ansible
  • Fundamentals in Network Design, Cloud architecture, Security, or Computer Science
  • At least 5 years of hands-on experience in Engineering or Cloud
  • Minimum 5 years of experience with cloud platforms (e.g., GCP, AWS, Azure)
  • At least 3 years of experience in configuration and maintenance of applications or systems infrastructure for large-scale customer-facing companies
  • Experience with distributed system design and architecture

Responsibilities

  • Develop software solutions to support service delivery processes
  • Build and manage CI/CD pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering, and auto-remediation
  • Innovate relentlessly to ensure a flawless customer experience
  • Engage in the lifecycle of services from conception to EOL, including system design
  • Provide consulting and capacity planning
  • Define and deploy standards related to System Architecture, Service Delivery, metrics, and operational automation
  • Support services, product, and engineering teams with tooling and frameworks to increase availability and improve incident response
  • Improve system performance and efficiency through automation and process refinement
  • Collaborate with engineering teams to deliver reliable systems
  • Increase operational efficiency and quality by treating operational challenges as software engineering problems
  • Mentor junior team members and champion Site Reliability Engineering
  • Participate in incident response, including on-call duties
  • Partner with stakeholders to influence technical and business outcomes

Benefits

  • Comprehensive benefits supporting your personal and professional growth, including wellness programs, tuition reimbursement, expense programs, student loan repayment, childcare, and pet insurance
  • Inclusive culture with active employee resource groups and supportive leadership
  • Salary range: $140,300 to $191,550, with variations based on skills, experience, and location
  • Eligibility for short-term and long-term incentives as part of total compensation



  • San Francisco, United States Alchemy Full time

    Our Mission: Our mission is to bring web3 to a billion people, by providing builders with the tools they need to build exceptional onchain products. Alchemy is the only complete developer platform that offers the powerful APIs, SDKs,...


  • San Francisco, United States Rivago Infotech Inc Full time

    Staff Site Reliability Engineer (SRE) Job Responsibilities As our Staff SRE, you'll be the primary expert responsible for our entire compute ecosystem. Your key responsibilities will include: Design, implement, and lead large-scale, cross-functional projects to improve the reliability, performance, and efficiency of our core services and infrastructure (10×...


  • San Francisco, United States Workos Full time

    About WorkOS 🚀 WorkOS builds tools and services for developers to help them implement authentication, identity, authorization, and overall enterprise readiness. We’re a fully distributed team with employees across North American time zones. We’re well-funded, having raised an $80M Series B. Our fast-growing customer base includes hundreds of rapidly...


  • San Francisco, United States Reddit Full time

    Engineering Manager, Site Reliability: As an Engineering Manager for Site Reliability, you will be responsible for ensuring the reliability, performance, efficiency, and resilience of your team's systems and services, as well as working to ensure that the experience of your customers, other internal engineering teams, steadily improves. This includes...


  • San Francisco, United States Air Apps Full time

    About Air Apps: At Air Apps, we believe in thinking bigger and moving faster. We’re a family-founded company on a mission to create the world’s first...


  • San Francisco, United States Runloop Full time

    About Runloop Runloop is building the foundational infrastructure for the next generation of AI development. We provide AI engineers and data scientists with lightning-fast, secure, and reproducible code sandboxes. Our platform enables teams to experiment, iterate, and deploy their projects without the friction of environment setup and dependencies. We are a...


  • San Francisco, United States SOLANA FOUNDATION Full time

    Our Mission: Our mission is to bring web3 to a billion people, by providing builders with the tools they need to build exceptional onchain products. Alchemy is the only complete developer platform that offers the powerful APIs, SDKs, and tools...


  • San Francisco, United States Runloop AI, Inc Full time

    About Runloop Runloop is building the foundational infrastructure for the next generation of AI development. We provide AI engineers and data scientists with lightning-fast, secure, and reproducible code sandboxes. Our platform enables teams to experiment, iterate, and deploy their projects without the friction of environment setup and dependencies. We are a...


  • San Francisco, United States Cypress HCM Full time

    Site Reliability Engineer: As a Site Reliability Engineer (Contractor), you will be a hands-on contributor, focused on supporting and improving the reliability of our AWS cloud infrastructure. You will apply core SRE principles to automate operational tasks, monitor system health, and participate in incident response. This role is execution-focused, supporting...


  • San Francisco, United States ConductorOne Full time

    ConductorOne is the first AI-native identity security platform that protects every identity: human, non-human, and AI. With powerful automation, platform-level AI, and out-of-the-box connectors, it centralizes access visibility, enforces fine-grained controls, enables just-in-time access, and automates user access reviews across all apps. It's easy to use,...