Staff Site Reliability Engineer

7 days ago

Seattle, Washington, United States Crisis Text Line, Inc. Full time

Job Description

This is a remote position.

Role

Job Summary: As a Staff Site Reliability Engineer (SRE), reporting to the Senior Engineering Manager of SRE/Infrastructure, you will be a key technical leader ensuring the reliability, scalability, and security of our platform. In this role, you will play a strategic part in architecting, building, and maintaining the tooling that empowers our software engineering teams and managing the infrastructure that supports our staff and volunteers in delivering the Crisis Text Line service. You will collaborate closely with developers to drive performance optimization, implement best practices, and ensure a secure environment. With a significant focus on enhancing engineer productivity through automation and streamlined workflows, you'll directly contribute to our mission of supporting texters, volunteers, and staff. This position requires extensive experience in infrastructure management, automation, and Site Reliability Engineering (SRE) practices.

Key Responsibilities:

Help lead and mentor a team of 5 SREs, fostering a collaborative and innovative work environment.
Work closely with the 3 staff in TechOps/Security to enforce security best practices across the infrastructure and development processes.
Design, implement, and maintain our highly available and scalable AWS infrastructure that powers our service.
Collaborate with developers to optimize application performance and reliability.
Develop and maintain monitoring, logging, and alerting systems to ensure system health and performance.
Automate repetitive tasks and processes to improve efficiency and reduce manual intervention.
Respond to and resolve incidents, minimizing downtime and ensuring quick recovery.
Support and encourage a diversity of backgrounds, voices, and perspectives on the engineering team.
Proactively communicate expectations, progress, and issues to engineers, product managers, and other colleagues with clarity and kindness, delivering and receiving feedback respectfully.
Spread knowledge, provide mentorship, and promote technical best practices.
Learn both independently and from your colleagues, stretch yourself, and grow as an engineer and teammate.
Write and review high-quality, easy-to-read, and testable code that follows best practices.
Manage time successfully by focusing on priorities, delivering on deadlines, and asking for help when stuck.
Provide engineering input and estimate work during both refinement and architecture design.
Participate in retrospectives and post-mortems to improve our processes and operations.
Conduct regular security audits and vulnerability assessments, addressing any identified issues.
Stay up-to-date with industry trends and emerging technologies, recommending and implementing improvements as needed.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or related field (Master's degree preferred).
Proven experience as a Staff SRE or in a similar SRE role, with experience in observability and chaos engineering and a strong focus on infrastructure and DevOps in a software delivery capacity.
Experience maintaining the reliability of online SaaS/PaaS with a 24/7 schedule.
Proficiency in AWS and infrastructure as code (e.g., Terraform, CloudFormation).
Strong scripting and automation skills and in-depth knowledge of containerization and orchestration (e.g., Docker, Kubernetes).
Proven experience implementing CI/CD pipelines and tools (GitHub Actions) and observability tools (Datadog).
A commitment to ethical practices, data privacy, and security.
Solid understanding of network protocols, security principles, and best practices.
Excellent problem-solving skills and the ability to work under pressure.
Ability to learn quickly and manage your time successfully by focusing on priorities, delivering on deadlines, and asking for help when needed.
Strong communication skills, with the ability to collaborate effectively with cross-functional teams.
Demonstrated understanding of essential computer science principles and how to apply them to solve problems, including basic data structures, control structures, and functions.

Preferred Qualifications:

Master's degree in Computer Science, Engineering, or a related field, or equivalent experience.
Experience implementing Failure Injection / Chaos Engineering practices.
Cloud Solution Architect certifications or completed training (e.g., AWS Cloud Practitioner Essentials and/or AWS Certified Solutions Architect - Associate), or equivalents for GCP or Azure.
Strong experience with AWS Solution Architecture across various web applications and APIs, Databricks, and AI/ML workloads.
Knowledge of compliance and regulatory standards (e.g., GDPR, HIPAA, ISO 27001, SOC2, etc.).
Experience in a non-profit or mission-driven organization.

Benefits:

Crisis Text Line employee benefits are thoughtfully designed using an equity lens, acknowledging that we are all unique human beings with individual life circumstances that require flexibility and support.

Benefits include:

20 paid holidays including:
- Federal holidays like Juneteenth and Labor Day
- Election day
- Holiday break from Dec 24 through January 1
- 2 renewal days
- 2 floating holidays
Flexible paid time off, including:
- 15 vacation days
- 3 personal days
- 7 sick days
Medical, dental, and vision benefits for the staff member and family at no cost to the employee
403(b) retirement plan (the nonprofit equivalent of a 401(k)): 3% contribution by Crisis Text Line to support building financial wellness, regardless of personal contribution
12 weeks paid parental leave (after 6 months of employment)
Student loan repayment (after 2 years of continuous full-time service)
Family support through a virtual childcare platform
Stipends/Allowances
- Mental health (Monthly)
- Internet Service (Monthly)
- Professional Development (Annual)
- Wellness (Annual)
- Home office setup (One time/First year)

(Benefits are only for US-based employees. International benefits may differ.)


Staff Site Reliability Engineer

1 day ago

Seattle, Washington, United States DAT Freight & Analytics Full time

About DAT: DAT is an award-winning employer of choice and a next-generation SaaS technology company that has been at the leading edge of innovation in transportation supply chain logistics for 45 years. We continue to transform the industry year over year, by deploying a suite of software solutions...
Staff Site Reliability Engineer

3 days ago

Seattle, Washington, United States Dat Services Inc Full time

About DAT: DAT is an award-winning employer of choice and a next-generation SaaS technology company that has been at the leading edge of innovation in transportation supply chain logistics for 45 years. We continue to transform the industry year over year, by deploying a suite of software solutions to millions of customers every day - customers who depend on...
Site Reliability Engineer

3 days ago

Seattle, Washington, United States Sogeti USA Full time

Site Reliability Engineer, FTE with benefits. Our team is looking to add an experienced Site Reliability / DevOps Engineer. Experienced with Python and Shell Scripting. Should have extensive experience with Azure or AWS (Azure preferred). Experience with Monitoring and Observability - Datadog. Experience with Infrastructure as Code -...
Site Reliability Engineer

4 weeks ago

Seattle, Washington, United States Sogeti Full time

Site Reliability Engineer, FTE with benefits. Our team is looking to add an experienced Site Reliability / DevOps Engineer. Experienced with Python and Shell Scripting. Should have extensive experience with Azure or AWS (Azure preferred). Experience with Monitoring and Observability - Datadog. Experience with Infrastructure as Code -...
Site Reliability Engineer

2 weeks ago

Seattle, Washington, United States HCLTech Full time

Site Reliability Engineer (Architect). Primary skills: expertise in Azure, Linux (RHEL), Azure DevOps, AKS, Terraform, Git, RHEL-Tuna, Artifactory, Puppet, and ArgoCD. Additionally, a strong understanding of Compute, Network, and Storage is essential. On...
Site Reliability Engineering Lead

7 days ago

Seattle, Washington, United States Apple Full time

The Site Reliability Engineering team at Apple is responsible for ensuring the reliability and scalability of our object storage services. As a Senior Site Reliability Engineer, you will work closely with cross-functional teams to design, develop, and maintain large-scale cloud-based systems. Your primary focus will be on optimizing system performance,...
Site Reliability Engineer

2 weeks ago

Seattle, Washington, United States EVONA Full time

This range is provided by EVONA. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. Base pay range: $180,000.00/yr - $180,000.00/yr. Head of DevOps and Cyber Security @ EVONA | Aerospace | Defense | Deep Tech. Site Reliability Engineer (SRE). About the Role: As a Site Reliability Engineer (SRE), you will be...
Site Reliability Engineer

2 weeks ago

Seattle, Washington, United States Name Tag Full time

Job Title: Site Reliability Engineer. Location: Remote-first. Job Type: Full-Time. Summary: Nametag is seeking a skilled Site Reliability Engineer to ensure the reliability, scalability, and security of our comprehensive product stack. This role requires a deep understanding of modern infrastructure, a passion for building reliable systems, and a commitment to...
Staff Site Reliability Engineer, Threat Detection

2 weeks ago

Seattle, Washington, United States Gemini Full time

About the Company: Gemini is a global crypto and Web3 platform founded by Tyler Winklevoss and Cameron Winklevoss in 2014. Gemini offers a wide range of crypto products and services for individuals and institutions in over 70 countries. Crypto is about giving you greater choice, independence, and opportunity. We are here to help you on your journey. We build...
Site Reliability Engineer-Remote

3 weeks ago

Seattle, Washington, United States Georgia IT Inc Full time

Site Reliability Engineer. Location: Remote; must be willing to work PST; high preference for someone local to Seattle. Duration: 12 months. Rate: DOE. US Citizens, Green Cards, and GC-EAD only. No third-party C2C available for this job. 8-10+ years of Site Reliability / DevOps Engineering. Experienced with PowerShell Scripting. Should have extensive experience...
Sr Site Reliability Engineer

2 weeks ago

Seattle, Washington, United States Energy Jobline Full time

Job Description Overview: The Senior Site Reliability Engineer plays a critical role in ensuring the reliability, scalability, and performance of our systems and services. They are responsible for designing and implementing tools and automated solutions to improve system reliability, monitoring, and incident response. Key Responsibilities: Develop and...
Compute Site Reliability Engineer

7 days ago

Seattle, Washington, United States Apple Full time

Compute Site Reliability Engineer (SRE) - Kubernetes Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Join the Apple Services Engineering team as a site reliability engineer to...
Site Reliability Engineering Leader

2 weeks ago

Seattle, Washington, United States Apple Full time

Site Reliability Engineering Leader - Security, Apple Service Engineering. People at Apple don't just build products - they craft the kind of experiences that have revolutionized entire industries. The diverse collection of our people and their ideas inspire innovation in everything we do. Imagine what you could do here. Join Apple, and help us leave the...
Site Reliability Engineer

3 days ago

Seattle, Washington, United States Kanshe Infotech Full time

Job Title: Site Reliability Engineer (SRE) Location: Seattle, WA (Onsite - Relocation Required) About the Role: We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to ensure the reliability, performance, and scalability of our systems on the Azure cloud platform. The ideal candidate will possess deep expertise in Azure,...
Site Reliability Engineer

2 weeks ago

Seattle, Washington, United States Tik Tok Full time

Site Reliability Engineer - Video Platform - USDS (SEA). Responsibilities. About TikTok U.S. Data Security: TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. U.S. Data Security ("USDS") is a subsidiary of TikTok in the U.S. This new, security-first division was created to bring heightened focus and...
Site Reliability Engineering Leader

3 days ago

Seattle, Washington, United States Apple Full time

Summary: People at Apple don't just build products - they craft the kind of experiences that have revolutionized entire industries. The diverse collection of our people and their ideas inspire innovation in everything we do. Imagine what you could do here. Join Apple, and help us leave the world better than we found it. The Apple Services Engineering (ASE) team...
Senior Site Reliability Engineer

4 weeks ago

Seattle, Washington, United States Apple Full time

Senior Site Reliability Engineer - ASE. Seattle, Washington, United States. Software and Services. Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Join Apple's Cloud Service...
Senior Site Reliability Engineer

7 days ago

Seattle, Washington, United States Apple Full time

Summary. Posted: Aug 21, 2024. Role Number: 200564619. Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Join Apple's Cloud Service Infrastructure team as a site reliability engineer...
