No more applications are being accepted for this job

Site Reliability Engineer

4 weeks ago

Boston, United States Material Bank Full time

--

An SRE is responsible for maintaining reliability. That means facilitating automated, streamlined, and efficient error responses and reducing human error at scale. SREs spend a lot of time removing pain points, configuring internal tools, and setting and testing system benchmarks. They also develop and monitor robust engineering pipelines for everyday product operability.

Site Reliability Engineers at Material Bank are responsible for performance, availability, reliability, efficiency, change management, monitoring, and emergency response of a system. Other core tasks of SREs include:

Establishing and monitoring Service-Level Indicators (SLIs) and setting Service-Level Objectives (SLOs)

– SREs facilitate proper SLIs for efficient performance through proper resource utilization, with minimal errors. They also set SLOs for reviewing internal targets, such as high availability.

Risk assessments and error budgeting

– SREs are responsible for establishing the reliability target for systems, even taking measured risks with subsequent product launches.

Monitoring outputs

– Ticketing, logging, and alerts (each signifying a different level of required human action) are critical tasks for an SRE.

Demand forecasting and capacity planning

– Projects require careful assessments to plan for future demand, outages, and emergencies. An SRE works in conjunction with product heads to perform these tasks.

Collaboration

– SREs must collaborate with many diverse teams, sharing best practices and reviewing reliability decisions to improve cross-departmental product development.

Writing Postmortems

– Postmortem reports help the team learn from incidents to prevent their recurrence.

What you'll do:

Lead the handling of the ticket queue (Material Bank production issues) for AWS and GCP corporate infrastructure requests from team members. These range from simple IAM and Route 53 DNS requests to designing and deploying new scalable application infrastructure.

Be on an on-call (Opsgenie) rotation to respond to incidents that impact GitLab.com availability, and provide support for service engineers with customer incidents.

Use your on-call shift to prevent incidents from ever happening.

Run our infrastructure with Ansible, Terraform, ArgoCD, Bitbucket CI/CD, and Kubernetes.

Build monitoring with New Relic that alerts on symptoms rather than on outages.

Document every action so your findings turn into repeatable actions and then into automation.

Improve operational processes (such as deployments and upgrades) to make them as boring as possible.

Design, build, and maintain core infrastructure that enables Material Bank to scale to support thousands of concurrent users.

Debug production issues across services and levels of the stack.

Plan the growth of Material Bank’s infrastructure.

What you’ll get from us:

Our people: If you thrive in an inclusive, innovative, and fast-paced organization, look no further. You will get to work alongside some of the brightest minds. Join a genuinely fun and supportive workplace where we keep our employees consistently engaged through internal communication and corporate events.

Relaxation and Celebrations: Generous PTO, sick days, paid national holidays, and even more (ask us about this when we connect).

Health Benefits: We contribute to your medical, dental, vision, and short-term/long-term disability plans and have a strong employee assistance program.

Plan for your Retirement: 401(k) eligible after your first 90 days of employment.

Giving Back: We sponsor multiple events throughout the year to help out our communities. You will receive time off to give back as well.

Growth: We’ll help you take your career to the next level. We want you to be creative and take initiative, which will allow you to grow and create within the company. Most importantly, be the best at what matters.

Flexible Work Schedules: With business units and employees across the globe, Material Technologies has embraced a hybrid working model, allowing department leaders to decide on the best approach for their respective teams, whether that be remote, in person, or a little of both.


