Site Reliability Engineer 1

2 days ago


Boston, United States WEX, Inc. Full time

About the Team/Role

The WEX Site Reliability Engineering (SRE) team is looking for a motivated and quick-learning Level 1 Site Reliability Engineer to join our growing team. We are passionate about developing software and solutions for observability, incident response, reliability, performance, operational excellence, and compliance. As a member of the SRE organization, you will support internal stakeholders and Engineering teams, tackling complex challenges and enhancing our engineering teams' and customers' experience. You will have the opportunity to work alongside experienced SREs and gain valuable hands-on experience in a dynamic and supportive environment.

As a Level 1 SRE at WEX, you will:

  1. Learn the fundamentals of SRE: Gain a solid understanding of core SRE principles, including monitoring, incident management, and automation.
  2. Develop basic automation scripts: Use scripting languages like Python or Bash to automate simple tasks and improve operational efficiency.
  3. Triage and resolve incidents: Participate in on-call rotations, assisting with the identification and resolution of incidents under the guidance of senior SREs.
  4. Monitor system health: Utilize monitoring tools to identify and escalate potential issues, ensuring the stability and performance of our systems.
  5. Collaborate with development teams: Work closely with software engineers to understand their systems and provide operational support.
  6. Contribute to documentation: Help maintain and improve internal documentation, including runbooks, knowledge base articles, and playbooks.
  7. Continuously learn and grow: Expand your knowledge of cloud technologies, DevOps practices, and SRE tools through internal and external training opportunities.

How you'll make an impact

  1. Develop a basic understanding of code, networking, operating systems, and storage solutions: You'll be able to identify and troubleshoot common issues related to these areas.
  2. Assist in developing automation and utilizing monitoring tools to ensure system reliability: You'll learn how to use tools to automate tasks and monitor system health.
  3. Participate in incident response and troubleshooting alongside senior SREs: You'll gain experience in identifying, escalating, and resolving incidents.
  4. Participate in 24x7 Site Reliability rotations and escalation workflows with guidance from senior team members: You'll learn how to respond to incidents and escalate issues appropriately.
  5. Learn to identify and address basic performance bottlenecks: This will include understanding code optimization, configuration changes, and infrastructure upgrade recommendations.
  6. Collaborate with development teams to ensure software design meets operational requirements: You'll learn how to communicate effectively with developers and advocate for operational best practices.
  7. Assist with support requests from other engineering teams so that their operational needs are met: You'll gain experience in providing support and collaborating with different teams.
  8. Contribute to the continuous improvement of processes and procedures to increase system reliability and efficiency: You'll participate in team discussions and contribute ideas for improvement.
  9. Stay up-to-date with the latest industry trends and technologies: You'll be encouraged to learn new technologies and share your knowledge with the team.

Experience you'll bring

  1. Basic understanding of at least one major programming language (C#, Java, Go, or Python): You should be able to read and understand code, and write scripts.
  2. Familiarity with a Cloud Computing platform (AWS, Azure, or GCP): You should have a basic understanding of cloud concepts and services.
  3. Strong communication and collaboration skills: You'll be working closely with different teams, so effective communication is essential.
  4. BA/BS degree in Computer Science or a related technical field, or equivalent job experience: A strong foundation in computer science principles is important.

Nice to have

  1. Basic understanding of infrastructure as code, preferably Terraform: Familiarity with IaC concepts and tools is a plus.
  2. Working knowledge of RESTful APIs: Understanding how APIs work is beneficial.
  3. Exposure to observability and logging technologies: Any experience with monitoring and logging tools is helpful.
  4. Experience with at least one major RDBMS and NoSQL data store: Familiarity with databases is a plus.
  5. Exposure to containerization technologies such as Docker or Kubernetes: Basic knowledge of containers and orchestration is beneficial.
  6. Familiarity with GitOps: Understanding of GitOps principles is helpful.
