Senior Site Reliability Engineer

1 week ago


remote us Epam Full time

Description

Join EPAM as a Senior Site Reliability Engineer specializing in AWS. In this role, you'll ensure the reliability and availability of fleet services under the SRE model. If you have a good track record of highly scalable, distributed systems projects and previous experience working as an SRE, we'd love to hear from you.

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Responsibilities

  • Collaborating with service teams to improve the reliability and efficiency of workloads and services using SRE practices
  • Organizing and participating in wargames/gamedays and enhancing observability practices and tooling
  • Developing and improving CI/CD processes to enhance release cadence and success
  • Building and consuming the toil backlog, automating toilsome tasks, and documenting knowledge and processes
  • Writing code that improves scalability, performance, maintainability, and security

Requirements

  • Senior Engineers with a good track record of highly scalable, distributed systems projects in the past 5 years
  • Previous experience working as an SRE and a good understanding of SRE methodologies and philosophies
  • AWS cloud expertise and experience running multi-region workloads
  • Observability experience with distributed services, for example, experience with distributed tracing and similar concepts
  • Strong programming and automation experience: Python, Golang

We Offer

  • Career plan and real growth opportunities
  • Unlimited access to LinkedIn learning solutions
  • International Mobility Plan within 25 countries
  • Constant training, mentoring, online corporate courses, eLearning and more
  • English classes with a certified teacher
  • Support for employee initiatives (Algorithms club, toastmasters, agile club and more)
  • Enjoyable working environment (Gaming room, napping area, amenities, events, sport teams and more)
  • Flexible work schedule and dress code
  • Collaborate in a multicultural environment and share best practices from around the globe
  • Hired directly by EPAM & % under payroll
  • Law benefits (IMSS, INFONAVIT, 25% vacation bonus)
  • Major medical expenses insurance: Life, Major medical expenses with dental & visual coverage (for the employee and direct family members)
  • 13% employee savings fund, capped to the law limit
  • Grocery coupons
  • 30 days December bonus
  • Employee Stock Purchase Plan
  • 12 vacation days plus 4 floating days
  • Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Friday, November 2nd, December 24th & 31st)
  • Relocation bonus: transportation, 2 weeks of accommodation for you and your family and more
  • Monthly non-taxable amount for the electricity and internet bills



  • Remote, Oregon, United States Careviso Full time

    Senior Site Reliability Engineer. Location: Remote in the United States. About the Role: We're looking for a Senior Site Reliability Engineer or DevOps Engineer to join our small but growing infrastructure team. You'll work alongside our existing Site Reliability team to build and maintain the systems that keep our platform reliable, secure, and observable....


  • Remote, United States Webflow Full time

    At Webflow, our mission is to bring development superpowers to everyone. Webflow is the leading visual development platform for building powerful websites without writing code. By combining modern web development technologies into one platform, Webflow enables people to build websites visually, saving engineering time, while clean code seamlessly generates...


  • remote, us Epam Full time

    Description: Join EPAM as an AWS SRE. In this role, you'll collaborate with service teams to improve the reliability and efficiency of workloads and services using SRE practices. If you're a senior engineer with a good track record of highly scalable, distributed systems projects in the past 5 years, we'd love to hear from you. EPAM is a leading...


  • remote, us Epam Full time

    Description: Step into the future of cloud technology as a Senior Site Reliability Engineer specializing in Azure Data DevOps at our innovative IT company. This pivotal role offers the opportunity to design and manage cutting-edge Azure cloud infrastructure, ensuring the seamless performance and reliability of data-intensive applications. If you...


  • remote, us Crisis Text Line Full time

    Crisis Text Line provides free, 24/7, high-quality text-based mental health support and crisis intervention by empowering a community of trained volunteers to support people in their moments of need. Our mission is at the intersection of empathy and innovation: we promote mental well-being for people wherever they are. Our vision is an empathetic world...


  • remote, us Epam Full time

    Description: Are you a skilled Azure DevOps Site Reliability Engineer with a passion for ensuring business continuity and helping businesses always be near their clients? Do you have experience in optimizing and supporting OSDU deployment, performing monitoring including incident resolution, and suggesting improvements? If so, we have an exciting...


  • Remote, Oregon, United States Priority Technology Holdings, LLC Full time

    Job title: Principal Site Reliability Engineer. Reports to: Director, Site Reliability Engineering. Department: Engineering. Location: Remote. Grade: 21. About Priority: Priority Technology Holdings, Inc. is a leading financial technology company on a mission to deliver a personalized, easy-to-adopt financial toolset that accelerates cash flow and optimizes working...


  • Remote, United States Upstart Full time

    About Upstart: Upstart is the leading AI lending marketplace partnering with banks and credit unions to expand access to affordable credit. By leveraging Upstart's AI marketplace, Upstart-powered banks and credit unions can have higher approval rates and lower loss rates across races, ages, and genders, while simultaneously delivering the exceptional...


  • remote, us Epam Full time

    Description: We are seeking a Site Reliability Engineer (Azure) to join our team. Responsibilities: As a Lead Azure SRE, you will be responsible for driving the reliability, performance, and scalability of cloud-based applications and services. Your expertise in Kubernetes, scripting, troubleshooting, and observability will be instrumental in...


  • Remote, Oregon, United States BABYLIST Full time

    Who We Are: Babylist is the leading registry, e-commerce, and content platform for growing families. More than 9 million people shop with Babylist every year, making it the go-to destination for seamless purchasing, trusted guidance, and expert product recommendations for new parents and the people who love them. What began as a universal registry has grown...