Senior Site Reliability Engineer
1 week ago
Description

Join EPAM as a Senior Site Reliability Engineer specializing in AWS. In this role, you'll ensure the reliability and availability of fleet services under the SRE model. If you have a good track record of highly scalable, distributed systems projects and previous experience working as an SRE, we'd love to hear from you.

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Responsibilities
Collaborating with service teams to improve the reliability and efficiency of workloads and services using SRE practices
Organizing and participating in wargames/gamedays and enhancing observability practices and tooling
Developing and improving CI/CD processes to enhance release cadence and success
Building and consuming a toil backlog, automating toilsome tasks, and documenting knowledge and processes
Writing code that improves scalability, performance, maintainability, and security

Requirements
Senior engineer with a good track record of highly scalable, distributed systems projects in the past 5 years
Previous experience working as an SRE and a good understanding of SRE methodologies and philosophies
AWS cloud expertise and experience running multi-region workloads
Observability experience with distributed services, for example, experience with distributed tracing and similar concepts
Strong programming and automation experience: Python, Golang

We Offer
Career plan and real growth opportunities
Unlimited access to LinkedIn Learning solutions
International Mobility Plan within 25 countries
Constant training, mentoring, online corporate courses, eLearning and more
English classes with a certified teacher
Support for employee initiatives (Algorithms club, Toastmasters, agile club and more)
Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
Flexible work schedule and dress code
Collaborate in a multicultural environment and share best practices from around the globe
Hired directly by EPAM & % under payroll
Law benefits (IMSS, INFONAVIT, 25% vacation bonus)
Major medical expenses insurance: Life, Major medical expenses with dental & visual coverage (for the employee and direct family members)
13% employee savings fund, capped to the law limit
Grocery coupons
30-day December bonus
Employee Stock Purchase Plan
12 vacation days plus 4 floating days
Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Friday, November 2nd, December 24th & 31st)
Relocation bonus: transportation, 2 weeks of accommodation for you and your family and more
Monthly non-taxable amount for the electricity and internet bills
-
Senior Site Reliability Engineer
5 days ago
Remote, Oregon, United States Careviso Full time
Senior Site Reliability Engineer
Location: Remote in the United States
About the Role
We're looking for a Senior Site Reliability Engineer or DevOps Engineer to join our small but growing infrastructure team. You'll work alongside our existing Site Reliability team to build and maintain the systems that keep our platform reliable, secure, and observable....
-
Senior Site Reliability Engineer
1 week ago
Remote, United States Webflow Full time
At Webflow, our mission is to bring development superpowers to everyone. Webflow is the leading visual development platform for building powerful websites without writing code. By combining modern web development technologies into one platform, Webflow enables people to build websites visually, saving engineering time, while clean code seamlessly generates...
-
AWS - Site Reliability Engineer
2 days ago
Remote, US EPAM Full time
Join EPAM as an AWS SRE. In this role, you'll collaborate with service teams to improve the reliability and efficiency of workloads and services using SRE practices. If you're a senior engineer with a good track record of highly scalable, distributed systems projects in the past 5 years, we'd love to hear from you. EPAM is a leading...
-
Senior Site Reliability Engineer
5 days ago
Remote, US EPAM Full time
Step into the future of cloud technology as a Senior Site Reliability Engineer specializing in Azure Data DevOps at our innovative IT company. This pivotal role offers the opportunity to design and manage cutting-edge Azure cloud infrastructure, ensuring the seamless performance and reliability of data-intensive applications. If you...
-
Staff Site Reliability Engineer
1 day ago
Remote, US Crisis Text Line Full time
Crisis Text Line provides free, 24/7, high-quality text-based mental health support and crisis intervention by empowering a community of trained volunteers to support people in their moments of need. Our mission is at the intersection of empathy and innovation: we promote mental well-being for people wherever they are. Our vision is an empathetic world...
-
Azure DevOps Site Reliability Engineer
5 days ago
Remote, US EPAM Full time
Are you a skilled Azure DevOps Site Reliability Engineer with a passion for ensuring business continuity and helping businesses always be near their clients? Do you have experience in optimizing and supporting OSDU deployment, performing monitoring, including incident resolution, and suggesting improvements? If so, we have an exciting...
-
Principal Site Reliability Engineer
5 days ago
Remote, Oregon, United States Priority Technology Holdings, LLC Full time
Job title: Principal Site Reliability Engineer
Reports to: Director, Site Reliability Engineering
Department: Engineering
Location: Remote
Grade: 21
About Priority: Priority Technology Holdings, Inc. is a leading financial technology company on a mission to deliver a personalized, easy-to-adopt financial toolset that accelerates cash flow and optimizes working...
-
Remote, United States Upstart Full time
About Upstart
Upstart is the leading AI lending marketplace partnering with banks and credit unions to expand access to affordable credit. By leveraging Upstart's AI marketplace, Upstart-powered banks and credit unions can have higher approval rates and lower loss rates across races, ages, and genders, while simultaneously delivering the exceptional...
-
Site Reliability Engineer
3 days ago
Remote, US EPAM Full time
We are seeking a Site Reliability Engineer (Azure) to join our team.
Responsibilities
As a Lead Azure SRE, you will be responsible for driving the reliability, performance, and scalability of cloud-based applications and services. Your expertise in Kubernetes, scripting, troubleshooting, and observability will be instrumental in...
-
Senior Software Engineer, Site Reliability
21 hours ago
Remote, Oregon, United States BABYLIST Full time
Who We Are
Babylist is the leading registry, e-commerce, and content platform for growing families. More than 9 million people shop with Babylist every year, making it the go-to destination for seamless purchasing, trusted guidance, and expert product recommendations for new parents and the people who love them. What began as a universal registry has grown...