Site Reliability Engineer

2 weeks ago

Town of Poland, United States XM Full time

Site Reliability Engineers (SRE) - Multiple Openings

The Role:
You will join a team working with Observability, Escalations, Post-mortems, Correction of Errors, and other practices that will contribute to the company's goal of cloud resiliency. You will be responsible for driving processes around reliability, best practices, cultural change, and enforcement of these practices.

Main responsibilities of the position include:
- Honor and practice the Resiliency pillar of the Well Architected Framework in all tasks and responsibilities
- Conduct Chaos Engineering experiments and relevant exercises to improve resiliency and fault-tolerance
- Research workloads for migrating to the cloud with minimal disruption and impact
- Monitor cloud migration projects to ensure seamless transitions
- Design, consult, re-platform, and re-factor the observability of current cloud infrastructure
- Coordinate with other IT departments and teams regarding observability for both individual and organizational needs
- Regularly assess cloud deployments for compliance with the company’s standards and best practices
- Investigate and correct areas where observability is lagging
- Stay up to date and provide training on new and current technologies, services, tools, methodologies, and practices
- Occasionally participate in service capacity planning, software performance analysis, and system tuning
- Mentor colleagues in technical skills and knowledge
- Analyze, oversee, and remediate the company’s resiliency
- Participate in on-call support 24/7 based on a rotation schedule

Main requirements:
- BSc/MSc degree in Computer Science or related field
- 5+ years of cloud services experience, with at least 3 years on AWS cloud
- 3+ years of experience in SRE or a similar role
- Experience with monitoring, APM, logging, and notification tools
- Familiarity with incident, problem and change management procedures and practices
- Advanced knowledge of SRE practices and methods
- Understanding and practice of Service Levels
- Strong troubleshooting skills and the ability to mentor others
- Extensive experience with Kubernetes and related technologies, services, and ecosystem
- Advanced knowledge of CI/CD, Infrastructure as Code (IaC) concepts and tools, especially HCL Terraform and AWS CloudFormation
- Experience with versioning tools like Git
- Strong organizational and documentation skills
- Exceptional time management and research abilities
- Advanced Linux, networking, and scripting skills

The following will be considered an advantage:
- Experience with platforms like Kafka (MSK)
- Experience with RDBMSs, particularly Postgres and MySQL
- Knowledge of scripting languages such as Python or Go

Benefit from:
- Attractive remuneration package and perks
- Intellectually stimulating work environment
- Continuous personal development and international training opportunities

The Hiring Experience: What Awaits You
- Show Your Skills – Online Technical Challenge
- Let’s Connect – Intro Chat with Talent Acquisition
- Deep Dive – First Interview with Your Future Team
- Final Connection – Final Interview

All applications will be treated with strict confidentiality

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology

Site Reliability Engineer

3 weeks ago

Town of Poland, United States DevOps projects Full time

Site Reliability Engineer. Job Overview: As a Site Reliability Engineer (SRE) at Ververica, you will design, provision, and maintain the infrastructure for Ververica’s Unified Streaming Data Platform across multiple cloud providers, including AWS, GCP, and Azure. Your role will involve architectural improvements, implementation ownership, and driving...
Site Reliability Engineer

3 weeks ago

Town of Poland, United States Mirantis Full time

About Mirantis: Mirantis is a Kubernetes-native AI infrastructure company that enables organizations to build and operate scalable, secure, and sovereign infrastructure for modern AI, machine learning, and data‑intensive applications. By combining open‑source innovation with deep expertise in Kubernetes orchestration, Mirantis empowers platform...
Site Reliability Engineer

3 weeks ago

Town of Poland, United States E-Solutions Full time

Site Reliability Engineer: Build and maintain SRE dashboards using SLIs to measure and monitor SLO adherence. Define and implement auto-healing, resilient, and fault-tolerant systems from design through production. Serve as the primary contact for production application issues, coordinating with engineering teams to resolve incidents efficiently. Diagnose and...
Lead Site Reliability Engineer with Dynatrace

3 weeks ago

Town of Poland, United States EPAM Systems Full time

We are seeking a Lead Site Reliability Engineer to enhance and migrate observability solutions using Dynatrace. You will play a key role in establishing advanced monitoring frameworks and deploying AI‑driven anomaly detection to improve system reliability. This...
Senior Site Reliability Engineer Krakow

1 week ago

Town of Poland, United States VGW Malta Limited Full time

VGW is an interactive entertainment company, harnessing technology and creativity to deliver world-class, free-to-play online social games. We have an exciting opportunity to join our Engineering team in Poland and are currently looking for a Senior Site Reliability Engineer to join the team. You'll focus on ensuring the reliability of our systems as we bring...
Lead Site Reliability Engineer

3 weeks ago

Town of Poland, United States Coupa Software, Inc. Full time

Coupa makes margins multiply through its community-generated AI and industry-leading total spend management platform for businesses large and small. Coupa AI is informed by trillions of dollars of direct and indirect spend data across a global network of 10M+ buyers and suppliers. We empower you with the ability to predict, prescribe, and automate smarter,...
Sr. Site Reliability Engineer, 100% Remote Work

3 weeks ago

Town of Poland, United States PRIMUS Global Technologies Pvt Ltd Full time

Sr. Site Reliability Engineer, 100% Remote Work (Poland). 6 months contract to hire. Bill Rate: $49.00/hr. USD (From Apex to PRIMUS US) – Cannot go above this bill rate. Client is ABBYY. Interview Process: 2 Technical Video Interviews. Important note: Candidates must be in...
Site Reliability Engineering Lead

2 weeks ago

Town of Charlotte, United States National Black MBA Association Full time

Job Description: At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day. Being a Great Place to Work is core to how we drive Responsible Growth. This includes our...
Site Reliability Engineer

2 weeks ago

Town of Belgium, United States Intigriti Full time

Your mission: As a Site Reliability Engineer, you are part of the Product & Engineering team at Intigriti. In your day-to-day activities, you ensure the continuous availability of our development pipeline and cloud infrastructure. In a proactive way, you safeguard our cloud environment by analyzing, implementing, and delivering qualitative ‘cloud...
Blockchain Site Reliability Engineer

3 weeks ago

Town of Texas, United States InfStones Full time

Blockchain Site Reliability Engineer. Location: Dallas, TX, USA (Remote Acceptable - USA Applicants Only). Company: InfStones (https://infstones.com/). Contact: recruiter-usa@infstones.com. About Company: InfStones is an advanced, enterprise-grade Platform as a Service (PaaS) blockchain infrastructure provider trusted by the top blockchain companies in the world....
