Lead SRE

3 weeks ago


St Albans, United States Paysera Full time

As the Lead SRE at Paysera, you will be responsible for ensuring the availability, performance, and security of Paysera's IT infrastructure, systems, and applications. You will work closely with our development teams and system administrators to provide guidance and support for designing and deploying applications that meet Paysera's high availability and reliability standards. The ideal candidate is passionate about building scalable systems, has a deep understanding of system architecture, and is committed to improving uptime and service quality.

What you will do:

Design and implement processes that ensure the high availability and performance of Paysera's systems;

Collaborate with the engineering teams to advocate for and implement reliability practices during system design and development;

Establish proactive monitoring systems and practices to detect and prevent potential issues before they escalate;

Analyse system trends and usage to predict potential future issues;

Build and lead the incident management processes;

Lead efforts to quickly resolve any system outages, ensuring minimal impact on customers;

Drive the improvement of Mean Time To Detection (MTTD) and Mean Time To Recovery (MTTR) through effective monitoring, alerting, and response processes;

Set and work towards achieving targets for Mean Time Between Failures (MTBF) and system Service Level Agreements (SLAs), aiming for an SLA of 99.9% for critical systems;

Regularly review and report on performance metrics, ensuring that systems are consistently meeting set standards and goals;

Conduct post-mortem reviews of any system outages, derive insights, and drive process and tooling improvements;

Foster a culture of continuous improvement within the team and across the organisation.

What we expect:

A minimum of 5 years of experience in Site Reliability Engineering, System Administration, Incident Management, or a closely related field;

Demonstrated experience in designing and managing the reliability of large-scale systems;

Familiarity with modern infrastructure technologies and deployment processes;

Strong proficiency in monitoring tools and methodologies: ELK, Grafana, New Relic, Datadog, Zabbix;

Strong knowledge of networking and security protocols such as TCP/IP, HTTP/HTTPS, SSL/TLS, etc.;

Strong experience with containerisation technologies such as Docker and Kubernetes;

Strong problem-solving skills with a proactive approach to issue resolution;

Ability to work efficiently under pressure and manage multiple priorities;

Excellent communication skills, with the ability to explain complex technical issues to non-technical stakeholders;

A collaborative team player with a strong desire to mentor and share knowledge;

Fluency in English.

For candidates

If you would like to join our team, please send your CV with the subject "Lead SRE" to apply@paysera.com. Only selected candidates will be contacted, but we are grateful to all who send their CV.

SALARY

Depends on the candidate's experience and competence.
