Cloud Operations Reliability Engineer

1 week ago


New York, New York, United States | CLS Group | Full time

About CLS Group

CLS Group is a pivotal entity within the global foreign exchange (FX) ecosystem, serving thousands of counterparties and enhancing the safety, efficiency, and cost-effectiveness of FX transactions. Our robust global settlement infrastructure minimizes systemic risk and standardizes processes for participants in the most actively traded currencies worldwide. By employing multilateral netting, we significantly reduce funding requirements, allowing clients to optimize their capital and resources.

Our suite of products empowers clients to effectively manage risk throughout the entire FX lifecycle, utilizing advanced processing tools and market intelligence derived from the largest source of executed FX data available.

At CLS, we are committed to making a positive impact, starting with our workforce. Our core values – Protect, Improve, Grow – guide our operations and foster a supportive, inclusive work environment that encourages innovation and openness.

Role Overview

This position is primarily responsible for applying Site Reliability Engineering (SRE) principles within our cloud-hosted environment. The role also serves as a key resource for SRE automation within the Platform Operations team.

Key Responsibilities

  • Implement SRE methodologies in the cloud environment, focusing on the automation of repetitive tasks and the definition and execution of Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
  • Establish SRE practices within the Cloud team, collaborating closely with Infrastructure Engineering to enhance observability and telemetry, ensuring that cloud services are equipped with the necessary service metrics and monitoring.
  • Develop GitOps practices for the cloud environment using tools such as Terraform and Ansible, acting as a liaison between Engineering and Cloud Operations to fully integrate Infrastructure as Code for all new cloud deployments.
  • Provide escalation support for cloud and automation-related issues, prioritizing production stability at all times.
  • Identify and address risks and stability concerns in the cloud environment through SRE best practices, contributing to incident postmortems.

Educational Qualifications

  • Bachelor's degree or equivalent experience.
  • Industry-standard IT certifications preferred, such as those from AWS, Microsoft, VMware, or Red Hat.

Experience Requirements

  • Proven technical operational support experience within an infrastructure services team, ideally in cloud-hosted or on-premises environments.
  • Strong knowledge of automation technologies, particularly Terraform and Ansible, with the ability to implement Infrastructure as Code through GitOps methodologies.
  • Familiarity with at least one scripting language, preferably Python or PowerShell.
  • A minimum of two years of experience applying SRE methodologies within a support team, with an understanding of associated service level metrics.
  • Experience with Application Performance Monitoring (APM) tools such as Grafana, Datadog, or Dynatrace.
  • Background in regulated financial services or banking organizations is advantageous.

Special Skills and Knowledge

  • Ability to understand and utilize at least one cloud service platform, such as AWS or Azure.
  • Strong service-oriented mindset, consistently delivering high-quality service to the business.
  • Effective communication skills with both technical and non-technical stakeholders at all levels.
  • Proactive approach with the ability to provide regular updates to management and stakeholders.

