Lead Site Reliability Engineer

1 week ago


New York, New York, United States Tenth Mountain Full time

At Tenth Mountain, we're committed to helping veterans transition into rewarding civilian careers. As a Lead Site Reliability Engineer, you'll play a critical role in ensuring the reliability and availability of our Payments infrastructure.

Key Responsibilities:
  • Provide 24/5 support for the Payments team across multiple regions.
  • Manage and resolve incidents related to the Payments infrastructure, ensuring minimal downtime.
  • Participate in weekend on-call rotations, acknowledging incidents within 10 minutes and responding appropriately.
  • Maintain and manage the Payments IKP infrastructure, ensuring high availability and performance.
  • Implement and enhance monitoring, alerting, and incident response processes.
  • Automate manual tasks to boost efficiency and reduce human error.
Requirements:
  • Proven experience as a Senior DevOps Engineer or Site Reliability Engineer in an Agile environment.
  • Strong background in Linux/Unix systems and shell scripting (Bash).
  • Experience with Kubernetes, preferably GKE on-premises.
  • Proficiency in programming with one or more high-level languages, such as Python or Go.
  • Experience building and managing automated CI/CD pipelines and related tools (GitLab CI/CD, Jenkins).
  • Familiarity with VMware and other virtualization platform technologies.
Preferred Skills:
  • Knowledge of Istio and Anthos Service Mesh.
  • Familiarity with monitoring and logging tools (Splunk, Prometheus, Datadog, Kiali).
  • Kubernetes certification is a plus.
  • Experience with load balancers, reverse proxies (Nginx Controller/Seesaw), and containerization technologies (Docker).
  • Exposure to infrastructure-as-code tools (Terraform) and technologies like OpenTelemetry, OpenMetrics, and Kafka.
Why Tenth Mountain?

We believe in your potential and are committed to helping you transition smoothly into a rewarding civilian career. Join us and be part of a company that values your skills, experience, and dedication.



  • New York, New York, United States Alchemy Full time

    About the Role: Alchemy is seeking a highly skilled Site Reliability Engineer to join our Infrastructure team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our globally used developer platform. Key Responsibilities: Design, deploy, and continuously improve the infrastructure supporting...


  • New York, New York, United States Apollo Solutions Full time

    Site Reliability Engineer. Apollo Solutions is partnering with a pioneering artificial intelligence business that is revolutionizing the use of AI/ML in gaming and security. The company is working closely with government contracts and gaming console companies and is seeking a Site Reliability Engineer to join their growing team. The Site Reliability Engineer...


  • New York, New York, United States Cynet Systems Full time

    Job Title: Site Reliability Engineer. Job Summary: Cynet Systems is seeking a highly skilled Site Reliability Engineer to lead the development and implementation of geospatial application performance monitoring strategies. The ideal candidate will have a strong background in Site Reliability Engineering (SRE) and proven experience in using Dynatrace for...


  • New York, New York, United States Braze Full time

    About the Role: We're seeking a highly skilled Site Reliability Engineer to join our team at Braze. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our internal-facing services and platforms. Key Responsibilities: Partner with Braze's engineering teams to architect products that effectively utilize...


  • New York, New York, United States Intuit Inc Full time

    Job Title: Site Reliability Engineering Manager. At Intuit Inc, we're seeking an experienced Site Reliability Engineering Manager to lead our Site Reliability Engineering Team. As a key member of our Engineering organization, you will be responsible for ensuring the reliability, scalability, and performance of our application used by both internal engineers...


  • New York, New York, United States Alchemy Full time

    About the Role: Alchemy is seeking a highly skilled Site Reliability Engineer to join our Infrastructure team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our globally used developer platform. Our mission is to empower builders with the tools they need to create exceptional on-chain products....


  • New York, New York, United States Fourier Ltd Full time

    Site Reliability Engineer. Fourier Ltd is seeking a skilled Site Reliability Engineer to join our technical operations team. As a Site Reliability Engineer, you will play a critical role in ensuring the superior performance and availability of our production applications throughout the development cycle. Key Responsibilities: Configure and manage multiple...


  • New York, New York, United States Tik Tok Full time

    About TikTok U.S. Data Security: TikTok is a leading destination for short-form mobile video, inspiring creativity and bringing joy to millions of users worldwide. Our mission is to provide a secure and reliable platform for users to express themselves, learn, and be entertained. Role Overview: We are seeking a skilled Site Reliability Engineer to join our U.S....


  • New York, New York, United States Intuit Inc Full time

    Job Overview: Mailchimp is a leading marketing platform for small businesses, empowering millions of customers worldwide to build their brands and grow their companies with a suite of marketing automation, multichannel campaigns, CRM, and analytics tools. Job Description: We are seeking an experienced Engineering Leader to lead our Site Reliability Engineering...


  • New York, New York, United States Tik Tok Full time

    About TikTok U.S. Data Security: TikTok is a leading destination for short-form mobile video, inspiring creativity and bringing joy to millions of users worldwide. Our mission is to provide a secure and reliable platform for users to express themselves, learn, and be entertained. Site Reliability Engineering at TikTok: As a Site Reliability Engineer at TikTok, you...


  • New York, New York, United States Lorven Technologies Full time

    Job Title: Site Reliability Engineer. Lorven Technologies is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and reliable cloud...


  • New York, New York, United States CapB InfoteK Full time

    Job Title: Site Reliability Engineer. About the Role: At CapB InfoteK, we're seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Develop and build low-level component...


  • New York, New York, United States Lorven Technologies Full time

    Job Title: Site Reliability Engineer. Lorven Technologies is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly available...


  • New York, New York, United States Lorven Technologies Full time

    Job Title: Site Reliability Engineer. Lorven Technologies is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain infrastructure automation...


  • New York, New York, United States FLOAT LLC Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Float LLC. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud infrastructure, enabling our engineering teams to focus on delivering high-quality software to our customers. Key...


  • New York, New York, United States Unreal Gigs Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Unreal Gigs. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, availability, and performance of our systems. Key Responsibilities: Design, implement, and maintain scalable infrastructure solutions to support...


  • New York, New York, United States Hudson River Trading Full time

    Job Title: Senior IT Site Reliability Engineer. Hudson River Trading (HRT) is a leading financial services company that utilizes a scientific approach to trading. We are seeking a highly skilled Senior IT Site Reliability Engineer to join our team. Job Summary: The Senior IT Site Reliability Engineer will be responsible for ensuring the availability and...


  • New York, New York, United States Tik Tok Full time

    About the Role: TikTok is seeking a highly skilled Site Reliability Engineer to join our AML team. As a Site Reliability Engineer, you will be responsible for designing, building, and maintaining highly available, scalable, and fault-tolerant systems. Responsibilities: Design and implement large-scale systems to ensure high availability and scalability. Monitor...


  • New York, New York, United States Radar Full time

    About the Role: We're seeking a skilled Site Reliability Engineer to join our team at Radar, a leading provider of location infrastructure for every product and service. As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and performance of our production infrastructure. Key Responsibilities: Design, implement, and...


  • New York, New York, United States Insight Global Full time

    Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Insight Global. As a Site Reliability Engineer, you will be responsible for ensuring the uptime and reliability of our production and non-production environments. You will work closely with our development teams to build and maintain the infrastructure and applications...