Reliability Engineering Lead

4 days ago


New York, New York, United States Trumid Full time
About Us

Trumid is a pioneering fintech that's revolutionizing fixed income trading. Our cutting-edge electronic solutions are empowering us to grow rapidly, and we're seeking exceptional talent to redefine the intersection of technology and finance.

The Opportunity

We're looking for a Lead Site Reliability Engineer (SRE) to ensure our systems' reliability, scalability, and performance as we continue to scale. This role offers a unique opportunity to shape our firm's reliability practices and infrastructure. You'll be instrumental in optimizing our existing infrastructure, implementing new technologies, and enhancing our incident response capabilities.

Key Responsibilities:
  • Transform the SRE function to evolve, simplify, and scale existing solutions.
  • Drive improvements in system reliability, scalability, and performance through innovative solutions and industry best practices.
  • Lead incident response efforts, including troubleshooting, resolution, and conducting post-mortem analysis to prevent future incidents.
  • Automate repetitive tasks to reduce manual intervention and improve operational efficiency.
  • Collaborate closely with software development, DevOps, and infrastructure teams to embed reliability into the development lifecycle.
About You

SRE expert with foundational knowledge of SRE best practices. Demonstrated hands-on experience managing large-scale, highly available cloud-based systems. Deep understanding of the cloud components of at least one major cloud provider (e.g., AWS, GCP, Azure), including infrastructure, services, and tooling. Expertise in containerization and orchestration tools (e.g., Docker, Kubernetes) and experience with deployment strategies such as blue-green and canary deployments.

Requirements:
  • Strong scripting and programming skills in Python, Bash, Go, or similar languages.
  • Excellent problem-solving skills, focusing on diagnosing complex issues in large-scale distributed systems.
  • Strong communication and collaboration skills, capable of working effectively with cross-functional teams in a fast-paced environment.
  • Bachelor's degree in computer science (or equivalent) and at least 10 years of professional experience at a fast-paced tech-oriented company.
What We Offer:
  • Highly competitive compensation: $220,000 - $300,000 per year.
  • Fully paid medical, dental, and vision coverage.
  • Remote work options.
  • A team-oriented and collaborative company culture.
  • Equal-opportunity employer.


  • New York, New York, United States Tenth Mountain Full time

    Lead Site Reliability Engineer. At Tenth Mountain, we're committed to helping veterans transition into rewarding civilian careers. As a Lead Site Reliability Engineer, you'll play a critical role in ensuring the reliability and availability of our Payments infrastructure. Key Responsibilities: Provide 24/5 round-the-clock support for the Payments team, covering...


  • New York, New York, United States Capital One Full time

    Job Summary: Capital One is seeking a highly skilled Reliability Engineer to join our team. As a Reliability Engineer, you will be responsible for designing, developing, and implementing technical solutions to ensure the reliability and availability of our systems. You will work closely with cross-functional teams to identify and prioritize opportunities to...


  • New York, New York, United States Capital One Services, LLC Full time

    About the Role: We are seeking a highly skilled Reliability Engineer to join our team at Capital One Services, LLC. As a Reliability Engineer, you will be responsible for designing, developing, testing, implementing, and supporting technical solutions in full-stack development tools and technologies. Key Responsibilities: Collaborate with Agile teams to design,...


  • New York, New York, United States Citadel Enterprise Americas Services LLC Full time

    Job Summary: Citadel Enterprise Americas Services LLC is seeking a skilled Site Reliability Engineer to join our team. As a key member of our technical operations team, you will be responsible for ensuring the reliability and performance of our trading applications. This is a challenging and rewarding role that requires a strong understanding of software...


  • New York, New York, United States Insight Global Full time

    Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Insight Global. As a Site Reliability Engineer, you will be responsible for ensuring the uptime and reliability of our production and non-production environments. You will work closely with our development teams to build and maintain the infrastructure and applications...


  • New York, New York, United States Palantir Technologies Full time

    A World-Changing Company: Palantir builds the world's leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more. The Role: Reliability Engineers are the driving forces of...


  • New York, New York, United States TikTok Full time

    About Site Reliability Engineering at TikTok: TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. As a Site Reliability Engineer at TikTok, you will play a critical role in ensuring the reliability and scalability of our systems. Responsibilities: Develop and maintain automation procedures to...


  • New York, New York, United States Oakland Search Full time

    Senior Site Reliability Engineer. About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team in New York City. As a key member of our engineering team, you will be responsible for designing, implementing, and maintaining our software systems to ensure high availability, scalability, and performance. Key...


  • New York, New York, United States Citadel Securities Americas Services LLC Full time

    Job Summary: Citadel Securities Americas Services LLC is seeking a skilled Reliability Specialist to join our team. As a key member of our infrastructure support team, you will be responsible for ensuring the smooth operation of our trading applications. This includes collaborating with cross-functional teams to identify and resolve production issues,...


  • New York, New York, United States Podium Full time

    At Podium, our mission is to empower local businesses to succeed. We achieve this by providing a comprehensive platform that streamlines lead conversion, communication, and sales. Our platform, powered by AI and integrations, helps local businesses thrive in a competitive market. Our team is dedicated to fostering a culture that values exceptional talent and...


  • New York, New York, United States Hebbia Full time

    About Hebbia: Hebbia is a cutting-edge technology company that empowers users to collaborate with AI on each step and validate responses. Our mission is to put capable AI in the hands of 1 billion people by 2030. Job Description: We are seeking a highly skilled Site Reliability Engineer to contribute to building systems that optimize the uptime and reliability of...


  • New York, New York, United States Capital One Full time

    About the Role: We are seeking a highly skilled Reliability Engineering Expert to join our team at Capital One. As a Reliability Engineering Expert, you will be responsible for designing, developing, and implementing technical solutions to improve the reliability and scalability of our systems. Key Responsibilities: Collaborate with Agile teams to design,...


  • New York, New York, United States Peloton Full time

    About the Role: Peloton is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our platform. You will work closely with our engineering teams to design, implement, and operate scalable systems that meet the needs of our users. Key...


  • New York, New York, United States Huntress Full time

    Job Overview: We are seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our infrastructure team, you will be responsible for ensuring the reliability and scalability of our distributed systems. Your primary focus will be on designing, implementing, and maintaining our cloud infrastructure, ensuring that it meets the needs of...


  • New York, New York, United States City National Bank Full time

    Job Summary: City National Bank is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and maximum uptime of our systems in the Data Center or Cloud Platform. Key Responsibilities: Implement solutions that improve stability, security, scalability,...


  • New York, New York, United States Squarespace Full time

    About the Role: We're seeking an experienced software engineer to join our Infrastructure Engineering team as a Senior Site Reliability Engineer, Compute. As a key member of our team, you'll play a crucial role in ensuring the reliability and performance of our systems, working closely with product teams to maintain the stability of our hybrid data centers and...


  • New York, New York, United States TikTok Full time

    About the Role: TikTok is seeking a skilled Site Reliability Engineer to join our U.S. Data Security team. As a key member of our team, you will be responsible for ensuring the reliability and scalability of our software systems. Responsibilities: Collaborate with infrastructure, product, and platform engineering teams to design and deploy scalable and secure...


  • New York, New York, United States TikTok Full time

    Job Summary: TikTok is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the scalability and reliability of our cloud infrastructure. You will work closely with our infrastructure, product, and platform engineering teams to design, deploy, and maintain scalable and secure...


  • New York, New York, United States Clear Corporate Services LLC Full time

    At CLEAR, we're pushing the boundaries of digital and biometric identification, making it easier for our members to navigate the world. We're seeking a Senior Site Reliability Engineer to spearhead our SRE function, driving innovation in our identity platform. This role will involve leading reliability-focused practices, collaborating with the Software...


  • New York, New York, United States Betterment Full time

    About Betterment: Betterment is a leading technology-driven financial services company that offers investing and retirement solutions for retail investors and investment advisors, as well as financial wellness solutions for small and medium-sized businesses. Our team is passionate about our mission: making people's lives better. About the Role: As a Staff Site...