Site Reliability Engineer

1 week ago


New York, United States Grafbase, Inc. Full time

We are looking for a Site Reliability Engineer to join our Engineering team. As an SRE, you will play a crucial role in ensuring the reliability, availability, and performance of our systems and services. You will collaborate across teams to design, implement, and maintain infrastructure and automation solutions, supporting the continuous improvement of our platform's reliability and scalability.

What you will do:

  • Work across teams to ensure software is developed and deployed for maximum reliability
  • Develop, run and improve processes and tools
  • Build automation to support reliability efforts for all of our production services
  • Join incidents, help solve them, and assist in drafting RCAs and other documentation that is provided directly to customers

About You:

  • You have at least 8 years of experience working with production systems
  • Experienced in managing large-scale production systems
  • Strong proficiency in the Rust programming language
  • Hands-on experience with containerization and orchestration technologies such as Docker, Kubernetes, or Helm
  • Solid experience with cloud platforms such as AWS, Azure, or Google Cloud
  • Knowledgeable about network protocols, load balancing, and DNS management
  • Familiar with monitoring and logging tools and best practices
  • Experience deploying and monitoring systems using infrastructure as code
  • Excellent problem-solving and troubleshooting skills


  • New York, United States Apollo Solutions Full time

    Site Reliability Engineer - Web3. Apollo Solutions have partnered with an innovative web3 start-up backed by top-tier venture capital with a strong runway. They are looking to revolutionize the way we think about the application of web3 and have already made significant inroads into the gaming, entertainment and finance industries. In this role, you will...


  • New York, United States Nationstaff Full time

    About This Role: We are seeking a talented Site Reliability Engineer with experience in building and maintaining continuous integration, automating programmatic tasks, deploying applications, configuration management, and monitoring and maintaining the uptime of the platform. The Site Reliability Engineer will be an expert in Linux, is passionate about open...


  • New York, United States STAND 8 Technology Services Full time

    STAND 8 provides end-to-end IT solutions to enterprise partners across the United States, with offices in Los Angeles, New York, New Jersey, Atlanta, and more, including internationally in Mexico and India. Are you passionate about Building Management Systems (BMS) and creating best-in-class building environments driven by best-in-class software and...


  • New York, NY, United States Hudson River Trading Full time

    Hudson River Trading (HRT) is looking for a Senior IT Site Reliability Engineer to join our growing IT Solutions Delivery team. This team is responsible for developing and maintaining the corporate productivity stack for the entire firm, both on-prem and in the cloud. As a Senior IT SRE, you will ensure the availability and reliability of systems within this...


  • New York, New York, United States Oscar Health Full time

    About the Role: Oscar Health is a cutting-edge health insurance company that's revolutionizing the industry. As a Site Reliability Engineer II, you'll be a key member of our SRE team, responsible for building and maintaining scalable, highly reliable software systems. With a focus on bridging the gap between development and operations teams, you'll work closely...


  • New York, United States Transfinder Full time

    The Junior Site Reliability Engineer (SRE) works to ensure Transfinder provides clients with the best-hosted software experience possible. The SRE works collaboratively between Development and Operations to scale, secure, monitor, and maintain cloud infrastructure for running Transfinder products. Utilizing AWS and other cloud technologies, this position...


  • New York, United States Cockroach Labs Full time

    Databases are the beating heart of every business in the world. Cockroach Labs is the creator of CockroachDB, the most highly evolved cloud-native, distributed SQL database on the planet that scales fast, survives anything, and thrives anywhere. We created CockroachDB to unshackle teams from the constraints of their database. Join us on our mission to...


  • New York, New York, United States Fidelity Information Services Full time

    Company Overview: Fidelity Information Services is a leading provider of financial services and technology solutions. Our mission is to empower our clients with innovative and reliable systems. Salary: The estimated annual salary for this position is $31,200. Job Description: We are seeking an experienced Site Reliability Engineer to join our team. As a key member...


  • New York, United States Insight Global Full time

    Job Description: Our client is looking for 1 remote Site Reliability Engineer to join their engineering organization. They will be responsible for investigating issues within broadcast playout systems and their integration points to find the root cause of problems or systemic issues. They must have at least 2 years of experience...


  • New York, United States MarketAxess Full time

    About Us: MarketAxess is on a journey to digitally transform one of the world’s largest financial markets, enabling the shift from analog, phone-based trading to a fully electronic marketplace. Why does this matter? Because our platform makes trading fixed-income more accessible, ultimately improving transparency, efficiency, and competition in the...


  • New York, New York, United States STAND 8 Technology Services Full time

    At STAND 8 Technology Services, we're on a mission to impact the world positively by creating success through PEOPLE, PROCESS, and TECHNOLOGY. We're seeking a dedicated Site Reliability Engineer with a strong focus on Building Management System (BMS) software, specifically Tridium Niagara. The ideal candidate will have a deep level of experience in setting up...


  • New York, United States Diverse Lynx Full time

    Job Title: SRE - Site Reliability Engineer. Location: New York, NY (Onsite). Full-time opportunity. Minimum Experience: 5 - 10 Years. Job Description: Should have cloud engineering experience and act as the SME on operation automation and monitoring, identifying TOIL within the team's existing systems and processes, and implementing automated solutions...


  • New York, New York, United States Capital One Full time

    Capital One Reliability Engineer Role: We are seeking a skilled Lead Reliability Engineer to join our team at Capital One. As a key member of our engineering group, you will play a critical role in designing and implementing reliable systems that meet the needs of our customers. About the Job: Collaborate with Agile teams to design, develop, test, implement, and...


  • New York, United States Capital One Full time

    Plano 2 (31062), United States of America, Plano, Texas. Director, Technical Program Management - Site Reliability Engineering. Are you interested in leading programs that deliver on critical business goals and build large scale products & platforms? About Capital One: At Capital One, we're changing banking for good. We were founded on the belief that no one...


  • New Bedford, Massachusetts, United States Global InfoTek Full time

    Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Global InfoTek, Inc. The ideal candidate will have a strong background in building and maintaining infrastructure as code on large-scale multi-site deployments. Key Responsibilities: Evaluate and assess new ways to scale platform capabilities. Automate workflows to push the...


  • New York, New York, United States Capital One Full time

    Overview: Capital One is a leading financial institution seeking a seasoned reliability engineer to drive process improvements and influence the strategic direction of our technology teams. Salary Range: The estimated annual salary for this role in New York City (Hybrid On-Site) is $201,400 - $229,900. Candidates hired to work in other locations will be subject...


  • New York, New York, United States Trumid Full time

    About Us: Trumid is a pioneering fintech that's revolutionizing fixed income trading. Our cutting-edge electronic solutions are empowering us to grow rapidly, and we're seeking exceptional talent to redefine the intersection of technology and finance. The Opportunity: We're looking for a Lead Site Reliability Engineer (SRE) to ensure our systems' reliability,...


  • New York, United States mthree Recruiting Portal Full time

    SRE - Leading Investment Bank. Market-leading investment bank requires a Systems Reliability Engineer to join their Reliability & Production Engineering department. This role supports Institutional Securities and Wealth Management brokerage Operations platforms which include diverse technologies hosted by on-premises and cloud platforms. The role is expected to...