Site Reliability Engineer

3 weeks ago


New York, United States FLOAT LLC Full time
Who We Are

Float is the world's leading software for teams to plan their time. Launched in 2012, we've grown every year since, and remain proudly independent, self-funded and profitable. As a certified B Corporation, we're committed to making a positive contribution to our team, customers, the environment, and the remote community. We're a team of 50 working 100% remotely who believe in living our Best Work Life. You'll partner with team members globally, including in Australia, Mexico, Italy, Nigeria, Canada, and the USA. Hear what our team has to say by browsing our blog or reading our Glassdoor reviews. Check out what our customers think of Float in our G2 reviews.

We're on a scale-up journey, and we're seeking people who thrive in this stage when given the autonomy and the opportunity to do the best work of their career.

Why We're Hiring For This Role

The role of Site Reliability Engineers at Float is to increase the autonomy of the product and engineering teams by growing their capabilities so they can focus on solving problems. SRE makes sure our engineers have scalable infrastructure to build software on, and that the pipelines from idea to customer run smoothly and are easy to build upon. We also cover broad areas of security around our network, and define internal security policy and practices.

Our goals for the Engineering team are to increase the pace with which they deliver improvements for our customers, provide an increasingly sophisticated and reliable service from our teams, and mitigate external threats as we grow.

You will help us reach those goals by increasing the reliability of our services to support larger clients joining Float, and by strengthening the robust security systems we've implemented so they continue to protect our growing customer base.

Chris Nash, our Team Lead (SRE & QA), explains the important role you will play within our SRE team. Watch this video.

You'll be working asynchronously with a bright, dedicated team from across the globe, with a strong focus on taking complex problems and creating solutions that feel simple and intuitive for our customers.

What You'll Be Responsible For

Early on, you'll jump right into:
  • Continuing to support the regular maintenance of all the engineering systems supporting Float's customers
  • Identifying areas requiring support to scale
  • Identifying areas for improving service resilience, ultimately enabling the product and engineering teams to build resilience themselves
  • Optimizing our monitoring and observability stack, building on the knowledge to create a standard set of tools and configurations for the product and engineering teams
  • Understanding Float's SLOs in context, and building out SLO patterns and procedures for product and engineering teams
Once you are settled, we expect that you will jump into the following projects:
  • Building a repeatable and trustworthy disaster recovery program using chaos engineering techniques
  • Migrating all of our deployment configurations to a global single source of truth
  • Expanding Float's infrastructure across multiple regions to create a global network
What You'll Need To Be Successful

We want you to love your work and believe that these skills will allow you to succeed in the role.

Applying these skills requires:
  • An excellent understanding of how SRE operates as an enabling team
  • A very good understanding of Service Level Objectives
  • Working experience with Terraform, Bash, and a go-to language, ideally one of PHP, NodeJS, or Python
  • Experience with Kubernetes and GCP would be highly valued
As a fully remote team, we're looking for someone comfortable with asynchronous communication as the default, which means you have previous remote experience and are comfortable using tools like Slack, Loom, and Linear to communicate as needed. Don't worry: you will have significant deep work time, since we have very few meetings.

Why Join Us

Pay for this role is US $167,471 (Level 3). Here's a blog post with more information on how we determine our salaries.

We're a global async remote company with a diverse team of people from all over the world who share a common belief in living our best work life. We believe deeply in transparency and share our Float Handbook publicly so potential new team members can see firsthand our perks & benefits as well as our ways of working. If you feel you can thrive at Float and do your best work, we would love to hear from you.

Hiring Process For This Role

You'll find a lot of useful information about our interview process and what it's like to join our global team on the Float careers page. The hiring process for this role looks like this:
  • Initial First Meet (20 min): You'll meet with Julia Fulton, Talent Manager, to discuss your interest in the role and review your questions about working at Float.
  • Take-Home Assignment: Candidates who move forward will be invited to complete a take-home assignment for the engineering team to review. This is a 4-hour assignment. Candidates will receive high-level feedback from the hiring team, and those who move forward will proceed to the technical interview stage to discuss their results in more detail.
  • Technical Interview (45 min): You'll meet with Chris Nash (Team Lead, SRE & QA) and Bogdan Frunza (Senior SRE) to discuss your technical experience in more depth. This will be a great opportunity for you to ask any questions and talk about goals for the role.
  • Leadership Interview (45 min): You'll meet with Lars Gelfin (CTO) and Colin Ross (Director of Engineering) to discuss your experience in more depth. This will be a great opportunity for you to ask any questions and talk about goals for the role.
  • Founder Interview (30 min): You'll meet with Glenn, Float's CEO, so he can get to know you and see whether you have the potential to be a great addition to the team.


Note: Industry research shows that women and those in traditionally underrepresented groups generally don't apply to jobs unless they check all the boxes for the role. If you feel strongly that you have what it takes for this role but don't check 100% of the boxes, that's okay. We encourage you to apply anyway and highlight what you can bring to the table.
