Principal Site Reliability Engineer

1 week ago


New York, New York, United States Zocdoc Full time
Your Impact on our Mission

Zocdoc is looking for a Principal Site Reliability Engineer to help drive the reliability, resiliency, observability, availability, and scalability of our systems and services. You'll be challenged to drive continuous improvements to uptime and performance for our patients and providers in a constantly evolving environment. You'll work with and provide subject matter expertise for our AWS cloud-based environments, distributed systems, monolith, and microservices. We're looking for someone who loves challenging the status quo and strives to make everything they touch easier, faster, and more robust.

You'll enjoy this role if you are

  • Passionate about ensuring complex systems never skip a beat
  • Pragmatic in your decision making day-to-day
  • Motivated to learn new technologies and design patterns, and to work in the cloud
  • Comfortable with failures and outages and a believer in blameless post-mortems
  • Excited to work in a highly collaborative environment with diverse individuals
  • Autonomous, individually accountable, and comfortable working in a remote environment
  • A believer that diverse and inclusive teams and cultures are non-negotiable

Your day to day is

  • Analyzing and decomposing complex distributed system challenges to drive sound design decisions targeted toward reliability, availability, scalability, and resiliency
  • Supporting our large product engineering org with their scaling, performance, and uptime needs as well as helping diagnose and debug production related issues
  • Monitoring and maintaining complex cloud-based infrastructure, systems, and services, and ensuring their uptime in order to enable millions of patients to get the care they need
  • Automating and codifying our tooling, processes, and infrastructure to speed up development and make them repeatable and error-proof
  • Analyzing and performance tuning systems, code, and networking for scaling and optimal operation
  • Mentoring your peers and colleagues by giving thoughtful feedback, with the notion that helping others means learning and growing yourself

You'll be successful in this role if you have

  • A Bachelor's degree in Computer Science, Computer Engineering, or equivalent engineering experience
  • 8+ years of progressive engineering experience in Site Reliability or adjacent disciplines (DevOps, Platform, Backend Engineering, etc.)
  • 5+ years of experience in deploying, managing, and supporting modern cloud-based environments and infrastructure like AWS/Azure/GCP, Docker, Kubernetes, IaC, etc.
  • 4+ years of production on-call experience in a 24/7 cloud-based environment
  • Exceptional troubleshooting, debugging, and diagnostic skills for cloud and web-based technologies using industry standard observability tooling and frameworks
  • Experience with edge technologies such as load balancers, reverse proxies, web application firewalls, routing, etc.
  • Deep understanding of web applications and ability to troubleshoot HTTP/HTTPS, TLS, DNS, TCP/IP, and similar protocols

Benefits

  • Flexible, hybrid work environment
  • Unlimited PTO
  • 100% paid employee health benefit options
  • Employer funded 401(k) match
  • L&D offerings + a free LinkedIn learning account
  • Corporate wellness programs with Headspace and Peloton
  • Sabbatical leave (for employees with 5+ years of service)
  • Competitive parental leave
  • Cell phone reimbursement
  • In-office perks including:
    • Catered lunch every day along with snacks
    • Commuter benefits
    • Convenient Soho location


  • New York, New York, United States Apollo Solutions Full time

    Site Reliability Engineer - Web3. Apollo Solutions have partnered with an innovative web3 start-up backed by top-tier venture capital with a strong runway. They are looking to revolutionize the way we think about the application of web3 and have already made significant inroads into the gaming, entertainment and finance industries. In this role, you will...


  • New York, New York, United States Parallel Partners Full time

    Site Reliability Engineer, Python, New York City, NY. There are 5 openings available for the Site Reliability Engineer position. These will be onsite opportunities in either Los Angeles, CA; New York City, NY; or Seattle, WA. Responsibilities: Manage cloud infrastructure, provide resource allocation, system upgrades, user access control, etc. Perform deep...


  • New York, New York, United States Citadel Full time

    Job Description: At Citadel, a leading investor in the world's financial markets, we aim to win together as one team to earn the long-term trust of our capital partners and each other. Our collaborative approach allows technologists to grow alongside other team members and execute on big, innovative ideas. Site Reliability Engineers (SREs) combine software and...


  • New York, New York, United States Expensify Full time

    Your Mission, Should You Choose to Accept: Join our passionate team of top-notch engineers to solve a real-world problem, and help people spend less time managing expenses and more time pursuing their real goals. As we revolutionize the way people manage their expenses, being part of the Expensify team means building the easiest, fastest, and most efficient...


  • New York, New York, United States Particle Health Full time

    At Particle Health, our mission is to unlock the power of medical records in an intelligent platform that focuses healthcare back on the patient. Our energy is spent connecting to people's diverse sets of medical data, making that data useful in different settings, and designing an effortless way to share that information with any organization a person...


  • New York, New York, United States Nillion Full time

    Nillion is humanity's first Blind Computer. It is powered by a decentralized network of nodes that enables "Blind Computation" through the coordination and orchestration of privacy enhancing technologies (PETs) such as multi-party computation (MPC) and homomorphic encryption (HE). Nillion believes Blind Computation will become the internet's base layer for...


  • New York, New York, United States Diverse Lynx Full time

    Position: SRE - Site Reliability Engineer. Location: New York. Type: Full time. Job description: Should have cloud engineering experience and act as the SME on operations automation and monitoring, identifying toil within the team's existing systems and processes, and implementing automated solutions to reduce toil. Good knowledge of GCP. Hands-on in defining...


  • New York, New York, United States IFF Family of Companies Full time

    Requisition ID: 503479. Job Description: IFF in Newark, DE is seeking a Maintenance and Reliability Engineer to join our team. The Newark site is aligned to the pharma solutions business and is a leading manufacturer of microcrystalline cellulose and other pharmaceutical excipient-grade materials. Our site is also a key supplier to the Food & Beverage...


  • New York, New York, United States Pfizer Full time

    The Intern Site Reliability Engineer will be in an agile cross-functional team supporting different products of our organization. The main focus areas of the role include Cloud Infrastructure, CI/CD pipelines, Monitoring & Observability, Automation, Code Quality and Security. Basic qualifications: Cloud (AWS), Containers (Docker, Kubernetes), CI/CD (GitHub...

  • Principal Engineer

    1 week ago


    New York, New York, United States Fanatics Full time

    Job Description Overview: As Principal Engineer at Fanatics Betting & Gaming (FBG), you are here to help build out our Data Platform Engineering team. This role reports to our Director of Platform Engineering and is responsible for defining, implementing, training, and executing against our engineering strategy, creating processes, and building tools within...


  • New York, New York, United States Justworks Full time

    Who We Are: At Justworks, you'll enjoy a welcoming and casual environment, great benefits, wellness program offerings, company retreats, and the ability to interact with and learn from leaders in the startup community. We work hard and care about our most prized asset - our people. We're helping businesses get off the ground by enabling them to focus on running...


  • Reliability Engineer

    1 month ago


    New York, New York, United States Dupont Full time

    At DuPont, we work on things that matter, whether that is providing clean water to more than a billion people on the planet, producing essential materials for everyday technology devices (from smartphones to electric vehicles), or protecting workers around the world. If you want to be part of a leading multi-industry company...

  • Reliability Engineer

    4 weeks ago


    New York, New York, United States Dupont Full time

    At DuPont, we work on things that matter, whether that is providing clean water to more than a billion people on the planet, producing essential materials for everyday technology devices (from smartphones to electric vehicles), or protecting workers around the world. If you want to be part of a leading multi-industry company...


  • New York, New York, United States PulsePoint Full time

    Senior SRE Job Description: There's likely a reason you've taken the time out of your busy day to review this opportunity at PulsePoint. Maybe you're in need of a change or there's "an itch you're looking to scratch." Whatever the reason, ask yourself the following questions: Do you want to join a company that takes pride in the work they do? Do you...


  • New York, New York, United States Your IT & Corporate Recruiter Full time

    Job Description: Your IT Recruiter is looking for a Principal Electrical Engineer for our client. Are you a visionary Electrical Engineer with a passion for innovation and a proven track record of successful project leadership? We are seeking a highly skilled and experienced Principal Level Engineer to join our client's dynamic team. As a Principal Engineer,...


  • New York, New York, United States Jones Lange Lasalle, Inc. Full time

    JLL supports the Whole You, personally and professionally. Our people at JLL are shaping the future of real estate for a better world by combining world class services, advisory and technology to our clients. We are committed to hiring the best, most Reliability Engineer, Liability, Reliability, Continuous Improvement, Reliability, Engineer, Property...


  • New York, New York, United States The Alexander Group Human Resource Consultants, INC. Full time

    Are you a visionary Electrical Engineer with a passion for innovation and a proven track record of successful project leadership? We are seeking a highly skilled and experienced Principal Level Engineer to join our client's dynamic team. As a Principal Engineer, you will play a pivotal role in providing technical expertise and leadership to their Building...


  • New York, New York, United States EvolutionIQ Full time

    About Us: EvolutionIQ's mission is to improve the lives of injured and disabled workers and enable them to return to the workforce, saving billions of dollars in avoidable costs and lost productivity to the US and global economies and make insurance more affordable for everyone. We are currently experiencing massive growth and to accomplish our goals, we are...