Infrastructure Reliability Specialist

2 weeks ago


New York, New York, United States FILD Search, LLC Full time

About the Role:

Are you an Infrastructure Reliability Specialist dedicated to enhancing user experiences for a vast audience? Do you thrive on ensuring system uptime, high availability, and effective disaster recovery for international platforms? If so, this opportunity may be for you.

We are collaborating with a leading entity in the sports and entertainment sector to expand their technology team. We are seeking a Global Platform Reliability Engineer to be part of an elite infrastructure group. This role offers the chance to contribute to one of the most recognized platforms in the industry.

As a Global Platform Reliability Engineer, you will manage extensive data across various domains including sports, content, marketing, ticketing, finance, and media. The organization is experiencing significant growth and is focused on expanding its consumer reach, emphasizing data integrity, uptime, and robust disaster recovery strategies while pursuing innovative marketing initiatives and enhancing brand partnerships.

Key Responsibilities:

  • Oversee high-traffic, large-scale global platforms that encompass diverse services from content delivery to media streaming.
  • Collaborate with teams to maintain uptime, ensure high availability, and manage disaster recovery protocols.
  • Support incident response efforts effectively.
  • Work alongside teams to safeguard the availability, security, and integrity of services.
  • Create and manage dashboards and reports to monitor database performance and health.
  • Develop tools for monitoring and alerting to identify error conditions and service degradation.
  • Establish Service Level Indicators (SLIs) and Service Level Objectives (SLOs), implement observability tools, and optimize for cost efficiency.
  • Assist in troubleshooting production issues across various services and layers of the stack.
  • Identify opportunities for automation and self-service in infrastructure and database operations.
  • Lead critical projects and initiatives, taking ownership of their success.
  • Collaborate with a team of exceptional engineers and technologists.

Qualifications:

  • Minimum of 3 years of experience in site reliability engineering.
  • Strong understanding of SRE and DevOps principles, with the ability to communicate technical concepts across different organizational levels.
  • Experience in automation, alerting, and remediation, with a focus on minimizing operational toil.
  • Expertise in cloud services such as GCP, AWS, or Oracle Cloud.
  • Proficiency in Terraform, Git, and CI/CD practices.
  • Familiarity with real-time log and event monitoring tools, including DataDog, Cloud Logging, and Splunk.
  • Experience managing mission-critical databases and data pipelines (e.g., Oracle, Postgres, Mongo, BigQuery, Kafka, Airflow).
  • Proficiency in Linux and scripting.
  • Programming experience in languages such as Go, Python, Bash, Java, or JavaScript.
  • Ability to write production-level code in a compiled language.
  • Willingness to work non-standard shifts, including nights and weekends, along with on-call responsibilities.
  • Bachelor's degree in Computer Science, Mathematics, or a related field.

What We Offer:

  • Competitive salary range of $150,000 to $160,000.
  • Opportunity to work at the intersection of technology, media, sports, and entertainment.
  • Opportunity to contribute to platforms and products used globally.
  • Flexible hybrid work environment with modern office facilities.
  • Comprehensive benefits package including Medical, Dental, and Vision coverage.
  • Short-term and long-term disability insurance.
  • Flexible working hours.
  • Generous paid time off policy, close to 30 days annually, including holidays and year-end breaks.
  • 401k plan with matching contributions.
  • Tuition reimbursement programs.
  • Maternity and paternity leave benefits.
  • Employee perks for friends and family.

If you are an Infrastructure Reliability Specialist looking to lead the development of extensive data platforms within a premier sports and entertainment organization, we encourage you to explore this opportunity.



  • New York, New York, United States Russell Tobin & Associates Full time

    Position Overview: Site Reliability Engineer. Role: SRE - Production Support. Location: New York, NY (Hybrid). Compensation: $50–55/hr, W2. Employment Type: Contract to Permanent. Job Responsibilities: As a Site Reliability Engineer, you will be responsible for ensuring the stability and reliability of our systems. Your key responsibilities will...


  • New York, New York, United States Hebbia Full time

    About Hebbia: Hebbia is a cutting-edge technology company that specializes in developing Artificial General Intelligence (AGI) solutions. Our mission is to empower users to collaborate with AI on complex tasks and validate responses, rather than blindly trusting them. Job Description: As a highly skilled Site Reliability Engineer, you will play a critical role in...


  • New York, New York, United States Alloy Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Infrastructure Team at Alloy. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly available...


  • New York, New York, United States Alloy Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Infrastructure Team at Alloy. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining our cloud infrastructure to ensure high uptime and reliability. Key Responsibilities: Design and implement scalable and secure cloud infrastructure...


  • New York, New York, United States Kyndryl Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Cloud Infrastructure team at Kyndryl. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and security of our cloud-based services. Key Responsibilities: Design and Implement Monitoring and Logging Systems: Develop and...


  • New York, New York, United States Squarespace Full time

    About the Role: Squarespace is seeking an experienced Senior Site Reliability Engineer to join our Compute team. As a key member of our infrastructure engineering team, you will play a critical role in ensuring the reliability and scalability of our system. Key Responsibilities: Design and implement scalable and reliable infrastructure solutions to support our...


  • New York, New York, United States Radar Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Radar, a leading provider of location infrastructure for every product and service. As a Site Reliability Engineer, you will play a critical role in designing, implementing, and maintaining our production infrastructure, ensuring high availability, scalability, and...


  • New York, New York, United States Radar Full time

    Job Overview / Position Summary: We are seeking Infrastructure Reliability Engineers to enhance our production systems. Radar operates a high-volume, data-centric platform managing over 1 billion API requests daily. Our services are utilized by more than 100 million devices globally. We maintain a multi-availability zone setup, with a key focus on advancing our...


  • New York, New York, United States Radar Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Radar, a leading provider of location infrastructure for every product and service. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design,...


  • New York, New York, United States Radar Full time

    About the Role: Radar is a high-throughput, data-intensive application handling 1 billion+ API calls per day. We're seeking a skilled Site Reliability Engineer to work on our production infrastructure. Key Responsibilities: Design and implement scalable cloud infrastructure using Terraform and AWS. Collaborate with cross-functional teams to ensure 99.999%...


  • New York, New York, United States ITE MGMT Full time

    Job Overview: ITE Management, a prominent investment management and private equity firm with a focus on acquiring transportation assets and operational platforms within the infrastructure domain, is on the lookout for an IT Infrastructure Specialist to become a part of our IT division. Established in 2014, ITE is dedicated to identifying real asset investment...


  • New York, New York, United States Drum Associates Full time

    Drum Associates is on the lookout for an Infrastructure Automation Specialist to become an integral part of our committed team, focused on building reliable and scalable systems. You will have a significant impact within an innovative technology division that is dedicated to nurturing the next generation of leaders. Core Responsibilities: Establish...


  • New York, New York, United States City of New York Full time

    Job Overview: The City of New York is dedicated to fostering a diverse workforce that reflects the community it serves. We are currently in search of a qualified individual to join our team as an Infrastructure Systems Specialist. This role is pivotal in advancing the organization’s IT objectives, ensuring a resilient infrastructure, effective deployment...


  • New York, New York, United States Russell Tobin & Associates Full time

    Job Description: As a Site Reliability Engineer at Russell Tobin & Associates, you will play a critical role in ensuring the reliability and scalability of our cloud infrastructure. We are seeking a highly skilled and experienced engineer to join our team and contribute to the design, implementation, and maintenance of our cloud-based systems. Key...


  • New York, New York, United States Flow Traders Full time

    Flow Traders is seeking a skilled Virtual Infrastructure Specialist to enhance our dynamic IT department. The successful candidate will be proactive and committed to the ongoing enhancement of our technological framework. This is an exceptional chance to become part of a prominent proprietary trading firm that fosters an entrepreneurial and innovative ethos...


  • New York, New York, United States FanDuel Full time

    ABOUT FANDUEL: FanDuel Group is a pioneering sports-tech entertainment organization that is transforming the way fans connect with their favorite sports, teams, and leagues. As the leading gaming platform in the United States, FanDuel encompasses a diverse portfolio of top brands in gaming, sports wagering, daily fantasy sports, advance-deposit betting, and...


  • New York, New York, United States Diverse Lynx Full time

    Position: Infrastructure Automation Specialist. We are seeking a skilled Infrastructure Automation Specialist with a robust background in DevOps practices. Proficiency in Python: A solid understanding of Python programming is essential. Experience with CI/CD Tools: Familiarity with tools such as Ansible, Docker, Kubernetes, and Jenkins is required. Automation...


  • New York, New York, United States Engineers Gate Full time

    Position Overview: Engineers Gate (EG) stands at the forefront of quantitative investment, leveraging advanced technology for computer-driven trading across global financial markets. Our diverse team comprises researchers, engineers, and finance experts who employ sophisticated statistical models to analyze data and uncover predictive signals aimed at...


  • New York, New York, United States Open Systems Technologies Full time

    Position Overview: A leading financial services organization is in search of a proficient Platform Infrastructure Engineer. The successful candidate will possess a wealth of experience in DevOps, TechOps, or Site Reliability Engineering (SRE), with a robust foundation in AWS technologies. This position provides an attractive compensation package and the...