We have other current jobs related to this field, which you can find below:


  • New York, United States Automatic Data Processing Full time

    ADP is hiring a Site Reliability Engineer. Do you thrive in a challenging environment, love production systems, and have a curious nature with a thirst for pushing the limits? Are you inspired by transformation and making an impact on the lives of millions o...


  • New York, United States Unreal Gigs Full time

    Job Summary: We are in search of a Site Reliability Engineer to join our tech startup specializing in infrastructure and authorization solutions. As a Site Reliability Engineer, you'll be pivotal in ensuring the reliability, availability, and performance of our systems. Your role will involve designing, implementing, and...


  • New York, United States Unreal Gigs Full time

    Job Summary: We are in search of a Site Reliability Engineer to join our tech startup specializing in infrastructure and authorization solutions. As a Site Reliability Engineer, you'll be pivotal in ensuring the reliability, availability, and performance of our systems. Your role will involve designing, implementing, and maintaining scalable infrastructure...


  • New York, United States Hyperion Industries Full time

    Company Description: Join us on an exhilarating mission at Hyperion, a VC-backed startup working with Tim Hwang, CEO of FiscalNote (NYSE: NOTE). Our co-founders, with their extensive AI and engineering backgrounds from Google, Amazon, Workday, and Instacart, are leading the charge. Our mission is to revolutionize Site Reliability Engineering (SRE) with an...


  • New York, United States Mondrian Alpha Full time

    An industry-leading systematic trading fund is seeking highly skilled Site Reliability Engineers to join a team responsible for engineering and supporting the company's critical infrastructure platforms. This team also handles the centralized development infrastructure and works alongside engineering teams across the business to assure the optimal route of...


  • New York, United States ICTerGezocht Full time

    Location: Amsterdam. Vacancy in brief: Ever thought of how many people log in to the app or Internet Banking website each month? Over five million! The objective of the Personal Banking Grid is to ensure that each visit is not only secure but also a personal and smooth experience. As a Site Reliability Engineer, you play a key role in this mission. You will...


  • New York, United States Instabase Full time

    At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. With customers representing some of the largest and most complex organizations in the world, and investors like Greylock, Andreessen Horowitz, and Index Ventures, our...


  • New York, United States InterEx Group Full time

    Senior Site Reliability Engineer. Primary accountabilities: improve the reliability of mission-critical solutions, applications, and platforms; software development for enterprises; continuous improvement identification and implementation; manage risks and resolve issues that affect applications; lead efforts to troubleshoot and/or debug issues in any...


  • New York, New York, United States Instabase Full time

    At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. With customers representing some of the largest and most complex organizations in the world, and investors like Greylock, Andreessen Horowitz, and Index Ventures, our...


  • New York, United States Hebbia Full time

    About Hebbia The user interface for AGI - Hebbia is AI that works the way you work. Designed to be generally capable- it can tackle even the most complex tasks, citing answers over any amount of sources. By showing its work, Hebbia empowers users to collaborate with AI on each step and validate responses instead of blindly trusting them. Our mission is to...


  • New York, New York, United States Astir IT Solutions, Inc. Full time

    Position: Senior Site Reliability Engineer. Location: Onsite in NJ. Contract Duration: Long-term Engagement. Compensation: $50 per hour. Note: No OPT/CPT candidates will be considered. We are seeking a highly skilled Senior Site Reliability Engineer (SRE) with subject matter expertise. The ideal candidate will possess exceptional communication skills and the...


  • New York, United States InterEx Group Full time

    Senior Site Reliability Engineer. Primary accountabilities: improve the reliability of mission-critical solutions, applications, and platforms; software development for enterprises; continuous improvement identification and implementation; manage risks and resolve issues that affect applications; lead efforts to troubleshoot and/or debug issues in any...


  • New York, New York, United States Astir IT Solutions, Inc. Full time

    Position: Senior Site Reliability Engineer. Location: Onsite in New Jersey. Contract Duration: Long-term Engagement. Compensation: $50 per hour. This role requires a highly skilled individual with a strong background in Site Reliability Engineering. The ideal candidate will possess exceptional communication abilities and the confidence to engage with executive-level...


  • New York, New York, United States Astir IT Solutions, Inc. Full time

    Position: Senior Site Reliability Engineer. Location: Onsite in New Jersey. Contract Duration: Long-term Engagement. Compensation: $50 per hour. This role requires a highly skilled individual with a strong background in Site Reliability Engineering. The ideal candidate will possess: exceptional communication skills, with the ability to engage confidently with...


  • New York, New York, United States Astir IT Solutions, Inc. Full time

    Position: Senior Site Reliability Engineer. Location: Onsite in New Jersey. Contract Duration: Long-term Engagement. Compensation: $50 per hour. This role requires a seasoned professional with a strong background in Site Reliability Engineering. The ideal candidate will possess exceptional communication skills and the confidence to engage with executive-level...


  • New York, United States Diverse Lynx Full time

    SRE - Site Reliability Engineer. Jacksonville, FL / Cary, NC / New York, NY (Onsite). Full-time. Should have cloud engineering experience and act as the SME on operation automation and monitoring, identifying toil within the team's existing systems and processes and implementing automated solutions to reduce toil. Good knowledge of GCP. Hands...


  • New York, United States TENTH MOUNTAIN LLC Full time

    Job Description: Join KGES as a Lead Site Reliability Engineer. At KGES, we are dedicated to connecting veterans with top-tier employers who value their military background and skills. As a Lead Site Reliability Engineer, you'll have the opportunity to leverage your expertise in DevOps, cloud infrastructure, and incident management, while...


  • New York, United States CloudFit Software Full time

    CloudFit Software is looking for a Site Reliability Engineer (SRE) to work directly on increasing quality, performance, and reliability of the CloudFit Managed Applications and Services systems. Our SREs help maintain accountability of production workloads while making it easier to run modern applications and services in the cloud. As a team, we bring...

Site Reliability Engineer

2 months ago


New York, United States Hale Recruiting Full time

Summary - Site Reliability Engineer (for one of the Big 4 Sports & Entertainment Leagues)

Our client is enhancing the landscape of the live sports and entertainment industry. They are striving to deliver innovative, cutting-edge technologies to enable safe, unforgettable fan experiences across the globe. They are assembling a world-class technology team to build and support platforms and products that anticipate these emerging opportunities.

The Data(base) Reliability Engineer will join the infrastructure team while also working alongside league team members and be responsible for the following areas:

  • Uptime, high availability and disaster recovery planning
  • Incident response
  • Optimization of data stores
  • Identify SLIs and define SLOs
  • Observability tooling
  • Debugging running systems and providing tools to assist runtime debugging
  • Optimizations for cost control
  • Ability to interface with all levels of employees
  • Ability to work both independently with little supervision and in a team environment
  • Ensure availability, security, integrity, and recovery of data, pipelines and data stores
  • Define and configure relevant database metrics to ensure observability
  • Create and maintain dashboards and reports to visualize database performance and health
  • Create monitoring and alerting to trigger on error conditions, degradation symptoms and defined SLOs, as well as outages
  • Develop and implement data store maintenance plans, including performing integrity checks, updating statistics and monitoring security and hardware resource utilization
  • Work with peers to roll out changes to production environments and help mitigate and prevent data-related production incidents
  • Work on automation of data store infrastructure and help engineering succeed by providing self-service tools
  • Resolve performance, capacity, replication, and other distributed data, pipeline and data store issues
  • Support and debug data production issues across services and levels of the stack
  • Provide timely incident response and participate in on-call rotations
  • Continuously identify opportunities for process improvement and automation to enhance database performance, reliability, and efficiency
  • Prioritize unblocking your teammates, collaboration and knowledge sharing

Qualifications:

To perform this job successfully, an individual must be able to perform the Duties and Responsibilities (Duties) above satisfactorily and meet the requirements below. The requirements listed below are representative of the minimum knowledge, skill, and/or ability required. Reasonable accommodations will be made to enable individuals with disabilities to perform the essential functions of the job.

Education and/or Experience:

Required: Minimum of a bachelor’s degree in Computer Science, MIS or related degree and five (5) years of relevant experience including software or reliability engineering, database administration, datastore programming experience or combination of education, training and experience.

  • Ability to communicate clearly and effectively strong opinions on how to use technologies such as cloud, microframeworks, DevOps, automation, and observability tools
  • Demonstrable experience engineering automation of triggers, alerts, and remediation
  • Have written code in a compiled language that runs in production somewhere
  • Experience in Oracle 19c, Postgres, Mongo, Change Data Capture, data and data store monitoring, management and support
  • Experience with OLTP, OLAP as well as PL/SQL code development and tuning
  • Experience in Linux OS and shell scripting
  • Extensive experience in performance tuning and analysis
  • Strong ITIL principles are a plus
  • Capacity planning for all aspects of a data store system (storage, compute, memory, etc.)
  • Understanding of networking and connectivity and how it relates to a data store environment
  • Excellent problem solving and troubleshooting skills
  • Ability to work non-standard shifts including nights and/or weekend on-call responsibilities
  • Dedicated to continuous improvement of yourself and our SRE/DBRE capabilities

Key Technical Traits:

  • APIs and microservices: REST, Web, Graph
  • Database Solutions: Oracle, MYSQL, MSSQL, CloudSQL, NoSQL
  • Cloud Providers: Oracle Cloud Infrastructure, Google Cloud Platform, AWS
  • Real-time log/event monitoring: DataDog, Stackdriver, Oracle Enterprise Manager, Oracle Cloud Monitoring, SolarWinds, Splunk, SumoLogic, OpenTelemetry
  • Scripting: PL/SQL, Shell
  • Secured Access and control: Okta SSO and MFA, MS Active Directory, DataSafe
  • Software Development tools: Jira, GIT, Jenkins, ArgoCD, Terraform
  • Compliance: PCI DSS, SSAE18/SOC 1
