Site Reliability Engineer
4 weeks ago
Summary - Site Reliability Engineer (for one of the Big 4 Sports & Entertainment Leagues)
Our client is enhancing the landscape of the live sports and entertainment
industry. They are striving to deliver innovative, cutting-edge technologies to enable safe,
unforgettable fan experiences across the globe. They are assembling a world-class technology team to build and support platforms and products that anticipate these emerging opportunities.
The Data(base) Reliability Engineer will join the infrastructure team while also working
alongside league team members and be responsible for the following areas:
- Uptime, High Availability and Disaster recovery planning
- Incident response
- Optimization of data stores
- Identify SLIs and define SLOs
- Observability tooling
- Debugging running systems and providing tools to assist runtime debugging
- Optimizations for cost control
- Ability to interface with all levels of employees
- Ability to work both independently with little supervision and in a team environment
- Ensures availability, security, integrity, and recovery of data, pipelines and data stores.
- Define and configure relevant database metrics to ensure observability
- Create and maintain dashboards and reports to visualize database performance and health
- Create monitoring and alerting to trigger on error conditions, degradation symptoms and defined SLOs, as well as outages
- Develops and implements data store maintenance plans, including performing integrity checks, updating statistics and monitoring security and hardware resource utilization
- Work with peers to roll out changes to production environments and help mitigate and prevent data-related production incidents
- Work on automation of data store infrastructure and help engineering succeed by providing
self-service tools
- Resolves performance, capacity, replication, and other distributed data, pipeline and data store issues
- Support and debug data production issues across services and levels of the stack
- Provide timely incident response and participate in on-call rotations
- Continuously identify opportunities for process improvement and automation to enhance
database performance, reliability, and efficiency
- Prioritize unblocking your teammates, collaboration and knowledge sharing
Qualifications:
To perform this job successfully, an individual must be able to perform the Duties and Responsibilities (Duties) above satisfactorily and meet the requirements below. The requirements listed below are representative of the minimum knowledge, skill, and/or ability required. Reasonable accommodations will be made to enable individuals with disabilities to perform the essential functions of the job.
Education and/or Experience: Required:
- Minimum of a bachelor’s degree in Computer Science, MIS or related degree and five (5) years of relevant experience including software or reliability engineering, database administration, datastore programming experience or combination of education, training and experience.
- Ability to clearly and effectively communicate strong opinions on how to use technologies such as cloud, microframeworks, DevOps, automation, and observability tools
- Demonstrable experience engineering automation of triggers, alerts, and remediation
- Have written code in a compiled language that runs in production somewhere
- Experience in Oracle 19c, Postgres, Mongo, Change Data Capture, data and data store monitoring, management and support
- Experience with OLTP, OLAP as well as PL/SQL code development and tuning
- Experience in Linux OS and shell scripting
- Extensive experience in performance tuning and analysis
- Strong grasp of ITIL principles is a plus
- Capacity planning for all aspects of a data store system (storage, compute, memory, etc.)
- Understanding of networking and connectivity and how it relates to a data store environment
- Excellent problem solving and troubleshooting skills
- Ability to work non-standard shifts including nights and/or weekend on-call responsibilities
- Dedicated to continuous improvement of yourself and our SRE/DBRE capabilities
Key Technical Traits:
- APIs and microservices: REST, Web, Graph
- Database Solutions – Oracle, MySQL, MSSQL, CloudSQL, NoSQL
- Cloud Providers: Oracle Cloud Infrastructure, Google Cloud Platform, AWS
- Real-time log/event monitoring – DataDog, Stackdriver, Oracle Enterprise Manager, Oracle Cloud Monitoring, SolarWinds, Splunk, SumoLogic, OpenTelemetry
- Scripting: PL/SQL, Shell
- Secured Access and control – Okta SSO and MFA, MS Active Directory, DataSafe
- Software Development tools – Jira, Git, Jenkins, ArgoCD, Terraform
- Compliance: PCI DSS, SSAE18/SOC 1
-
Site Reliability Engineer
2 weeks ago
New York, United States Unreal Gigs Full time Job Summary We are in search of a Site Reliability Engineer to join our tech startup specializing in infrastructure and authorization solutions. As a Site Reliability Engineer, you'll be pivotal in ensuring the reliability, availability, and performance of our systems. Your role will involve designing, implementing, and...
-
Site Reliability Engineer
1 week ago
New York, United States Unreal Gigs Full time Job Summary We are in search of a Site Reliability Engineer to join our tech startup specializing in infrastructure and authorization solutions. As a Site Reliability Engineer, you'll be pivotal in ensuring the reliability, availability, and performance of our systems. Your role will involve designing, implementing, and maintaining scalable infrastructure...
-
Site Reliability Engineering Manager
2 weeks ago
New York, United States developrec Full time SRE Lead/Manager | San Diego, CA | Full-time Role Overview: As the Engineering Manager for Site Reliability, you'll lead the charge in transitioning to cloud-based solutions while ensuring the stability of our existing systems for our rapidly growing user base, currently standing at around one million. You'll spearhead our cloud infrastructure strategy...
-
Site Reliability Engineer
3 weeks ago
New York, United States InterEx Group Full time Senior Site Reliability Engineer PRIMARY ACCOUNTABILITIES: Improve the reliability of mission-critical solutions, applications, and platforms; Software development for enterprises; Continuous improvement identification and implementation; Manage risks and resolve issues that affect applications; Lead efforts to troubleshoot and/or debug issues in any...
-
Site Reliability Engineer
7 days ago
New York, United States InterEx Group Full time Senior Site Reliability Engineer PRIMARY ACCOUNTABILITIES: Improve the reliability of mission-critical solutions, applications, and platforms; Software development for enterprises; Continuous improvement identification and implementation; Manage risks and resolve issues that affect applications; Lead efforts to troubleshoot and/or debug issues in any...
-
Site Reliability Engineer
2 weeks ago
New York, United States The Judge Group, LLC Full time Contract: 6+ months Hybrid: Riverwoods, IL W2 ONLY - NO C2C Job Responsibilities: Guide full stack developers on the importance of SRE principles. Analyze, design, and deploy new functionality and enhancements with high quality (security, reliability, operations) to production. Build new and analyze current monitoring for applications for...
-
Site Reliability Engineer
4 weeks ago
New York, United States InterEx Group Full time Senior Site Reliability Engineer PRIMARY ACCOUNTABILITIES: Improve the reliability of mission-critical solutions, applications, and platforms; Software development for enterprises; Continuous improvement identification and implementation; Manage risks and resolve issues that affect applications; Lead efforts to troubleshoot and/or debug issues in any...
-
Site Reliability Engineer
3 days ago
New York, United States Citadel Securities Americas Services LLC Full time Site Reliability Engineer (Citadel Securities Americas Services LLC - New York, NY); Multiple positions available: Collaborate with cross-functional teams, including trading, quantitative, and software engineering teams, to support and enhance Citadel's core suite of trading applications with the latest, most cutting-edge technology in order to proactively...
-
Site Reliability Engineer
3 weeks ago
New York, United States Nationstaff Full time About This Role We are seeking a talented Site Reliability Engineer with experience in building and maintaining continuous integration, automating programmatic tasks, deploying applications, configuration management, and monitoring and maintaining the uptime of the platform. The Site Reliability Engineer will be an expert in Linux, is passionate about open...
-
Site Reliability Engineer
4 days ago
New York, United States Nationstaff Full time About This Role We are seeking a talented Site Reliability Engineer with experience in building and maintaining continuous integration, automating programmatic tasks, deploying applications, configuration management, and monitoring and maintaining the uptime of the platform. The Site Reliability Engineer will be an expert in Linux, is passionate about open...
-
Site Reliability Engineer
1 day ago
New York, United States Gallery Systems Full time Job Description: We are seeking a Site Reliability Engineer (SRE) with 3-5 years of experience to join our team at Gallery Systems. The SRE will play a critical role in overseeing the reliability, performance, and scalability of our systems in a Microsoft/Linux environment. The ideal candidate will bring expertise and best practices from previous...
-
Site Reliability Engineer
7 days ago
New York, United States Hale Recruiting Full time Summary - Site Reliability Engineer (for one of the Big 4 Sports & Entertainment Leagues) Our client is enhancing the landscape of the live sports and entertainment industry. They are striving to deliver innovative, cutting-edge technologies to enable safe, unforgettable fan experiences across the globe. They are assembling a world-class technology team to...
-
Junior Site Reliability Engineer
3 weeks ago
New York, New York, United States Sesame Workshop Full time Sesame Workshop is seeking a Junior Site Reliability Engineer. Sesame Workshop is an independent nonprofit organization dedicated to helping children grow smarter, stronger, and kinder. This role is within the Digital Media Engineering (DME) group which is part of the Technology and Engineering department and will help provide support for our diverse media...
-
Junior Site Reliability Engineer
1 week ago
New York, United States Sesame Workshop Full time Job Description Sesame Workshop is seeking a Junior Site Reliability Engineer. Sesame Workshop is an independent nonprofit organization dedicated to helping children grow smarter, stronger, and kinder. This role is within the Digital Media Engineering (DME) group which is part of the Technology and Engineering department and will help provide support for our...
-
Senior Site Reliability Engineer
2 days ago
New York, United States Mondrian Alpha Full time A leading systematic multi-strat fund is seeking an experienced site reliability engineer to join a team of senior engineers to focus on varying platforms throughout the business. SREs here combine software and systems engineering experience to build, maintain and improve systems that power the company's investment strategies. The right candidate will come...
-
Senior Site Reliability Engineer
3 days ago
New York, United States Mondrian Alpha Full time A leading systematic multi-strat fund is seeking an experienced site reliability engineer to join a team of senior engineers to focus on varying platforms throughout the business. SREs here combine software and systems engineering experience to build, maintain and improve systems that power the company's investment strategies. The right candidate will come...
-
Senior Site Reliability Engineer
5 days ago
New York, United States InterEx Group Full time ROLE: Senior Site Reliability Engineer PRIMARY ACCOUNTABILITIES: Improve the reliability of mission-critical solutions, applications, and platforms; Software development for enterprises; Continuous improvement identification and implementation; Manage risks and resolve issues that affect applications; Lead efforts to troubleshoot and/or debug issues in...
-
Site Reliability Engineer DevOps
3 days ago
New York, United States PEX Full time SITE RELIABILITY ENGINEER SUMMARY: Since 2006 PEX has been on a steady march to build and evolve a solution that helps improve the way organizations operate in order to make them more efficient, more nimble, and more competitive. PEX has evolved into a robust, secure SaaS solution with a deep suite of workforce spend management capabilities, advanced...
-
Senior Site Reliability Engineer
2 weeks ago
New York, United States InterEx Group Full time ROLE: Senior Site Reliability Engineer PRIMARY ACCOUNTABILITIES: Improve the reliability of mission-critical solutions, applications, and platforms; Software development for enterprises; Continuous improvement identification and implementation; Manage risks and resolve issues that affect applications; Lead efforts to troubleshoot and/or debug issues in any...
-
Site Reliability Engineer DevOps
2 weeks ago
New York, United States PEX Full time SITE RELIABILITY ENGINEER SUMMARY: Since 2006 PEX has been on a steady march to build and evolve a solution that helps improve the way organizations operate in order to make them more efficient, more nimble, and more competitive. PEX has evolved into a robust, secure SaaS solution with a deep suite of workforce spend...