Site Reliability Engineer

3 weeks ago


New York, United States Synergis Full time

SRE/Dynatrace Lead Contract to hire - W2 Remote - Candidate MUST be in Georgia or Alabama

Job Description

The Site Reliability Engineer Lead will work with stakeholders to define SLOs and SLIs and to develop the overall SRE strategy and roadmap. The ideal candidate will develop a deep understanding of how all the systems work together, how they fail, how they can be improved, and how they can be designed and monitored, automatically and manually, in the most optimized way. They will share that understanding through dashboarding, reporting, demonstrations, presentations, and guidance with leadership and other engineers across the organization.

This position will assist in the support and maintenance of a Continuous Integration and Continuous Delivery automation framework across multiple tools and platforms. This person will play a vital role on the team and must demonstrate a solid understanding of site reliability, operability, infrastructure, and other DevOps groups. They should possess a high degree of automation skill and expertise, and a software development background, preferably including the development of shared/common components and integrations. The role calls for extensive C#.NET and PowerShell experience with Azure DevOps/GitHub, Visual Studio, Kubernetes/Google/AWS/Apigee and other cloud tools, CI/CD pipelines, and Continuous/Shift-Left principles and processes in general. The candidate should thrive in a fast-paced, dynamic environment and adapt readily to the ever-evolving changes that come with technology and doing business.

The person in this role will interact closely with several other Business Units to ensure clear understanding of business and technical needs; timely delivery of requirements; and service level agreements, events, and operations. This person will also collaborate and interface with those who have operational responsibility for software infrastructure performance, SDLC development, application security, system monitoring, risk, and compliance, including testing and documentation. They will present findings and possible solutions to leadership and technical teams for addressing application/system challenges that impact service levels.

Responsibilities:

Team Leadership:
  • Lead and mentor a team of SREs, fostering a collaborative and high-performing culture.
  • Set clear expectations, provide regular feedback, and empower team members to excel.
  • Drive continuous learning and skill development within the team.
  • Collaborate and cooperate with Enterprise, Infrastructure, and Architects on DevOps tool chain decisions.
  • Be a collaborative, highly experienced, and supportive team player with strong communication and planning skills.
  • Give attention to detail and demonstrate a high commitment to customer satisfaction.

Product Ownership:
  • Collaborate with leadership and prospective business teams to understand, ensure, and anticipate their SLA and SLO commitments.
  • Translate user requirements into clear and actionable product features.
  • Define the overall vision for the team’s products and services.
  • Follow engineering best practices and participate in new tool evaluations.

Operational Excellence:
  • Define and implement best practices for incident response, monitoring, and alerting.
  • Identify, assess, and integrate various open-source technologies and cloud services.
  • Collaborate with cross-functional teams (product development, operations, etc.) to improve system reliability.
  • Develop and maintain incident playbooks and post-mortem processes.
  • Database SQL experience with Oracle, Microsoft SQL Server, TOAD, MongoDB, and others.
  • Develop internal process automations using observability tools (like Splunk & Dynatrace), DevOps and DevSecOps tools, and AI technologies.
  • Assume ownership of automation processes, problem resolution, and root cause analysis, with strong automation and problem-solving skills and the ability to follow through to completion.
  • Detailed, hands-on experience with public cloud resources and services such as Microsoft Azure, Docker, and other similar tools.
  • Develop a new automation framework to utilize and define/refine existing build and deploy processes.
  • Contribute to the development and testing of common services that support monitoring, containerization, and CI/CD.

Infrastructure and Automation:
  • Automate repetitive tasks and processes to improve efficiency.
  • Champion infrastructure as code (IaC) principles.
  • Work closely with Development, QA, Product Management, and Operations teams to ensure product releases are on time and of high quality.

Performance and Scalability:
  • Monitor system performance, identify bottlenecks, and optimize resource utilization.
  • Plan for capacity growth and scalability.
  • Collaborate with development teams to ensure efficient code deployment.

Incident Management:
  • Serve as a technical contributor in incident response efforts during critical incidents.
  • Coordinate with stakeholders to minimize downtime and impact.
  • Drive root cause analysis and preventive measures.

Technology Stack: Our SREs work with a variety of tools and platforms, including but not limited to:
  • Dynatrace
  • Splunk
  • Grafana
  • OpenTelemetry
  • GitHub Enterprise
  • Azure DevOps
  • Microsoft SQL Server and Oracle RDBMS & Oracle Cloud
  • Primarily .NET development environment

Qualifications:
  • Bachelor’s Degree in Computer Science (or closely related field) with 5+ years of relevant IT experience
  • Strong verbal/written communication skills
  • Strong analytical and problem-solving skills
  • Strong PowerShell, Python, Bash, Perl, Shell, and other scripting skills
  • At minimum, 5 years of experience as a Site Reliability Engineer
  • Experience leading projects and teams
  • A background in object-oriented software development
  • An understanding of monitoring, performance management, automation, and cloud infrastructure
  • Exceptional architectural, design, and development skills
  • Excellent verbal and written communication skills
  • Works well in a team environment and as an individual contributor on solo efforts with minimal supervision
  • 5+ years of DevOps experience in application (vendor & internal apps) delivery and transformation to deployment automation
  • Proven ability to follow priorities and timelines
  • Track record of, and passion for, bringing high-quality automation (including continuous deployments, automated service failover, etc.) into a team
  • Knowledge of good coding practices and improving code quality
  • Extensive experience in creating robust release management processes using PowerShell and other automation scripting technologies
  • Extensive experience in Configuration Mgt/Deployment tools such as SCCM / Azure DevOps (TFS)
  • Experience explaining and advocating DevOps culture, automation, lean, measurement, and sharing changes throughout internal teams and with business partners
  • Experience building CI/CD delivery solutions using tools such as Azure DevOps (TFS), GitHub, and Docker
  • Experience building an SRE practice
  • Experience within financial services
  • Experience working in an Agile environment
  • Experience in systems administration activities and networking technologies
  • Experience with organizational transformations to Agile methodology and practices within a more DevOps-centric ecosystem supporting process and delivery automation and collaboration
  • Experience architecting solutions to support and host customer-facing products with a focus on stability, scalability, security, testability, and maintainability

The hourly pay range for this position is $55 to $70/hr (dependent on factors including but not limited to client requirements, experience, statutory considerations, and location). Benefits available to full-time employees: medical, dental, vision, disability, life insurance, 401k and commuter benefits. Note: Disclosure as required by the Equal Pay for Equal Work Act (CO), NYC Pay Transparency Law, and SB 5761 (WA).

About Synergis

Synergis serves a myriad of clients across nearly all industries, from start-ups to Fortune 100 companies. The outcomes of these relationships are demonstrated in a growing list of more than 300 clients and industry recognition by Inc. magazine and the Atlanta Business Chronicle. From its foundation in 1997, Synergis has been successfully recruiting and placing IT professionals in all areas of information technology. For more information about Synergis, please visit the company website at www.synergishr.com.

Synergis is proud to be an Equal Opportunity Employer. We value diversity and do not discriminate on the basis of race, color, ethnicity, national origin, religion, age, gender, gender identity, political affiliation, sexual orientation, marital status, disability, military/veteran status, or any other status protected by applicable law.

For immediate consideration, please forward your resume to Sumner Pirkle at spirkle@synergishr.com. If you require assistance or an accommodation in the application or employment process, please contact us at spirkle@synergishr.com.



  • New York, United States Unreal Gigs Full time

    Job Summary We are in search of a Site Reliability Engineer to join our tech startup specializing in infrastructure and authorization solutions. As a Site Reliability Engineer, you'll be pivotal in ensuring the reliability, availability, and performance of our systems. Your role will involve designing, implementing, and maintaining scalable infrastructure...


  • New York, United States developrec Full time

    SRE Lead/Manager | San Diego, CA | Full-time Role Overview: As the Engineering Manager for Site Reliability, you'll lead the charge in transitioning to cloud-based solutions while ensuring the stability of our existing systems for our rapidly growing user base, currently standing at around one million. You'll spearhead our cloud infrastructure strategy...


  • New York, United States InterEx Group Full time

    Senior Site Reliability Engineer. PRIMARY ACCOUNTABILITIES: Improve the reliability of mission critical solutions, applications, and platforms; software development for enterprises; continuous improvement identification and implementation; manage risks and resolve issues that affect applications; lead efforts to troubleshoot and/or debug issues in any...


  • New York, United States The Judge Group, LLC Full time

    Contract: 6+ months Hybrid: Riverwoods, IL W2 ONLY - NO C2C Job Responsibilities: Guide full stack developers on the importance of SRE principles. Analyze, design, and deploy new functionality and enhancements with high quality (security, reliability, operations) to production. Build new and analyze current monitoring for applications for...


  • New York, United States Citadel Securities Americas Services LLC Full time

    Site Reliability Engineer (Citadel Securities Americas Services LLC - New York, NY); Multiple positions available: Collaborate with cross-functional teams, including trading, quantitative, and software engineering teams, to support and enhance Citadel's core suite of trading applications with the latest, most cutting edge technology in order to proactively...


  • New York, United States Nationstaff Full time

    About This Role We are seeking a talented Site Reliability Engineer with experience in building and maintaining continuous integration, automating programmatic tasks, deploying applications, configuration management, and monitoring and maintaining the uptime of the platform. The Site Reliability Engineer will be an expert in Linux, is passionate about open...


  • New York, United States Gallery Systems Full time

    Job Summary: Job Description: We are seeking a Site Reliability Engineer (SRE) with 3-5 years experience to join our team at Gallery Systems. The SRE will play a critical role in overseeing the reliability, performance, and scalability of our systems in a Microsoft/Linux environment. The ideal candidate will bring expertise and best practices from previous...


  • New York, United States Hale Recruiting Full time

    Summary - Site Reliability Engineer (For one of the Big 4 Sports & Entertainment Leagues) Our client is enhancing the landscape of the live sports and entertainment industry. They are striving to deliver innovative, cutting-edge technologies to enable safe, unforgettable fan experiences across the globe. They are assembling a world-class technology team to...


  • New York, United States Sesame Workshop Full time

    Job Description Sesame Workshop is seeking a Junior Site Reliability Engineer. Sesame Workshop is an independent nonprofit organization dedicated to helping children grow smarter, stronger, and kinder. This role is within the Digital Media Engineering (DME) group which is part of the Technology and Engineering department and will help provide support for our...


  • New York, New York, United States Sesame Workshop Full time

    Sesame Workshop is seeking a Junior Site Reliability Engineer. Sesame Workshop is an independent nonprofit organization dedicated to helping children grow smarter, stronger, and kinder. This role is within the Digital Media Engineering (DME) group which is part of the Technology and Engineering department and will help provide support for our diverse media...


  • New York, United States Mondrian Alpha Full time

    A leading systematic multi-strat fund is seeking an experienced site reliability engineer to join a team of senior engineers to focus on varying platforms throughout the business. SREs here combine software and systems engineering experience to build, maintain and improve systems that power the company's investment strategies. The right candidate will come...


  • New York, United States InterEx Group Full time

    ROLE: Senior Site Reliability Engineer. PRIMARY ACCOUNTABILITIES: Improve the reliability of mission-critical solutions, applications, and platforms; software development for enterprises; continuous improvement identification and implementation; manage risks and resolve issues that affect applications; lead efforts to troubleshoot and/or debug issues in...


  • New York, United States PEX Full time

    SITE RELIABILITY ENGINEER SUMMARY: Since 2006 PEX has been on a steady march to build and evolve a solution that helps improve the way organizations operate in order to make them more efficient, more nimble, and more competitive. PEX has evolved into a robust, secure SaaS solution with a deep suite of workforce spend management capabilities, advanced...

