Sr. Site Reliability Engineer
6 days ago
Job Title: Sr. Site Reliability Engineer
Ready to apply? Before you do, make sure to read all the details of this job in the description below.
Contract Type: Contract to hire
Location: Hybrid (Dallas, TX)
Must Have and Metrics Technical Skills:
Years of experience: 7+
Ansible Tower
BigPanda
Dynatrace
Grafana
ThousandEyes
Extensive experience in network performance tuning and monitoring.
Deep understanding of network protocols (e.g., TCP/IP, DNS, HTTP/S) and network optimization techniques.
Proficiency with Dynatrace and BigPanda for real-time monitoring, root cause analysis, and incident response; hands-on experience with these tools is required.
Strong background in configuring, managing, and troubleshooting network performance and latency issues across complex, distributed systems.
Experience with additional monitoring and observability tools such as ThousandEyes and Grafana.
Skilled in Ansible Tower for automation of network and system configurations.
Demonstrated ability to collaborate with cross-functional teams, troubleshoot effectively, and proactively identify areas for improvement in network reliability and performance.
Flex Skills/Nice to Have:
- Proven experience in incident/problem management with a good understanding of any of the tools used for this purpose.
- Good understanding of both UNIX and Windows operating systems.
- Good understanding of web hosting technologies like Apache/Tomcat or other equivalent web/app servers.
- Good understanding of Big Data & cloud concepts.
- Good understanding of database technologies like Oracle and SQL.
- Good understanding of monitoring tools is an added advantage.
- Solid understanding of the major functionality bundled into a release, both from a technology and business point of view.
- Strong knowledge of relevant applications and development life cycles.
- Experience working with geographically distributed and culturally diverse work-groups.
- Strong desire to learn new technology.
Roles and Responsibilities:
Monitor infrastructure, servers, middleware, databases, and batch jobs.
Respond aggressively to service requests from business-partner-facing support teams, Operations, Risk/Control partners, etc.
Troubleshoot environment, data control and operational issues.
Create and maintain documentation to ensure knowledge accessibility.
Automate and streamline processes using scripts and scheduling tools.
Liaise with other application support teams and internal/external business and technical partners.
Provide ad hoc and on-demand reports.
Perform timely escalation of critical issues and proactively identify patterns of recurring issues to improve production.
Lead problem resolution, conduct root cause analysis, and establish processes that help prevent incidents.
Participate in the Incident and Problem Management processes as a resolver accountable for root cause analysis, resolution, and reporting.
Ensure that all production changes are processed according to Change Management policies and procedures.
Ensure that appropriate levels of Quality Assurance have been met for all new and existing products.
Support Sustained Resiliency, Disaster Recovery, and High Availability events.
Help the Level 2 operations team set up monitoring and close gaps in the current monitoring setup.
Play a key part in setting up reporting and in applying the Monitor -> Report -> Improve principle.
Coordinate incident management to ensure appropriate coverage.
Call facilitation, coordination and communications during critical outage situations.
Call documentation, queue management, ticket analysis and interface to impacting lines of business for incident impact analysis via the Production Assurance process.
End to end view of issues for objectivity.
Influence senior technology leads across organizations to ensure timely resolution of incidents.
Problem Management:
Participate in RCA (root cause analysis) activities on client-impacting incidents and ensure they are executed and that action items are assigned and completed.
Provide expertise and support during critical incidents, interfacing with all impacted groups to better manage the message.
Chronic issue coordination and leadership.
Guidance to all involved staff and vendors in driving a coordinated approach for results.
Hygiene and Capacity Maintenance:
Responsible for data quality of PLM.
Work aggressively to make sure all servers meet company standards for uptime, patch level, etc.
Work on capacity planning for applications, estimating and analyzing growth rates of vital infrastructure components and adding capacity proactively as and when required.
Understand the application code, workflow, and business usage of the application.
Understand the database component of the application.
Understand how seasonality affects critical applications.
Document known errors and play an important role in knowledge transfer to the Level 1 team.
Reduce escalations to Level 3 based on incremental learning about applications.
Intended length of Assignment: 4/5/2025
Reason for open position: SRE/SRC Special Projects
Potential for Contract Extension: N/A
This position is a contract with the right to hire if a need becomes available. The manager will only consider candidates who are open to converting to full-time PNC employment. PNC will not sponsor work visas if the decision is made to hire the contingent worker: YES
Initiatives/Projects: SRE / SRC Special Projects
Industry background: Technical
Soft Skills:
- Excellent communication skills, both verbal and written, with the ability to lead/manage large conference calls.
- Comfortable providing clear problem descriptions and guidance to business users in a time-critical environment.
- Proactive, with a strong bias for action, a naturally inquisitive mindset, and a drive for continuous improvement of practices and processes.
- Excellent influence, negotiation and presentation skills.
- Experience working with cross-line-of-business teams, outside service providers, and partner organizations.
- Outstanding interpersonal skills and ability to establish strong relationships with all levels of management.
- Ability to work independently as a self-starter, and within a team environment.
Interview Process:
Logistics:
2-step interview
1st round with the hiring manager
2nd round: panel interview with engineering managers