Site Reliability Engineer


Date Posted: 11/08/2024

Hiring Organization: Rose International

Position Number: 474141

Job Title: Site Reliability Engineer

Job Location: Pittsburgh, PA, USA, 15222

Work Model: Hybrid

Shift:

Hybrid: 3 days in office / 2 remote

Hours: 8 am to 5 pm CST

Employment Type: Temp to Hire

Estimated Duration (In months): 6

Min Hourly Rate($): 65.00

Max Hourly Rate($): 70.00

Must Have Skills/Attributes: Ansible, Dynatrace, Grafana, Troubleshooting

Job Description

***Only qualified Site Reliability Engineer candidates located near the Dallas, TX or Pittsburgh, PA area will be considered, because the position requires an onsite presence***


Education:

• Needs to have the skillset


Years of experience:

• 7+ years of experience


Must Have Technical Skills:

1. Extensive experience in network performance tuning and monitoring.

2. Deep understanding of network protocols (e.g., TCP/IP, DNS, HTTP/S) and network optimization techniques.

3. Proficiency with Dynatrace and BigPanda for real-time monitoring, root cause analysis, and incident response; hands-on experience with these tools is required.

4. Strong background in configuring, managing, and troubleshooting network performance and latency issues across complex, distributed systems.

5. Experience with additional monitoring and observability tools like ThousandEyes and Grafana.

6. Skilled in Ansible Tower for automation of network and system configurations.

7. Demonstrated ability to collaborate with cross-functional teams, troubleshoot effectively, and proactively identify areas for improvement in network reliability and performance.


Flex Skills/Nice to Have:

- Proven experience in incident/problem management with a good understanding of any of the tools used for this purpose.

- Good understanding of both UNIX and Windows operating systems.

- Good understanding of web hosting technologies like Apache / Tomcat or other equivalent web/app servers.

- Good understanding of Big Data & cloud concepts.

- Good understanding of database technologies like Oracle and SQL.

- Good understanding of monitoring tools is an added advantage.

- Solid understanding of the major functionality bundled into a release, both from a technology and business point of view.

- Strong knowledge of relevant applications and development life cycles.

- Experience working with geographically distributed and culturally diverse work-groups.

- Strong desire to learn new technology.


Soft Skills:

- Excellent communication skills, both verbal and written, with the ability to lead/manage large conference calls.

- Comfortable providing clear problem descriptions and guidance to business users in a time critical environment.

- Proactive and naturally inquisitive, with a strong bias for action and for continuous improvement of practices / processes.

- Excellent influence, negotiation and presentation skills.

- Experience working with cross-line-of-business teams, outside service providers and partner organizations.

- Outstanding interpersonal skills and ability to establish strong relationships with all levels of management.

- Ability to work independently as a self-starter, and within a team environment.


Roles and Responsibilities:

• Monitor infrastructure, servers, middleware, databases, and batch jobs.

• Aggressively respond to service requests from business-partner-facing support teams, Operations, Risk/control partners, etc.

• Troubleshoot environment, data control and operational issues.

• Create and maintain documentation to ensure knowledge accessibility.

• Automate and streamline processes using scripts and scheduling tools.

• Liaise with other application support teams and internal/external business and technical partners.

• Provide ad hoc and on-demand reports.

• Perform timely escalation of critical issues and proactively identify patterns of recurring issues to improve production.

• Lead problem resolution, conduct root cause analysis, and establish processes that help prevent incidents.

• Participate in the Incident and Problem Management processes as a resolver accountable for root cause analysis, resolution and reporting.

• Ensure that all production changes are processed according to Change Management policies and procedures.

• Ensure that appropriate levels of Quality Assurance have been met for all new and existing products.

• Support Sustained Resiliency, Disaster Recovery, and High Availability events.

• Help the Level 2 operations team set up monitoring and close gaps in the current monitoring setup.

• Play a key part in setting up reporting and in applying the Monitor -> Report -> Improve principle.

• Coordinate incident management to ensure appropriate coverage.

• Facilitate calls, coordinate, and communicate during critical outage situations.

• Handle call documentation, queue management, ticket analysis and interfacing with impacted lines of business for incident impact analysis via the Production Assurance process.

• Maintain an end-to-end view of issues for objectivity.

• Influence senior technology leads across organizations to ensure timely resolution of incidents.

Problem Management:

• Participate in RCA (root cause analysis) activities on client-impacting incidents and ensure they are executed and action items are assigned and completed.

• Provide expertise and support during critical incidents, interfacing with all impacted groups to better manage the message.

• Coordinate and lead the resolution of chronic issues.

• Guide all involved staff and vendors in driving a coordinated approach for results.

Hygiene and Capacity Maintenance:

• Responsible for data quality of PLM.

• Work aggressively to make sure all servers meet company standards for uptime, patch level, etc.

• Work on capacity planning for applications, estimating and analyzing growth rates of vital infrastructure components and adding capacity proactively as and when required.

• Understand the application code, workflow and business usage of the application.

• Understand the DB component of the application.

• Understand how seasonality affects critical applications.

• Document known errors and play an important role in knowledge transfer to the Level 1 team.

• Reduce escalations to Level 3 based on incremental learning about applications.


  • **Only those lawfully authorized to work in the designated country associated with the position will be considered.**


  • **Please note that all Position start dates and duration are estimates and may be reduced or lengthened based upon a client’s business needs and requirements.**


Benefits:

For information and details on employment benefits offered with this position, please visit here. Should you have any questions/concerns, please contact our HR Department via our secure website.

California Pay Equity:

For information and details on pay equity laws in California, please visit the State of California Department of Industrial Relations' website here.

Rose International is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, age, sex, sexual orientation, gender (expression or identity), national origin, arrest and conviction records, disability, veteran status or any other characteristic protected by law. Positions located in San Francisco and Los Angeles, California will be administered in accordance with their respective Fair Chance Ordinances.

If you need assistance in completing this application, or during any phase of the application, interview, hiring, or employment process, whether due to a disability or otherwise, please contact our HR Department.

Rose International has an official agreement (ID #132522), effective June 30, 2008, with the U.S. Department of Homeland Security, U.S. Citizenship and Immigration Services, Employment Verification Program (E-Verify). (Posting required by OCGA 13/10-91.)


