Site Reliability Engineer

2 weeks ago

Pittsburgh PA United States ConsultUSA Full time

Description:

Our client has an immediate need for a Site Reliability Engineer who will specialize in improving all aspects of reliability, act as a conduit between infrastructure and application teams on support issues, and improve tools, automation, processes, and software.

Requirements:

Bachelor’s degree in Engineering, Computer Science, or a related field is a plus
Extensive experience in network performance tuning and monitoring
Deep understanding of network protocols (e.g., TCP/IP, DNS, HTTP/S) and network optimization techniques.
Hands-on proficiency with Dynatrace and BigPanda for real-time monitoring, root cause analysis, and incident response
Strong background in configuring, managing, and troubleshooting network performance and latency issues across complex, distributed systems
Experience with additional monitoring and observability tools like ThousandEyes and Grafana
Skilled in Ansible Tower for automation of network and system configurations
Demonstrated ability to collaborate with cross-functional teams, troubleshoot effectively, and proactively identify areas for improvement in network reliability and performance
Proven experience in incident/problem management; a good understanding of any of the tools used for this purpose is a plus
Good understanding of both UNIX and Windows operating systems
Good understanding of web hosting technologies like Apache/Tomcat or other equivalent web/app servers
Good understanding of Big Data & cloud concepts
Good understanding of database technologies like Oracle and SQL
Good understanding of monitoring tools is an added advantage
Solid understanding of the major functionality bundled into a release, both from a technology and business point of view
Strong knowledge of relevant applications and development life cycles
Experience working with geographically distributed and culturally diverse work-groups

Responsibilities:

Monitor infrastructure, servers, middleware, databases, and batch jobs
Aggressively respond to service requests from business partners facing support teams, Operations, Risk/control partners, etc.
Troubleshoot environment, data control, and operational issues
Create and maintain documentation to ensure knowledge accessibility
Automate and streamline processes using scripts and scheduling tools
Liaise with other application support teams and internal/external business and technical partners
Provide ad hoc and on-demand reports
Perform timely escalation of critical issues and proactively identify patterns of recurring issues to improve production
Lead problem resolution, conduct root cause analysis, and establish processes that help prevent incidents
Participate in the Incident and Problem Management processes as a resolver accountable for root cause analysis, resolution, and reporting
Ensure that all production changes are processed according to Change Management policies and procedures
Ensure that appropriate levels of Quality Assurance have been met for all new and existing products
Support Sustained Resiliency, Disaster Recovery, and High Availability events
Help the Level 2 operation team with setting up monitoring and bridging the gaps in the current monitoring setup
Play a key part in setting up reporting and be a key component in the Monitor -> Report -> Improve principle
Coordinate incident management to ensure appropriate coverage
Facilitate calls and handle coordination and communications during critical outage situations
Handle call documentation, queue management, and ticket analysis, and interface with impacted lines of business for incident impact analysis via the Production Assurance process
Maintain an end-to-end view of issues for objectivity
Influence senior technology leads across organizations to ensure the timely resolution of incidents
Participate in RCA (root cause analysis) activities on client-impacting incidents and ensure they are executed and action items are assigned and completed
Provide expertise and support during critical incidents, interfacing with all impacted groups to better manage the message
Provide chronic issue coordination and leadership
Provide guidance to all staff and vendors involved in driving a coordinated approach to results
Responsible for data quality of PLM
Work aggressively to make sure all servers meet company standards for uptime, patch level, etc.
Work on capacity planning for applications, estimating and analyzing growth rates of vital infrastructure components, and adding capacity proactively as and when required
Understand application code, workflow, and business usage of the application
Understand the DB component of the application
Understand the impact of seasonality on critical applications
Document known errors and play an important role in Knowledge transfer to the Level 1 team
Reduce escalations to Level 3 based on incremental learning about applications

Why Work for ConsultUSA:

ConsultUSA offers competitive salaries; major medical (PPO or HDHP w/ HSA), dental, and vision insurance plans; and a 401k plan with immediate eligibility for both salaried and hourly employees
ConsultUSA hosts several outings and events, holiday and summer parties, and volunteer opportunities throughout the year for employees
We will work with you to obtain training for in-demand technologies and prepare you for industry-recognized certification exams
ConsultUSA offers Business Analysis and Project Management training through our Project Management Institute (PMI)® award-winning sister company, PMCentersUSA

How to Apply:

To submit your application, please click the “Apply Now” button located at the top and bottom of the page.

ConsultUSA is committed to providing equal employment opportunities (EEO) to all qualified employees and applicants for employment without regard to race, color, religion, gender identity or expression, sexual orientation, national origin, age, disability, genetic information, marital status, pregnancy, ancestry, or status as a covered veteran as well as any other prohibited criteria under any applicable federal, state, and local laws applicable to ConsultUSA.

For a complete listing of all ConsultUSA jobs please visit www.consultusa.com

Site Reliability Engineer

4 weeks ago

United, PA, United States Resource Informatics Group Full time

Job Description: SRE role is a combination of architect-digital, full stack developer-digital, cloud engineering, and system engineer. Responsibilities: Expert level full stack developer profile with extensive experience in various time series database technology. Proficiency in data pipeline automation and infrastructure as code. Extensive knowledge and working...
Site Reliability Engineer Manager

4 weeks ago

Pittsburgh, Pennsylvania, United States PNC Full time

Job Summary: PNC is seeking a highly skilled Site Reliability Engineer Manager to join our team. As an SRE Group Manager, you will be responsible for leading a team of Site Reliability Engineers to ensure the reliability and performance of our applications and infrastructure. Key Responsibilities: Lead a team of Site Reliability Engineers to design, implement,...
Site Reliability Engineer

3 weeks ago

Pittsburgh, PA, United States Rose International Full time

Date Posted: 11/08/2024. Hiring Organization: Rose International. Position Number: 474141. Job Title: Site Reliability Engineer. Job Location: Pittsburgh, PA, USA, 15222. Work Model: Hybrid. Shift: Hybrid: 3 days in office / 2 remote. Hours: 8 am to 5 pm CST. Employment Type: Temp to Hire. Estimated Duration (In months): 6. Min Hourly Rate($): 65.00. Max Hourly Rate($): 70.00. Must...
Sr. Site Reliability Engineer

2 weeks ago

Pittsburgh, United States Sygna LLC Full time

Job Title: Sr. Site Reliability Engineer Contract Type: Contract to hire Location: Hybrid (Dallas TX / Pittsburgh PA) Must Have and Metrics Technical Skills: Years of experience: 7+
Site Reliability Engineering Group Manager

4 weeks ago

Pittsburgh, Pennsylvania, United States PNC Full time

Job Description. Position Overview: PNC is a leading financial institution that values its people as its greatest differentiator and competitive advantage. We strive to deliver the best experience for our customers by fostering an inclusive workplace culture where all employees feel respected, valued, and empowered to contribute to the company's success. As a...
Site Reliability Engineer

2 weeks ago

Annapolis Junction, MD, United States Maximus Full time

General information Job Posting Title Site Reliability Engineer Date Wednesday, October 16, 2024 City Annapolis Junction State MD Country United States Working time Full-time Description & Requirements Maximus is seeking a Site Reliability Engineer to provide expertise to a federal client in support of their mission critical systems in defense of our...
Site Reliability Engineer

3 weeks ago

Duluth, GA, United States BlueSky Resource Solutions Full time

Job Title: Site Reliability Engineer – Observability. Overview: We are seeking a Site Reliability Engineer III to develop and maintain our observability platform. This role focuses on ensuring the reliability, performance, and scalability of microservices, Kubernetes clusters, and cloud infrastructure. You'll collaborate with cross-functional teams to deliver...
Site Reliability Engineer

3 weeks ago

Fairfax, VA, United States Apex Systems Full time

We are seeking talented professionals to join our successful and growing team in building the next-generation Continuous Diagnostics and Mitigation (CDM) Cyber data solution. The CDM Program is the Cybersecurity and Infrastructure Security Agency’s (CISA) dynamic approach to strengthening the cybersecurity of Federal networks and systems through better...
Site Reliability Engineer

4 months ago

Oklahoma City, OK, United States Paycom Payroll Llc Full time

Site reliability engineers will be dedicated full-time to creating software tools, metrics and processes that improve the reliability of applications, sites, and systems in production. The Site Reliability Engineer is primarily responsible for ensuring the integrity, functionality, and reliability of applications and sites. RESPONSIBILITIES: Develop software to...
Site Reliability Engineer

2 weeks ago

Newton, MA, United States Intelliswift Software Full time

Title: Site Reliability Engineer. Location: Newton, MA Hybrid. Duration: 6 Months. Pay rate: $38.73 per hour on W2. We are seeking a skilled Site Reliability Engineer (SRE) Level 2 to join our dynamic team. The ideal candidate will have a strong technical background, excellent problem-solving skills, and a passion for enhancing system reliability and...
Site Reliability Engineer

3 weeks ago

Portland, OR, United States Matlen Silver Full time

Compensation: $70 - $75/Hour. Hybrid: 2 Days Onsite Portland, Oregon. Domain: Retail/Supply Chain. Job Title: Site Reliability Engineer. Position Summary: As a Site Reliability Engineer/DevOps Engineer, you will be responsible for ensuring the availability, performance, and reliability of Fulfillment Technology solutions for our client to support omni-channel...
Site Reliability Engineer IN

1 day ago

Indianapolis, IN, United States BCforward Full time

Site Reliability Engineer. BCforward is currently seeking a highly motivated Site Reliability Engineer for an opportunity in Remote! Position Title: Site Reliability Engineer. Location: Remote. Anticipated Start Date: 12/10/2024. Please note this is the target date and is subject to change. BCforward will send official notice ahead of a confirmed start date. Expected...
Site Reliability Engineer

2 days ago

Indianapolis, IN, United States BCforward Full time

Site Reliability Engineer. BCforward is currently seeking a highly motivated Site Reliability Engineer for an opportunity in Remote! Position Title: Site Reliability Engineer. Location: Remote. Anticipated Start Date: 12/10/2024. Please note this is the target date and is subject to change. BCforward will send official notice ahead of a confirmed start date. Expected...
Site Reliability Engineer

5 days ago

Miami, FL, United States INSPYR Solutions Full time

Title: Site Reliability Engineer Location: Miami, FL Duration: 6+ months Compensation: $55.00 -60.00 Work Requirements: US Citizen, GC Holders or Authorized to Work in the U.S. Site Reliability...
Site Reliability Engineer

2 weeks ago

Austin, TX, United States Sustainable Talent Full time

Join Sustainable Talent as an Engineering Technician (Site Reliability Engineer) supporting Nvidia and their IPP Platform Group (Infrastructure, Planning and Process)! This is a W-2 full-time contract with openings in Hillsboro, OR and Austin, TX. We offer competitive pay $35-45/hourly based on factors like experience, education, location, etc. and provide...
Vice President, Site Reliability/DevOps Engineer

5 days ago

UNITED STATES, PA, PITTSBURGH BNY Full time

Vice President, Site Reliability/DevOps Engineer (Dev Infrastructure Platform) (Vice President, Technical Product Specialist and App Delivery) At BNY, our culture empowers you to grow and succeed. As a leading global financial services company at the center of the world’s financial system we touch nearly 20% of the world’s investible assets. Every day...
Lead Site Reliability Engineer

1 week ago

Plano, TX, United States Cognizant Full time

About Cognizant's Digital Engineering Practice: At Cognizant Digital Engineering, a small cross functional team comprised of a Product Manager, an Architect, Full-Stack Developers, UI/UX designers and Big Data analysts builds higher quality software faster than siloed individuals working independently. Small, nimble engineering teams generate collective empathy...
Site Reliability Engineer and Capacity Planning

2 weeks ago

Parsippany, NJ, United States ScaleneWorks People Solutions LLP Full time

Job role: Site Reliability Engineer and Capacity Planning. Location: Parsippany NJ. Job Type: Contract. Description: We are looking for a talented Site Reliability Engineer (SRE) with a strong background in Google Cloud Platform (GCP), and RedHat OpenShift administration. The ideal candidate will be responsible for ensuring the reliability, performance, and...
Site Reliability/Chaos Engineer

2 weeks ago

Durham, NC, United States Fidelity TalentSource LLC Full time

Fidelity TalentSource is your destination for discovering your next temporary role at Fidelity Investments. We are currently sourcing for a Chaos Engineer to work in Fidelity’s Site Reliability Center of Excellence in Durham, NC. The Role: Workplace Investing (WI) is seeking a Site Reliability Engineering (SRE) Chaos Engineering Contractor with 10+ years of...
