SRE (System Reliability Engineer)
4 weeks ago
Title: SRE (System Reliability Engineer)
Location - O'Fallon, MO (Onsite from day 1)
Preferred Experience - 10+ years.
Job Description
Roles/responsibilities:
- Incident Resolution - Review and resolve incidents arising from:
  - Operation Command Center alerts
  - Alerts from Enterprise Monitoring Operations (EM Operations)
  - OMNIBUS and Splunk alerts
- Change Implementation - Deploy application-related artifacts to the production environments within the approved release window
- Report issues with deployments and coordinate with the Development teams to fix them
- Work Orders - Resolve work orders in the form of business/functional queries, ad hoc testing, verification, validation, etc., from the Regional product team and customer support teams.
- Traffic Routing - Perform traffic routing in support of infrastructure maintenance
- Perform detailed Root Cause Analysis for high-severity incidents, fix the underlying cause, and take the necessary preventive actions.
- Support UAT by the Product team and Regional customer support team.
- Configuring application/artifacts and supporting the new customer onboarding to the platform
- Raise new change tickets and arrange for approvals, including CAB approvals
- Review and approve change tickets.
- Work with customers on ad-hoc queries
- Work with the Development / Testing teams on defect analysis (using production-simulated data)
- Build automation scripts that reduce the number of incidents and/or improve the processes followed
- Support customers in completing the Post Incident Report (PIR) when high-impact incidents affecting customers occur.
- Participate in or initiate War Room calls for issues that impact application availability or customers
- Willing to work in shifts (morning and afternoon) and provide weekend support
- Unix Shell Scripting, SQL
- Troubleshooting using logs, Splunk / Dynatrace
- ITSM - Incident, Change and Problem Management
- L2 Support experience is a must
- Snowflake
- PCF Cloud knowledge
- CI/CD, Jenkins, Git & Maven
- Remedy - Ticketing Tool
- Rally (For Story and Bug Tracking)
- Splunk and Dynatrace for Monitoring
- WinSCP (file movement/validation)
- CyberArk/PuTTY
- Toad - Querying Tool for DB
-
Site Reliability Engineer
6 days ago
Jersey City, United States Syntricate Technologies Full time
Hi, we are looking for a Site Reliability Engineer. Please let me know if interested. Position Title: Site Reliability Engineer (AWS) (SRE). Position Responsibilities: Site Reliability Engineer (AWS) (SRE). Work Location: Jersey City, New Jersey. Nearby candidates only, as F2F is required (3 days WFO, 2 days WFH). W2 contract only. Looking for strong AWS experience,...
-
Staff Site Reliability Engineer
1 week ago
Oklahoma City, United States Cribl Full time
This job was posted by https://okjobmatch.com : For more information, please see: https://okjobmatch.com/jobs/3234770 Cribl does differently. What does that mean? It means we are a serious company that doesn't take itself too seriously; and we're looking for people who love to get stuff done, and laugh a bit along the way. We're growing rapidly - looking...
-
Site Reliability Engineer
7 days ago
Jersey City, United States Jefferies Full time
Job Description: Looking for an SRE to join the Investment Banking group which is responsible for the maintenance and support the bankers including the mobile application built on AWS cloud platform. The candidate should have strong technical, functional, and analytical skills with good experience of automation and supporting critical infrastructure and...
-
Site Reliability Engineer
6 days ago
Jersey City, United States CyberTec Full time
Site Reliability Engineer. 1 year. Jersey City, NJ. $80/hr W2 + $7/hr referral fee to you. Client: Banking. Skills: Java, AWS, Terraform, Python, Telemetry tooling. As a Senior Lead Site Reliability Engineer at JPMorgan Chase, within our Global Technology Infrastructure team you will be in a hands-on SRE/DevOps role making substantial direct and...
-
Senior SRE Specialist
4 days ago
Foster City, California, United States Replit Full time
The Role: We are seeking experienced SREs who are passionate about building and maintaining resilient systems at scale. As a Senior SRE Specialist, you will bridge the gap between development and operations, implementing automation and establishing best practices that enable our platform to scale efficiently while maintaining high availability. Your mission...
-
Site Reliability Engineering
5 days ago
Jersey City, United States Goldman Sachs Full time
Your Impact: At Goldman Sachs, SRE's Platforms team is responsible for designing, developing, and operating distributed systems which provide observability for Goldman's mission-critical applications and platform services. These systems span across on-premises data centers and multiple public cloud environments. We design and build highly scalable tools...
-
SRE(Site Reliability) Architect
6 days ago
Jersey City, United States The Dignify Solutions LLC Full time
10+ years of Development and Operations experience in building and running applications in production that have uptime over 99%. Related experience and/or training; or equivalent combination of education and experience. 8+ years of experience as an SRE Architect in running large Reliability & Observability Programs for large, complex infrastructure deployments...
-
Cloud SRE
7 days ago
Jersey City, United States IS3 Solutions Full time
We are seeking a highly skilled and experienced Cloud SRE to join our team. As a Cloud SRE, you will be responsible for ensuring the reliability, performance, and scalability of our cloud infrastructure. You will work closely with cross-functional teams to implement and manage Observability tools and APIs, automate cloud services, and maintain high standards...
-
Site Reliability Engineering
5 days ago
Jersey City, United States The Goldman Sachs Group Full time
Your Impact: At Goldman Sachs, SRE's Platforms team is responsible for designing, developing, and operating distributed systems which provide observability for Goldman's mission-critical applications and platform services. These systems span across on-premises data centers and multiple public cloud environments. We design and build highly scalable tools which...
-
Site Reliability Engineer
2 months ago
Jersey City, United States Syntricate Technologies Full time
Job Title: Site Reliability Engineer (AWS) (SRE). Location: Jersey City, NJ (3 days WFO, 2 days WFH). Duration: 6+ months. Position Responsibilities: Site Reliability Engineer (AWS) (SRE). Work Location: Jersey City, New Jersey. Only nearby candidates will be considered (3 days WFO, 2 days WFH). 1 Zoom / tech interview and 1 onsite interview with...
-
Site Reliability Specialist
8 hours ago
Jersey City, New Jersey, United States Collabera Full time
Collabera seeks a highly skilled Site Reliability Engineer to support our cloud-based infrastructure. The ideal candidate will have expertise in GCP, Azure, and automation tools like Unix shell scripting and Python. Key responsibilities include implementing SRE practices, ensuring system reliability, and resolving technical issues. Required qualifications...
-
Reliability Engineering Expert
3 days ago
Jersey City, New Jersey, United States Collabera Full time
We are looking for a seasoned Reliability Engineering Expert to join our team at Collabera. In this role, you will be responsible for designing and implementing scalable and reliable cloud-based systems. About the Job: This is a 12-month contract position that requires 2+ years of experience working in an SRE team and supporting cloud applications on GCP, Azure,...
-
Site Reliability Engineer
2 months ago
Redwood City, United States 1872 Consulting Full time
Site Reliability Engineer - 100% Remote. Role Summary: Site Reliability Engineers (SREs) are responsible for working with different developer teams to keep our systems running smoothly. They are a blend of pragmatic operators and software craftspeople that apply excellent problem-solving and communication skills to develop or configure tools that will automate,...
-
Site Reliability Engineer Position
2 days ago
Jersey City, New Jersey, United States Collabera Full time
Job Overview: We are seeking a skilled Site Reliability Engineer to join our team at Collabera. As a key member of our SRE team, you will be responsible for ensuring the reliability and scalability of our cloud-based applications. Key Responsibilities: Supporting cloud applications on GCP, Azure, or OpenShift. Working in SDLC process and software...
-
SRE Engineer
1 week ago
Sun City, United States Syntricate Technologies Full time
Position: SRE Engineer. Duration: Contract. Location: Sun City, AZ. JD: Splunk, Prometheus & Grafana, PostgreSQL DB / Any Relational DB, Postman API, Kibana / Open Search, Tableau Desktop and Server, Python, ServiceNow, GitHub, Any one Cloud - preferred AWS, Docker & Kubernetes, Performance Testing - like JMeter. 6 to 8 years' experience. Regards, Pallavi Verma ...
-
Manager, Site Reliability Engineering
4 days ago
Foster City, United States Zoox Full time
Zoox is looking for a Site Reliability Engineering Manager who will be responsible for leading and growing Zoox's Core Site Reliability Engineering team, ensuring the reliability, scalability, and performance of our critical infrastructure, cloud platform, and core services that power company-wide software engineering operations. Zoox is a robotics company...
-
Consultant, Site Reliability Engineer
1 week ago
Ohio City, United States Nationwide Full time
If you're passionate about innovation and love working in an environment where you can constantly improve and adopt new technologies to drive business results, then Nationwide's Information Technology team could be the place for you! At Nationwide®, "on your side" goes beyond just words. Our customers are at the center of everything we do and we're looking...
-
Reliability Engineer
3 days ago
Jersey City, New Jersey, United States Goldman Sachs Bank AG Full time
Job Summary: We are seeking a highly skilled DevOps engineer to join our SRE Platforms team. As a key member of this team, you will be responsible for designing, developing, and operating distributed systems that provide observability for Goldman Sachs' mission-critical applications. The ideal candidate will have experience with AWS cloud infrastructure,...
-
Manager, Site Reliability Engineering
1 week ago
Redwood City, CA, United States Zoox Full time
Foster City, CA. Software – Developer Infrastructure / Full-time / Hybrid. Zoox is looking for a Site Reliability Engineering Manager who will be responsible for leading and growing Zoox's Core Site Reliability Engineering team, ensuring the reliability, scalability, and performance of our critical infrastructure, cloud platform, and core services that powers...
-
Site Reliability Engineering
4 days ago
Jersey City, United States Goldman Sachs Bank AG Full time
Site Reliability Engineering - Software Engineer - Vice President - Jersey City. Location: Jersey City, New Jersey, United States. Opportunity Overview. CORPORATE TITLE: Vice President. OFFICE LOCATION(S): Jersey City. JOB FUNCTION: Software Engineering. DIVISION: Engineering Division. SALARY RANGE: USD 150,000 - 250,000. Your Impact: At Goldman Sachs, SRE’s Platforms team...