Site Reliability Engineer
3 weeks ago
Overview
Site Reliability Engineer (SRE) role at Bay Systems Consulting.
Location: Berkeley, CA (Onsite at Lawrence Berkeley National Laboratory)
Employment Type: 5–6 Month Contract (Extension Possible)
Pay Rate: $80/hr + Full Benefits (Medical, Dental, Vision, 401k)
Employer: Bay Systems Consulting

About the Role
Bay Systems Consulting is seeking a Site Reliability Engineer (SRE) to support the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory. NERSC’s mission is to accelerate scientific discovery through high-performance computing and data analysis for the U.S. Department of Energy’s Office of Science. As an SRE in the Operations Group, you will help ensure the accessibility, reliability, security, and availability of world-class HPC systems that support over 10,000 scientific users. You will work with state-of-the-art monitoring systems (such as OMNI), respond to real-time alerts, automate processes, and improve reliability for mission-critical infrastructure.

Responsibilities
Monitor and support NERSC’s HPC facility as part of a 24x7 operations team (including some overnight “OWL” shifts).
Respond to alerts from computer systems, storage, networks, and data center infrastructure by triaging issues or engaging on-call staff.
Develop automation to handle routine service conditions and improve system efficiency.
Maintain and enhance monitoring tools, pipelines, and alerting systems.
Create and maintain scripts and software to integrate HPC system APIs into monitoring pipelines.
Collaborate with cross-functional NERSC groups to coordinate maintenance activities and manage diagnostic software.
Document and track outages, incidents, and maintenance in the ticketing system.
Troubleshoot and resolve diverse technical issues involving HPC, networking, and infrastructure.
Qualifications
Required (Level 2):
Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent work experience).
5+ years of related experience (or 3+ years with a Master’s).
Strong Linux/Unix administration and command-line skills.
Proficiency with programming/scripting languages (Python, C/C++, Perl, Java, or similar).
Experience supporting highly available systems in large-scale data centers.
Familiarity with networking, firewalls, ACLs, and network protocols.
Knowledge of automation and monitoring tools (e.g., Kubernetes, Prometheus, Alertmanager).
Strong troubleshooting and communication skills.
Preferred (Level 3):
8+ years of relevant experience (or 6+ with a Master’s).
Expertise in software development and monitoring pipeline design.
Experience leading technical projects and mentoring junior staff.
Advanced knowledge of data center management technologies.
-
Site Reliability Engineer
3 weeks ago
Berkeley, United States DevOps projects Full time
Site Reliability Engineer
About the Company: LMArena is an engineering-first startup redefining how the world evaluates large language models. Created in 2023 by UC Berkeley researchers, our neutral, community-driven benchmarking platform attracts over one million monthly users (pairwise comparing leading models from OpenAI, Google, Anthropic, and more) to...
-
Site Reliability Engineer
18 hours ago
Berkeley Heights, NJ, United States Nexus Staff Inc. Full time
Job Description: Contract-to-hire (CTH), 6-12 months. Client: Fiserv. Must be a US citizen (USC) or green card holder (GCH). Must be local to Berkeley Heights, NJ.
Description: What does a successful Site Reliability Engineer do at Fiserv? A successful Site Reliability Engineer at Fiserv blends software engineering principles with operational discipline to create high-performing, reliable software systems. They...
-
Site Reliability Engineer
7 days ago
Berkeley Heights, NJ, United States Nexus Staff Inc. Full time
Job Description: Contract-to-hire (CTH), 6-12 months. Client: Fiserv. Must be a US citizen (USC) or green card holder (GCH). Must be local to Berkeley Heights, NJ.
Description: What does a successful Site Reliability Engineer do at Fiserv? A successful Site Reliability Engineer at Fiserv blends software engineering principles with operational discipline to create high-performing, reliable software systems. They...
-
Site Reliability Engineer
2 days ago
Berkeley Heights, NJ, United States Nexus Staff Inc. Full time
Job Description: Contract-to-hire (CTH), 6-12 months. Client: Fiserv. Must be a US citizen (USC) or green card holder (GCH). Must be local to Berkeley Heights, NJ.
Description: What does a successful Site Reliability Engineer do at Fiserv? A successful Site Reliability Engineer at Fiserv blends software engineering principles with operational discipline to create high-performing, reliable software systems. They...
-
Site Reliability Engineer
7 hours ago
Berkeley Heights, NJ, United States Nexus Staff Inc. Full time
Job Description: Contract-to-hire (CTH), 6-12 months. Client: Fiserv. Must be a US citizen (USC) or green card holder (GCH). Must be local to Berkeley Heights, NJ.
Description: What does a successful Site Reliability Engineer do at Fiserv? A successful Site Reliability Engineer at Fiserv blends software engineering principles with operational discipline to create high-performing, reliable software systems. They...
-
Site Reliability Engineer
3 weeks ago
Berkeley, CA, United States DevOps projects Full time
Site Reliability Engineer
About the Company: LMArena is an engineering-first startup redefining how the world evaluates large language models. Created in 2023 by UC Berkeley researchers, our neutral, community-driven benchmarking platform attracts over...
-
Site Reliability Engineer
2 weeks ago
Berkeley Heights, NJ, United States CloudIngest Full time $120,000 - $180,000 per year
Job Title: Site Reliability Engineer (SRE) – Observability & OpenTelemetry
Location: Berkeley Heights, NJ
3 roles - 1 Senior W2 ) and 2 mid level (7-9 years - 60 W2 )
Role Overview: We're seeking a skilled Site Reliability Engineer with deep expertise in OpenTelemetry and data observability platforms (Splunk, Datadog, New Relic) to enhance system reliability,...
-
Site Reliability Engineer
4 weeks ago
Berkeley Heights, United States Experis Full time
Our client, a leader in the technology sector, is seeking a Site Reliability Engineer to join their team. As a Site Reliability Engineer, you will be part of the Digital group supporting the Card Services organization. The ideal candidate will have strong problem-solving skills, excellent communication abilities, and a collaborative mindset which will align...
-
Site Reliability Engineer
4 days ago
Berkeley Heights, NJ, United States Matlen Silver Full time
Title: Cloud Infrastructure Site Reliability Engineer
Locations: Alpharetta, Georgia OR Berkeley Heights, New Jersey
Duration: 1 year contract to hire
Environment: ONSITE, non-negotiable
Pay: $65-$70/hour W2 ONLY (No C2C)
** DUE TO CLIENT REQUIREMENTS, US CITIZEN OR GC HOLDERS ONLY **
Requirements: 8+ years of professional experience in Cloud Infrastructure, Site...