Senior HPC Cluster Systems Administrator
3 weeks ago
Berkeley Lab's (LBNL) Information Technology Division (IT) has an opening for a Senior HPC Cluster Systems Administrator to join their ScienceIT Team! In this exciting role, you will support the Berkeley Lab research community by building, integrating, and maintaining Linux-based resources, high-performance computing cluster systems, and Kubernetes clusters. This role will supply extensive expertise in HPC infrastructure and deliver advanced Linux solutions to further scientific endeavors at Berkeley Lab.

The mission of Scientific Computing under ScienceIT is to enable groundbreaking fundamental research globally by providing essential computing tools, networks, and expertise.

The position has an anticipated start date of January 5, 2026.

Why join Berkeley Lab?
Exceptional health and retirement benefits, including pension or 401(k)-style plans.
Opportunities to grow in your career; see our Tuition Assistance Program.
A culture of belonging; we are invested in our teams.
Vacation, sick time, and an annual winter holiday shutdown.
Parental bonding leave (for both mothers and fathers).
Pet insurance.

What You Will Do
Perform Linux system and HPC cluster installation and maintenance, including operating system upgrades, system security hardening and intrusion detection, storage and file system management, system hardware, customization of user group working environments, troubleshooting, network monitoring, and crash recovery.
Design, deploy, and manage scalable applications using Kubernetes, ensuring availability, performance, and readiness of the infrastructure.
Automate deployment, scaling, and management of containerized applications, and collaborate with DevOps and development teams to streamline CI/CD pipelines.
Design, deploy, and manage the global storage platform to ensure high performance, massive scalability, reliability, and future-proof solutions.
Support storage technologies such as Lustre, VAST, and networks.
Resolve I/O issues related to business applications, diagnosing and resolving complex storage, Linux, and networking challenges in a fast-paced environment.
Research new storage management technologies and techniques, and provide recommendations.
Participate in developing system administration, security, and network policies, documentation, and tools oriented towards efficient systems management.
Participate in cluster support to staff and researchers, including initial installation, integration, and ongoing maintenance of Linux HPC cluster systems (including travel to remote sites as needed).
Co-lead technical efforts with other senior system administrators in areas such as job schedulers, high-performance interconnects, parallel file systems, cybersecurity, cluster management, container orchestration, VM infrastructure, networking, performance tuning, or data center planning.
Co-lead small to medium-size group projects to implement and deploy new computing technologies and associated services to the research community.

What We Are Looking For
A Bachelor's Degree (or equivalent) in Computer Science, Engineering, or a related discipline and a minimum of 12 years of experience in Linux system administration within a large distributed computing environment.
Demonstrated ability to manage large-scale, performance-critical environments, including capacity planning, scaling, and optimization.
Significant experience deploying, scaling, and managing Kubernetes clusters, with a strong understanding of Kubernetes architecture (pods, deployments, services, ingress) and proficiency with CI/CD tools like Jenkins or GitLab CI.
Experience with Red Hat derivatives (CentOS, Scientific Linux, Rocky Linux), Debian, Ubuntu, and large-scale system and configuration management tools (Kickstart, Ansible, Puppet, Chef, Warewulf).
Expertise in supporting standard services (NFS, LDAP, SMB, MySQL, Apache/Nginx).
Strong HPC expertise covering Linux, job schedulers, high-performance interconnects, parallel file systems, cybersecurity, container orchestration, cluster management, VM infrastructure, networking, performance tuning, scientific application support, and data center planning.
Proficiency in Python and Bash for building, optimizing, and debugging scientific codes (C, C++, Fortran, Java), including experience with compilers (GCC, Intel), debuggers, Makefiles, and version control (git, Subversion).
Expertise in storage system design and optimization (Lustre, S3, VAST, Weka, Ceph, DDN), including deep knowledge of the storage stack and performance tuning.
Excellent oral and written communication skills for presenting technical data, reports, and projects to varied audiences.
Strong interpersonal skills, including research facilitation and project management in multidisciplinary teams.

Desired Qualifications
An Advanced Degree (or equivalent) in Computer Science, Engineering, or a related discipline.
Experience with software engineering and/or software development.
Familiarity with Kubernetes-related tools such as Helm, Istio, and Prometheus.
Demonstrated experience supporting research at a National Lab and/or in an academic or research environment.

Additional Information
Application Deadline: For full consideration, please apply with a resume and a cover letter describing your interest by December 5, 2025.
Appointment type: Full-time career appointment, exempt (monthly paid) from overtime pay.
Salary: $178,644 - $218,364 annually; commensurate with qualifications and experience.
Background Check: May be subject to a background check; convictions will be evaluated progressively.
Work Modality: Eligible for a hybrid schedule combining teleworking and on-site work at Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA 94720. Hybrid employees must reside within 150 miles of Berkeley Lab. Starting May 7, REAL ID or other acceptable ID is required.
Relocation: Eligible for relocation assistance.
Work Authorization: Applicants must be legally authorized to work in the United States. No visa sponsorship.

Want to learn more about working at Berkeley Lab? Please visit: careers.lbl.gov

Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. All qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or any other protected category under State and Federal law. Berkeley Lab is a University of California employer. The policy of the University of California is to undertake affirmative action and anti-discrimination efforts, consistent with its obligations as a Federal and State contractor.

Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose any administrative or judicial decisions within the last seven years determining misconduct, current investigations, investigations during which they left a position, or appeals filed with a previous employer.
-
Senior HPC Cluster Systems Administrator
3 weeks ago
Berkeley, United States Executive Placements, LLC. Full time
Berkeley Lab's (LBNL) Information Technology Division (IT) has an opening for a Senior HPC Cluster Systems Administrator to join their ScienceIT Team! In this exciting role, you will support the Berkeley Lab research community by building, integrating, and maintaining Linux-based resources, high-performance computing cluster systems, and Kubernetes clusters....
-
Senior HPC Cluster Systems Administrator
2 weeks ago
Berkeley, United States ExecutivePlacements.com Full time
Berkeley Lab's (LBNL) Information Technology Division (IT) has an opening for a Senior HPC Cluster Systems Administrator to join their ScienceIT Team! In this exciting role, you will support the Berkeley Lab research community by building, integrating, and maintaining Linux-based resources, high-performance computing cluster systems, and Kubernetes clusters....
-
Senior HPC
2 weeks ago
Berkeley, United States ExecutivePlacements.com Full time
A leading research institution seeks a Senior HPC Cluster Systems Administrator to support its research community by managing Linux-based resources and high-performance computing clusters. The ideal candidate will have over 12 years of experience with Linux system administration and strong skills in Kubernetes and performance tuning. This position offers a...
-
Senior Cluster Site Reliability Engineer
2 weeks ago
Berkeley, California, United States The Voleon Group Full time $205,000 - $235,000 per year
Voleon is a technology company that applies state-of-the-art machine learning techniques to real-world problems in finance. For nearly two decades, we have led our industry and worked at the frontier of applying machine learning to investment management. We have become a multibillion-dollar asset manager, and we have ambitious goals for the future. As a...
-
HPC Linux Systems Administrator
2 weeks ago
Berkeley, United States Jobot Full time
This Jobot Job is hosted by: Kurt Holzmuller. Are you a fit? Easy Apply now by clicking the "Apply Now" button and sending us your resume. Salary: $120,000 - $180,000 per year. A bit about us: We are a leading global tech company on the cutting edge of cloud and data solutions. With a strong emphasis on innovation, inclusivity, and work-life balance, we foster a...
-
HPC Linux Systems Administrator
7 days ago
Berkeley, United States Jobot Full time
This Jobot Job is hosted by: Kurt Holzmuller. Are you a fit? Easy Apply now by clicking the "Apply" button and sending us your resume. Salary: $120,000 - $180,000 per year. A bit about us: We are a leading global tech company on the cutting edge of cloud and data solutions. With a strong emphasis on innovation, inclusivity, and work-life balance, we foster a...
-
HPC Linux Systems Administrator
2 weeks ago
Berkeley, CA, United States Jobot Full time
A bit about us: We are a leading global tech company on the cutting edge of cloud and data solutions. With a strong emphasis on innovation, inclusivity, and work-life balance, we foster a dynamic environment for career growth. With a global customer base including all sectors, from Fortune 500 to up-and-coming startups, we offer diverse product lines...
-
HPC Linux Systems Administrator
1 week ago
Berkeley, United States Jobot Full time
This Jobot Job is hosted by: Kurt Holzmuller. Find out exactly what skills, experience, and qualifications you will need to succeed in this role before applying below. Are you a fit? Easy Apply now by clicking the "Apply Now" button and sending us your resume. Salary: $120,000 - $180,000 per year. A bit about us: We are a leading global tech company on the cutting...
-
HPC Linux Systems Administrator
2 weeks ago
Berkeley, United States Jobot Full time
This Jobot Job is hosted by: Kurt Holzmuller. Find out exactly what skills, experience, and qualifications you will need to succeed in this role before applying below. Are you a fit? Easy Apply now by clicking the "Apply Now" button and sending us your resume. Salary: $120,000 - $180,000 per year. A bit about us: We are a leading global tech company on the cutting...
-
HPC Linux Systems Administrator
1 week ago
Berkeley, United States Jobot Full time
This Jobot Job is hosted by: Kurt Holzmuller. Are you a fit? Easy Apply now by clicking the "Apply Now" button and sending us your resume. Salary: $120,000 - $180,000 per year. A bit about us: We are a leading global tech company on the cutting edge of cloud and data solutions. With a strong emphasis on innovation, inclusivity, and...