Senior HPC DevOps Engineer
3 weeks ago
Peraton Labs is seeking a poly-cleared Senior HPC DevOps Engineer to own the operations and automation lifecycle for an existing HPC/AI compute cluster (Linux). You will work closely with Peraton team members, as well as directly with our Maryland-based customer, in a fast-paced environment at a customer site. In this role you will codify repeatable operations in Ansible and drive execution through an enterprise automation controller to enforce desired state, detect drift, accelerate node onboarding, and streamline incident response via runbook automation integrated with monitoring and ITSM. This position requires full-time on-site work at a customer site near College Park, MD.
Key responsibilities may include:
* Automation ownership: Own and manage automation workflows, including job templates, inventories, credentials, RBAC configurations, execution environments, and promotion across environments.
* Desired-state and drift detection: Enforce desired state across cluster services via code-driven configuration; implement drift detection and alert on deviations; reconcile runtime state vs. configured state.
* Compute node onboarding (bare-metal/VM): Build and maintain an automated node bootstrap workflow that installs/configures the OS, applies security and performance baselines, enrolls nodes into the scheduler and shared storage ecosystem, validates hardware and service readiness (CPU, network, accelerator, storage mounts), and reports pass/fail results.
* Patch and vulnerability response: Implement rolling maintenance and patch automation to meet defined vulnerability response SLAs. Maintain version-controlled container build definitions and integrate image scanning into the build/release lifecycle.
* Logging and observability: Ensure automation and operational workflows emit auditable logs to centralized analytics and integrate with metrics/alerting to enable reliable incident response, proactive detection, and safe auto-remediation.
* Incident/problem management: Automate responses to common incidents (hung nodes, storage performance alarms, image vulnerabilities, hardware failures) leveraging out-of-band hardware management interfaces and standardized runbooks.
* Docs-as-code: Keep runbooks and operational documentation versioned alongside automation and publish operator guidance to the org's documentation platform.
Qualifications:
* 12+ years of experience and a BS in computer science, IT, or a related technical field; an MS and 10 years of experience; or a Ph.D. with 8 years of experience. Four years of additional experience is required in lieu of a Bachelor's degree, for a total of 16 years of experience.
* 7+ years in Linux systems / SRE / DevOps, including production cluster operations in an HPC or large-scale compute environment.
* 3+ years of experience building and operating Ansible automation at scale (roles/collections, idempotency, inventories, secrets).
* Strong Linux hardening and compliance fundamentals (SELinux/AppArmor, SSH key automation, baseline config management).
* Demonstrated experience operating or automating clustered compute environments (HPC, large Linux farms, or similar).
* Hands-on experience with container tooling in Linux environments, including image lifecycle/versioning.
* Familiarity with incident response and runbook-driven operations; ability to automate common remediations.
* Strong Git workflow and documentation practices.
* Must hold at least one active/current technical certification from the following:
  * Systems engineering (e.g., INCOSE)
  * Information security (e.g., CISSP)
  * Networking (e.g., CCNA)
  * System administration (e.g., RHCE, MCSE)
  * Virtualization (e.g., VCP)
  * IT systems management (e.g., ITIL)
  * Project management (e.g., PMP, Agile)
* This position requires an active/current TS/SCI w/ Polygraph.
Preferred qualifications:
* Bare-metal provisioning experience (PXE/iPXE, Kickstart/Preseed, Foreman/MAAS) and hardware OOB management.
* CI/testing for automation and promotion pipelines for playbooks.
* Experience with tuned performance profiles, HPC performance troubleshooting, and GPU node health validation.
Target Salary Range: $146,000 - $234,000. This represents the typical salary range for this position.
Salary is determined by various factors, including, but not limited to, the scope and responsibilities of the position; the individual's experience, education, knowledge, skills, and competencies; geographic location; and business and contract considerations. Depending on the position, employees may be eligible for overtime, shift differential, and a discretionary bonus in addition to base pay.
Benefits Statement: Peraton offers eligible employees a variety of benefits including medical, dental, vision, life, health savings account, short/long-term disability, EAP, parental leave, 401(k), paid time off (PTO) for vacation, and company-paid holidays. A full listing of available benefits can be viewed at https://www.careers.peraton.com/benefits.
Application Duration Statement: The application period for the job is estimated to be 30 days from the job posting date. However, this timeline may be shortened or extended depending on business needs and the availability of qualified candidates.
EEO: Equal opportunity employer, including disability and protected veterans, or other characteristics protected by law.
-
Senior HPC DevOps Engineer
2 days ago
College Park, United States Peraton Full time
Senior HPC DevOps Engineer. Job Locations: US-MD-College Park. Requisition ID: Position Category: Engineering. Clearance: Top Secret/SCI w/Poly. Responsibilities: Peraton Labs is seeking a poly-cleared Senior HPC DevOps Engineer to own the operations and automation lifecycle for an existing HPC/AI compute cluster (Linux). You will work closely with Peraton team...
-
College Park, United States Peraton Full time
A prominent national security company is seeking a Senior HPC DevOps Engineer based in College Park, Maryland. The successful candidate will manage automation workflows for an HPC/AI compute cluster, requiring strong expertise in Ansible and Linux systems. With a focus on operational efficiency and incident automation, the role mandates a TS/SCI clearance....
-
Senior Staff Software Engineer
2 days ago
College Park, United States IonQ Full time
Senior Staff Software Engineer - HPC Integration. Bothell, Washington, United States; College Park, Maryland, United States. IonQ is developing the world's most powerful full‑stack quantum computer based on trapped‑ion technology. We are pushing past the limits of classical physics and current supercomputing technology to unlock a new era of computing....
-
Process Engineer
1 week ago
Deer Park, TX, United States HPC Industrial Full time
HPC-Industrial, Powered by Clean Harbors, is looking for a Process Engineer to join their safety-conscious team in TX! The Process Engineer provides technical leadership for identified engineering projects, and coordinates with engineering, operations, and maintenance to implement engineering and process improvements and gain efficiencies. Provides technical...
-
Process Engineer
2 weeks ago
Deer Park, TX, United States HPC Industrial Full time
HPC-Industrial, Powered by Clean Harbors, is looking for a Process Engineer to join their safety-conscious team in TX! The Process Engineer provides technical leadership for identified engineering projects, and coordinates with engineering, operations, and maintenance to implement engineering and process improvements and gain efficiencies. Provides technical...
-
Process Engineer
1 week ago
Deer Park, TX, United States HPC Industrial Full time
HPC-Industrial, Powered by Clean Harbors, is looking for a Process Engineer to join their safety-conscious team in TX! The Process Engineer provides technical leadership for identified engineering projects, and coordinates with engineering, operations, and maintenance to implement engineering and process improvements and gain efficiencies. Provides technical...
-
Senior DevOps Engineer
2 weeks ago
Lexington Park, United States Spalding, a Saalex Company in Full time
Overview: Senior DevOps Engineer (Finance) – Spalding, a Saalex Company is seeking a DevOps Engineer, SR in Patuxent River, MD. Spalding, a Saalex Company is a professional services company delivering cutting-edge solutions to the Department of Defense since 2001. Our expert-level solutions include software development, information technology, program...
-
Senior DevOps Engineer
2 days ago
Lexington Park, United States Saalex Corp. Full time
Spalding, a Saalex Company is seeking a DevOps Engineer, SR in Patuxent River, MD. Spalding, a Saalex Company is a professional services company delivering cutting‑edge solutions to the Department of Defense since 2001. Our expert-level solutions include software development, information technology, program management, financial management and business...
-
Azure DevOps Engineer with GIT/DevOps Integration - Initial Remote
1 month ago
Florham Park, United States The Dignify Solutions, LLC Full time
Responsibilities: Experience reading existing build files and recreating them using DevOps and Jenkins. Experience with Azure DevOps and Jenkins. Create new pipelines on Azure. Create Docker and YAML files. Primary Skill: Functional Testing, Lead...
-
Senior DevOps Engineer
4 weeks ago
Menlo Park, United States BillionToOne Full time
Ready to redefine what's possible in molecular diagnostics? Join a team of brilliant, passionate innovators who wake up every day determined to transform healthcare. At BillionToOne, we've built something extraordinary: a culture where transparency fuels trust, collaboration drives breakthroughs, and every voice matters in our mission to make...