HPC Sr. Scientific Software Engineer
2 weeks ago
IT@JH Research Computing is seeking an HPC Sr. Scientific Software Engineer who will design, build, and support Johns Hopkins University's high-performance computing and AI research infrastructure. This role integrates elements of both systems and software engineering, ensuring scalable, secure, and reproducible environments for scientific and data-intensive research. The Engineer develops and automates system and application workflows across CPU/GPU clusters, parallel storage, and hybrid cloud platforms. Responsibilities include configuring and optimizing large-scale Linux environments, implementing job scheduling and orchestration frameworks, containerizing applications, and supporting researchers in optimizing performance and reproducibility. The work combines project-based engineering with operational support, requiring both independent problem-solving and close collaboration with the Research Computing team and faculty stakeholders.
Specific Duties & Responsibilities
Software Deployment and Design
- Develop and refine deployment strategies for scientific software on HPC and AI systems.
- Design computational workflows, select optimal software configurations, and use tools like Ansible for automation.
- Assist teams in implementing, tuning, and optimizing AI models and gateway applications (e.g., XDMoD, ColdFront, Open OnDemand, CryoSPARC Live, SBGrid, AI Agents).
Performance Optimization
- Analyze and optimize the performance of AI models and HPC applications, focusing on GPU-enabled computing.
- Implement parallel processing, distributed computing, and resource management techniques for efficient job execution.
Integration and Optimization
- Develop, debug, and maintain software tools, libraries, and frameworks supporting HPC and AI workloads.
- Collaborate with the system team and software vendors (e.g., NVIDIA, Intel, MATLAB) to optimize systems for maximum performance.
- Utilize CUDA, DNN, TensorRT, and Intel Compilers to enhance system performance.
HPC Scientific Software Support
- Manage and support scientific software deployment across HPC, cloud-based, and colocation facilities.
- Oversee installation, configuration, and maintenance of HPC packages with tools like CMake, Make, EasyBuild, Spack, and Lua module files.
Collaboration and Mentorship
- Work closely with cross-functional teams, including researchers, data scientists, and software developers, to address complex HPC/AI challenges.
- Mentor junior engineers and foster a culture of continuous learning.
Technical Support, Training Workshops, and Troubleshooting
- Resolve complex technical issues and perform root cause analysis for HPC/AI software challenges.
- Implement effective solutions to prevent recurrence and improve system reliability.
- Provide training workshops for researchers and students, focusing on troubleshooting, optimizing workflows, and effectively using HPC systems.
Learning and Development
- Stay current with advances in HPC and AI technologies and methodologies.
- Incorporate new research findings into existing systems to improve performance and capabilities.
Container Orchestration
- Develop and manage container orchestration strategies to ensure scalability, reliability, and security of applications.
- Oversee the container lifecycle from creation and deployment to scaling and removal.
Documentation and Compliance
- Create comprehensive documentation for system designs, performance metrics, and project status.
- Ensure compliance with security and regulatory standards for all HPC and AI systems.
In Addition to the Duties Described Above
- Design, deploy, and maintain large-scale Linux HPC clusters with CPU/GPU resources, high-speed networks, and distributed storage.
- Develop and maintain automation frameworks for provisioning, monitoring, and software lifecycle management.
- Implement and optimize job scheduling, container orchestration, and workflow automation tools to support diverse research workloads.
- Collaborate with faculty and research teams to parallelize, containerize, and scale computational workflows for multi-GPU and distributed environments.
- Benchmark and tune application performance across architectures, documenting findings and sharing best practices.
- Integrate and support AI/ML frameworks, scientific libraries, and workflow engines (Snakemake, Nextflow, Dask, Ray).
- Ensure system and application reliability through proactive monitoring (Prometheus, Grafana, ELK) and incident response participation.
- Support reproducibility and FAIR data principles through version-controlled, containerized environments.
- Contribute to documentation, training materials, and technical guidance to enhance user experience and self-service capabilities.
- Participate in evaluation and adoption of new technologies to advance performance, efficiency, and sustainability in research computing.
Minimum Qualifications
- PhD in a quantitative discipline.
- Five years of experience in HPC user support, software deployment, and performance optimization within an academic or research environment.
- Additional education may substitute for required experience and additional related experience may substitute for required education beyond a high school diploma/graduation equivalent, to the extent permitted by the JHU equivalency formula.
Preferred Qualifications
- Eight or more years of professional experience in high-performance computing, large-scale systems, or research software engineering.
- Deep proficiency in Linux systems administration, performance tuning, and automation tools (Ansible, Terraform, Jenkins, or similar).
- Experience with cluster management, workload schedulers (e.g., Slurm), and distributed or parallel file systems (e.g., GPFS, Lustre, WekaFS, Ceph).
- Strong background in programming or scripting (Python, Bash, C/C++, Go, or Rust).
- Familiarity with containerization and orchestration technologies used in HPC (Singularity, Apptainer, Docker, Kubernetes).
- Understanding of high-speed interconnects (InfiniBand, 100/400 Gb Ethernet) and storage/data access patterns for AI and analytics.
- Experience developing or maintaining CI/CD pipelines and module environments (Lmod/Spack) for research software.
- Knowledge of GPU computing (CUDA, ROCm), MPI/OpenMP, and AI/ML frameworks.
- Demonstrated ability to collaborate with researchers on performance optimization, workflow design, and reproducible computing.
Classified Title: HPC Sr. Scientific Software Engineer
Job Posting Title (Working Title): HPC Sr. Scientific Software Engineer (IT@JH Research Computing)
Role/Level/Range: ATP/04/PG
Starting Salary Range: $99,800 - $175,000 Annually (Commensurate w/exp.)
Employee group: Full Time
Schedule: Mon-Fri, 8:30am-5pm
FLSA Status: Exempt
Location: Johns Hopkins Bayview
Department name: IT@JH Research Computing
Personnel area: University Administration
Equal Opportunity Employer
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.