GPU Compute Infrastructure Specialist

4 weeks ago


New York, New York, United States Aethir Full time

We are seeking a highly skilled and motivated Infrastructure Operations Engineer to join our dynamic team at Aethir. As an integral member of the InfraOps team, you will play a key role in managing and optimizing our GPU-based compute infrastructure, ensuring maximum performance, scalability, and reliability.

Key Responsibilities:

  • Deploy, configure, and maintain GPU-based compute infrastructure, including servers, storage, networking, and associated software stack.
  • Implement robust monitoring and alerting systems to proactively identify performance bottlenecks, resource constraints, and potential failures.
  • Develop automation scripts and tools to streamline deployment, configuration, and management of infrastructure components.
  • Implement infrastructure as code (IaC) principles to enable rapid provisioning and scaling.
  • Implement and enforce security best practices to safeguard sensitive data and maintain compliance with relevant regulations and industry standards.
  • Provide tier-3 support for infrastructure-related issues, investigating root causes and implementing timely resolutions.

Requirements:

  • Experience in infrastructure operations focused on GPU compute, preferably in a DevOps, SRE, Sales Engineering, or Solution Architect role.
  • Proficiency in managing GPU-based compute infrastructure, including NVIDIA GPUs and CUDA programming.
  • Strong expertise in Linux system administration and scripting (e.g., Bash, Python).
  • Experience with configuration management tools (e.g., Ansible, Chef, Puppet) and version control systems (e.g., Git).
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Solid understanding of networking concepts, protocols, and troubleshooting techniques.
  • Excellent analytical and problem-solving skills, with a proactive and results-oriented mindset.
  • Effective communication skills and the ability to collaborate with cross-functional teams.

We operate in English, but Mandarin is a big bonus, as we have engineering teams in China and Southeast Asia.


  • Platform Engineer

    4 weeks ago


    New York, New York, United States Sapient Full time

    At Publicis Sapient, we are seeking a highly skilled Platform Engineer - AI & GPU Services to join our team. As a key member of our engineering team, you will be responsible for implementing and maintaining AI/ML platforms and GPU resource management across cloud (GCP) and on-premise infrastructure. Key Responsibilities: Architect, build, and maintain AI/ML...

  • Platform Engineer

    3 weeks ago


    New York, New York, United States Sapient Full time

    At Publicis Sapient, we are committed to helping our clients thrive in the digital age. As a Platform Engineer - AI & GPU Services, you will play a critical role in implementing and maintaining AI/ML platforms and GPU resource management across cloud and on-premise infrastructure. This role combines expertise in cloud services, AI/ML technologies, and...

  • Platform Engineer

    4 weeks ago


    New York, New York, United States Publicis Groupe Full time

    Job Summary: Publicis Sapient is seeking a skilled Platform Engineer to join our team. As a Platform Engineer, you will be responsible for implementing and maintaining AI/ML platforms and GPU resource management across cloud (GCP) and on-premise infrastructure. This role combines expertise in cloud services, AI/ML technologies, and infrastructure automation to...


  • New York, New York, United States NVIDIA Full time

    NVIDIA is a leader in computer graphics, artificial intelligence, and accelerated computing. We are at the forefront of research and engineering around the greatest advances in technology. Our history of innovation drives us to solve the world's hardest problems. We are looking for a Senior Cloud Infrastructure/DevOps Solutions Architect to join our NVIDIA...


  • New York, New York, United States NVIDIA Full time

    Job Title: Senior Cloud Infrastructure and DevOps Solutions Architect. NVIDIA is a world leader in computer graphics, artificial intelligence, and accelerated computing. Our company has been at the forefront of research and engineering around the greatest advances in technology for over 25 years. About the Role: We are seeking a Senior Cloud Infrastructure and...


  • New York, New York, United States Mizuho Americas Full time

    About the Role: Mizuho Americas is seeking a highly skilled Infrastructure Specialist to join our team. As an Infrastructure Specialist, you will be responsible for designing, implementing, and managing the backend infrastructure to support our business operations. Key Responsibilities: Collaborate with cross-functional teams to design and implement...


  • New York, New York, United States Batch Freight Full time

    Job Title: IT Infrastructure Specialist. At Batch Freight, we are seeking a skilled IT Infrastructure Specialist to support our technology infrastructure. The ideal candidate will have a strong background in IT support, excellent communication skills, and a passion for technology. Key Responsibilities: Provide technical support for hardware, software, and...

  • DevOps Engineer

    4 weeks ago


    New York, New York, United States SpreeAI Corporation Full time

    Job Title: DevOps Engineer - AI Technology Specialist. About the Role: We are seeking a talented DevOps Engineer to join our dynamic team at SpreeAI Corporation. As a key member of our engineering team, you will be responsible for designing, implementing, and maintaining scalable tools and processes for quality and secure software development life cycle. Key...


  • New York, New York, United States Dewberry Engineers Incorporated Full time

    Job Title: Infrastructure Resilience Specialist. We are seeking an experienced Infrastructure Resilience Specialist to join our team at Dewberry Engineers Incorporated. The successful candidate will provide technical support to ongoing and future resilience engineering and planning efforts related to transportation, water, energy, and community facility...


  • New York, New York, United States CoreWeave Full time

    About the Role: We are seeking a seasoned FP&A professional to join our strategic finance team as a Cloud Infrastructure Finance Strategist. This key role will support the Head of FP&A, CFO, CPO, and CSO in preparing, analyzing, and reporting financial and operational information to drive performance and promote proactive business planning. The ideal candidate...


  • New York, New York, United States Open Systems Technologies Full time

    Cloud Infrastructure Specialist. We are seeking a highly skilled Cloud Infrastructure Specialist to join our team in New York, NY. The ideal candidate will have a strong background in designing, implementing, and managing AWS cloud infrastructure using Terraform and CloudFormation. Key Responsibilities: Design and implement scalable and secure cloud...


  • New York, New York, United States Parallel Partners Full time

    Job Summary: Parallel Partners is seeking a highly skilled Cloud Infrastructure Specialist to join our team. As a Cloud Infrastructure Specialist, you will be responsible for managing cloud infrastructure, providing resource allocation, system upgrades, user access control, and performing deep dives on complex system issues. You will also be responsible for...


  • New York, New York, United States Luxoft Full time

    Key Responsibilities: As a Cloud Infrastructure Specialist at Luxoft, you will be responsible for designing and implementing scalable cloud infrastructure solutions using Docker, Kubernetes, and Jenkins. Your expertise in uDeploy will be crucial in automating software delivery processes. Additionally, you must have a valid work permit and be willing to...


  • New York, New York, United States Elliot Partnership Full time

    Overview: We are seeking a highly skilled Cloud Infrastructure Specialist to lead on-prem to cloud migrations at scale. This is a hands-on technical role within an engineering team responsible for high-performance trading and research infrastructure. Key Responsibilities: Design and implement cloud infrastructure solutions, including operating system platforms,...


  • New York, New York, United States CoreWeave Full time

    About This Role: At CoreWeave, we're revolutionizing the cloud infrastructure industry by putting bleeding-edge GPU technology on top of the industry's fastest and most adaptable infrastructure. We're seeking a talented Solutions Architect to join our team and help shape the future of cloud computing. As a Solutions Architect at CoreWeave, you'll play a vital...


  • New York, New York, United States NVIDIA Full time

    NVIDIA is the world leader in computer graphics, artificial intelligence, and accelerated computing. For over 25 years, we have been at the forefront of research and engineering around the greatest advances in technology. Our history of innovation drives us to solve the world's hardest problems. NVIDIA is looking for a Senior Industry SA/Customer...


  • New York, New York, United States Syntricate Technologies Full time

    Job Title: Cloud Infrastructure Specialist. About the Role: At Syntricate Technologies, we are seeking a skilled Cloud Infrastructure Specialist to join our team. The ideal candidate will have experience with AWS, SageMaker, and Python, and will be responsible for operationalizing the SageMaker platform and providing support for requests such as onboarding,...


  • New York, New York, United States TNN Capital Full time

    We are seeking a highly skilled Cloud Infrastructure Specialist to join our team at TNN Capital. As a key member of our IT department, you will be responsible for designing, implementing, and managing our cloud infrastructure to support our business objectives. About the Role: This is a unique opportunity to work with a leading wealth management firm, providing...


  • New York, New York, United States Insight Global Full time

    Job Summary: The Infrastructure Specialist is responsible for designing, implementing, and managing the backend infrastructure. This includes compute, storage, and virtualization technologies. The ideal candidate will work closely with internal and external teams to advise and design infrastructure solutions for business problems and demands. Key...


  • New York, New York, United States Mobot Full time

    Job Description. The Role: We are seeking a skilled IT Specialist to join our team at Mobot. As our IT Specialist, you will play a vital role in the smooth day-to-day operations of our robot and mobile testing facility. You will triage hardware and software issues happening on-site, resolve them where possible, and otherwise document and route the issues to the...