Senior Engineer – Cognitive Infrastructure
4 days ago
Role: Senior Engineer – Cognitive Infrastructure
Location: Santa Clara, CA / Dallas, TX
Job Description:
This is a key strategic role working with NVIDIA and other key tech OEMs such as Dell, HPE, and Cisco, as well as with internal stakeholders and customers, to generate business opportunities in the US and EU regions. The engineer will work closely with the Sales, Pre-sales, and Delivery groups to understand customer needs, to identify, generate, and manage opportunities related to the AI and AI Factory tracks, and to position the hybrid cloud AI and AI Factory offerings effectively. This is a quota-driven role spanning on-premises infrastructure, private cloud, platforms, and public cloud as they relate to AI.
- A strategic professional responsible for executing and implementing AI infrastructure and platform solutions.
- This role requires deep technical, hands-on experience delivering AI infrastructure and AI Factory offerings.
Responsibilities:
- Senior Engineer – Cognitive Infrastructure (Kubernetes | NVIDIA | MLOps | Gen‑AI), Total Experience: 12+ Years
- Design & operate hybrid Kubernetes clusters on AWS/GCP/Azure and on‑prem (bare‑metal, DGX, Grace Hopper).
- Deploy & manage the NVIDIA GPU Operator (drivers, CUDA, MIG, device plugins) and create GPU‑aware scheduling policies.
- Build production‑grade MLOps pipelines with Kubeflow Pipelines, GitOps (Argo CD/Flux), MLflow/DVC.
- Deploy & operate LLMs using NVIDIA Triton, vLLM, TensorRT‑LLM, or custom FastAPI/gRPC services, including quantization, dynamic batching, safety‑filter integration, and per‑tenant quota enforcement.
- Integrate vector databases (Milvus, Pinecone, Qdrant, Weaviate, FAISS) for retrieval‑augmented generation and similarity search.
- Implement observability (Prometheus, Grafana, Loki/ELK, OpenTelemetry) and define SLO/SLI dashboards.
- Enforce security & compliance – RBAC, OPA/Gatekeeper, Vault/KMS, image signing, GDPR/HIPAA guidelines.
- Optimize cost & capacity – GPU quota controls, spot‑instance usage, auto‑scaling, transparent cost reporting.
- Enable teams – turn notebooks into reproducible pipelines, run office hours, write docs/tutorials.
- Drive technology roadmap – evaluate new NVIDIA releases and open‑source projects (Kubeflow, LangChain, vLLM, TGI, etc.) and lead PoCs.
Required Experience
- 8+ years building & operating production Kubernetes (cloud + on‑prem).
- Deep knowledge of the NVIDIA GPU Operator stack (drivers, CUDA, MIG).
- Strong hands‑on experience with Kubeflow Pipelines or equivalent MLOps tools.
- Experience deploying LLMs at scale (quantization, LoRA, inference optimization).
- Proficiency in Python (PyTorch, TensorFlow, Hugging Face, LangChain) and IaC (Helm, Kustomize, Terraform).
- Experience with vector search engines (Milvus, Pinecone, etc.).
- Solid observability/SRE background (Prometheus, Grafana, OpenTelemetry).
- Security‑first mindset (RBAC, OPA, Vault, image signing).
Nice‑to‑Have:
- Experience with NVIDIA DGX / Grace Hopper hardware.
- Knowledge of OpenShift, k3s, or edge‑focused deployments.
- Experience with LWS, KServe, or serverless inference.
- Open‑source contributions (Kubernetes, Kubeflow, Triton, Milvus, vLLM).
- Certifications – CKA, any cloud AI/ML certification, NVIDIA certifications.
Specifics:
- Hands-on role
- Techno-commercial skills are a must
How You'll Grow
At HCLTech, we offer continuous opportunities for you to find your spark and grow with us. We want you to be happy and satisfied with your role and to really learn what type of work sparks your brilliance the best. Throughout your time with us, we offer transparent communication with senior level employees, learning and career development programs at every level, and opportunities to experiment in different roles or even pivot industries. We believe that you should be in control of your career with unlimited opportunities to find the role that fits you best.
Equality & Opportunity for All
As a company with employees representing 165 nationalities across the globe, we pride ourselves on being an equal opportunity employer, committed to providing equal employment opportunities to all applicants and employees regardless of race, religion, sex, color, age, national origin, pregnancy, sexual orientation, physical disability or genetic information, military or veteran status, or any other protected classification, in accordance with federal, state, and/or local law.