AI Infrastructure Engineer
4 weeks ago
Job Title: AI Infrastructure Engineer - Scalable Solutions
Xpeng Motors is a leading smart electric vehicle company that designs, develops, and manufactures smart EVs with advanced Internet, AI, and autonomous driving technologies. We are committed to in-house R&D and intelligent manufacturing to create a better mobility experience for our customers.
We are seeking a talented AI/ML Infrastructure Engineer to enhance the efficiency of our skilled ML teams. In this role, you will identify and resolve infrastructure gaps to provide reliable, efficient, and scalable solutions.
Key Responsibilities:
- Identify and resolve infrastructure gaps to ensure reliable, efficient, and scalable solutions
- Develop advanced AI/ML infrastructure solutions that enhance the efficiency of our skilled ML teams
- Design and implement solutions for critical areas, including distributed storage systems, scheduling systems, high availability capabilities, and core reliability issues within our large-scale GPU clusters
- Monitor and optimize the performance of our AI/ML infrastructure, ensuring high availability, scalability, and efficient resource utilization
- Develop and deploy automation tools, monitoring solutions, and operational strategies to streamline infrastructure management and reduce manual tasks
- Work with various teams, including ML developers, data engineers, and DevOps professionals, to create a cohesive and integrated AI/ML infrastructure ecosystem
Requirements:
- Bachelor's degree in Computer Science, Engineering, or related technical field
- 5-8+ years of experience in software engineering, with a strong background in developing and managing large-scale distributed systems, ideally within the AI/ML infrastructure domain
- Proficiency in programming languages such as Python, Go, or C++, with knowledge of cloud computing platforms such as AWS or Azure
- Strong communication and collaboration abilities, effective in working with diverse teams and individuals
Preferred Requirements:
- In-depth understanding of AI/ML workflows, including model training, data processing, and inference pipelines
- Practical experience with containerization technologies (e.g., Docker, Kubernetes), automation tools (e.g., Ansible, Terraform), and monitoring solutions (e.g., Prometheus, Grafana)
- Exceptional problem-solving skills, capable of analyzing complex systems, identifying bottlenecks, and implementing scalable solutions
- A passion for continuous learning and staying abreast of new technologies and best practices in the AI/ML infrastructure space
What We Offer:
- A fun, supportive, and engaging environment
- Opportunity to make a significant impact on the transportation revolution by advancing autonomous driving
- Opportunity to work on cutting-edge technologies with top talent in the field
- Competitive compensation package
- Snacks, lunches, and fun activities
The base salary range for this full-time position is $180,000-$300,000, in addition to bonus, equity, and benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.
We are an Equal Opportunity Employer. It is our policy to provide equal employment opportunities to all qualified persons without regard to race, age, color, sex, sexual orientation, religion, national origin, disability, veteran status, marital status, or any other prescribed category set forth in federal or state regulations.
-
AI Infrastructure Engineer
4 weeks ago
Santa Clara, California, United States Nvidia Full time
NVIDIA is seeking a highly skilled and experienced engineer to join our growing team. The successful candidate will work at the intersection of GPU chip design and AI, responsible for the design, development, and maintenance of the infrastructure around Nvidia's internal large language model aimed at facilitating chip design. Key Responsibilities: Develop and...
-
Santa Clara, California, United States XPENG Motors Full time
Job Title: Senior Staff AI Infrastructure SRE
Xpeng Motors is a leading smart electric vehicle company that designs, develops, and manufactures cutting-edge EVs with advanced Internet, AI, and autonomous driving technologies. We are committed to in-house R&D and intelligent manufacturing to create a better mobility experience for our customers. About the...
-
Santa Clara, California, United States Oracle Full time
About the Role
We are seeking a highly motivated and experienced Senior Principal Product Manager to join our AI Infrastructure organization at Oracle Cloud Infrastructure (OCI). This person will be responsible for defining and delivering on a software product roadmap across the AI Infrastructure stack, from bare metal and cluster management to AI services...
-
Senior Cloud Infrastructure Engineer
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Description
NVIDIA is seeking a Senior Site Reliability Engineer to join our AI Efficiency Team. As a key member of this team, you will contribute to the development of infrastructure that powers our innovative AI research. The AI Efficiency Team focuses on optimizing efficiency and resiliency of AI workloads, as well as developing scalable AI and Data...
-
Santa Clara, California, United States Oracle Full time
Lead the Future of AI Workload Orchestration
Oracle is seeking a highly experienced Senior Director of Engineering to lead the development and operation of our AI workload orchestration platforms. As a key member of our AI Infrastructure organization, you will be responsible for building and managing a team of software engineers to design, develop, and deploy...
-
Machine Learning Engineer
1 month ago
Santa Clara, California, United States XPENG Motors Full time
Job Title: Machine Learning Engineer - AI Foundation
Xpeng Motors is a leading smart electric vehicle company that designs, develops, manufactures, and markets smart EVs with advanced Internet, AI, and autonomous driving technologies. We are committed to in-house R&D and intelligent manufacturing to create a better mobility experience for our customers. We are...
-
AI Applications Engineer
4 days ago
Santa Clara, California, United States NVIDIA Full time
AI Applications Engineer
Overview: We are seeking a highly skilled AI Applications Engineer to join our team at NVIDIA. As a key member of our team, you will be responsible for designing and building the tools used by millions of AI practitioners deploying AI applications scalable to thousands of GPUs. Key Responsibilities: Crafting a code generation system to...
-
Solutions Architect, AI Infrastructure
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Description
NVIDIA is seeking an experienced Solutions Architect to join our AI Infrastructure team. As a key member of our team, you will be responsible for driving our end-to-end technology solutions integration with strategic technology customers. Key Responsibilities: Work with NVIDIA Consumer Internet and IT Services customers on data center GPU server...
-
AI Software Development Engineer
3 weeks ago
Santa Clara, California, United States Rivos Full time
Job Title: AI Software Development Engineer
About the Role: We are looking for a highly skilled AI Software Development Engineer to join our team at Rivos. As a key member of our silicon, software, and platform design team, you will be responsible for building and maintaining our AI software stack. Key Responsibilities: * Build-up components of an AI Software...
-
Principal Engineer for AI Resilience
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Description
We are seeking a highly skilled Principal Engineer to lead the development of AI software resilience for our cutting-edge AI supercomputers. As a key member of our team, you will play a critical role in defining and implementing critical resiliency features for our AI systems, ensuring they remain robust and reliable at all times. Your expertise...
-
AI Systems Engineer
4 weeks ago
Santa Clara, California, United States Meshy Full time
About Meshy
We are a leading 3D generative AI company headquartered in Silicon Valley, on a mission to unleash 3D creativity. We simplify the creation of distinctive 3D assets for both professional artists and hobbyists by transforming text and images into stunning 3D models in minutes. Our global team of experts in computer graphics, AI, and art includes...
-
Senior Staff AI Performance Engineer
4 weeks ago
Santa Clara, California, United States XPENG Motors Full time
We are seeking a highly skilled AI Performance Engineer to join our team at XPeng Motors, a leading smart electric vehicle company. As a key member of our software engineering team, you will be responsible for optimizing the training and inference performance of state-of-the-art ML infrastructure and foundation models for autonomous driving. With a strong...
-
Principal Engineer for AI Software Resilience
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Description
We are seeking a highly skilled Principal Engineer to lead the development of AI software resiliency for our cutting-edge AI supercomputers. As a key member of our team, you will play a pivotal role in defining and implementing critical resiliency features to ensure our AI systems remain robust and reliable at all times. Key...
-
Cloud Infrastructure Engineer
4 weeks ago
Santa Clara, California, United States Astera Labs Full time
Astera Labs: Transforming Data-Driven Applications
Astera Labs is a global leader in purpose-built connectivity solutions that unlock the full potential of AI and cloud infrastructure. Our Intelligent Connectivity Platform integrates PCIe, CXL, and Ethernet semiconductor-based solutions and the COSMOS software suite of system management and optimization tools...
-
Senior Software Engineer, AI
4 days ago
Santa Clara, California, United States Couchbase Full time
Empower Modern Applications
Every day, we tackle new and exciting challenges to empower developers to build modern cloud, mobile, and edge applications that deliver a premium user experience. Couchbase's fast, flexible, and affordable cloud database platform, Capella, enables organizations to quickly build applications that deliver premium experiences to...
-
Senior Product Marketing Manager
1 month ago
Santa Clara, California, United States Astera Labs Full time
Product Marketing Manager for Cloud and AI Infrastructure
Astera Labs is a leading provider of purpose-built connectivity solutions that unlock the full potential of cloud and AI infrastructure. We are seeking a highly skilled Senior Product Marketing Manager to lead the product marketing efforts for our semiconductor-based connectivity solutions. As a key...
-
Senior Software Engineer
4 weeks ago
Santa Clara, California, United States Couchbase, Inc. Full time
Empower the Future of Database Technology
Couchbase is seeking a highly skilled Senior Software Engineer to join our AI team. As a key member of our engineering team, you will design and implement cutting-edge database and AI features and tools using the latest techniques to evolve Couchbase products and Capella service. Key Responsibilities: Design and...
-
AI 3D Model Engineer
4 weeks ago
Santa Clara, California, United States Meshy Full time
About Meshy
We are a leading 3D generative AI company headquartered in Silicon Valley, on a mission to Unleash 3D Creativity. Our platform simplifies the creation of distinctive 3D assets for both professional artists and hobbyists by transforming text and images into stunning 3D models in minutes. Our global team of 30 experts in computer graphics, AI,...
-
AI Application Architect
4 weeks ago
Santa Clara, California, United States ServiceNow Full time
Transforming Work with AI
At ServiceNow, we're revolutionizing the way work is done with cutting-edge AI technology. As an AI Application Architect, you'll play a pivotal role in shaping the future of AI-driven search and digital assistant solutions. Your work will directly impact how employees and customers access and interact with information, driving...
-
Software Architect for AI Compute Engine
4 weeks ago
Santa Clara, California, United States d-Matrix Full time
Job Description:
d-Matrix is revolutionizing the field of memory-compute integration with our cutting-edge digital in-memory compute (DIMC) engine. This innovative technology has the potential to break through the memory wall, minimizing data movements and paving the way for significant advancements in AI compute. We are seeking a highly skilled Software...