Network Engineer, AI/ML Infrastructure
2 days ago
About The Role
We're seeking an experienced Network Engineer to design, build, and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. You'll work at the cutting edge of network technology—managing InfiniBand and ultra-high-speed Ethernet fabrics that connect NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, and hundreds of servers.
You'll be hands-on with the full lifecycle of our network infrastructure: planning, building, testing, deploying, and keeping everything running at peak performance. That means troubleshooting issues as they arise, monitoring network performance and throughput, developing automation to streamline operations, and working closely with HPC and ML teams to ensure they have the bandwidth they need. You'll also help us plan for future capacity and evaluate emerging network technologies as we scale to meet increasingly demanding workloads.
Responsibilities
- Configure and maintain InfiniBand and high-speed Ethernet fabrics
- Optimize network performance for RDMA and GPU-to-GPU communication
- Manage network switches (Mellanox, NVIDIA, Micas Networks)
- Troubleshoot network bottlenecks and latency issues
- Plan and execute network upgrades and expansions
- Implement network security (firewalls, VLANs, ACLs)
- Collaborate on storage network optimization
- Infrastructure monitoring
- 4+ years of network engineering experience in production environments
- Strong understanding of L2/L3 networking protocols (TCP/IP, BGP, OSPF, VLANs)
- Hands-on experience with high-speed networking (100Gb+ Ethernet and InfiniBand)
- Hands-on experience with network security (firewalls, ACLs, network segmentation)
- Knowledge of HPC network topologies
- Experience with InfiniBand fabrics including RDMA, RoCE, IPoIB
- Strong troubleshooting and problem-solving skills
- Experience in data center environments or AI/ML infrastructure
- Hands-on experience with high-performance Ethernet switches (e.g., Broadcom Tomahawk) and the latest InfiniBand switches (e.g., NVIDIA/Mellanox)
- Experience optimizing networks for GPU-to-GPU communication
- Experience with open-source firewall solutions (OPNsense, pfSense, or similar)
- Experience with network automation tools
- Understanding of distributed storage networking (Ceph cluster networks)
- Familiarity with network monitoring and observability tools (Prometheus, Grafana)
- Knowledge of multi-site network connectivity and WAN optimization
- Familiarity with cloud networking in at least one platform (AWS, GCP, or Azure) including VPC design, site-to-site VPN configuration, Direct Connect/ExpressRoute/Cloud Interconnect, hybrid cloud connectivity, and cloud-to-datacenter network integration
If you're a natural problem-solver with a passion for continuous learning, we'd love to hear from you.
-
Network Engineer, AI/ML Infrastructure
2 weeks ago
Santa Clara, California, United States | Boson AI | Full time | $120,000 - $180,000 per year