Software Engineer
6 days ago
About Luma AI
Luma's mission is to build multimodal AI to expand human imagination and capabilities. This requires a massive, reliable, and performant GPU infrastructure that pushes the boundaries of scale. Our SRE team is the foundation of our research and product velocity, responsible for the thousands of NVIDIA and AMD GPUs across multiple providers that power our work.
Where You Come In
This is not a typical cloud SRE role. We are looking for a hands-on, first-principles engineer who is fluent in Linux and comfortable operating close to the metal. You will build, maintain, and scale Luma's large-scale GPU infrastructure, working directly on on-prem and multi-vendor cloud clusters. You'll solve complex systems problems, ensure reliability through clear SLOs/SLIs, and build automation that allows us to operate at an unprecedented scale with a lean team.
What You'll Do
- Own GPU Cluster Reliability: Take end-to-end ownership of our GPU clusters for training and inference, ensuring high availability and peak performance across multiple cloud providers.
- Drive Reliability Metrics: Define and maintain service-level objectives (SLOs) and indicators (SLIs) to measure and improve reliability as our infrastructure scales.
- Deep Linux Expertise: Use your mastery of Linux systems to troubleshoot and optimize performance at the OS level.
- Build Robust Automation: Write high-quality tools and automation in Python, Go, or Bash to manage, monitor, and heal our infrastructure.
- Master Kubernetes at Scale: Operate and scale Kubernetes clusters beyond managed services, ensuring reliability across diverse workloads.
- Modern Operations Practices: Implement and manage observability stacks (Prometheus, Grafana) and GitOps workflows (Argo CD, Flux) to keep infrastructure transparent and resilient.
Who You Are
- 5+ years of experience as an SRE, production engineer, or infrastructure engineer in a fast-paced, large-scale environment.
- Deep, hands-on expertise in Linux and containerized systems.
- Strong experience with Kubernetes in production environments at meaningful scale.
- Proficient in Python and/or Go, with a track record of building infrastructure tooling.
- Strong understanding of networking, cloud infrastructure (AWS/GCP), and IaC tools like Terraform.
- A tenacious troubleshooter who thrives on solving complex, low-level problems.
- Experience managing large-scale GPU clusters for AI/ML workloads (training or inference).
What Sets You Apart (Bonus Points)
- Familiarity with job management systems based on Kubernetes or orchestration frameworks like Ray.
- Experience debugging GPU performance issues with specialized tools.
-
Software Engineering Manager
6 days ago
Palo Alto, California, United States TabaPay Full time $150,000 - $200,000 per year
Who We Are
The world is moving towards instant digital payments and TabaPay is leading the way. We help thousands of Fintechs in the US and Canada instantly move money in and out of accounts and we are actively expanding into other countries. Our customers represent the hottest verticals in the financial service industry such as neobanks, challenger brokers,...
-
Graduate Software Engineer
6 days ago
Palo Alto, California, United States THRivve by intelletec Full time $120,000 - $180,000 per year
Graduate Software Engineer - AI Platform | Palo Alto (Hybrid)
Role with a stealth, well-funded AI venture backed by a leading enterprise software investor.
We're partnering with a new AI startup that is building a next-generation platform designed to transform how enterprises operate and collaborate through intelligent, autonomous workflows. They are...
-
Software Engineer Intern
6 days ago
Palo Alto, California, United States Deep Infra Inc. Full time $48,000 - $60,000 per year
DeepInfra is seeking a talented and motivated Software Engineering Intern to join our team. As an intern, you will be working closely with our experienced engineering team to design, develop, and deploy the top open AI models at scale. This is an excellent opportunity to gain hands-on experience in building scalable and efficient software systems, while...
-
Lead Software Engineer
6 days ago
Palo Alto, California, United States JPMorgan Chase Full time $120,000 - $200,000 per year
We have an opportunity to impact your career and provide an adventure where you can push the limits of what's possible.
As a Lead Software Engineer at JPMorganChase within the Corporate sector, infrastructure platforms team, you are an integral part of an agile team that works to enhance, build, and deliver trusted market-leading technology products in a...
-
Senior Software Engineer
4 days ago
Palo Alto, California, United States PEBL Full time $120,000 - $180,000 per year
Purpose in Every Position
Pebl puts a world of talent at your fingertips. With our AI-powered Global Work Platform, companies can hire, pay, and manage employees in 185+ countries, removing risk, red tape, and guesswork from global growth. Backed by more than a decade of compliance leadership and local expertise, Pebl helps businesses move fast, stay...
-
Software Engineering Manager
2 days ago
Palo Alto, California, United States Assured Full time $230,000 - $250,000 per year
Assured is on a mission to modernize insurance. Claims processing (i.e. should we pay this claim?), while often overlooked, is the foundation of the entire industry. It's currently highly manual, involving phone calls, faxes, and gut instinct, costing tens of billions of dollars a year. We can do better.
At Assured, we provide large insurers with the...
-
Software Engineer, Cloud
4 days ago
Palo Alto, California, United States 1X Technologies AS Full time $137,861 - $240,000 per year
Job description
We're scaling humanoid robots from prototypes to global deployment. Every robot, every customer, and every internal operation depends on the software you and your team will build.
You will own all systems that connect the digital and physical layers, deployment tools, fleet management, customer interfaces, and internal operations platforms....
-
Software Engineer
6 days ago
Palo Alto, California, United States Rubrik Full time $126,500 - $189,700 per year
About The Team
As a member of the Developer Platform team at Rubrik, you will be focused on solving challenging problems with scale and stability in the area of Developer Experience that will enable the company to deliver a product at high-scale without compromising on quality, velocity, and coding standards. Rubrik positions you with a unique opportunity to...
-
Staff Software Engineer
5 days ago
Palo Alto, California, United States Navan Full time $146,250 - $255,000
The Staff Full-stack Software Engineer in Security will be responsible for securing Navan products by identifying unaddressed areas of weakness and driving cleverly engineered, scalable solutions that improve our defense-in-depth. You will be responsible for design and development of core services related to authentication, authorization, encryption within...
-
Software Engineer
6 days ago
Palo Alto, California, United States Rubrik Full time $152,400 - $228,700 per year
About The Team
The Forge Team: Engineering the Backbone of Rubrik's Platform
The Forge team is at the core of Rubrik's mission to secure the world's data. As the platform and systems engineering team, founded by one of Rubrik's co-founders and CTO, our mission is to build a highly reliable, secure, and scalable software-defined platform.
We are the architects...