Data Engineer
7 days ago
We are looking for a Data Engineer passionate about LLMs, VLMs, post-training, and reinforcement learning. You will design and implement scalable data systems that power dataset generation, filtering, and evaluation for model alignment and agentic reasoning. You'll collaborate closely with our research and infrastructure teams to ship real systems that train the next generation of intelligent models.
Key Responsibilities
- Build and maintain scalable data pipelines for mid-training and post-training.
- Design high-throughput systems for data collection, deduplication, and quality measurement.
- Work with researchers to implement reward models, benchmarks, and feedback loops.
- Collaborate cross-functionally with infra and research teams to integrate new data modalities and tasks.
Qualifications
- Strong software engineering background.
- Experience with LLMs, RLHF/RLAIF, and/or post-training pipelines (SFT, DPO, PPO, etc.).
- Familiarity with modern data tooling (e.g., PySpark, Ray, Hugging Face Datasets, Arrow, Parquet).
- Comfort with large-scale data manipulation, storage, and retrieval.
- Understanding of data curation principles, filtering heuristics, and annotation workflows.
- (Bonus) Experience with training reward models.
- (Bonus) Experience with coding, tool-use, or agentic LLM datasets.
- (Bonus) Experience building and maintaining hybrid compute clusters (Kubernetes, Slurm).
What We Offer
- Work with a world-class, research-driven team shaping the future of data-centric AI.
- Early technical ownership and influence in a fast-moving, well-funded startup.
- Competitive compensation with equity.
- Hybrid flexibility (SF Bay Area preferred, remote considered).
- Impactful open-source contributions (papers, code) recognized by top research and industry labs.
Job Types: Full-time, Contract, Internship
Projected Total Compensation: $132,000.00 - $156,000.00 per year
Benefits:
- 401(k)
- Health insurance
- Vision insurance
Work Location: In person
-
Data Engineer
1 week ago
San Francisco, California, United States | Zapier | Full time
About Zapier
We're humans who simply think computers should do more work. At Zapier, we're not just making software; we're building a platform to help millions of businesses globally scale with automation and AI. Our mission is to make automation work for everyone by delivering products that delight our customers. You'll collaborate with brilliant people, use...
-
Data Engineer
4 days ago
San Francisco, California, United States | InterSources Inc | Full time
Job Title: AI Data Engineer
Duration: 3-6 Months CTH
Experience: Minimum 12+ Years
Job Description:
- Design, build, and maintain large-scale data pipelines and architectures to support AI and machine learning initiatives.
- Collaborate with data scientists, AI engineers, and analytics teams to ensure seamless data flow for model development and deployment.
- Develop and...
-
Data Engineer
4 days ago
San Francisco, California, United States | Air Apps | Full time
About Air Apps
At Air Apps, we believe in thinking bigger and moving faster. We're a family-founded company on a mission to create the world's first AI-powered Personal & Entrepreneurial Resource Planner (PRP), and we need your passion and ambition to help us change how people plan, work, and live. Born in Lisbon, Portugal in 2018, and now with offices in...
-
Data Engineer
2 weeks ago
San Francisco, California, United States | Sigma | Full time | $140,000 - $155,000 per year
About The Role
We're hiring our first Data Engineer within Tech Operations at Sigma. In this role, you'll build the data foundation that powers critical insights across Engineering and Tech Operations. You will architect, scale, and optimize data models and pipelines across Snowflake and Databricks, fueling everything from internal decision-making to...
-
Data Engineer
4 days ago
San Francisco, California, United States | Stefanini Group | Full time
Job Description
Stefanini Group is hiring
Stefanini is looking for a Data Engineer for various locations across the USA (Hybrid). For quick apply, please connect with Prakhar Goel: / W2 Candidates Only
Position Summary
As a Data Engineer, this CW will be responsible for collecting, parsing, managing, analyzing, and visualizing large sets of data to turn...
-
Data Engineer
2 weeks ago
San Francisco, California, United States | Kikoff | Full time
Responsibilities
We are looking for a Data Engineer or Analytics Engineer to join our Data team. You will collaborate with data scientists and engineers to design, build, and scale high-leverage data models, foundational datasets, and scalable infrastructure that enable analytics, modeling, and experimentation. Your responsibilities include:
- Build and...
-
Data Engineer
1 week ago
San Francisco, California, United States | Haystack | Full time
We're working with Cognizant on this role.
Azure Data Engineer
San Francisco, Hybrid
About the Role
Cognizant is looking for an experienced Azure Data Engineer with strong skills across Azure Databricks, Azure Synapse Analytics, PySpark, and Azure Data Factory. You will design and implement scalable data solutions that support advanced analytics and enterprise...
-
Data Engineer
1 week ago
San Francisco, California, United States | Mercor | Full time
About Mercor
Mercor is at the intersection of labor markets and AI research. We partner with leading AI labs and enterprises to provide the human intelligence essential to AI development. Our vast talent network trains frontier AI models in the same way teachers teach students: by sharing knowledge, experience, and context that can't be captured in code alone....
-
Data Engineer
2 days ago
San Francisco, California, United States | Career Mentors | Full time
Job description
Location: San Francisco, CA and Jersey City, NJ
Work Mode: Hybrid (3 Days Onsite / 2 Days Remote)
Employment Type: W2 Only (No C2C or 1099)
Candidate Requirement: Local candidates preferred (must be located near San Francisco or Jersey City)
As an AWS Data Engineer, you will be a key member of our data engineering team, responsible for building...
-
Platform Data Engineer
2 days ago
San Francisco, California, United States | Neon Redwood | Full time
About Neon Redwood
Neon Redwood is a data services consulting company, working on cutting-edge AI and data-driven solutions. We are a team of passionate engineers and data experts, and we are currently looking for a Data Engineer to join our team and help us develop and expand our data infrastructure and analytics capabilities.
The Role
We are seeking an...