Principal / Senior Data Engineer
1 day ago
Troveo is the largest licensable video library for AI model training. We partner with thousands of content licensors—ranging from top-tier studios and production houses to leading YouTube creators—to supply video content to the world's foremost research labs. Our mission is to rapidly deliver massive volumes of video content, to exact specifications, fueling next-generation generative and world-understanding AI models.
Data Engineering is central to our success. Each week, we process petabytes of video data—quickly, cost-effectively, and with uncompromising quality. As a data engineer at Troveo, you'll focus on:
Lowering costs and reducing turnaround times for processing content.
Enhancing and transforming video data for our customers, to make it easier to discover and more valuable.
We are seeking a Principal or Senior Data Engineer with demonstrated expertise in Python and large-scale data management. Practical experience with AWS services (S3, EC2, etc.), search, and large databases is essential. Familiarity with video data is a plus, but not required.
Responsibilities
Data Pipeline Development: Design, build, and maintain scalable, efficient data pipelines in Python.
AWS Ecosystem: Use services like S3 for data storage and retrieval (across multiple storage tiers) and EC2 for compute and processing (currently running clusters of 50k G instances) in production environments.
Big Data Handling: Develop and optimize systems to handle petabyte-scale datasets with a focus on performance, reliability, and cost-effectiveness.
Metadata Generation: Leverage self-hosted open-source LLMs and managed APIs to generate reliable metadata that powers discovery and enhances the value of the content we deliver.
Discovery: Build search capabilities from the ground up, using visual, semantic, and taxonomic data to deliver the right content to our customers.
Monitoring & Reliability: Implement robust monitoring, alerting, and logging to ensure smooth data flow and quickly troubleshoot issues.
Collaboration: Work cross-functionally with data scientists, software engineers, and product teams to understand data needs and deliver optimized solutions.
Video Processing (Preferred): If applicable, process and manage video data for analytics, quality control, and other use cases.
Python Proficiency: Strong coding skills in Python (including familiarity with libraries for data manipulation and analysis).
AWS Expertise: Hands-on experience using core AWS services (S3, EC2, possibly Lambda, EMR, or ECS).
Big Data Skills: Demonstrated ability to work with large-scale datasets (petabyte-level), ensuring high performance and scalability.
Database & Storage: Familiarity with large Postgres databases.
Automation & Scripting: Comfortable building CI/CD pipelines and automating repetitive tasks.
Video Processing: Experience handling or transforming video data (e.g., transcoding, extracting metadata, compiling FFmpeg).
Machine Learning Pipelines: Familiarity with ML and Computer Vision workflows or frameworks (OpenCV, TensorFlow, PyTorch, etc.).
Security Best Practices: Understanding of AWS IAM, encryption, and SOC 2 compliance standards.
An opportunity to work with massive data sets and cutting-edge cloud technologies, serving the biggest companies in tech as they build the next generation of AI models.
A collaborative environment with a talented, diverse team of engineers and data experts.
Competitive compensation and benefits with room for career growth and professional development.
This job is remote/work from home with the option of meeting up from time to time if you are located in the SF Bay Area.