Data Scientist

1 month ago


Santa Clara, United States Net2Source Inc. Full time

Location - Santa Clara, CA (Hybrid)

Contract / Full Time


This is a data scientist role focused on transforming large (hundreds of terabytes), complex, multi-dimensional data, including tabular (relational) data and unstructured data such as images, videos, audio files, and various other file formats. The key responsibility is curating high-quality training data for large language model training.


Responsibilities:

  • Curate high-quality datasets and synthesize training data where needed to improve model capabilities.
  • Champion modeling, EDA, transformation, modernization, and curation of high-quality training data for GPT-4 and GPT-4 Vision.
  • Provide data curation leadership for tabular and unstructured data (images, video, log files, etc.).
  • Create data definitions and data lineages to support effective, high-accuracy LLM training.
  • Help build and test prompts that yield high-quality insights.
  • Train and fine-tune language models using frameworks like PyTorch and TensorFlow.
  • Rigorously test models to evaluate accuracy, bias, toxicity, and other attributes using statistical analysis.
  • Monitor metrics and logs from LLMs in deployment to proactively identify any degraded performance or anomalies.
  • Diagnose root causes when models err or behave unexpectedly using techniques like saliency maps, heatmap visualizations and interactive debugging.
  • Improve model robustness by analyzing model behavior and identifying failure modes. Recommend data augmentation, training modifications, etc.
  • Perform model surgery by carefully editing model weights and architectures to fix incorrect or unsafe behavior while maintaining performance.
  • Run A/B experiments to measure the impact of model tweaks and fixes on performance, accuracy, toxicity, bias, etc.
  • Continuously inspect models for signs of concept drift or staleness and recommend retraining cadence.
  • Document LLM version changes, experiments, and incident response postmortems.
  • Stay updated on the latest techniques from research and industry conferences for responsible and reliable deployment of LLMs.

Requirements:

  • 8+ years of experience training, deploying, and monitoring natural language models
  • Strong statistics skills and large-scale data manipulation capabilities
  • Proficiency in Azure Machine Learning Studio and in deploying models in Azure Cloud environments
  • Deep knowledge of Azure SQL and vector databases
  • Proficiency in Python, PyTorch, TensorFlow, NLP libraries and other ML tools
  • Knowledge of responsible AI principles around transparency, fairness and accountability
  • Knowledge of the AutoGen and LangChain/LlamaIndex frameworks


  • Data Scientist

    3 weeks ago


    Santa Clara, United States Sigmaways Inc Full time

    Job Description. Duties: We are looking for a highly motivated Principal Software Engineer to help us build cutting-edge analysis, visualization, and compute pipelines for analyzing sequencer data. The job requires advanced Python expertise and data science skills, in addition to solid computer science skills. You will be involved in...

  • Data Scientist

    1 month ago


    Santa Clara, United States Sigmaways Inc Full time

    Job Description. Duties: We are looking for a highly motivated Principal Software Engineer to help us build cutting-edge analysis, visualization, and compute pipelines for analyzing sequencer data. The job requires advanced Python expertise and data science skills, in addition to solid computer science skills. You will be involved in...


  • Santa Clara, United States Hexaware Technologies Full time

    Role: Sr Data Scientist lead. Location: Santa Clara, CA (Onsite). Skills: Python scripting, AI, ML, NLP, LLP, deep learning. Job Requirements: 1. A minimum of 7 years of experience in data science, with a proven track record of delivering impactful data-driven solutions. 2. In-depth expertise in statistical analysis, machine learning algorithms, and predictive modeling...


  • Data Scientist

    2 weeks ago


    Santa Clara, United States Ehub Global Inc Full time

    Role: Data Scientist with LLM and Deep Learning. Location: Santa Clara, CA (Onsite). Skills: RAG (indexing and chunking), NLP, and GenAI. Job Description: Excellent knowledge of LLMs, specifically expertise in OpenAI models including GPT-3.5 and GPT-4. Experience developing solutions using AI and ML technology. Proficient with deep learning algorithms...

  • Data Scientist

    1 month ago


    Santa Clara, California, United States Amazon Full time

    Amazon is looking for a passionate, talented, and inventive Data Scientist with a strong machine learning background to help build industry-leading language technology. Our mission is to provide a delightful experience to Amazon's customers by pushing the envelope in Natural Language Processing (NLP), Generative AI, Large Language Model (LLM), Natural...

  • Sr Data Scientist lead

    10 hours ago


    Santa Clara, United States Hexaware Technologies Full time

    What Working at Hexaware offers: Hexaware is a dynamic and innovative IT organization committed to delivering cutting-edge solutions to our clients worldwide. We pride ourselves on fostering a collaborative and inclusive work environment where every team member is valued and empowered to succeed. Hexaware provides access to a vast array of tools that...

  • Sr Data Scientist lead

    14 hours ago


    Santa Clara, United States Hexaware Technologies Full time

    What Working at Hexaware offers: Hexaware is a dynamic and innovative IT organization committed to delivering cutting-edge solutions to our clients worldwide. We pride ourselves on fostering a collaborative and inclusive work environment where every team member is valued and empowered to succeed. Hexaware provides access to a vast array of tools that enhance,...

  • Senior Data Scientist

    3 weeks ago


    Santa Clara, United States Quess US Full time

    Our client is seeking a Senior Data Scientist. Experience building, training, and deploying ML models using Python. 9+ years of experience building machine learning models in Python. Experience working with AWS SageMaker and Azure ML Studio. Ability to deploy ML models with REST-based APIs. Expertise in time series forecasting for use cases like Demand Forecasting,...

  • Data Scientist

    3 weeks ago


    Santa Clara, United States Quess IT Staffing Full time

    Job Title: Sr. Data Scientist. Duration: Full time. Location: Santa Clara, CA (Hybrid). Job description: Experience building, training, and deploying ML models using Python. 9+ years of experience building machine learning models in Python. Experience working with AWS SageMaker and Azure ML Studio. Ability to deploy ML models with REST-based APIs. Expertise in time series...

  • Data Scientist

    3 days ago


    Santa Clara, United States Quess IT Staffing Full time

    Job Title: Sr. Data Scientist. Duration: Full time. Location: Santa Clara, CA (Hybrid). Job description: Experience building, training, and deploying ML models using Python. 9+ years of experience building machine learning models in Python. Experience working with AWS SageMaker and Azure ML Studio. Ability to deploy ML models with REST-based APIs. Expertise in time series...


  • Santa Clara, United States NVIDIA Full time

    NVIDIA has been innovating computer graphics, PC gaming, and accelerated computing for more than 25 years. Today, we’re tapping into the unlimited potential of generative AI to define the next era of computing. An era in which accelerated computing is powered by our GPUs, and generative AI foundational models for the enterprise. Doing what’s never been...

  • Scientist

    2 weeks ago


    Santa Clara, United States Planet Pharma Full time

    Scientist needed! Job summary: The Experimental Biomarker Research and Early Development team (discovery sciences) is seeking a talented and motivated Scientist to support the development of assays and reagents for digital PCR. The qualified candidate will join a cross-functional team of scientists developing a new biomarker detection and feasibility. Essential...


  • Santa Clara, United States SoundHound Full time

    SOUNDHOUND INC. TURNS SOUND INTO UNDERSTANDING AND ACTIONABLE MEANING. We believe in enabling humans to interact with the things around them in the same way we interact with each other: by speaking naturally to mobile phones, cars, TVs, music speakers, coffee machines, and every other part of the emerging 'connected' world. Our latest product, Hound,...


  • Santa Clara, United States Palo Alto Networks Full time

    Job Description. Company Description. Our Mission: At Palo Alto Networks®, everything starts and ends with our mission: Being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting...

  • Data Scientist

    2 weeks ago


    Santa Cruz, United States Fullpower Full time

    Description Fullpower®-AI delivers a complete B2B IoT platform for AI-powered algorithms, remote contactless biosensing together with end-to-end engineering services, and customization of software in the field of life sciences, health, and biotechnology. Fullpower's platform is vetted and deployed as a PaaS, backed by a patent portfolio of 135+ patents, and...


  • Santa Clara, United States Laksan Technologies LLC Full time

    Job Description: Excellent knowledge of LLMs, specifically expertise in OpenAI models including GPT-3.5 and GPT-4. Experience developing solutions using AI and ML technology. Proficient with deep learning algorithms, especially for developing computer vision applications. Minimum 5 years' experience working as a Data Scientist. Excellent...

