
Senior Scientist, Synthetic Data Generation

📍 Location: Santa Clara
⏰ Job Type: Full-time
📅 Posted: June 20, 2026

About the Role

NVIDIA is at the forefront of the AI revolution, and our research is shaping the future of large language models. We are looking for a Senior Scientist to join our team and help advance our capabilities in synthetic data generation for training frontier models. You will contribute to open-source libraries within the NVIDIA NeMo ecosystem that generate synthetic datasets across text, code, structured, and multimodal data, directly feeding the pre- and post-training of LLMs such as Nemotron. This role combines hands-on software engineering with applied research in generative methods, and you will collaborate with research, engineering, product, and model teams as well as external labs.


What you'll be doing:
+ Build synthetic data generation pipelines using LLM-based methods and automated quality evaluation, producing datasets that improve the pre- and post-training of LLMs such as Nemotron across reasoning, coding, structured output, and multimodal understanding.
+ Advan...
