Principal High-Performance LLM Training Engineer

NVIDIA • Santa Clara, United States

📍 Location

Santa Clara

⏰ Job Type

Full-time

📅 Posted

June 06, 2026

About the Role

NVIDIA is seeking a Principal Engineer to drive the performance of large-scale AI training and post-training workloads across NVIDIA’s full hardware and software stack. This role sits at the intersection of distributed training, GPU architecture, systems software, deep learning frameworks, and performance engineering. You will analyze and optimize frontier-scale LLM workloads running on thousands of GPUs, drive improvements across frameworks such as PyTorch, JAX, NeMo, and NeMo RL, and use insights from real workloads to help shape future NVIDIA GPU, system, and software roadmaps.

We are looking for a deeply technical leader who can operate across abstraction layers: from application-level training behavior to framework/runtime internals, CUDA libraries, communication collectives, memory systems, networking, and GPU architecture. At this level, success means both directly improving performance and setting technical direction, raising the bar for the org...
