Research Intern — Coding LLMs

Tencent • Singapore, Singapore

📍 Location

Singapore

⏰ Job Type

Full-time

📅 Posted

June 19, 2026

About the Role

We are looking for research interns to work on foundational areas for coding language models, including pre-training data, mid-training data, synthetic data generation, evaluation, and agentic coding. 


Responsibilities 
* Explore data-centric methods for improving coding LLMs, including data filtering, quality assessment, deduplication, data mixture, and diversity analysis. 
* Build synthetic data and evaluation pipelines for code generation, code editing, repo-level reasoning, tool use, and multi-step coding tasks. 
* Run experiments to analyze how data, model, and training strategies affect coding capabilities.
* Work with large-scale code corpora, developer activity data, and agentic coding trajectories. 

Requirements 
* Currently pursuing a PhD degree.
* Strong programming skills in Python.
* Solid understanding of machine learning and large language models.
* Familiarity with LLM ...
            
