ML Research Platform Engineer (Distributed Training & HPC)

QNT Partners • Singapore, Singapore

📍 Location

Singapore

⏰ Job Type

Full-time

📅 Posted

June 10, 2026

About the Role

Location: Singapore, Hong Kong or Shanghai

We are looking for a platform engineer to build the infrastructure that powers our next-generation machine learning research. Think: large-scale experimentation, distributed training, and reproducibility.

This is not an applied ML role. You will not be fine-tuning LLMs or building agents. Instead, you will build the systems that enable researchers to train models at scale.

What you will own
Distributed training pipelines for GPU-accelerated workloads (PyTorch, JAX)
Experiment management and model versioning
Resource scheduling on on-premise HPC clusters and cloud (Slurm, Kubernetes)
Observability and debugging for complex training jobs
Data lineage
