AIGC Distributed Training & Optimisation Engineer
dadaconsultants pte. ltd. • Singapore, Singapore
About the Role
About our client
A technology group establishing a new AI Centre of Excellence in Singapore is looking for an engineer to own the distributed training infrastructure for large-scale AIGC model development.

What you'll work on
- Design and build distributed training toolchains supporting ultra-large-scale model training
- Optimise across compute, communication, and storage layers
- Diagnose and resolve training bottlenecks to improve stability and throughput
- Track and apply frontier distributed training techniques end-to-end

What we're looking for
- Master's or above in CS or a related field
- 2+ years of relevant experience
- Deep hands-on experience with distributed training paradigms: Data / Pipeline / Tensor / Expert Parallelism
- Proficient in PyTorch, DeepSpeed, Megatron-LM
- Familiar with GPU architecture and CUDA programming; experience in CUDA kernel development and NCCL/cuDNN
- Understanding of AIGC pre-training, Transformer architectures, and Diffusion models (Stable Diffusion, Flux)
About Us Dad...