On-prem Platform Engineer

Apolis • Charlotte, North Carolina, United States

📍 Location

Charlotte, North Carolina

⏰ Job Type

Full-time

📅 Posted

May 16, 2026

About the Role

  Key Skills: 

  Must-Have Skills (Mandatory Keywords) 

  LLM Inference & Optimization 

 vLLM, TensorRT-LLM, Triton Inference Server, SGLang

 Inference optimization techniques:
   Continuous batching
   Speculative decoding
   KV cache / Prefix caching

 Model optimization:
   FP8, AWQ, GPTQ

  Distributed & GPU Systems 

 Tensor parallelism and large model scaling

 CUDA, NCCL, GPU architecture

 GPU partitioning & optimization (MIG)

  Kubernetes & ML Serving 

 Kubernetes-based ML serving...
