About the Role
This is a permanent, hybrid position based in either Cambridge or London.
Our client is looking for AI Researchers specialising in Reinforcement Learning from Human Feedback (RLHF) and Generative AI. In this role, you will design and optimise the algorithms that align large-scale generative models with human preferences, ensuring they are safe, controllable, and capable of producing high-quality outputs across multiple modalities. You’ll sit at the intersection of RL, LLMs, and generative modelling, helping us build the next generation of foundation models.
Responsibilities
- Develop and refine RLHF algorithms for large language and generative models.
- Research and implement deep reinforcement learning methods (policy gradients, actor‑critic, off‑policy learning) for model alignment.
- Train, fine‑tune, and evaluate LLMs and diffusion models at scale.
- Design experiments to align generative ou...