RLHF (Reinforcement Learning from Human Feedback)
A training technique in which human evaluators rank or rate model outputs; these preference judgments are typically used to train a reward model, and the language model is then fine-tuned with reinforcement learning (commonly PPO) to produce outputs the reward model scores highly. RLHF was a central part of how ChatGPT and Claude were trained to be helpful, harmless, and honest.
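The sketch below illustrates the reward-modeling step that sits at the heart of RLHF: a pairwise (Bradley-Terry) loss pushes the reward of the response a human preferred above the one they rejected. It is a minimal toy example in PyTorch, assuming pre-computed fixed-size embeddings and an invented `TinyRewardModel`; real systems build the reward model on top of a pretrained language model and then optimize the policy against it with an RL algorithm such as PPO.

```python
# Minimal sketch of RLHF reward-model training on preference pairs.
# TinyRewardModel and the random "embeddings" are illustrative stand-ins,
# not any library's actual API.
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Maps a fixed-size embedding of a (prompt, response) pair to a scalar reward."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

model = TinyRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy preference data: each row pairs an embedding of the response the human
# preferred ("chosen") with an embedding of one they rejected ("rejected").
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

# Bradley-Terry pairwise loss: make the chosen response score higher than the
# rejected one. The trained reward model then supplies the reward signal for
# the reinforcement-learning stage.
loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
optimizer.step()
```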