AI Tool Seekers
Human-verified AI tools.
AI Glossary · Last reviewed May 2026

RLHF

Reinforcement Learning from Human Feedback
Hand-written by a real person. Reviewed against current practice in May 2026.
Definition

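A training method in which humans rank a model's outputs. Those rankings train a reward model, which then steers the model, through reinforcement learning, toward the kinds of answers people prefer.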
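
Rankings are usually broken into pairwise preferences before a reward model is trained on them. OpenAI's InstructGPT work, for example, turned each ranking of K responses into K × (K - 1) / 2 comparisons. The sketch below is a minimal illustration, assuming a rater has ordered several outputs best-first; the function name `ranking_to_pairs` and the answer labels are hypothetical.

```python
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Turn one human ranking (best first) into (preferred, rejected) pairs.

    A ranking of K outputs yields K * (K - 1) / 2 pairwise comparisons.
    """
    # combinations() preserves input order, so the first item of each
    # pair is always the one the rater ranked higher.
    return list(combinations(ranked_outputs, 2))


# Hypothetical example: a rater ranked three answers to the same prompt.
ranking = ["answer_b", "answer_a", "answer_c"]
pairs = ranking_to_pairs(ranking)
print(pairs)
# [('answer_b', 'answer_a'), ('answer_b', 'answer_c'), ('answer_a', 'answer_c')]
```

A single ranking is compact for the rater to produce, but it yields several training comparisons at once.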

Full write-up coming soon

We are working on a detailed page for RLHF that covers why it matters, how it works, related terms, and the tools that use it.

Related terms

From the glossary
Fine-tuning
LLM

Frequently asked questions

What kind of feedback does RLHF use?

Human raters typically compare pairs of model outputs (or rank several at once) and select the better one. This preference signal trains a reward model, which then guides reinforcement learning so that the base model produces outputs more like the preferred ones.
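
Here is a minimal sketch of the reward-model step. It assumes the simplest possible "model": one learned score per output, rather than a neural network that reads the prompt and response. The loss, -log σ(r_chosen - r_rejected), is the Bradley-Terry pairwise loss commonly used for RLHF reward models. The function names, learning rate, and step count are illustrative choices and are not taken from any specific system.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pair_loss(r_chosen, r_rejected):
    """-log(sigmoid(r_chosen - r_rejected)) for one preference pair.

    Small when the preferred output already scores higher.
    """
    margin = r_chosen - r_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def fit_rewards(pairs, steps=200, lr=0.5):
    """Fit one scalar reward per output from (chosen, rejected) pairs."""
    rewards = {}
    for chosen, rejected in pairs:
        rewards.setdefault(chosen, 0.0)
        rewards.setdefault(rejected, 0.0)
    for _ in range(steps):
        for chosen, rejected in pairs:
            # Gradient of pair_loss w.r.t. the margin is -sigmoid(-margin):
            # push the chosen score up and the rejected score down.
            g = sigmoid(-(rewards[chosen] - rewards[rejected]))
            rewards[chosen] += lr * g
            rewards[rejected] -= lr * g
    return rewards


# Hypothetical preferences: b beat a, b beat c, a beat c.
pairs = [("b", "a"), ("b", "c"), ("a", "c")]
rewards = fit_rewards(pairs)
print(sorted(rewards, key=rewards.get, reverse=True))  # ['b', 'a', 'c']
```

In a real system the scores come from a network, so the reward model can generalize to responses no rater has seen. That learned score is what the reinforcement-learning step then tries to maximize.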

Is RLHF used in all major models?

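Most frontier chat models, including GPT-4, Claude, and Gemini, are aligned using some form of human feedback. The exact recipe varies, and newer techniques such as DPO (Direct Preference Optimization) and RLAIF (Reinforcement Learning from AI Feedback) are increasingly used alongside, or instead of, classic RLHF.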
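
For contrast, DPO skips the separate reward model and the RL loop. It trains the model directly on preference pairs and measures each response's log-probability against a frozen reference copy of the model. Below is a minimal sketch of the per-pair DPO loss. It assumes the summed log-probabilities of each full response have already been computed; `beta=0.1` is an illustrative default, and the function name is hypothetical.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    either the model being trained (policy) or a frozen reference model.
    """
    # How much more (or less) the policy favours each response than the
    # reference model does.
    chosen_shift = policy_logp_chosen - ref_logp_chosen
    rejected_shift = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: loss is log(2), about 0.693.
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# Policy has shifted probability toward the chosen response: loss drops.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

Minimizing this loss raises the chosen response's probability relative to the reference and lowers the rejected one's. In effect, the reward model is folded into the model itself.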

What are the limitations of RLHF?

It is expensive to collect human preferences at scale, disagreements between annotators add noise to the training signal, and models can learn to game the reward model (often called reward hacking) instead of genuinely improving.
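
One common mitigation for reward gaming is to penalize the model for drifting far from its pre-RL reference version, so that exploiting quirks of the reward model stops paying off. OpenAI's InstructGPT work, for example, applied a KL penalty against the supervised model. The sketch below assumes per-response log-probabilities are available. `beta`, the function name, and the example numbers are illustrative, not taken from any specific system.

```python
def shaped_reward(rm_score, policy_logp, ref_logp, beta=0.02):
    """Reward used during RL, penalised for drifting from the reference model.

    rm_score:    reward-model score for the sampled response
    policy_logp: log-probability of that response under the policy
    ref_logp:    log-probability under the frozen reference (pre-RL) model
    """
    # Single-sample estimate of the KL divergence for this response.
    kl_estimate = policy_logp - ref_logp
    return rm_score - beta * kl_estimate


# Hypothetical responses: A scores higher with the reward model but is
# something the reference model would almost never say (a sign of gaming).
a = shaped_reward(rm_score=3.0, policy_logp=-2.0, ref_logp=-60.0)
b = shaped_reward(rm_score=2.5, policy_logp=-10.0, ref_logp=-11.0)
print(a, b)  # roughly 1.84 and 2.48, so B is preferred once drift is penalised
```

The penalty does not remove reward hacking entirely. It only makes large departures from the reference model more costly, and `beta` trades reward-seeking against staying close to that model.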

Explore other terms

AI Agents
A program that takes goals and figures out the steps to reac...
API
The way one piece of software talks to another.
Chain of Thought
A prompting technique where the model reasons out loud, step...
Context Window
How much text a model can read at once.
Embeddings
Numeric fingerprints of text or images that let computers me...
Few-shot Learning
Showing a model two to five examples in the prompt so it fol...