Alignment (AI Alignment)

Designing AI systems so that their behavior is consistent with human values, safety constraints, and intended goals. In practice, this involves techniques such as policy design, reinforcement learning from human feedback (RLHF), and rigorous evaluations.
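
To make the RLHF mention concrete, here is a minimal sketch, in Python, of the pairwise preference objective commonly used to train a reward model: the loss is low when the human-preferred response scores above the rejected one. The function name and example scores are illustrative, not from any particular library.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style pairwise loss for reward-model training:
    -log sigmoid(r_chosen - r_rejected). Smaller when the model ranks
    the human-preferred response above the rejected one."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Reward model agrees with the human preference: small loss (~0.20)
print(preference_loss(reward_chosen=2.0, reward_rejected=0.5))
# Reward model disagrees: larger loss (~1.70), pushing scores to flip
print(preference_loss(reward_chosen=0.5, reward_rejected=2.0))
```

The trained reward model then scores candidate outputs during policy optimization, steering the model toward responses humans judged preferable.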