Moderation Classifier

A model that detects policy‑violating content (e.g., hate speech, self‑harm, sexual content) in user inputs or model outputs, allowing a system to enforce its safety policies by blocking or redirecting flagged traffic.
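
As a rough illustration, the sketch below shows where such a classifier sits in a request path: text is scored per category, and the request or response is withheld when any score crosses a threshold. Everything here is hypothetical — the keyword lookup stands in for a real learned model, and the names (`classify`, `guard`) and the 0.5 threshold are assumptions, not a reference implementation.

```python
from dataclasses import dataclass

# Illustrative category lexicon; a real moderation classifier would
# score text with a fine-tuned model, not a keyword list.
KEYWORDS: dict[str, list[str]] = {
    "hate": ["<hate term>"],
    "self_harm": ["<self-harm term>"],
    "sexual": ["<sexual term>"],
}

POLICY_THRESHOLD = 0.5  # assumed cutoff; tuned per category in practice


@dataclass
class Verdict:
    flagged: bool
    categories: dict[str, float]  # per-category violation score in [0, 1]


def classify(text: str) -> Verdict:
    """Score `text` against each policy category and flag it if any
    category meets the policy threshold."""
    lowered = text.lower()
    scores = {
        cat: float(any(term in lowered for term in terms))
        for cat, terms in KEYWORDS.items()
    }
    return Verdict(
        flagged=any(s >= POLICY_THRESHOLD for s in scores.values()),
        categories=scores,
    )


def guard(user_input: str, generate) -> str:
    """Moderate both the user input and the model's output, refusing
    whenever either side is flagged."""
    if classify(user_input).flagged:
        return "Sorry, this request violates our usage policy."
    output = generate(user_input)
    if classify(output).flagged:
        return "Sorry, the response was withheld by our safety filter."
    return output
```

In deployments the same classifier is often run twice, as above: once on the input (to refuse early and cheaply) and once on the generated output (to catch violations the input check could not predict).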