Moderation Classifier
A model that detects policy-violating content (e.g., hate speech, self-harm, sexual content) in an AI system's inputs or outputs so that safety policies can be enforced.
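The sketch below shows one common way such a classifier is wired in: it is run on the user's input before generation and on the model's output after it. This is a minimal illustration under stated assumptions, not any particular vendor's API. The category names, the threshold values in `THRESHOLDS`, and the helpers `moderate`, `guarded_reply`, and `toy_classify` are all hypothetical. `toy_classify` is a keyword-matching placeholder that stands in for a real trained classifier returning per-category scores between 0 and 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical per-category score thresholds (assumed values; real systems tune these per policy).
THRESHOLDS: Dict[str, float] = {"hate": 0.5, "self_harm": 0.3, "sexual": 0.6}

# Hypothetical refusal message returned when content is flagged.
REFUSAL = "Sorry, I can't help with that."


@dataclass
class ModerationResult:
    flagged: bool
    categories: List[str]  # categories whose score met or exceeded its threshold


def moderate(scores: Dict[str, float], thresholds: Dict[str, float] = THRESHOLDS) -> ModerationResult:
    """Turn per-category classifier scores (0..1) into a flag decision."""
    hits = sorted(c for c, t in thresholds.items() if scores.get(c, 0.0) >= t)
    return ModerationResult(flagged=bool(hits), categories=hits)


def guarded_reply(
    user_text: str,
    classify: Callable[[str], Dict[str, float]],
    generate: Callable[[str], str],
) -> str:
    """Moderate the input before generation and the output after it."""
    if moderate(classify(user_text)).flagged:
        return REFUSAL
    reply = generate(user_text)
    if moderate(classify(reply)).flagged:
        return REFUSAL
    return reply


# Hypothetical stand-in for a trained classifier (placeholder only: keyword matching, not ML).
def toy_classify(text: str) -> Dict[str, float]:
    lowered = text.lower()
    return {"hate": 0.9 if "hateful" in lowered else 0.0, "self_harm": 0.0, "sexual": 0.0}


if __name__ == "__main__":
    print(guarded_reply("hello", toy_classify, lambda t: "Hi there!"))
    print(guarded_reply("hello", toy_classify, lambda t: "a hateful reply"))
```

Checking both directions matters because a benign prompt can still produce a violating response, and per-category thresholds let a deployment be stricter on some categories (here, self-harm) than others.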
Related terms
Safety (AI Safety)
Guardrails (AI)
Prompt Injection