Vision-Language Model (VLM)

An AI model that accepts both images and text as input and reasons over them jointly, enabling tasks such as image captioning, visual question answering, and multimodal reasoning. Examples include GPT-4V (GPT-4 with vision), Claude 3, and Gemini Pro Vision.