Open-vocabulary detectors such as OWL-ViT v2 are trained on web-scale image-text pairs. They generalize well on natural images, yet they struggle when categories are subtle, for example chimney versus ...
import { agent, llmOpenAI, llmAnthropic, mcp } from "volcano-ai";

// Setup: two LLMs, two MCP servers
const planner = llmOpenAI({ model: "gpt-5-mini", apiKey: process ...