Multimodal OCR: use @aya-vision for images and @video-infer for videos