Vision LLM & Video Understanding
Building advanced systems that can process and understand video content using local LLMs with Ollama. These systems extract metadata from transcripts, summaries, and OCR to enable interactive "conversations" with video content, leveraging models with 128k context windows.