The Quantum Drift

Meet Leopard: The AI That Excels in Multi-Image, Text-Rich Tasks

  • S2E54
  • 09:38
  • November 6th 2024

In this episode, Robert and Haley explore the latest breakthrough in AI multimodal models: Leopard, a new AI developed to tackle complex, text-rich image tasks. Designed by researchers from the University of Notre Dame, Tencent AI Seattle Lab, and UIUC, Leopard is the first model to truly excel at understanding and reasoning across multiple text-heavy images, like presentation slides, web snapshots, and scanned documents.

Join us as we break down how Leopard’s adaptive high-resolution multi-image encoding and innovative pixel shuffling set it apart from traditional models. Unlike its predecessors, Leopard can keep high-resolution details without sacrificing accuracy, meaning it’s primed for real-world uses like analyzing multi-page reports, data charts, and visual presentations. We discuss:

  • Leopard’s Unique Dataset: A tailored instruction-tuning dataset of over a million data points.
  • Dynamic Encoding: How Leopard keeps crucial details while managing multiple images at once.
  • Performance Gains: Over 9% improvement on benchmarks like SlideVQA and Multi-page DocVQA.
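For listeners curious about the pixel shuffling idea, here is a minimal sketch of the general technique, not Leopard's actual code. It assumes the image has already been turned into a grid of feature vectors ("tokens") and folds each 2x2 block of neighbouring tokens into one longer token, so the model sees a quarter as many tokens while no values are thrown away. The function name `pack_tokens` and the `ratio` parameter are hypothetical, chosen only for this illustration.

```python
def pack_tokens(grid, ratio=2):
    """Fold each ratio x ratio block of neighbouring tokens into one token.

    grid is a list of rows; each row is a list of tokens; each token is a
    list of numbers. The result has fewer, longer tokens.
    """
    height, width = len(grid), len(grid[0])
    if height % ratio or width % ratio:
        raise ValueError("grid height and width must be divisible by ratio")

    packed = []
    for i in range(0, height, ratio):
        packed_row = []
        for j in range(0, width, ratio):
            merged = []
            # Concatenate the tokens of one block, row by row.
            for a in range(ratio):
                for b in range(ratio):
                    merged.extend(grid[i + a][j + b])
            packed_row.append(merged)
        packed.append(packed_row)
    return packed


# A toy 4x4 grid of tokens, each token holding 3 numbers.
grid = [[[r, c, r * 4 + c] for c in range(4)] for r in range(4)]
packed = pack_tokens(grid, ratio=2)

tokens_before = len(grid) * len(grid[0])
tokens_after = len(packed) * len(packed[0])
print(tokens_before, "tokens of length", len(grid[0][0]))
print(tokens_after, "tokens of length", len(packed[0][0]))
```

The trade-off is fewer tokens in exchange for longer ones: the total amount of information is unchanged, but the sequence the language model has to read is shorter, which is what leaves room for several high-resolution images at once.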

Get ready to dive into how this model reshapes the landscape for AI in business, education, and research. Leopard just might be the game-changer multimodal AI has been waiting for!


The Quantum Drift

Join hosts Robert Loft and Haley Hanson on The Quantum Drift as they navigate the ever-evolving world of artificial intelligence. From breakthrough innovations to the latest AI applications shaping industries, this podcast brings you timely updates, expert insights, and thoughtful analysis on all things AI. Whether it's ethical debates, emerging tech trends, or the impact on society, The Quantum Drift keeps you informed on the news driving the future of intelligence.