Unlocking Document Intelligence with Open-Source AI
2025-11-22, Breakout Room

Most organizational knowledge is still locked inside complex documents, making it difficult to extract and use that information effectively. Traditional tools often fail on real-world PDFs: tables lose their structure, figures are separated from their captions, and multi-column layouts are flattened into unreadable text. These issues create a significant barrier to applying AI to real document data.

The open-source project Docling takes a new approach to document ingestion, one that mirrors human comprehension. Packaged as a Python library built on open-source deep learning models, it extracts structured information while preserving the original document hierarchy and keeping the output machine-readable.

With support for more than ten of the most common file formats and a consistent API, Docling enables production-ready document processing pipelines. It integrates with established frameworks, including LangChain and LlamaIndex, and offers multilingual support. Its MIT license and local execution model make it suitable for sensitive enterprise applications.


In this session, you'll get an in-depth introduction to the open-source project Docling and how it can streamline your workflow. With over 36,000 stars on GitHub within a year, Docling is the fastest-growing open-source project out of IBM. Through live coding demos, we will walk you through the basics of using Docling and showcase the advanced features that make your real-world data more valuable in AI applications.


What is the anticipated audience for your presentation?

Anyone

See also: Docling GitHub Repo