IBM

Expo Demonstration

West Exhibition Hall A

Enterprise applications pose unique challenges for vision and language foundation models: they often involve visual data that diverges significantly from the typical distribution of web images, and they require understanding fine details such as small text in scanned documents or tiny defects in images of industrial equipment. Motivated by these challenges, we will showcase IBM Granite Vision, a foundation model with state-of-the-art performance on document image understanding tasks such as the analysis of charts, plots, infographics, tables, flow diagrams, and more. We will give a detailed overview of our methodology and present a live demonstration of the model's capabilities, illustrating its key features and applications. The model will be open-sourced so that the community can access it and contribute to its development.
