Expo Demonstration
West Exhibition Hall A

Explore how to build and deploy local LLM-based assistants on the AI PC or at the edge. Our pipeline combines a real-time speech transcription model (Distil-Whisper) with a Llama 3-powered chatbot that uses Retrieval-Augmented Generation (RAG) to personalize user interactions through text generation and summarization over prior interaction history. We discuss how Intel® Core™ Ultra processors enable efficient deployment of LLMs on the CPU, iGPU, and NPU through optimization techniques such as quantization with OpenVINO™ compression libraries. Live demos throughout the session let developers see the work in action.
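The RAG step described above can be illustrated with a minimal sketch: retrieve the prior interactions most relevant to the current query and prepend them to the LLM prompt. This is a toy illustration, not the session's actual implementation; the bag-of-words similarity stands in for a real embedding model, and all function names here are hypothetical.

```python
from collections import Counter
import math
import re

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (a real pipeline would use an encoder model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, history: list[str], k: int = 2) -> list[str]:
    """Return the k prior interactions most similar to the query."""
    q = embed(query)
    return sorted(history, key=lambda h: cosine(q, embed(h)), reverse=True)[:k]

def build_prompt(query: str, history: list[str]) -> str:
    """Prepend retrieved context so the LLM can personalize its answer."""
    context = "\n".join(retrieve(query, history))
    return f"Context from prior interactions:\n{context}\n\nUser: {query}"

history = [
    "User asked about battery life on the AI PC.",
    "User prefers concise answers.",
    "User discussed quantizing Llama 3 with OpenVINO.",
]
print(build_prompt("How do I quantize a model with OpenVINO?", history))
```

In a full deployment, the retrieved context would be passed to the Llama 3 model (e.g., one compressed and quantized via OpenVINO for the CPU, iGPU, or NPU) rather than printed.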