[ West Exhibition Hall C ]
Abstract
This tutorial delves into the critical and complex domain of evaluating large language models (LLMs), focusing on the unique challenges presented when assessing generative outputs. Despite the difficulty of assigning precise quality scores to such outputs, our tutorial emphasizes the necessity of rigorous evaluation throughout the development process of LLMs. This tutorial will provide an extensive presentation of evaluation scopes, from task-specific metrics to broader performance indicators such as safety and fairness. Participants will be introduced to a range of methodological approaches, including both computation-based and model-based assessments. The session includes hands-on coding demonstrations, providing the tools and knowledge needed to refine model selection, prompt engineering, and inference configurations. By the end of this tutorial, attendees will gain a comprehensive understanding of LLM evaluation frameworks, contributing to more informed decision-making and ensuring the responsible deployment of these models in real-world applications.
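To make the distinction concrete, the sketch below shows the simplest kind of computation-based assessment, a normalized exact-match metric; the task data and normalization rules are illustrative assumptions rather than the tutorial's own benchmark.

```python
# A minimal sketch of a computation-based evaluation metric (exact match),
# one of the metric families the tutorial covers. The example data and
# normalization rules are illustrative assumptions.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't count as errors."""
    return " ".join(text.lower().split())

def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match their reference after normalization."""
    assert len(predictions) == len(references)
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)

if __name__ == "__main__":
    preds = ["Paris", "4", "the mitochondria"]
    refs = ["paris", "four", "The mitochondria"]
    print(f"exact match: {exact_match(preds, refs):.2f}")  # 0.67
```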
[ West Exhibition Hall C + B3 ]
Abstract
Flow matching is a simple yet effective generative modeling paradigm that has found widespread adoption in diverse domains and large-scale applications. It is inspired by the efficient training of diffusion models, but offers a simpler perspective and enables easy implementation and generalization. At its core, flow matching follows a simple blueprint: regress onto conditional velocities that generate single data examples, and the result is a model that generates the full distribution.
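As an illustration of this blueprint, here is a minimal training-step sketch, assuming the common linear interpolation path between noise and data and a toy two-dimensional MLP; the tutorial's own library may use different conventions.

```python
# A minimal sketch of the flow matching blueprint: sample a point on the
# conditional path between noise and a data example, then regress the model
# onto the conditional velocity that generates that example. The linear path
# and toy MLP are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2 + 1, 64), nn.SiLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def flow_matching_step(x1: torch.Tensor) -> torch.Tensor:
    """One training step on a batch of data examples x1 (shape [B, 2])."""
    x0 = torch.randn_like(x1)                 # noise sample
    t = torch.rand(x1.shape[0], 1)            # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                # point on the conditional path
    target = x1 - x0                          # conditional velocity for this path
    pred = model(torch.cat([xt, t], dim=-1))  # predicted velocity field v(x_t, t)
    loss = ((pred - target) ** 2).mean()      # regress onto conditional velocities
    opt.zero_grad(); loss.backward(); opt.step()
    return loss

x1 = torch.randn(128, 2) + torch.tensor([4.0, 0.0])  # toy data distribution
for _ in range(100):
    loss = flow_matching_step(x1)
```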
Our objective in this tutorial is to provide a comprehensive yet self-contained introduction to flow matching, beginning with the continuous Euclidean setting. Afterwards, we will explore extensions and generalizations, including adaptations to non-Euclidean geometries, as well as generalizations to discrete domains and even arbitrary Markov processes. Lastly, we will discuss post-training and fine-tuning methodologies for improved inference and conditioning. The tutorial will survey applications of flow matching ranging from image and video generation to molecule generation and language modeling, and will be accompanied by coding examples and a release of an open source flow matching library. We hope this tutorial will serve as a soft entry point for researchers, as well as provide all attendees with both a theoretical and practical understanding of flow matching with an outlook for …
[ West Ballroom B ]
Abstract
Language models (LMs) have become a critical technology for tackling a wide range of natural language processing tasks, making them ubiquitous in both AI research and commercial products. As their commercial importance has surged, the most powerful models have become more secretive, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. In this tutorial, we provide a detailed walkthrough of the language model development pipeline, including pretraining data, model architecture and training, and adaptation (e.g., instruction tuning, RLHF). For each of these development stages, we provide examples using open software and data, and discuss tips, tricks, pitfalls, and otherwise often inaccessible details about the full language model pipeline that we’ve uncovered in our own efforts to develop open models. We have opted not to have the optional panel given the extensive technical details and examples we need to include to cover this topic exhaustively.
[ West 109 + 110 ]
Abstract
In this tutorial, we will explore the intersection of causality and large language models (LLMs). Our goal is to provide a comprehensive understanding of how causal inference can enhance the performance, interpretability, and robustness of LLMs. The tutorial will cover foundational concepts in both fields, discuss emerging trends, and present three paradigms of causality for LLM research along with corresponding practical applications. We also include a panel of experts with diverse backgrounds, including Yoshua Bengio, to engage the NeurIPS community with a comprehensive overview and diverse perspectives.
[ West Ballroom A ]
Abstract
Generative AI has significantly advanced, particularly in natural language processing, exemplified by models like ChatGPT, but these advancements have raised concerns about misuse, such as generating fake news or plagiarizing content. This tutorial explores text watermarking as a solution, embedding detectable patterns within AI-generated text to verify its origin. We will cover the evolution of text watermarking, its modern techniques, and challenges, along with model watermarking for copyright protection. Participants will gain a solid understanding of watermarking methods, their practical applications, and future research directions in this critical field.
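To preview one modern technique in this space, the sketch below implements detection for a green-list style watermark in the spirit of Kirchenbauer et al. (2023), where each token's predecessor pseudorandomly partitions the vocabulary and generation is biased toward the "green" half; the hash function and toy text here are simplified assumptions.

```python
# A minimal sketch of green-list watermark detection: count how many tokens
# fall on the pseudorandom green list seeded by their predecessor, and test
# against the null hypothesis of unwatermarked text. The hashing scheme is
# an illustrative assumption, not any library's exact implementation.
import hashlib
import math

GAMMA = 0.5  # fraction of vocabulary placed on the green list

def is_green(prev_token: str, token: str) -> bool:
    """Pseudorandomly assign `token` to the green list, seeded by `prev_token`."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] / 255.0 < GAMMA

def detection_z_score(tokens: list[str]) -> float:
    """z-score of the green-token count vs. the GAMMA baseline of random text."""
    n = len(tokens) - 1
    greens = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

# A large z-score (e.g., > 4) indicates the text was likely watermarked.
print(detection_z_score("the cat sat on the mat and looked around".split()))
```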
[ West Ballroom C ]
Abstract
Recent advancements in machine learning have caused a shift from traditional sparse modeling, which focuses on static feature selection in neural representations, to dynamic sparsity, where different neural pathways are activated depending on the input. This line of work is fueling, among other directions, new architectures for foundation models, such as sparse Mixtures of Experts. In this tutorial, we explore how dynamic sparsity provides several advantages, especially: i) incorporating structural constraints in model representations and predictions; ii) performing conditional computation, adaptively adjusting the model size based on the input complexity; iii) attaining the performance of dense models while accelerating training and inference. This tutorial connects these lines of work through a unified perspective, including pedagogical materials with concrete examples spanning a wide array of applications (Natural Language Processing, Computer Vision, and Reinforcement Learning) to familiarize general research audiences with this new, emerging paradigm and to foster future research. The tutorial information is available at https://dynamic-sparsity.github.io/
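As a concrete instance of conditional computation, the following is a minimal sketch of a sparse Mixture-of-Experts layer with top-k routing; the layer sizes and softmax-over-top-k weighting are illustrative assumptions rather than any particular production architecture.

```python
# A minimal sketch of dynamic sparsity via a sparse Mixture of Experts: a
# router scores experts per input and only the top-k experts run, so compute
# adapts to the input. Sizes and weighting are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim: int = 16, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.router(x)                   # [B, n_experts] routing scores
        topv, topi = scores.topk(self.k, dim=-1)  # best k experts per input
        weights = F.softmax(topv, dim=-1)         # mix weights over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e         # inputs routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = SparseMoE()
print(moe(torch.randn(5, 16)).shape)  # torch.Size([5, 16])
```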
[ West Exhibition Hall A ]
Abstract
Simulation comparisons are often used in machine learning to argue for the superiority of one model or method over another. However, the conclusions that can be drawn from such studies are only as robust as the forethought that is put into their design and analysis. We discuss core techniques used in the experimental sciences (e.g., medicine and psychology) that are too often sidestepped by AI researchers. Topics include: classical statistical inference, hypothesis testing, one-way and multi-factor ANOVA, within- and between-subjects designs, planned vs. post-hoc contrasts, visualization of outcomes and uncertainty, and modern standards of experimental practice. We then focus on two topics of particular interest to AI researchers: (1) human evaluations of foundation models (LLMs, MLMs), e.g., in domains like intelligent tutoring; and (2) psycholinguistic explorations of foundation models, in which models are used as subjects of behavioral studies in order to reverse engineer their operation, just as psychologists and psycholinguists have done with human participants over the past century.
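As a small taste of the statistical machinery covered, the sketch below runs a one-way ANOVA over per-seed scores of three methods, followed by a pairwise contrast; the synthetic scores are illustrative assumptions standing in for real benchmark results.

```python
# A minimal sketch of the classical analysis the tutorial advocates: a
# one-way ANOVA comparing three methods across seeded runs, rather than
# reporting a single best score. The scores are synthetic stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
method_a = rng.normal(0.72, 0.02, size=10)  # 10 seeded runs per method
method_b = rng.normal(0.74, 0.02, size=10)
method_c = rng.normal(0.73, 0.02, size=10)

f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# If the omnibus test is significant, planned or post-hoc contrasts
# (pairwise tests with a multiple-comparison correction) identify which
# methods actually differ.
t, p = stats.ttest_ind(method_a, method_b)
print(f"A vs. B: t = {t:.2f}, uncorrected p = {p:.4f}")
```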
[ West Ballroom B ]
Abstract
In recent years, large language models (LLMs) have achieved unprecedented success across various disciplines, including natural language processing, computer vision, and reinforcement learning. This success has spurred a flourishing body of research aimed at understanding these models, both from theoretical perspectives, such as representation and optimization, and through scientific approaches, such as interpretability.
To understand LLMs, an important research theme in the machine learning community is to model the input as mathematically structured data (e.g., Markov chains), where we have complete knowledge and control of the data properties. The goal is to use this controlled input to gain valuable insights into what solutions LLMs learn and how they learn them (e.g., the induction head). This understanding is crucial, given the increasing ubiquity of the models, especially in safety-critical applications, and our limited understanding of them.
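A minimal sketch of this methodology, assuming a first-order Markov chain with a known transition matrix as the controlled data source:

```python
# Generate token sequences from a first-order Markov chain whose transition
# matrix we fully control, so any in-context statistic a model learns (e.g.,
# bigram frequencies driving induction-head behavior) can be checked against
# ground truth. Vocabulary size and the matrix are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
V = 4                                   # vocabulary size
P = rng.dirichlet(np.ones(V), size=V)   # known transitions P[i, j] = Pr(j | i)

def sample_sequence(length: int) -> np.ndarray:
    """Sample a token sequence from the controlled Markov chain."""
    seq = [rng.integers(V)]
    for _ in range(length - 1):
        seq.append(rng.choice(V, p=P[seq[-1]]))
    return np.array(seq)

seq = sample_sequence(1000)
# Empirical bigram statistics match P by construction, giving an exact
# reference for what an in-context learner should recover.
counts = np.zeros((V, V))
for a, b in zip(seq, seq[1:]):
    counts[a, b] += 1
print(np.round(counts / counts.sum(axis=1, keepdims=True), 2))
```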
While the aforementioned works using this structured approach provide valuable insights into the inner workings of LLMs, the breadth and diversity of the field make it increasingly challenging for both experts and non-experts to stay abreast. To address this, our tutorial aims to provide a unifying perspective on recent advances in the analysis of LLMs, from a representational-cum-learning viewpoint. To this end, we focus on the two …
[ West Exhibition Hall C + B3 ]
Abstract
Aligning the behavior of AI systems and agents with human goals and values continues to be a major challenge. But the problem is not novel: many disciplines, such as economics, political science, legal theory, and cultural evolutionary theory, have grappled for decades if not centuries with the question of how to align the behaviors of individuals with the well-being of other individuals and entire societies. Markets, legal institutions and rules, and political processes are mechanisms on which human societies rely to achieve goals such as well-being, fair treatment, and economic innovation and growth. In this tutorial, we will provide an introduction to these mechanisms: how they work and how they can inform a more robust approach to AI alignment. For example, a key misunderstanding in the current alignment literature is the idea that AI alignment can be achieved by fine-tuning AI agents and systems with a pre-defined set of human preferences; this is the principle underlying reinforcement learning from human feedback (RLHF) for large language models. But regulated market systems take a different approach to alignment: they encourage self-interested firms and individuals to take actions that generate wealth and do not impose excessive costs (externalities) on others and use a …
[ West 109 + 110 ]
Abstract
In the world of large model development, model details and training data are increasingly kept closed, pushing privacy to the forefront of machine learning: how do we protect the privacy of the data used to train a model while permitting more widespread data-sharing collaborations? How will individuals trust these technologies with their data? How do we verify that the integration of individuals' data is both useful to the rest of the participating federation and, more importantly, safe for the data owner? How do regulations integrate into this complex infrastructure?
These open questions require weighing a multitude of considerations among the incentives of model developers, the data-owning parties, and the overseeing agencies. Many cryptographic solutions target these incentive problems, but do they cover all essential components of trustworthy data sharing? Are they practical, or likely to become practical soon?
In this tutorial, we attempt to answer questions regarding specific capabilities of privacy technologies in three parts: (1) overarching incentive issues with respect to data and evaluations; (2) where cryptographic and optimisation solutions can help, delving deep into secure computation and machine unlearning for evaluations; and (3) cultural, societal, and research agendas relating to practically implementing these technologies.
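To ground part (2), here is a minimal sketch of one secure-computation building block, additive secret sharing over a finite ring; the modulus and three-party setup are illustrative assumptions, and real protocols add considerably more machinery.

```python
# Additive secret sharing: each party holds a random-looking share, and the
# secret is only recoverable when all shares are combined. Parties can add
# shares locally, so sums are computed without revealing any single input.
import secrets

MOD = 2**64

def share(value: int, n_parties: int = 3) -> list[int]:
    """Split `value` into n additive shares; any n-1 shares reveal nothing."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % MOD

# Two data owners contribute private evaluation scores; parties locally add
# their shares, so the aggregate is computed without revealing either input.
a_shares, b_shares = share(42), share(58)
sum_shares = [(a + b) % MOD for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # 100
```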
Our website …
[ West Ballroom C ]
Abstract
Data selection is a critical step in training and fine-tuning foundation models, significantly impacting model performance and training efficiency. Existing approaches deployed in foundation models' data curation pipelines have primarily relied on heuristic methods, which, while practical, often lack a theoretical basis and can lead to suboptimal performance. This tutorial aims to bridge the gap between heuristic practices and emerging principled methods that offer systematic, theoretically grounded approaches to data selection.
We will begin by discussing the algorithmic foundations for data selection. This includes attribution-based approaches, diversity-based approaches, and methods that directly optimize for final model performance. These techniques will be introduced as instantiations of the unified framework of utility function maximization. Next, we will review the data selection techniques currently deployed in the foundation model training pipeline, such as rule-based data filtering, examining their strengths and limitations. Finally, we will introduce recent advances in developing principled data selection methods for foundation models, including both data point-level and source-level data selection. By the end of this tutorial, attendees will gain a deeper understanding of the theoretical underpinnings of data selection, practical knowledge of current data selection heuristics for foundation models, and insights into the research frontier in principled data selection …
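As a preview of the utility-maximization framing, the sketch below greedily selects data under a toy facility-location (coverage) utility; the utility and feature vectors are illustrative assumptions, and attribution-based methods would swap in, e.g., influence scores.

```python
# A minimal sketch of data selection as utility function maximization: a
# coverage-style (facility-location) utility optimized greedily over a pool
# of candidate examples. The features and utility are toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)
pool = rng.normal(size=(200, 8))  # candidate examples as feature vectors

def utility(selected: list[int]) -> float:
    """Facility-location utility: how well the selection covers the pool."""
    if not selected:
        return 0.0
    sims = pool @ pool[selected].T                 # similarity to selected points
    return np.maximum(sims, 0).max(axis=1).sum()   # each point's best coverage

def greedy_select(budget: int) -> list[int]:
    selected: list[int] = []
    for _ in range(budget):
        candidates = [i for i in range(len(pool)) if i not in selected]
        gains = [utility(selected + [i]) for i in candidates]
        selected.append(candidates[int(np.argmax(gains))])
    return selected

print(greedy_select(5))
```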
[ West Exhibition Hall C ]
Abstract
One of the most striking findings in modern research on large language models (LLMs) is that, given a model and dataset of sufficient scale, scaling up compute at training time leads to better final results. However, there is also another lesser-mentioned scaling phenomenon, where adopting more sophisticated methods and/or scaling compute at inference time can result in significantly better output from LLMs. We will present a tutorial on past and present classes of algorithms for generating text from autoregressive LLMs, ranging from greedy decoding to sophisticated meta-generation algorithms used to power compound AI systems. We place a special emphasis on techniques for making these algorithms efficient, both in terms of token costs and generation speed. Our tutorial unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems. In doing so, we aim to make attendees aware of (meta-)generation algorithms as a promising direction for improving quality, increasing diversity, and enabling resource-constrained research on LLMs.
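To illustrate the two ends of this spectrum, the sketch below contrasts greedy decoding with best-of-n sampling, a simple meta-generation algorithm; the toy bigram "model" and log-probability scorer are illustrative assumptions standing in for an autoregressive LLM and a reward model.

```python
# Greedy decoding vs. best-of-n sampling: best-of-n spends extra inference
# compute (n sampled candidates, one scorer pass each) to pick a better
# output. The bigram logits and scorer are toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)
V = 50
logits = rng.normal(size=(V, V))  # toy next-token logits given previous token

def decode(start: int, steps: int, greedy: bool) -> list[int]:
    seq = [start]
    for _ in range(steps):
        p = np.exp(logits[seq[-1]]); p /= p.sum()
        seq.append(int(p.argmax()) if greedy else int(rng.choice(V, p=p)))
    return seq

def score(seq: list[int]) -> float:
    """Stand-in for a reward model: total log-probability of the sequence."""
    p = np.exp(logits); p /= p.sum(axis=1, keepdims=True)
    return float(sum(np.log(p[a, b]) for a, b in zip(seq, seq[1:])))

greedy_seq = decode(0, 10, greedy=True)
# Best-of-n: sample n candidates and keep the highest-scoring one.
candidates = [decode(0, 10, greedy=False) for _ in range(16)]
best = max(candidates, key=score)
print(score(greedy_seq), score(best))
```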
[ West Exhibition Hall A ]
Abstract
In this tutorial, we will present recent advances in program synthesis that enable the generation of programmatic policies for reinforcement learning and production software programs that satisfy user intent. The tutorial consists of two parts. In the first part, we consider the reinforcement learning (RL) setting, where the goal is to learn a policy that observes environments and acts optimally. Instead of representing policies using deep neural networks, programmatic RL (PRL) methods aim to synthesize program policies structured in a human-readable domain-specific language. PRL reformulates RL as learning to write a program that can be executed in an environment to maximize the return, potentially yielding improved interpretability and generalizability. We will cover different families of algorithms that rely on search and learning-based methods, including those using large language models to help with the search for programmatic policies. In the second part of the tutorial, we consider code generation problems, where users provide their intent as input to a program synthesizer, which generates a program attempting to satisfy that intent. With the advancement of deep learning, neural networks and large language models (LLMs), with their impressive capabilities of understanding and reasoning over natural language and code, have …
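A minimal sketch of the programmatic-policy idea, assuming a toy one-dimensional environment and a two-rule DSL searched by enumeration:

```python
# Instead of a neural network, the policy is a short, human-readable program
# in a tiny DSL, and search evaluates candidate programs by their return in
# the environment. The environment and DSL are illustrative assumptions.
import itertools

def rollout(policy, steps: int = 20) -> float:
    """Return of a policy that must keep position x near the origin."""
    x, total = 5.0, 0.0
    for _ in range(steps):
        x += policy(x)    # action chosen by the program
        total += -abs(x)  # reward: stay close to 0
    return total

# DSL: "if x > THETA: move A else move B", enumerated over a small grid.
def make_program(theta: float, a: float, b: float):
    return lambda x: a if x > theta else b

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
best = max(
    (make_program(t, a, b) for t, a, b in itertools.product(grid, repeat=3)),
    key=rollout,
)
print(rollout(best))
```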
[ West Ballroom A ]
Abstract
Machine learning models often face challenges due to distribution shifts, leading to compromised performance during testing and limiting their use in high-stakes applications. For example, vision models have mistakenly relied on the height of shoulders in images to classify radiographs of COVID-19 patients, influenced by specific scanning techniques used during the pandemic's onset. Similarly, language models exhibit susceptibility to misleading syntactic patterns in natural language inference tasks like determining entailment, a susceptibility that persists as models grow in size. Addressing these issues requires characterizing relevant distribution shifts and establishing desired model behaviors under them.
This tutorial aims to provide a holistic perspective on distribution shifts due to spurious correlations and shortcut learning, as exemplified by the aforementioned instances. We situate existing research within a unified formal framework, discuss challenges in practical application of methods, and delineate the evolving landscape of research on spurious correlations in the era of foundation models. This tutorial serves as a compact and self-contained resource for students and researchers learning the topic, as well as practitioners seeking to deepen their understanding of the issues and of the tools to tackle them.
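As one concrete tool from this literature, the sketch below computes worst-group accuracy, where groups cross the label with a spurious attribute so that a shortcut model is exposed by its weakest group; the synthetic predictions and attribute are illustrative assumptions.

```python
# Worst-group accuracy: group examples by (label, spurious attribute) and
# report the weakest group. A model leaning on the shortcut scores well on
# average but fails on minority groups. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
label = rng.integers(2, size=n)
spurious = np.where(rng.random(n) < 0.9, label, 1 - label)  # correlated attribute
pred = spurious.copy()  # a shortcut model predicting from the spurious feature

groups = {(y, s): (label == y) & (spurious == s) for y in (0, 1) for s in (0, 1)}
accs = {g: (pred[m] == label[m]).mean() for g, m in groups.items() if m.any()}
print("average accuracy:", (pred == label).mean())  # looks high (~0.9)
print("worst-group accuracy:", min(accs.values()))  # exposes the shortcut (0.0)
```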
We will provide an overview of research trends, discuss available benchmarks, and propose best practices for future endeavors. …