[ Hall B2 (level 1) ]
Abstract
Tasks enabled by data contribution estimation (DCE) aid model improvement through data improvement. While benchmark DCE evaluation tasks demonstrate applications across many ML domains, DCE has limited visibility in other research domains that stand to benefit from its use cases. We propose a tutorial on data contribution estimation for machine learning to address this. This tutorial will provide an overview of DCE for machine learning and natural language processing. Following this tutorial, attendees will have gained an understanding of 1) broadly, what questions data contribution estimation aims to answer; 2) the theory and methods widely used within the DCE community that can be applied to a broad range of domains; and 3) DCE from the perspectives of large language models and privacy.
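As an illustrative aside (ours, not part of the tutorial materials): the simplest DCE question, "how much does one training point contribute?", can be answered by leave-one-out retraining. A minimal sketch assuming scikit-learn and a held-out validation set; the helper names are hypothetical.

```python
# Leave-one-out (LOO) data contribution sketch: a point's contribution is
# the drop in validation accuracy when it is removed from the training set.
import numpy as np
from sklearn.linear_model import LogisticRegression

def validation_accuracy(X_train, y_train, X_val, y_val):
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return model.score(X_val, y_val)

def loo_contributions(X_train, y_train, X_val, y_val):
    base = validation_accuracy(X_train, y_train, X_val, y_val)
    scores = np.empty(len(X_train))
    for i in range(len(X_train)):
        mask = np.arange(len(X_train)) != i
        scores[i] = base - validation_accuracy(X_train[mask], y_train[mask],
                                               X_val, y_val)
    return scores  # positive = point helps; negative = point hurts
```

Shapley-value-based methods, a staple of the DCE literature, generalize this idea by averaging a point's marginal contribution over many training subsets rather than only the full set.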
[ Hall E (level 1) ]
Abstract
The tutorial aims to familiarize the ML community with major existing AI governance frameworks, ongoing AI policy proposals worldwide, and the concrete tools the research community has developed to adhere to standards and regulations applicable to ML systems in socially high-stakes domains. As a concrete governance challenge, we will focus on issues of bias and unfairness and survey pipeline-centric approaches to operationalizing algorithmic harm prevention. As we will discuss, this approach is particularly relevant to challenges around leveraging the disparate impact doctrine for algorithmic harm prevention and recent FTC advance notices of proposed rulemaking (ANPRMs). The concluding expert panel is an opportunity for the ML community to hear diverse perspectives on the key AI governance challenges of the near future and how the ML research community can prepare for and support efforts to address those challenges.
[ Hall B1 (level 1) ]
Abstract
As more and more AI systems are deployed in the real world, it becomes imperative to study these systems with real humans to avoid unexpected negative consequences during deployment. Yet this can be challenging for researchers with more experience designing algorithms and less experience running human-participant experiments or deploying systems in the real world. In this tutorial, we will discuss the state of the human-AI collaboration field, emphasizing (i) incorporating humans into AI systems, including multi-agent, machine learning, and reinforcement learning systems, (ii) investigating when to rely on human vs. AI strengths, and (iii) designing human-AI studies to evaluate algorithms with real humans.
[ Hall D2 (level 1) ]
Abstract
Large language models (LMs) have achieved remarkable success in many language tasks.
Recent work has also shown that knowledge of the world can emerge from large LMs, enabling large LMs to assist decision-making for embodied tasks. However, the world knowledge exhibited by current large LMs is often not robust and cannot be grounded in physical environments without additional models. This hinders large LMs' ability to perform complex reasoning and planning tasks reliably. For example, in creating action plans to move blocks to a target state, GPT-3 achieves a success rate of only 1%, compared to 78% for humans.
On the other hand, humans perform deliberate reasoning and planning based on a mental model of the world (i.e., a world model, WM) that enables us to simulate actions and their effects on the world's state. WMs encoding knowledge of the physical world can drastically improve the data efficiency and robustness of intelligent agents. However, WMs have typically been studied in reinforcement learning and robotics, which are conceptually distinct from the problems studied in language modeling.
This gap indicates enormous new opportunities for connecting WMs and LMs to enhance LMs' capabilities for reasoning and planning in both embodied and general settings, and to address the aforementioned …
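To make the WM idea concrete, here is a toy, hedged sketch of deliberate planning with a world model: simulate candidate action sequences with the model and keep the one whose predicted outcome is closest to the goal. `world_model` is a stand-in for any learned dynamics model; none of this is code from the tutorial.

```python
# Random-shooting planning with a world model: the WM simulates each
# candidate action sequence, and the cheapest predicted outcome wins.
import numpy as np

def plan(world_model, state, goal, horizon=5, n_candidates=256, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    best_cost, best_plan = np.inf, None
    for _ in range(n_candidates):
        actions = rng.integers(0, 4, size=horizon)  # e.g., 4 discrete actions
        s = state
        for a in actions:
            s = world_model(s, a)                   # WM predicts the next state
        cost = np.linalg.norm(np.asarray(s) - np.asarray(goal))
        if cost < best_cost:
            best_cost, best_plan = cost, actions
    return best_plan
```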
[ Hall C2 (level 1 gate 9 south of food court) ]
Abstract
Large matrices arise in many ML applications, including as representations of datasets, graphs, model weights, first- and second-order derivatives, etc. Randomized Numerical Linear Algebra (RandNLA) is an area that uses randomness to develop improved algorithms for ubiquitous matrix problems. The area has reached a certain level of maturity, and current efforts to incorporate RandNLA algorithms into core numerical libraries, as well as recent advances in ML, Statistics, and Random Matrix Theory, have led to new theoretical and practical challenges. This tutorial will provide a self-contained overview of RandNLA in light of these important developments.
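As a taste of the RandNLA pattern in question, here is a minimal sketch-and-solve example for overdetermined least squares (our illustration, using a dense Gaussian sketch for simplicity):

```python
# Sketch-and-solve: compress a tall least-squares problem with a random
# sketching matrix S, then solve the much smaller sketched problem.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 20_000, 50, 400                        # tall problem; d < k << n
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

S = rng.standard_normal((k, n)) / np.sqrt(k)     # Gaussian sketch
x_sketch = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]  # small solve
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x_sketch - x_exact))        # small, with high probability
```

In practice, structured sketches (e.g., subsampled randomized Hadamard transforms or sparse sketches) replace the dense Gaussian matrix so that applying S costs far less than solving the original problem.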
[ La Nouvelle Orleans Ballroom A-C (level 2) ]
Abstract
The success of the Transformer model has pushed the limits of deep learning to operate on the scale of trillions of parameters. This proliferation of large model size has outpaced the advances in hardware, resulting in an urgent need to distribute enormous models across multiple GPUs. Despite this trend, best practices for choosing an optimal strategy are still lacking due to the breadth of knowledge required across both deep learning and parallel computing.
This drives researchers to ask deep questions: How can we improve the training and inference efficiency of large models to reduce costs? Can we accommodate larger models with limited resources? What efforts can we make to enable more AI community members to access big models easily? In this tutorial, we investigate efforts to solve the above problems. A diverse set of parallelism strategies is an important tool for improving the efficiency of large model training and inference. Heterogeneous memory management can enhance the model accommodation capacity of processors (e.g., GPUs). Further, deep learning systems for large AI models will significantly reduce the specialized background knowledge required from users, allowing AI users to quickly get started with larger models. We believe that with the benefits of these effective and extensive technologies …
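As one hedged illustration of these parallelism strategies, the toy NumPy sketch below mimics tensor (intra-layer) parallelism: a linear layer's weight matrix is split column-wise across "devices", each computes its shard of the output, and the shards are gathered. Purely illustrative; real systems run the shards on separate GPUs with collective communication.

```python
# Column-wise tensor parallelism of a linear layer, simulated on one host.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 1024))           # batch of activations
W = rng.standard_normal((1024, 4096))        # full weight, too big for one "device"

n_devices = 4
W_shards = np.split(W, n_devices, axis=1)    # each device holds a 1024 x 1024 shard
y_shards = [x @ W_k for W_k in W_shards]     # local matmuls, run in parallel
y = np.concatenate(y_shards, axis=1)         # all-gather of the output shards

assert np.allclose(y, x @ W)                 # matches the unsharded layer
```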
[ Hall E2 (level 1) ]
Abstract
Diffusion models have emerged as a powerful class of generative models and demonstrated astonishing results, in particular in image synthesis. However, training high-resolution diffusion models in pixel space can be highly expensive. Overcoming these limitations, Latent Diffusion Models (LDMs) first map high-resolution data into a compressed, typically lower-dimensional latent space using an autoencoder, and then train a diffusion model in that latent space more efficiently. Thereby, LDMs enable high-quality image synthesis while avoiding excessive compute demands. Furthermore, the LDM paradigm with an autoencoder, which can be tailored to specific problems and data, and a separate diffusion model in latent space offers significant flexibility with respect to architecture and model design. This has allowed LDMs to be successfully extended to various tasks beyond image generation, such as video synthesis, 3D object and scene generation, language modeling, and more. Most prominently, the well-known text-to-image model Stable Diffusion leverages the LDM framework. LDMs have become very popular and widely used in the generative modeling literature.
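To make the paradigm concrete, here is a minimal, hedged sketch of one LDM training step in PyTorch. `encoder`, `denoiser`, and `alphas_cumprod` are stand-ins for a pretrained autoencoder, the latent-space diffusion model, and a noise schedule; this is our illustration, not the Stable Diffusion implementation.

```python
# One illustrative LDM training step: encode to latents, diffuse, predict noise.
import torch

def ldm_training_step(x, encoder, denoiser, alphas_cumprod, optimizer):
    with torch.no_grad():
        z = encoder(x)                              # 1) compress images into latent space
    t = torch.randint(0, len(alphas_cumprod), (z.shape[0],), device=z.device)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)         # assumes z has shape (B, C, H, W)
    eps = torch.randn_like(z)
    z_t = a.sqrt() * z + (1 - a).sqrt() * eps       # 2) diffuse the latents
    loss = ((denoiser(z_t, t) - eps) ** 2).mean()   # 3) predict the added noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```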
In this tutorial, we aim to provide an introduction to LDMs. While the literature on diffusion models has become broad, the LDM paradigm stands out as a particularly powerful approach due to its flexibility and excellent trade-off with …
[ Hall E (level 1) ]
Abstract
Peer review is fundamental to scientific research, impacting scientific progress, grant funding allocation, researcher well-being, career paths, and the public's view of science. This tutorial provides a scientific lens on the systemic issues in peer review. It aims to stimulate discussions and inform policy-making based on scientific evidence (rather than individual opinions or anecdotes), addressing a topic that directly affects us all. To this end, the tutorial will delve into various inherent challenges, drawing on experiments on the peer-review process in diverse scientific disciplines. It will also discuss viable solutions and important open problems. The tutorial material will be available at https://cs.cmu.edu/~nihars/tutorials/NeurIPS2023. Finally, the presenter is excited about two things—peer review and minions—and both of these will be reflected generously in the tutorial.
[ Hall B2 (level 1) ]
Abstract
Machine learning, especially large language models (LLMs), has shown promise in proving formal theorems using proof assistants such as Coq, Isabelle, and Lean. Theorem proving is an important challenge for machine learning: Formal proofs are computer programs whose correctness can be verified. Therefore, theorem proving is a form of code generation with rigorous evaluation and no room for the model to hallucinate, opening up a new avenue for addressing LLMs’ flaws in factuality. Despite its potential, learning-based theorem proving has significant entry barriers, primarily due to the steep learning curve for proof assistants. This tutorial aims to bridge this gap and make theorem proving accessible to researchers with a general machine learning background. To that end, our presentation will contextualize theorem proving from a machine learning perspective and demonstrate how to develop LLMs for theorem proving, using newly available open-source tools that provide interfaces to proof assistants without requiring in-depth knowledge of their internals. Furthermore, we will cover advanced topics and open problems in learning-based theorem proving, including its synergies with natural language processing and software verification. Throughout the presentation, we will highlight several conceptual themes recurring in theorem proving that are also critical for machine learning, such as mathematical …
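To ground the claim that formal proofs are machine-checkable programs, here is a toy Lean 4 example (ours, not from the tutorial materials); an LLM-based prover would be asked to generate exactly this kind of proof term or tactic script:

```lean
-- A formal proof is a program the proof assistant checks; an unjustified
-- step is simply rejected, leaving no room for hallucination.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b          -- a term-level proof reusing a library lemma

example (a b : Nat) : a + b = b + a := by
  rw [Nat.add_comm]         -- the same fact via a tactic script
```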
[ Hall B1 (level 1) ]
Abstract
Data heterogeneity is a key determinant of the performance of ML systems. Standard algorithms that optimize for average-case performance do not consider the presence of diversity within data. As a result, variations in data sources, data generation mechanisms, and sub-populations lead to unreliable decision-making, poor generalization, unfairness, and false scientific discoveries. Carefully modeling data heterogeneity is a necessary step in building reliable data-driven systems. Its rigorous study is a nascent field of research spanning several disciplines, including statistics, causal inference, machine learning, economics, and operations research. In this tutorial, we develop a unified view of the disparate intellectual threads developed by different communities. We aim to foster interdisciplinary research by providing a unified view based on a shared language. Drawing upon several separate literatures, we establish a taxonomy of heterogeneity and present quantitative measures and learning algorithms that consider heterogeneous data. To spur empirical progress, we conclude by discussing validation protocols and benchmarking practices.
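As a small, hedged illustration of why average-case metrics hide heterogeneity (our own toy data, not the tutorial's), the sketch below reports per-group and worst-group accuracy alongside the mean:

```python
# Average accuracy can mask a sub-population where the model fails.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 0])
groups = np.array(["A", "A", "A", "B", "B", "B"])  # two sub-populations

print("average accuracy:", np.mean(y_pred == y_true))         # 0.67
per_group = {g: float(np.mean(y_pred[groups == g] == y_true[groups == g]))
             for g in np.unique(groups)}
print("per-group:", per_group)                                 # A: 1.00, B: 0.33
print("worst-group accuracy:", min(per_group.values()))        # 0.33
```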
[ Hall E2 (level 1) ]
Abstract
The rise of large language models (LLMs) offers a new approach for quickly building AI applications. While LLMs such as ChatGPT, Bard, and Bing Chat are widely understood as consumer tools, the best practices for developers to effectively use these models through API calls remain poorly understood. This tutorial will share with the NeurIPS audience best practices for building AI applications using LLMs.
This course will include, but also go significantly beyond, “prompt engineering.” We will share best practices for integrating LLMs into more complex software systems, evaluating and continually improving their performance, and enhancing their safety. We will discuss best practices for using LLMs in common operations such as summarizing, making inferences, transforming text, and expanding text, as well as in-context learning, fine-tuning, and the utilization of both open-source and proprietary cloud-hosted LLMs.
LLMs are transforming the development process of AI applications. For example, a sentiment classifier that used to take weeks to build, via a process of collecting and labeling training examples, tuning a supervised model, and then finally deploying the model to make inferences, can now be built in hours by prompting an LLM API.
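A hedged sketch of that sentiment-classifier example, assuming the OpenAI Python client and an `OPENAI_API_KEY` in the environment (any chat-completion API would work the same way; the model name and prompt are ours):

```python
# Sentiment classification by prompting an LLM API, no supervised training.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_sentiment(review: str) -> str:
    prompt = (
        "Classify the sentiment of the following product review as exactly "
        "one word, positive or negative.\n\nReview: " + review
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,                       # deterministic-ish labels
    )
    return resp.choices[0].message.content.strip().lower()

print(classify_sentiment("The battery died after two days. Disappointed."))
# -> "negative"
```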
Through this tutorial, we hope to connect research and practice, and also …
[ La Nouvelle Orleans Ballroom A-C (level 2) ]
Abstract
Data-Centric AI has recently been raised as an important paradigm shift in machine learning and AI, placing the previously undervalued "data work" at the center of AI development. This tutorial aims to illuminate the fundamentals of Data-Centric AI and articulate its transformative potential. We will explore the motivation behind the data-centric approach, highlighting its power to improve model performance and to engender more trustworthy, fair, and unbiased AI systems, and we will discuss benchmarking from a data-centric perspective. Our examination extends to standardized documentation frameworks, exposing how they form the backbone of this new paradigm. The tutorial will cover state-of-the-art methodologies that underscore these areas, which we will contextualize around the high-stakes setting of healthcare. A focus of this tutorial is providing participants with an interactive and hands-on experience. To this end, we provide coding/software tools and resources, thereby enabling practical engagement. The panel discussion, with experts spanning diverse industries, will provide a dynamic platform for discourse, enabling a nuanced understanding of the implications and limitations of Data-Centric AI across different contexts. Ultimately, our goal is that participants gain a practical foundation in Data-Centric AI, such that they can use or contribute to Data-Centric AI research.
[ Hall C2 (level 1 gate 9 south of food court) ]
Abstract
AI aims to imitate human intelligence in designing efficient decision-making systems, but are we really training these systems the way humans learn and make decisions every day? Studies have shown humans are inherently more comfortable making decisions on a relative scale or choosing alternatives from a set, which often helps us converge to an optimal decision faster. In recent times, as we employ more and more AI tools for executing everyday tasks, it is becoming necessary to align machine behavior with human-like decisions. Another critical challenge in training user-friendly systems lies in the requirement for a huge amount of human feedback, which is often costly and hard to obtain. The solution lies in learning to train our machines through human preferences! Our tutorial aims to address the critical need for educating researchers on different types of preference models by exploring real-world problems and showcasing how training systems through preference feedback can provide cutting-edge solutions. We will equip attendees with a comprehensive understanding of diverse preference models and inference techniques. Another goal of the tutorial is to encourage collaboration among various communities that have significant connections to preference-based learning, including bandits, multiagent games, econometrics, social choice theory, RL, optimization, robotics, and more. …
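As a small, hedged illustration of the kind of preference model involved, the sketch below fits the classic Bradley-Terry model, where item i beats item j with probability sigmoid(s_i - s_j), to pairwise comparisons by gradient ascent on the log-likelihood (our toy code, not the tutorial's):

```python
# Fit latent Bradley-Terry scores from (winner, loser) comparison pairs.
import numpy as np

def fit_bradley_terry(pairs, n_items, lr=0.5, steps=2000):
    s = np.zeros(n_items)                            # latent utility per item
    for _ in range(steps):
        grad = np.zeros(n_items)
        for w, l in pairs:
            p = 1.0 / (1.0 + np.exp(s[l] - s[w]))    # P(w beats l) = sigmoid(s_w - s_l)
            grad[w] += 1.0 - p
            grad[l] -= 1.0 - p
        s += lr * grad / len(pairs)
    return s - s.mean()                              # identifiable only up to a shift

pairs = [(0, 1), (0, 2), (1, 2), (0, 1), (1, 2)]     # comparisons: 0 > 1 > 2 overall
print(fit_bradley_terry(pairs, 3))                   # scores come out in that order
```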
[ Hall D2 (level 1) ]
Abstract
Large, overparameterized models such as neural networks are now the workhorses of modern machine learning. These models are often trained to near-zero error on noisy datasets and simultaneously generalize well to unseen data, in contrast to the textbook intuition regarding the perils of overfitting. At the same time, near-perfect data fitting can have severe issues in the context of robustness, privacy, and fairness. Classical theoretical frameworks provide little guidance for navigating these questions in the overparameterized regime. It is thus crucial to develop new intuition regarding overfitting and generalization that is reflective of these empirical observations. In this tutorial, we discuss recent work in the learning theory literature that provides theoretical insights into these phenomena.
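As a tiny numerical companion to this point (our illustration, not the tutorial's code), the sketch below shows the second descent of "double descent" in linear regression: for every d > n the minimum-norm interpolator fits noisy labels exactly, yet its exact test risk improves as overparameterization grows.

```python
# Min-norm interpolation of noisy labels: test risk falls as d grows past n.
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 50, 0.5                                    # samples, label noise

def risk(d, trials=20):
    errs = []
    for _ in range(trials):
        w = rng.standard_normal(d) / np.sqrt(d)       # signal with ||w|| ~ 1
        X = rng.standard_normal((n, d))
        y = X @ w + sigma * rng.standard_normal(n)    # noisy labels
        w_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # min-norm interpolator (train MSE ~ 0)
        errs.append(np.sum((w_hat - w) ** 2) + sigma**2)  # exact test MSE for isotropic x
    return np.mean(errs)

for d in [55, 100, 500, 2000]:
    print(d, risk(d))   # risk decreases as overparameterization increases
```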