

Poster in Workshop: CtrlGen: Controllable Generative Modeling in Language and Vision

Fair Data Generation using Language Models with Hard Constraints

SK Mainul Islam · Abhinav Nagpal · Balaji Ganesan · Pranay Lohia


Abstract:

Natural language text generation has improved significantly with the advent of pre-trained language models. Using such language models to predict personal data entities in place of redacted spans in text could help generate synthetic datasets. To address privacy and ethical concerns with such datasets, we need to ensure that the masked entity predictions are also fair and controlled by application-specific constraints. We introduce new ways to inject hard constraints and knowledge into language models that address these concerns and also improve performance on this task.
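
As a rough illustration of what a hard constraint on masked entity prediction could look like, the sketch below filters candidate fillers for a redacted span before choosing one. It is not the authors' method: the function name constrained_fill, the candidate scores, and the blocklist are all hypothetical, and a real system would take its scores from a pre-trained language model rather than a hand-written dictionary.

```python
import math


def constrained_fill(scores, allowed):
    """Return the best-scoring candidate that satisfies a hard constraint.

    scores:  dict mapping candidate entity -> log-probability
             (hypothetical values standing in for language model output)
    allowed: predicate returning True when a candidate satisfies the constraint
    """
    if not scores:
        return None
    # A hard constraint removes disallowed candidates outright by
    # assigning them a score of negative infinity.
    masked = {c: (s if allowed(c) else -math.inf) for c, s in scores.items()}
    best = max(masked, key=masked.get)
    # If every candidate was rejected, there is no valid filler.
    return None if masked[best] == -math.inf else best


# Hypothetical log-probabilities for a redacted name span.
scores = {"John": -0.4, "Maria": -1.1, "Wei": -1.6}
# Hypothetical constraint: never reproduce names on a blocklist.
blocked = {"John"}

result = constrained_fill(scores, lambda c: c not in blocked)
```

Masking to negative infinity, rather than down-weighting, is what makes the constraint hard in this sketch: a blocked candidate can never be selected, however likely the model considers it.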
