Lightning Talk in Workshop: Data Centric AI
Automatic Data Quality Evaluation for Text Classification
Abstract:
Data quality is critical for machine learning, but it is usually evaluated indirectly, through the performance of the models trained on the data. A model-independent metric for data quality is therefore needed. This paper proposes DQTC, a convenient metric based on information theory that quantifies data quality for text classification. An experiment is conducted to verify the relationship between DQTC and model performance. Finally, we describe linguistic improvements that should be considered. The code is available online.