Modern ML systems rely on pre-trained and fine-tuned models that achieve state-of-the-art results without requiring specialized training datasets. As these datasets are expensive and often difficult to obtain, building on such general models enables quicker prototyping and reasonable prediction quality.
However, once such a model is deployed in production and starts affecting real users, it becomes vulnerable to data drift and a lack of task specificity.
In this talk, Fedor Zhdanov, Head of ML Projects at Toloka, a global tech company that supports data-related processes across the entire ML lifecycle, will discuss how human-in-the-loop (HITL) techniques can address both of these deficiencies.
Fedor will start his talk by introducing adaptive ML models, a new HITL product that enables hosting a model as an endpoint with crowdsourced curation. This makes it possible to catch data drift and retrain the model in a way that is automatically tailored to the needs of each customer by gathering real human feedback from Toloka’s global crowd.
Then, he will present the Toloka team’s recent academic results on subjective and noisy labeling. First, Fedor will give an overview of the problem of crowdsourced audio transcription, covering the lessons learned from the audio transcription shared task at VLDB 2021 and the CrowdSpeech benchmark for noisy sequence aggregation. Second, he will present the problem of learning from subjective data, using the example of the IMDB-WIKI-SbS benchmark featured at the Data-Centric AI workshop at NeurIPS 2021. Finally, he will showcase three further applications: data clustering with crowdsourcing, reinforcement learning without reward engineering using crowdsourcing, and human evaluation of Stable Diffusion text-to-image models.