2nd International Workshop on
Data-Efficient Machine Learning (DeMaL)
Training, retraining and deploying web-scale machine learning models requires large amounts of high-quality data. Often, this is achieved via a time-consuming, labor intensive human annotation process. While in web-scale applications, there is an abundance of unlabeled, often extremely noisy data, there is a severe lack of high quality labeled data from which practitioners can train ML models that perform well on customer-facing applications. To this end, it is imperative that ML scientists and engineers devise innovative ways to deal with the constrained setting of small amounts of labeled data, and make the best use of limited (time and monetary) budget available to obtain annotated data. Thus, one needs to train dataefficient machine learning models. This has led to the proliferation of creative techniques such as data augmentation, transfer learning, self-supervised learning, active learning, multi-task learning to name a few. While many of these techniques have shown to work well under specific settings, web data offers additional challenges. Web data is multi-modal in nature, it has implicit signals from user-interactions, and often involves multiple agents.
Given the uniqueness, importance, and growing interest in these problems, the workshop on Data-efficient Machine Learning (DeMaL) is a venue to present ideas and solutions to these problems. The full day workshop aims to bring together practitioners in both academia and industry working on the collection, annotation and usage of labeled data for large scale web applications. Check out the Call for Contributions for topics of interest.