Title: Overton and Bootleg: Elements of a Software 2.0 System
Abstract: Overton is a system whose main design goal is to support engineers in building, monitoring, and improving production machine learning systems. Key challenges engineers face are monitoring fine-grained quality, diagnosing errors in sophisticated applications, and handling contradictory or incomplete supervision data. Overton automates the life cycle of model construction, deployment, and monitoring by providing a set of high-level, declarative abstractions. Overton’s vision is to shift developers to these higher-level tasks instead of lower-level machine learning tasks. Using Overton, engineers can build deep-learning-based applications without writing any code in frameworks like TensorFlow. Since 2018, Overton has been used in production to support multiple applications in both near-real-time settings, e.g., question answering, and back-of-house processing, e.g., entity resolution. In that time, Overton-based applications have answered billions of queries in multiple languages and processed trillions of records, reducing errors 1.7-2.9x versus production systems.
A second design goal of Overton is to natively support and maintain pretrained embeddings. This talk will describe a recently open-sourced embedding model for named entity disambiguation, called Bootleg, which sets a new state of the art in named entity disambiguation, outperforms BERT-based baselines by over 50 points on entities unseen during training, and improves production use cases by up to 8% in multiple languages. We will also discuss the challenges of building, monitoring, and improving these weakly self-supervised systems over time.
Bootleg is open source at http://hazyresearch.stanford.edu/bootleg/. A great deal of credit goes to the Search, Knowledge, and Platform team in Apple AI.
Bio: Christopher (Chris) Re is an associate professor in the Department of Computer Science at Stanford University. He is in the Stanford AI Lab and is affiliated with the Statistical Machine Learning Group. His recent work is to understand how software and hardware systems will change as a result of machine learning, along with a continuing, petulant drive to work on math problems. Research from his group has been incorporated into scientific and humanitarian efforts, such as the fight against human trafficking, along with widely used products from technology and enterprise companies including Google Ads, Gmail, YouTube, and Apple. He has cofounded four companies based on his research into machine learning systems: SambaNova and Snorkel, along with two companies that are now part of Apple, Lattice (DeepDive) in 2017 and Inductiv (HoloClean) in 2020.
His research contributions have spanned database theory, database systems, and machine learning. His work has won best paper or test-of-time awards at the premier venues in each area. He still can't believe he won the MacArthur Foundation Fellowship.
Title: Anomaly Detection in Large Graphs
Abstract: Given a large graph, like who-calls-whom, or who-likes-whom, what behavior is normal and what should be surprising, possibly due to fraudulent activity? How do graphs evolve over time? We focus on these topics: (a) anomaly detection in large static graphs and (b) patterns and anomalies in large time-evolving graphs.
For the first, we present a list of static and temporal laws, including advanced patterns like 'eigenspokes'; we show how to use them to spot suspicious activities in on-line buyer-and-seller settings, on Facebook, and in Twitter-like networks. For the second, we show how to handle time-evolving graphs as tensors, as well as some surprising discoveries in such settings.
Bio: Christos Faloutsos is a Professor at Carnegie Mellon University and an Amazon Scholar. He received the Fredkin Professorship in Artificial Intelligence (2020), the Presidential Young Investigator Award by the National Science Foundation (1989), the Research Contributions Award in ICDM 2006, the SIGKDD Innovations Award (2010), the PAKDD Distinguished Contributions Award (2018), 28 "best paper" awards (including 7 "test of time" awards), and four teaching awards.
Eight of his advisees or co-advisees have attracted KDD or SCS dissertation awards. He is an ACM Fellow and has served as a member of the executive committee of SIGKDD; he has published over 400 refereed articles, 17 book chapters, and three monographs. He holds 8 patents (and several more are pending), and he has given over 50 tutorials and over 25 invited distinguished lectures.
Title: Self-driving product understanding for thousands of categories
Abstract: Knowledge graphs have been used to support a wide range of applications and enhance search results for multiple major search engines, such as Google and Bing. At Amazon we are building a Product Graph, an authoritative knowledge graph for all products in the world. The thousands of product verticals we need to model, the vast number of data sources we need to extract knowledge from, the huge volume of new products we need to handle every day, and the various applications in Search, Discovery, Personalization, and Voice that we wish to support all present big challenges in constructing such a graph.
In this talk we describe our efforts for self-driving knowledge collection for products of thousands of types. The system includes a suite of novel techniques for taxonomy construction, product property identification, knowledge extraction, anomaly detection, and synonym discovery. Our system is a) automatic, requiring little human intervention, b) multi-scalable, scalable in multiple dimensions including many domains, products, and attributes, and c) integrative, exploiting rich customer behavior logs. We describe what we learned in building this product graph and applying it to support customer-facing applications.
Bio: Xin Luna Dong is a Senior Principal Scientist at Amazon, leading the effort to construct the Amazon Product Knowledge Graph. She was one of the major contributors to the Google Knowledge Vault project, and has led the Knowledge-based Trust project, which was called the “Google Truth Machine” by the Washington Post. She co-authored the book “Big Data Integration”, was named an ACM Distinguished Member, and received the VLDB Early Career Research Contribution Award for “advancing the state of the art of knowledge fusion”. She serves on the VLDB Endowment and the PVLDB advisory committee, and is a PC co-chair for WSDM'2022, VLDB'2021, and the KDD'2020 ADS Invited Talk Series.