6th Workshop on Automated Knowledge Base Construction (AKBC) 2017
at NIPS 2017 in Long Beach, California, December 8th, 2017.
Extracting knowledge from Web pages and integrating it into a coherent knowledge base (KB) is a task that spans the areas of natural language processing, information extraction, information integration, databases, search, and machine learning. Recent years have seen significant advances here, both in academia and industry. Most prominently, all major search engine providers (Yahoo!, Microsoft Bing, and Google) nowadays experiment with semantic KBs. Our workshop serves as a forum for researchers on knowledge base construction in both academia and industry.
Unlike many other workshops, our workshop puts less emphasis on conventional paper submissions and presentations, and more on visionary papers and discussions. In addition, one of its unique characteristics is that it is centered on keynotes by high-profile speakers. AKBC 2010, AKBC 2012, AKBC 2013, AKBC 2014 and AKBC 2016 each had a dozen invited talks from leaders in this area from academia, industry, and government agencies. We had senior invited speakers from Google, Microsoft, Yahoo, several leading universities (MIT, Stanford, University of Washington, CMU, University of Massachusetts, and more), and DARPA. With this year’s workshop, we aim to continue this positive experience. The AKBC 2017 workshop will serve as a forum for researchers working in the area of automated knowledge harvesting from text. By having invited talks by leading researchers from industry, academia, and the government, and by focusing particularly on vision papers, we aim to provide a vivid forum of discussion about the field of automated knowledge base construction.
|Tom Mitchell||Carnegie Mellon University|
|Maximilian Nickel||Facebook AI Research|
|Sebastian Riedel||Bloomsbury AI / University College London|
|Sameer Singh||University of California, Irvine|
|Ivan Titov||University of Edinburgh|
|Luke Zettlemoyer||University of Washington/Allen Institute for Artificial Intelligence|
|8:50||9:00||AKBC Organizers||Opening Remarks|
|9:00||9:30||Luna Dong||Challenges and Innovations in Building a Product Knowledge Graph
Knowledge graphs have been used to support a wide range of applications and enhance search results for multiple major search engines, such as Google and Bing. At Amazon we are building a Product Graph, an authoritative knowledge graph for all products in the world. The thousands of product verticals we need to model, the vast number of data sources we need to extract knowledge from, the huge volume of new products we need to handle every day, and the various applications in Search, Discovery, Personalization, and Voice that we wish to support all present big challenges in constructing such a graph. In this talk we describe four scientific directions we are investigating in building and using such a graph, namely, harvesting product knowledge from the web, hands-off-the-wheel knowledge integration and cleaning, human-in-the-loop knowledge learning, and graph mining and graph-enhanced search. This talk will present our progress toward near-term goals in each direction, and show the many research opportunities towards our moon-shot goals.
Bio: Xin Luna Dong is a Principal Scientist at Amazon, leading the effort to construct the Amazon Product Knowledge Graph. She was one of the major contributors to the Google Knowledge Vault project, and has led the Knowledge-based Trust project, which was called the "Google Truth Machine" by the Washington Post. She received the VLDB Early Career Research Contribution Award for "advancing the state of the art of knowledge fusion". She co-authored the book "Big Data Integration", and is the PC co-chair for SIGMOD 2018 and WAIM 2015.
|9:30||10:00||Luke Zettlemoyer||End-to-end Learning for Broad Coverage Semantics: SRL, Coreference, and Beyond
Deep learning with large supervised training sets has had significant impact on many research challenges, from speech recognition to machine translation. However, applying these ideas to problems in computational semantics has been difficult, at least in part due to modest dataset sizes and relatively complex structured prediction tasks. In this talk, I will present two recent results on end-to-end deep learning for classic challenge problems in computational semantics: semantic role labeling and coreference resolution. In both cases, we will introduce relatively simple deep neural network approaches that use no preprocessing (e.g. no POS tagger or syntactic parser) and achieve significant performance gains, including over 20% relative error reductions when compared to non-neural methods. I will also discuss our first steps towards scaling the amount of data such methods can be trained on by many orders of magnitude, including semi-supervised learning via contextual word embeddings and supervised learning through crowdsourcing. Our hope is that these advances, when combined, will enable very high quality semantic analysis in any domain from easily gathered supervision.
Bio: Luke Zettlemoyer is an Associate Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, and also leads the AllenNLP project at the Allen Institute for Artificial Intelligence. His research focuses on empirical computational semantics, and involves designing machine learning algorithms and building large datasets. Honors include multiple paper awards, a PECASE award, and an Allen Distinguished Investigator Award. Luke received his PhD from MIT and was a postdoc at the University of Edinburgh.
|10:00||10:30||Ivan Titov||Graph Convolutional Networks for Extracting and Modeling Relational Data
Graph Convolutional Networks (GCNs) are an effective tool for modeling graph structured data. We investigate their applicability in the context of both extracting semantic relations from text (specifically, semantic role labeling) and modeling relational data (link prediction). For semantic role labeling, we introduce a version of GCNs suited to modeling syntactic dependency graphs and use them as sentence encoders. Relying on these linguistically-informed encoders, we achieve the best reported scores on standard benchmarks for Chinese and English. For link prediction, we propose Relational GCNs (RGCNs), GCNs developed specifically to deal with highly multi-relational data, characteristic of realistic knowledge bases. By explicitly modeling neighbourhoods of entities, RGCNs accumulate evidence over multiple inference steps in relational graphs and yield competitive results on standard link prediction benchmarks. Joint work with Diego Marcheggiani, Michael Schlichtkrull, Thomas Kipf, Max Welling, Rianne van den Berg and Peter Bloem.
Bio: Ivan Titov is an Associate Professor at the University of Edinburgh and the University of Amsterdam. His research interests are in statistical natural language processing and machine learning. He serves as an action editor for TACL and JMLR, as well as on the JAIR editorial board and the advisory board of the European chapter of the Association for Computational Linguistics. He holds an ERC starting grant in the area of natural language processing.
|10:30||11:30||Morning Poster Session (Posters 1-11!) and Coffee Break|
|11:30||12:00||Maximilian Nickel||Learning Hierarchical Representations of Relational Data
Representation learning has become an invaluable approach for making statistical inferences from relational data. However, while complex relational datasets often exhibit a latent hierarchical structure, state-of-the-art embedding methods typically do not account for this property. In this talk, I will introduce a novel approach to learning such hierarchical representations of symbolic data by embedding them into hyperbolic space -- or more precisely into an n-dimensional Poincaré ball. I will discuss how the underlying hyperbolic geometry allows us to learn parsimonious representations which simultaneously capture hierarchy and similarity. Furthermore, I will show that Poincaré embeddings can outperform Euclidean embeddings significantly on data with latent hierarchies, both in terms of representation capacity and in terms of generalization ability.
Bio: Maximilian Nickel is a research scientist at Facebook AI Research in New York. Before joining FAIR, he was a postdoctoral fellow at MIT where he was with the Laboratory for Computational and Statistical Learning and the Center for Brains, Minds and Machines. In 2013, he received his PhD summa cum laude from the Ludwig Maximilian University Munich. From 2010 to 2013 he worked as a research assistant at Siemens Corporate Technology. His research centers around geometric methods for learning and reasoning with relational knowledge representations and their applications in artificial intelligence, machine reading, and question answering.
|12:00||12:30||Sebastian Riedel||Reading and Reasoning with Neural Program Interpreters
We are getting better at teaching end-to-end neural models how to answer questions about content in natural language text. However, progress has been mostly restricted to extracting answers that are directly stated in text. In this talk, I will present our work towards teaching machines not only to read, but also to reason with what was read and to do this in an interpretable and controlled fashion. Our main hypothesis is that this can be achieved by the development of neural abstract machines that follow the blueprint of program interpreters for real-world programming languages. We test this idea using two languages: an imperative (Forth) and a declarative (Prolog/Datalog) one. In both cases we implement differentiable interpreters that can be used for learning reasoning patterns. Crucially, because they are based on interpretable host languages, the interpreters also allow users to easily inject prior knowledge and inspect the learnt patterns. Moreover, on tasks such as math word problems and relational reasoning our approach compares favourably to state-of-the-art methods.
Bio: Sebastian Riedel is a reader in Natural Language Processing and Machine Learning at University College London (UCL), where he is leading the Machine Reading lab. He is also the head of research at Bloomsbury AI and an Allen Distinguished Investigator. He works at the intersection of Natural Language Processing and Machine Learning, and focuses on teaching machines how to read and reason. He was educated in Hamburg-Harburg (Dipl. Ing) and Edinburgh (MSc., PhD), and worked at the University of Massachusetts Amherst and Tokyo University before joining UCL.
|2:00||2:30||Sameer Singh||Multimodal KB Extraction and Completion
Existing pipelines for constructing KBs primarily support a restricted set of data types: for example, they focus on the text of documents when extracting information and ignore the various modalities of evidence that we regularly encounter, such as images, semi-structured tables, video, and audio. Similarly, approaches that reason over incomplete and uncertain KBs are limited to basic entity-relation graphs, ignoring the diversity of data types that are useful for relational reasoning, such as text, images, and numerical attributes. In this work, we present a novel AKBC pipeline that takes the first steps in combining textual and relational evidence with other sources like numerical, image, and tabular data. We focus on two tasks: single entity attribute extraction from documents and relational knowledge graph completion. For each, we introduce new datasets that contain multimodal information, propose benchmark evaluations, and develop models that build upon advances in deep neural encoders for different data types.
Bio: Dr. Sameer Singh is an Assistant Professor of Computer Science at the University of California, Irvine. He is working on large-scale and interpretable machine learning applied to information extraction and natural language processing. Before UCI, Sameer was a Postdoctoral Research Associate at the University of Washington. He received his PhD from the University of Massachusetts, Amherst in 2014, during which he also interned at Microsoft Research, Google Research, and Yahoo! Labs.
|2:30||2:30||Best Paper Award|
|2:30||2:45||Contributed Talk: Go for a Walk and Arrive at the Answer: Reasoning Over Knowledge Bases with Reinforcement Learning|
|2:45||3:00||Contributed Talk: Multi-graph Affinity Embeddings for Multilingual Knowledge Graphs|
|3:00||3:15||Contributed Talk: A Study of Automatically Acquiring Explanatory Inference Patterns from Corpora of Explanations: Lessons from Elementary Science Exams|
|3:15||3:45||Tom Mitchell||NELL: Lessons and Future Directions
The Never Ending Language Learner (NELL) research project has produced a computer program that has been running continuously since January 2010, learning to build a large knowledge base by extracting structured beliefs (e.g., PersonFoundedCompany(Gates,Microsoft), BeverageServedWithBakedGood(tea,crumpets)) from unstructured text on the web. This talk will provide an update on new NELL research results, reflect on the lessons learned from this effort, and discuss specific challenges for future systems that attempt to build large knowledge bases automatically.
Bio: Tom M. Mitchell is the E. Fredkin University Professor at Carnegie Mellon University, where he founded the world's first Machine Learning Department. His research uses machine learning to develop computers that are learning to read the web, and uses brain imaging to study how the human brain understands what it reads. Mitchell is a member of the U.S. National Academy of Engineering, of the American Academy of Arts and Sciences, and a Fellow and Past President of the Association for the Advancement of Artificial Intelligence (AAAI).
|3:45||4:45||Afternoon Poster Session (Posters 12-22!) and Coffee Break|
|4:45||5:45||Xin Luna Dong, Ivan Titov, Maximilian Nickel, Luke Zettlemoyer, Sameer Singh, Tom Mitchell||Speaker Panel|
|5:45||6:00||AKBC Organizers||Best Poster Awards and Closing Remarks|
Deadlines are at 11:59pm PDT and are subject to change.