Harvesting knowledge from the semi-structured web

Xin Luna Dong / Amazon

Talk: , -

Abstract: Knowledge graphs have been used to support a wide range of applications and enhance search and QA for Google, Amazon Alexa, etc. However, we often miss long-tail knowledge, including unpopular entities, unpopular relations, and unpopular verticals. In this talk we describe our efforts in harvesting knowledge from semi-structured websites, which are often populated according to some templates using vast volume of data stored in underlying databases. We describe our AutoCeres ClosedIE system, which improves the accuracy of fully automatic knowledge extraction from 60%+ of state-of-the-art to 90%+ on semi-structured data. We also describe OpenCeres, the first ever OpenIE system on semi-structured data, that is able to identify new relations not readily included in existing ontologies. In addition, we describe our other efforts in ontology alignment, entity linkage, graph mining, and QA, that allow us to best leverage the knowledge we extract for search and QA.

Bio: Xin Luna Dong is a Principal Scientist at Amazon, leading the efforts of constructing Amazon Product Knowledge Graph. She was one of the major contributors to the Google Knowledge Vault project, and has led the Knowledge-based Trust project, which is called the “Google Truth Machine” by Washington’s Post. She has co-authored book “Big Data Integration”, was awarded ACM Distinguished Member, VLDB Early Career Research Contribution Award for "advancing the state of the art of knowledge fusion", and Best Demo award in Sigmod 2005. She serves in VLDB endowment and PVLDB advisory committee, and is a PC co-chair for VLDB 2021, ICDE Industry 2019, VLDB Tutorial 2019, Sigmod 2018 and WAIM 2015.​​