Denoising Sequence-to-Sequence Pre-training

Luke Zettlemoyer / University of Washington / Facebook


Abstract: Denoising auto-encoders can be pre-trained at a very large scale by noising and then reconstructing any input text. Existing methods, based on variations of masked language models, have transformed the field and now provide the de facto initialization to be fine-tuned for nearly every task. In this talk, I will present our work on sequence-to-sequence pre-training that allows arbitrary noising, by simply learning to translate any corrupted text back to the original with standard Transformer-based neural machine translation architectures. I will show that the resulting monolingual (BART) and multilingual (mBART) models are highly effective for a wide range of discrimination and generation tasks, including question answering, summarization, and machine translation. A key contribution of our generalized noising is that we can replicate other pre-training schemes within the BART framework, to better measure which factors most influence end-task performance, as I will describe. Finally, I will highlight many of the ways BART is already being used by other researchers, and discuss opportunities to further push models that pre-train for generating and understanding text in many languages.
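As a rough illustration of the noising-and-reconstruction idea described in the abstract, the Python sketch below implements a text-infilling style corruption: spans of tokens with roughly Poisson-distributed lengths are each replaced by a single mask token, and the pre-training objective is to map the corrupted sequence back to the original with a sequence-to-sequence model. The mask token name and the parameters mask_ratio and poisson_lambda are illustrative assumptions, not the exact BART implementation.

# Minimal sketch of text-infilling noise, assuming a whitespace-tokenized input.
# Not BART's actual code: the mask symbol and hyperparameters are placeholders.
import numpy as np

MASK = "<mask>"

def text_infilling(tokens, mask_ratio=0.3, poisson_lambda=3.0, seed=0):
    """Replace randomly chosen spans of `tokens` with a single MASK token each."""
    rng = np.random.default_rng(seed)
    budget = int(round(mask_ratio * len(tokens)))  # total tokens to remove
    out = list(tokens)
    while budget > 0 and len(out) > 1:
        # Sample a span length, clamped so we never exceed the masking budget.
        span = max(1, min(int(rng.poisson(poisson_lambda)), budget, len(out) - 1))
        start = int(rng.integers(0, len(out) - span + 1))
        out[start:start + span] = [MASK]  # whole span collapses to one mask token
        budget -= span
    return out

source = "the quick brown fox jumps over the lazy dog".split()
corrupted = text_infilling(source)
# Pre-training task: given `corrupted`, reconstruct `source` token by token.
print(corrupted)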

Bio: Luke Zettlemoyer is a Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, and a Research Scientist at Facebook. His research focuses on empirical methods for natural language semantics, and involves designing machine learning algorithms; introducing new tasks and datasets; and, most recently, studying how to best develop self-supervision signals for text. Honors include multiple paper awards, a PECASE award, and an Allen Distinguished Investigator Award. Luke received his PhD from MIT and was a postdoc at the University of Edinburgh.