NLP Highlights

  • Author: Various
  • Narrator: Various
  • Publisher: Podcast
  • Duration: 80:57:29

Synopsis

Discussing recent and interesting work related to natural language processing. Matt Gardner and Waleed Ammar, research scientists at the Allen Institute for Artificial Intelligence, give short discussions of papers, mostly in interviews with authors about their work.

Episodes

  • 64 - Neural Network Models for Sentence Pair Tasks, with Wuwei Lan and Wei Xu

    08/08/2018 Duration: 36min

    Best reproduction paper at COLING 2018, by Wuwei Lan and Wei Xu. This paper takes a bunch of models for sentence pair classification (including paraphrase identification, semantic textual similarity, natural language inference / entailment, and answer sentence selection for QA) and compares all of them on all tasks. There's a very nice table in the paper showing the cross product of models and datasets, and how, if you only look at the original papers, this table is almost empty; Wuwei and Wei fill in all of the missing values in that table with their own experiments. This is a very nice piece of work that helps us gain a broader understanding of how these models perform in diverse settings, and it's awesome that COLING explicitly asked for and rewarded this kind of paper, as it's not your typical "come look at my shiny new model!" paper. Our discussion with Wuwei and Wei covers what models and datasets the paper looked at, why the datasets can be treated similarly (and some reasons for why maybe they should be treated differently).

  • 63 - Neural Lattice Language Models, with Jacob Buckman

    02/08/2018 Duration: 30min

    TACL 2018 paper by Jacob Buckman and Graham Neubig. Jacob tells us about marginalizing over latent structure in a sentence by doing a clever parameterization of a lattice with a model kind of like a tree LSTM. This lets you treat collocations as multi-word units, or allow words to have multiple senses, without having to commit to a particular segmentation or word sense disambiguation up front. We talk about how this works and what comes out. One interesting result that comes out of the sense lattice: learning word senses from a language modeling objective tends to give you senses that capture the mode of the "next word" distribution, like uses of "bank" that are always followed by "of". Helpful for local perplexity, but not really what you want if you're looking for semantic senses, probably. https://www.semanticscholar.org/paper/Neural-Lattice-Language-Models-Buckman-Neubig/f36b961ea5106c19c341763bd9942c1f09038e5d
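
    To make the marginalization idea concrete, here is a generic sketch of the forward dynamic program that sums over every way of segmenting a sentence into (possibly multi-word) units. This is our own illustration of the general idea, not the paper's lattice parameterization, and unit_log_prob is a hypothetical scoring function:

    ```python
    import numpy as np

    def sentence_log_prob(tokens, unit_log_prob, max_unit_len=3):
        # alpha[i] is the log-marginal probability of the first i tokens,
        # summed over all segmentations into units of up to max_unit_len words.
        # The paper's model scores lattice edges with an LSTM-like network and
        # also handles multiple word senses, which this sketch omits.
        alpha = np.full(len(tokens) + 1, -np.inf)
        alpha[0] = 0.0
        for end in range(1, len(tokens) + 1):
            for start in range(max(0, end - max_unit_len), end):
                unit = tuple(tokens[start:end])
                alpha[end] = np.logaddexp(
                    alpha[end], alpha[start] + unit_log_prob(unit, tokens[:start]))
        return alpha[-1]
    ```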

  • 62 - Sounding Board: A User-Centric and Content-Driven Social Chatbot, with Hao Fang

    30/07/2018 Duration: 31min

    NAACL 2018 demo paper, by Hao Fang, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, and Mari Ostendorf. Sounding Board was the system that won the 2017 Amazon Alexa Prize, a competition to build a social chatbot that interacts with users as an Alexa skill. Hao comes on the podcast to tell us about the project. We talk for a little bit about how Sounding Board works, but spend most of the conversation talking about what these chatbots can do - the competition setup, some example interactions, the limits of current systems, and how chatbots might be more useful in the future. Even the best current systems seem pretty limited, but the potential future uses are compelling enough to warrant continued research. https://www.semanticscholar.org/paper/Sounding-Board%3A-A-User-Centric-and-Content-Driven-Fang-Cheng/b540fd427a02b19c6ea55dd7d9758ebf15ec3965

  • 61 - Neural Text Generation in Stories, with Elizabeth Clark and Yangfeng Ji

    23/07/2018 Duration: 30min

    NAACL 2018 Outstanding Paper by Elizabeth Clark, Yangfeng Ji, and Noah A. Smith. Both Elizabeth and Yangfeng come on the podcast to tell us about their work. This paper is an extension of an EMNLP 2017 paper by Yangfeng and co-authors that introduced a language model that included explicit entity representations. Elizabeth and Yangfeng take that model, improve it a bit, and use it for creative narrative generation, with some interesting applications. We talk a little bit about the model, but mostly about how the model was used to generate narrative text, how it was evaluated, and what other interesting applications there are of this idea. The punchline is that this model does a better job at generating coherent stories than other generation techniques, because it can track the entities in the story better. We've been experimenting with how we record the audio, trying to figure out how to get better audio quality. Sadly, this episode was a failed experiment, and there is a background hiss that we couldn't remove.

  • 60 - FEVER: a large-scale dataset for Fact Extraction and VERification, with James Thorne

    28/06/2018 Duration: 28min

    NAACL 2018 paper by James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. James tells us about his paper, where they created a dataset for fact checking. We talk about how this dataset relates to other datasets, why a new one was needed, how it was built, and how well the initial baseline does on this task. There are some interesting side notes on bias in dataset construction, and on how "fact checking" relates to "fake news" ("fake news" could mean that an article is actively trying to deceive or mislead you; "fact checking" here is just determining if a single claim is true or false given a corpus of assumed-correct reference material). The baseline system does quite poorly, and the lowest-hanging fruit seems to be in improving the retrieval component that finds relevant supporting evidence for claims. There's a workshop and shared task coming up on this dataset: http://fever.ai/. The shared task test period starts on July 24th - get your systems ready! https://www.semanticscho

  • 59 - Weakly Supervised Semantic Parsing With Abstract Examples, with Omer Goldman

    12/06/2018 Duration: 35min

    ACL 2018 paper by Omer Goldman, Veronica Latcinnik, Udi Naveh, Amir Globerson, and Jonathan Berant. Omer comes on to tell us about a class project (done mostly by undergraduates!) that made it into ACL. Omer and colleagues built a semantic parser that gets state-of-the-art results on the Cornell Natural Language Visual Reasoning dataset. They did this by using "abstract examples" - they replaced the entities in the questions and corresponding logical forms with their types, labeled about a hundred examples in this abstracted formalism, and used those labels to do data augmentation and train their parser. They also used some interesting caching tricks, and a discriminative reranker. https://www.semanticscholar.org/paper/Weakly-supervised-Semantic-Parsing-with-Abstract-Goldman-Latcinnik/5aec2ab5bf2979da067e2aa34762b589a0680030

  • 58 - Learning What’s Easy: Fully Differentiable Neural Easy-First Taggers, with André Martins

    08/06/2018 Duration: 47min

    EMNLP 2017 paper by André F. T. Martins and Julia Kreutzer. André comes on the podcast to talk to us about the paper. We spend the bulk of the time talking about the two main contributions of the paper: how they applied the notion of "easy first" decoding to neural taggers, and the details of the constrained softmax that they introduced to accomplish this. We conclude that "easy first" might not be the right name for this - it's doing something that in the end is very similar to stacked self-attention, with standard independent decoding at the end. The particulars of the self-attention are inspired by "easy first", however, using a constrained softmax to enforce some novel constraints on the self-attention. https://www.semanticscholar.org/paper/Learning-What's-Easy%3A-Fully-Differentiable-Neural-Martins-Kreutzer/252571243aa4c0b533aa7fc63f88d07fd844e7bb
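
    For a feel for the constrained softmax, here is our rough sketch of the idea as we understand it: stay as close as possible to an ordinary softmax while respecting a per-element upper bound (the attention "budget" each position has left), clamping any entry that would overflow and renormalizing the rest. This is an illustrative reading, not the authors' exact formulation:

    ```python
    import numpy as np

    def constrained_softmax(scores, upper):
        # Start from an ordinary softmax; clamp any entry that exceeds its
        # upper bound, then redistribute the leftover probability mass over
        # the unclamped entries in proportion to exp(score).
        scores = np.asarray(scores, dtype=float)
        upper = np.asarray(upper, dtype=float)
        exp_s = np.exp(scores - scores.max())
        clamped = np.zeros(len(scores), dtype=bool)
        p = np.zeros(len(scores))
        while True:
            free_mass = 1.0 - upper[clamped].sum()
            p[~clamped] = free_mass * exp_s[~clamped] / exp_s[~clamped].sum()
            p[clamped] = upper[clamped]
            violations = (~clamped) & (p > upper)
            if not violations.any():
                return p
            clamped |= violations

    # The first position has only 0.05 of its budget left, so it is capped
    # there and its mass shifts to the other positions:
    print(constrained_softmax([1.0, 2.0, 3.0], upper=[0.05, 1.0, 1.0]))
    # -> approximately [0.05, 0.26, 0.69]
    ```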

  • 57 - A Survey Of Cross-lingual Word Embedding Models, with Sebastian Ruder

    05/06/2018 Duration: 31min

    Upcoming JAIR paper by Sebastian Ruder, Ivan Vulić, and Anders Søgaard. Sebastian comes on to tell us about his survey. He creates a typology of cross-lingual word embedding methods, and we discuss why you might use cross-lingual embeddings (low-resource languages in particular), what information they capture (semantics? syntax? both?), how the methods work (lots of different ways), and how to evaluate the embeddings (best when you have an extrinsic task to evaluate on). https://www.semanticscholar.org/paper/A-survey-of-cross-lingual-embedding-models-Ruder/3dbd28c63a7807280c9531735c715d4598024166

  • 56 - Deep contextualized word representations, with Matthew Peters

    04/04/2018 Duration: 30min

    NAACL 2018 paper, by Matt Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Chris Clark, Kenton Lee, and Luke Zettlemoyer. In this episode, AI2's own Matt Peters comes on the show to talk about his recent work on ELMo embeddings, what some have called "the next word2vec". Matt has shown very convincingly that using a pre-trained bidirectional language model to get contextualized word representations performs substantially better than using static word vectors. He comes on the show to give us some more intuition about how and why this works, and to talk about some of the other things he tried and what's coming next. https://www.semanticscholar.org/paper/Deep-contextualized-word-representations-Peters-Neumann/4b17597b856c087f109381ce77d60d9017cb6f9a
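
    The recipe Matt describes is simple at the point of use: run the pre-trained bidirectional language model over the sentence, then let the end task learn how to mix the biLM's layers. A minimal sketch of that mixing step (our own illustration, not the authors' code) looks roughly like this:

    ```python
    import torch
    import torch.nn as nn

    class ScalarMix(nn.Module):
        # Learns a softmax-normalized weight per biLM layer plus a global
        # scale, and returns the weighted sum of the layer representations.
        def __init__(self, num_layers):
            super().__init__()
            self.layer_weights = nn.Parameter(torch.zeros(num_layers))
            self.gamma = nn.Parameter(torch.ones(1))

        def forward(self, layer_reps):
            # layer_reps: list of (batch, seq_len, dim) tensors, one per
            # layer of the pre-trained bidirectional language model.
            weights = torch.softmax(self.layer_weights, dim=0)
            mixed = sum(w * h for w, h in zip(weights, layer_reps))
            return self.gamma * mixed
    ```

    The downstream task model then consumes this mixed vector alongside (or instead of) static word vectors.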

  • 55 - Matchbox: Dispatch-driven autobatching for imperative deep learning, with James Bradbury

    28/03/2018 Duration: 31min

    In this episode, we take a more systems-oriented approach to NLP, looking at issues with writing deep learning code for NLP models. As a lot of people have discovered over the last few years, efficiently batching multiple examples together for fast training on a GPU can be very challenging with complex NLP models. James Bradbury comes on to tell us about Matchbox, his recent effort to provide a framework for automatic batching with pytorch. In the discussion, we talk about why batching is hard, why it's important, how other people have tried to solve this problem in the past, and what James' solution to the problem is. Code is available here: https://github.com/salesforce/matchbox
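
    To make the "why batching is hard" point concrete, this is the kind of padding-and-masking bookkeeping you end up writing by hand without an autobatching tool (a generic sketch, not Matchbox code):

    ```python
    import torch

    def pad_and_mask(sequences, pad_id=0):
        # Pack variable-length token-id lists into one rectangular tensor,
        # keeping a boolean mask so padded positions can be ignored later.
        max_len = max(len(seq) for seq in sequences)
        batch = torch.full((len(sequences), max_len), pad_id, dtype=torch.long)
        mask = torch.zeros(len(sequences), max_len, dtype=torch.bool)
        for i, seq in enumerate(sequences):
            batch[i, :len(seq)] = torch.tensor(seq)
            mask[i, :len(seq)] = True
        return batch, mask

    batch, mask = pad_and_mask([[5, 2, 9], [7, 1], [3, 3, 3, 3]])
    # batch has shape (3, 4); every later model component has to respect the
    # mask, which is exactly the bookkeeping autobatching tries to hide.
    ```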

  • 54 - Simulating Action Dynamics with Neural Process Networks, with Antoine Bosselut

    26/03/2018 Duration: 36min

    ICLR 2018 paper, by Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, and Yejin Choi. This is not your standard NLP task. This work tries to predict which entities change state over the course of a recipe (e.g., ingredients get combined into a batter, so entities merge, and then the batter gets baked, changing location, temperature, and "cookedness"). We talk to Antoine about the work, getting into details about how the data was collected, how the model works, and what some possible future directions are. https://www.semanticscholar.org/paper/Simulating-Action-Dynamics-with-Neural-Process-Bosselut-Levy/dc01c9401d1caab7f5e6d2f1280f5815f6919977

  • 53 - Classical Structured Prediction Losses for Sequence to Sequence Learning, with Sergey and Myle

    21/03/2018 Duration: 26min

    NAACL 2018 paper, by Sergey Edunov, Myle Ott, Michael Auli, David Grangier, and Marc'Aurelio Ranzato, from Facebook AI Research. In this episode we continue our theme from last episode on structured prediction, talking with Sergey and Myle about their paper. They did a comprehensive set of experiments comparing many prior structured learning losses, applied to neural seq2seq models. We talk about the motivation for their work, what turned out to work well, and some details about some of their loss functions. They introduced a notion of a "pseudo reference", replacing the target output sequence with the highest scoring output on the beam during decoding, and we talk about some of the implications there. It also turns out that minimizing expected risk was the best overall training procedure that they found for these structured models. https://www.semanticscholar.org/paper/Classical-Structured-Prediction-Losses-for-Sequence-Edunov-Ott/20ae11c08c6b0cd567c486ba20f44bc677f2ed23
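
    As a rough picture of the expected-risk objective that came out on top, here is a generic sketch of sequence-level risk minimization over a candidate set such as a beam (our own illustration, not the authors' implementation):

    ```python
    import torch

    def expected_risk_loss(candidate_scores, candidate_costs):
        # candidate_scores: (batch, n_candidates) model log-scores for
        #   candidate output sequences (e.g. the contents of a beam).
        # candidate_costs:  (batch, n_candidates) task cost of each candidate
        #   against the reference, e.g. 1 - sentence-level BLEU.
        probs = torch.softmax(candidate_scores, dim=-1)  # renormalize over the candidates
        return (probs * candidate_costs).sum(dim=-1).mean()
    ```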

  • 52 - Sequence-to-Sequence Learning as Beam-Search Optimization, with Sam Wiseman

    15/03/2018 Duration: 23min

    EMNLP 2016 paper by Sam Wiseman and Sasha Rush. In this episode we talk with Sam about a paper from a couple of years ago on bringing back some ideas from structured prediction into neural seq2seq models. We talk about the classic problems in structured prediction of exposure bias, label bias, and locally normalized models, how people used to solve these problems, and how we can apply those solutions to modern neural seq2seq architectures using a technique that Sam and Sasha call Beam Search Optimization. (Note: while we said in the episode that BSO with beam size of 2 is equivalent to a token-level hinge loss, that's not quite accurate; it's close, but there are some subtle differences.) https://www.semanticscholar.org/paper/Sequence-to-Sequence-Learning-as-Beam-Search-Optim-Wiseman-Rush/28703eef8fe505e8bd592ced3ce52a597097b031
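
    For intuition, here is a very rough sketch of the kind of sequence-level margin penalty behind Beam Search Optimization: during training, a decoding step is penalized when the gold prefix fails to outscore the lowest-ranked prefix kept on the beam by a margin. This is our own illustration and glosses over the beam resets and other details from the episode:

    ```python
    import torch

    def beam_margin_penalty(gold_prefix_score, beam_prefix_scores, margin=1.0):
        # gold_prefix_score: scalar tensor, score of the gold prefix so far.
        # beam_prefix_scores: (beam_size,) tensor of prefix scores on the beam.
        worst_kept = beam_prefix_scores.min()
        return torch.clamp(margin + worst_kept - gold_prefix_score, min=0.0)
    ```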

  • 51 - A Regularized Framework for Sparse and Structured Neural Attention, with Vlad Niculae

    12/03/2018 Duration: 16min

    NIPS 2017 paper by Vlad Niculae and Mathieu Blondel. Vlad comes on to tell us about his paper. Attention weights are often computed in neural networks using a softmax operator, which maps scalar outputs from a model into a probability space over latent variables. There are lots of cases where this is not optimal, however, such as when you really want to encourage a sparse attention over your inputs, or when you have additional structural biases that could inform the model. Vlad and Mathieu have developed a theoretical framework for analyzing the options in this space, and in this episode we talk about that framework, some concrete instantiations of attention mechanisms from the framework, and how well these work.
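
    As one concrete point in that space, here is a sketch of the sparsemax operator, a sparse alternative to softmax that this framework recovers as a special case (our own illustrative implementation, not code from the paper):

    ```python
    import numpy as np

    def sparsemax(z):
        # Euclidean projection of the score vector onto the probability
        # simplex; unlike softmax, the result can contain exact zeros.
        z = np.asarray(z, dtype=float)
        z_sorted = np.sort(z)[::-1]
        cumsum = np.cumsum(z_sorted)
        k = np.arange(1, len(z) + 1)
        support = 1 + k * z_sorted > cumsum      # entries that stay nonzero
        k_z = k[support][-1]
        tau = (cumsum[support][-1] - 1) / k_z    # threshold subtracted from scores
        return np.maximum(z - tau, 0.0)

    print(sparsemax([2.0, 1.5, 0.1]))  # -> [0.75, 0.25, 0.0], a genuinely sparse attention
    ```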

  • 50 - Cardinal Virtues: Extracting Relation Cardinalities from Text, with Paramita Mirza

    14/02/2018 Duration: 27min

    ACL 2017 paper, by Paramita Mirza, Simon Razniewski, Fariz Darari, and Gerhard Weikum. There's not a whole lot of work on numbers in NLP, and getting good information out of numbers expressed in text can be challenging. In this episode, Paramita comes on to tell us about her efforts to use distant supervision to learn models that extract relation cardinalities from text. That is, given an entity and a relation in a knowledge base, like "Barack Obama" and "has child", the goal is to extract _how many_ related entities there are (in this case, two). There are a lot of challenges in getting this to work well, and Paramita describes some of those, and how she solved them. https://www.semanticscholar.org/paper/Cardinal-Virtues-Extracting-Relation-Cardinalities-Mirza-Razniewski/01afba9f40e0df06446b9cd3d5ea8725c4ba1342

  • 49 - A Joint Sequential and Relational Model for Frame-Semantic Parsing, with Bishan Yang

    05/02/2018 Duration: 26min

    EMNLP 2017 paper by Bishan Yang and Tom Mitchell. Bishan tells us about her experiments on frame-semantic parsing / semantic role labeling, which is trying to recover the predicate-argument structure from natural language sentences, as well as categorize those structures into a pre-defined event schema (in the case of frame-semantic parsing). Bishan had two interesting ideas here: (1) use a technique similar to model distillation to combine two different model structures (her "sequential" and "relational" models), and (2) use constraints on arguments across frames in the same sentence to get a more coherent global labeling of the sentence. We talk about these contributions, and also touch on "open" versus "closed" semantics, in both predicate-argument structure and information extraction. https://www.semanticscholar.org/paper/A-Joint-Sequential-and-Relational-Model-for-Frame-Yang-Mitchell/a1deb609e3758519cbe3f1a542bdf1ea52b6f224

  • 48 - Incidental Supervision: Moving Beyond Supervised Learning, with Dan Roth

    29/01/2018 Duration: 27min

    AAAI 2017 paper, by Dan Roth. In this episode we have a conversation with Dan about what he means by "incidental supervision", and how it's related to ideas in reinforcement learning and representation learning. For many tasks, there are signals you can get from seemingly unrelated data that will help you in making predictions. Leveraging the international news cycle to learn transliteration models for named entities is one example of this, as is the current trend in NLP of using language models or other multi-task signals to do better representation learning for your end task. Dan argues that we need to be thinking about this more explicitly in our research, instead of learning everything "end-to-end", as we will never have enough data to learn complex tasks directly from annotations alone. https://www.semanticscholar.org/paper/Incidental-Supervision-Moving-beyond-Supervised-Le-Roth/2997dcfc6d5ffc262d57d0a26f74d091de096573

  • 47 - Dynamic integration of background knowledge in neural NLU systems, with Dirk Weißenborn

    24/01/2018 Duration: 35min

    How should you incorporate background knowledge into a neural net? A lot of people have been thinking about this problem, and Dirk Weissenborn comes on to tell us about his work in this area. Paper is with Tomáš Kočiský and Chris Dyer. https://arxiv.org/abs/1706.02596

  • 46 - Parsing with Traces, with Jonathan Kummerfeld

    08/01/2018 Duration: 39min

    TACL 2017 paper by Jonathan K. Kummerfeld and Dan Klein. Jonathan tells us about his work on parsing algorithms that capture traces and null elements in sentence structure. We spend the first third of the conversation talking about what these are and why they are interesting - if you want to correctly handle wh-movement, or coordinating structures, or control structures, or many other phenomena that we commonly see in language, you really want to handle traces and null elements, but most current parsers totally ignore these phenomena. The second third of the conversation is about how the parser works, and we conclude by talking about some of the implications of the work, and where to go next - should we really be pushing harder on capturing linguistic structure when everyone seems to be going towards end-to-end learning on some higher-level task? https://www.semanticscholar.org/paper/Parsing-with-Traces-An-O-n-4-Algorithm-and-a-Struc-Kummerfeld-Klein/af89e56b3d9b720d43cae9f4971928c5cb95cbe3 Jonathan also

  • 45 - Build It, Break It workshop, with Allyson Ettinger and Sudha Rao

    02/01/2018 Duration: 37min

    How robust is your NLP system? High numbers on common datasets can be misleading, as most systems are easily fooled by small modifications that would not be hard for humans to understand. Allyson Ettinger, Sudha Rao, Hal Daumé III, and Emily Bender organized a workshop trying to characterize this issue, inviting participants to either build robust systems, or try to break them with targeted examples. Allyson and Sudha come on the podcast to talk about the workshop. We cover the motivation of the workshop, what a "minimal pair" is, what tasks the workshop focused on and why, and what the main takeaways of the workshop were. https://www.semanticscholar.org/paper/Towards-Linguistically-Generalizable-NLP-Systems-A-Ettinger-Rao/8472e999f723a9ccaffc6089b7be1865d8a1b863

page 5 of 8