NLP Highlights

  • Author: Various
  • Narrator: Various
  • Publisher: Podcast
  • Duration: 80:57:29

Synopsis

Discussing recent and interesting work related to natural language processing. Matt Gardner and Waleed Ammar, research scientists at the Allen Institute for Artificial Intelligence, give short discussions of papers, mostly in interviews with authors about their work.

Episodes

  • 64 - Neural Network Models for Sentence Pair Tasks, with Wuwei Lan and Wei Xu

    08/08/2018 Duration: 36min

    Best reproduction paper at COLING 2018, by Wuwei Lan and Wei Xu. This paper takes a bunch of models for sentence pair classification (including paraphrase identification, semantic textual similarity, natural language inference / entailment, and answer sentence selection for QA) and compares all of them on all tasks. There's a very nice table in the paper showing the cross product of models and datasets, and how, if you only look at the original papers, this table is almost empty; Wuwei and Wei fill in all of the missing values in that table with their own experiments. This is a very nice piece of work that helps us gain a broader understanding of how these models perform in diverse settings, and it's awesome that COLING explicitly asked for and rewarded this kind of paper, as it's not your typical "come look at my shiny new model!" paper. Our discussion with Wuwei and Wei covers what models and datasets the paper looked at, why the datasets can be treated similarly (and some reasons for why maybe they should be treated differently).

  • 63 - Neural Lattice Language Models, with Jacob Buckman

    02/08/2018 Duration: 30min

    TACL 2018 paper by Jacob Buckman and Graham Neubig. Jacob tells us about marginalizing over latent structure in a sentence by doing a clever parameterization of a lattice with a model kind of like a tree LSTM. This lets you treat collocations as multi-word units, or allow words to have multiple senses, without having to commit to a particular segmentation or word sense disambiguation up front. We talk about how this works and what comes out. One interesting result that comes out of the sense lattice: learning word senses from a language modeling objective tends to give you senses that capture the mode of the "next word" distribution, like uses of "bank" that are always followed by "of". Helpful for local perplexity, but not really what you want if you're looking for semantic senses, probably. https://www.semanticscholar.org/paper/Neural-Lattice-Language-Models-Buckman-Neubig/f36b961ea5106c19c341763bd9942c1f09038e5d
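
    To make the marginalization idea concrete, here is a generic sketch of the forward dynamic program that sums over every way of segmenting a sentence into (possibly multi-word) units. This is our own illustration of the general idea, not the paper's lattice parameterization, and unit_log_prob is a hypothetical scoring function:

    ```python
    import numpy as np

    def sentence_log_prob(tokens, unit_log_prob, max_unit_len=3):
        # alpha[i] is the log-marginal probability of the first i tokens,
        # summed over all segmentations into units of up to max_unit_len words.
        # The paper's model scores lattice edges with an LSTM-like network and
        # also handles multiple word senses, which this sketch omits.
        alpha = np.full(len(tokens) + 1, -np.inf)
        alpha[0] = 0.0
        for end in range(1, len(tokens) + 1):
            for start in range(max(0, end - max_unit_len), end):
                unit = tuple(tokens[start:end])
                alpha[end] = np.logaddexp(
                    alpha[end], alpha[start] + unit_log_prob(unit, tokens[:start]))
        return alpha[-1]
    ```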

  • 62 - Sounding Board: A User-Centric and Content-Driven Social Chatbot, with Hao Fang

    30/07/2018 Duration: 31min

    NAACL 2018 demo paper, by Hao Fang, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, and Mari Ostendorf. Sounding Board was the system that won the 2017 Amazon Alexa Prize, a competition to build a social chatbot that interacts with users as an Alexa skill. Hao comes on the podcast to tell us about the project. We talk for a little bit about how Sounding Board works, but spend most of the conversation talking about what these chatbots can do - the competition setup, some example interactions, the limits of current systems, and how chatbots might be more useful in the future. Even the best current systems seem pretty limited, but the potential future uses are compelling enough to warrant continued research. https://www.semanticscholar.org/paper/Sounding-Board%3A-A-User-Centric-and-Content-Driven-Fang-Cheng/b540fd427a02b19c6ea55dd7d9758ebf15ec3965

  • 61 - Neural Text Generation in Stories, with Elizabeth Clark and Yangfeng Ji

    23/07/2018 Duration: 30min

    NAACL 2018 Outstanding Paper by Elizabeth Clark, Yangfeng Ji, and Noah A. Smith. Both Elizabeth and Yangfeng come on the podcast to tell us about their work. This paper is an extension of an EMNLP 2017 paper by Yangfeng and co-authors that introduced a language model that included explicit entity representations. Elizabeth and Yangfeng take that model, improve it a bit, and use it for creative narrative generation, with some interesting applications. We talk a little bit about the model, but mostly about how the model was used to generate narrative text, how it was evaluated, and what other interesting applications there are of this idea. The punchline is that this model does a better job at generating coherent stories than other generation techniques, because it can track the entities in the story better. We've been experimenting with how we record the audio, trying to figure out how to get better audio quality. Sadly, this episode was a failed experiment, and there is a background hiss that we couldn't remove.

  • 60 - FEVER: a large-scale dataset for Fact Extraction and VERification, with James Thorne

    28/06/2018 Duration: 28min

    NAACL 2018 paper by James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. James tells us about his paper, where they created a dataset for fact checking. We talk about how this dataset relates to other datasets, why a new one was needed, how it was built, and how well the initial baseline does on this task. There are some interesting side notes on bias in dataset construction, and on how "fact checking" relates to "fake news" ("fake news" could mean that an article is actively trying to deceive or mislead you; "fact checking" here is just determining if a single claim is true or false given a corpus of assumed-correct reference material). The baseline system does quite poorly, and the lowest-hanging fruit seems to be in improving the retrieval component that finds relevant supporting evidence for claims. There's a workshop and shared task coming up on this dataset: http://fever.ai/. The shared task test period starts on July 24th - get your systems ready! https://www.semanticscho

  • 59 - Weakly Supervised Semantic Parsing With Abstract Examples, with Omer Goldman

    12/06/2018 Duration: 35min

    ACL 2018 paper by Omer Goldman, Veronica Latcinnik, Udi Naveh, Amir Globerson, and Jonathan Berant. Omer comes on to tell us about a class project (done mostly by undergraduates!) that made it into ACL. Omer and colleagues built a semantic parser that gets state-of-the-art results on the Cornell Natural Language Visual Reasoning dataset. They did this by using "abstract examples" - they replaced the entities in the questions and corresponding logical forms with their types, labeled about a hundred examples in this abstracted formalism, and used those labels to do data augmentation and train their parser. They also used some interesting caching tricks, and a discriminative reranker. https://www.semanticscholar.org/paper/Weakly-supervised-Semantic-Parsing-with-Abstract-Goldman-Latcinnik/5aec2ab5bf2979da067e2aa34762b589a0680030

  • 58 - Learning What’s Easy: Fully Differentiable Neural Easy-First Taggers, with André Martins

    08/06/2018 Duration: 47min

    EMNLP 2017 paper by André F. T. Martins and Julia Kreutzer. André comes on the podcast to talk to us about the paper. We spend the bulk of the time talking about the two main contributions of the paper: how they applied the notion of "easy first" decoding to neural taggers, and the details of the constrained softmax that they introduced to accomplish this. We conclude that "easy first" might not be the right name for this - it's doing something that in the end is very similar to stacked self-attention, with standard independent decoding at the end. The particulars of the self-attention are inspired by "easy first", however, using a constrained softmax to enforce some novel constraints on the self-attention. https://www.semanticscholar.org/paper/Learning-What's-Easy%3A-Fully-Differentiable-Neural-Martins-Kreutzer/252571243aa4c0b533aa7fc63f88d07fd844e7bb
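
    For a feel for the constrained softmax, here is our rough sketch of the idea as we understand it: stay as close as possible to an ordinary softmax while respecting a per-element upper bound (the attention "budget" each position has left), clamping any entry that would overflow and renormalizing the rest. This is an illustrative reading, not the authors' exact formulation:

    ```python
    import numpy as np

    def constrained_softmax(scores, upper):
        # Start from an ordinary softmax; clamp any entry that exceeds its
        # upper bound, then redistribute the leftover probability mass over
        # the unclamped entries in proportion to exp(score).
        scores = np.asarray(scores, dtype=float)
        upper = np.asarray(upper, dtype=float)
        exp_s = np.exp(scores - scores.max())
        clamped = np.zeros(len(scores), dtype=bool)
        p = np.zeros(len(scores))
        while True:
            free_mass = 1.0 - upper[clamped].sum()
            p[~clamped] = free_mass * exp_s[~clamped] / exp_s[~clamped].sum()
            p[clamped] = upper[clamped]
            violations = (~clamped) & (p > upper)
            if not violations.any():
                return p
            clamped |= violations

    # The first position has only 0.05 of its budget left, so it is capped
    # there and its mass shifts to the other positions:
    print(constrained_softmax([1.0, 2.0, 3.0], upper=[0.05, 1.0, 1.0]))
    # -> approximately [0.05, 0.26, 0.69]
    ```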

  • 57 - A Survey Of Cross-lingual Word Embedding Models, with Sebastian Ruder

    05/06/2018 Duration: 31min

    Upcoming JAIR paper by Sebastian Ruder, Ivan Vulić, and Anders Søgaard. Sebastian comes on to tell us about his survey. He creates a typology of cross-lingual word embedding methods, and we discuss why you might use cross-lingual embeddings (low-resource languages in particular), what information they capture (semantics? syntax? both?), how the methods work (lots of different ways), and how to evaluate the embeddings (best when you have an extrinsic task to evaluate on). https://www.semanticscholar.org/paper/A-survey-of-cross-lingual-embedding-models-Ruder/3dbd28c63a7807280c9531735c715d4598024166

  • 56 - Deep contextualized word representations, with Matthew Peters

    04/04/2018 Duration: 30min

    NAACL 2018 paper, by Matt Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Chris Clark, Kenton Lee, and Luke Zettlemoyer. In this episode, AI2's own Matt Peters comes on the show to talk about his recent work on ELMo embeddings, what some have called "the next word2vec". Matt has shown very convincingly that using a pre-trained bidirectional language model to get contextualized word representations performs substantially better than using static word vectors. He comes on the show to give us some more intuition about how and why this works, and to talk about some of the other things he tried and what's coming next. https://www.semanticscholar.org/paper/Deep-contextualized-word-representations-Peters-Neumann/4b17597b856c087f109381ce77d60d9017cb6f9a
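
    The recipe Matt describes is simple at the point of use: run the pre-trained bidirectional language model over the sentence, then let the end task learn how to mix the biLM's layers. A minimal sketch of that mixing step (our own illustration, not the authors' code) looks roughly like this:

    ```python
    import torch
    import torch.nn as nn

    class ScalarMix(nn.Module):
        # Learns a softmax-normalized weight per biLM layer plus a global
        # scale, and returns the weighted sum of the layer representations.
        def __init__(self, num_layers):
            super().__init__()
            self.layer_weights = nn.Parameter(torch.zeros(num_layers))
            self.gamma = nn.Parameter(torch.ones(1))

        def forward(self, layer_reps):
            # layer_reps: list of (batch, seq_len, dim) tensors, one per
            # layer of the pre-trained bidirectional language model.
            weights = torch.softmax(self.layer_weights, dim=0)
            mixed = sum(w * h for w, h in zip(weights, layer_reps))
            return self.gamma * mixed
    ```

    The downstream task model then consumes this mixed vector alongside (or instead of) static word vectors.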

  • 55 - Matchbox: Dispatch-driven autobatching for imperative deep learning, with James Bradbury

    28/03/2018 Duration: 31min

    In this episode, we take a more systems-oriented approach to NLP, looking at issues with writing deep learning code for NLP models. As a lot of people have discovered over the last few years, efficiently batching multiple examples together for fast training on a GPU can be very challenging with complex NLP models. James Bradbury comes on to tell us about Matchbox, his recent effort to provide a framework for automatic batching with pytorch. In the discussion, we talk about why batching is hard, why it's important, how other people have tried to solve this problem in the past, and what James' solution to the problem is. Code is available here: https://github.com/salesforce/matchbox
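
    To make the "why batching is hard" point concrete, this is the kind of padding-and-masking bookkeeping you end up writing by hand without an autobatching tool (a generic sketch, not Matchbox code):

    ```python
    import torch

    def pad_and_mask(sequences, pad_id=0):
        # Pack variable-length token-id lists into one rectangular tensor,
        # keeping a boolean mask so padded positions can be ignored later.
        max_len = max(len(seq) for seq in sequences)
        batch = torch.full((len(sequences), max_len), pad_id, dtype=torch.long)
        mask = torch.zeros(len(sequences), max_len, dtype=torch.bool)
        for i, seq in enumerate(sequences):
            batch[i, :len(seq)] = torch.tensor(seq)
            mask[i, :len(seq)] = True
        return batch, mask

    batch, mask = pad_and_mask([[5, 2, 9], [7, 1], [3, 3, 3, 3]])
    # batch has shape (3, 4); every later model component has to respect the
    # mask, which is exactly the bookkeeping autobatching tries to hide.
    ```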

  • 54 - Simulating Action Dynamics with Neural Process Networks, with Antoine Bosselut

    26/03/2018 Duration: 36min

    ICLR 2018 paper, by Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, and Yejin Choi. This is not your standard NLP task. This work tries to predict which entities change state over the course of a recipe (e.g., ingredients get combined into a batter, so entities merge, and then the batter gets baked, changing location, temperature, and "cookedness"). We talk to Antoine about the work, getting into details about how the data was collected, how the model works, and what some possible future directions are. https://www.semanticscholar.org/paper/Simulating-Action-Dynamics-with-Neural-Process-Bosselut-Levy/dc01c9401d1caab7f5e6d2f1280f5815f6919977

  • 53 - Classical Structured Prediction Losses for Sequence to Sequence Learning, with Sergey and Myle

    21/03/2018 Duration: 26min

    NAACL 2018 paper, by Sergey Edunov, Myle Ott, Michael Auli, David Grangier, and Marc'Aurelio Ranzato, from Facebook AI Research. In this episode we continue our theme from last episode on structured prediction, talking with Sergey and Myle about their paper. They did a comprehensive set of experiments comparing many prior structured learning losses, applied to neural seq2seq models. We talk about the motivation for their work, what turned out to work well, and some details about some of their loss functions. They introduced a notion of a "pseudo reference", replacing the target output sequence with the highest scoring output on the beam during decoding, and we talk about some of the implications there. It also turns out that minimizing expected risk was the best overall training procedure that they found for these structured models. https://www.semanticscholar.org/paper/Classical-Structured-Prediction-Losses-for-Sequence-Edunov-Ott/20ae11c08c6b0cd567c486ba20f44bc677f2ed23
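
    As a rough picture of the expected-risk objective that came out on top, here is a generic sketch of sequence-level risk minimization over a candidate set such as a beam (our own illustration, not the authors' implementation):

    ```python
    import torch

    def expected_risk_loss(candidate_scores, candidate_costs):
        # candidate_scores: (batch, n_candidates) model log-scores for
        #   candidate output sequences (e.g. the contents of a beam).
        # candidate_costs:  (batch, n_candidates) task cost of each candidate
        #   against the reference, e.g. 1 - sentence-level BLEU.
        probs = torch.softmax(candidate_scores, dim=-1)  # renormalize over the candidates
        return (probs * candidate_costs).sum(dim=-1).mean()
    ```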

  • 52 - Sequence-to-Sequence Learning as Beam-Search Optimization, with Sam Wiseman

    15/03/2018 Duration: 23min

    EMNLP 2016 paper by Sam Wiseman and Sasha Rush. In this episode we talk with Sam about a paper from a couple of years ago on bringing back some ideas from structured prediction into neural seq2seq models. We talk about the classic problems in structured prediction of exposure bias, label bias, and locally normalized models, how people used to solve these problems, and how we can apply those solutions to modern neural seq2seq architectures using a technique that Sam and Sasha call Beam Search Optimization. (Note: while we said in the episode that BSO with beam size of 2 is equivalent to a token-level hinge loss, that's not quite accurate; it's close, but there are some subtle differences.) https://www.semanticscholar.org/paper/Sequence-to-Sequence-Learning-as-Beam-Search-Optim-Wiseman-Rush/28703eef8fe505e8bd592ced3ce52a597097b031
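
    For intuition, here is a very rough sketch of the kind of sequence-level margin penalty behind Beam Search Optimization: during training, a decoding step is penalized when the gold prefix fails to outscore the lowest-ranked prefix kept on the beam by a margin. This is our own illustration and glosses over the beam resets and other details from the episode:

    ```python
    import torch

    def beam_margin_penalty(gold_prefix_score, beam_prefix_scores, margin=1.0):
        # gold_prefix_score: scalar tensor, score of the gold prefix so far.
        # beam_prefix_scores: (beam_size,) tensor of prefix scores on the beam.
        worst_kept = beam_prefix_scores.min()
        return torch.clamp(margin + worst_kept - gold_prefix_score, min=0.0)
    ```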

  • 51 - A Regularized Framework for Sparse and Structured Neural Attention, with Vlad Niculae

    12/03/2018 Duration: 16min

    NIPS 2017 paper by Vlad Niculae and Mathieu Blondel. Vlad comes on to tell us about his paper. Attention weights are often computed in neural networks using a softmax operator, which maps scalar outputs from a model into a probability space over latent variables. There are lots of cases where this is not optimal, however, such as when you really want to encourage a sparse attention over your inputs, or when you have additional structural biases that could inform the model. Vlad and Mathieu have developed a theoretical framework for analyzing the options in this space, and in this episode we talk about that framework, some concrete instantiations of attention mechanisms from the framework, and how well these work.
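
    As one concrete point in that space, here is a sketch of the sparsemax operator, a sparse alternative to softmax that this framework recovers as a special case (our own illustrative implementation, not code from the paper):

    ```python
    import numpy as np

    def sparsemax(z):
        # Euclidean projection of the score vector onto the probability
        # simplex; unlike softmax, the result can contain exact zeros.
        z = np.asarray(z, dtype=float)
        z_sorted = np.sort(z)[::-1]
        cumsum = np.cumsum(z_sorted)
        k = np.arange(1, len(z) + 1)
        support = 1 + k * z_sorted > cumsum      # entries that stay nonzero
        k_z = k[support][-1]
        tau = (cumsum[support][-1] - 1) / k_z    # threshold subtracted from scores
        return np.maximum(z - tau, 0.0)

    print(sparsemax([2.0, 1.5, 0.1]))  # -> [0.75, 0.25, 0.0], a genuinely sparse attention
    ```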

  • 50 - Cardinal Virtues: Extracting Relation Cardinalities from Text, with Paramita Mirza

    14/02/2018 Duration: 27min

    ACL 2017 paper, by Paramita Mirza, Simon Razniewski, Fariz Darari, and Gerhard Weikum. There's not a whole lot of work on numbers in NLP, and getting good information out of numbers expressed in text can be challenging. In this episode, Paramita comes on to tell us about her efforts to use distant supervision to learn models that extract relation cardinalities from text. That is, given an entity and a relation in a knowledge base, like "Barack Obama" and "has child", the goal is to extract _how many_ related entities there are (in this case, two). There are a lot of challenges in getting this to work well, and Paramita describes some of those, and how she solved them. https://www.semanticscholar.org/paper/Cardinal-Virtues-Extracting-Relation-Cardinalities-Mirza-Razniewski/01afba9f40e0df06446b9cd3d5ea8725c4ba1342

  • 49 - A Joint Sequential and Relational Model for Frame-Semantic Parsing, with Bishan Yang

    05/02/2018 Duration: 26min

    EMNLP 2017 paper by Bishan Yang and Tom Mitchell. Bishan tells us about her experiments on frame-semantic parsing / semantic role labeling, which is trying to recover the predicate-argument structure from natural language sentences, as well as categorize those structures into a pre-defined event schema (in the case of frame-semantic parsing). Bishan had two interesting ideas here: (1) use a technique similar to model distillation to combine two different model structures (her "sequential" and "relational" models), and (2) use constraints on arguments across frames in the same sentence to get a more coherent global labeling of the sentence. We talk about these contributions, and also touch on "open" versus "closed" semantics, in both predicate-argument structure and information extraction. https://www.semanticscholar.org/paper/A-Joint-Sequential-and-Relational-Model-for-Frame-Yang-Mitchell/a1deb609e3758519cbe3f1a542bdf1ea52b6f224

  • 48 - Incidental Supervision: Moving Beyond Supervised Learning, with Dan Roth

    29/01/2018 Duration: 27min

    AAAI 2017 paper, by Dan Roth. In this episode we have a conversation with Dan about what he means by "incidental supervision", and how it's related to ideas in reinforcement learning and representation learning. For many tasks, there are signals you can get from seemingly unrelated data that will help you in making predictions. Leveraging the international news cycle to learn transliteration models for named entities is one example of this, as is the current trend in NLP of using language models or other multi-task signals to do better representation learning for your end task. Dan argues that we need to be thinking about this more explicitly in our research, instead of learning everything "end-to-end", as we will never have enough data to learn complex tasks directly from annotations alone. https://www.semanticscholar.org/paper/Incidental-Supervision-Moving-beyond-Supervised-Le-Roth/2997dcfc6d5ffc262d57d0a26f74d091de096573

  • 47 - Dynamic integration of background knowledge in neural NLU systems, with Dirk Weißenborn

    24/01/2018 Duration: 35min

    How should you incorporate background knowledge into a neural net? A lot of people have been thinking about this problem, and Dirk Weissenborn comes on to tell us about his work in this area. Paper is with Tomáš Kočiský and Chris Dyer. https://arxiv.org/abs/1706.02596

  • 46 - Parsing with Traces, with Jonathan Kummerfeld

    08/01/2018 Duration: 39min

    TACL 2017 paper by Jonathan K. Kummerfeld and Dan Klein. Jonathan tells us about his work on parsing algorithms that capture traces and null elements in sentence structure. We spend the first third of the conversation talking about what these are and why they are interesting - if you want to correctly handle wh-movement, or coordinating structures, or control structures, or many other phenomena that we commonly see in language, you really want to handle traces and null elements, but most current parsers totally ignore these phenomena. The second third of the conversation is about how the parser works, and we conclude by talking about some of the implications of the work, and where to go next - should we really be pushing harder on capturing linguistic structure when everyone seems to be going towards end-to-end learning on some higher-level task? https://www.semanticscholar.org/paper/Parsing-with-Traces-An-O-n-4-Algorithm-and-a-Struc-Kummerfeld-Klein/af89e56b3d9b720d43cae9f4971928c5cb95cbe3 Jonathan also

  • 45 - Build It, Break It workshop, with Allyson Ettinger and Sudha Rao

    02/01/2018 Duration: 37min

    How robust is your NLP system? High numbers on common datasets can be misleading, as most systems are easily fooled by small modifications that would not be hard for humans to understand. Allyson Ettinger, Sudha Rao, Hal Daumé III, and Emily Bender organized a workshop trying to characterize this issue, inviting participants to either build robust systems, or try to break them with targeted examples. Allyson and Sudha come on the podcast to talk about the workshop. We cover the motivation of the workshop, what a "minimal pair" is, what tasks the workshop focused on and why, and what the main takeaways of the workshop were. https://www.semanticscholar.org/paper/Towards-Linguistically-Generalizable-NLP-Systems-A-Ettinger-Rao/8472e999f723a9ccaffc6089b7be1865d8a1b863

page 5 of 8