Photo by Kai Gradert on Unsplash

NLP News Cypher | 11.03.19

Ricky Costa
3 min read · Nov 3, 2019

Google Creates BERT, Google Adopts BERT

Summarization Meets Fact Checking

Batch Inference vs. Online Inference

Summary of Machine Learning Evaluation Metrics

Goal-Oriented Dialogue + Knowledge Base

Chitchat Dialogue is Hard!

Quick note: Today the EMNLP conference gets underway through Nov. 7th. Quantum Stat will be dishing out research news and other highlights from top NLP researchers on our Twitter feed HERE. Personally, I’m excited to see what comes out of the recent trend of coupling language models with knowledge graphs, and further advancements in distilling large transformers.

Google Creates BERT, Google Adopts BERT

BERT, Google’s transformer, is now running on its creator’s search engine, where it may impact 10% of all queries. BERT will be used to better serve longer search queries that require more contextual understanding of natural language. Wonder how this will impact the way users conduct search going forward? If Google sees an uptick in question answering, BERT is a win for search. Time will tell.

Summarization Meets Fact Checking

In this @Salesforce Research paper, Richard Socher and colleagues introduce a weakly supervised model for fact-checking summarized sentences against the source document.

The chart below shows the fragility of summarization and how it doesn’t take much to change the semantics of a transformed sentence.

https://arxiv.org/pdf/1910.12840.pdf
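As a rough illustration of the task framing (my sketch, not the paper’s model), factual consistency checking can be posed as sentence-pair classification over (source document, summary sentence) pairs. Below, a generic off-the-shelf NLI model stands in for the paper’s weakly supervised classifier; the model name and example texts are assumptions for illustration only.

```python
# Hypothetical sketch: score whether a summary sentence is consistent with its
# source text. A generic NLI model stands in for the paper's own weakly
# supervised BERT-based classifier.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"  # stand-in sentence-pair classifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

source = "The company reported a 10% drop in revenue for the third quarter."
claim = "Revenue rose 10% in the third quarter."  # a semantically flipped summary

# Encode the (source, claim) pair and pick the highest-scoring label.
inputs = tokenizer(source, claim, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[int(logits.argmax(dim=-1))])  # e.g. "CONTRADICTION"
```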

Batch Inference vs. Online Inference

When taking your next machine learning model to production, you must understand the trade-offs between batch (static) inference and online (dynamic) inference. The latter is harder: you must maintain high-quality inference for your users in real time. So when do you choose one over the other?

“If the predictions do not need to be served immediately, you may opt for the simplicity of batch inference. If predictions need to be served on an individual basis and within the time of a single web request, online inference is the way to go.”

Learn more here:
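As a rough sketch of the difference (not taken from the linked article), batch inference scores a whole table on a schedule while online inference scores one record inside a web request. The model file, feature handling, and Flask endpoint below are assumptions for illustration.

```python
# Hypothetical sketch: batch (offline) scoring vs. online (per-request) scoring.
# Assumes a pre-trained scikit-learn-style model saved as "model.pkl".
import pickle

import pandas as pd
from flask import Flask, jsonify, request

with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# --- Batch inference: score an entire table on a schedule, store results. ---
def run_batch_job(input_csv: str, output_csv: str) -> None:
    features = pd.read_csv(input_csv)
    features["prediction"] = model.predict(features)
    features.to_csv(output_csv, index=False)  # downstream systems read this later

# --- Online inference: score a single record within one web request. ---
app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    record = request.get_json()          # e.g. {"feature_1": 3.2, "feature_2": 0}
    row = pd.DataFrame([record])
    prediction = model.predict(row)[0]
    return jsonify({"prediction": float(prediction)})

if __name__ == "__main__":
    app.run()
```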

Summary of Machine Learning Evaluation Metrics

This article from FloydHub discusses the most popular machine learning metrics for model evaluation. Below are the metrics in question:

  • Confusion Matrix
  • Accuracy
  • Precision
  • Recall
  • Precision-Recall Curve
  • F1-Score
  • Area Under the Curve (AUC)
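For a concrete sense of how these are computed, here’s a minimal scikit-learn sketch (my example, not FloydHub’s code) on toy binary labels and predicted scores:

```python
# Minimal sketch of the metrics above for a binary classifier, using scikit-learn.
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_recall_curve,
    precision_score,
    recall_score,
    roc_auc_score,
)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                       # ground-truth labels
y_scores = [0.1, 0.4, 0.8, 0.35, 0.9, 0.2, 0.6, 0.7]    # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in y_scores]       # threshold at 0.5

print(confusion_matrix(y_true, y_pred))   # [[TN FP], [FN TP]]
print(accuracy_score(y_true, y_pred))     # (TP + TN) / total
print(precision_score(y_true, y_pred))    # TP / (TP + FP)
print(recall_score(y_true, y_pred))       # TP / (TP + FN)
print(f1_score(y_true, y_pred))           # harmonic mean of precision and recall
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
print(roc_auc_score(y_true, y_scores))    # area under the ROC curve
```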

Goal-Oriented Dialogue + Knowledge Base Research

Meet the Neural Assistant: an AI that takes the user utterance alongside knowledge base triples to help generate a KB-guided response from the assistant. Below is an example conversation for restaurant search:

User: “Find me an inexpensive Italian restaurant in San Francisco”

(KB Triple: The Great Italian, cuisine, Italian)

Agent Response: “How about The Great Italian?”

At the moment, knowledge bases with more than 2,000 triples negatively impact the model’s performance.

“The model is able to incorporate external knowledge effectively as long as the KB size is 2000 triples or smaller.”
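As a rough sketch of the input the assistant conditions on (the exact encoding below is my assumption, not the paper’s), each KB fact is a (subject, relation, object) triple paired with the user utterance:

```python
# Hypothetical sketch of pairing a user utterance with KB triples as model input.
# The actual Neural Assistant encoding differs; this only shows the data shape.
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def build_model_input(utterance: str, kb: List[Triple]) -> str:
    # Flatten each triple into text and append it to the utterance.
    facts = " ; ".join(f"{s} | {r} | {o}" for s, r, o in kb)
    return f"user: {utterance} [KB] {facts}"

kb = [("The Great Italian", "cuisine", "Italian")]  # triple from the example above
print(build_model_input(
    "Find me an inexpensive Italian restaurant in San Francisco", kb))
```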

Chitchat Dialogue is Hard!

Rasa, which open-sources its own chatbot framework, recently released the video archive of presenters from its recent dev conference. Below, Nouha Dziri from Google AI shares the difficulty of benchmarking dialogue quality and offers options for workarounds. Stanford & Facebook AI’s work, which we recently deployed at ai.quantumstat.com, was discussed!

Finally, this past Friday we got a chance to check out Sebastian Ruder’s (DeepMind) presentation at NYU.


This column is a weekly round-up of NLP News and Code drops from researchers worldwide.

Follow us on Twitter for NLP News, Code & Demos: @Quantum_Stat

www.quantumstat.com

Written by Ricky Costa

Subscribe to the NLP Cypher newsletter for the latest in NLP & ML code/research. 🤟
