NLP News Cypher | 01.05.20
Where Eagles Dare…
“Before the first frost, a brilliant flash of blue-green light lit the snow and reminded us that winter was almost here.” — GPT-2 (1st iteration)
It’s 2020, can you believe it? It’s been 19 years since the monolith created a baby in space!
But seriously, it is 2020, and in my opinion, one of the best outcomes from the recent advancements in NLP/deep learning is the ease of fine-tuning and inference with only a couple lines of code:
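The code embed that used to sit here is gone, so here's a stand-in sketch of what that ease looks like, using Hugging Face's transformers pipeline API (one popular route as of early 2020 — assumes transformers is installed, and the default pre-trained model downloads on first use; the example sentence is mine):

```python
# Inference in two lines with the transformers pipeline API (v2.3+).
from transformers import pipeline

# Loads a default pre-trained sentiment-analysis model on first use.
classifier = pipeline("sentiment-analysis")

result = classifier("Fine-tuning NLP models has never been easier!")
print(result)  # a list with one dict containing a 'label' and a 'score'
```

Swap the task string ("question-answering", "ner", etc.) and you get a different pre-trained capability with the same two lines.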
Happy New Year!
This Week:
AI Recap For the New Year
The Common Voice
GPT-2 for the Twitter
The Italian BERT
RASA and the Community
Where is AI Going?
Keras for OCR
Yann Goes Deep
AI Recap For the New Year
And if you need a recap on all things deep learning, check out this repo, which highlights everything from preprocessing to transfer learning in notebooks.
Top models and libraries in use today:
The Common Voice
For those looking to dive into the speech-enabled app world: check out Mozilla's amazing set of audio datasets (multilingual, too!).
GPT-2 for the Twitter
If you are looking to fine-tune a GPT-2 text generator on the text of a Twitter account, you first need your data arranged in the appropriate format. Max's repo gives us this, and afterwards you can use his other repo, GPT-2 Simple, to generate the text!
GPT-2-simple GitHub:
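That "appropriate format" boils down to one plain-text training file. Here's a rough, hypothetical sketch of that preprocessing step (the function name and the cleaning rules are my own, not from Max's repo; `<|endoftext|>` is GPT-2's actual end-of-text token):

```python
import re

def tweets_to_training_file(tweets, path="tweets.txt", delimiter="<|endoftext|>"):
    """Strip URLs and @-mentions, then join tweets with GPT-2's end-of-text token."""
    cleaned = []
    for tweet in tweets:
        text = re.sub(r"https?://\S+", "", tweet)  # drop links
        text = re.sub(r"@\w+", "", text).strip()   # drop mentions
        if text:                                   # skip tweets that are now empty
            cleaned.append(text)
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"\n{delimiter}\n".join(cleaned))
    return len(cleaned)
```

Point GPT-2 Simple's fine-tuning at the resulting file and you're off.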
The Italian BERT
In one of our Cyphers back in November, I joked about how Mr. Di Sipio wanted an Italian BERT:
Well, we have one now! Turns out it’s called GilBERTo (Sorry BERTini)! And its architecture is based on RoBERTa:
GitHub:
RASA and the Community
From RASA, the open-source conversational AI platform, you can now see how peeps are deploying dialogue systems (aka el chatbots) with their framework. Cool page for seeing how people are managing the chatbot hype.
Where is AI Going?
Top brass shares their thoughts on AI’s path. NLP gets a big shout out.
For what it’s worth, in 2020 I expect to see more multi-modal learning (merging pictures/video with text) research and newer datasets. I find there isn’t enough entropy in raw text to model the world. In addition, expect more deployments in languages other than English and more symbolic/connectionist integration (aka deep knowledge graphs).
Let’s see what they say:
Keras for OCR
Hey, remember OCR? (pip install pytesseract) Well, text detection matters if you want to convert images of text into digital text. Check out this wonderful repo with a Keras implementation of a Convolutional Recurrent Neural Network (CRNN).
In addition, its performance is robust 👇!
GitHub (he looks happy):
Yann Goes Deep
DeepMind:
Yann LeCunn:
Me:
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
If you enjoyed this article, help us out and share it with friends or on social media!
For complete coverage, follow our Twitter: @Quantum_Stat