NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
The NLP Cypher | 05.01.22
>>> curl -L http://git.io/unix
Hey, welcome back! I want to start off by giving a few shout-outs!!!
- Hectiq.AI hosted the big-yaml: Neural Magic's config for serving 19 BERT models simultaneously, all in under 16 GB of RAM! 😍 Demo
- If you want to know more about the demo above, you can read about it on KDnuggets and Towards AI. Thank you both! 🚀
- Scrape tweets with Twint and classify them with a Neural Magic sparse Transformer: Code 🧙‍♂️
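Why is "19 BERT models under 16 GB" worth a shout-out? Here is a rough back-of-envelope estimate. It assumes every model is BERT-base sized (about 110M parameters) and counts only dense weights. Activations, tokenizers, and runtime overhead are all ignored, so treat it as a sketch, not a measurement of the demo.

```python
# Back-of-envelope weight memory for serving several BERT-style models at once.
# Assumption: each model is BERT-base sized (~110M parameters).
# Only dense weights are counted; activations and runtime overhead are ignored.

BERT_BASE_PARAMS = 110_000_000
NUM_MODELS = 19


def weights_gb(num_params: int, bytes_per_param: int, num_models: int) -> float:
    """Total weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param * num_models / 1e9


fp32_total = weights_gb(BERT_BASE_PARAMS, 4, NUM_MODELS)  # 32-bit floats
int8_total = weights_gb(BERT_BASE_PARAMS, 1, NUM_MODELS)  # 8-bit integers

print(f"FP32: {fp32_total:.2f} GB, INT8: {int8_total:.2f} GB")
# prints "FP32: 8.36 GB, INT8: 2.09 GB"
```

Dense FP32 weights alone would eat roughly half of a 16 GB budget before a single request is processed. Pruning and quantization, which Neural Magic's sparse models rely on, are what leave room to spare.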
ICLR Happened:
Stanford
Meta
Amazon
Apple
DeepMind
FormNet: A New Model for Document Understanding
Gets SOTA performance on the CORD, FUNSD, and Payment benchmarks.
WaNLI: Generate Your Own NLI Dataset
Run Python in the Browser via HTML
The makers of Anaconda came out with this. 🍾❤️
It has only been tested on Chrome so far.
Code
DALL-E 2 | Performance and Limitations
Limitations of DALL-E 2 | a thread 🧵
DALL-E 2 PyTorch Implementation
Lucidrains for the win!
SEAL 🦭 Search Engines w/ Autoregressive LMs
Thread 🧵
Code
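As we understand it, SEAL treats any n-gram that appears in a document as an identifier for that document. An autoregressive LM generates n-grams under a constraint that keeps them inside the corpus, and documents are then ranked by the n-grams they contain. Below is a toy sketch of those two steps. The real system enforces the constraint with an FM-index and a trained seq2seq model. Here, a brute-force scan stands in for the index, the corpus is made up, and the "generated" n-grams and scores are hypothetical. The ranking is a plain sum of scores, which is simpler than the paper's weighting.

```python
from collections import defaultdict


def allowed_next_tokens(prefix, docs):
    """Tokens that extend `prefix` into an n-gram occurring somewhere in the corpus.

    SEAL enforces this with an FM-index; a brute-force scan stands in here.
    """
    n = len(prefix)
    allowed = set()
    for toks in docs:
        for i in range(len(toks) - n):
            if toks[i:i + n] == prefix:
                allowed.add(toks[i + n])
    return allowed


def contains(toks, ngram):
    """True if `ngram` appears as a contiguous run of tokens in `toks`."""
    n = len(ngram)
    return any(toks[i:i + n] == ngram for i in range(len(toks) - n + 1))


def rank_documents(docs, scored_ngrams):
    """Simplified ranking: sum the scores of the generated n-grams each document contains."""
    scores = defaultdict(float)
    for ngram, score in scored_ngrams:
        for doc_id, toks in enumerate(docs):
            if contains(toks, ngram):
                scores[doc_id] += score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus (made up for illustration).
docs = [
    "the fm index supports fast substring search".split(),
    "autoregressive models generate text one token at a time".split(),
    "search engines rank documents for a query".split(),
]

print(allowed_next_tokens(["substring"], docs))  # {'search'}

# Hypothetical n-grams and scores that a trained LM might emit for some query.
generated = [(["substring", "search"], 2.0), (["search"], 1.0)]
print(rank_documents(docs, generated))  # [(0, 3.0), (2, 1.0)]
```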
GPT-NeoX Annotated
DiffCSE — Meta’s New Sentence Embeddings Library
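As we understand the DiffCSE paper, it trains sentence embeddings with a SimCSE-style contrastive objective and adds a second objective that makes embeddings sensitive to edits of the sentence (a conditional replaced-token detector). The minimal sketch below shows only the contrastive (InfoNCE) part, on hand-made 2-D vectors instead of encoder outputs. The temperature of 0.05 is an assumption, and the difference-prediction objective is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def contrastive_loss(anchors, positives, temperature=0.05):
    """Average InfoNCE loss: anchor i should be most similar to positive i in the batch.

    Every other positive in the batch acts as an in-batch negative.
    """
    total = 0.0
    for i, h in enumerate(anchors):
        logits = [cosine(h, p) / temperature for p in positives]
        m = max(logits)  # subtract the max before exp() for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]  # -log softmax probability of the true pair
    return total / len(anchors)


# Toy "embeddings": aligned positives point the same way as their anchors.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]

print(contrastive_loss(anchors, aligned) < contrastive_loss(anchors, swapped))  # True
```

Training pushes the loss down, which pulls each sentence toward its own positive view and away from the other sentences in the batch.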
OpenAI's New CLIP Model
(a silent drop)
Get Stoic
Papers to Read 📚
NLP Index 👨‍💻
A collection of recently released repos that caught our 👁
LitMind Dictionary
An open-source online generative dictionary that takes a word and context containing the word as input and automatically generates a definition as output.
SkillSpan Repository
A novel skill extraction dataset consisting of 14.5K sentences and over 12.5K annotated spans from job postings.
Models
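Span-annotated datasets like SkillSpan are commonly turned into token-level BIO tags so that a standard sequence tagger can be trained on them. Below is a minimal sketch of that conversion. The sentence, the span offsets, and the SKILL/KNOWLEDGE label names are hypothetical examples, not taken from the dataset files. The function assumes spans are token-indexed, end-exclusive, and non-overlapping.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end_exclusive, label) token spans into BIO tags.

    Assumes spans are non-overlapping; overlapping spans would overwrite each other.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # tokens inside the span
    return tags


# Hypothetical job-posting sentence and spans.
tokens = "Experience with Python and strong communication skills".split()
spans = [(2, 3, "KNOWLEDGE"), (4, 7, "SKILL")]

print(list(zip(tokens, spans_to_bio(tokens, spans))))
```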
LM-Debugger
An interactive debugger tool for transformer-based LMs, which provides a fine-grained interpretation of the model’s internal prediction process.
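As we understand it, LM-Debugger builds on reading a model's internal vectors in vocabulary space. A vector, such as a feed-forward value vector, is projected through the embedding matrix, and the tokens it promotes most strongly are listed. Below is a toy sketch of that projection. The 4-word vocabulary, the 3-dimensional embedding matrix, and the "FFN value vector" are all made up for illustration, and no real model is involved.

```python
def project_to_vocab(vector, embedding, vocab, k=2):
    """Score each vocabulary item by dot product with `vector`; return the top-k tokens."""
    scores = [sum(v * e for v, e in zip(vector, row)) for row in embedding]
    ranked = sorted(range(len(vocab)), key=lambda i: scores[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]


# Hypothetical tiny vocabulary and embedding matrix (one row per token).
vocab = ["cat", "dog", "car", "truck"]
embedding = [
    [1.0, 0.0, 0.1],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.9, 0.3],
]

# Hypothetical FFN value vector; it points toward the "vehicle" direction.
ffn_value_vector = [0.0, 1.0, 0.5]
print(project_to_vocab(ffn_value_vector, embedding, vocab))  # ['car', 'truck']
```

In a real transformer, the same idea shows which tokens a given layer update pushes the prediction toward. That fine-grained view is what the debugger exposes interactively.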
SalesBot: Transitioning from Chit-Chat to Task-Oriented Dialogues
The first large-scale dataset of dialogues transitioning from chit-chat to task-oriented scenarios.
PLOD: An Abbreviation Detection Dataset
An abbreviation detection dataset for scientific documents.
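For a feel of the task PLOD targets, here is a classic rule-based baseline. This is not PLOD's method. It is a heavily simplified initial-letter heuristic, loosely in the spirit of Schwartz and Hearst: find an all-caps token in parentheses, then accept the preceding words as its long form if their first letters spell it out. Real abbreviations (mixed case, digits, skipped stop words) break this rule quickly, which is why learned models and datasets like PLOD exist.

```python
import re


def find_abbreviations(text):
    """Return (abbreviation, long_form) pairs using a simple initial-letter heuristic.

    Looks for "(ABBR)" with 2+ capital letters and checks whether the
    len(ABBR) words just before it start with the abbreviation's letters.
    """
    pairs = []
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        abbr = match.group(1)
        words = text[:match.start()].split()
        candidate = words[-len(abbr):]
        if len(candidate) == len(abbr) and all(
            w[0].lower() == c.lower() for w, c in zip(candidate, abbr)
        ):
            pairs.append((abbr, " ".join(candidate)))
    return pairs


sentence = "Natural Language Processing (NLP) and named entity recognition (NER) are fun."
print(find_abbreviations(sentence))
# [('NLP', 'Natural Language Processing'), ('NER', 'named entity recognition')]
```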