prompt: UFO in the Sky | AI Generated


The NLP Cypher | 12.12.21


Ricky Costa
Dec 12, 2021


Is Moore’s Law Finito?

NeurIPS Research Papers by Institution

Here’s a collection of NeurIPS papers from your favorite big tech companies and educational institutions.

A New and Blazing Fast WordPiece Tokenizer

GLaM | 1.2 Trillion Param Sparse Model

“The Generalist Language Model (GLaM), a trillion weight model that can be trained and served efficiently (in terms of computation and energy use) thanks to sparsity, and achieves competitive performance on multiple few-shot learning tasks. GLaM’s performance compares favorably to a dense language model, GPT-3 (175B) with significantly improved learning efficiency across 29 public NLP benchmarks in seven categories, spanning language completion, open-domain question answering, and natural language inference tasks.”

GLaM vs. GPT-3 on NLG and NLU Tasks

Awesome Takeaway:

This large sparse model is competitive with dense counterparts while training on much less data and consuming less energy.

Information Extraction from Scanned Receipts: Fine-tuning LayoutLM on SROIE

An OCR demo with LayoutLM fine-tuned for information extraction on receipt data.

AI Predictions Survey

Improving GitHub Search

Gopher | DeepMind’s Language Model

GauGAN2 | Photorealistic Text 2 Image

Transformers From Scratch

“I procrastinated a deep dive into transformers for a few years. Finally the discomfort of not knowing what makes them tick grew too great for me. Here is that …”

PyTorch | Julia (but not exactly like Julia)

“When trying to predict how PyTorch would itself get disrupted, we used to joke a bit about the next version of PyTorch being written in Julia. This was not very serious: a huge factor in moving PyTorch from Lua to Python was to tap into Python’s immense ecosystem (an ecosystem that shows no signs of going away) and even today it is still hard to imagine how a new language can overcome the network effects of Python.”

Decoding Text Generation Tutorial Top-K and Top-P

One of the most intuitive tutorials out there.
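For a quick feel of what the tutorial covers, here is a minimal sketch of the two filters in plain Python. It is not taken from the tutorial: the function names (`top_k_filter`, `top_p_filter`, `sample`) and the toy logits are made up for illustration, and it works on a single list of scores rather than a real model's output.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Top-K: keep the k most likely token ids, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Top-P (nucleus): keep the smallest set of most likely token ids
    whose cumulative probability reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def sample(dist, rng=random):
    """Draw one token id from a {token_id: probability} mapping."""
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


if __name__ == "__main__":
    # Hypothetical next-token scores for a 5-token vocabulary.
    toy_logits = [2.0, 1.5, 0.3, -1.0, -2.0]
    probs = softmax(toy_logits)
    print("top-k ids:", sorted(top_k_filter(probs, k=2)))
    print("top-p ids:", sorted(top_p_filter(probs, p=0.9)))
    print("sampled id:", sample(top_p_filter(probs, p=0.9)))
```

The practical difference: Top-K always keeps a fixed number of candidates, while Top-P keeps more candidates when the model is unsure and fewer when it is confident.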

Punctuation Model

Attention Neural Networks Slides



Lemmatize spaCy

spaCy’s new lemmatizer is super accurate and blows XLM-RoBERTa out of the water! This blog post presents inner workings, benchmarks and quick start snippets. 😎

Awesome Papers 📚

Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

Causal Distillation for Language Models

A distillation library that adds a third objective, encouraging the student to imitate the teacher's causal computation process through interchange intervention training (IIT).

Connected Papers 📈

CALVIN — A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

A simulated benchmark for learning long-horizon, language-conditioned tasks. The aim is to enable agents that can solve many robotic manipulation tasks over a long horizon, using only onboard sensors, with the tasks specified only via human language.

Connected Papers 📈




Subscribe to the NLP Cypher newsletter for the latest in NLP & ML code/research. 🤟