Assumption of the Virgin | Correggio

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

The NLP Cypher | 07.11.21

Plata o Plomo

5 min read · Jul 11, 2021


Welcome back! Hope you had a great week. We have a new leader on the SuperGLUE benchmark with a new ERNIE model from Baidu comprising 10 billion parameters trained on a 4TB corpus. FYI, the human baseline was already beaten by Microsoft’s DeBERTa model at the beginning of the year… time for a new SuperSuperGLUE benchmark???

Paper

The Codex Paper

BTW, if you are still interested in GitHub’s Copilot, I stumbled upon the Codex paper this week:

Paper

DeepMind’s Perceiver

DeepMind’s Perceiver is a transformer that can take a variety of modalities (vision, audio, text) as input and still achieve competitive benchmark performance. Usually a model architecture is specialized for a specific domain; what the Perceiver attempts is to generalize to any domain using a single architecture. 😎
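
To make the idea concrete, here is a minimal sketch (not DeepMind’s implementation) of the Perceiver’s core trick: a small, learned latent array cross-attends to a large, modality-agnostic input array and then refines itself with self-attention. All sizes below are illustrative assumptions.

```python
# Minimal sketch of the Perceiver's core idea (not DeepMind's code):
# a small, learned latent array cross-attends to a large, modality-agnostic
# input array, then refines itself with self-attention.
import torch
import torch.nn as nn

class PerceiverSketch(nn.Module):
    def __init__(self, input_dim=256, latent_dim=512, num_latents=128, heads=8):
        super().__init__()
        # Learned latent array: its size is fixed regardless of input length or modality.
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        self.input_proj = nn.Linear(input_dim, latent_dim)
        self.cross_attn = nn.MultiheadAttention(latent_dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(latent_dim, heads, batch_first=True)

    def forward(self, x):                          # x: (batch, seq_len, input_dim)
        kv = self.input_proj(x)                    # flattened pixels, audio samples, or token embeddings
        q = self.latents.unsqueeze(0).expand(x.size(0), -1, -1)
        # Cross-attention cost is O(num_latents * seq_len), not O(seq_len ** 2).
        z, _ = self.cross_attn(q, kv, kv)
        # Latent self-attention: cost is independent of the input length.
        z, _ = self.self_attn(z, z, z)
        return z                                   # (batch, num_latents, latent_dim)

# The same module ingests a "vision"-sized or "audio"-sized input once it is flattened.
out = PerceiverSketch()(torch.randn(2, 4096, 256))
print(out.shape)                                   # torch.Size([2, 128, 512])
```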

Paper

The Long-Short Transformer

Adding to the list of efficient transformers comes the LS-Transformer, which can be used for both autoregressive and bidirectional models, and for both language and vision domains. The model obtains SOTA results on the Long Range Arena, character-level language modeling, and ImageNet classification.

Paper:

LINK

Deep Learning Videos

170 video lectures from Sebastian Raschka’s 2021 deep learning course, taught using PyTorch.

Table of Contents

Python Deep Learning Notebooks

Jupyter notebooks implementing the code samples found in the book Deep Learning with Python, 2nd Edition.

Hugging Face’s Model Parallelism Intro

A conceptual intro to model parallelism touching on the techniques highlighted below. HF also notes which of these techniques are currently implemented in their library; a minimal sketch of the simplest one, DataParallel, follows the list.

  1. DataParallel (DP) — the same setup is replicated multiple times, with each replica fed a slice of the data. Processing happens in parallel and all replicas are synchronized at the end of each training step.
  2. TensorParallel (TP) — each tensor is split into multiple chunks, so instead of the whole tensor residing on a single GPU, each shard lives on its designated GPU. The shards are processed separately and in parallel on different GPUs, and the results are synced at the end of the step. This is what one might call horizontal parallelism, as the splitting happens on a horizontal level.
  3. PipelineParallel (PP) — the model is split vertically (at the layer level) across multiple GPUs, so that only one or a few layers of the model are placed on a single GPU. Each GPU processes a different stage of the pipeline in parallel, working on a small chunk of the batch.
  4. Zero Redundancy Optimizer (ZeRO) — also performs sharding of the tensors, somewhat similarly to TP, except that the whole tensor gets reconstructed in time for a forward or backward computation, so the model doesn’t need to be modified. It also supports various offloading techniques to compensate for limited GPU memory.
  5. Sharded DDP — another name for the foundational ZeRO concept, as used by various other implementations of ZeRO.
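
As referenced above, here is a minimal, hedged sketch of the first technique, DataParallel, in plain PyTorch. The toy model and batch are illustrative assumptions; TP, PP, and ZeRO require dedicated libraries (e.g. Megatron-LM, DeepSpeed) and are not shown.

```python
# Minimal DataParallel (DP) sketch in plain PyTorch; the model, batch, and
# sizes are illustrative assumptions, not code from the HF docs.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

if torch.cuda.device_count() > 1:
    # Replicates the model on every visible GPU; each replica is fed a slice
    # of the batch, and gradients are reduced onto the default device at the
    # end of each training step.
    model = nn.DataParallel(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.randn(64, 512, device=device)
logits = model(x)      # the forward pass is split across replicas transparently
loss = logits.sum()
loss.backward()        # gradients are gathered back onto one device
```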

Source

Faster Inference in Haystack’s QA System

Reducing the ‘top_k_retriever’ parameter is the trick here: it controls how many retrieved documents the (comparatively slow) reader model has to evaluate.
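
A toy illustration of why this works (the function names below are illustrative, not Haystack’s API): retrieval is cheap, but the reader runs a full transformer pass per candidate document, so reader time grows roughly linearly with ‘top_k_retriever’.

```python
# Toy sketch of the speed/recall trade-off: fewer retrieved candidates means
# proportionally less time spent in the expensive reader. Names are
# illustrative stand-ins, not Haystack's API.
import time

def retrieve(query, corpus, top_k_retriever):
    # stand-in for BM25/dense retrieval: rank documents by keyword overlap
    scored = sorted(corpus, key=lambda d: -len(set(query.split()) & set(d.split())))
    return scored[:top_k_retriever]

def read(query, doc):
    time.sleep(0.05)                      # stand-in for an expensive reader forward pass
    return max(doc.split(), key=len)      # dummy "answer span"

corpus = [f"document {i} about topic {i % 7}" for i in range(100)]

for k in (50, 5):                         # smaller top_k_retriever -> faster inference
    start = time.time()
    answers = [read("topic 3", d) for d in retrieve("topic 3", corpus, top_k_retriever=k)]
    print(f"top_k_retriever={k}: {time.time() - start:.2f}s, {len(answers)} candidate answers")
```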

Common Errors in Training Data

Blog post reviewing three situations where your data goes wrong; a quick label-distribution check for the second one follows the list:

  1. Labeling Errors
  2. Unbalanced Training Data
  3. Bias in Labeling Process
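
For unbalanced training data, a generic sanity check (a sketch of my own, not code from the post) is to inspect the label distribution before training and flag severely under-represented classes.

```python
# Quick check for unbalanced training data (generic sketch, not from the
# linked post): print the label distribution and flag rare classes.
from collections import Counter

labels = ["positive"] * 900 + ["negative"] * 80 + ["neutral"] * 20  # illustrative labels

counts = Counter(labels)
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label:>10}: {n:5d} ({n / total:.1%})")

# Flag anything below, say, 10% of a perfectly uniform share as suspicious.
uniform_share = 1 / len(counts)
rare = [label for label, n in counts.items() if n / total < 0.1 * uniform_share]
if rare:
    print("Warning: severely under-represented classes:", rare)
```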

Software Updates

spaCy 3.1:

Adapters 2.1.0:

Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

Power Law Graph Transformer

A new way to generalize and analyze data representations of a dataset’s graph structure while retaining the prediction capabilities of an attention-based encoder-decoder model.

Connected Papers 📈

Learned Token Pruning

Transformer inference scales quadratically with the input sequence length, which makes it difficult to use transformers for processing long sequences. Learned Token Pruning (LTP) is a method that reduces redundant tokens as the data passes through the transformer’s layers.
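
A condensed sketch of the idea (not the paper’s implementation): score each token by the attention it receives and drop tokens below a threshold. In LTP the threshold is learned per layer via a soft mask; here it is fixed, and all shapes are illustrative assumptions.

```python
# Sketch of attention-based token pruning in the spirit of LTP; the learned,
# per-layer threshold is replaced by a fixed scalar for illustration.
import torch

def prune_tokens(hidden, attn_probs, threshold):
    """
    hidden:     (batch, seq, dim)         token representations after a layer
    attn_probs: (batch, heads, seq, seq)  attention probabilities from that layer
    threshold:  scalar pruning threshold
    """
    # Importance of token j = attention it receives, averaged over heads and queries.
    importance = attn_probs.mean(dim=1).mean(dim=1)      # (batch, seq)
    importance[:, 0] = float("inf")                      # never prune the [CLS] token
    # Keep the same number of tokens per row so the result stays a dense tensor.
    k = int((importance >= threshold).sum(dim=1).max().item())
    idx = importance.topk(k, dim=1).indices.sort(dim=1).values   # preserve token order
    return torch.gather(hidden, 1, idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1)))

h = torch.randn(2, 128, 768)
a = torch.softmax(torch.randn(2, 12, 128, 128), dim=-1)
print(prune_tokens(h, a, threshold=1.0 / 128).shape)     # typically well under 128 tokens remain
```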

Connected Papers 📈

Keep It Simple (KiS)

An approach to unsupervised text simplification which learns to balance a reward across three properties: fluency, salience and simplicity.
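
To see how such a composite reward behaves, here is a toy sketch with crude stand-in scorers; combining them as a product (so a candidate that fails any one property scores near zero) is one natural choice, not necessarily the paper’s exact recipe.

```python
# Toy sketch of a fluency * salience * simplicity reward; the scorers are
# crude stand-ins for the learned/heuristic components used in the paper.
def fluency(text):                 # stand-in: penalize very short, fragmentary outputs
    return min(1.0, len(text.split()) / 8)

def salience(source, text):        # stand-in: word overlap with the source
    src, out = set(source.lower().split()), set(text.lower().split())
    return len(src & out) / max(1, len(src))

def simplicity(source, text):      # stand-in: reward shorter average word length
    avg = lambda s: sum(len(w) for w in s.split()) / max(1, len(s.split()))
    return min(1.0, avg(source) / max(1e-6, avg(text)))

def kis_reward(source, candidate):
    return fluency(candidate) * salience(source, candidate) * simplicity(source, candidate)

src = "The municipality promulgated an ordinance prohibiting vehicular traffic."
print(kis_reward(src, "The city passed a rule banning cars."))
print(kis_reward(src, "Cars."))    # simple, but neither fluent nor salient -> near-zero reward
```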

Connected Papers 📈

Every Sunday we do a round-up of NLP news and code drops from researchers around the world.

For complete coverage, follow our Twitter: @Quantum_Stat

Quantum Stat


Ricky Costa

Subscribe to the NLP Cypher newsletter for the latest in NLP & ML code/research. 🤟