NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
The NLP Cypher | 12.26.21
AI Summer is Out Forever
Merry Christmas 🎄 for those celebrating. And Happy New Year!
Even OpenAI is feeling the holiday spirit: they open-sourced their photorealistic GLIDE model several days ago.
Includes three notebooks:
- The text2im notebook shows how to use GLIDE (filtered) with classifier-free guidance to produce images conditioned on text prompts.
- The inpaint notebook shows how to use GLIDE (filtered) to fill in a masked region of an image, conditioned on a text prompt.
- The clip_guided notebook shows how to use GLIDE (filtered) + a filtered noise-aware CLIP model to produce images conditioned on text prompts.
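The key trick in the text2im notebook, classifier-free guidance, is simple enough to sketch in a few lines. This is a minimal illustration of the standard guidance formula only, not GLIDE's actual code; the function name and the list-based "tensors" are assumptions for readability.

```python
# Illustrative sketch of classifier-free guidance (NOT GLIDE's API).
# At each diffusion step the model predicts the noise twice -- once
# conditioned on the text prompt, once on an empty prompt -- and the
# guided estimate pushes the sample toward the conditional prediction.
def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Blend conditional and unconditional noise predictions:
    eps = eps_uncond + s * (eps_cond - eps_uncond)."""
    return [
        u + guidance_scale * (c - u)
        for c, u in zip(eps_cond, eps_uncond)
    ]

# With scale 1.0 the guided estimate is just the conditional prediction;
# larger scales exaggerate the direction the prompt pulls in.
print(classifier_free_guidance([1.0, 2.0], [0.5, 1.0], 1.0))  # [1.0, 2.0]
print(classifier_free_guidance([1.0, 2.0], [0.5, 1.0], 3.0))  # [2.0, 4.0]
```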
Parallel Inference with Adapters
A new feature in the adapters library for running inference with multiple adapters in parallel.
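Conceptually, parallel inference fans a single encoded input out through several adapters in one pass and returns one output per adapter. The sketch below illustrates only that idea in pure Python; the real adapters library does this inside the transformer layers, and every name here is made up.

```python
# Conceptual sketch of parallel adapter inference (NOT the library's API):
# one shared input, several lightweight heads, one result per adapter.
def parallel_adapter_inference(hidden_state, adapters):
    """Run one shared hidden state through several named adapter
    callables and collect the per-adapter outputs."""
    return {name: adapter(hidden_state) for name, adapter in adapters.items()}

# Two toy "adapters" reading the same encoder output.
outputs = parallel_adapter_inference(
    [0.1, 0.9],
    {"sentiment": max, "toxicity": min},
)
print(outputs)  # {'sentiment': 0.9, 'toxicity': 0.1}
```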
Colab of the Week 🎉🥳
SetFit: Outperforming GPT-3 in Few-Shot Text-Classification
Colab
No more Transformer Diagrams 😂
Abhishek maps boring model diagrams to code for building intuition!
AGI and the Gov’t Apathy
lol
Periodic Table of NLP Tasks
Streamlit demo…
JellyFish
JellyFish is a library for approximate & phonetic matching of strings.
Algorithms included:
For string comparison:
- Levenshtein Distance
- Damerau-Levenshtein Distance
- Jaro Distance
- Jaro-Winkler Distance
- Match Rating Approach Comparison
- Hamming Distance
For phonetic encoding:
- American Soundex
- Metaphone
- NYSIIS (New York State Identification and Intelligence System)
- Match Rating Codex
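To build intuition for the first two distances above, here is a from-scratch sketch of Hamming and Levenshtein distance. This is for illustration only, not jellyfish's implementation (the library exposes these as ready-made functions).

```python
# Educational re-implementations of two of the string-comparison
# algorithms listed above; jellyfish provides optimized versions.
def hamming(a, b):
    """Count positions where the strings differ; any length
    difference also counts as differences."""
    dist = abs(len(a) - len(b))
    dist += sum(x != y for x, y in zip(a, b))
    return dist

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions turning a into b (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

print(hamming("karolin", "kathrin"))     # 3
print(levenshtein("kitten", "sitting"))  # 3
```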
New Speech Models from Microsoft on 🤗 Hub
TextPruner
… a model pruning toolkit for pre-trained language models.
Deep Learning in NLP YouTube Lectures
Play the Shannon Game With Language Models
Proposes a new summarization evaluation metric, the Shannon Score, computed by playing the Shannon Game with a language model: a summary scores well if conditioning on it reduces the information the model needs to reconstruct the source document.
Paper: https://arxiv.org/pdf/2103.10918.pdf
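Roughly, the Shannon Game asks how many fewer bits a language model needs to encode a document once it has seen the summary. The toy sketch below captures only that information-difference idea, using an add-one unigram counter as a stand-in "language model"; it is a drastic simplification of the paper's setup, and every name in it is illustrative.

```python
import math
from collections import Counter

# Toy stand-in for a conditional LM: an add-one-smoothed unigram
# model fit to the context tokens. The real Shannon Score uses a
# large pretrained language model instead.
def info_content(doc_tokens, context_tokens, vocab_size=10_000):
    """Bits needed to encode doc_tokens under a unigram model
    conditioned on (i.e., fit to) context_tokens."""
    counts = Counter(context_tokens)
    total = sum(counts.values())
    bits = 0.0
    for tok in doc_tokens:
        p = (counts[tok] + 1) / (total + vocab_size)
        bits -= math.log2(p)
    return bits

doc = "the cat sat on the mat".split()
summary = "cat on mat".split()

# A relevant summary lowers the bits needed to encode the document.
with_summary = info_content(doc, summary)
without_summary = info_content(doc, [])
print(with_summary < without_summary)  # True
```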
FakeYou
attrs
Next Level
Demo
Papers to Read 📚
Repo Cypher 👨💻
A collection of recently released repos that caught our 👁
PECOS — Predictions for Enormous and Correlated Output Spaces
PECOS is a machine learning framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval.
Exploring Neural Models for Query-Focused Summarization
A systematic exploration of neural approaches to query summarization, considering two general classes of methods: two-stage extractive-abstractive solutions and end-to-end models.
Contrastive Pruning
A pruning framework that aims to preserve both task-specific and task-agnostic knowledge during pruning.
Randomised Controlled Trial Abstract Result Tabulator
RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations.
VALSE
A task-independent benchmark for vision and language models centered on linguistic phenomena.