The NLP Cypher | 12.26.21

AI Summer is Out Forever

Merry Christmas 🎄 for those celebrating. And Happy New Year!

Even OpenAI is feeling the holiday spirit: they open-sourced their photorealistic GLIDE model several days ago.

Includes three notebooks:

  • The text2im notebook shows how to use GLIDE (filtered) with classifier-free guidance to produce images conditioned on text prompts.
  • The inpaint notebook shows how to use GLIDE (filtered) to fill in a masked region of an image, conditioned on a text prompt.
  • The clip_guided notebook shows how to use GLIDE (filtered) + a filtered noise-aware CLIP model to produce images conditioned on text prompts.
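
For intuition: classifier-free guidance is usually described as extrapolating from the model's unconditional noise prediction toward its text-conditioned one. Below is a minimal sketch of that blend on plain Python lists. The function name and the toy numbers are made up for illustration, and none of this is taken from the GLIDE notebooks themselves.

```python
def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Blend conditional and unconditional noise predictions.

    Hypothetical helper: eps_cond and eps_uncond stand in for the model's
    predictions with and without the text prompt (flat lists of floats).
    guidance_scale = 1 returns the conditional prediction unchanged;
    larger values push further toward the prompt.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy values only; a real model would produce image-sized tensors.
cond = [1.0, 2.0]
uncond = [0.0, 1.0]
guided = classifier_free_guidance(cond, uncond, guidance_scale=3.0)
print(guided)
```

A scale of 0 ignores the prompt entirely, which is a quick sanity check if you try this with real predictions.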

Parallel Inference with Adapters

A new feature in the adapters library for running inference with several adapters simultaneously. (Not sure if "parallizing" is a real word, I just made it up.)

Colab of the Week 🎉🥳

SetFit: Outperforming GPT-3 in Few-Shot Text-Classification


No more Transformer Diagrams 😂

Abhishek maps boring model diagrams to code for building intuition!

AGI and the Gov’t Apathy


Periodic Table of NLP Tasks

Streamlit demo…


JellyFish is a library for approximate & phonetic matching of strings.

Algos used…

For string comparison:

  • Levenshtein Distance
  • Damerau-Levenshtein Distance
  • Jaro Distance
  • Jaro-Winkler Distance
  • Match Rating Approach Comparison
  • Hamming Distance
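
To make two of the metrics above concrete, here is a rough pure-Python sketch of Levenshtein and Hamming distance. These are textbook versions written for illustration, not Jellyfish's own code or API; the function names are hypothetical, and a library may treat edge cases (such as unequal lengths for Hamming) differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("classic Hamming distance needs equal lengths")
    return sum(ca != cb for ca, cb in zip(a, b))


print(levenshtein("kitten", "sitting"))
print(hamming("karolin", "kathrin"))
```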

For phonetic encoding:

  • American Soundex
  • Metaphone
  • NYSIIS (New York State Identification and Intelligence System)
  • Match Rating Codex
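
As a taste of what phonetic encoding does, below is a small sketch of American Soundex, the first algorithm in the list above. It follows the commonly published rules (keep the first letter, map consonants to digits, collapse repeats, pad to four characters). The function name is hypothetical and this is not Jellyfish's implementation, so outputs for unusual input may differ from the library's.

```python
def soundex(name: str) -> str:
    """Sketch of American Soundex: one letter plus three digits."""
    groups = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}

    letters = [c for c in name.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")  # code of the last coded letter
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")  # vowels and Y get no code
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code counts again
    return result.ljust(4, "0")


# Names that sound alike should share a code.
print(soundex("Robert"), soundex("Rupert"))
```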

New Speech Models from Microsoft on 🤗 Hub


… a model pruning toolkit for pre-trained language models.

Deep Learning in NLP YouTube Lectures

Play the Shannon Game With Language Models

A new summarization evaluation metric called the Shannon Score is proposed. It performs the Shannon Game with a language model.




Next Level


Papers to Read 📚

Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

PECOS — Predictions for Enormous and Correlated Output Spaces

PECOS is a machine learning framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval.

Connected Papers 📈

Exploring Neural Models for Query-Focused Summarization

A systematic exploration of neural approaches to query summarization, considering two general classes of methods: two-stage extractive-abstractive solutions and end-to-end models.

Connected Papers 📈

Randomised Controlled Trial Abstract Result Tabulator

RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations.

Connected Papers 📈




Subscribe to the NLP Cypher newsletter for the latest in NLP & ML code/research. 🤟

Ricky Costa