The Titan’s Goblet | Cole

The NLP Cypher | 12.26.21

AI Summer is Out Forever

Merry Christmas 🎄 for those celebrating. And Happy New Year!

Even OpenAI is feeling the holiday spirit: they open sourced their photorealistic GLIDE model several days ago.

The release includes three notebooks:

  • The text2im notebook shows how to use GLIDE (filtered) with classifier-free guidance to produce images conditioned on text prompts.
  • The inpaint notebook shows how to use GLIDE (filtered) to fill in a masked region of an image, conditioned on a text prompt.
  • The clip_guided notebook shows how to use GLIDE (filtered) + a filtered noise-aware CLIP model to produce images conditioned on text prompts.
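For intuition on the classifier-free guidance step mentioned above, here is a minimal sketch of the usual blending rule: the model predicts noise once with the text prompt and once with an empty prompt, and the two predictions are extrapolated by a guidance scale. The function name, the plain-list inputs, and the numbers are all illustrative assumptions; this is not code from the GLIDE repo.

```python
def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Blend conditional and unconditional noise predictions.

    eps_cond:   noise predicted with the text prompt (hypothetical values)
    eps_uncond: noise predicted with an empty prompt (hypothetical values)
    A scale of 1.0 returns the conditional prediction unchanged;
    larger scales push the result further toward the prompt.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy three-element "noise predictions", made up for illustration.
eps_text = [0.5, -1.0, 2.0]
eps_empty = [0.0, -1.0, 1.0]

guided = classifier_free_guidance(eps_text, eps_empty, guidance_scale=3.0)
print(guided)  # [1.5, -1.0, 4.0]
```

In a real sampler the same blend is applied to full image-shaped tensors at every denoising step.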

Parallel Inference with Adapters

A new feature in the adapters library for running inference with several adapters simultaneously. (Not sure if "parallizing" is a real word; I just made it up.)

Colab of the Week 🎉🥳

SetFit: Outperforming GPT-3 in Few-Shot Text-Classification

Colab

No more Transformer Diagrams 😂

Abhishek maps boring model diagrams to code for building intuition!

AGI and the Gov’t Apathy

lol

Periodic Table of NLP Tasks

Streamlit demo…

JellyFish

JellyFish is a library for approximate & phonetic matching of strings.

Algos used…

For string comparison:

  • Levenshtein Distance
  • Damerau-Levenshtein Distance
  • Jaro Distance
  • Jaro-Winkler Distance
  • Match Rating Approach Comparison
  • Hamming Distance

For phonetic encoding:

  • American Soundex
  • Metaphone
  • NYSIIS (New York State Identification and Intelligence System)
  • Match Rating Codex
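To give a feel for what a few of these algorithms actually compute, here is a small pure-Python sketch of Levenshtein distance, Hamming distance, and American Soundex. These are textbook versions written for illustration, not JellyFish's own implementation, and the function names are just chosen for this example; the library's handling of edge cases (for instance strings of unequal length in Hamming distance) may differ.

```python
def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def hamming_distance(a, b):
    """Number of differing positions; this sketch requires equal lengths."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


# Letter-to-digit table for American Soundex.
_SOUNDEX_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(name):
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    last_code = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != last_code:
            encoded += code
        last_code = code  # vowels reset the previous code to ""
        if len(encoded) == 4:
            break
    return encoded.ljust(4, "0")


print(levenshtein_distance("kitten", "sitting"))  # 3
print(hamming_distance("karolin", "kathrin"))     # 3
print(soundex("Robert"), soundex("Rupert"))       # R163 R163
```

The Soundex example shows the point of phonetic encoding: two differently spelled names that sound alike map to the same code.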

New Speech Models from Microsoft on 🤗 Hub

TextPruner

… a model pruning toolkit for pre-trained language models.

Deep Learning in NLP YouTube Lectures

Play the Shannon Game With Language Models

A new summarization evaluation metric called the Shannon Score is proposed. It performs the Shannon Game with a language model.

Paper: https://arxiv.org/pdf/2103.10918.pdf
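A rough sketch of how a score like this can be computed, assuming (this is my reading of the paper, so check it against the PDF) that the metric compares how much a summary helps a language model predict the document against how much the document itself would help. The function names are hypothetical, and the log-likelihoods below are made-up numbers standing in for what a real language model would return.

```python
def information_difference(logp_doc_given_summary, logp_doc):
    """How many nats of log-likelihood the summary adds when predicting the document."""
    return logp_doc_given_summary - logp_doc


def shannon_score(logp_doc, logp_doc_given_summary, logp_doc_given_doc):
    """Assumed form: gain from the summary divided by gain from the document itself.

    logp_doc:               log p(document) with no context
    logp_doc_given_summary: log p(document | summary)
    logp_doc_given_doc:     log p(document | document), the best-case context
    """
    gain = information_difference(logp_doc_given_summary, logp_doc)
    max_gain = information_difference(logp_doc_given_doc, logp_doc)
    return gain / max_gain


# Made-up log-likelihoods, for illustration only.
score = shannon_score(
    logp_doc=-400.0,
    logp_doc_given_summary=-300.0,
    logp_doc_given_doc=-150.0,
)
print(score)  # 0.4
```

Under this assumed form, a score near 0 means the summary barely helps the model guess the document, and a score near 1 means it helps almost as much as having the document itself.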

FakeYou

attrs

Next Level

Demo

Papers to Read 📚

https://arxiv.org/pdf/2112.12731.pdf
https://arxiv.org/pdf/2112.10508.pdf
https://arxiv.org/pdf/2112.04426.pdf
https://arxiv.org/pdf/2112.11739.pdf

Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

PECOS — Predictions for Enormous and Correlated Output Spaces

PECOS is a machine learning framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval.

Connected Papers 📈

Exploring Neural Models for Query-Focused Summarization

A systematic exploration of neural approaches to query-focused summarization, considering two general classes of methods: two-stage extractive-abstractive solutions and end-to-end models.

Connected Papers 📈

Randomised Controlled Trial Abstract Result Tabulator

RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations.

Connected Papers 📈

Ricky Costa

Subscribe to the NLP Cypher newsletter for the latest in NLP & ML code/research. 🤟