NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
The NLP Cypher | 12.26.21
AI Summer is Out Forever
Merry Christmas 🎄 for those celebrating. And Happy New Year!
Even OpenAI is feeling the holiday spirit: they open-sourced their photorealistic GLIDE model several days ago.
Includes three notebooks:
- The text2im notebook shows how to use GLIDE (filtered) with classifier-free guidance to produce images conditioned on text prompts.
- The inpaint notebook shows how to use GLIDE (filtered) to fill in a masked region of an image, conditioned on a text prompt.
- The clip_guided notebook shows how to use GLIDE (filtered) + a filtered noise-aware CLIP model to produce images conditioned on text prompts.
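The key trick in the text2im notebook, classifier-free guidance, is simple enough to sketch in a few lines. This is a minimal illustration of the standard guidance formula only, not GLIDE's actual code; the function name and the list-based "tensors" are assumptions for readability.

```python
# Illustrative sketch of classifier-free guidance (NOT GLIDE's API).
# At each diffusion step the model predicts the noise twice -- once
# conditioned on the text prompt, once on an empty prompt -- and the
# guided estimate pushes the sample toward the conditional prediction.
def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Blend conditional and unconditional noise predictions:
    eps = eps_uncond + s * (eps_cond - eps_uncond)."""
    return [
        u + guidance_scale * (c - u)
        for c, u in zip(eps_cond, eps_uncond)
    ]

# With scale 1.0 the guided estimate is just the conditional prediction;
# larger scales exaggerate the direction the prompt pulls in.
print(classifier_free_guidance([1.0, 2.0], [0.5, 1.0], 1.0))  # [1.0, 2.0]
print(classifier_free_guidance([1.0, 2.0], [0.5, 1.0], 3.0))  # [2.0, 4.0]
```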
Parallel Inference with Adapters
A new feature in the adapters library for running inference with multiple adapters in parallel.
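Conceptually, parallel inference fans a single encoded input out through several adapters in one pass and returns one output per adapter. The sketch below illustrates only that idea in pure Python; the real adapters library does this inside the transformer layers, and every name here is made up.

```python
# Conceptual sketch of parallel adapter inference (NOT the library's API):
# one shared input, several lightweight heads, one result per adapter.
def parallel_adapter_inference(hidden_state, adapters):
    """Run one shared hidden state through several named adapter
    callables and collect the per-adapter outputs."""
    return {name: adapter(hidden_state) for name, adapter in adapters.items()}

# Two toy "adapters" reading the same encoder output.
outputs = parallel_adapter_inference(
    [0.1, 0.9],
    {"sentiment": max, "toxicity": min},
)
print(outputs)  # {'sentiment': 0.9, 'toxicity': 0.1}
```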
Colab of the Week 🎉🥳
SetFit: Outperforming GPT-3 in Few-Shot Text-Classification
Colab
No more Transformer Diagrams 😂
Abhishek maps boring model diagrams to code for building intuition!
AGI and the Gov’t Apathy
lol
Periodic Table of NLP Tasks
Streamlit demo…
JellyFish
JellyFish is a library for approximate & phonetic matching of strings.
Algorithms included:
For string comparison:
- Levenshtein Distance
- Damerau-Levenshtein Distance
- Jaro Distance
- Jaro-Winkler Distance
- Match Rating Approach Comparison
- Hamming Distance
For phonetic encoding:
- American Soundex
- Metaphone
- NYSIIS (New York State Identification and Intelligence System)
- Match Rating Codex
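To build intuition for the first two distances above, here is a from-scratch sketch of Hamming and Levenshtein distance. This is for illustration only, not jellyfish's implementation (the library exposes these as ready-made functions).

```python
# Educational re-implementations of two of the string-comparison
# algorithms listed above; jellyfish provides optimized versions.
def hamming(a, b):
    """Count positions where the strings differ; any length
    difference also counts as differences."""
    dist = abs(len(a) - len(b))
    dist += sum(x != y for x, y in zip(a, b))
    return dist

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions turning a into b (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

print(hamming("karolin", "kathrin"))     # 3
print(levenshtein("kitten", "sitting"))  # 3
```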
New Speech Models from Microsoft on 🤗 Hub
TextPruner
… a model pruning toolkit for pre-trained language models.
Deep Learning in NLP YouTube Lectures
Play the Shannon Game With Language Models
Proposes a new summarization evaluation metric, the Shannon Score, computed by playing the Shannon Game with a language model: a summary scores well if conditioning on it reduces the information the model needs to reconstruct the source document.
Paper: https://arxiv.org/pdf/2103.10918.pdf
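Roughly, the Shannon Game asks how many fewer bits a language model needs to encode a document once it has seen the summary. The toy sketch below captures only that information-difference idea, using an add-one unigram counter as a stand-in "language model"; it is a drastic simplification of the paper's setup, and every name in it is illustrative.

```python
import math
from collections import Counter

# Toy stand-in for a conditional LM: an add-one-smoothed unigram
# model fit to the context tokens. The real Shannon Score uses a
# large pretrained language model instead.
def info_content(doc_tokens, context_tokens, vocab_size=10_000):
    """Bits needed to encode doc_tokens under a unigram model
    conditioned on (i.e., fit to) context_tokens."""
    counts = Counter(context_tokens)
    total = sum(counts.values())
    bits = 0.0
    for tok in doc_tokens:
        p = (counts[tok] + 1) / (total + vocab_size)
        bits -= math.log2(p)
    return bits

doc = "the cat sat on the mat".split()
summary = "cat on mat".split()

# A relevant summary lowers the bits needed to encode the document.
with_summary = info_content(doc, summary)
without_summary = info_content(doc, [])
print(with_summary < without_summary)  # True
```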
FakeYou
attrs
Next Level
Demo
Papers to Read 📚
Repo Cypher 👨💻
A collection of recently released repos that caught our 👁
PECOS — Predictions for Enormous and Correlated Output Spaces
PECOS is a machine learning framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval.
Exploring Neural Models for Query-Focused Summarization
A systematic exploration of neural approaches to query summarization, considering two general classes of methods: two-stage extractive-abstractive solutions and end-to-end models.
Contrastive Pruning
A pruning framework that aims to preserve both task-specific and task-agnostic knowledge during pruning.
Randomised Controlled Trial Abstract Result Tabulator
RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations.
VALSE
A task-independent benchmark for vision and language models centered on linguistic phenomena.