NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
The NLP Cypher | 12.26.21
AI Summer is Out Forever
Merry Christmas 🎄 for those celebrating. And Happy New Year!
Even OpenAI is feeling the holiday spirit: several days ago they open-sourced a small, filtered version of their photorealistic GLIDE model.
GitHub - openai/glide-text2im: GLIDE: a diffusion-based text-conditional image synthesis model
This is the official codebase for running the small, filtered-data GLIDE model from GLIDE: Towards Photorealistic Image…
The repo includes three notebooks:
- One shows how to use GLIDE (filtered) with classifier-free guidance to produce images conditioned on text prompts.
- One shows how to use GLIDE (filtered) to fill in a masked region of an image, conditioned on a text prompt.
- One shows how to use GLIDE (filtered) + a filtered noise-aware CLIP model to produce images conditioned on text prompts.
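Classifier-free guidance (the first notebook) mixes two noise predictions from the same diffusion model: one conditioned on the text prompt and one conditioned on an empty prompt. The GLIDE paper extrapolates from the unconditional prediction toward the conditional one by a guidance scale s. Below is a minimal NumPy sketch of just that combination step; the function and array names are hypothetical and this is not code from the glide-text2im repo.

```python
import numpy as np


def classifier_free_guidance(eps_cond: np.ndarray,
                             eps_uncond: np.ndarray,
                             guidance_scale: float) -> np.ndarray:
    """Combine conditional and unconditional noise predictions.

    eps_hat = eps_uncond + s * (eps_cond - eps_uncond)
    s = 1 gives back the plain conditional prediction; s > 1 pushes the
    sample further toward the text prompt.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


# Toy 2x2 "noise predictions" standing in for real model outputs.
eps_c = np.array([[1.0, 0.0], [0.5, -0.5]])
eps_u = np.zeros((2, 2))
guided = classifier_free_guidance(eps_c, eps_u, guidance_scale=3.0)
```

In a real sampler the two predictions are typically produced in one batched forward pass (real prompt plus empty prompt) at every denoising step.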
Parallel Inference with Adapters
A new feature in the adapters library lets you run inference with several adapters simultaneously, i.e., parallelizing adapters.
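Why is this cheap? The frozen base model's computation can be shared, and each adapter only adds a small bottleneck on top. The NumPy sketch below is a deliberate simplification and not the library's API: it applies one hypothetical bottleneck adapter per task to a single set of shared hidden states, whereas real adapters sit inside every Transformer layer and the library handles the batching for you.

```python
import numpy as np


def bottleneck_adapter(h, w_down, w_up):
    """Down-project, ReLU, up-project, then add the residual."""
    return h + np.maximum(h @ w_down, 0.0) @ w_up


def run_adapters_in_parallel(h, adapters):
    """Apply every adapter to the same shared hidden states.

    h: (tokens, hidden) output of the frozen base model, computed once.
    adapters: dict mapping a (hypothetical) adapter name to (w_down, w_up).
    """
    return {name: bottleneck_adapter(h, w_down, w_up)
            for name, (w_down, w_up) in adapters.items()}


hidden = np.array([[1.0, -1.0, 2.0, 0.0]])
adapters = {
    "sentiment": (np.eye(4), 0.5 * np.eye(4)),
    "topic": (np.zeros((4, 2)), np.zeros((2, 4))),
}
outputs = run_adapters_in_parallel(hidden, adapters)
```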
Colab of the Week 🎉🥳
SetFit: Outperforming GPT-3 in Few-Shot Text-Classification
Sentence Transformer Fine-Tuning (SetFit): Outperforms GPT-3 on few-shot Text-Classification while…
The GPT-n series show very promising results for few-shot NLP classification tasks and keep improving as their model…
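SetFit's core trick is to turn a handful of labeled examples into many sentence pairs: same-label pairs get target similarity 1, different-label pairs get 0. A sentence transformer is fine-tuned on those pairs with a cosine-similarity objective, and a simple classifier is then trained on the resulting embeddings. The sketch below covers only the pair-generation step, using every combination; the original may sample pairs differently, and the example texts are made up.

```python
import itertools
import random


def make_contrastive_pairs(texts, labels, seed=0):
    """Build (text_a, text_b, target) pairs for contrastive fine-tuning.

    target is 1.0 when both texts share a label, else 0.0.
    """
    pairs = [
        (t1, t2, 1.0 if l1 == l2 else 0.0)
        for (t1, l1), (t2, l2) in itertools.combinations(zip(texts, labels), 2)
    ]
    random.Random(seed).shuffle(pairs)  # avoid label-ordered batches
    return pairs


pairs = make_contrastive_pairs(
    ["great movie", "loved it", "awful plot", "hated it"],
    ["pos", "pos", "neg", "neg"],
)
```

With n labeled examples this yields n(n-1)/2 pairs, which is why so few labels go a long way.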
No more Transformer Diagrams 😂
Abhishek maps boring model diagrams to code for building intuition!
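In the same spirit, here is the attention box from the usual Transformer diagram written out as code. This is a generic NumPy sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)V, not Abhishek's implementation; the optional boolean mask (True = may attend) is an assumed convention.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k: (seq, d_k); v: (seq, d_v). Returns (output, attention weights)."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # blocked positions get ~zero weight
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(3, 4)) for _ in range(3))
causal = np.tril(np.ones((3, 3), dtype=bool))  # decoder-style: no peeking ahead
out, w = scaled_dot_product_attention(q, k, v, mask=causal)
```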
AGI and the Gov’t Apathy
Why don't governments seem to mind that companies are explicitly trying to make AGIs? - EA Forum
Epistemic Status: Quickly written, uncertain. I'm fairly sure there's very little in terms of the public or government…
Periodic Table of NLP Tasks
jellyfish is a library for approximate & phonetic matching of strings.
For string comparison:
- Levenshtein Distance
- Damerau-Levenshtein Distance
- Jaro Distance
- Jaro-Winkler Distance
- Match Rating Approach Comparison
- Hamming Distance
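For intuition, Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. jellyfish exposes it as `jellyfish.levenshtein_distance`; the pure-Python sketch below shows the standard dynamic-programming recurrence, keeping only one previous row of the table.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```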
For phonetic encoding:
- American Soundex
- NYSIIS (New York State Identification and Intelligence System)
- Match Rating Codex
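American Soundex, the first encoder listed, keeps the first letter and maps the following consonants to digits so that similar-sounding names collide. jellyfish provides it as `jellyfish.soundex`; below is a standalone sketch of the classic rules, including the detail that letters with the same code separated by "h" or "w" are coded once, while a vowel between them lets the code repeat.

```python
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """American Soundex: first letter + three digits, zero-padded."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    last_code = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")  # vowels, y, h, w have no code
        if code and code != last_code:
            result += code
            if len(result) == 4:
                break
        if c not in "hw":  # h/w do not separate identical codes
            last_code = code
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```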
GitHub - jamesturk/jellyfish: 🎐 a python library for doing approximate and phonetic matching of…
jellyfish is a library for approximate & phonetic matching of strings. Source: https://github.com/jamesturk/jellyfish…
New Speech Models from Microsoft on 🤗 Hub
TextPruner: a model pruning toolkit for pre-trained language models.
TextPruner/README_EN.md at main · airaria/TextPruner
TextPruner is a model pruning toolkit for pre-trained language models. It provides low-cost and training-free methods…
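One training-free idea in this space is vocabulary pruning (TextPruner lists it among its pruning modes): if a downstream corpus never uses a token, its embedding row can be dropped. The NumPy sketch below is illustrative only and not TextPruner's API; the `always_keep` argument stands in for special tokens you would want to retain.

```python
import numpy as np


def prune_vocabulary(embeddings, corpus_token_ids, always_keep=()):
    """Keep only embedding rows for token ids that occur in the corpus.

    always_keep: ids to retain regardless (e.g. hypothetical special tokens).
    Returns the smaller matrix and an old-id -> new-id mapping, which the
    tokenizer output must be re-indexed with.
    """
    kept = sorted(set(corpus_token_ids) | set(always_keep))
    old_to_new = {old: new for new, old in enumerate(kept)}
    return embeddings[kept], old_to_new


emb = np.arange(40, dtype=float).reshape(10, 4)  # toy 10-token vocabulary
small, remap = prune_vocabulary(emb, corpus_token_ids=[3, 3, 7, 1], always_keep=(0,))
```

Embedding matrices are a large share of parameters in multilingual models, so this alone can shrink a model noticeably without any retraining.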
Deep Learning in NLP YouTube Lectures
Play the Shannon Game With Language Models
The Shannon Score is a new summarization evaluation metric that plays the Shannon Game with a language model.
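Our reading of the idea: a good summary should make the source document easier for a language model to predict. The sketch below assumes you already have three log-likelihoods of the document D (unconditioned, conditioned on the summary S, and conditioned on D itself) and normalizes the summary's information gain by the document's own. Treat it as a hedged illustration; check the paper for the exact definition, and note the numbers used here are made up.

```python
def shannon_style_score(ll_doc: float,
                        ll_doc_given_summary: float,
                        ll_doc_given_doc: float) -> float:
    """(log P(D|S) - log P(D)) / (log P(D|D) - log P(D)).

    0 means the summary did not help the LM predict the document;
    1 means it helped as much as showing the document itself.
    """
    gain_summary = ll_doc_given_summary - ll_doc
    gain_doc = ll_doc_given_doc - ll_doc
    if gain_doc <= 0:
        raise ValueError("conditioning on the document should raise its likelihood")
    return gain_summary / gain_doc


print(shannon_style_score(-100.0, -70.0, -40.0))  # 0.5
```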
FakeYou, Your Deep Fake Text to Speech Website.
FakeYou lets you make deepfake text-to-speech audio and lip-synced video.
The One Python Library Everyone Needs
Do you write programs in Python? You should be using attrs. Why, you ask? Don't ask. Just use it. Okay, fine. Let me…
Inference with Transformer models in the Browser
aiserv.cloud
Papers to Read 📚
Repo Cypher 👨💻
A collection of recently released repos that caught our 👁
PECOS — Predictions for Enormous and Correlated Output Spaces
PECOS is a machine learning framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval.
GitHub - amzn/pecos: PECOS - Prediction for Enormous and Correlated Spaces
PECOS is a versatile and modular machine learning (ML) framework for fast learning and inference on problems with large…
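The reason frameworks like PECOS scale to millions of labels is that they avoid scoring every label: labels are first grouped (indexing), a cheap model picks a few promising groups (matching), and only labels inside those groups are scored (ranking). The toy two-level NumPy sketch below illustrates that idea with hypothetical linear scorers; it is not PECOS's API, which uses deeper label trees and learned models.

```python
import numpy as np


def two_level_beam_search(x, centroids, label_vecs, label_cluster, beam=2, topk=3):
    """Return (label_ids, scores) for the top-k labels of query vector x.

    centroids: (n_clusters, d) one vector per label cluster.
    label_vecs: (n_labels, d) one vector per label.
    label_cluster: (n_labels,) cluster id of each label.
    """
    # Matching: keep only the `beam` best-scoring clusters.
    best_clusters = np.argsort(-(centroids @ x))[:beam]
    # Ranking: score labels that live in the surviving clusters only.
    candidates = np.flatnonzero(np.isin(label_cluster, best_clusters))
    scores = label_vecs[candidates] @ x
    order = np.argsort(-scores)[:topk]
    return candidates[order], scores[order]


centroids = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
label_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
                       [0.1, 0.9], [-1.0, 0.0], [-0.9, 0.1]])
label_cluster = np.array([0, 0, 1, 1, 2, 2])
ids, scores = two_level_beam_search(np.array([1.0, 0.5]), centroids,
                                    label_vecs, label_cluster)
```

Labels 4 and 5 are never scored at all, which is where the savings come from; the trade-off is that a good label in a pruned cluster is missed.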
Exploring Neural Models for Query-Focused Summarization
A systematic exploration of neural approaches to query-focused summarization, considering two general classes of methods: two-stage extractive-abstractive solutions and end-to-end models.
GitHub - salesforce/query-focused-sum
Official code repository for "Exploring Neural Models for Query-Focused Summarization" This is a work in progress…
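To make the two-stage setup concrete: stage 1 extracts the parts of a long document relevant to the query, and stage 2 feeds that shorter excerpt, together with the query, to an abstractive model. The sketch below shows only a toy stage 1 using content-word overlap; it is a hypothetical stand-in, not one of the extractors studied in the paper, and the example sentences are made up.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "is", "has", "to", "and"}


def content_words(text):
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def extract_for_query(query, sentences, k=2):
    """Stage 1: keep the k sentences sharing the most content words with the query.

    Ties break by document position; the result is returned in document order
    so the stage-2 abstractive model sees a coherent excerpt.
    """
    q = content_words(query)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: (-len(q & content_words(sentences[i])), i))
    return [sentences[i] for i in sorted(ranked[:k])]


doc = ["The phone has a large screen.", "Battery life lasts two days.",
       "The camera is average.", "Phone battery life beats rivals."]
excerpt = extract_for_query("battery life of the phone", doc)
```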
A pruning framework which aims at maintaining both task-specific and task-agnostic knowledge during pruning.
GitHub - RunxinXu/ContrastivePruning: Source code for our AAAI'22 paper 《From Dense to Sparse…
Source code for our AAAI'22 paper 《From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model…
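Contrastive pruning is a framework layered on top of a pruning criterion rather than a criterion itself. For reference, here is the simplest such criterion, unstructured magnitude pruning, as a NumPy sketch; this is not the paper's method, just the baseline idea of zeroing the smallest-magnitude weights.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest |w|.

    Returns the pruned weights and the boolean keep-mask.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(round(sparsity * weights.size))
    mask = np.ones(weights.size, dtype=bool)
    # argsort with axis=None sorts the flattened array, so ties cannot
    # push the pruned count past n_prune.
    mask[np.argsort(np.abs(weights), axis=None)[:n_prune]] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask


w = np.array([[0.1, -0.5], [0.3, -0.05]])
pruned, keep = magnitude_prune(w, sparsity=0.5)
```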
Randomised Controlled Trial Abstract Result Tabulator
RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations.
GitHub - jetsunwhitton/RCT-ART: RCT-ART is an NLP pipeline built with spaCy for converting clinical…
RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly…
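The last step of such a pipeline is mechanical: once entities and relations are predicted, each outcome measure becomes a table row linking it to its intervention and outcome. The sketch below illustrates that tabulation step in plain Python; the entity labels, relation format, and example sentence are hypothetical and not RCT-ART's actual schema.

```python
from collections import defaultdict


def tabulate(entities, relations):
    """Turn entity/relation predictions into (intervention, outcome, measure) rows.

    entities: dict id -> (label, text); labels here are hypothetical:
        "INTERVENTION", "OUTCOME", "MEASURE".
    relations: (measure_id, linked_id) pairs tying each measure to the
        intervention and outcome it reports on.
    """
    linked = defaultdict(dict)
    for measure_id, other_id in relations:
        label, text = entities[other_id]
        linked[measure_id][label] = text
    return [(links.get("INTERVENTION", ""), links.get("OUTCOME", ""), entities[m][1])
            for m, links in linked.items()]


# Made-up sentence: "Pain scores fell to 2.1 with drug A versus 4.3 with placebo."
entities = {
    0: ("INTERVENTION", "drug A"),
    1: ("INTERVENTION", "placebo"),
    2: ("OUTCOME", "Pain scores"),
    3: ("MEASURE", "2.1"),
    4: ("MEASURE", "4.3"),
}
rows = tabulate(entities, [(3, 0), (3, 2), (4, 1), (4, 2)])
```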
A task-independent benchmark for vision and language models centered on linguistic phenomena.
GitHub - Heidelberg-NLP/VALSE
💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena…
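VALSE pairs each image with its correct caption and a minimally edited "foil" caption that breaks one linguistic phenomenon. One natural way to score a model, and one of the metrics VALSE reports, is pairwise accuracy: how often the model's image-text score for the true caption beats its score for the foil. A minimal sketch, with made-up scores:

```python
def pairwise_accuracy(caption_scores, foil_scores):
    """Fraction of examples where the true caption outscores its foil."""
    if not caption_scores or len(caption_scores) != len(foil_scores):
        raise ValueError("need one foil score per caption score")
    wins = sum(c > f for c, f in zip(caption_scores, foil_scores))
    return wins / len(caption_scores)


acc = pairwise_accuracy([0.9, 0.4, 0.7], [0.2, 0.6, 0.1])
```

Because each comparison is between two captions for the same image, chance level is 50%, which makes it easy to spot models that ignore the linguistic detail being tested.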