The NLP Cypher | 02.06.22


Feb 6, 2022


A Vendetta and 404s

Meanwhile, everything dropped this week…

DeepMind’s AlphaCode

Meanwhile back at the ranch…

Math Olympiad solver from OpenAI:


Meanwhile back at the ranch… again…

New GPT-NeoX 20B params dropped:

New Transformer Book Repo with Colab Notebooks!

Parsr: A PDF Parser that doesn’t suck 😎

SBERT Author Shreds the New GPT-3 Embeddings Offering 🥶🥶

“The biggest downside for the OpenAI embeddings endpoint is the high costs (about 8,000–600,000 times more expensive than open models on your infrastructure), the high dimensionality of up to 12288 dimensions (making downstream applications slow), and the extreme latency when computing embeddings. This hinders the actual usage of the embeddings for any search applications.”
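To get a feel for why the dimensionality point matters, here is a rough back-of-the-envelope sketch of raw index size. The assumptions are mine, not from the quote: vectors are stored as float32, the corpus size of 1M documents is hypothetical, and 768 stands in as an assumed dimension for a typical open model.

```python
# Rough storage estimate for a dense embedding index.
# Assumptions (not from the quoted post): float32 storage,
# a hypothetical 1M-document corpus, and 768 dims as a
# stand-in for a typical open embedding model.
BYTES_PER_FLOAT32 = 4


def index_size_gb(num_vectors: int, dim: int) -> float:
    """Raw size in GB (10^9 bytes) of a float32 embedding matrix."""
    return num_vectors * dim * BYTES_PER_FLOAT32 / 1e9


num_docs = 1_000_000  # hypothetical corpus size
large = index_size_gb(num_docs, 12288)  # largest dimension named in the quote
small = index_size_gb(num_docs, 768)    # assumed open-model dimension

print(f"12288-dim index: {large:.1f} GB")
print(f"768-dim index:   {small:.1f} GB")
print(f"ratio:           {large / small:.0f}x")
```

This only counts raw vector storage. Any real search setup adds index overhead on top, and brute-force similarity search cost grows with dimension in roughly the same proportion.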

🥶 Oops: Exposed databases on AWS

FYI: I wrote about this issue over a year ago and even provided a search engine. It seems more peeps are on top of it now.

Scan the entire internet in under 5 minutes:


How I got an FBI record at age 11 from dabbling in cryptography then got into more trouble 😭😭

ViLT Notebook for Visual Question Answering

author: Niels Rogge @ Hugging Face



How to Improve User Experience (and Behavior): Three Papers from Stanford’s Alexa Prize Team

For Practitioners: How GPUs Work | A Thread

Data Engineering in Julia


A repo for validating models and data.

Task-Specific Knowledge Distillation for BERT using Transformers & Amazon SageMaker

Papers to Read 📚


Text Anonymization Benchmark (TAB)

The Text Anonymization Benchmark (TAB) is a new, open-source corpus for text anonymization. It comprises 1,268 English-language court cases from the European Court of Human Rights (ECHR) manually annotated with:

  • semantic categories for personal identifiers,
  • masking decisions (in regard to the re-identification risk for the person to protect),
  • confidential attributes,
  • co-reference relations.

Connected Papers 📈



Ricky Costa

