The NLP Cypher | 03.20.22

A Preemptive Strike

Mar 21, 2022


Welcome back, everyone 👋!!!!

I have good news… as I speak… the future is being built…

… and the future is sparse!!!

(no idea what this is doing, but it looks cool)

With the great engineering minds at Neural Magic, we’re all actively attempting to solve a very difficult problem, one that continues to haunt the AI industry when it comes to very large deep learning models:

How do we get these large models into production without blowing up our hardware or our wallet?

We all want the same robust performance with our deep learning models. We want them to be accurate, as light as possible, and fast.

So… how do we achieve this? Well… it’s with sparsity and great engineering!

(FYI, check out some of the latest research here)

I have to be honest. I didn’t know how fast research into sparsity was moving until I recently joined Neural Magic (NM). In fact, I began experimenting with some of the latest features released in the DeepSparse repo, and I was really blown away (deets below).

For those new to the library, DeepSparse is an inference engine that brings GPU-level performance to sparsified models running on CPUs. 💪

So what’s an awesome feature from the repo? How about the DeepSparse Server, or as we nerds refer to it: deepsparse.server.


It’s an inference HTTP server (built on top of FastAPI) that lets anyone serve sparsified ONNX models (P.S. you can serve dense ONNX models too).

So very recently, I began to experiment with deepsparse.server on a Google Cloud instance (c2-standard-4: 4 vCPUs and 16GB of RAM) to see how it would perform. The performance of both the server and the sparse models from Neural Magic’s SparseZoo was remarkable! First, I was amazed that I could load 18 BERTs on only 16GB of RAM 🥶. Which models, you may ask? These 👇

And the models’ performance was incredible. I noticed a big drop in each model’s latency compared to the previous month, when I tested the exact same models on the same cloud instance. As it turns out, one of the best-kept secrets of Neural Magic’s software is that as the optimization software keeps improving over time, the models just keep getting faster and faster. It’s the equivalent of parking a sedan in your garage, then opening the garage doors a few months later to find a Lambo. This is what I mean 👇

Notice how throughput (y-axis) keeps improving across DeepSparse versions (v0.7 to v0.9)?? The improvements come from our Tensor Columns technology. Btw, we’re now on v0.11, so current performance is even better than what the graphic shows.

Ok ok… before I get back to the newsletter, I want to give you a heads up that I will write a more detailed blog post, including steps to reproduce my experiment (code for running the server, etc.), in due time. For now, I just wanted to show you why 2022 is a really exciting time to be working in deep learning and sparsity! ❤️

When you can, check out the DeepSparse repo and give it some ⭐⭐⭐!

and now… for this week in NLP…🤣 …

They Promised Us Flying Cars, Instead We Got…

They Promised Us Tinder for Cats, Instead We Got…

(encryption with Emojis 😂😂)

Coding an Entire Video Game Using OpenAI’s DaVinci

This is next level.

Tech Ops Douche Level Check List

I averaged 3.5 ⭐⭐⭐ but don’t think it’s `mission-critical`!

PyTorch vs. TensorFlow: the Eternal Battle


A Visual Intro to ML

Goopt: A Search Engine but GPT-3

This is proof we are in a simulation.

New CodeParrot Version Released 🦜

CodeParrot is a GPT-2 model (1.5B parameters) trained to generate Python code.

You.com Gets an AI Writing Assistant

You.com, a new search engine founded by ex-Salesforce peeps, now comes with its own AI writer.

What Does it Take to Win at Competitive ML?

When Hackers Get Bored

TorBot is an open-source intelligence tool developed in Python. The main objective of this project is to collect open data from the deep web (aka dark web) and, with the help of data mining algorithms, gather as much information as possible and produce an interactive tree graph. The interactive tree graph module will be able to display the relations among the collected intelligence data.

The AI Index is Out

The AI Index was released; the juicy stuff begins on p. 45, imo…

Full report: https://aiindex.stanford.edu/wp-content/uploads/2022/03/2022-AI-Index-Report_Master.pdf

Papers to Read 📚


Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing

The benchmark includes four datasets for evaluating the fairness of pre-trained legal language models and of the techniques used to fine-tune them for downstream tasks. The datasets cover four jurisdictions (European Council, USA, Switzerland, and China), five languages (English, German, French, Italian, and Chinese), and fairness across five attributes (gender, age, nationality/region, language, and legal area).
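To make “fairness across attributes” a bit more concrete, here is a minimal sketch of group-wise evaluation: score a model separately for each value of a protected attribute (say, gender) and report the worst-group accuracy plus the gap between the best and worst groups. This is a generic illustration, not FairLex’s official evaluation code; the helper names, the use of plain accuracy, and the toy labels are all hypothetical, and FairLex’s own metrics may differ.

```python
from collections import defaultdict
from typing import Dict, Sequence


def group_accuracies(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str]
) -> Dict[str, float]:
    """Accuracy computed separately for each group value (hypothetical helper)."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred and groups must have the same length")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def fairness_report(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str]
) -> Dict[str, float]:
    """Worst-group accuracy and the gap between the best and worst group."""
    accs = group_accuracies(y_true, y_pred, groups)
    best, worst = max(accs.values()), min(accs.values())
    return {"worst_group": worst, "gap": best - worst}


# Toy, made-up predictions for a binary task split by a "gender" attribute.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0]
gender = ["f", "f", "f", "m", "m", "m"]
print(group_accuracies(y_true, y_pred, gender))
print(fairness_report(y_true, y_pred, gender))
```

The point of reporting per-group numbers: a model can look fine on average while one group lags far behind, and a small gap with a high worst-group score is what you’d hope to see.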

UniSAr: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL

UniSAr extends existing autoregressive language models with three non-invasive extensions that make them structure-aware: (1) adding structure marks to encode the database schema, conversation context, and their relationships; (2) constrained decoding to produce well-structured SQL for a given database schema; and (3) SQL completion to fill in potentially missing JOIN relationships based on the database schema.
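Point (2) is the easiest one to picture in code. Below is a toy sketch of schema-constrained greedy decoding: at every step the decoder may only pick among tokens that a tiny SQL grammar and the schema allow, so the model’s (here, faked) preferences can never produce a table that lacks the selected column. This is a generic illustration of constrained decoding, not UniSAr’s implementation; the schema, the grammar, `allowed_next`, `constrained_greedy_decode`, and the scoring function are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical toy schema (table -> columns); not taken from the UniSAr paper.
SCHEMA: Dict[str, List[str]] = {
    "singer": ["singer_id", "name", "age"],
    "concert": ["concert_id", "singer_id", "year"],
}


def allowed_next(prefix: Sequence[str], schema: Dict[str, List[str]]) -> List[str]:
    """Legal next tokens under the toy grammar: SELECT <col> FROM <table> <eos>.

    Schema constraint: the table after FROM must contain the selected column.
    """
    columns = sorted({c for cols in schema.values() for c in cols})
    step = len(prefix)
    if step == 0:
        return ["SELECT"]
    if step == 1:
        return columns
    if step == 2:
        return ["FROM"]
    if step == 3:
        selected = prefix[1]
        return sorted(t for t, cols in schema.items() if selected in cols)
    return ["<eos>"]


def constrained_greedy_decode(
    score: Callable[[Sequence[str], str], float],
    schema: Dict[str, List[str]],
    max_len: int = 10,
) -> List[str]:
    """Greedy decoding where the model may only choose among legal tokens."""
    prefix: List[str] = []
    while len(prefix) < max_len:
        candidates = allowed_next(prefix, schema)  # mask out illegal tokens
        best = max(candidates, key=lambda tok: score(prefix, tok))
        if best == "<eos>":
            break
        prefix.append(best)
    return prefix


# Hypothetical stand-in for model logits: a fixed preference per token.
# Unconstrained, this "model" would pick "concert", which has no "age" column.
PREFS = {"age": 2.0, "concert": 3.0, "singer": 1.0}


def toy_score(prefix: Sequence[str], token: str) -> float:
    return PREFS.get(token, 0.0)


print(" ".join(constrained_greedy_decode(toy_score, SCHEMA)))  # SELECT age FROM singer
```

Real systems apply the same kind of masking to a language model’s scores over subword tokens, typically with a much richer grammar; the takeaway here is only that the constraint is enforced at decode time, without changing the model itself.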



Ricky Costa

Subscribe to the NLP Cypher newsletter for the latest in NLP & ML code/research. 🤟