Welcome back, everyone 👋!!!!
I have good news… as I speak, the future is being built…
… and the future is sparse!!!
With the great engineering minds at Neural Magic, we're actively tackling a very difficult problem, one that continues to haunt the AI industry as deep learning models grow ever larger:
How do we get these large models into production without blowing up our hardware or our wallet?
We all want the same things from our deep learning models: they should be accurate, as lightweight as possible, and fast.
So… how do we achieve this? Well… with sparsity and great engineering!
I have to be honest: I didn't realize how fast research into sparsity was moving until I recently joined Neural Magic (NM). In fact, I began experimenting with some of the latest features released in the DeepSparse repo, and I was really blown away (deets below).
For those new to the library, DeepSparse is an inference engine that delivers GPU-level performance for sparsified models running on CPUs. 💪
So what's an awesome feature from the repo? How about the DeepSparse Server, or as we nerds refer to it: `deepsparse.server`.
It's an inference HTTP server, built on top of FastAPI, that lets anyone serve sparsified ONNX models (P.S. you can serve dense ONNX models too).
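Because the server speaks plain HTTP, any client can query it. Below is a minimal standard-library sketch of a client for a question-answering model. The URL (`http://localhost:5543/predict`), the JSON fields (`question`, `context`), and the helper names `build_qa_request` and `send` are assumptions for illustration, not the documented DeepSparse API; check the docs for your server version and adjust them.

```python
import json
import urllib.request

# ASSUMPTION: host, port, and route are placeholders, not confirmed by this post.
# Point SERVER_URL at whatever address your deepsparse.server instance reports.
SERVER_URL = "http://localhost:5543/predict"


def build_qa_request(question: str, context: str, url: str = SERVER_URL) -> urllib.request.Request:
    """Package a question-answering query as a JSON POST request (nothing is sent yet)."""
    # ASSUMPTION: the served QA model accepts "question" and "context" fields.
    payload = json.dumps({"question": question, "context": context}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request to a running server and decode its JSON reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `send(build_qa_request("Who built DeepSparse?", "DeepSparse was built by Neural Magic."))` returns whatever JSON the endpoint produces. Building the request separately from sending it keeps the payload format easy to inspect before any network call.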
So, very recently, I began experimenting with `deepsparse.server` on a Google Cloud instance (c2-standard-4: 4 vCPUs and 16 GB of RAM) to see how it would perform. The performance of both the server and the sparse models from Neural Magic's SparseZoo was remarkable! First, I was amazed that I could load 18 BERTs on only 16 GB of RAM 🥶. Which models, you may ask? These 👇
The models' performance was incredible, too. Compared with the previous month, when I tested the exact same models, each model's latency had dropped noticeably, even though my cloud compute hadn't changed. As it turns out, one of the best-kept secrets of Neural Magic software is that as the optimization software keeps improving, the same models keep getting faster and faster. It's the equivalent of parking a sedan in your garage, then opening the garage doors a few months later to find a Lambo. Here's what I mean 👇
Notice how throughput (y-axis) keeps improving across DeepSparse versions (v0.7 to v0.9)? The improvements come from our Tensor Columns tech. Btw, we're now on v0.11, so current performance is even better than what the graphic shows.
Ok ok… before I get back to the newsletter, a heads-up: I'll eventually write a more detailed blog post with steps to reproduce my experiment (including code for running the server). For now, I just wanted to show you why 2022 is a really exciting time to be working in deep learning and sparsity! ❤️
When you can, check out the DeepSparse repo and give it some ⭐⭐⭐!
And now… for this week in NLP… 🤣
They Promised Us Flying Cars, Instead We Got…
They Promised Us Tinder for Cats, Instead We Got…
(encryption with emojis 😂😂)
My Secret Message
Coding an Entire Video Game Using OpenAI’s DaVinci
This is next level.
Building games and apps entirely through natural language using OpenAI's code-davinci model
TL;DR: OpenAI has a new code generating model that's…
Tech Ops Douche Level Check List
I averaged 3.5 ⭐⭐⭐, but I don't think it's `mission-critical`!
PyTorch vs. TensorFlow: The Eternal Battle
A Visual Intro to ML
A visual introduction to machine learning
Let's revisit the 73-m elevation boundary proposed previously to see how we can improve upon our intuition. Clearly…
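The idea of improving on an elevation boundary can be sketched as a one-split classifier (a decision stump) that tries candidate thresholds and keeps the most accurate one. The data points and class labels below are hypothetical; only the 73 m starting guess comes from the preview text, and this is not code from the linked article.

```python
from typing import List, Tuple

# Hypothetical toy sample: (elevation in metres, class label 0 or 1).
SAMPLE: List[Tuple[float, int]] = [
    (5.0, 0), (12.0, 0), (40.0, 0), (61.0, 1), (80.0, 1), (120.0, 1),
]


def classify(elevation_m: float, boundary_m: float) -> int:
    """Predict class 1 for points above the boundary, class 0 otherwise."""
    return 1 if elevation_m > boundary_m else 0


def accuracy(data: List[Tuple[float, int]], boundary_m: float) -> float:
    """Fraction of points whose predicted class matches the label."""
    correct = sum(classify(e, boundary_m) == label for e, label in data)
    return correct / len(data)


def best_boundary(data: List[Tuple[float, int]]) -> float:
    """Try each observed elevation as a boundary and keep the most accurate one."""
    candidates = sorted({e for e, _ in data})
    return max(candidates, key=lambda b: accuracy(data, b))


print(accuracy(SAMPLE, 73.0))  # the 73 m guess misclassifies the 61 m point
print(best_boundary(SAMPLE))   # 40.0 separates this toy sample perfectly
```

Real decision trees repeat this search recursively on each side of the split and often place the threshold midway between neighbouring values; the stump only shows the first step.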
Goopt: A Search Engine, but GPT-3
This is proof we are in a simulation.
GitHub - jokenox/Goopt: 🔍 Search Engine for a Procedural Simulation of the Web with GPT-3.
Search Engine for a Procedural Simulation of the Web with GPT-3 Web 4.0 could be the propitious evolution for the…
New CodeParrot Version Released 🦜
CodeParrot is a GPT-2 model (1.5B parameters) trained to generate Python code.
lvwerra/codeparrot · Hugging Face
You.com Gets an AI Writing Assistant
You.com, a new search engine founded by ex-Salesforce peeps, now comes with its own AI writer.
What Does it Take to Win at Competitive ML?
Winning at Competitive ML in 2022
Just like last year, we've partnered with Eniola Olaleye to look back and analyse the previous year's competitions…
When Hackers Get Bored
TorBot is an open source intelligence tool developed in Python. The main objective of the project is to collect open data from the deep web (aka dark web), use data mining algorithms to gather as much information as possible, and produce an interactive tree graph. The interactive tree graph module will be able to display the relations among the collected intelligence data.
TorBot - Open Source Intelligence Tool for the Dark Web
The AI Index is Out
The AI Index was released; the juicy stuff begins on pg. 45, imo…
The AI Index Report - Artificial Intelligence Index
The AI Index is an independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI)…
Papers to Read 📚
Repo Cypher 👨💻
A collection of recently released repos that caught our 👁
This benchmark includes four datasets for evaluating the fairness of pre-trained legal language models and of the techniques used to fine-tune them for downstream tasks. The datasets cover four jurisdictions (European Council, USA, Switzerland, and China), five languages (English, German, French, Italian, and Chinese), and fairness across five attributes (gender, age, nationality/region, language, and legal area).
GRIPS takes in instructions designed for humans and automatically returns an improved, edited prompt, while allowing for API-based tuning.
UNISAR makes existing autoregressive language models structure-aware through three non-invasive extensions: (1) structure marks that encode the database schema, conversation context, and their relationships; (2) constrained decoding that produces well-structured SQL for a given database schema; and (3) SQL completion that fills in potentially missing JOIN relationships based on the database schema.
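Of UNISAR's three extensions, constrained decoding is the easiest to picture: at each step the decoder may only pick tokens that keep the output valid for the schema. The sketch below illustrates that general technique with a toy grammar (`SELECT <column> FROM <table>`), a hypothetical two-table schema, and made-up scores standing in for model logits. It is not UNISAR's implementation.

```python
from typing import Dict, List, Set

# Hypothetical toy schema: table name -> column names.
SCHEMA: Dict[str, Set[str]] = {
    "singer": {"name", "age"},
    "concert": {"year", "venue"},
}
ALL_COLUMNS: Set[str] = set().union(*SCHEMA.values())
EOS = "<eos>"


def allowed_tokens(prefix: List[str]) -> Set[str]:
    """Tokens that keep `prefix` a valid 'SELECT <column> FROM <table>' query."""
    if not prefix:
        return {"SELECT"}
    last = prefix[-1]
    if last == "SELECT":
        return set(ALL_COLUMNS)
    if last == "FROM":
        column = prefix[1]
        # Only tables that actually contain the selected column are legal.
        return {table for table, cols in SCHEMA.items() if column in cols}
    if "FROM" not in prefix:
        return {"FROM"}
    return {EOS}


def constrained_greedy(step_scores: List[Dict[str, float]]) -> List[str]:
    """Greedy decoding that ignores any token the schema grammar forbids."""
    output: List[str] = []
    for scores in step_scores:
        legal = sorted(allowed_tokens(output))  # sorted for deterministic tie-breaking
        best = max(legal, key=lambda tok: scores.get(tok, float("-inf")))
        if best == EOS:
            break
        output.append(best)
    return output


# Hypothetical per-step scores standing in for model logits.
STEP_SCORES: List[Dict[str, float]] = [
    {"SELECT": 0.9, "FROM": 0.1},
    {"salary": 0.8, "name": 0.6, "year": 0.3},  # "salary" is not in the schema
    {"FROM": 0.7, "name": 0.2},
    {"concert": 0.9, "singer": 0.4},            # "concert" has no "name" column
    {EOS: 1.0},
]

print(" ".join(constrained_greedy(STEP_SCORES)))  # SELECT name FROM singer
```

Unconstrained greedy decoding on the same scores would pick `salary` and `concert`, producing SQL that fails against this schema; the mask trades a little model preference for guaranteed validity.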
MoKGE is a method that diversifies generative commonsense reasoning using a mixture-of-experts (MoE) strategy on knowledge graphs (KGs).
NLX-GPT is a language model that can simultaneously predict an answer and explain it.
Tevatron is a toolkit for training and running dense retrievers with deep language models.
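A dense retriever encodes queries and passages into vectors and ranks passages by a similarity score, commonly the inner product. The brute-force sketch below shows only that scoring-and-ranking step; the three-dimensional vectors are hypothetical stand-ins for encoder outputs, and nothing here uses Tevatron's actual API.

```python
from typing import Dict, List, Tuple

# Hypothetical embeddings standing in for an encoder's output vectors.
PASSAGES: Dict[str, List[float]] = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.1, 0.8, 0.1],
    "p3": [0.7, 0.0, 0.7],
}
QUERY: List[float] = [1.0, 0.0, 0.2]


def dot(u: List[float], v: List[float]) -> float:
    """Inner product, the relevance score used by many dense retrievers."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query: List[float], passages: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Score every passage against the query and return the top-k (id, score) pairs."""
    scored = [(pid, dot(query, vec)) for pid, vec in passages.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


for pid, score in retrieve(QUERY, PASSAGES):
    print(pid, round(score, 2))
```

At scale, the linear scan over every passage would typically be replaced by an approximate nearest-neighbour index, while the scoring function stays the same.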