NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

The NLP Cypher | 11.21.21

Inference Prime

5 min readNov 21, 2021

Hey … so have you ever deployed a state-of-the-art production level inference server? Don’t know how to do it?

Well… last week, Michael Benesty dropped a bomb when he published one of the first ever detailed blogs on how to not only deploy a production level inference API but benchmarking some of the most widely used frameworks such as FastAPI and Triton servers and runtime engines such as ONNX runtime (ORT) and TensorRT (TRT). Eventually, Michael recreated Hugging Face’s ability to reach a 1–2ms inference with miniLM & a T4 GPU. 👀

🥶🥶🥶🥶🥶🥶

Hugging Face Transformer Inference Under 1 Millisecond Latency

Go to production with Microsoft and Nvidia open source tooling

towardsdatascience.com

Code:

GitHub - ELS-RD/triton_transformers: Deploy optimized transformer based models on Nvidia Triton…

Yes, you can perfom inference with transformer based model in less than 1ms on the cheapest GPU available on Amazon…

github.com

Another Tutorial for Triton and Hugging Face Inference

How to deploy (almost) any Hugging face model on NVIDIA Triton Inference Server with an…

SUMMARY

medium.com

NVIDIA’s Triton Server Update

NVIDIA Announces Major Updates to Triton Inference Server as 25,000+ Companies Worldwide Deploy…

Capital One, Microsoft, Samsung Medison, Siemens Energy, Snap Among Industry Leaders Worldwide Using Platform NVIDIA AI…

www.yahoo.com

PyTorch LIT (talkin’ bout Inference)

PyTorch Lite Inference Toolkit: works with Hugging Face pipeline.

Here’s an example for text generation with GPT-J (6 Billi param model)

GitHub — AminRezaei0x443/PyTorch-LIT: Lite Inference Toolkit (LIT) for PyTorch

PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on…

github.com

Model Size x 18 = Model Memory Required

Deets:

Performance and Scalability: How To Fit a Bigger Model and Train It Faster

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0. Transformers provides thousands of…

huggingface.co

A Convenient Collection of Simple Python Code Snippets

Some Examples:

1 Hello World

2 JSON to CSV

3 Random Password Generator

4 Instagram Profile Info

6 Fetch links from Webpage

7 Todo App With Flask

8 Add Watermark on Images

9 WishList App Using Django

10 Split Folders into Subfolders

GitHub — Python-World/python-mini-projects: A collection of simple python mini projects to enhance…

A collection of simple python mini projects to enhance your Python skills. If you want to learn about python, visit…

github.com

OpenAI’s API Goes Open Range

OpenAI's API Now Available with No Waitlist

OpenAI is committed to the safe deployment of AI. Since the launch of our API, we've made deploying applications faster…

openai.com

G5 Instances at AWS w/ A10G GPUs

New - EC2 Instances (G5) with NVIDIA A10G Tensor Core GPUs | Amazon Web Services

Two years ago I told you about the then-new G4 instances, which featured up to eight NVIDIA T4 Tensor Core GPUs. These…

aws.amazon.com

Hop: Reading Files without Extracting Archive

“25x faster than unzip and 10x faster than tar at reading individual files (uncompressed)”

GitHub - Jarred-Sumner/hop

Simple archive format designed for quickly reading some files without extracting the entire archive. Possibly will be…

github.com

InfraNodus | Text Analysis Software

Create graphs with your text data.

InfraNodus: An Excellent Tool for Textual Data Analysis

Introduction to InfraNodus with example of Google Trends queries during the Covid-19 pandemic

towardsdatascience.com

Free Version:

GitHub - noduslabs/infranodus: A Node.Js / Neo4J tool that translates words and relations into…

A Node.Js / Neo4J tool that translates words and relations into network graphs and shows you how it all connects. …

github.com

Distributed Training w/ PyTorch Lightning and Ray

pip install ray-lightning

Getting Started With Ray Lightning: Easy Multi-Node PyTorch Lightning Training

Why distributed training is important and how you can use PyTorch Lightning with Ray to enable multi-node training and…

medium.com

Papers to Read 📚

Stanford’s Papers at EMNLP/CoNLL

Stanford AI Lab Papers at EMNLP/CoNLL 2021

The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021) will take place next week…

ai.stanford.edu

https://arxiv.org/pdf/2111.08609.pdf

https://arxiv.org/pdf/2111.07991.pdf

https://arxiv.org/pdf/2111.07935.pdf

Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

DeBERTa V3

Improving DeBERTa using ELECTRA Style Pre-Training with Gradient-Disentangled Embedding Sharing. On GLUE it achieves a 91.37% average score, which is 1.37% over DeBERTa and 1.91% over ELECTRA, setting a new state-of-the-art (SOTA) among the models with a similar structure.

Includes a multi-lingual variant. :)

GitHub - microsoft/DeBERTa: The implementation of DeBERTa

This repository is the official implementation of DeBERTa: Decoding-enhanced BERT with Disentangled Attention Masked…

github.com

🤗 Model Pages

Models - Hugging Face

We're on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

Connected Papers 📈

AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization

A novel dataset of 4,631 community question answering threads for answer summarization.

GitHub - Alex-Fabbri/AnswerSumm

You can't perform that action at this time. You signed in with another tab or window. You signed out in another tab or…

github.com

Connected Papers 📈

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

Models - Hugging Face

We're on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

GitHub - pytorch/fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for…

github.com

Connected Papers 📈

DataCLUE: A Benchmark Suite for Data-centric NLP

A benchmark for Data Centric AI. It benchmarks how data modification can impact model’s performance. You can modify the training set and validation set, re-split the training set and validation set, or add data by non-crawler methods. The modification can be done by algorithms or programs or in combination with manual methods.

GitHub - nttcslab-sp-admin/TEDSummary

TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name…

github.com

Connected Papers 📈

Dynamic-TinyBERT: Boost TinyBERT’s Inference Efficiency by Dynamic Sequence Length

Dynamic-TinyBERT, a TinyBERT model that utilizes sequence-length reduction and Hyperparameter Optimization for enhanced inference efficiency per any computational budget. Dynamic-TinyBERT is trained only once, performing on-par with BERT and achieving an accuracy-speedup trade-off superior to any other efficient approaches (up to 3.3x with <1% loss drop).

Pretrained-Language-Model/TinyBERT at master · huawei-noah/Pretrained-Language-Model

TinyBERT is 7.5x smaller and 9.4x faster on inference than BERT-base and achieves competitive performances in the tasks…

github.com

Connected Papers 📈

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning

Style transfer for voice cloning in text-to-speech (TTS).

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning

Songxiang Liu, Dan Su, Dong Yu 2 Tencent AI Lab The task of few-shot style transfer for voice cloning in text-to-speech…

liusongxiang.github.io

Connected Papers 📈

Quantum Stat

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

The NLP Cypher | 11.21.21

Inference Prime

Hugging Face Transformer Inference Under 1 Millisecond Latency

Go to production with Microsoft and Nvidia open source tooling

GitHub - ELS-RD/triton_transformers: Deploy optimized transformer based models on Nvidia Triton…

Yes, you can perfom inference with transformer based model in less than 1ms on the cheapest GPU available on Amazon…

How to deploy (almost) any Hugging face model on NVIDIA Triton Inference Server with an…

SUMMARY

NVIDIA Announces Major Updates to Triton Inference Server as 25,000+ Companies Worldwide Deploy…

Capital One, Microsoft, Samsung Medison, Siemens Energy, Snap Among Industry Leaders Worldwide Using Platform NVIDIA AI…

PyTorch LIT (talkin’ bout Inference)

GitHub — AminRezaei0x443/PyTorch-LIT: Lite Inference Toolkit (LIT) for PyTorch

PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on…

Model Size x 18 = Model Memory Required

Performance and Scalability: How To Fit a Bigger Model and Train It Faster

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0. Transformers provides thousands of…

A Convenient Collection of Simple Python Code Snippets

GitHub — Python-World/python-mini-projects: A collection of simple python mini projects to enhance…

A collection of simple python mini projects to enhance your Python skills. If you want to learn about python, visit…

OpenAI’s API Goes Open Range

OpenAI's API Now Available with No Waitlist

OpenAI is committed to the safe deployment of AI. Since the launch of our API, we've made deploying applications faster…

G5 Instances at AWS w/ A10G GPUs

New - EC2 Instances (G5) with NVIDIA A10G Tensor Core GPUs | Amazon Web Services

Two years ago I told you about the then-new G4 instances, which featured up to eight NVIDIA T4 Tensor Core GPUs. These…

Hop: Reading Files without Extracting Archive

GitHub - Jarred-Sumner/hop

Simple archive format designed for quickly reading some files without extracting the entire archive. Possibly will be…

InfraNodus | Text Analysis Software

InfraNodus: An Excellent Tool for Textual Data Analysis

Introduction to InfraNodus with example of Google Trends queries during the Covid-19 pandemic

GitHub - noduslabs/infranodus: A Node.Js / Neo4J tool that translates words and relations into…

A Node.Js / Neo4J tool that translates words and relations into network graphs and shows you how it all connects. …

Distributed Training w/ PyTorch Lightning and Ray

Getting Started With Ray Lightning: Easy Multi-Node PyTorch Lightning Training

Why distributed training is important and how you can use PyTorch Lightning with Ray to enable multi-node training and…

Papers to Read 📚

Stanford AI Lab Papers at EMNLP/CoNLL 2021

The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021) will take place next week…

Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

DeBERTa V3

GitHub - microsoft/DeBERTa: The implementation of DeBERTa

This repository is the official implementation of DeBERTa: Decoding-enhanced BERT with Disentangled Attention Masked…

Models - Hugging Face

We're on a journey to advance and democratize artificial intelligence through open source and open science.

AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization

GitHub - Alex-Fabbri/AnswerSumm

You can't perform that action at this time. You signed in with another tab or window. You signed out in another tab or…

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

Models - Hugging Face

We're on a journey to advance and democratize artificial intelligence through open source and open science.

GitHub - pytorch/fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for…

DataCLUE: A Benchmark Suite for Data-centric NLP

GitHub - nttcslab-sp-admin/TEDSummary

TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name…

Dynamic-TinyBERT: Boost TinyBERT’s Inference Efficiency by Dynamic Sequence Length

Pretrained-Language-Model/TinyBERT at master · huawei-noah/Pretrained-Language-Model

TinyBERT is 7.5x smaller and 9.4x faster on inference than BERT-base and achieves competitive performances in the tasks…

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning

Songxiang Liu, Dan Su, Dong Yu 2 Tencent AI Lab The task of few-shot style transfer for voice cloning in text-to-speech…

Written by Ricky Costa