NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
The NLP Cypher | 11.21.21
Inference Prime
Hey … so have you ever deployed a state-of-the-art, production-level inference server? Don’t know how to do it?
Well… last week, Michael Benesty dropped a bomb when he published one of the first detailed blog posts on how to not only deploy a production-level inference API but also benchmark some of the most widely used serving frameworks, such as FastAPI and Triton server, and runtime engines, such as ONNX Runtime (ORT) and TensorRT (TRT). In the end, Michael matched Hugging Face’s 1–2 ms inference latency with MiniLM on a T4 GPU. 👀
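At its core, a comparison like this comes down to measuring end-to-end latency over many runs and reporting percentiles. A minimal, pure-Python sketch of that kind of harness (the `predict` callable is a hypothetical stand-in for any FastAPI/Triton/ORT/TRT endpoint — not Michael’s actual code):

```python
import time
import statistics

def benchmark(predict, payload, warmup=10, runs=100):
    """Time a callable and report mean / p50 / p99 latency in milliseconds."""
    for _ in range(warmup):  # warm caches, JIT, GPU kernels, etc.
        predict(payload)
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(payload)
        latencies.append((time.perf_counter() - start) * 1e3)
    latencies.sort()
    return {
        "mean_ms": statistics.mean(latencies),
        "p50_ms": latencies[len(latencies) // 2],
        "p99_ms": latencies[int(len(latencies) * 0.99)],
    }

# Example with a dummy "model" that just burns a little CPU:
stats = benchmark(lambda x: sum(i * i for i in range(x)), 10_000)
```

The warmup loop matters: the first few calls to a GPU-backed engine are dominated by one-time initialization and would skew the numbers.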
🥶🥶🥶🥶🥶🥶
Code:
Another Tutorial for Triton and Hugging Face Inference
NVIDIA’s Triton Server Update
PyTorch LIT (talkin’ bout Inference)
PyTorch Lite Inference Toolkit: works with Hugging Face pipeline.
Here’s an example of text generation with GPT-J (a 6-billion-parameter model).
Model Size x 18 = Model Memory Required
Deets:
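One common reading of that ×18 multiplier is roughly 18 bytes per parameter (fp32 weights + gradients + Adam moment estimates + overhead) — that interpretation is our assumption, not spelled out above. A back-of-envelope helper under that assumption:

```python
def estimated_memory_gb(n_params, bytes_per_param=18):
    """Rough memory estimate using the ~18 bytes/parameter rule of thumb:
    fp32 weights (4) + gradients (4) + Adam moments (8) + some overhead."""
    return n_params * bytes_per_param / 1e9

# GPT-J has ~6 billion parameters:
gptj_gb = estimated_memory_gb(6e9)  # ~108 GB
```

Inference-only loads need far less (just the weights), which is exactly the gap tools like PyTorch LIT aim to exploit.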
A Convenient Collection of Simple Python Code Snippets
Some Examples:
Split Folders into Subfolders
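In the spirit of that snippet, a stdlib-only sketch that splits a flat folder into subfolders of at most `chunk_size` files (the `chunk_0`, `chunk_1`, … naming is our choice, not from the collection):

```python
import shutil
from pathlib import Path

def split_into_subfolders(folder, chunk_size=100):
    """Move the files in `folder` into chunk_0, chunk_1, ... subfolders,
    each holding at most `chunk_size` files."""
    folder = Path(folder)
    files = sorted(p for p in folder.iterdir() if p.is_file())
    for i, f in enumerate(files):
        dest = folder / f"chunk_{i // chunk_size}"
        dest.mkdir(exist_ok=True)
        shutil.move(str(f), dest / f.name)
    return [d.name for d in sorted(folder.iterdir()) if d.is_dir()]
```

Handy for datasets where a single directory with millions of files grinds file browsers (and some filesystems) to a halt.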
OpenAI’s API Goes Open Range
G5 Instances at AWS w/ A10G GPUs
Hop: Reading Files without Extracting Archive
“25x faster than unzip and 10x faster than tar at reading individual files (uncompressed)”
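Hop itself is a standalone tool, but the core idea — pull one member out of an archive without unpacking the rest — is easy to show with Python’s stdlib `zipfile` (an illustration of the concept, not Hop’s implementation):

```python
import io
import zipfile

# Build a small archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docs/readme.txt", "hello from inside the zip")
    zf.writestr("docs/other.txt", "not needed")

# Read a single member directly -- nothing else gets extracted.
with zipfile.ZipFile(buf) as zf:
    text = zf.read("docs/readme.txt").decode()
```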
InfraNodus | Text Analysis Software
Create graphs with your text data.
Free Version:
Distributed Training w/ PyTorch Lightning and Ray
pip install ray-lightning
Papers to Read 📚
Stanford’s Papers at EMNLP/CoNLL
Repo Cypher 👨💻
A collection of recently released repos that caught our 👁
DeBERTa V3
Improving DeBERTa using ELECTRA Style Pre-Training with Gradient-Disentangled Embedding Sharing. On GLUE it achieves a 91.37% average score, which is 1.37% over DeBERTa and 1.91% over ELECTRA, setting a new state-of-the-art (SOTA) among the models with a similar structure.
Includes a multilingual variant. :)
🤗 Model Pages
AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization
A novel dataset of 4,631 community question answering threads for answer summarization.
DataCLUE: A Benchmark Suite for Data-centric NLP
A benchmark for data-centric AI: it measures how data modification impacts a model’s performance. You can modify the training and validation sets, re-split them, or add data by non-crawler methods; the modifications can be done by algorithms or programs, manually, or in combination.
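One of the allowed interventions — re-splitting train/validation — can be sketched with a deterministic shuffle (stdlib only; the 90/10 ratio and seed are arbitrary examples, not DataCLUE’s settings):

```python
import random

def resplit(examples, val_fraction=0.1, seed=42):
    """Pool all examples and deterministically re-split them
    into (train, validation) lists."""
    pool = list(examples)
    random.Random(seed).shuffle(pool)
    n_val = int(len(pool) * val_fraction)
    return pool[n_val:], pool[:n_val]

train, val = resplit(range(100))
```

Fixing the seed is the important part: a data-centric benchmark only makes sense if each data modification is reproducible.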
Dynamic-TinyBERT: Boost TinyBERT’s Inference Efficiency by Dynamic Sequence Length
Dynamic-TinyBERT is a TinyBERT model that uses sequence-length reduction and hyperparameter optimization for enhanced inference efficiency under any computational budget. It is trained only once, performs on par with BERT, and achieves an accuracy-speedup trade-off superior to other efficient approaches (up to 3.3x speedup with <1% accuracy drop).
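The intuition behind sequence-length reduction — spend less compute when the effective sequence is short — can be illustrated by trimming a padded batch to its longest real sequence before the forward pass (a toy sketch with plain lists; the paper’s actual mechanism reduces length inside the network, not just at the input):

```python
def trim_to_max_length(input_ids, attention_mask):
    """Drop trailing positions that are padding in every example of the batch,
    so the model only processes up to the longest real sequence."""
    max_len = max(sum(mask) for mask in attention_mask)
    return (
        [ids[:max_len] for ids in input_ids],
        [mask[:max_len] for mask in attention_mask],
    )

# Two sequences of real lengths 3 and 2, padded to length 6:
ids = [[5, 6, 7, 0, 0, 0], [8, 9, 0, 0, 0, 0]]
mask = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0]]
trimmed_ids, trimmed_mask = trim_to_max_length(ids, mask)
```

Since transformer self-attention cost grows quadratically with sequence length, cutting a batch from 6 to 3 positions here roughly quarters the attention compute.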
Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning
Style transfer for voice cloning in text-to-speech (TTS).