NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
The NLP Cypher | 08.08.21
Time to Pretend
Hey Welcome Back!
We have a new CLIP implementation from Max Woolf. It allows for faster experimentation and adds some new features, like weighted prompts and using icons to prime the model for better generation quality. It was released today, so try it out! It’s trippy.
Textual (Text User Interface)
From the maker of the Rich library, Will McGugan, Textual is a new project that lets you create some amazing apps in the terminal. 😎😎
Ciphey | NLP in Encryption
Looks like NLP has arrived for cracking encryption. Let’s say you wanted to know “How was X encrypted?” Ciphey was built to answer this question.
Under the hood:
“Ciphey uses a custom built artificial intelligence module (AuSearch) with a Cipher Detection Interface to approximate what something is encrypted with. And then a custom-built, customisable natural language processing Language Checker Interface, which can detect when the given text becomes plaintext.”
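To make the "decode, then check whether it reads as plaintext" idea concrete, here is a toy sketch in Python. It is not Ciphey's code or API: the names caesar_shift, looks_like_english and crack_caesar, the tiny COMMON_WORDS list and the 0.5 threshold are all assumptions made up for illustration, and it only handles a Caesar cipher.

```python
import string

# Hypothetical, tiny stand-in for a real language checker's vocabulary.
COMMON_WORDS = {"the", "and", "is", "to", "of", "in", "that", "it",
                "was", "for", "at", "me", "you", "we", "a"}


def caesar_shift(text, shift):
    """Shift ASCII letters by `shift` positions, leaving other characters alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def looks_like_english(text, threshold=0.5):
    """Crude plaintext check: the share of words found in COMMON_WORDS."""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return False
    hits = sum(w in COMMON_WORDS for w in words)
    return hits / len(words) >= threshold


def crack_caesar(ciphertext):
    """Try every shift and stop at the first candidate that passes the check."""
    for shift in range(26):
        candidate = caesar_shift(ciphertext, -shift)
        if looks_like_english(candidate):
            return shift, candidate
    return None


secret = caesar_shift("meet me at the park in the morning", 3)
result = crack_caesar(secret)
print(secret)
print(result)
```

A real tool searches over many encodings and ciphers and uses a far stronger language check, but the loop has the same shape: propose a decoding, test it, stop when the output looks like plaintext.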
Recent papers you need to read:
These three papers cover prompting, question answering and the fragility of evaluation benchmarks.
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
Collection of Repos for Parsing PDFs
Document Layout Analysis resources for development with PdfPig. It’s in C#, sorry Python lovers.
Resources 📚
New NLP Videos by Dan Jurafsky dropped:
Machine learning education content built by aggregating 1,300 questions from an ML course.
Pretty cool site with very simple and intuitive answers to technical ML questions. If you are looking for more math heavy stuff go elsewhere.
Here’s an example:
What do dropout layers do?
Dropout layers throw things away. Now you might be asking, why would I want my model to throw data away? It turns out that throwing things away when training a model can drastically improve a model’s performance in testing (where data is not thrown away).
When to use dropout layers?
When you feel like your model is overfitting the input, make the probability of dropping out higher. Often you drop out as much as possible because dropout usually makes a model more robust to noisy inputs.
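Here is a minimal sketch of what that answer describes, in plain Python. It assumes the common "inverted dropout" variant, where surviving values are scaled by 1/(1-p) during training so nothing needs rescaling at test time. The function name dropout and its parameters are illustrative, not taken from the site or from any particular framework.

```python
import random


def dropout(values, p, training=True, rng=random):
    """Inverted dropout on a list of floats.

    Training: zero each value with probability p, scale survivors by 1/(1-p).
    Testing: return the values unchanged (nothing is thrown away).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)  # seeded so repeated runs drop the same values
activations = [1.0] * 10
train_out = dropout(activations, p=0.5, rng=rng)
test_out = dropout(activations, p=0.5, training=False)
print(train_out)  # a mix of 0.0 and 2.0
print(test_out)   # identical to the input
```

Raising p throws away more values per training step, which is the knob the answer suggests turning up when the model overfits.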
Free PDF download of the 2nd edition of An Introduction to Statistical Learning
CI/CD Tools Review Used in Machine Learning
A breakdown of all the most used tools for CI/CD including free and paid variants. You know you love Jenkins. (just saying 😂)
Stack Overflow Developer Survey
Breaks down tech stacks by median salary among other things 😎…
Summary Explorer: For Exploring Datasets and Models for Summarization
Get access to 50+ summarization models, including their papers, repos and ROUGE scores. It also visualizes a few summarization datasets.
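If ROUGE is new to you, here is a rough sketch of ROUGE-1, the unigram-overlap variant. It assumes lowercased whitespace tokenization with no stemming, so its numbers will not match official ROUGE tooling or the scores reported on the site. The function name rouge1 is just for illustration.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


scores = rouge1("the cat sat on the mat", "the cat is on the mat")
print(scores)
```

In this example five of the six words in each sentence overlap, so precision, recall and F1 all come out to 5/6.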
Repo Cypher 👨‍💻
A collection of recently released repos that caught our 👁
MTVR Dataset
MTVR, a large-scale multilingual video moment retrieval dataset, containing 218K English and Chinese queries from 21.8K TV show video clips.
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
The official implementation of StyleGAN-NADA, a non-adversarial domain adaptation method for image generators. Includes a Colab notebook.
InferWiki, Inferential Benchmark for Knowledge Graph Completion
InferWiki16k and InferWiki64k datasets for the knowledge graph completion task.
EmailSum Dataset
Email Thread Summarization (EMAILSUM) dataset, which contains human annotated short summaries of 2,549 email threads (each containing 3 to 10 emails) over a wide variety of topics.
We build amazing NLP software for companies worldwide. If you are looking for software development, check out our site and reach out to us here: info [at] quantumstat com