NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

The NLP Cypher | 01.23.22

Desiderata

🕵️‍♂️Has AI interest peaked?

https://trends.google.com/trends/explore?date=all&q=deep%20learning,Artificial%20Intelligence

If you’re bummed, you can always… 👇

Graph ML in 2022: Where Are We Now?

The State of Web-Scraping 2022

DARPA and OSS 🕵️‍♀️

Press Release: https://www.darpa.mil/news-events/2021-12-21

The DARPA GARD program seeks to establish theoretical ML system foundations to identify system vulnerabilities, characterize properties that will enhance system robustness, and encourage the creation of effective defenses. Currently, ML defenses tend to be highly specific and are effective only against particular attacks. GARD seeks to develop defenses capable of defending against broad categories of attacks. Furthermore, current evaluation paradigms of AI robustness often focus on simplistic measures that may not be relevant to security. To verify relevance to security and wide applicability, defenses generated under GARD will be measured in a novel testbed employing scenario-based evaluations.

Repos mentioned in the press release:

State of Machine Learning in Julia

For Those Interested in Semantic Similarity

Free CS Classes

Google Style Guide for Python

From the Creator of FastAPI 👉 Asyncer

“The main goal of Asyncer is to improve developer experience by providing better support for autocompletion and inline errors in the editor, and more certainty that the code is bug-free by providing better support for type checking tools like mypy.”

Real-Time Machine Learning

Handling Large Messages with Kafka

Sentence Segmentation

Kaggle Solutions Repo

Happy Transformer

OSLO: Extending the Training Capability for Transformers

SeaTunnel

Problems it attempts to solve:

  • Data loss and duplication
  • Task accumulation and delay
  • Low throughput
  • Long cycle to be applied in the production environment
  • Lack of application running status monitoring

Cresset — A PyTorch Universal Docker Template

Papers to Read📚

From the Lex Fridman podcast featuring Yann LeCun as guest:

It’s cued up to the moment Yann mentions the paper above.

Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

COPA-SSE

COPA-SSE contains crowdsourced explanations for the Balanced COPA dataset, a variant of the Choice of Plausible Alternatives (COPA) benchmark. The explanations are formatted as a set of triple-like common sense statements with ConceptNet relations but freely written concepts.
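As a rough picture of what a "triple-like" statement could look like in code, here is a small sketch. The class name, field names, and example statements are all made up for illustration and are not taken from the dataset. Only the relation names (Causes, UsedFor) are real ConceptNet relations. Check the repo for the actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical container for one triple-like statement."""
    head: str      # freely written concept
    relation: str  # ConceptNet relation name
    tail: str      # freely written concept

    def as_text(self) -> str:
        # Render the triple in a readable arrow form.
        return f"{self.head} --{self.relation}--> {self.tail}"


# Invented example explanation: a set of statements, not real COPA-SSE data.
explanation = [
    Statement("rain", "Causes", "wet ground"),
    Statement("umbrella", "UsedFor", "staying dry"),
]

lines = [s.as_text() for s in explanation]
```

The point of the sketch is the split the description mentions: the relation comes from a fixed vocabulary, while the head and tail are free text.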

Connected Papers 📈

SQUIRE: A Sequence-to-sequence Framework for Multi-hop Knowledge Graph Reasoning

SQUIRE is the first sequence-to-sequence based multi-hop reasoning framework. It uses an encoder-decoder structure to translate a triple query into a multi-hop path.

Connected Papers 📈

Datasheet for the Pile

This datasheet describes the Pile, an 825 GiB dataset of human-authored text compiled by EleutherAI for use in large-scale language modeling. The Pile is composed of 22 different text sources, ranging from original scrapes done for this project, to text data made available by the data owners, to third-party scrapes available online.

Connected Papers 📈

UnifiedSKG📚: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models

The UnifiedSKG framework unifies 21 SKG tasks into the text-to-text format, aiming to promote systematic SKG research instead of being exclusive to a single task, domain, or dataset. The authors report that large language models like T5, with simple modifications when necessary, achieve state-of-the-art performance on nearly all 21 tasks.
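To give a feel for what "text-to-text" means for structured input, here is a minimal sketch that flattens a small table and a question into one string a model like T5 could consume. The serialization format, the function names, and the example table are assumptions made for illustration, not the format UnifiedSKG itself uses.

```python
from typing import Sequence


def linearize_table(header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Flatten a table into one line of text (hypothetical format)."""
    parts = ["col : " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i} : " + " | ".join(str(cell) for cell in row))
    return " ".join(parts)


def build_input(question: str, header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Join the question and the flattened table into a single model input."""
    return f"{question} ; structured knowledge: {linearize_table(header, rows)}"


# Invented example data.
header = ["city", "country"]
rows = [["Paris", "France"], ["Lima", "Peru"]]

table_text = linearize_table(header, rows)
model_input = build_input("Which country is Lima in?", header, rows)
```

Once every task is reduced to a string in and a string out, the same model and training loop can be reused across tasks, which is the unification the description refers to.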

Connected Papers 📈

Subscribe to the NLP Cypher newsletter for the latest in NLP & ML code/research. 🤟

Ricky Costa
