
NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

The NLP Cypher | 01.23.22

Desiderata

Ricky Costa
6 min read · Jan 23, 2022


🕵️‍♂️Has AI interest peaked?

https://trends.google.com/trends/explore?date=all&q=deep%20learning,Artificial%20Intelligence

If you’re bummed, you can always… 👇

Graph ML in 2022: Where Are We Now?

The State of Web-Scraping 2022

DARPA and OSS 🕵️‍♀️

Press Release: https://www.darpa.mil/news-events/2021-12-21

The DARPA GARD program seeks to establish theoretical ML system foundations to identify system vulnerabilities, characterize properties that will enhance system robustness, and encourage the creation of effective defenses. Currently, ML defenses tend to be highly specific and are effective only against particular attacks. GARD seeks to develop defenses capable of defending against broad categories of attacks. Furthermore, current evaluation paradigms of AI robustness often focus on simplistic measures that may not be relevant to security. To verify relevance to security and wide applicability, defenses generated under GARD will be measured in a novel testbed employing scenario-based evaluations.

Repos mentioned in the press release:

State of Machine Learning in Julia

For Those Interested in Semantic Similarity

Free CS Classes

Google Style Guide for Python

From the Creator of FastAPI 👉 Asyncer

“The main goal of Asyncer is to improve developer experience by providing better support for autocompletion and inline errors in the editor, and more certainty that the code is bug-free by providing better support for type checking tools like mypy.”

Real-Time Machine Learning

Handling Large Messages with Kafka

Sentence Segmentation

Kaggle Solutions Repo

Happy Transformer

OSLO: Extending the Training Capability for Transformers

SeaTunnel

Problems it attempts to solve:

  • Data loss and duplication
  • Task accumulation and delay
  • Low throughput
  • Long cycle to be applied in the production environment
  • Lack of application running status monitoring

Cresset — A PyTorch Universal Docker Template

Papers to Read📚

https://arxiv.org/pdf/2110.03742.pdf

From the Lex Fridman podcast featuring Yann LeCun as guest:

It’s cued up to the moment Yann mentions the paper above.

Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

COPA-SSE

COPA-SSE contains crowdsourced explanations for the Balanced COPA dataset, a variant of the Choice of Plausible Alternatives (COPA) benchmark. The explanations are formatted as a set of triple-like common sense statements with ConceptNet relations but freely written concepts.

Connected Papers 📈

SQUIRE: A Sequence-to-sequence Framework for Multi-hop Knowledge Graph Reasoning

SQUIRE is presented as the first sequence-to-sequence based multi-hop reasoning framework. It uses an encoder-decoder structure to translate a triple query into a multi-hop path.

Connected Papers 📈

Datasheet for the Pile

This datasheet describes the Pile, an 825 GiB dataset of human-authored text compiled by EleutherAI for use in large-scale language modeling. The Pile comprises 22 different text sources, ranging from original scrapes done for this project, to text data made available by the data owners, to third-party scrapes available online.

Connected Papers 📈

UnifiedSKG📚: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models

The UnifiedSKG framework unifies 21 SKG tasks into the text-to-text format, aiming to promote systematic SKG research instead of being exclusive to a single task, domain, or dataset. The authors show that large language models like T5, with simple modification when necessary, achieve state-of-the-art performance on nearly all 21 tasks.
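As a rough illustration of what "text-to-text format" means for structured input, the sketch below flattens a small table and a question into one source string paired with a target string. The separator tokens, the function names, and the toy table are all assumptions made up for this example; they are not taken from the UnifiedSKG codebase.

```python
from typing import Sequence


def linearize_table(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Flatten a table into a single string (illustrative format only)."""
    parts = ["col : " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i} : " + " | ".join(row))
    return " ".join(parts)


def to_text_to_text(
    question: str,
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
    answer: str,
) -> tuple[str, str]:
    """Build a (source, target) pair of plain strings for a seq2seq model."""
    source = f"{question} ; structured knowledge: {linearize_table(header, rows)}"
    return source, answer


if __name__ == "__main__":
    # Toy table invented for this example.
    src, tgt = to_text_to_text(
        "Which city is largest?",
        ["city", "population"],
        [["Oslo", "700000"], ["Bergen", "285000"]],
        "Oslo",
    )
    print(src)
    print(tgt)
```

Once every task is reduced to string pairs like this, a single encoder-decoder model such as T5 can in principle be trained on all of them without task-specific heads.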

Connected Papers 📈


Subscribe to the NLP Cypher newsletter for the latest in NLP & ML code/research. 🤟