A physical commonsense reasoning benchmark for 100+ languages, written in collaboration with 300+ researchers from 65 countries.
Catherine Arnett
catherinearnett
AI & ML interests
multilingual NLP, tokenization
Recent Activity
updated a dataset 11 days ago: catherinearnett/bilingual-tokenizer-training-data
published a dataset 11 days ago: catherinearnett/bilingual-tokenizer-training-data
liked a dataset 21 days ago: commoncrawl/CommonLID