Impresso - Media Monitoring of the Past

university

https://impresso-project.ch


AI & ML interests

Historical Media Analysis and Enrichment

Recent Activity

adrienjourne updated a dataset 3 days ago

impresso-project/ner-augmentation

adrienjourne updated a dataset 3 days ago

impresso-project/ner-eval-predictions

simon-clmtd updated a Space 13 days ago

impresso-project/multilingual-static-word-embeddings-demo


Organization Card


Impresso - Media Monitoring of the Past is an interdisciplinary research project using machine learning to transform how historical media are processed, enriched, explored, and studied across modalities, languages, time periods, and national borders.

We develop the 🚀 Impresso Web App and the 🔬 Impresso Datalab, providing access to a large multilingual corpus of historical newspapers and radio broadcasts.

🤖 Models and 📚 datasets

🤖 Impresso models for historical multilingual documents, covering language identification, OCR quality assessment, topic inference, named entity recognition (NER), and named entity linking (NEL).
📚 Impresso datasets curated from digitized historical media sources for ML development and evaluation. Upcoming releases include NER and NEL benchmarks from the HIPE evaluation campaign and an image type classification dataset.
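
The repositories listed on this page are identified by IDs of the form `impresso-project/<name>`. As a minimal sketch, the helper below maps such an ID to its page on the Hugging Face Hub, assuming the Hub's usual URL layout (models at the root, datasets under `datasets/`, Spaces under `spaces/`). The function name `hub_url` and the `repo_type` parameter are illustrative and not part of any Impresso tooling.

```python
from typing import Literal

HUB = "https://huggingface.co"
RepoType = Literal["model", "dataset", "space"]


def hub_url(repo_id: str, repo_type: RepoType = "model") -> str:
    """Build the Hub page URL for a repository ID (hypothetical helper)."""
    # Models live at the root; datasets and Spaces carry a path prefix.
    prefix = {"model": "", "dataset": "datasets/", "space": "spaces/"}[repo_type]
    return f"{HUB}/{prefix}{repo_id}"


if __name__ == "__main__":
    # Repository IDs taken from the listings on this page.
    print(hub_url("impresso-project/ner-hipe2020-hist-base"))
    print(hub_url("impresso-project/HistLuxAlign", "dataset"))
    print(hub_url("impresso-project/multilingual-static-word-embeddings-demo", "space"))
```

Each model or dataset page documents its own loading instructions, which may differ between repositories.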

🏛️ Partners and funding

Impresso gratefully acknowledges the continued support of its cultural heritage partners, as well as funding from the Swiss National Science Foundation (SNSF, Grant Nos. CRSII5_173719 and CRSII5_213585) and the Luxembourg National Research Fund (FNR, Grant No. 17498891).

Spaces (13)

Multilingual Dictionary Explorer

Find translations for a word across multiple languages

Impresso Topic Explorer

Topic model aggregate exploration for the Impresso corpus

Ocrqa Exploration

OCR Quality Exploration on Impresso Corpus

Ad Classification Exploration

Explore yearly ad and non-ad distributions in Impresso

Models (26)

impresso-project/mallet-topic-inferencer

Updated 22 days ago

impresso-project/ner-hipe2020-hist-base

Token Classification • 0.1B • Updated 29 days ago • 30

impresso-project/ner-hipe2020-hist-medium

Token Classification • 41.9M • Updated 29 days ago • 34

impresso-project/mmbert-impresso-mediasources-ner

Token Classification • 0.3B • Updated May 31 • 80

impresso-project/mmbert-multilingual-impresso-continued-mlm

Fill-Mask • 0.3B • Updated May 31 • 132 • 1

impresso-project/impresso-ad-classification-xlm-one-class

0.3B • Updated May 19 • 7 • 1

impresso-project/german_print_20

impresso-project/frakturline-classification-cnn

Image Classification • Updated May 15 • 34

impresso-project/nel-mgenre-multilingual

0.6B • Updated Apr 18 • 114 • 4

impresso-project/OCR-diversely-robust-gte-multilingual-base

Sentence Similarity • 0.3B • Updated Apr 16 • 97 • 1

Datasets (9)

impresso-project/ner-eval-predictions

Viewer • Updated 3 days ago • 492k • 207

impresso-project/ner-augmentation

Viewer • Updated 3 days ago • 521k • 509

impresso-project/impresso-mediaagencies-ner-dataset

Viewer • Updated Jun 2 • 1.48k • 11 • 1

impresso-project/frakturline-dataset

Viewer • Updated Mar 24 • 32.4k • 12

impresso-project/frakturline-testset

Viewer • Updated Mar 23 • 2k • 12 • 1

impresso-project/wiki_comparable_corpus_en_de_hi_it_ko_zh

Viewer • Updated Feb 6 • 69.2k • 73 • 1

impresso-project/HistLuxAlign

Viewer • Updated Mar 11, 2025 • 59.6k • 139

impresso-project/sts-h-paraphrase-detection

Viewer • Updated Jan 30, 2025 • 338 • 47

impresso-project/amr-true-paraphrases

Viewer • Updated Jan 30, 2025 • 167 • 26