Liam Duignan

Lduignan1

AI & ML interests

NLP/Named Entity Recognition/LLMs

Recent Activity

liked a model 1 day ago

Qwen/Qwen3.5-9B

Image-Text-to-Text • 10B • Updated 17 days ago • 2.27M • 911

upvoted a paper 9 days ago

Reasoning Models Struggle to Control their Chains of Thought

Paper • 2603.05706 • Published 13 days ago • 31

liked a dataset about 2 months ago

nvidia/Nemotron-Math-v2

Viewer • Updated Feb 11 • 7.09M • 4.89k • 172

upvoted an article about 2 months ago

Article

TextQuests: How Good are LLMs at Text-Based Video Games?

Aug 12, 2025

liked a Space 2 months ago

Evaluation Guidebook

📝

285

Explore LLM benchmark trends over time

liked a dataset 4 months ago

allenai/ai2_arc

Viewer • Updated Dec 21, 2023 • 7.79k • 354k • 314

upvoted an article 4 months ago

Article

Integrating benchmarks into LM Evaluation Harness

Jul 21, 2025

upvoted an article 5 months ago

Article

Supercharge your OCR Pipelines with Open Models

Oct 21, 2025 • 307

liked a model 6 months ago

ibm-granite/granite-docling-258M

Image-Text-to-Text • Updated Sep 23, 2025 • 98.5k • 1.14k

liked 2 Spaces 11 months ago

Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks

📝

Evaluate multilingual models using FineTasks

OpenLLM French leaderboard 🇫🇷

🥇

Explore and submit LLM benchmarks

liked a dataset 11 months ago

OpenLLM-France/Lucie-Training-Dataset

Viewer • Updated May 27, 2025 • 10.9B • 5.48k • 35

liked a dataset 12 months ago

Lduignan1/MATH_LVL5

Viewer • Updated Mar 23, 2025 • 7.26k • 26 • 2

updated a dataset 12 months ago

Lduignan1/MATH_LVL5

Viewer • Updated Mar 23, 2025 • 7.26k • 26 • 2

published a dataset 12 months ago

Lduignan1/MATH_LVL5

Viewer • Updated Mar 23, 2025 • 7.26k • 26 • 2

liked a dataset about 1 year ago

manu/french-bench-grammar-vocab-reading

Viewer • Updated May 2, 2025 • 309 • 225 • 4

liked a dataset over 1 year ago

manu/french_bench_arc_challenge

Viewer • Updated Feb 29, 2024 • 2.59k • 154 • 3

upvoted a collection over 1 year ago

FrenchBench Evaluation datasets

Collection

These datasets are used to evaluate models on French performance using: https://github.com/EleutherAI/lm-evaluation-harness (from CroissantLLM paper) • 11 items • Updated Jun 7, 2024 • 8

liked 2 Spaces over 1 year ago

Open LLM Leaderboard

🏆

13.9k

Track, rank and evaluate open LLMs and chatbots

Open LLM Leaderboard

🏆

104

Track, rank and evaluate open LLMs and chatbots
