
Hiring 💼

Quentin Gallouédec PRO

qgallouedec

huggingface



Recent Activity

posted an update about 1 hour ago

TRL v1.2 introduces the SSDTrainer 🚀

Simple Self-Distillation (SSD) from Apple's paper "Embarrassingly Simple Self-Distillation Improves Code Generation" is now available as an experimental trainer in TRL.

The recipe is as minimal as the name suggests: sample completions from the model itself at a training-time temperature, then fine-tune on those raw, unverified samples with plain cross-entropy.

No reward model. No verifier. No teacher model. No reinforcement learning. Just prompts and the model.

```python
from trl.experimental.ssd import SSDConfig, SSDTrainer

trainer = SSDTrainer(
    model="Qwen/Qwen3-4B-Instruct",
    args=SSDConfig(temperature=0.6, top_k=20, top_p=0.95),
    train_dataset=dataset,
)
trainer.train()
```

v1.2 also ships:

- expanded tool-calling support (LLaMA 3.1 / 3.2, DeepSeek-V3)
- another round of KTO ↔ DPO alignment, getting us closer to promoting KTO to stable
- a big GRPO simplification for overlong tool results
- deprecation of `use_transformers_paged`
- key fixes for VLM response parsing

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.2.0
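To make the recipe in the post concrete, here is a toy sketch of the two steps it names: sampling from the model's own distribution with `temperature`, `top_k`, and `top_p`, then taking a plain cross-entropy gradient step on the sampled token. It is not TRL's implementation. The "model" is just a hypothetical list of four logits with no prompt or context, all helper names are made up for illustration, and the top-k-then-top-p ordering is an assumption about how the filters are combined.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sampling_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Temperature-scale, keep the top_k tokens, then apply nucleus (top_p) filtering.

    Returns a dict mapping kept token ids to renormalized probabilities.
    """
    probs = softmax([x / temperature for x in logits])
    # top-k: keep the k most probable tokens
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in ranked)
    # top-p: keep the smallest prefix whose renormalized mass reaches top_p
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_token(logits, rng, **sampling_kwargs):
    """Draw one token id from the filtered distribution."""
    dist = sampling_distribution(logits, **sampling_kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def cross_entropy(logits, target):
    """Plain cross-entropy of the unscaled logits against one target id."""
    return -math.log(softmax(logits)[target])


def sgd_step(logits, target, lr=0.1):
    """One gradient step on cross-entropy; the gradient is softmax(logits) - one_hot(target)."""
    probs = softmax(logits)
    return [
        x - lr * (p - (1.0 if i == target else 0.0))
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


# Toy self-distillation loop: sample from the current "model", then train on the sample.
rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical context-free model
for _ in range(100):
    token = sample_token(logits, rng)
    logits = sgd_step(logits, token)
```

Note that the loss is computed on the unscaled logits while the samples come from the filtered, temperature-scaled distribution. No reward, verifier, or teacher appears anywhere in the loop, which is the point the post makes.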

updated a bucket about 2 hours ago

hf-doc-build/doc

updated a dataset about 2 hours ago

hf-doc-build/doc-build



qgallouedec's datasets (85)

qgallouedec/test-grpo-vlm-log-completions

Viewer • Updated 28 days ago • 435 • 324

qgallouedec/llama_star_formatted

Viewer • Updated Feb 21 • 7.21k • 11

qgallouedec/deepmath-completions-logs2

Viewer • Updated Jan 22 • 48 • 58

qgallouedec/deepmath-completions-logs

Viewer • Updated Jan 13 • 232 • 77 • 1

qgallouedec/Dolci-Think-DPO-7B

Viewer • Updated Nov 28, 2025 • 150k • 11

qgallouedec/biogrid_qa

Viewer • Updated Nov 18, 2025 • 59.4k • 209

qgallouedec/human_gene_interaction_qa_v2

Viewer • Updated Nov 18, 2025 • 79.2k • 11

qgallouedec/human_gene_interaction_qa

Viewer • Updated Nov 17, 2025 • 1.84M • 13

qgallouedec/biogrid

Viewer • Updated Nov 17, 2025 • 2.82M • 1.13k

qgallouedec/trl-metrics

Viewer • Updated Oct 7, 2025 • 148k • 37 • 1

qgallouedec/rick

Viewer • Updated Sep 11, 2025 • 1.18k • 8

qgallouedec/OpenMathReasoning

Viewer • Updated Sep 10, 2025 • 10k • 24

qgallouedec/math-lvl3to5-8k

Viewer • Updated Aug 22, 2025 • 8.52k • 13

qgallouedec/svg

Viewer • Updated Aug 2, 2025 • 900 • 16 • 1

qgallouedec/rick-physics-grpo

Viewer • Updated May 22, 2025 • 1.79k • 23 • 1

qgallouedec/rick-science

Viewer • Updated May 16, 2025 • 1.18k • 19 • 3

qgallouedec/physics-problems

Viewer • Updated May 10, 2025 • 247 • 11

qgallouedec/rick-teaches-math

Viewer • Updated May 10, 2025 • 6.8k • 11

qgallouedec/DAPO-Math-17k-Processed-Scored

Viewer • Updated Apr 29, 2025 • 16.4k • 12 • 3

qgallouedec/prm800k

Viewer • Updated Dec 17, 2024 • 41.2k • 14 • 3

qgallouedec/ultrafeedback-prompt

Viewer • Updated Sep 9, 2024 • 60.9k • 10

qgallouedec/ultrafeedback-gpt-3.5-turbo-helpfulness

Viewer • Updated Sep 9, 2024 • 16.6k • 5

qgallouedec/lm-human-preferences-descriptiveness

Viewer • Updated Sep 9, 2024 • 6.26k • 12

qgallouedec/lm-human-preferences-sentiment

Viewer • Updated Sep 9, 2024 • 6.26k • 5

qgallouedec/tldr-preference

Viewer • Updated Sep 9, 2024 • 179k • 8

qgallouedec/tldr

Viewer • Updated Sep 9, 2024 • 130k • 18

qgallouedec/hh-rlhf-helpful-base

Viewer • Updated Sep 5, 2024 • 46.2k • 6

qgallouedec/hh-rlhf-helpful-base-trl-style

Viewer • Updated Sep 5, 2024 • 46.2k • 24

qgallouedec/suap_essentials

Viewer • Updated Aug 6, 2024 • 30 • 14

qgallouedec/qa_suap

Viewer • Updated Jul 14, 2024 • 270 • 6