Hiring 💼
Quentin Gallouédec
PRO
qgallouedec
619 followers · 344 following
QGallouedec
qgallouedec
qgallouedec
qgallouedec.bsky.social
AI & ML interests
None yet
Recent Activity
Posted an update about 1 hour ago
TRL v1.2 introduces the SSDTrainer 🚀

Simple Self-Distillation (SSD) from Apple's paper "Embarrassingly Simple Self-Distillation Improves Code Generation" is now available as an experimental trainer in TRL.

The recipe is as minimal as the name suggests: sample completions from the model itself at a training-time temperature, then fine-tune on those raw, unverified samples with plain cross-entropy. No reward model. No verifier. No teacher model. No reinforcement learning. Just prompts and the model.

```python
from trl.experimental.ssd import SSDConfig, SSDTrainer

trainer = SSDTrainer(
    model="Qwen/Qwen3-4B-Instruct",
    args=SSDConfig(temperature=0.6, top_k=20, top_p=0.95),
    train_dataset=dataset,
)
trainer.train()
```

v1.2 also ships expanded tool-calling support (LLaMA 3.1 / 3.2, DeepSeek-V3), another round of KTO ↔ DPO alignment getting us closer to promoting KTO to stable, a big GRPO simplification for overlong tool results, deprecation of `use_transformers_paged`, and key fixes for VLM response parsing.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.2.0
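To make the recipe in the post concrete, here is a toy, standard-library-only sketch of the two steps it describes: sample from the model at a chosen temperature (with top-k and top-p truncation), then fine-tune on the raw samples with cross-entropy. This is not TRL's implementation and does not use `SSDTrainer`. The "model" is assumed to be a single logit vector over a five-token vocabulary, and every name and hyperparameter below (`sampling_distribution`, `self_distill`, `num_samples`, `lr`, `top_k=3`, and so on) is hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sampling_distribution(logits, temperature, top_k, top_p):
    """Distribution actually sampled from: temperature, then top-k, then top-p."""
    probs = softmax([x / temperature for x in logits])
    # Keep the top_k most likely tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)
    # Among those, keep the smallest prefix whose renormalized mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


def self_distill(logits, num_samples=500, epochs=3, lr=0.1,
                 temperature=0.6, top_k=3, top_p=0.95, seed=0):
    """Toy self-distillation: sample from the starting model, then SFT on the samples."""
    rng = random.Random(seed)
    # Step 1: sample "completions" (single tokens here) from the frozen starting model.
    q = sampling_distribution(logits, temperature, top_k, top_p)
    samples = rng.choices(range(len(logits)), weights=q, k=num_samples)
    # Step 2: fine-tune on the raw, unverified samples with plain cross-entropy (SGD).
    student = list(logits)
    for _ in range(epochs):
        for token in samples:
            probs = softmax(student)
            for i in range(len(student)):
                # d(-log p[token]) / d logit_i = probs[i] - 1[i == token]
                student[i] -= lr * (probs[i] - (1.0 if i == token else 0.0))
    return student


before = [2.0, 1.0, 0.5, 0.0, -1.0]
after = self_distill(before)
print("before:", [round(p, 3) for p in softmax(before)])
print("after: ", [round(p, 3) for p in softmax(after)])
```

In this toy setting the fine-tuned distribution moves toward the truncated, lower-temperature sampling distribution, so probability mass shifts to the tokens the model already preferred and away from the tail. Whether and why that helps a real language model on code generation is the subject of the paper, not something this sketch demonstrates.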
Updated a bucket about 2 hours ago: hf-doc-build/doc
Updated a dataset about 2 hours ago: hf-doc-build/doc-build
qgallouedec's datasets (85)
qgallouedec/test-grpo-vlm-log-completions • Updated 28 days ago • 435 • 324
qgallouedec/llama_star_formatted • Updated Feb 21 • 7.21k • 11
qgallouedec/deepmath-completions-logs2 • Updated Jan 22 • 48 • 58
qgallouedec/deepmath-completions-logs • Updated Jan 13 • 232 • 77 • 1
qgallouedec/Dolci-Think-DPO-7B • Updated Nov 28, 2025 • 150k • 11
qgallouedec/biogrid_qa • Updated Nov 18, 2025 • 59.4k • 209
qgallouedec/human_gene_interaction_qa_v2 • Updated Nov 18, 2025 • 79.2k • 11
qgallouedec/human_gene_interaction_qa • Updated Nov 17, 2025 • 1.84M • 13
qgallouedec/biogrid • Updated Nov 17, 2025 • 2.82M • 1.13k
qgallouedec/trl-metrics • Updated Oct 7, 2025 • 148k • 37 • 1
qgallouedec/rick • Updated Sep 11, 2025 • 1.18k • 8
qgallouedec/OpenMathReasoning • Updated Sep 10, 2025 • 10k • 24
qgallouedec/math-lvl3to5-8k • Updated Aug 22, 2025 • 8.52k • 13
qgallouedec/svg • Updated Aug 2, 2025 • 900 • 16 • 1
qgallouedec/rick-physics-grpo • Updated May 22, 2025 • 1.79k • 23 • 1
qgallouedec/rick-science • Updated May 16, 2025 • 1.18k • 19 • 3
qgallouedec/physics-problems • Updated May 10, 2025 • 247 • 11
qgallouedec/rick-teaches-math • Updated May 10, 2025 • 6.8k • 11
qgallouedec/DAPO-Math-17k-Processed-Scored • Updated Apr 29, 2025 • 16.4k • 12 • 3
qgallouedec/prm800k • Updated Dec 17, 2024 • 41.2k • 14 • 3
qgallouedec/ultrafeedback-prompt • Updated Sep 9, 2024 • 60.9k • 10
qgallouedec/ultrafeedback-gpt-3.5-turbo-helpfulness • Updated Sep 9, 2024 • 16.6k • 5
qgallouedec/lm-human-preferences-descriptiveness • Updated Sep 9, 2024 • 6.26k • 12
qgallouedec/lm-human-preferences-sentiment • Updated Sep 9, 2024 • 6.26k • 5
qgallouedec/tldr-preference • Updated Sep 9, 2024 • 179k • 8
qgallouedec/tldr • Updated Sep 9, 2024 • 130k • 18
qgallouedec/hh-rlhf-helpful-base • Updated Sep 5, 2024 • 46.2k • 6
qgallouedec/hh-rlhf-helpful-base-trl-style • Updated Sep 5, 2024 • 46.2k • 24
qgallouedec/suap_essentials • Updated Aug 6, 2024 • 30 • 14
qgallouedec/qa_suap • Updated Jul 14, 2024 • 270 • 6