-
Two Minds Better Than One: Collaborative Reward Modeling for LLM Alignment
Paper • 2505.10597 • Published -
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
Paper • 2504.05535 • Published • 44 -
nvidia/HelpSteer3
Viewer • Updated • 133k • 2.4k • 89 -
nvidia/Nemotron-RL-instruction_following
Preview • Updated • 355 • 4
Collections
Discover the best community collections!
Collections including paper arxiv:2504.05535
-
Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated
Paper • 2509.05739 • Published • 2 -
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Paper • 2509.03059 • Published • 24 -
Universal Deep Research: Bring Your Own Model and Strategy
Paper • 2509.00244 • Published • 13 -
<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs
Paper • 2509.08358 • Published • 13
-
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
Paper • 2405.07526 • Published • 21 -
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Paper • 2405.15613 • Published • 17 -
A Touch, Vision, and Language Dataset for Multimodal Alignment
Paper • 2402.13232 • Published • 16 -
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper • 2406.11813 • Published • 31
-
Two Minds Better Than One: Collaborative Reward Modeling for LLM Alignment
Paper • 2505.10597 • Published -
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
Paper • 2504.05535 • Published • 44 -
nvidia/HelpSteer3
Viewer • Updated • 133k • 2.4k • 89 -
nvidia/Nemotron-RL-instruction_following
Preview • Updated • 355 • 4
-
Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated
Paper • 2509.05739 • Published • 2 -
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Paper • 2509.03059 • Published • 24 -
Universal Deep Research: Bring Your Own Model and Strategy
Paper • 2509.00244 • Published • 13 -
<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs
Paper • 2509.08358 • Published • 13
-
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
Paper • 2405.07526 • Published • 21 -
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Paper • 2405.15613 • Published • 17 -
A Touch, Vision, and Language Dataset for Multimodal Alignment
Paper • 2402.13232 • Published • 16 -
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper • 2406.11813 • Published • 31