-
The Ultra-Scale Playbook
The ultimate guide to training LLMs on large GPU clusters
-
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Paper • 2504.02587 • Published • 32 -
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 71 -
microsoft/Magma-8B
Robotics • 9B • Updated • 3.52k • 411
Collections including paper arxiv:2405.07863
-
Benchmarking Agentic Workflow Generation
Paper • 2410.07869 • Published • 29 -
GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI
Paper • 2409.01392 • Published • 9 -
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows
Paper • 2409.17433 • Published • 9 -
FlowMind: Automatic Workflow Generation with LLMs
Paper • 2404.13050 • Published • 34
-
RLHFlow/ArmoRM-Llama3-8B-v0.1
Text Classification • 8B • Updated • 12.4k • 183 -
RLHFlow/pair-preference-model-LLaMA3-8B
Text Generation • 8B • Updated • 512 • 38 -
sfairXC/FsfairX-LLaMA3-RM-v0.1
Text Classification • 8B • Updated • 1.44k • 60 -
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 71
-
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
Paper • 2406.11839 • Published • 39 -
Pandora: Towards General World Model with Natural Language Actions and Video States
Paper • 2406.09455 • Published • 16 -
WPO: Enhancing RLHF with Weighted Preference Optimization
Paper • 2406.11827 • Published • 16 -
In-Context Editing: Learning Knowledge from Self-Induced Distributions
Paper • 2406.11194 • Published • 20
-
KTO: Model Alignment as Prospect Theoretic Optimization
Paper • 2402.01306 • Published • 21 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 63 -
SimPO: Simple Preference Optimization with a Reference-Free Reward
Paper • 2405.14734 • Published • 12 -
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
Paper • 2408.06266 • Published • 10
-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 71 -
Understanding and Diagnosing Deep Reinforcement Learning
Paper • 2406.16979 • Published • 10 -
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 62 -
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Paper • 2407.00617 • Published • 7
-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 71 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 132 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 55 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90
-
RLHFlow/prompt-collection-v0.1
Viewer • Updated • 179k • 109 • 9 -
RLHFlow/pair-preference-model-LLaMA3-8B
Text Generation • 8B • Updated • 512 • 38 -
sfairXC/FsfairX-LLaMA3-RM-v0.1
Text Classification • 8B • Updated • 1.44k • 60 -
RLHFlow/SFT-OpenHermes-2.5-Standard
Viewer • Updated • 1M • 46 • 3