Xiaoyang Cao

Sean13

https://xiaoyangcao1113.github.io/

AI & ML interests

RLHF, Deep Reinforcement Learning

Recent Activity

updated a model about 1 month ago

Sean13/llama-8b-instruct-v0.2-cpo-full-label_smoothing-0.1

published a model about 1 month ago

Sean13/llama-8b-instruct-v0.2-cpo-full-label_smoothing-0.1

updated a model about 1 month ago

Sean13/mistral-7b-instruct-v0.2-cpo-full-label_smoothing-0.1

Organizations

None yet

upvoted a paper about 1 month ago

Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Paper • 2511.08577 • Published Nov 11 • 105

upvoted 3 papers 3 months ago

RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

Paper • 2510.06710 • Published Oct 8 • 39

Cache-to-Cache: Direct Semantic Communication Between Large Language Models

Paper • 2510.03215 • Published Oct 3 • 97

Latent Collective Preference Optimization: A General Framework for Robust LLM Alignment

Paper • 2509.24159 • Published Sep 29 • 1

upvoted a paper 7 months ago

VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments

Paper • 2506.02387 • Published Jun 3 • 58