Yao Huaxiu

https://www.huaxiuyao.io/

HuaxiuYaoML

Recent Activity

authored 7 papers 1 day ago

When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning

Paper • 2602.08236 • Published Feb 9 • 9

SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

Paper • 2602.08234 • Published Feb 9 • 70

Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

Paper • 2602.10090 • Published Feb 10 • 51

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Paper • 2602.22190 • Published 23 days ago • 15

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Paper • 2603.12180 • Published 8 days ago • 62

SimpleOCR: Rendering Visualized Questions to Teach MLLMs to Read

Paper • 2602.22426 • Published 23 days ago

MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

Paper • 2603.17187 • Published 3 days ago • 106

submitted a paper to Daily Papers 1 day ago

MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

Paper • 2603.17187 • Published 3 days ago • 106

authored 4 papers over 1 year ago

MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

Paper • 2410.13085 • Published Oct 16, 2024 • 24

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

Paper • 2410.10139 • Published Oct 14, 2024 • 51

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Paper • 2407.04842 • Published Jul 5, 2024 • 55

RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Paper • 2407.05131 • Published Jul 6, 2024 • 26