Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2505.19253

DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research

Paper • 2505.19253 • Published May 25 • 32

AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios

Paper • 2505.16944 • Published May 22 • 8
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research

Paper • 2505.19253 • Published May 25 • 32
The Era of Agentic Organization: Learning to Organize with Language Models

Paper • 2510.26658 • Published Oct 30 • 26
Tongyi DeepResearch Technical Report

Paper • 2510.24701 • Published Oct 28 • 97

Personalize Anything for Free with Diffusion Transformer

Paper • 2503.12590 • Published Mar 16 • 44
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Paper • 2503.12937 • Published Mar 17 • 30
Exploring the Vulnerabilities of Federated Learning: A Deep Dive into Gradient Inversion Attacks

Paper • 2503.11514 • Published Mar 13 • 18
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems

Paper • 2502.19328 • Published Feb 26 • 23

DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research

Paper • 2505.19253 • Published May 25 • 32
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Paper • 2505.20411 • Published May 26 • 91
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Paper • 2505.21497 • Published May 27 • 109
Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 158

Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation

Paper • 2503.22675 • Published Mar 28 • 36
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

Paper • 2503.22230 • Published Mar 28 • 45
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

Paper • 2509.13313 • Published Sep 16 • 79
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents

Paper • 2509.13309 • Published Sep 16 • 67

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

Paper • 2502.08910 • Published Feb 13 • 148
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens

Paper • 2502.18890 • Published Feb 26 • 30
MPO: Boosting LLM Agents with Meta Plan Optimization

Paper • 2503.02682 • Published Mar 4 • 28
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Paper • 2505.20411 • Published May 26 • 91

DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research

Paper • 2505.19253 • Published May 25 • 32

DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research

Paper • 2505.19253 • Published May 25 • 32
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Paper • 2505.20411 • Published May 26 • 91
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Paper • 2505.21497 • Published May 27 • 109
Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 158

AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios

Paper • 2505.16944 • Published May 22 • 8
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research

Paper • 2505.19253 • Published May 25 • 32
The Era of Agentic Organization: Learning to Organize with Language Models

Paper • 2510.26658 • Published Oct 30 • 26
Tongyi DeepResearch Technical Report

Paper • 2510.24701 • Published Oct 28 • 97

Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation

Paper • 2503.22675 • Published Mar 28 • 36
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

Paper • 2503.22230 • Published Mar 28 • 45
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

Paper • 2509.13313 • Published Sep 16 • 79
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents

Paper • 2509.13309 • Published Sep 16 • 67

Personalize Anything for Free with Diffusion Transformer

Paper • 2503.12590 • Published Mar 16 • 44
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Paper • 2503.12937 • Published Mar 17 • 30
Exploring the Vulnerabilities of Federated Learning: A Deep Dive into Gradient Inversion Attacks

Paper • 2503.11514 • Published Mar 13 • 18
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems

Paper • 2502.19328 • Published Feb 26 • 23

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

Paper • 2502.08910 • Published Feb 13 • 148
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens

Paper • 2502.18890 • Published Feb 26 • 30
MPO: Boosting LLM Agents with Meta Plan Optimization

Paper • 2503.02682 • Published Mar 4 • 28
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Paper • 2505.20411 • Published May 26 • 91

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs