Collections
Discover the best community collections!
Collections including paper arxiv:2505.19253
-
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
Paper • 2505.16944 • Published • 8
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
Paper • 2505.19253 • Published • 32
The Era of Agentic Organization: Learning to Organize with Language Models
Paper • 2510.26658 • Published • 26
Tongyi DeepResearch Technical Report
Paper • 2510.24701 • Published • 97
-
Personalize Anything for Free with Diffusion Transformer
Paper • 2503.12590 • Published • 44
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Paper • 2503.12937 • Published • 30
Exploring the Vulnerabilities of Federated Learning: A Deep Dive into Gradient Inversion Attacks
Paper • 2503.11514 • Published • 18
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Paper • 2502.19328 • Published • 23
-
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
Paper • 2505.19253 • Published • 32
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
Paper • 2505.20411 • Published • 91
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
Paper • 2505.21497 • Published • 109
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 158
-
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Paper • 2503.22675 • Published • 36
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Paper • 2503.22230 • Published • 45
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
Paper • 2509.13313 • Published • 79
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents
Paper • 2509.13309 • Published • 67
-
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
Paper • 2502.08910 • Published • 148
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens
Paper • 2502.18890 • Published • 30
MPO: Boosting LLM Agents with Meta Plan Optimization
Paper • 2503.02682 • Published • 28
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
Paper • 2505.20411 • Published • 91