ASearcher: Large-Scale End-to-End RL Training for Search Agents
In the ASearcher project, we demonstrate that large-scale end-to-end reinforcement learning can enable strong agent capabilities on complex search tasks, even with a minimalist agent design and a single open-source model. ASearcher first generates high-quality reinforcement learning data through a synthetic agent workflow. Then, leveraging the AReaL framework, it performs large-scale asynchronous RL training, allowing up to 128 agent–environment interactions per prompt during training to ensure sufficient exploration. After RL training, the 32B ASearcher model scores 58.1 on GAIA, 51.1 on xBench, and 74.5 on Frames using only basic search tools, and its performance can be further boosted at test time to outperform OpenAI DeepResearch and Kimi-Researcher, suggesting the great potential of RL scaling for agentic tasks.
The project is available at: https://github.com/inclusionAI/ASearcher/
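For illustration, the sketch below shows one plausible shape of such a minimalist search-agent rollout loop in Python. The function names (llm_generate, web_search), the tool-call tag format, and the termination logic are illustrative assumptions, not the actual ASearcher implementation; see the repository above for the real code.

```python
import re

MAX_TURNS = 128  # per-prompt budget of agent-environment interactions

def run_search_agent(question: str, llm_generate, web_search) -> str:
    """Hypothetical minimalist agent loop: interleave LLM reasoning with
    basic search-tool calls until the model emits a final answer or the
    interaction budget is exhausted. All names and tags are assumptions."""
    trajectory = f"Question: {question}\n"
    for _ in range(MAX_TURNS):
        completion = llm_generate(trajectory)
        trajectory += completion

        # A final answer terminates the episode.
        answer = re.search(r"<answer>(.*?)</answer>", completion, re.S)
        if answer:
            return answer.group(1).strip()

        # Otherwise, execute the requested search and feed results back
        # into the context for the next reasoning step.
        query = re.search(r"<search>(.*?)</search>", completion, re.S)
        if query:
            results = web_search(query.group(1).strip())
            trajectory += f"\n<results>{results}</results>\n"

    return ""  # budget exhausted without a final answer
```

In end-to-end RL, each completed trajectory of this kind would be scored (e.g., by final-answer correctness) and the resulting reward used to update the policy; an asynchronous design like AReaL's lets many such long rollouts proceed concurrently with training rather than blocking it.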
Yi Wu is an assistant professor at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University. He obtained his Ph.D. from UC Berkeley and was a researcher at OpenAI from 2019 to 2020. His research focuses on reinforcement learning, multi-agent learning, and LLM agents. His representative works include the value iteration network, the MADDPG and MAPPO algorithms, OpenAI's hide-and-seek project, and the AReaL project. He received the Best Paper Award at NIPS 2016, was a Best Demo Award finalist at ICRA 2024, and won the 2025 MIT TR35 Asia Pacific Award.