Spotlight
Lemur: Harmonizing Natural Language and Code for Language Agents
Yiheng Xu · Hongjin SU · Chen Xing · Boyu Mi · Qian Liu · Weijia Shi · Binyuan Hui · Fan Zhou · Yitao Liu · Tianbao Xie · Zhoujun Cheng · Siheng Zhao · Lingpeng Kong · Bailin Wang · Caiming Xiong · Tao Yu
We introduce Lemur and Lemur-Chat, openly accessible language models optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents. The evolution from language chat models to fully functional language agents requires models that can ground natural language instructions effectively in diverse environments and execute valid actions within them, which in turn demands a synergy between language and coding capabilities. Lemur and Lemur-Chat are proposed to address this need. Unlike existing open-source models, which tend to specialize in one domain or the other, they demonstrate balanced proficiency in both. Through meticulous pre-training on a code-intensive corpus and instruction fine-tuning on text and code data, our models achieve state-of-the-art averaged performance across diverse text and coding benchmarks. Comprehensive experiments demonstrate Lemur’s superiority over existing open-source models and its proficiency across various agent tasks involving human communication, tool usage, and interaction under fully and partially observable environments. The harmonization between natural and programming languages enables Lemur-Chat to significantly narrow the gap with proprietary models on agent abilities, providing key insights into developing advanced open-source agents adept at reasoning, planning, and operating seamlessly across environments. Our model and code will be open-sourced.