Title,Authors,Date,Abstract,Url,AwesomeListCategory,Categories,PaperIndex,Affiliation
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts,"Minghao Wu, Yulin Yuan, Gholamreza Haffari, Longyue Wang",2024.5.20,"Recent advancements in machine translation (MT) have significantly enhanced translation quality across various domains. However, the translation of literary texts remains a formidable challenge due to their complex language, figurative expressions, and cultural nuances. In this work, we introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TRANSAGENTS, which mirrors the traditional translation publication process by leveraging the collective capabilities of multiple agents to address the intricate demands of translating literary works. To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP). MHP assesses translations from the perspective of monolingual readers of the target language, while BLP uses advanced LLMs to compare translations directly with the original texts. Empirical findings indicate that despite lower d-BLEU scores, translations from TRANSAGENTS are preferred by both human evaluators and LLMs over human-written references, particularly in genres requiring domain-specific knowledge. We also highlight the strengths and limitations of TRANSAGENTS through case studies and suggest directions for future research.",https://arxiv.org/abs/2405.11804,Organization,Computation and Language (cs.CL),(perhaps)_beyond_human_translation_20240520,"Monash University, University of Macau, Tencent AI Lab"
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts,"Minghao Wu, Yulin Yuan, Gholamreza Haffari, Longyue Wang",2024.5.20,"Recent advancements in machine translation (MT) have significantly enhanced translation quality across various domains. However, the translation of literary texts remains a formidable challenge due to their complex language, figurative expressions, and cultural nuances. In this work, we introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TRANSAGENTS, which mirrors the traditional translation publication process by leveraging the collective capabilities of multiple agents to address the intricate demands of translating literary works. To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP). MHP assesses translations from the perspective of monolingual readers of the target language, while BLP uses advanced LLMs to compare translations directly with the original texts. Empirical findings indicate that despite lower d-BLEU scores, translations from TRANSAGENTS are preferred by both human evaluators and LLMs over human-written references, particularly in genres requiring domain-specific knowledge. We also highlight the strengths and limitations of TRANSAGENTS through case studies and suggest directions for future research.",https://arxiv.org/abs/2405.11804,Simulation,Computation and Language (cs.CL),(perhaps)_beyond_human_translation_20240520,"Monash University, University of Macau, Tencent AI Lab"
360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System,"Shen Gao, Hao Li, Zhengliang Shi, Chengrui Huang, Quan Tu, Zhiliang Tian, Minlie Huang, Shuo Shang",2024.4.8,"Large language model agents have demonstrated remarkable advancements across various complex tasks. Recent works focus on optimizing the agent team or employing self-reflection to iteratively solve complex tasks. Since these agents are all based on the same LLM, only conducting self-evaluation or removing underperforming agents does not substantively enhance the capability of the agents. We argue that a comprehensive evaluation and accumulating experience from evaluation feedback is an effective approach to improving system performance. In this paper, we propose Reusable Experience Accumulation with 360° Assessment (360°REA), a hierarchical multi-agent framework inspired by corporate organizational practices. The framework employs a novel 360° performance assessment method for multi-perspective performance evaluation with fine-grained assessment. To enhance the capability of agents in addressing complex tasks, we introduce a dual-level experience pool for agents to accumulate experience through fine-grained assessment. Extensive experiments on complex task datasets demonstrate the effectiveness of 360°REA.",https://arxiv.org/abs/2404.05569,Evolution,Artificial Intelligence (cs.AI),360°rea_towards_a_reusable_20240408,"University of Electronic Science and Technology of China, Shandong University, Renmin University of China, National University of Defense Technology, Tsinghua University"
Affordable Generative Agents,"Yangbin Yu, Qin Zhang, Junyou Li, Qiang Fu, Deheng Ye",2024.2.3,"The emergence of large language models (LLMs) has significantly advanced the simulation of believable interactive agents. However, the substantial cost of maintaining prolonged agent interactions poses a challenge to the deployment of believable LLM-based agents. Therefore, in this paper, we develop Affordable Generative Agents (AGA), a framework for enabling the generation of believable and low-cost interactions at both the agent-environment and inter-agent levels. Specifically, for agent-environment interactions, we substitute repetitive LLM inferences with learned policies; while for inter-agent interactions, we model the social relationships between agents and compress auxiliary dialogue information. Extensive experiments on multiple environments show the effectiveness and efficiency of our proposed framework. Also, we delve into the mechanisms of emergent believable behaviors lying in LLM agents, demonstrating that agents can only generate finite behaviors in fixed environments, based upon which we identify ways to facilitate emergent interaction behaviors. Our code is publicly available at: https://github.com/AffordableGenerativeAgents/Affordable-Generative-Agents.",https://arxiv.org/abs/2402.02053,Evolution,Artificial Intelligence (cs.AI),affordable_generative_agents_20240203,Tencent Inc.
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents,"Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, Yang Liu",2024.5.5,"In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can simulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keep accumulating experience from both successful and unsuccessful cases. Simulation experiments show that the treatment performance of doctor agents consistently improves on various tasks. More interestingly, the knowledge the doctor agents have acquired in Agent Hospital is applicable to real-world medicare benchmarks. After treating around ten thousand patients (real-world doctors may take over two years), the evolved doctor agent achieves a state-of-the-art accuracy of 9",https://arxiv.org/abs/2405.02957,Evolution,Artificial Intelligence (cs.AI),agent_hospital_a_simulacrum_20240505,Tsinghua University
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents,"Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, Yang Liu",2024.5.5,"In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can simulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keep accumulating experience from both successful and unsuccessful cases. Simulation experiments show that the treatment performance of doctor agents consistently improves on various tasks. More interestingly, the knowledge the doctor agents have acquired in Agent Hospital is applicable to real-world medicare benchmarks. After treating around ten thousand patients (real-world doctors may take over two years), the evolved doctor agent achieves a state-of-the-art accuracy of 9",https://arxiv.org/abs/2405.02957,Organization,Artificial Intelligence (cs.AI),agent_hospital_a_simulacrum_20240505,Tsinghua University
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents,"Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, Yang Liu",2024.5.5,"In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can simulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keep accumulating experience from both successful and unsuccessful cases. Simulation experiments show that the treatment performance of doctor agents consistently improves on various tasks. More interestingly, the knowledge the doctor agents have acquired in Agent Hospital is applicable to real-world medicare benchmarks. After treating around ten thousand patients (real-world doctors may take over two years), the evolved doctor agent achieves a state-of-the-art accuracy of 9",https://arxiv.org/abs/2405.02957,Simulation,Artificial Intelligence (cs.AI),agent_hospital_a_simulacrum_20240505,Tsinghua University
AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems,"Junjie Zhang, Yupeng Hou, Ruobing Xie, Wenqi Sun, Julian McAuley, Wayne Xin Zhao, Leyu Lin, Ji-Rong Wen",2023.10.13,"Recently, there has been an emergence of employing LLM-powered agents as believable human proxies, based on their remarkable decision-making capability. However, existing studies mainly focus on simulating human dialogue. Human non-verbal behaviors, such as item clicking in recommender systems, although implicitly exhibiting user preferences and able to enhance the modeling of users, have not been deeply explored. The main reasons lie in the gap between language modeling and behavior modeling, as well as the incomprehension of LLMs about user-item relations. To address this issue, we propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering. We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together. Specifically, at each time step, we first prompt the user and item agents to interact autonomously. Then, based on the disparities between the agents' decisions and real-world interaction records, user and item agents are prompted to reflect on and adjust the misleading simulations collaboratively, thereby modeling their two-sided relations. The optimized agents can also propagate their preferences to other agents in subsequent interactions, implicitly capturing the collaborative filtering idea. Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions. The results show that these agents can demonstrate personalized behaviors akin to those of real-world individuals, sparking the development of next-generation user behavior simulation.",https://arxiv.org/abs/2310.09233,Communication,Information Retrieval (cs.IR),agentcf_collaborative_learning_with_20231013,"Renmin University of China, UC San Diego, Tencent"
AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems,"Junjie Zhang, Yupeng Hou, Ruobing Xie, Wenqi Sun, Julian McAuley, Wayne Xin Zhao, Leyu Lin, Ji-Rong Wen",2023.10.13,"Recently, there has been an emergence of employing LLM-powered agents as believable human proxies, based on their remarkable decision-making capability. However, existing studies mainly focus on simulating human dialogue. Human non-verbal behaviors, such as item clicking in recommender systems, although implicitly exhibiting user preferences and able to enhance the modeling of users, have not been deeply explored. The main reasons lie in the gap between language modeling and behavior modeling, as well as the incomprehension of LLMs about user-item relations. To address this issue, we propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering. We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together. Specifically, at each time step, we first prompt the user and item agents to interact autonomously. Then, based on the disparities between the agents' decisions and real-world interaction records, user and item agents are prompted to reflect on and adjust the misleading simulations collaboratively, thereby modeling their two-sided relations. The optimized agents can also propagate their preferences to other agents in subsequent interactions, implicitly capturing the collaborative filtering idea. Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions. The results show that these agents can demonstrate personalized behaviors akin to those of real-world individuals, sparking the development of next-generation user behavior simulation.",https://arxiv.org/abs/2310.09233,Simulation,Information Retrieval (cs.IR),agentcf_collaborative_learning_with_20231013,"Renmin University of China, UC San Diego, Tencent"
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors,"Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, Yujia Qin, Xin Cong, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou",2023.8.21,"Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks. However, in real-world scenarios, cooperation among individuals is often required to enhance the efficiency and effectiveness of task accomplishment. Hence, inspired by human group dynamics, we propose a multi-agent framework AGENTVERSE that can effectively orchestrate a collaborative group of expert agents as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that AGENTVERSE can proficiently deploy multi-agent groups that outperform a single agent. Extensive experiments on text understanding, reasoning, coding, tool utilization, and embodied AI confirm the effectiveness of AGENTVERSE. Moreover, our analysis of agent interactions within AGENTVERSE reveals the emergence of specific collaborative behaviors, contributing to heightened group efficiency. Our code has been released at https://github.com/OpenBMB/AgentVerse/.",https://arxiv.org/abs/2308.10848,Communication,Computation and Language (cs.CL),agentverse_facilitating_multi-agent_collaboration_20230821,"Tsinghua University, Beijing University of Posts and Telecommunications, Tencent Inc."
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors,"Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, Yujia Qin, Xin Cong, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou",2023.8.21,"Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks. However, in real-world scenarios, cooperation among individuals is often required to enhance the efficiency and effectiveness of task accomplishment. Hence, inspired by human group dynamics, we propose a multi-agent framework AGENTVERSE that can effectively orchestrate a collaborative group of expert agents as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that AGENTVERSE can proficiently deploy multi-agent groups that outperform a single agent. Extensive experiments on text understanding, reasoning, coding, tool utilization, and embodied AI confirm the effectiveness of AGENTVERSE. Moreover, our analysis of agent interactions within AGENTVERSE reveals the emergence of specific collaborative behaviors, contributing to heightened group efficiency. Our code has been released at https://github.com/OpenBMB/AgentVerse/.",https://arxiv.org/abs/2308.10848,Simulation,Computation and Language (cs.CL),agentverse_facilitating_multi-agent_collaboration_20230821,"Tsinghua University, Beijing University of Posts and Telecommunications, Tencent Inc."
AI Hospital: Interactive Evaluation and Collaboration of LLMs as Intern Doctors for Clinical Diagnosis,"Zhihao Fan, Jialong Tang, Wei Chen, Siyuan Wang, Zhongyu Wei, Jun Xi, Fei Huang, Jingren Zhou",2024.2.15,"The incorporation of Large Language Models (LLMs) in healthcare marks a significant advancement. However, the application has predominantly been limited to discriminative and question-answering tasks, which does not fully leverage their interactive potential. To address this limitation, our paper presents AI Hospital, a framework designed to build a real-time interactive diagnosis environment. To simulate the procedure, we collect high-quality medical records to create patient, examiner, and medical director agents. AI Hospital is then utilized for the interactive evaluation and collaboration of LLMs. Initially, we create a Multi-View Medical Evaluation (MVME) benchmark where various LLMs serve as intern doctors for interactive diagnosis. Subsequently, to improve diagnostic accuracy, we introduce a collaborative mechanism that involves iterative discussions and a dispute resolution process under the supervision of the medical director. In our experiments, we validate the reliability of AI Hospital. The results not only explore the feasibility of applying LLMs in clinical consultation but also confirm the effectiveness of the dispute-resolution-focused collaboration method.",https://arxiv.org/abs/2402.09742,Simulation,Computation and Language (cs.CL),ai_hospital_interactive_evaluation_20240215,"Alibaba Inc., Huazhong University of Science and Technology, Fudan University"
Apollo's Oracle: Retrieval-Augmented Reasoning in Multi-Agent Debates,"Haotian Wang, Xiyuan Du, Weijiang Yu, Qianglong Chen, Kun Zhu, Zheng Chu, Lian Yan, Yi Guan",2023.12.8,"Multi-agent debate systems are designed to derive accurate and consistent conclusions through adversarial interactions among agents. However, these systems often encounter challenges due to cognitive constraints, manifesting as (1) agents' obstinate adherence to incorrect viewpoints and (2) their propensity to abandon correct viewpoints. These issues are primarily responsible for the ineffectiveness of such debates. Addressing the challenge of cognitive constraints, we introduce a novel framework, the Multi-Agent Debate with Retrieval Augmented (MADRA). MADRA incorporates retrieval of prior knowledge into the debate process, effectively breaking cognitive constraints and enhancing the agents' reasoning capabilities. Furthermore, we have developed a self-selection module within this framework, enabling agents to autonomously select pertinent evidence, thereby minimizing the impact of irrelevant or noisy data. We have comprehensively tested and analyzed MADRA across six diverse datasets. The experimental results demonstrate that our approach significantly enhances performance across various tasks, proving the effectiveness of our proposed method.",https://arxiv.org/abs/2312.04854,Communication,Computation and Language (cs.CL),apollo's_oracle_retrieval-augmented_reasoning_20231208,"Harbin Institute of Technology, Sun Yat-sen University, Zhejiang University"
Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks,"Siyu Li, Jin Yang, Kui Zhao",2023.7.19,"As the capabilities of Large Language Models (LLMs) emerge, they not only assist in accomplishing traditional tasks within more efficient paradigms but also stimulate the evolution of social bots. Researchers have begun exploring the implementation of LLMs as the driving core of social bots, enabling more efficient and user-friendly completion of tasks like profile completion, social behavior decision-making, and social content generation. However, there is currently a lack of systematic research on the behavioral characteristics of LLMs-driven social bots and their impact on social networks. We have curated data from Chirper, a Twitter-like social network populated by LLMs-driven social bots and embarked on an exploratory study. Our findings indicate that: (1) LLMs-driven social bots possess enhanced individual-level camouflage while exhibiting certain collective characteristics; (2) these bots have the ability to exert influence on online communities through toxic behaviors; (3) existing detection methods are applicable to the activity environment of LLMs-driven social bots but may be subject to certain limitations in effectiveness. Moreover, we have organized the data collected in our study into the Masquerade-23 dataset, which we have publicly released, thus addressing the data void in the subfield of LLMs-driven social bots behavior datasets. Our research outcomes provide primary insights for the research and governance of LLMs-driven social bots within the research community.",https://arxiv.org/abs/2307.10337,Simulation,Social and Information Networks (cs.SI),are_you_in_a_20230719,Sichuan University
ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator,"Junda Zhu, Lingyong Yan, Haibo Shi, Dawei Yin, Lei Sha",2024.5.28,"Large language models (LLMs) are proven to benefit greatly from retrieval-augmented generation (RAG) in alleviating hallucinations when confronted with knowledge-intensive questions. RAG adopts information retrieval techniques to inject external knowledge from semantically relevant documents as input contexts. However, because today's Internet is flooded with numerous noisy and fabricated content, RAG systems are inevitably vulnerable to these noises and prone to responding incorrectly. To this end, we propose to optimize the retrieval-augmented GENERATOR with an Adversarial Tuning Multi-agent system (ATM). The ATM steers the GENERATOR to have a robust perspective of useful documents for question answering with the help of an auxiliary ATTACKER agent. The GENERATOR and the ATTACKER are tuned adversarially for several iterations. After rounds of multi-agent iterative tuning, the GENERATOR can eventually better discriminate useful documents amongst fabrications. The experimental results verify the effectiveness of ATM, and we also observe that the GENERATOR can achieve better performance compared to state-of-the-art baselines.",https://arxiv.org/abs/2405.18111,Communication,Computation and Language (cs.CL),atm_adversarial_tuning_multi-agent_20240528,"Beihang University, Baidu Inc."
Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions,"Ruochen Zhao, Wenxuan Zhang, Yew Ken Chia, Deli Zhao, Lidong Bing",2024.5.30,"As LLMs evolve on a daily basis, there is an urgent need for a trustworthy evaluation method that can provide robust evaluation results in a timely fashion. Currently, as static benchmarks are prone to contamination concerns, users tend to trust human voting platforms, such as Chatbot Arena. However, human annotations require extensive manual effort. To provide an automatic, robust, and trustworthy evaluation framework, we innovatively propose the Auto-Arena of LLMs, which automates the entire evaluation process with LLM agents. Firstly, an examiner LLM devises queries. Then, a pair of candidate LLMs engage in a multi-round peer-battle around the query, during which the LLMs' true performance gaps become visible. Finally, a committee of LLM judges collectively discusses and determines the winner, which alleviates bias and promotes fairness. In our extensive experiments on the 17 newest LLMs, Auto-Arena shows the highest correlation with human preferences, providing a promising alternative to human evaluation platforms.",https://arxiv.org/abs/2405.20267,Communication,Computation and Language (cs.CL),auto_arena_of_llms_20240530,"Nanyang Technological University, Alibaba Group, Singapore University of Technology and Design"
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation,"Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, Chi Wang",2023.8.16,"AutoGen is an open-source framework that allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools. Using AutoGen, developers can also flexibly define agent interaction behaviors. Both natural language and computer code can be used to program flexible conversation patterns for different applications. AutoGen serves as a generic framework for building diverse applications of various complexities and LLM capacities. Empirical studies demonstrate the effectiveness of the framework in many example applications, in domains ranging from mathematics, coding, question answering, and operations research to online decision-making and entertainment.",https://arxiv.org/abs/2308.08155,Organization,Artificial Intelligence (cs.AI),autogen_enabling_next-gen_llm_20230816,"Microsoft Research, Pennsylvania State University, University of Washington, Xidian University"
Autonomous Agents for Collaborative Task under Information Asymmetry,"Wei Liu, Chenxi Wang, Yifei Wang, Zihao Xie, Rennai Qiu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Chen Qian",2024.6.21,"Large Language Model Multi-Agent Systems (LLM-MAS) have achieved great progress in solving complex tasks. These systems perform communication among agents to collaboratively solve tasks, under the premise of shared information. However, when agents' communication is leveraged to enhance human cooperation, a new challenge arises due to information asymmetry, since each agent can only access the information of its human user. Previous MAS struggle to complete tasks under this condition. To address this, we propose a new MAS paradigm termed iAgents, which denotes Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, where agents proactively exchange human information necessary for task resolution, thereby overcoming information asymmetry. iAgents employs a novel agent reasoning mechanism, InfoNav, to navigate agents' communication towards effective information exchange. Together with InfoNav, iAgents organizes human information in a mixed memory to provide agents with accurate and comprehensive information for exchange. Additionally, we introduce InformativeBench, the first benchmark tailored for evaluating LLM agents' task-solving ability under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.",https://arxiv.org/abs/2406.14928,Communication,Artificial Intelligence (cs.AI),autonomous_agents_for_collaborative_20240621,"Tsinghua University, Beijing University of Posts and Telecommunications"
Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation,"Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang",2023.10.2,"Recent breakthroughs in large language models (LLMs) have brought remarkable success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information processed by LLMs is consistently honest, neglecting the pervasive deceptive or misleading information in human society and AI-generated content. This oversight makes LLMs susceptible to malicious manipulations, potentially resulting in detrimental outcomes. This study utilizes the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments. Avalon, full of misinformation and requiring sophisticated logic, manifests as a “Game-of-Thoughts”. Inspired by the efficacy of humans' recursive thinking and perspective-taking in the Avalon game, we introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to identify and counteract deceptive information. ReCon combines formulation and refinement contemplation processes; formulation contemplation produces initial thoughts and speech, while refinement contemplation further polishes them. Additionally, we incorporate first-order and second-order perspective transitions into these processes respectively. Specifically, the first-order allows an LLM agent to infer others' mental states, and the second-order involves understanding how others perceive the agent's mental state…",https://arxiv.org/abs/2310.01320,Communication,Artificial Intelligence (cs.AI),avalon's_game_of_thoughts_20231002,"Tsinghua University, BIGAI, Technical University of Munich"
Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation,"Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang",2023.10.2,"Recent breakthroughs in large language models (LLMs) have brought remarkable success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information processed by LLMs is consistently honest, neglecting the pervasive deceptive or misleading information in human society and AI-generated content. This oversight makes LLMs susceptible to malicious manipulations, potentially resulting in detrimental outcomes. This study utilizes the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments. Avalon, full of misinformation and requiring sophisticated logic, manifests as a “Game-of-Thoughts”. Inspired by the efficacy of humans' recursive thinking and perspective-taking in the Avalon game, we introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to identify and counteract deceptive information. ReCon combines formulation and refinement contemplation processes; formulation contemplation produces initial thoughts and speech, while refinement contemplation further polishes them. Additionally, we incorporate first-order and second-order perspective transitions into these processes respectively. Specifically, the first-order allows an LLM agent to infer others' mental states, and the second-order involves understanding how others perceive the agent's mental state…",https://arxiv.org/abs/2310.01320,Organization,Artificial Intelligence (cs.AI),avalon's_game_of_thoughts_20231002,"Tsinghua University, BIGAI, Technical University of Munich"
BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis,"Shuhang Lin, Wenyue Hua, Lingyao Li, Che-Jui Chang, Lizhou Fan, Jianchao Ji, Hang Hua, Mingyu Jin, Jiebo Luo, Yongfeng Zhang",2024.4.23,"This paper presents BattleAgent, a detailed emulation demonstration system that combines the Large Vision-Language Model (VLM) and Multi-Agent System (MAS). This novel system aims to simulate complex dynamic interactions among multiple agents, as well as between agents and their environments, over a period of time. It emulates both the decision-making processes of leaders and the viewpoints of ordinary participants, such as soldiers. The emulation showcases the current capabilities of agents, featuring fine-grained multi-modal interactions between agents and landscapes. It develops customizable agent structures to meet specific situational requirements, for example, a variety of battle-related activities like scouting and trench digging. These components collaborate to recreate historical events in a lively and comprehensive manner while offering insights into the thoughts and feelings of individuals from diverse viewpoints. The technological foundations of BattleAgent establish detailed and immersive settings for historical battles, enabling individual agents to partake in, observe, and dynamically respond to evolving battle scenarios. This methodology holds the potential to substantially deepen our understanding of historical events, particularly through individual accounts. Such initiatives can also aid historical research, as conventional historical narratives often lack documentation and prioritize the perspectives of decision-makers, thereby overlooking the experiences of ordinary individuals. This biased documentation results in a considerable gap in our historical understanding, as many stories remain untold......",https://arxiv.org/abs/2404.15532,Simulation,Human-Computer Interaction (cs.HC),battleagent_multi-modal_dynamic_emulation_20240423,"Rutgers University, University of Michigan, University of Rochester"
Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication,"Weize Chen, Chenfei Yuan, Jiarui Yuan, Yusheng Su, Chen Qian, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun",2024.2.28,"Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, besides NL, LLMs have seen various non-NL formats during pre-training, such as code and logical expression. NL's status as the optimal format for LLMs, particularly in single-LLM reasoning and multi-agent communication, has not been thoroughly examined. In this work, we challenge the default use of NL by exploring the utility of non-NL formats in these contexts. We show that allowing LLMs to autonomously select the most suitable format before reasoning or communicating leads to a 3.3 to 5.7\% improvement in reasoning efficiency for different LLMs, and up to a 72.7\% reduction in token usage in multi-agent communication, all while maintaining communicative effectiveness. Our comprehensive analysis further reveals that LLMs can devise a format from limited task instructions and that the devised format is effectively transferable across different LLMs. Intriguingly, the structured communication format decided by LLMs exhibits notable parallels with established agent communication languages, suggesting a natural evolution towards efficient, structured communication in agent communication.",https://arxiv.org/abs/2402.18439,Communication,Computation and Language (cs.CL),beyond_natural_language_llms_20240228,"Tsinghua University, Tencent, Beijing University of Posts and Telecommunications"
Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication,"Weize Chen, Chenfei Yuan, Jiarui Yuan, Yusheng Su, Chen Qian, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun",2024.2.28,"Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, besides NL, LLMs have seen various non-NL formats during pre-training, such as code and logical expression. NL's status as the optimal format for LLMs, particularly in single-LLM reasoning and multi-agent communication, has not been thoroughly examined. In this work, we challenge the default use of NL by exploring the utility of non-NL formats in these contexts. We show that allowing LLMs to autonomously select the most suitable format before reasoning or communicating leads to a 3.3 to 5.7\% improvement in reasoning efficiency for different LLMs, and up to a 72.7\% reduction in token usage in multi-agent communication, all while maintaining communicative effectiveness. Our comprehensive analysis further reveals that LLMs can devise a format from limited task instructions and that the devised format is effectively transferable across different LLMs. Intriguingly, the structured communication format decided by LLMs exhibits notable parallels with established agent communication languages, suggesting a natural evolution towards efficient, structured communication in agent communication.",https://arxiv.org/abs/2402.18439,Evolution,Computation and Language (cs.CL),beyond_natural_language_llms_20240228,"Tsinghua University, Tencent, Beijing University of Posts and Telecommunications"
Building Cooperative Embodied Agents Modularly with Large Language Models,"Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan",2023.7.5,"In this work, we address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments. While previous research either presupposes a cost-free communication channel or relies on a centralized controller with shared observations, we harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework that integrates with perception, memory, and execution. Thus building a Cooperative Embodied Language Agent CoELA, who can plan, communicate, and cooperate with others to accomplish long-horizon tasks efficiently. Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication. Though current Open LMs like LLAMA-2 still underperform, we fine-tune a CoLLAMA with data collected with our agents and show how they can achieve promising performance. We also conducted a user study for human-agent interaction and discovered that CoELA communicating in natural language can earn more trust and cooperate more effectively with humans. Our research underscores the potential of LLMs for future research in multi-agent cooperation. Videos can be found on the project website https://vis-www.cs.umass.edu/Co-LLM-Agents/.",https://arxiv.org/abs/2307.02485,Communication,Artificial Intelligence (cs.AI),building_cooperative_embodied_agents_20230705,"University of Massachusetts Amherst, Tsinghua University, Shanghai Jiao Tong University, MIT, MIT-IBM Watson AI Lab"
"CAMEL: Communicative Agents for ""Mind"" Exploration of Large Language Model Society","Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, Bernard Ghanem",2023.3.31,"The rapid advancement of chat-based language models has led to remarkable
progress in complex task-solving. However, their success heavily relies on human
input to guide the conversation, which can be challenging and time-consuming.
This paper explores the potential of building scalable techniques to facilitate au-
tonomous cooperation among communicative agents, and provides insight into
their “cognitive” processes. To address the challenges of achieving autonomous
cooperation, we propose a novel communicative agent framework named role-
playing . Our approach involves using inception prompting to guide chat agents
toward task completion while maintaining consistency with human intentions. We
showcase how role-playing can be used to generate conversational data for studying
the behaviors and capabilities of a society of agents, providing a valuable resource
for investigating conversational language models. In particular, we conduct com-
prehensive studies on instruction-following cooperation in multi-agent settings.
Our contributions include introducing a novel communicative agent framework,
offering a scalable approach for studying the cooperative behaviors and capabili-
ties of multi-agent systems, and open-sourcing our library to support research on
communicative agents and beyond: https://github.com/camel-ai/camel.",https://arxiv.org/abs/2303.17760,Communication,Artificial Intelligence (cs.AI),camel_communicative_agents_for_20230331,King Abdullah University of Science and Technology
Can Large Language Model Agents Simulate Human Trust Behaviors?,"Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Kai Shu, Adel Bibi, Ziniu Hu, Philip Torr, Bernard Ghanem, Guohao Li",2024.2.7,"Large Language Model (LLM) agents have been increasingly adopted as simulation tools to model humans in applications such as social science. However, one fundamental question remains: can LLM agents really simulate human behaviors? In this paper, we focus on one of the most critical behaviors in human interactions, trust, and aim to investigate whether or not LLM agents can simulate human trust behaviors. We first find that LLM agents generally exhibit trust behaviors, referred to as agent trust, under the framework of Trust Games, which are widely recognized in behavioral economics. Then, we discover that LLM agents can have high behavioral alignment with humans regarding trust behaviors, particularly for GPT-4, indicating the feasibility to simulate human trust behaviors with LLM agents. In addition, we probe into the biases in agent trust and the differences in agent trust towards agents and humans. We also explore the intrinsic properties of agent trust under conditions including advanced reasoning strategies and external manipulations. We further offer important implications of our discoveries for various scenarios where trust is paramount. Our study provides new insights into the behaviors of LLM agents and the fundamental analogy between LLMs and humans.",https://arxiv.org/abs/2402.04559,Simulation,Artificial Intelligence (cs.AI),can_large_language_model_20240207,"KAUST, Illinois Institute of Technology, Pennsylvania State University, The University of Chicago, University of Oxford, California Institute of Technology"
Chain of Agents: Large Language Models Collaborating on Long-Context Tasks,"Yusen Zhang, Ruoxi Sun, Yanfei Chen, Tomas Pfister, Rui Zhang, Sercan Ö. Arik",2024.6.4,"Addressing the challenge of effectively processing long contexts has become a critical issue for Large Language Models (LLMs). Two common strategies have emerged: 1) reducing the input length, such as retrieving relevant chunks by Retrieval-Augmented Generation (RAG), and 2) expanding the context window limit of LLMs. However, both strategies have drawbacks: input reduction has no guarantee of covering the part with needed information, while window extension struggles with focusing on the pertinent information for solving the task. To mitigate these limitations, we propose Chain-of-Agents (CoA), a novel framework that harnesses multi-agent collaboration through natural language to enable information aggregation and context reasoning across various LLMs over long-context tasks. CoA consists of multiple worker agents who sequentially communicate to handle different segmented portions of the text, followed by a manager agent who synthesizes these contributions into a coherent final output. CoA processes the entire input by interleaving reading and reasoning, and it mitigates long context focus issues by assigning each agent a short context. We perform comprehensive evaluation of CoA on a wide range of long-context tasks in question answering, summarization, and code completion, demonstrating significant improvements by up to 10% over strong baselines of RAG, Full-Context, and multi-agent LLMs.",https://arxiv.org/abs/2406.02818,Organization,Computation and Language (cs.CL),chain_of_agents_large_20240604,"Penn State University, Google Cloud AI Research"
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation,"Zejun Wang, Jia Li, Ge Li, Zhi Jin",2023.11.1,"Large language models have shown good performances in generating code to meet human requirements. However, human requirements expressed in natural languages can be vague, incomplete, and ambiguous, leading large language models to misunderstand human requirements and make mistakes. Worse, it is difficult for a human user to refine the requirement. To help human users refine their requirements and improve large language models' code generation performances, we propose ChatCoder: a method to refine the requirements via chatting with large language models. We design a chat scheme in which the large language models will guide the human users to refine their expression of requirements to be more precise, unambiguous, and complete than before. Experiments show that ChatCoder has improved existing large language models' performance by a large margin. Besides, ChatCoder has the advantage over refine-based methods and LLMs fine-tuned via human response.",https://arxiv.org/abs/2311.00272,Organization,Software Engineering (cs.SE),chatcoder_chat-based_refine_requirement_20231101,Peking University
ChatDev: Communicative Agents for Software Development,"Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun",2023.7.16,"Software development is a complex task that necessitates cooperation among multiple members with diverse skills. Numerous studies used deep learning to improve specific phases in a waterfall model, such as design, coding, and testing. However, the deep learning model in each phase requires unique designs, leading to technical inconsistencies across various phases, which results in a fragmented and ineffective development process. In this paper, we introduce ChatDev, a chat-powered software development framework in which specialized agents driven by large language models (LLMs) are guided in what to communicate (via chat chain) and how to communicate (via communicative dehallucination). These agents actively contribute to the design, coding, and testing phases through unified language-based communication, with solutions derived from their multi-turn dialogues. We found their utilization of natural language is advantageous for system design, and communicating in programming language proves helpful in debugging. This paradigm demonstrates how linguistic communication facilitates multi-agent collaboration, establishing language as a unifying bridge for autonomous task-solving among LLM agents. The code and data are available at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2307.07924,Communication,Software Engineering (cs.SE),chatdev_communicative_agents_for_20230716,"Tsinghua University, The University of Sydney, BUPT, Modelbest Inc."
ChatDev: Communicative Agents for Software Development,"Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun",2023.7.16,"Software development is a complex task that necessitates cooperation among multiple members with diverse skills. Numerous studies used deep learning to improve specific phases in a waterfall model, such as design, coding, and testing. However, the deep learning model in each phase requires unique designs, leading to technical inconsistencies across various phases, which results in a fragmented and ineffective development process. In this paper, we introduce ChatDev, a chat-powered software development framework in which specialized agents driven by large language models (LLMs) are guided in what to communicate (via chat chain) and how to communicate (via communicative dehallucination). These agents actively contribute to the design, coding, and testing phases through unified language-based communication, with solutions derived from their multi-turn dialogues. We found their utilization of natural language is advantageous for system design, and communicating in programming language proves helpful in debugging. This paradigm demonstrates how linguistic communication facilitates multi-agent collaboration, establishing language as a unifying bridge for autonomous task-solving among LLM agents. The code and data are available at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2307.07924,Organization,Software Engineering (cs.SE),chatdev_communicative_agents_for_20230716,"Tsinghua University, The University of Sydney, BUPT, Modelbest Inc."
ChatDev: Communicative Agents for Software Development,"Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun",2023.7.16,"Software development is a complex task that necessitates cooperation among multiple members with diverse skills. Numerous studies used deep learning to improve specific phases in a waterfall model, such as design, coding, and testing. However, the deep learning model in each phase requires unique designs, leading to technical inconsistencies across various phases, which results in a fragmented and ineffective development process. In this paper, we introduce ChatDev, a chat-powered software development framework in which specialized agents driven by large language models (LLMs) are guided in what to communicate (via chat chain) and how to communicate (via communicative dehallucination). These agents actively contribute to the design, coding, and testing phases through unified language-based communication, with solutions derived from their multi-turn dialogues. We found their utilization of natural language is advantageous for system design, and communicating in programming language proves helpful in debugging. This paradigm demonstrates how linguistic communication facilitates multi-agent collaboration, establishing language as a unifying bridge for autonomous task-solving among LLM agents. The code and data are available at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2307.07924,Simulation,Software Engineering (cs.SE),chatdev_communicative_agents_for_20230716,"Tsinghua University, The University of Sydney, BUPT, Modelbest Inc."
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate,"Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, Zhiyuan Liu",2023.8.14,"Text evaluation has historically posed significant challenges, often demanding substantial labor and time cost. With the emergence of large language models (LLMs), researchers have explored LLMs' potential as alternatives for human evaluation. While these single-agent-based approaches show promise, experimental results suggest that further advancements are needed to bridge the gap between their current effectiveness and human-level evaluation quality. Recognizing that best practices of human evaluation processes often involve multiple human annotators collaborating in the evaluation, we resort to a multi-agent debate framework, moving beyond single-agent prompting strategies. The multi-agent-based approach enables a group of LLMs to synergize with an array of intelligent counterparts, harnessing their distinct capabilities and expertise to enhance efficiency and effectiveness in handling intricate tasks. In this paper, we construct a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models on open-ended questions and traditional natural language generation (NLG) tasks. We derive insights and lessons from practical scenarios where humans instigate group discussions for brainstorming and propose different communication strategies within ChatEval......",https://arxiv.org/abs/2308.07201,Organization,Computation and Language (cs.CL),chateval_towards_better_llm-based_20230814,"Tsinghua University, Hong Kong University of Science and Technology, Peking University"
"CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving","Pei Chen, Boran Han, Shuai Zhang",2024.4.26,"Large Language Models (LLMs) have shown
great ability in solving traditional natural lan-
guage tasks and elementary reasoning tasks
with appropriate prompting techniques. How-
ever, their ability is still limited in solving com-
plicated science problems. In this work, we
aim to push the upper bound of the reason-
ing capability of LLMs by proposing a col-
laborative multi-agent, multi-reasoning-path
(CoMM) prompting framework. Specifically,
we prompt LLMs to play different roles in a
problem-solving team, and encourage differ-
ent role-play agents to collaboratively solve
the target task. In particular, we discover that
applying different reasoning paths for differ-
ent roles is an effective strategy to implement
few-shot prompting approaches in the multi-
agent scenarios. Empirical results demonstrate
the effectiveness of the proposed methods on
two college-level science problems over com-
petitive baselines. Our further analysis shows
the necessity of prompting LLMs to play dif-
ferent roles or experts independently. We re-
lease the code at: https://github.com/
amazon-science/comm-prompt.",https://arxiv.org/abs/2404.17729,Organization,Computation and Language (cs.CL),"comm_collaborative_multi-agent,_multi-reasoning-path_20240426","Texas A&M University, Amazon Web Services"
CompeteAI: Understanding the Competition Dynamics in Large Language Model-based Agents,"Qinlin Zhao, Jindong Wang, Yixuan Zhang, Yiqiao Jin, Kaijie Zhu, Hao Chen, Xing Xie",2023.10.26,"Large language models (LLMs) have been widely used as agents to complete different tasks, such as personal assistance or event planning. While most of the work has focused on cooperation and collaboration between agents, little work explores competition, another important mechanism that promotes the development of society and economy. In this paper, we seek to examine the competition dynamics in LLM-based agents. We first propose a general framework for studying the competition between agents. Then, we implement a practical competitive environment using GPT-4 to simulate a virtual town with two types of agents, including restaurant agents and customer agents. Specifically, the restaurant agents compete with each other to attract more customers, where competition encourages them to transform, such as cultivating new operating strategies. Simulation experiments reveal several interesting findings at the micro and macro levels, which align well with existing market and sociological theories. We hope that the framework and environment can be a promising testbed to study the competition that fosters understanding of society. Code is available at: https://github.com/microsoft/competeai.",https://arxiv.org/abs/2310.17512,Simulation,Artificial Intelligence (cs.AI),competeai_understanding_the_competition_20231026,"University of Science and Technology of China, Microsoft Research, William & Mary, Georgia Institute of Technology, Carnegie Mellon University"
"Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents","Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian Ma, Yitao Liang",2023.2.3,"We investigate the challenge of task planning for multi-task embodied agents in open-world environments. Two main difficulties are identified: 1) executing plans in an open-world environment (e.g., Minecraft) necessitates accurate and multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla planners do not consider how easy the current agent can achieve a given sub-task when ordering parallel sub-goals within a complicated plan, the resulting plan could be inefficient or even infeasible. To this end, we propose ""Describe, Explain, Plan and Select"" (DEPS), an interactive planning approach based on Large Language Models (LLMs). DEPS facilitates better error correction on initial LLM-generated plan by integrating description of the plan execution process and providing self-explanation of feedback when encountering failures during the extended planning phases. Furthermore, it includes a goal selector, which is a trainable module that ranks parallel candidate sub-goals based on the estimated steps of completion, consequently refining the initial plan. Our experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks and nearly double the overall performances. Further testing reveals our method's general effectiveness in popularly adopted non-open-ended domains as well (i.e., ALFWorld and tabletop manipulation). 
The ablation and exploratory studies detail how our design beats the counterparts and provide a promising update on the 𝙾𝚋𝚝𝚊𝚒𝚗𝙳𝚒𝚊𝚖𝚘𝚗𝚍 grand challenge with our approach.",https://arxiv.org/abs/2302.01560,Organization,Artificial Intelligence (cs.AI),"describe,_explain,_plan_and_20230203","Peking University, University of California Los Angeles, Beijing Institute for General Artificial Intelligence"
Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization,"Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang",2023.10.3,"Large language model (LLM) agents have been shown effective on a wide range of tasks, and by ensembling multiple LLM agents, their performances could be further improved. Existing approaches employ a fixed set of agents to interact with each other in a static architecture, which limits their generalizability to various tasks and requires strong human prior in designing these agents. In this work, we propose to construct a strategic team of agents communicating in a dynamic interaction architecture based on the task query. Specifically, we build a framework named Dynamic LLM-Agent Network (DyLAN) for LLM-agent collaboration on complicated tasks like reasoning and code generation. DyLAN enables agents to interact for multiple rounds in a dynamic architecture with inference-time agent selection and an early-stopping mechanism to improve performance and efficiency. We further design an automatic agent team optimization algorithm based on an unsupervised metric termed Agent Importance Score, enabling the selection of best agents based on the contribution each agent makes. Empirically, we demonstrate that DyLAN performs well in both reasoning and code generation tasks with reasonable computational cost. DyLAN achieves 1",https://arxiv.org/abs/2310.02170,Organization,Computation and Language (cs.CL),dynamic_llm-agent_network_an_20231003,"Tsinghua University, Georgia Tech, Stanford University"
Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization,"Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang",2023.10.3,"Large language model (LLM) agents have been shown effective on a wide range of tasks, and by ensembling multiple LLM agents, their performances could be further improved. Existing approaches employ a fixed set of agents to interact with each other in a static architecture, which limits their generalizability to various tasks and requires strong human prior in designing these agents. In this work, we propose to construct a strategic team of agents communicating in a dynamic interaction architecture based on the task query. Specifically, we build a framework named Dynamic LLM-Agent Network (DyLAN) for LLM-agent collaboration on complicated tasks like reasoning and code generation. DyLAN enables agents to interact for multiple rounds in a dynamic architecture with inference-time agent selection and an early-stopping mechanism to improve performance and efficiency. We further design an automatic agent team optimization algorithm based on an unsupervised metric termed Agent Importance Score, enabling the selection of best agents based on the contribution each agent makes. Empirically, we demonstrate that DyLAN performs well in both reasoning and code generation tasks with reasonable computational cost. DyLAN achieves 1",https://arxiv.org/abs/2310.02170,Evolution,Computation and Language (cs.CL),dynamic_llm-agent_network_an_20231003,"Tsinghua University, Georgia Tech, Stanford University"
EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities,"Nian Li, Chen Gao, Mingyu Li, Yong Li, Qingmin Liao",2023.10.16,"The advent of artificial intelligence has led to a growing emphasis on data-driven modeling in macroeconomics, with agent-based modeling (ABM) emerging as a prominent bottom-up simulation paradigm. In ABM, agents (e.g., households, firms) interact within a macroeconomic environment, collectively generating market dynamics. Existing agent modeling typically employs predetermined rules or learning-based neural networks for decision-making. However, customizing each agent presents significant challenges, complicating the modeling of agent heterogeneity. Additionally, the influence of multi-period market dynamics and multifaceted macroeconomic factors are often overlooked in decision-making processes. In this work, we introduce EconAgent, a large language model-empowered agent with human-like characteristics for macroeconomic simulation. We first construct a simulation environment that incorporates various market dynamics driven by agents' decisions regarding work and consumption. Through the perception module, we create heterogeneous agents with distinct decision-making mechanisms. Furthermore, we model the impact of macroeconomic trends using a memory module, which allows agents to reflect on past individual experiences and market dynamics. Simulation experiments show that EconAgent can make realistic decisions, leading to more reasonable macroeconomic phenomena compared to existing rule-based or learning-based agents. Our codes are released at https://github.com/tsinghua-fib-lab/ACL24-EconAgent.",https://arxiv.org/abs/2310.10436,Organization,Artificial Intelligence (cs.AI),econagent_large_language_model-empowered_20231016,Tsinghua University
EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities,"Nian Li, Chen Gao, Mingyu Li, Yong Li, Qingmin Liao",2023.10.16,"The advent of artificial intelligence has led to a growing emphasis on data-driven modeling in macroeconomics, with agent-based modeling (ABM) emerging as a prominent bottom-up simulation paradigm. In ABM, agents (e.g., households, firms) interact within a macroeconomic environment, collectively generating market dynamics. Existing agent modeling typically employs predetermined rules or learning-based neural networks for decision-making. However, customizing each agent presents significant challenges, complicating the modeling of agent heterogeneity. Additionally, the influence of multi-period market dynamics and multifaceted macroeconomic factors are often overlooked in decision-making processes. In this work, we introduce EconAgent, a large language model-empowered agent with human-like characteristics for macroeconomic simulation. We first construct a simulation environment that incorporates various market dynamics driven by agents' decisions regarding work and consumption. Through the perception module, we create heterogeneous agents with distinct decision-making mechanisms. Furthermore, we model the impact of macroeconomic trends using a memory module, which allows agents to reflect on past individual experiences and market dynamics. Simulation experiments show that EconAgent can make realistic decisions, leading to more reasonable macroeconomic phenomena compared to existing rule-based or learning-based agents. Our codes are released at https://github.com/tsinghua-fib-lab/ACL24-EconAgent.",https://arxiv.org/abs/2310.10436,Simulation,Artificial Intelligence (cs.AI),econagent_large_language_model-empowered_20231016,Tsinghua University
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate,"Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, Shuming Shi",2023.5.30,"Modern large language models (LLMs) like ChatGPT have shown remarkable performance on general language tasks but still struggle on complex reasoning tasks, which drives the research on cognitive behaviors of LLMs to explore human-like problem-solving strategies. Along this direction, one representative strategy is self-reflection, which asks an LLM to refine the solution with the feedback generated by itself iteratively. However, our study shows that such reflection-style methods suffer from the Degeneration-of-Thought (DoT) problem: once the LLM has established confidence in its solutions, it is unable to generate novel thoughts later through reflection even if its initial stance is incorrect. To address the DoT problem, we propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of “tit for tat” and a judge manages the debate process to obtain a final solution. Clearly, our MAD framework encourages divergent thinking in LLMs, which would be helpful for tasks that require deep levels of contemplation. Experimental results on two challenging datasets, commonsense machine translation and counter-intuitive arithmetic reasoning, demonstrate the effectiveness of our MAD framework. Extensive analyses suggest that the adaptive break of debate and the modest level of the “tit for tat” state are required for MAD to obtain good performance. Moreover, we find that LLMs might not be a fair judge if different LLMs are used for agents. Code is available at https://github.com/Skytliang/Multi-Agents-Debate.",https://arxiv.org/abs/2305.19118,Communication,Computation and Language (cs.CL),encouraging_divergent_thinking_in_20230530,"Tsinghua University, Shanghai Jiao Tong University, Tencent AI Lab"
Epidemic Modeling with Generative Agents,"Ross Williams, Niyousha Hosseinichimeh, Aritra Majumdar, Navid Ghaffarzadegan",2023.7.11,"This study offers a new paradigm of individual-level modeling to address the grand challenge of incorporating human behavior in epidemic models. Using generative artificial intelligence in an agent-based epidemic model, each agent is empowered to make its own reasonings and decisions via connecting to a large language model such as ChatGPT. Through various simulation experiments, we present compelling evidence that generative agents mimic real-world behaviors such as quarantining when sick and self-isolating when cases rise. Collectively, the agents demonstrate patterns akin to the multiple waves observed in recent pandemics followed by an endemic period. Moreover, the agents successfully flatten the epidemic curve. This study creates the potential to improve dynamic system modeling by offering a way to represent the human brain, reasoning, and decision making.",https://arxiv.org/abs/2307.04986,Simulation,Artificial Intelligence (cs.AI),epidemic_modeling_with_generative_20230711,Virginia Tech
Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate,"Kai Xiong, Xiao Ding, Yixin Cao, Ting Liu, Bing Qin",2023.5.19,"Large Language Models (LLMs) have shown impressive capabilities in various applications, but they still face various inconsistency issues. Existing works primarily focus on the inconsistency issues within a single LLM, while we complementarily explore the inter-consistency among multiple LLMs for collaboration. To examine whether LLMs can collaborate effectively to achieve a consensus for a shared goal, we focus on commonsense reasoning, and introduce a formal debate framework (FORD) to conduct a three-stage debate among LLMs aligned with real-world scenarios: fair debate, mismatched debate, and roundtable debate. Through extensive experiments on various datasets, we find that LLMs can effectively collaborate to reach a consensus despite noticeable inter-inconsistencies, but imbalances in their abilities can lead to domination by superior LLMs. Leveraging a more advanced LLM like GPT-4 as an authoritative judge can boost collaboration performance. Our work contributes to understanding the inter-consistency among LLMs and lays the foundation for developing future collaboration methods. Codes and data are available at https://github.com/Waste-Wood/FORD.",https://arxiv.org/abs/2305.11595,Communication,Computation and Language (cs.CL),examining_inter-consistency_of_large_20230519,"Harbin Institute of Technology, Singapore Management University"
Experiential Co-Learning of Software-Developing Agents,"Chen Qian, Yufan Dang, Jiahao Li, Wei Liu, Zihao Xie, Yifei Wang, Weize Chen, Cheng Yang, Xin Cong, Xiaoyin Che, Zhiyuan Liu, Maosong Sun",2023.12.28,"Recent advancements in large language models (LLMs) have brought significant changes to various domains, especially through LLM-driven autonomous agents. A representative scenario is in software development, where LLM agents demonstrate efficient collaboration, task division, and assurance of software quality, markedly reducing the need for manual involvement. However, these agents frequently perform a variety of tasks independently, without benefiting from past experiences, which leads to repeated mistakes and inefficient attempts in multi-step task execution. To this end, we introduce Experiential Co-Learning, a novel LLM-agent learning framework in which instructor and assistant agents gather shortcut-oriented experiences from their historical trajectories and use these past experiences for future task execution. The extensive experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively. We anticipate that our insights will guide LLM agents towards enhanced autonomy and contribute to their evolutionary growth in cooperative learning. The code and data are available at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2312.17025,Evolution,Computation and Language (cs.CL),experiential_co-learning_of_software-developing_20231228,"Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens"
Experiential Co-Learning of Software-Developing Agents,"Chen Qian, Yufan Dang, Jiahao Li, Wei Liu, Zihao Xie, Yifei Wang, Weize Chen, Cheng Yang, Xin Cong, Xiaoyin Che, Zhiyuan Liu, Maosong Sun",2023.12.28,"Recent advancements in large language models (LLMs) have brought significant changes to various domains, especially through LLM-driven autonomous agents. A representative scenario is in software development, where LLM agents demonstrate efficient collaboration, task division, and assurance of software quality, markedly reducing the need for manual involvement. However, these agents frequently perform a variety of tasks independently, without benefiting from past experiences, which leads to repeated mistakes and inefficient attempts in multi-step task execution. To this end, we introduce Experiential Co-Learning, a novel LLM-agent learning framework in which instructor and assistant agents gather shortcut-oriented experiences from their historical trajectories and use these past experiences for future task execution. The extensive experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively. We anticipate that our insights will guide LLM agents towards enhanced autonomy and contribute to their evolutionary growth in cooperative learning. The code and data are available at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2312.17025,Organization,Computation and Language (cs.CL),experiential_co-learning_of_software-developing_20231228,"Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens"
Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View,"Jintian Zhang, Xin Xu, Ningyu Zhang, Ruibo Liu, Bryan Hooi, Shumin Deng",2023.10.3,"As Natural Language Processing (NLP) systems are increasingly employed in intricate social environments, a pressing query emerges: Can these NLP systems mirror human-esque collaborative intelligence, in a multi-agent society consisting of multiple large language models (LLMs)? This paper probes the collaboration mechanisms among contemporary NLP systems by melding practical experiments with theoretical insights. We fabricate four unique societies comprised of LLM agents, where each agent is characterized by a specific trait (easy-going or overconfident) and engages in collaboration with a distinct thinking pattern (debate or reflection). Through evaluating these multi-agent societies on three benchmark datasets, we discern that certain collaborative strategies not only outshine previous top-tier approaches but also optimize efficiency (using fewer API tokens). Moreover, our results further illustrate that LLM agents manifest human-like social behaviors, such as conformity and consensus reaching, mirroring foundational social psychology theories. In conclusion, we integrate insights from social psychology to contextualize the collaboration of LLM agents, inspiring further investigations into the collaboration mechanism for LLMs. We have shared our code and datasets, hoping to catalyze further research in this promising avenue.",https://arxiv.org/abs/2310.02124,Simulation,Computation and Language (cs.CL),exploring_collaboration_mechanisms_for_20231003,"Zhejiang University, National University of Singapore, NUS-NCS Joint Lab, Google DeepMind"
Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf,"Yuzhuang Xu, Shuo Wang, Peng Li, Fuwen Luo, Xiaolong Wang, Weidong Liu, Yang Liu",2023.9.9,"Communication games, which we refer to as incomplete information games that heavily depend on natural language communication, hold significant research value in fields such as economics, social science, and artificial intelligence. In this work, we explore the problem of how to engage large language models (LLMs) in communication games, and in response, propose a tuning-free framework. Our approach keeps LLMs frozen, and relies on the retrieval and reflection on past communications and experiences for improvement. An empirical study on the representative and widely-studied communication game, “Werewolf”, demonstrates that our framework can effectively play the Werewolf game without tuning the parameters of the LLMs. More importantly, strategic behaviors begin to emerge in our experiments, suggesting that it will be a fruitful journey to engage LLMs in communication games and associated domains.",https://arxiv.org/abs/2309.04658,Communication,Computation and Language (cs.CL),exploring_large_language_models_20230909,"Tsinghua University, Zhongguancun Laboratory"
Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf,"Yuzhuang Xu, Shuo Wang, Peng Li, Fuwen Luo, Xiaolong Wang, Weidong Liu, Yang Liu",2023.9.9,"Communication games, which we refer to as incomplete information games that heavily depend on natural language communication, hold significant research value in fields such as economics, social science, and artificial intelligence. In this work, we explore the problem of how to engage large language models (LLMs) in communication games, and in response, propose a tuning-free framework. Our approach keeps LLMs frozen, and relies on the retrieval and reflection on past communications and experiences for improvement. An empirical study on the representative and widely-studied communication game, “Werewolf”, demonstrates that our framework can effectively play the Werewolf game without tuning the parameters of the LLMs. More importantly, strategic behaviors begin to emerge in our experiments, suggesting that it will be a fruitful journey to engage LLMs in communication games and associated domains.",https://arxiv.org/abs/2309.04658,Organization,Computation and Language (cs.CL),exploring_large_language_models_20230909,"Tsinghua University, Zhongguancun Laboratory"
Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting,"Hongda Sun, Hongzhan Lin, Haiyu Yan, Chen Zhu, Yang Song, Xin Gao, Shuo Shang, Rui Yan",2024.5.28,"The emergence of online recruitment services has revolutionized the traditional landscape of job seeking and recruitment, necessitating the development of high-quality industrial applications to improve person-job fitting. Existing methods generally rely on modeling the latent semantics of resumes and job descriptions and learning a matching function between them. Inspired by the powerful role-playing capabilities of Large Language Models (LLMs), we propose to introduce a mock interview process between LLM-played interviewers and candidates. The mock interview conversations can provide additional evidence for candidate evaluation, thereby augmenting traditional person-job fitting based solely on resumes and job descriptions. However, characterizing these two roles in online recruitment still presents several challenges, such as developing the skills to raise interview questions, formulating appropriate answers, and evaluating two-sided fitness. To this end, we propose MockLLM, a novel applicable framework that divides the person-job matching process into two modules: mock interview generation and two-sided evaluation in handshake protocol, jointly enhancing their performance through collaborative behaviors between interviewers and candidates. We design a role-playing framework as a multi-role and multi-behavior paradigm to enable a single LLM agent to effectively behave with multiple functions for both parties......",https://arxiv.org/abs/2405.18113,Organization,Computation and Language (cs.CL),facilitating_multi-role_and_multi-behavior_20240528,"Renmin University of China, BOSS Zhipin, King Abdullah University of Science and Technology, University of Electronic Science and Technology of China"
GameGPT: Multi-agent Collaborative Framework for Game Development,"Dake Chen, Hanbin Wang, Yunhao Huo, Yuzhao Li, Haoyang Zhang",2023.10.12,"The large language model (LLM) based agents have demonstrated their capacity to automate and expedite software development processes. In this paper, we focus on game development and propose a multi-agent collaborative framework, dubbed GameGPT, to automate game development. While many studies have pinpointed hallucination as a primary roadblock for deploying LLMs in production, we identify another concern: redundancy. Our framework presents a series of methods to mitigate both concerns. These methods include dual collaboration and layered approaches with several in-house lexicons, to mitigate the hallucination and redundancy in the planning, task identification, and implementation phases. Furthermore, a decoupling approach is also introduced to achieve code generation with better precision.",https://arxiv.org/abs/2310.08067,Organization,Artificial Intelligence (cs.AI),gamegpt_multi-agent_collaborative_framework_20231012,"AutoGame Research, X-Institute, University of Southern California"
Generative Agents: Interactive Simulacra of Human Behavior,"Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein",2023.4.7,"Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. 
By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.",https://arxiv.org/abs/2304.03442,Communication,Human-Computer Interaction (cs.HC),generative_agents_interactive_simulacra_20230407,"Stanford University, Google Research, Google DeepMind"
Generative Agents: Interactive Simulacra of Human Behavior,"Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein",2023.4.7,"Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. 
By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.",https://arxiv.org/abs/2304.03442,Organization,Human-Computer Interaction (cs.HC),generative_agents_interactive_simulacra_20230407,"Stanford University, Google Research, Google DeepMind"
Generative Agents: Interactive Simulacra of Human Behavior,"Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein",2023.4.7,"Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. 
By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.",https://arxiv.org/abs/2304.03442,Simulation,Human-Computer Interaction (cs.HC),generative_agents_interactive_simulacra_20230407,"Stanford University, Google Research, Google DeepMind"
Humanoid Agents: Platform for Simulating Human-like Generative Agents,"Zhilin Wang, Yu Ying Chiu, Yu Cheung Chiu",2023.10.9,"Just as computational simulations of atoms, molecules and cells have shaped the way we study the sciences, true-to-life simulations of human-like agents can be valuable tools for studying human behavior. We propose Humanoid Agents, a system that guides Generative Agents to behave more like humans by introducing three elements of System 1 processing: Basic needs (e.g. hunger, health and energy), Emotion and Closeness in Relationships. Humanoid Agents are able to use these dynamic elements to adapt their daily activities and conversations with other agents, as supported with empirical experiments. Our system is designed to be extensible to various settings, three of which we demonstrate, as well as to other elements influencing human behavior (e.g. empathy, moral values and cultural background). Our platform also includes a Unity WebGL game interface for visualization and an interactive analytics dashboard to show agent statuses over time.",https://arxiv.org/abs/2310.05418,Simulation,Computation and Language (cs.CL),humanoid_agents_platform_for_20231009,"University of Washington, NVIDIA, The University of Hong Kong"
Improving Factuality and Reasoning in Language Models through Multiagent Debate,"Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch",2023.5.23,"Large language models (LLMs) have demonstrated remarkable capabilities in language generation, understanding, and few-shot learning in recent years. An extensive body of work has explored how their performance may be further improved through the tools of prompting, ranging from verification and self-consistency to intermediate scratchpads. In this paper, we present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer. Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks. We also demonstrate that our approach improves the factual validity of generated content, reducing fallacious answers and hallucinations that contemporary models are prone to. Our approach may be directly applied to existing black-box models and uses identical procedures and prompts for all tasks we investigate. Overall, our findings suggest that such a ""society of minds"" approach has the potential to significantly advance the capabilities of LLMs and pave the way for further breakthroughs in language generation and understanding. Project website at https://composable-models.github.io/llm_debate/.",https://arxiv.org/abs/2305.14325,Communication,Computation and Language (cs.CL),improving_factuality_and_reasoning_20230523,"MIT CSAIL, Google Brain"
Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback,"Yao Fu, Hao Peng, Tushar Khot, Mirella Lapata",2023.5.17,"We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing. We are interested in this question because if LLMs were able to improve each other, it would imply the possibility of creating strong AI agents with minimal human intervention. We ask two LLMs to negotiate with each other, playing the roles of a buyer and a seller, respectively. They aim to reach a deal, with the buyer targeting a lower price and the seller a higher one. A third language model, playing the critic, provides feedback to a player to improve the player's negotiation strategies. We let the two agents play multiple rounds, using previous negotiation history and AI feedback as in-context demonstrations to improve the model's negotiation strategy iteratively. We use different LLMs (GPT and Claude) for different roles and use the deal price as the evaluation metric. Our experiments reveal multiple intriguing findings: (",https://arxiv.org/abs/2305.10142,Communication,Computation and Language (cs.CL),improving_language_model_negotiation_20230517,"University of Edinburgh, Allen Institute for AI, University of Edinburgh"
Improving Multi-Agent Debate with Sparse Communication Topology,"Yunxuan Li, Yibing Du, Jiageng Zhang, Le Hou, Peter Grabowski, Yeqing Li, Eugene Ie",2024.6.17,"Multi-agent debate has proven effective in improving large language models' quality for reasoning and factuality tasks. While various role-playing strategies in multi-agent debates have been explored, in terms of the communication among agents, existing approaches adopt a brute-force algorithm in which each agent can communicate with all other agents. In this paper, we systematically investigate the effect of communication connectivity in multi-agent systems. Our experiments on GPT and Mistral models reveal that multi-agent debates leveraging sparse communication topology can achieve comparable or superior performance while significantly reducing computational costs. Furthermore, we extend the multi-agent debate framework to multimodal reasoning and alignment labeling tasks, showcasing its broad applicability and effectiveness. Our findings underscore the importance of communication connectivity in enhancing the efficiency and effectiveness of the “society of minds” approach.",https://arxiv.org/abs/2406.11776,Organization,Computation and Language (cs.CL),improving_multi-agent_debate_with_20240617,"Google, Google DeepMind"
Improving Multi-Agent Debate with Sparse Communication Topology,"Yunxuan Li, Yibing Du, Jiageng Zhang, Le Hou, Peter Grabowski, Yeqing Li, Eugene Ie",2024.6.17,"Multi-agent debate has proven effective in improving large language models' quality for reasoning and factuality tasks. While various role-playing strategies in multi-agent debates have been explored, in terms of the communication among agents, existing approaches adopt a brute-force algorithm in which each agent can communicate with all other agents. In this paper, we systematically investigate the effect of communication connectivity in multi-agent systems. Our experiments on GPT and Mistral models reveal that multi-agent debates leveraging sparse communication topology can achieve comparable or superior performance while significantly reducing computational costs. Furthermore, we extend the multi-agent debate framework to multimodal reasoning and alignment labeling tasks, showcasing its broad applicability and effectiveness. Our findings underscore the importance of communication connectivity in enhancing the efficiency and effectiveness of the “society of minds” approach.",https://arxiv.org/abs/2406.11776,Communication,Computation and Language (cs.CL),improving_multi-agent_debate_with_20240617,"Google, Google DeepMind"
Iterative Experience Refinement of Software-Developing Agents,"Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, YiFei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun",2024.5.7,"Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks iterative refinement and thus hampers agents' adaptability. In this paper, we introduce the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution. We propose two fundamental patterns: the successive pattern, refining based on nearest experiences within a task batch, and the cumulative pattern, acquiring experiences across all previous task batches. Augmented with our heuristic experience elimination, the method prioritizes high-quality and frequently-used experiences, effectively managing the experience space and enhancing efficiency. Extensive experiments show that while the successive pattern may yield superior results, the cumulative pattern provides more stable performance......",https://arxiv.org/abs/2405.04219,Evolution,Computation and Language (cs.CL),iterative_experience_refinement_of_20240507,"Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens"
Iterative Experience Refinement of Software-Developing Agents,"Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, YiFei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun",2024.5.7,"Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks iterative refinement and thus hampers agents' adaptability. In this paper, we introduce the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution. We propose two fundamental patterns: the successive pattern, refining based on nearest experiences within a task batch, and the cumulative pattern, acquiring experiences across all previous task batches. Augmented with our heuristic experience elimination, the method prioritizes high-quality and frequently-used experiences, effectively managing the experience space and enhancing efficiency. Extensive experiments show that while the successive pattern may yield superior results, the cumulative pattern provides more stable performance......",https://arxiv.org/abs/2405.04219,Organization,Computation and Language (cs.CL),iterative_experience_refinement_of_20240507,"Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens"
Language Agents as Digital Representatives in Collective Decision-Making,"Jarrett, Daniel and Pislar, Miruna and Bakker, Michiel A and Tessler, Michael Henry and Koster, Raphael and Balaguer, Jan and Elie, Romuald and Summerfield, Christopher and Tacchetti, Andrea",2023.11.8,"Consider the process of collective decision-making, in which a group of individuals interactively select a preferred outcome from among a universe of alternatives. In this context, “representation” is the activity of making an individual's preferences present in the process via participation by a proxy agent, i.e. their “representative”. To this end, learned models of human behavior have the potential to fill this role, with practical implications for multi-agent scenario studies and mechanism design. In this work, we investigate the possibility of training language agents to behave in the capacity of representatives of human agents, appropriately expressing the preferences of those individuals whom they stand for. First, we formalize the setting of collective decision-making, as the episodic process of interaction between a group of agents and a decision mechanism. On this basis, we then formalize the problem of digital representation, as the simulation of an agent's behavior to yield equivalent outcomes from the mechanism. Finally, we conduct an empirical case study in the setting of consensus-finding among diverse humans, and demonstrate the feasibility of fine-tuning large language models to act as digital representatives.",https://openreview.net/pdf?id=sv7KZcUqu1,Simulation,,language_agents_as_digital_20231108,Google DeepMind
Language Agents as Optimizable Graphs,"Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber",2024.2.26,"Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs can be recursively combined into larger composite graphs representing hierarchies of inter-agent collaboration (where edges connect operations of different agents). Our novel automatic graph optimizers (1) refine node-level LLM prompts (node optimization) and (2) improve agent orchestration by changing graph connectivity (edge optimization). Experiments demonstrate that our framework can be used to efficiently develop, integrate, and automatically improve various LLM agents. ",https://arxiv.org/abs/2402.16823,Organization,Artificial Intelligence (cs.AI),language_agents_as_optimizable_20240226,"King Abdullah University of Science and Technology, The Swiss AI Lab IDSIA, USI, SUPSI"
Language Agents as Optimizable Graphs,"Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber",2024.2.26,"Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs can be recursively combined into larger composite graphs representing hierarchies of inter-agent collaboration (where edges connect operations of different agents). Our novel automatic graph optimizers (1) refine node-level LLM prompts (node optimization) and (2) improve agent orchestration by changing graph connectivity (edge optimization). Experiments demonstrate that our framework can be used to efficiently develop, integrate, and automatically improve various LLM agents. ",https://arxiv.org/abs/2402.16823,Evolution,Artificial Intelligence (cs.AI),language_agents_as_optimizable_20240226,"King Abdullah University of Science and Technology, The Swiss AI Lab IDSIA, USI, SUPSI"
Large Language Models are Diverse Role-Players for Summarization Evaluation,"Ning Wu, Ming Gong, Linjun Shou, Shining Liang, Daxin Jiang",2023.3.27,"Text summarization has a wide range of applications in many scenarios. The evaluation of the quality of the generated text is a complex problem. A big challenge to language evaluation is that there is a clear divergence between existing metrics and human evaluation. A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal. Most of the automatic evaluation methods like BLEU/ROUGE may not be able to adequately capture the above dimensions. In this paper, we propose a new evaluation framework based on LLMs, which provides a comprehensive evaluation framework by comparing generated text and reference text from both objective and subjective aspects. First, we propose to model objective and subjective dimensions of generated text based on a roleplayers' prompting mechanism. Furthermore, we introduce a context-based prompting mechanism that is able to generate dynamic roleplayer profiles based on input context. Finally, we design a multi-roleplayer prompting technology based on batch prompting and integrate multiple outputs into the final evaluation results. Experimental results on three real datasets for summarization show that our model is highly competitive and has a very high consistency with human annotators.",https://arxiv.org/abs/2303.15078,Organization,Computation and Language (cs.CL),large_language_models_are_20230327,Microsoft
Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game,"Qianqiao Xu, Zhiliang Tian, Hongyan Wu, Zhen Huang, Yiping Song, Feng Liu, Dongsheng Li",2024.4.3,"With the enhanced performance of large models on natural language processing tasks, potential moral and ethical issues of large models arise. There exist malicious attackers who induce large models to jailbreak and generate information containing illegal, privacy-invasive information through techniques such as prompt engineering. As a result, large models counter malicious attackers' attacks using techniques such as safety alignment. However, the strong defense mechanism of the large model through rejection replies is easily identified by attackers and used to strengthen attackers' capabilities. In this paper, we propose a multi-agent attacker-disguiser game approach to achieve a weak defense mechanism that allows the large model to both safely reply to the attacker and hide the defense intent. First, we construct a multi-agent framework to simulate attack and defense scenarios, playing different roles to be responsible for attack, disguise, safety evaluation, and disguise evaluation tasks. After that, we design attack and disguise game algorithms to optimize the game strategies of the attacker and the disguiser and use the curriculum learning process to strengthen the capabilities of the agents. The experiments verify that the method in this paper is more effective in strengthening the model's ability to disguise the defense intent compared with other methods. Moreover, our approach can adapt any black-box large model to assist the model in defense and does not suffer from model version iterations.",https://arxiv.org/abs/2404.02532,Organization,Artificial Intelligence (cs.AI),learn_to_disguise_avoid_20240403,"National University of Defense Technology, Guangdong University of Foreign Studies"
Leveraging Large Language Models for Collective Decision-Making,"Marios Papachristou, Longqi Yang, Chin-Chia Hsu",2023.11.3,"In various work contexts, such as meeting scheduling, collaborating, and project planning, collective decision-making is essential but often challenging due to diverse individual preferences, varying work focuses, and power dynamics among members. To address this, we propose a system leveraging Large Language Models (LLMs) to facilitate group decision-making by managing conversations and balancing preferences among individuals. Our system aims to extract individual preferences from conversations and suggest options that satisfy the preferences of the members. We specifically apply this system to corporate meeting scheduling. We create synthetic employee profiles and simulate conversations at scale, leveraging LLMs to evaluate the system performance as a novel approach to conducting a user study. Our results indicate efficient coordination with reduced interactions between the members and the LLM-based system. The system refines and improves its proposed options over time, ensuring that many of the members' individual preferences are satisfied in an equitable way. Finally, we conduct a survey study involving human participants to assess our system's ability to aggregate preferences and reasoning about them. Our findings show that the system exhibits strong performance in both dimensions",https://arxiv.org/abs/2311.04928,Organization,Computation and Language (cs.CL),leveraging_large_language_models_20231103,"Cornell University, Microsoft"
LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay,"Yihuai Lan, Zhiqiang Hu, Lei Wang, Yang Wang, Deheng Ye, Peilin Zhao, Ee-Peng Lim, Hui Xiong, Hao Wang",2023.10.23,"This paper explores the open research problem of understanding the social behaviors of LLM-based agents. Using Avalon as a testbed, we employ system prompts to guide LLM agents in gameplay. While previous studies have touched on gameplay with LLM agents, research on their social behaviors is lacking. We propose a novel framework, tailored for Avalon, featuring a multi-agent system that facilitates efficient communication and interaction. We evaluate its performance based on game success and analyze LLM agents' social behaviors. Results affirm the framework's effectiveness in creating adaptive agents and suggest LLM-based agents' potential in navigating dynamic social interactions. By examining collaboration and confrontation behaviors, we offer insights into this field's research and applications. Our code is publicly available at https://github.com/3DAgentWorld/LLM-Game-Agent",https://arxiv.org/abs/2310.14985,Communication,Computation and Language (cs.CL),llm-based_agent_society_investigation_20231023,"The Hong Kong University of Science and Technology (Guangzhou), Singapore University of Technology and Design, Singapore Management University, Verily Life Sciences, Tencent"
LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay,"Yihuai Lan, Zhiqiang Hu, Lei Wang, Yang Wang, Deheng Ye, Peilin Zhao, Ee-Peng Lim, Hui Xiong, Hao Wang",2023.10.23,"This paper explores the open research problem of understanding the social behaviors of LLM-based agents. Using Avalon as a testbed, we employ system prompts to guide LLM agents in gameplay. While previous studies have touched on gameplay with LLM agents, research on their social behaviors is lacking. We propose a novel framework, tailored for Avalon, featuring a multi-agent system that facilitates efficient communication and interaction. We evaluate its performance based on game success and analyze LLM agents' social behaviors. Results affirm the framework's effectiveness in creating adaptive agents and suggest LLM-based agents' potential in navigating dynamic social interactions. By examining collaboration and confrontation behaviors, we offer insights into this field's research and applications. Our code is publicly available at https://github.com/3DAgentWorld/LLM-Game-Agent",https://arxiv.org/abs/2310.14985,Organization,Computation and Language (cs.CL),llm-based_agent_society_investigation_20231023,"The Hong Kong University of Science and Technology (Guangzhou), Singapore University of Technology and Design, Singapore Management University, Verily Life Sciences, Tencent"
LLM-Driven Agents for Influencer Selection in Digital Advertising Campaigns,"Xiaoqing Zhang, Xiuying Chen, Yuhan Liu, Jianzhou Wang, Zhenxing Hu, Rui Yan",2024.3.22,"In the digital world, influencers are pivotal as opinion leaders, shaping the views and choices of their influencees. Modern advertising often follows this trend, where marketers choose appropriate influencers for product endorsements, based on thorough market analysis. Previous studies on influencer selection have typically relied on numerical representations of individual opinions and interactions, a method that simplifies the intricacies of social dynamics. With the development of large language models (LLMs), we now have the opportunity to capture the nuanced exchanges of information within social networks. Hence, in this work, we first introduce an Influencer Dynamics Simulator (IDS), helping promoters identify and select the right influencers to market their products, based on LLM simulation. Concretely, we first propose an influencer-influencee engagement-based pre-selection module to screen potential influencer candidates. Subsequently, a simulation is constructed for these candidates and their influencees. Each user is represented as an LLM-based agent, drawing from their interaction history to deduce their profile and interests. The influencee agents will predict their behavior in response to influencer advertising. Finally, we develop a ranking metric designed to pinpoint influencers who are most likely to drive product purchases based on feedback from their influencees. To evaluate our framework, we collect a real-world advertising network dataset, including social relations, post and comment content, and user behaviors.......",https://arxiv.org/abs/2403.15105,Simulation,Social and Information Networks (cs.SI),llm-driven_agents_for_influencer_20240322,"Renmin University of China, King Abdullah University of Science and Technology, Moonshot AI"
LM vs LM: Detecting Factual Errors via Cross Examination,"Roi Cohen, May Hamri, Mor Geva, Amir Globerson",2023.5.22,"A prominent weakness of modern language models (LMs) is their tendency to generate factually incorrect text, which hinders their usability. A natural question is whether such factual errors can be detected automatically. Inspired by truth-seeking mechanisms in law, we propose a factuality evaluation framework for LMs that is based on cross-examination. Our key idea is that an incorrect claim is likely to result in inconsistency with other claims that the model generates. To discover such inconsistencies, we facilitate a multi-turn interaction between the LM that generated the claim and another LM (acting as an examiner) which introduces questions to discover inconsistencies. We empirically evaluate our method on factual claims made by multiple recent LMs on four benchmarks, finding that it outperforms existing methods and baselines, often by a large gap. Our results demonstrate the potential of using interacting LMs to capture factual errors.",https://arxiv.org/abs/2305.13281,Communication,Computation and Language (cs.CL),lm_vs_lm_detecting_20230522,"Tel Aviv University, Google DeepMind, Google Research"
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration,"Jun Zhao, Can Zu, Hao Xu, Yi Lu, Wei He, Yiwen Ding, Tao Gui, Qi Zhang, Xuanjing Huang",2024.2.18,"Large language models (LLMs) have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows have been notorious for their expensive training costs and high inference latency. Even the most advanced models such as GPT-4 and Claude2 often make mistakes when processing inputs of over 100k tokens, a phenomenon also known as 'lost in the middle'. In this paper, we propose LONGAGENT, a method based on multi-agent collaboration, which scales LLMs (e.g., LLaMA) to a context of 128K and demonstrates potential superiority in long-text processing compared to GPT-",https://arxiv.org/abs/2402.11550,Organization,Computation and Language (cs.CL),longagent_scaling_language_models_20240218,Fudan University
Lyfe Agents: Generative agents for low-cost real-time social interactions,"Zhao Kaiya, Michelangelo Naim, Jovana Kondic, Manuel Cortes, Jiaxin Ge, Shuying Luo, Guangyu Robert Yang, Andrew Ahn",2023.10.3,"Highly autonomous generative agents powered by large language models promise to simulate intricate social behaviors in virtual societies. However, achieving real-time interactions with humans at a low computational cost remains challenging. Here, we introduce Lyfe Agents. They combine low-cost with real-time responsiveness, all while remaining intelligent and goal-oriented. Key innovations include: (1) an option-action framework, reducing the cost of high-level decisions; (2) asynchronous self-monitoring for better self-consistency; and (3) a Summarize-and-Forget memory mechanism, prioritizing critical memory items at a low cost. We evaluate Lyfe Agents' self-motivation and sociability across several multi-agent scenarios in our custom LyfeGame 3D virtual environment platform. When equipped with our brain-inspired techniques, Lyfe Agents can exhibit human-like self-motivated social reasoning. For example, the agents can solve a crime (a murder mystery) through autonomous collaboration and information exchange. Meanwhile, our techniques enabled Lyfe Agents to operate at a computational cost 10-100 times lower than existing alternatives. Our findings underscore the transformative potential of autonomous generative agents to enrich human social experiences in virtual worlds.",https://arxiv.org/abs/2310.02172,Evolution,Human-Computer Interaction (cs.HC),lyfe_agents_generative_agents_20231003,"Massachusetts Institute of Technology, Peking University, LyfeAL"
Lyfe Agents: Generative agents for low-cost real-time social interactions,"Zhao Kaiya, Michelangelo Naim, Jovana Kondic, Manuel Cortes, Jiaxin Ge, Shuying Luo, Guangyu Robert Yang, Andrew Ahn",2023.10.3,"Highly autonomous generative agents powered by large language models promise to simulate intricate social behaviors in virtual societies. However, achieving real-time interactions with humans at a low computational cost remains challenging. Here, we introduce Lyfe Agents. They combine low-cost with real-time responsiveness, all while remaining intelligent and goal-oriented. Key innovations include: (1) an option-action framework, reducing the cost of high-level decisions; (2) asynchronous self-monitoring for better self-consistency; and (3) a Summarize-and-Forget memory mechanism, prioritizing critical memory items at a low cost. We evaluate Lyfe Agents' self-motivation and sociability across several multi-agent scenarios in our custom LyfeGame 3D virtual environment platform. When equipped with our brain-inspired techniques, Lyfe Agents can exhibit human-like self-motivated social reasoning. For example, the agents can solve a crime (a murder mystery) through autonomous collaboration and information exchange. Meanwhile, our techniques enabled Lyfe Agents to operate at a computational cost 10-100 times lower than existing alternatives. Our findings underscore the transformative potential of autonomous generative agents to enrich human social experiences in virtual worlds.",https://arxiv.org/abs/2310.02172,Simulation,Human-Computer Interaction (cs.HC),lyfe_agents_generative_agents_20231003,"Massachusetts Institute of Technology, Peking University, LyfeAL"
MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents,"Yuan Li, Yixuan Zhang, Lichao Sun",2023.10.10,"Significant advancements have occurred in the application of Large Language Models (LLMs) for various tasks and social simulations. Despite this, their capacities to coordinate within task-oriented social contexts are under-explored. Such capabilities are crucial if LLMs are to effectively mimic human-like social behavior and produce meaningful results. To bridge this gap, we introduce collaborative generative agents, endowing LLM-based Agents with consistent behavior patterns and task-solving abilities. We situate these agents in a simulated job fair environment as a case study to scrutinize their coordination skills. We propose a novel framework that equips collaborative generative agents with human-like reasoning abilities and specialized skills. Our evaluation demonstrates that these agents show promising performance. However, we also uncover limitations that hinder their effectiveness in more complex coordination tasks. Our work provides valuable insights into the role and evolution of LLMs in task-oriented social simulations.",https://arxiv.org/abs/2310.06500,Organization,Artificial Intelligence (cs.AI),metaagents_simulating_interactions_of_20231010,"University of Cambridge, William & Mary, Lehigh University"
MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents,"Yuan Li, Yixuan Zhang, Lichao Sun",2023.10.10,"Significant advancements have occurred in the application of Large Language Models (LLMs) for various tasks and social simulations. Despite this, their capacities to coordinate within task-oriented social contexts are under-explored. Such capabilities are crucial if LLMs are to effectively mimic human-like social behavior and produce meaningful results. To bridge this gap, we introduce collaborative generative agents, endowing LLM-based Agents with consistent behavior patterns and task-solving abilities. We situate these agents in a simulated job fair environment as a case study to scrutinize their coordination skills. We propose a novel framework that equips collaborative generative agents with human-like reasoning abilities and specialized skills. Our evaluation demonstrates that these agents show promising performance. However, we also uncover limitations that hinder their effectiveness in more complex coordination tasks. Our work provides valuable insights into the role and evolution of LLMs in task-oriented social simulations.",https://arxiv.org/abs/2310.06500,Simulation,Artificial Intelligence (cs.AI),metaagents_simulating_interactions_of_20231010,"University of Cambridge, William & Mary, Lehigh University"
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework,"Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, Jürgen Schmidhuber",2023.8.1,"Remarkable progress has been made on automated problem solving through societies of agents based on large language models (LLMs). Existing LLM-based multi-agent systems can already solve simple dialogue tasks. Solutions to more complex tasks, however, are complicated through logic inconsistencies due to cascading hallucinations caused by naively chaining LLMs. Here we introduce MetaGPT, an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. MetaGPT encodes Standardized Operating Procedures (SOPs) into prompt sequences for more streamlined workflows, thus allowing agents with human-like domain expertise to verify intermediate results and reduce errors. MetaGPT utilizes an assembly line paradigm to assign diverse roles to various agents, efficiently breaking down complex tasks into subtasks involving many agents working together. On collaborative software engineering benchmarks, MetaGPT generates more coherent solutions than previous chat-based multi-agent systems. Our project can be found at https://github.com/geekan/MetaGPT",https://arxiv.org/abs/2308.00352,Organization,Artificial Intelligence (cs.AI),metagpt_meta_programming_for_20230801,"DeepWisdom, King Abdullah University of Science and Technology, Xiamen University, The Chinese University of Hong Kong (Shenzhen), Nanjing University, University of Pennsylvania, University of California, Berkeley, The Swiss AI Lab IDSIA/USI/SUPSI"
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework,"Zhengqing Yuan, Ruoxi Chen, Zhaoxu Li, Haolong Jia, Lifang He, Chi Wang, Lichao Sun",2024.3.20,"Sora is the first large-scale generalist video generation model that garnered significant attention across society. Since its launch by OpenAI in February 2024, no other video generation models have paralleled {Sora}'s performance or its capacity to support a broad spectrum of video generation tasks. Additionally, there are only a few fully published video generation models, with the majority being closed-source. To address this gap, this paper proposes a new multi-agent framework Mora, which incorporates several advanced visual AI agents to replicate generalist video generation demonstrated by Sora. In particular, Mora can utilize multiple visual agents and successfully mimic Sora's video generation capabilities in various tasks, such as (1) text-to-video generation, (2) text-conditional image-to-video generation, (3) extend generated videos, (4) video-to-video editing, (5) connect videos and (6) simulate digital worlds. Our extensive experimental results show that Mora achieves performance that is proximate to that of Sora in various tasks. However, there exists an obvious performance gap between our work and Sora when assessed holistically. In summary, we hope this project can guide the future trajectory of video generation through collaborative AI agents.",https://arxiv.org/abs/2403.13248,Organization,Computer Vision and Pattern Recognition (cs.CV),mora_enabling_generalist_video_20240320,"Lehigh University, Microsoft Research"
Multi-Agent Software Development through Cross-Team Collaboration,"Zhuoyun Du, Chen Qian, Wei Liu, Zihao Xie, Yifei Wang, Yufan Dang, Weize Chen, Cheng Yang",2024.6.13,"The latest breakthroughs in Large Language Models (LLMs), e.g., ChatDev, have catalyzed profound transformations, particularly through multi-agent collaboration for software development. LLM agents can collaborate in teams like humans, and follow the waterfall model to sequentially work on requirements analysis, development, review, testing, and other phases to perform autonomous software generation. However, for an agent team, each phase in a single development process yields only one possible outcome. This results in the completion of only one development chain, thereby losing the opportunity to explore multiple potential decision paths within the solution space. Consequently, this may lead to obtaining suboptimal results. To address this challenge, we introduce Cross-Team Collaboration (CTC), a scalable multi-team framework that enables orchestrated teams to jointly propose various decisions and communicate with their insights in a cross-team collaboration environment for superior content generation. Experimental results in software development reveal a notable increase in quality compared to state-of-the-art baselines, underscoring the efficacy of our framework. The significant improvements in story generation demonstrate the promising generalization ability of our framework across various domains. We anticipate that our work will guide LLM agents towards a cross-team paradigm and contribute to their significant growth in but not limited to software development. The code and data will be available at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2406.08979,Organization,Computation and Language (cs.CL),multi-agent_software_development_through_20240613,"Zhejiang University, Tsinghua University, Beijing University of Posts and Telecommunications"
MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate,"Alfonso Amayuelas, Xianjun Yang, Antonis Antoniades, Wenyue Hua, Liangming Pan, William Wang",2024.6.20,"Large Language Models (LLMs) have shown exceptional results on current benchmarks when working individually. The advancement in their capabilities, along with a reduction in parameter size and inference times, has facilitated the use of these models as agents, enabling interactions among multiple models to execute complex tasks. Such collaborations offer several advantages, including the use of specialized models (e.g. coding), improved confidence through multiple computations, and enhanced divergent thinking, leading to more diverse outputs. Thus, the collaborative use of language models is expected to grow significantly in the coming years. In this work, we evaluate the behavior of a network of models collaborating through debate under the influence of an adversary. We introduce pertinent metrics to assess the adversary's effectiveness, focusing on system accuracy and model agreement. Our findings highlight the importance of a model's persuasive ability in influencing others. Additionally, we explore inference-time methods to generate more compelling arguments and evaluate the potential of prompt-based mitigation as a defensive strategy.",https://arxiv.org/abs/2406.14711v1,Organization,Computation and Language (cs.CL),multiagent_collaboration_attack_investigating_20240620,"UC Santa Barbara, Rutgers University"
On Generative Agents in Recommendation,"An Zhang, Yuxin Chen, Leheng Sheng, Xiang Wang, Tat-Seng Chua",2023.10.16,"Recommender systems are the cornerstone of today's information dissemination, yet a disconnect between offline metrics and online performance greatly hinders their development. Addressing this challenge, we envision a recommendation simulator, capitalizing on recent breakthroughs in human-level intelligence exhibited by Large Language Models (LLMs). We propose Agent4Rec, a user simulator in recommendation, leveraging LLM-empowered generative agents equipped with user profile, memory, and actions modules specifically tailored for the recommender system. In particular, these agents' profile modules are initialized using real-world datasets (e.g. MovieLens, Steam, Amazon-Book), capturing users' unique tastes and social traits; memory modules log both factual and emotional memories and are integrated with an emotion-driven reflection mechanism; action modules support a wide variety of behaviors, spanning both taste-driven and emotion-driven actions. Each agent interacts with personalized recommender models in a page-by-page manner, relying on a pre-implemented collaborative filtering-based recommendation algorithm. We delve into both the capabilities and limitations of Agent4Rec, aiming to explore an essential research question: ``To what extent can LLM-empowered generative agents faithfully simulate the behavior of real, autonomous humans in recommender systems?'' Extensive and multi-faceted evaluations of Agent4Rec highlight both the alignment and deviation between agents and user-personalized preferences. Beyond mere performance comparison, we explore insightful experiments, such as emulating the filter bubble effect and discovering the underlying causal relationships in recommendation tasks.",https://arxiv.org/abs/2310.10108,Simulation,Information Retrieval (cs.IR),on_generative_agents_in_20231016,"National University of Singapore, Tsinghua University, University of Science and Technology of China"
"Out of One, Many: Using Language Models to Simulate Human Samples","Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua Gubler, Christopher Rytting, David Wingate",2022.9.14,"We propose and explore the possibility that language models can be studied as effective proxies for specific human sub-populations in social science research. Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models. We show that the ""algorithmic bias"" within one such tool -- the GPT-3 language model -- is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property ""algorithmic fidelity"" and explore its extent in GPT-3. We create ""silicon samples"" by conditioning the model on thousands of socio-demographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes. We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.",https://arxiv.org/abs/2209.06899,Simulation,Machine Learning (cs.LG),out_of_one_many_20220914,Brigham Young University
PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games,"Qinglin Zhu, Runcong Zhao, Jinhua Du, Lin Gui, Yulan He",2024.4.26,"We propose PLAYER*, a novel framework that addresses the limitations of existing agent-based approaches built on Large Language Models (LLMs) in handling complex questions and understanding interpersonal relationships in dynamic environments. PLAYER* enhances path planning in Murder Mystery Games (MMGs) using an anytime sampling-based planner and a questioning-driven search framework. By equipping agents with a set of sensors, PLAYER* eliminates the need for pre-defined questions and enables agents to navigate complex social interactions. We additionally make a contribution by introducing a quantifiable evaluation method using multiple-choice questions and present WellPlay, a dataset containing 1,482 question-answer pairs. Experimental results demonstrate PLAYER*'s superiority over existing multi-agent methods, enhancing the generalisability and adaptability of agents in MMGs and paving the way for more effective multi-agent interactions.",https://arxiv.org/abs/2404.17662,Communication,Computation and Language (cs.CL),player_enhancing_llm-based_multi-agent_20240426,"Kings College London, Huawei London Research Centre, The Alan Turing Institute"
Quantifying the Impact of Large Language Models on Collective Opinion Dynamics,"Chao Li, Xing Su, Haoying Han, Cong Xue, Chunmo Zheng, Chao Fan",2023.8.7,"The process of opinion expression and exchange is a critical component of democratic societies. As people interact with large language models (LLMs) in the opinion shaping process different from traditional media, the impacts of LLMs are increasingly recognized and being concerned. However, the knowledge about how LLMs affect the process of opinion expression and exchange of social opinion networks is very limited. Here, we create an opinion network dynamics model to encode the opinions of LLMs, cognitive acceptability and usage strategies of individuals, and simulate the impact of LLMs on opinion dynamics in a variety of scenarios. The outcomes of the simulations inform about effective demand-oriented opinion network interventions. The results from this study suggested that the output opinion of LLMs has a unique and positive effect on the collective opinion difference. The marginal effect of cognitive acceptability on collective opinion formation is nonlinear and shows a decreasing trend. When people partially rely on LLMs, the exchange process of opinion becomes more intense and the diversity of opinion becomes more favorable. In fact, there is 38.6% more opinion diversity when people all partially rely on LLMs, compared to prohibiting the use of LLMs entirely. The optimal diversity of opinion was found when the fractions of people who do not use, partially rely on, and fully rely on LLMs reached roughly 4:12:1. Our experiments also find that introducing extra agents with opposite/neutral/random opinions, we can effectively mitigate the impact of biased/toxic output from LLMs. 
Our findings provide valuable insights into opinion dynamics in the age of LLMs, highlighting the need for customized interventions tailored to specific scenarios to address the drawbacks of improper output and use of LLMs.",https://arxiv.org/abs/2308.03313,Simulation,Social and Information Networks (cs.SI),quantifying_the_impact_of_20230807,"Zhejiang University, Clemson University"
ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs,"Justin Chih-Yao Chen, Swarnadeep Saha, Mohit Bansal",2023.9.22,"Large Language Models (LLMs) still struggle with natural language reasoning tasks. Motivated by the society of minds (Minsky, 1988), we propose ReConcile, a multi-model multi-agent framework designed as a round table conference among diverse LLM agents. ReConcile enhances collaborative reasoning between LLM agents via multiple rounds of discussion, learning to convince other agents to improve their answers, and employing a confidence-weighted voting mechanism that leads to a better consensus. In each round, ReConcile initiates discussion between agents via a 'discussion prompt' that consists of (a) grouped answers and explanations generated by each agent in the previous round, (b) their confidence scores, and (c) demonstrations of answer-rectifying human explanations, used for convincing other agents. Experiments on seven benchmarks demonstrate that ReConcile significantly improves LLMs' reasoning -- both individually and as a team -- surpassing prior single-agent and multi-agent baselines by up to 11.4% and even outperforming GPT-4 on three datasets. ReConcile also flexibly incorporates different combinations of agents, including API-based, open-source, and domain-specific models, leading to an 8% improvement on MATH. Finally, we analyze the individual components of ReConcile, demonstrating that the diversity originating from different models is critical to its superior performance.",https://arxiv.org/abs/2309.13007,Organization,Computation and Language (cs.CL),reconcile_round-table_conference_improves_20230922,UNC Chapel Hill
Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?,"Qineng Wang, Zihao Wang, Ying Su, Hanghang Tong, Yangqiu Song",2024.2.28,"Recent progress in LLMs discussion suggests that multi-agent discussion improves the reasoning abilities of LLMs. In this work, we reevaluate this claim through systematic experiments, where we propose a novel group discussion framework to enrich the set of discussion mechanisms. Interestingly, our results show that a single-agent LLM with strong prompts can achieve almost the same performance as the best existing discussion approach on a wide range of reasoning tasks and backbone LLMs. We observe that the multi-agent discussion performs better than a single agent only when there is no demonstration in the prompt. Further study reveals the common interaction mechanisms of LLMs during the discussion.",https://arxiv.org/abs/2402.18272,Organization,Computation and Language (cs.CL),rethinking_the_bounds_of_20240228,"Zhejiang University, HKUST, UIUC"
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models,"Zhao Mandi, Shreeya Jain, Shuran Song",2023.7.10,"We propose a novel approach to multi-robot collaboration that harnesses the power of pre-trained large language models (LLMs) for both high-level communication and low-level path planning. Robots are equipped with LLMs to discuss and collectively reason task strategies. They then generate sub-task plans and task space waypoint paths, which are used by a multi-arm motion planner to accelerate trajectory planning. We also provide feedback from the environment, such as collision checking, and prompt the LLM agents to improve their plan and waypoints in-context. For evaluation, we introduce RoCoBench, a 6-task benchmark covering a wide range of multi-robot collaboration scenarios, accompanied by a text-only dataset for agent representation and reasoning. We experimentally demonstrate the effectiveness of our approach: it achieves high success rates across all tasks in RoCoBench and adapts to variations in task semantics. Our dialog setup offers high interpretability and flexibility. In real-world experiments, we show RoCo easily incorporates human-in-the-loop, where a user can communicate and collaborate with a robot agent to complete tasks together. See project website project-roco.github.io for videos and code.",https://arxiv.org/abs/2307.04738,Communication,Robotics (cs.RO),roco_dialectic_multi-robot_collaboration_20230710,Columbia University
S3: Social-network Simulation System with Large Language Model-Empowered Agents,"Chen Gao, Xiaochong Lan, Zhihong Lu, Jinzhu Mao, Jinghua Piao, Huandong Wang, Depeng Jin, Yong Li",2023.7.27,"Simulation plays a crucial role in addressing various challenges within social science. It offers extensive applications such as state prediction, phenomena explanation, and policy-making support, among others. In this work, we harness the human-like capabilities of large language models (LLMs) in sensing, reasoning, and behaving, and utilize these qualities to construct the S3 system (short for Social network Simulation System). Adhering to the widely employed agent-based simulation paradigm, we employ fine-tuning and prompt engineering techniques to ensure that the agent's behavior closely emulates that of a genuine human within the social network. Specifically, we simulate three pivotal aspects: emotion, attitude, and interaction behaviors. By endowing the agent in the system with the ability to perceive the informational environment and emulate human actions, we observe the emergence of population-level phenomena, including the propagation of information, attitudes, and emotions. We conduct an evaluation encompassing two levels of simulation, employing real-world social network data. Encouragingly, the results demonstrate promising accuracy. This work represents an initial step in the realm of social network simulation empowered by LLM-based agents. We anticipate that our endeavors will serve as a source of inspiration for the development of simulation systems within, but not limited to, social science.",https://arxiv.org/abs/2307.14984,Simulation,Social and Information Networks (cs.SI),s3_social-network_simulation_system_20230727,Tsinghua University
Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems?,"Yongchao Chen, Jacob Arkin, Yang Zhang, Nicholas Roy, Chuchu Fan",2023.9.27,"A flurry of recent work has demonstrated that pre-trained large language models (LLMs) can be effective task planners for a variety of single-robot tasks. The planning performance of LLMs is significantly improved via prompting techniques, such as in-context learning or re-prompting with state feedback, placing new importance on the token budget for the context window. An under-explored but natural next direction is to investigate LLMs as multi-robot task planners. However, long-horizon, heterogeneous multi-robot planning introduces new challenges of coordination while also pushing up against the limits of context window length. It is therefore critical to find token-efficient LLM planning frameworks that are also able to reason about the complexities of multi-robot coordination. In this work, we compare the task success rate and token efficiency of four multi-agent communication frameworks (centralized, decentralized, and two hybrid) as applied to four coordination-dependent multi-agent 2D task scenarios for increasing numbers of agents. We find that a hybrid framework achieves better task success rates across all four tasks and scales better to more agents. We further demonstrate the hybrid frameworks in 3D simulations where the vision-to-text problem and dynamical errors are considered.",https://arxiv.org/abs/2309.15943,Organization,Robotics (cs.RO),scalable_multi-robot_collaboration_with_20230927,"Massachusetts Institute of Technology, Harvard University, MIT-IBM Watson AI Lab"
Scaling Large-Language-Model-based Multi-Agent Collaboration,"Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun",2024.6.11,"Pioneering advancements in large language model-powered agents have underscored the design pattern of multi-agent collaboration, demonstrating that collective intelligence can surpass the capabilities of each individual. Inspired by the neural scaling law, which posits that increasing neurons leads to emergent abilities, this study investigates whether a similar principle applies to increasing agents in multi-agent collaboration. Technically, we propose multi-agent collaboration networks (MACNET), which utilize directed acyclic graphs to organize agents and streamline their interactive reasoning via topological ordering, with solutions derived from their dialogues. Extensive experiments show that MACNET consistently outperforms baseline models, enabling effective agent collaboration across various network topologies and supporting cooperation among more than a thousand agents. Notably, we observed a small-world collaboration phenomenon, where topologies resembling small-world properties achieved superior performance. Additionally, we identified a collaborative scaling law, indicating that normalized solution quality follows a logistic growth pattern as agents scale, with collaborative emergence occurring much earlier than previously observed instances of neural emergence. The code and data will be available at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2406.07155,Organization,Artificial Intelligence (cs.AI),scaling_large-language-model-based_multi-agent_collaboration_20240611,"Tsinghua University, Beijing University of Posts and Telecommunications"
Scaling Large-Language-Model-based Multi-Agent Collaboration,"Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun",2024.6.11,"Pioneering advancements in large language model-powered agents have underscored the design pattern of multi-agent collaboration, demonstrating that collective intelligence can surpass the capabilities of each individual. Inspired by the neural scaling law, which posits that increasing neurons leads to emergent abilities, this study investigates whether a similar principle applies to increasing agents in multi-agent collaboration. Technically, we propose multi-agent collaboration networks (MACNET), which utilize directed acyclic graphs to organize agents and streamline their interactive reasoning via topological ordering, with solutions derived from their dialogues. Extensive experiments show that MACNET consistently outperforms baseline models, enabling effective agent collaboration across various network topologies and supporting cooperation among more than a thousand agents. Notably, we observed a small-world collaboration phenomenon, where topologies resembling small-world properties achieved superior performance. Additionally, we identified a collaborative scaling law, indicating that normalized solution quality follows a logistic growth pattern as agents scale, with collaborative emergence occurring much earlier than previously observed instances of neural emergence. The code and data will be available at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2406.07155,Communication,Artificial Intelligence (cs.AI),scaling_large-language-model-based_multi-agent_collaboration_20240611,"Tsinghua University, Beijing University of Posts and Telecommunications"
Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization,"Yoichi Ishibashi, Yoshimasa Nishimura",2024.4.2,"Recent advancements in automatic code generation using large language model (LLM) agents have brought us closer to the future of automated software development. However, existing single-agent approaches face limitations in generating and improving large-scale, complex codebases due to constraints in context length. To tackle this challenge, we propose Self-Organized multi-Agent framework (SoA), a novel multi-agent framework that enables the scalable and efficient generation and optimization of large-scale code. In SoA, self-organized agents operate independently to generate and modify code components while seamlessly collaborating to construct the overall codebase. A key feature of our framework is the automatic multiplication of agents based on problem complexity, allowing for dynamic scalability. This enables the overall code volume to be increased indefinitely according to the number of agents, while the amount of code managed by each agent remains constant. We evaluate SoA on the HumanEval benchmark and demonstrate that, compared to a single-agent system, each agent in SoA handles significantly less code, yet the overall generated code is substantially greater. Moreover, SoA surpasses the powerful single-agent baseline by 5%...",https://arxiv.org/abs/2404.02183,Organization,Software Engineering (cs.SE),self-organized_agents_a_llm_20240402,TsukushiAI
Simulating Opinion Dynamics with Networks of LLM-based Agents,"Yun-Shiuan Chuang, Agam Goyal, Nikunj Harlalka, Siddharth Suresh, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers",2023.11.16,"Accurately simulating human opinion dynamics is crucial for understanding a variety of societal phenomena, including polarization and the spread of misinformation. However, the agent-based models (ABMs) commonly used for such simulations often over-simplify human behavior. We propose a new approach to simulating opinion dynamics based on populations of Large Language Models (LLMs). Our findings reveal a strong inherent bias in LLM agents towards producing accurate information, leading simulated agents to consensus in line with scientific reality. This bias limits their utility for understanding resistance to consensus views on issues like climate change. After inducing confirmation bias through prompt engineering, however, we observed opinion fragmentation in line with existing agent-based modeling and opinion dynamics research. These insights highlight the promise and limitations of LLM agents in this domain and suggest a path forward: refining LLMs with real-world discourse to better simulate the evolution of human beliefs.",https://arxiv.org/abs/2311.09618,Simulation,Physics and Society (physics.soc-ph),simulating_opinion_dynamics_with_20231116,University of Wisconsin-Madison
Simulating Social Media Using Large Language Models to Evaluate Alternative News Feed Algorithms,"Petter Törnberg, Diliara Valeeva, Justus Uitermark, Christopher Bail",2023.10.5,"Social media is often criticized for amplifying toxic discourse and discouraging constructive conversations. But designing social media platforms to promote better conversations is inherently challenging. This paper asks whether simulating social media through a combination of Large Language Models (LLM) and Agent-Based Modeling can help researchers study how different news feed algorithms shape the quality of online conversations. We create realistic personas using data from the American National Election Study to populate simulated social media platforms. Next, we prompt the agents to read and share news articles, and like or comment upon each other's messages, within three platforms that use different news feed algorithms. In the first platform, users see the most liked and commented posts from users whom they follow. In the second, they see posts from all users, even those outside their own network. The third platform employs a novel “bridging” algorithm that highlights posts that are liked by people with opposing political views. We find this bridging algorithm promotes more constructive, non-toxic conversation across political divides than the other two models. Though further research is needed to evaluate these findings, we argue that LLMs hold considerable potential to improve simulation research on social media and many other complex social settings.",https://arxiv.org/abs/2310.05984,Simulation,Social and Information Networks (cs.SI),simulating_social_media_using_20231005,"University of Amsterdam, Duke University"
Social Simulacra: Creating Populated Prototypes for Social Computing Systems,"Joon Sung Park, Lindsay Popowski, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein",2022.8.8,"Social computing prototypes probe the social behaviors that may arise in an envisioned system design. This prototyping practice is currently limited to recruiting small groups of people. Unfortunately, many challenges do not arise until a system is populated at a larger scale. Can a designer understand how a social system might behave when populated, and make adjustments to the design before the system falls prey to such challenges? We introduce social simulacra, a prototyping technique that generates a breadth of realistic social interactions that may emerge when a social computing system is populated. Social simulacra take as input the designer's description of a community's design (goal, rules, and member personas) and produce as output an instance of that design with simulated behavior, including posts, replies, and anti-social behaviors. We demonstrate that social simulacra shift the behaviors that they generate appropriately in response to design changes, and that they enable exploration of “what if?” scenarios where community members or moderators intervene. To power social simulacra, we contribute techniques for prompting a large language model to generate thousands of distinct community members and their social interactions with each other; these techniques are enabled by the observation that large language models' training data already includes a wide variety of positive and negative behavior on social media platforms. In evaluations, we show that participants are often unable to distinguish social simulacra from actual community behavior and that social computing designers successfully refine their social computing designs when using social simulacra.",https://arxiv.org/abs/2208.04024,Simulation,Human-Computer Interaction (cs.HC),social_simulacra_creating_populated_20220808,"Stanford University, Google Research"
"StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving","Chang Gao, Haiyun Jiang, Deng Cai, Shuming Shi, Wai Lam",2023.11.15,"Most existing prompting methods suffer from the issues of generalizability and consistency, as they often rely on instance-specific solutions that may not be applicable to other instances and lack task-level consistency across the selected few-shot examples. To address these limitations, we propose a comprehensive framework, StrategyLLM, allowing LLMs to perform inductive reasoning, deriving general strategies from specific task instances, and deductive reasoning, applying these general strategies to particular task examples, for constructing generalizable and consistent few-shot prompts. It employs four LLM-based agents: strategy generator, executor, optimizer, and evaluator, working together to generate, evaluate, and select promising strategies for a given task. Experimental results demonstrate that StrategyLLM outperforms the competitive baseline CoT-SC that requires human-annotated solutions on 13 datasets across 4 challenging tasks without human involvement, including math reasoning (34.2\% → 38.8\%), commonsense reasoning (70.3\% → 72.5\%), algorithmic reasoning (73.7\% → 85.0\%), and symbolic reasoning (30.0\% → 79.2\%). Further analysis reveals that StrategyLLM is applicable to various LLMs and demonstrates advantages across numerous scenarios.",https://arxiv.org/abs/2311.08803,Organization,Computation and Language (cs.CL),strategyllm_large_language_models_20231115,"The Chinese University of Hong Kong, Sun Yat-sen University, Tencent AI Lab"
The Impact of Language on Arithmetic Proficiency- A Multilingual Investigation with Cross-Agent Checking Computation,"Chung-Chi Chen, Hiroya Takamura, Ichiro Kobayashi, Yusuke Miyao",2024.6.16,"This paper critically examines the arithmetic capabilities of Large Language Models (LLMs), uncovering significant limitations in their performance. Our research reveals a notable decline in accuracy for complex calculations involving large numbers, with addition and subtraction tasks showing varying degrees of proficiency. Additionally, we challenge the notion that arithmetic is language-independent, finding up to a 10% difference in performance across twenty languages. The study also compares self-verification methods with cross-agent collaborations, showing that a single model often outperforms collaborative approaches in basic arithmetic tasks. These findings suggest a need to reassess the effectiveness of LLMs in tasks requiring numerical accuracy and precision.",https://aclanthology.org/2024.naacl-short.53.pdf,Communication,,the_impact_of_language_20240616,"AIST, University of Tokyo"
The Wisdom of Partisan Crowds: Comparing Collective Intelligence in Humans and LLM-based Agents,"Yun-Shiuan Chuang, Siddharth Suresh, Nikunj Harlalka, Agam Goyal, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers",2023.11.16,"Human groups are able to converge on more accurate beliefs through deliberation, even in the presence of polarization and partisan bias, a phenomenon known as the “wisdom of partisan crowds.” Generated agents powered by Large Language Models (LLMs) are increasingly used to simulate human collective behavior, yet few benchmarks exist for evaluating their dynamics against the behavior of human groups. In this paper, we examine the extent to which the wisdom of partisan crowds emerges in groups of LLM-based agents that are prompted to role-play as partisan personas (e.g., Democrat or Republican). We find that they not only display human-like partisan biases, but also converge to more accurate beliefs through deliberation as humans do. We then identify several factors that interfere with convergence, including the use of chain-of-thought prompts and a lack of detail in personas. Conversely, fine-tuning on human data appears to enhance convergence. These findings show the potential and limitations of LLM-based agents as a model of human collective intelligence.",https://arxiv.org/abs/2311.09665,Simulation,Computation and Language (cs.CL),the_wisdom_of_partisan_20231116,University of Wisconsin-Madison
Theory of Mind for Multi-Agent Collaboration via Large Language Models,"Huao Li, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, Katia Sycara",2023.10.16,"While Large Language Models (LLMs) have demonstrated impressive accomplishments in both reasoning and planning, their abilities in multi-agent collaborations remain largely unexplored. This study evaluates LLM-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks, comparing their performance with Multi-Agent Reinforcement Learning (MARL) and planning-based baselines. We observed evidence of emergent collaborative behaviors and high-order Theory of Mind capabilities among LLM-based agents. Our results reveal limitations in LLM-based agents' planning optimization due to systematic failures in managing long-horizon contexts and hallucination about the task state. We explore the use of explicit belief state representations to mitigate these issues, finding that it enhances task performance and the accuracy of ToM inferences for LLM-based agents.",https://arxiv.org/abs/2310.10701,Communication,Computation and Language (cs.CL),theory_of_mind_for_20231016,"University of Pittsburgh, Carnegie Mellon University"
To Infinity and Beyond- SHOW-1 and Showrunner Agents in Multi-Agent Simulations,"Philipp Maas, Frank Carey, Chris Wheeler, Edward Saatchi, Pete Billington, Jessica Yaffa Shamash",2023.7.24,"In this work we present our approach to generating high-quality episodic content for IPs (Intellectual Property) using large language models (LLMs), custom state-of-the-art diffusion models and our multi-agent simulation for contextualization, story progression and behavioral control. Powerful LLMs such as GPT-4 were trained on a large corpus of TV show data, which lets us believe that, with the right guidance, users will be able to rewrite entire seasons. “That Is What Entertainment Will Look Like. Maybe people are still upset about the last season of Game of Thrones. Imagine if you could ask your A.I. to make a new ending that goes a different way and maybe even put yourself in there as a main character or something.”",https://fablestudio.github.io/showrunner-agents/static/pdfs/To_Infinity_and_Beyond_SHOW-1_And_Showrunner_Agents_in_Multi_Agent_Simulations_v2.pdf,Simulation,,to_infinity_and_beyond_20230724,Fable Studio
Toward Optimal LLM Alignments Using Two-Player Games,"Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, Xiaojun Xu, Zhaoran Wang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang, Hang Li, Yang Liu",2024.6.16,"Alignment of large language models is a critical process designed to ensure that the model's responses to user prompts accurately reflect human intentions and adhere to societal values. The standard Reinforcement Learning from Human Feedback (RLHF) framework primarily focuses on optimizing the performance of large language models using pre-collected prompts. However, collecting prompts that provide comprehensive coverage is both tedious and challenging, and often fails to include scenarios that LLMs need to improve on the most. In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent. The adversarial agent's task at each step is to generate prompts that expose the weakness of the defensive agent. In return, the defensive agent seeks to improve its responses to these newly identified prompts it “struggled” with, based on feedback from the reward model. We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents. Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents. Our code is released at https://github.com/ruizheng20/gpo.",https://arxiv.org/abs/2406.10977,Communication,Computation and Language (cs.CL),toward_optimal_llm_alignments_20240616,"Fudan University, Northwestern University, ByteDance Research"
Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework,"Xiaoxi Sun, Jinpeng Li, Yan Zhong, Dongyan Zhao, Rui Yan",2024.6.5,"The advent of large language models (LLMs) has facilitated the development of natural language text generation. It also poses unprecedented challenges, with content hallucination emerging as a significant concern. Existing solutions often involve expensive and complex interventions during the training process. Moreover, some approaches emphasize problem disassembly while neglecting the crucial validation process, leading to performance degradation or limited applications. To overcome these limitations, we propose a Markov Chain-based multi-agent debate verification framework to enhance hallucination detection accuracy in concise claims. Our method integrates the fact-checking process, including claim detection, evidence retrieval, and multi-agent verification. In the verification stage, we deploy multiple agents through flexible Markov Chain-based debates to validate individual claims, ensuring meticulous verification outcomes. Experimental results across three generative tasks demonstrate that our approach achieves significant improvements over baselines.",https://arxiv.org/abs/2406.03075,Communication,Computation and Language (cs.CL),towards_detecting_llms_hallucination_20240605,"Peking University, Renmin University of China"
TraveLER: A Multi-LMM Agent Framework for Video Question-Answering,"Chuyi Shang, Amos You, Sanjay Subramanian, Trevor Darrell, Roei Herzig",2024.4.1,"Recently, Large Multimodal Models (LMMs) have made significant progress in video question-answering using a frame-wise approach by leveraging large-scale, image-based pretraining in a zero-shot manner. While image-based methods for videos have shown impressive performance, a current limitation is that they often overlook how key timestamps are selected and cannot adjust when incorrect timestamps are identified. Moreover, they are unable to extract details relevant to the question, instead providing general descriptions of the frame. To overcome this, we design a multi-LMM agent framework that travels along the video, iteratively collecting relevant information from keyframes through interactive question-asking until there is sufficient information to answer the question. Specifically, we propose TraveLER, a model that can create a plan to “Traverse” through the video, ask questions about individual frames to “Locate” and store key information, and then “Evaluate” if there is enough information to answer the question. Finally, if there is not enough information, our method is able to “Replan” based on its collected knowledge. Through extensive experiments, we find that the proposed TraveLER approach improves performance on several video question-answering benchmarks, such as NExT-QA, STAR, and Perception Test, without the need to fine-tune on specific datasets.",https://arxiv.org/abs/2404.01476,Organization,Computer Vision and Pattern Recognition (cs.CV),traveler_a_multi-lmm_agent_20240401,"University of California, Berkeley"
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration,"Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, Heng Ji",2023.7.11,"Human intelligence thrives on cognitive synergy, where collaboration among different minds yields superior outcomes compared to isolated individuals. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist is an intelligent agent that collaboratively combines multiple minds' strengths and knowledge to enhance problem-solving in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works, such as Chain-of-Thought, that solely enhance the reasoning abilities in LLMs, experimental results demonstrate that SPP effectively reduces factual hallucination, and maintains strong reasoning capabilities. Additionally, comparative experiments show that cognitive synergy only emerges in GPT-4 and does not appear in less capable models, such as GPT-",https://arxiv.org/abs/2307.05300,Organization,Artificial Intelligence (cs.AI),unleashing_the_emergent_cognitive_20230711,"University of Illinois Urbana-Champaign, Microsoft Research Asia"
Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation,"Xinyi Mou, Zhongyu Wei, Xuanjing Huang",2024.2.26,"Social media has emerged as a cornerstone of social movements, wielding significant influence in driving societal change. Simulating the response of the public and forecasting the potential impact has become increasingly important. However, existing methods for simulating such phenomena encounter challenges concerning their efficacy and efficiency in capturing the behaviors of social movement participants. In this paper, we introduce a hybrid framework HiSim for social media user simulation, wherein users are categorized into two types. Core users are driven by Large Language Models, while numerous ordinary users are modeled by deductive agent-based models. We further construct a Twitter-like environment to replicate their response dynamics following trigger events. Subsequently, we develop a multi-faceted benchmark SoMoSiMu-Bench for evaluation and conduct comprehensive experiments across real-world datasets. Experimental results demonstrate the effectiveness and flexibility of our method.",https://arxiv.org/abs/2402.16333,Simulation,Computers and Society (cs.CY),unveiling_the_truth_and_20240226,"Fudan University, Shanghai Collaborative Innovation Center of Intelligent Visual Computing"
User Behavior Simulation with Large Language Model based Agents,"Lei Wang, Jingsen Zhang, Hao Yang, Zhiyuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, Ji-Rong Wen",2023.6.5,"Simulating high quality user behavior data has always been a fundamental problem in human-centered applications, where the major difficulty originates from the intricate mechanism of human decision process. Recently, substantial evidence has suggested that by learning huge amounts of web knowledge, large language models (LLMs) can achieve human-like intelligence. We believe these models can provide significant opportunities for more believable user behavior simulation. To inspire such direction, we propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors. Based on extensive experiments, we find that the simulated behaviors of our method are very close to the ones of real humans. Concerning potential applications, we simulate and study two social phenomena including (1) information cocoons and (2) user conformity behaviors. This research provides novel simulation paradigms for human-centered applications.",https://arxiv.org/abs/2306.02552,Organization,Information Retrieval (cs.IR),user_behavior_simulation_with_20230605,"Renmin University of China, Beijing Key Laboratory of Big Data Management and Analysis Methods, University College London"
User Behavior Simulation with Large Language Model based Agents,"Lei Wang, Jingsen Zhang, Hao Yang, Zhiyuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, Ji-Rong Wen",2023.6.5,"Simulating high quality user behavior data has always been a fundamental problem in human-centered applications, where the major difficulty originates from the intricate mechanism of human decision process. Recently, substantial evidence has suggested that by learning huge amounts of web knowledge, large language models (LLMs) can achieve human-like intelligence. We believe these models can provide significant opportunities for more believable user behavior simulation. To inspire such direction, we propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors. Based on extensive experiments, we find that the simulated behaviors of our method are very close to the ones of real humans. Concerning potential applications, we simulate and study two social phenomena including (1) information cocoons and (2) user conformity behaviors. This research provides novel simulation paradigms for human-centered applications.",https://arxiv.org/abs/2306.02552,Simulation,Information Retrieval (cs.IR),user_behavior_simulation_with_20230605,"Renmin University of China, Beijing Key Laboratory of Big Data Management and Analysis Methods, University College London"
Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies,"Gati Aher, Rosa I. Arriaga, Adam Tauman Kalai",2022.8.18,"We introduce a new type of test, called a Turing Experiment (TE), for evaluating to what extent a given language model, such as GPT models, can simulate different aspects of human behavior. A TE can also reveal consistent distortions in a language model's simulation of a specific human behavior. Unlike the Turing Test, which involves simulating a single arbitrary individual, a TE requires simulating a representative sample of participants in human subject research. We carry out TEs that attempt to replicate well-established findings from prior studies. We design a methodology for simulating TEs and illustrate its use to compare how well different language models are able to reproduce classic economic, psycholinguistic, and social psychology experiments: Ultimatum Game, Garden Path Sentences, Milgram Shock Experiment, and Wisdom of Crowds. In the first three TEs, the existing findings were replicated using recent models, while the last TE reveals a ""hyper-accuracy distortion"" present in some language models (including ChatGPT and GPT-4), which could affect downstream applications in education and the arts.",https://arxiv.org/abs/2208.10264,Simulation,Computation and Language (cs.CL),using_large_language_models_20220818,"Olin College of Engineering, Georgia Tech, Microsoft Research"
War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars,"Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang",2023.11.28,"Can we avoid wars at the crossroads of history? This question has been pursued by individuals, scholars, policymakers, and organizations throughout human history. In this research, we attempt to answer the question based on the recent advances of Artificial Intelligence (AI) and Large Language Models (LLMs). We propose WarAgent, an LLM-powered multi-agent AI system, to simulate the participating countries, their decisions, and the consequences, in historical international conflicts, including World War I (WWI), World War II (WWII), and the Warring States Period (WSP) in Ancient China. By evaluating the simulation effectiveness, we examine the advancements and limitations of cutting-edge AI systems' abilities in studying complex collective human behaviors such as international conflicts under diverse settings. In these simulations, the emergent interactions among agents also offer a novel perspective for examining the triggers and conditions that lead to war. Our findings offer data-driven and AI-augmented insights that can redefine how we approach conflict resolution and peacekeeping strategies. The implications stretch beyond historical analysis, offering a blueprint for using AI to understand human history and possibly prevent future international conflicts. Code and data are available at https://github.com/agiresearch/WarAgent.",https://arxiv.org/abs/2311.17227,Simulation,Artificial Intelligence (cs.AI),war_and_peace_(waragent)_20231128,Rutgers University
War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars,"Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang",2023.11.28,"Can we avoid wars at the crossroads of history? This question has been pursued by individuals, scholars, policymakers, and organizations throughout human history. In this research, we attempt to answer the question based on the recent advances of Artificial Intelligence (AI) and Large Language Models (LLMs). We propose WarAgent, an LLM-powered multi-agent AI system, to simulate the participating countries, their decisions, and the consequences, in historical international conflicts, including World War I (WWI), World War II (WWII), and the Warring States Period (WSP) in Ancient China. By evaluating the simulation effectiveness, we examine the advancements and limitations of cutting-edge AI systems' abilities in studying complex collective human behaviors such as international conflicts under diverse settings. In these simulations, the emergent interactions among agents also offer a novel perspective for examining the triggers and conditions that lead to war. Our findings offer data-driven and AI-augmented insights that can redefine how we approach conflict resolution and peacekeeping strategies. The implications stretch beyond historical analysis, offering a blueprint for using AI to understand human history and possibly prevent future international conflicts. Code and data are available at https://github.com/agiresearch/WarAgent.",https://arxiv.org/abs/2311.17227,Organization,Artificial Intelligence (cs.AI),war_and_peace_(waragent)_20231128,Rutgers University