Mirror of https://github.com/OpenBMB/ChatDev.git, synced 2026-04-25 19:28:09 +00:00
59 KiB
Executable File
| | image_path | title | author | summary | affiliation |
|---|---|---|---|---|---|
| 0 | ./images/(perhaps)_beyond_human_translation_20240520.png | (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts | Minghao Wu, Yulin Yuan, Gholamreza Haffari, Longyue Wang | Recent advancements in machine translation (MT) have significantly enhanced translation quality across various domains. However, the translation of literary texts remains a formidable challenge due to their complex language, figurative expressions, and cultural nuances. In this work, we introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TRANSAGENTS, which mirrors the traditional translation publication process by leveraging the collective capabilities of multiple agents, to address the intricate demands of translating literary works. To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP). MHP assesses translations from the perspective of monolingual readers of the target language, while BLP uses advanced LLMs to compare translations directly with the original texts. Empirical findings indicate that despite lower d-BLEU scores, translations from TRANSAGENTS are preferred by both human evaluators and LLMs over human-written references, particularly in genres requiring domain-specific knowledge. We also highlight the strengths and limitations of TRANSAGENTS through case studies and suggest directions for future research. | Monash University, University of Macau, Tencent AI Lab |
| 1 | ./images/agent_hospital_a_simulacrum_20240505.png | Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents | Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, Yang Liu | In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can simulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keep accumulating experience from both successful and unsuccessful cases. Simulation experiments show that the treatment performance of doctor agents consistently improves on various tasks. More interestingly, the knowledge the doctor agents have acquired in Agent Hospital is applicable to real-world medicare benchmarks. After treating around ten thousand patients (real-world doctors may take over two years), the evolved doctor agent achieves a state-of-the-art accuracy of 9 | Tsinghua University |
| 2 | ./images/autogen_enabling_next-gen_llm_20230816.png | AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation | Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, Chi Wang | AutoGen is an open-source framework that allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools. Using AutoGen, developers can also flexibly define agent interaction behaviors. Both natural language and computer code can be used to program flexible conversation patterns for different applications. AutoGen serves as a generic framework for building diverse applications of various complexities and LLM capacities. Empirical studies demonstrate the effectiveness of the framework in many example applications, with domains ranging from mathematics, coding, question answering, operations research, online decision-making, entertainment, etc. | Microsoft Research, Pennsylvania State University, University of Washington, Xidian University |
| 3 | ./images/avalon's_game_of_thoughts_20231002.png | Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation | Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang | Recent breakthroughs in large language models (LLMs) have brought remarkable success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information processed by LLMs is consistently honest, neglecting the pervasive deceptive or misleading information in human society and AI-generated content. This oversight makes LLMs susceptible to malicious manipulations, potentially resulting in detrimental outcomes. This study utilizes the intricate Avalon game as a testbed to explore LLMs’ potential in deceptive environments. Avalon, full of misinformation and requiring sophisticated logic, manifests as a “Game-of-Thoughts”. Inspired by the efficacy of humans’ recursive thinking and perspective-taking in the Avalon game, we introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs’ ability to identify and counteract deceptive information. ReCon combines formulation and refinement contemplation processes; formulation contemplation produces initial thoughts and speech, while refinement contemplation further polishes them. Additionally, we incorporate first-order and second-order perspective transitions into these processes respectively. Specifically, the first-order allows an LLM agent to infer others’ mental states, and the second-order involves understanding how others perceive the agent’s mental state... | Tsinghua University, BIGAI, Technical University of Munich |
| 4 | ./images/chain_of_agents_large_20240604.png | Chain of Agents: Large Language Models Collaborating on Long-Context Tasks | Yusen Zhang, Ruoxi Sun, Yanfei Chen, Tomas Pfister, Rui Zhang, Sercan Ö. Arik | Addressing the challenge of effectively processing long contexts has become a critical issue for Large Language Models (LLMs). Two common strategies have emerged: 1) reducing the input length, such as retrieving relevant chunks by Retrieval-Augmented Generation (RAG), and 2) expanding the context window limit of LLMs. However, both strategies have drawbacks: input reduction has no guarantee of covering the part with needed information, while window extension struggles with focusing on the pertinent information for solving the task. To mitigate these limitations, we propose Chain-of-Agents (CoA), a novel framework that harnesses multi-agent collaboration through natural language to enable information aggregation and context reasoning across various LLMs over long-context tasks. CoA consists of multiple worker agents who sequentially communicate to handle different segmented portions of the text, followed by a manager agent who synthesizes these contributions into a coherent final output. CoA processes the entire input by interleaving reading and reasoning, and it mitigates long context focus issues by assigning each agent a short context. We perform comprehensive evaluation of CoA on a wide range of long-context tasks in question answering, summarization, and code completion, demonstrating significant improvements by up to 10% over strong baselines of RAG, Full-Context, and multi-agent LLMs. | Penn State University, Google Cloud AI Research |
| 5 | ./images/chatcoder_chat-based_refine_requirement_20231101.png | ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation | Zejun Wang, Jia Li, Ge Li, Zhi Jin | Large language models have shown good performances in generating code to meet human requirements. However, human requirements expressed in natural languages can be vague, incomplete, and ambiguous, leading large language models to misunderstand human requirements and make mistakes. Worse, it is difficult for a human user to refine the requirement. To help human users refine their requirements and improve large language models’ code generation performances, we propose ChatCoder: a method to refine the requirements via chatting with large language models. We design a chat scheme in which the large language models will guide the human users to refine their expression of requirements to be more precise, unambiguous, and complete than before. Experiments show that ChatCoder has improved existing large language models’ performance by a large margin. Besides, ChatCoder has the advantage over refine-based methods and LLMs fine-tuned via human response. | Peking University |
| 6 | ./images/chatdev_communicative_agents_for_20230716.png | ChatDev: Communicative Agents for Software Development | Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun | Software development is a complex task that necessitates cooperation among multiple members with diverse skills. Numerous studies used deep learning to improve specific phases in a waterfall model, such as design, coding, and testing. However, the deep learning model in each phase requires unique designs, leading to technical inconsistencies across various phases, which results in a fragmented and ineffective development process. In this paper, we introduce ChatDev, a chat-powered software development framework in which specialized agents driven by large language models (LLMs) are guided in what to communicate (via chat chain) and how to communicate (via communicative dehallucination). These agents actively contribute to the design, coding, and testing phases through unified language-based communication, with solutions derived from their multi-turn dialogues. We found their utilization of natural language is advantageous for system design, and communicating in programming language proves helpful in debugging. This paradigm demonstrates how linguistic communication facilitates multi-agent collaboration, establishing language as a unifying bridge for autonomous task-solving among LLM agents. The code and data are available at https://github.com/OpenBMB/ChatDev. | Tsinghua University, The University of Sydney, BUPT, Modelbest Inc. |
| 7 | ./images/chateval_towards_better_llm-based_20230814.png | ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate | Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, Zhiyuan Liu | Text evaluation has historically posed significant challenges, often demanding substantial labor and time cost. With the emergence of large language models (LLMs), researchers have explored LLMs’ potential as alternatives for human evaluation. While these single-agent-based approaches show promise, experimental results suggest that further advancements are needed to bridge the gap between their current effectiveness and human-level evaluation quality. Recognizing that best practices of human evaluation processes often involve multiple human annotators collaborating in the evaluation, we resort to a multi-agent debate framework, moving beyond single-agent prompting strategies. The multi-agent-based approach enables a group of LLMs to synergize with an array of intelligent counterparts, harnessing their distinct capabilities and expertise to enhance efficiency and effectiveness in handling intricate tasks. In this paper, we construct a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models on open-ended questions and traditional natural language generation (NLG) tasks. We derive insights and lessons from practical scenarios where humans instigate group discussions for brainstorming and propose different communication strategies within ChatEval... | Tsinghua University, Hong Kong University of Science and Technology, Peking University |
| 8 | ./images/comm_collaborative_multi-agent,_multi-reasoning-path_20240426.png | CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving | Pei Chen, Boran Han, Shuai Zhang | Large Language Models (LLMs) have shown great ability in solving traditional natural language tasks and elementary reasoning tasks with appropriate prompting techniques. However, their ability is still limited in solving complicated science problems. In this work, we aim to push the upper bound of the reasoning capability of LLMs by proposing a collaborative multi-agent, multi-reasoning-path (CoMM) prompting framework. Specifically, we prompt LLMs to play different roles in a problem-solving team, and encourage different role-play agents to collaboratively solve the target task. In particular, we discover that applying different reasoning paths for different roles is an effective strategy to implement few-shot prompting approaches in the multi-agent scenarios. Empirical results demonstrate the effectiveness of the proposed methods on two college-level science problems over competitive baselines. Our further analysis shows the necessity of prompting LLMs to play different roles or experts independently. We release the code at: https://github.com/amazon-science/comm-prompt. | Texas A&M University, Amazon Web Services |
| 9 | ./images/describe,_explain,_plan_and_20230203.png | Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian Ma, Yitao Liang | We investigate the challenge of task planning for multi-task embodied agents in open-world environments. Two main difficulties are identified: 1) executing plans in an open-world environment (e.g., Minecraft) necessitates accurate and multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla planners do not consider how easy the current agent can achieve a given sub-task when ordering parallel sub-goals within a complicated plan, the resulting plan could be inefficient or even infeasible. To this end, we propose "Describe, Explain, Plan and Select" (DEPS), an interactive planning approach based on Large Language Models (LLMs). DEPS facilitates better error correction on initial LLM-generated plan by integrating description of the plan execution process and providing self-explanation of feedback when encountering failures during the extended planning phases. Furthermore, it includes a goal selector, which is a trainable module that ranks parallel candidate sub-goals based on the estimated steps of completion, consequently refining the initial plan. Our experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks and nearly double the overall performances. Further testing reveals our method's general effectiveness in popularly adopted non-open-ended domains as well (i.e., ALFWorld and tabletop manipulation). The ablation and exploratory studies detail how our design beats the counterparts and provide a promising update on the ObtainDiamond grand challenge with our approach. | Peking University, University of California Los Angeles, Beijing Institute for General Artificial Intelligence |
| 10 | ./images/dynamic_llm-agent_network_an_20231003.png | Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization | Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang | Large language model (LLM) agents have been shown effective on a wide range of tasks, and by ensembling multiple LLM agents, their performances could be further improved. Existing approaches employ a fixed set of agents to interact with each other in a static architecture, which limits their generalizability to various tasks and requires strong human prior in designing these agents. In this work, we propose to construct a strategic team of agents communicating in a dynamic interaction architecture based on the task query. Specifically, we build a framework named Dynamic LLM-Agent Network (DyLAN) for LLM-agent collaboration on complicated tasks like reasoning and code generation. DyLAN enables agents to interact for multiple rounds in a dynamic architecture with inference-time agent selection and an early-stopping mechanism to improve performance and efficiency. We further design an automatic agent team optimization algorithm based on an unsupervised metric termed Agent Importance Score, enabling the selection of best agents based on the contribution each agent makes. Empirically, we demonstrate that DyLAN performs well in both reasoning and code generation tasks with reasonable computational cost. DyLAN achieves 1 | Tsinghua University, Georgia Tech, Stanford University |
| 11 | ./images/econagent_large_language_model-empowered_20231016.png | EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities | Nian Li, Chen Gao, Mingyu Li, Yong Li, Qingmin Liao | The advent of artificial intelligence has led to a growing emphasis on data-driven modeling in macroeconomics, with agent-based modeling (ABM) emerging as a prominent bottom-up simulation paradigm. In ABM, agents (e.g., households, firms) interact within a macroeconomic environment, collectively generating market dynamics. Existing agent modeling typically employs predetermined rules or learning-based neural networks for decision-making. However, customizing each agent presents significant challenges, complicating the modeling of agent heterogeneity. Additionally, the influence of multi-period market dynamics and multifaceted macroeconomic factors are often overlooked in decision-making processes. In this work, we introduce EconAgent, a large language model-empowered agent with human-like characteristics for macroeconomic simulation. We first construct a simulation environment that incorporates various market dynamics driven by agents’ decisions regarding work and consumption. Through the perception module, we create heterogeneous agents with distinct decision-making mechanisms. Furthermore, we model the impact of macroeconomic trends using a memory module, which allows agents to reflect on past individual experiences and market dynamics. Simulation experiments show that EconAgent can make realistic decisions, leading to more reasonable macroeconomic phenomena compared to existing rule-based or learning-based agents. Our codes are released at https://github.com/tsinghua-fib-lab/ACL24-EconAgent. | Tsinghua University |
| 12 | ./images/experiential_co-learning_of_software-developing_20231228.png | Experiential Co-Learning of Software-Developing Agents | Chen Qian, Yufan Dang, Jiahao Li, Wei Liu, Zihao Xie, Yifei Wang, Weize Chen, Cheng Yang, Xin Cong, Xiaoyin Che, Zhiyuan Liu, Maosong Sun | Recent advancements in large language models (LLMs) have brought significant changes to various domains, especially through LLM-driven autonomous agents. A representative scenario is in software development, where LLM agents demonstrate efficient collaboration, task division, and assurance of software quality, markedly reducing the need for manual involvement. However, these agents frequently perform a variety of tasks independently, without benefiting from past experiences, which leads to repeated mistakes and inefficient attempts in multi-step task execution. To this end, we introduce Experiential Co-Learning, a novel LLM-agent learning framework in which instructor and assistant agents gather shortcut-oriented experiences from their historical trajectories and use these past experiences for future task execution. The extensive experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively. We anticipate that our insights will guide LLM agents towards enhanced autonomy and contribute to their evolutionary growth in cooperative learning. The code and data are available at https://github.com/OpenBMB/ChatDev. | Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens |
| 13 | ./images/exploring_large_language_models_20230909.png | Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf | Yuzhuang Xu, Shuo Wang, Peng Li, Fuwen Luo, Xiaolong Wang, Weidong Liu, Yang Liu | Communication games, which we refer to as incomplete information games that heavily depend on natural language communication, hold significant research value in fields such as economics, social science, and artificial intelligence. In this work, we explore the problem of how to engage large language models (LLMs) in communication games, and in response, propose a tuning-free framework. Our approach keeps LLMs frozen, and relies on the retrieval and reflection on past communications and experiences for improvement. An empirical study on the representative and widely-studied communication game, “Werewolf”, demonstrates that our framework can effectively play the Werewolf game without tuning the parameters of the LLMs. More importantly, strategic behaviors begin to emerge in our experiments, suggesting that it will be a fruitful journey to engage LLMs in communication games and associated domains. | Tsinghua University, Zhongguancun Laboratory |
| 14 | ./images/facilitating_multi-role_and_multi-behavior_20240528.png | Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting | Hongda Sun, Hongzhan Lin, Haiyu Yan, Chen Zhu, Yang Song, Xin Gao, Shuo Shang, Rui Yan | The emergence of online recruitment services has revolutionized the traditional landscape of job seeking and recruitment, necessitating the development of high-quality industrial applications to improve person-job fitting. Existing methods generally rely on modeling the latent semantics of resumes and job descriptions and learning a matching function between them. Inspired by the powerful role-playing capabilities of Large Language Models (LLMs), we propose to introduce a mock interview process between LLM-played interviewers and candidates. The mock interview conversations can provide additional evidence for candidate evaluation, thereby augmenting traditional person-job fitting based solely on resumes and job descriptions. However, characterizing these two roles in online recruitment still presents several challenges, such as developing the skills to raise interview questions, formulating appropriate answers, and evaluating two-sided fitness. To this end, we propose MockLLM, a novel applicable framework that divides the person-job matching process into two modules: mock interview generation and two-sided evaluation in handshake protocol, jointly enhancing their performance through collaborative behaviors between interviewers and candidates. We design a role-playing framework as a multi-role and multi-behavior paradigm to enable a single LLM agent to effectively behave with multiple functions for both parties... | Renmin University of China, BOSS Zhipin, King Abdullah University of Science and Technology, University of Electronic Science and Technology of China |
| 15 | ./images/gamegpt_multi-agent_collaborative_framework_20231012.png | GameGPT: Multi-agent Collaborative Framework for Game Development | Dake Chen, Hanbin Wang, Yunhao Huo, Yuzhao Li, Haoyang Zhang | The large language model (LLM) based agents have demonstrated their capacity to automate and expedite software development processes. In this paper, we focus on game development and propose a multi-agent collaborative framework, dubbed GameGPT, to automate game development. While many studies have pinpointed hallucination as a primary roadblock for deploying LLMs in production, we identify another concern: redundancy. Our framework presents a series of methods to mitigate both concerns. These methods include dual collaboration and layered approaches with several in-house lexicons, to mitigate the hallucination and redundancy in the planning, task identification, and implementation phases. Furthermore, a decoupling approach is also introduced to achieve code generation with better precision. | AutoGame Research, X-Institute, University of Southern California |
| 16 | ./images/generative_agents_interactive_simulacra_20230407.png | Generative Agents: Interactive Simulacra of Human Behavior | Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein | Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior. | Stanford University, Google Research, Google DeepMind |
| 17 | ./images/improving_multi-agent_debate_with_20240617.png | Improving Multi-Agent Debate with Sparse Communication Topology | Yunxuan Li, Yibing Du, Jiageng Zhang, Le Hou, Peter Grabowski, Yeqing Li, Eugene Ie | Multi-agent debate has proven effective in improving large language model quality for reasoning and factuality tasks. While various role-playing strategies in multi-agent debates have been explored, in terms of the communication among agents, existing approaches adopt a brute force algorithm – each agent can communicate with all other agents. In this paper, we systematically investigate the effect of communication connectivity in multi-agent systems. Our experiments on GPT and Mistral models reveal that multi-agent debates leveraging sparse communication topology can achieve comparable or superior performance while significantly reducing computational costs. Furthermore, we extend the multi-agent debate framework to multimodal reasoning and alignment labeling tasks, showcasing its broad applicability and effectiveness. Our findings underscore the importance of communication connectivity on enhancing the efficiency and effectiveness of the “society of minds” approach. | Google, Google DeepMind |
| 18 | ./images/iterative_experience_refinement_of_20240507.png | Iterative Experience Refinement of Software-Developing Agents | Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, YiFei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun | Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks iterative refinement and thus hampers agents’ adaptability. In this paper, we introduce the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution. We propose two fundamental patterns: the successive pattern, refining based on nearest experiences within a task batch, and the cumulative pattern, acquiring experiences across all previous task batches. Augmented with our heuristic experience elimination, the method prioritizes high-quality and frequently-used experiences, effectively managing the experience space and enhancing efficiency. Extensive experiments show that while the successive pattern may yield superior results, the cumulative pattern provides more stable performance... | Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens |
| 19 | ./images/language_agents_as_optimizable_20240226.png | Language Agents as Optimizable Graphs | Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber | Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs can be recursively combined into larger composite graphs representing hierarchies of inter-agent collaboration (where edges connect operations of different agents). Our novel automatic graph optimizers (1) refine node-level LLM prompts (node optimization) and (2) improve agent orchestration by changing graph connectivity (edge optimization). Experiments demonstrate that our framework can be used to efficiently develop, integrate, and automatically improve various LLM agents. | King Abdullah University of Science and Technology, The Swiss AI Lab IDSIA, USI, SUPSI |
| 20 | ./images/large_language_models_are_20230327.png | Large Language Models are Diverse Role-Players for Summarization Evaluation | Ning Wu, Ming Gong, Linjun Shou, Shining Liang, Daxin Jiang | Text summarization has a wide range of applications in many scenarios. The evaluation of the quality of the generated text is a complex problem. A big challenge to language evaluation is that there is a clear divergence between existing metrics and human evaluation. A document summary’s quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal. Most of the automatic evaluation methods like BLEU/ROUGE may not be able to adequately capture the above dimensions. In this paper, we propose a new evaluation framework based on LLMs, which provides a comprehensive evaluation framework by comparing generated text and reference text from both objective and subjective aspects. First, we propose to model objective and subjective dimensions of generated text based on a roleplayer prompting mechanism. Furthermore, we introduce a context-based prompting mechanism that is able to generate dynamic roleplayer profiles based on input context. Finally, we design a multi-roleplayer prompting technology based on batch prompting and integrate multiple outputs into the final evaluation results. Experimental results on three real datasets for summarization show that our model is highly competitive and has a very high consistency with human annotators. | Microsoft |
| 23 | 21 | ./images/large_language_models_are_20230327.png | Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game | Qianqiao Xu, Zhiliang Tian, Hongyan Wu, Zhen Huang, Yiping Song, Feng Liu, Dongsheng Li | With the enhanced performance of large models on natural language processing tasks, potential moral and ethical issues of large models arise. There exist malicious attackers who induce large models to jailbreak and generate information containing illegal, privacy-invasive information through techniques such as prompt engineering. As a result, large models counter malicious attackers’ attacks using techniques such as safety alignment. However, the strong defense mechanism of the large model through rejection replies is easily identified by attackers and used to strengthen attackers’ capabilities. In this paper, we propose a multi-agent attacker-disguiser game approach to achieve a weak defense mechanism that allows the large model to both safely reply to the attacker and hide the defense intent. First, we construct a multi-agent framework to simulate attack and defense scenarios, playing different roles to be responsible for attack, disguise, safety evaluation, and disguise evaluation tasks. After that, we design attack and disguise game algorithms to optimize the game strategies of the attacker and the disguiser and use the curriculum learning process to strengthen the capabilities of the agents. The experiments verify that the method in this paper is more effective in strengthening the model’s ability to disguise the defense intent compared with other methods. Moreover, our approach can adapt any black-box large model to assist the model in defense and does not suffer from model version iterations. | National University of Defense Technology, Guangdong University of Foreign Studies |
| 24 | 22 | ./images/learn_to_disguise_avoid_20240403.png | Leveraging Large Language Models for Collective Decision-Making | Marios Papachristou, Longqi Yang, Chin-Chia Hsu | In various work contexts, such as meeting scheduling, collaborating, and project planning, collective decision-making is essential but often challenging due to diverse individual preferences, varying work focuses, and power dynamics among members. To address this, we propose a system leveraging Large Language Models (LLMs) to facilitate group decision-making by managing conversations and balancing preferences among individuals. Our system aims to extract individual preferences from conversations and suggest options that satisfy the preferences of the members. We specifically apply this system to corporate meeting scheduling. We create synthetic employee profiles and simulate conversations at scale, leveraging LLMs to evaluate the system performance as a novel approach to conducting a user study. Our results indicate efficient coordination with reduced interactions between the members and the LLM-based system. The system refines and improves its proposed options over time, ensuring that many of the members' individual preferences are satisfied in an equitable way. Finally, we conduct a survey study involving human participants to assess our system's ability to aggregate preferences and reasoning about them. Our findings show that the system exhibits strong performance in both dimensions. | Cornell University, Microsoft |
| 25 | 23 | ./images/leveraging_large_language_models_20231103.png | LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay | Yihuai Lan, Zhiqiang Hu, Lei Wang, Yang Wang, Deheng Ye, Peilin Zhao, Ee-Peng Lim, Hui Xiong, Hao Wang | This paper explores the open research problem of understanding the social behaviors of LLM-based agents. Using Avalon as a testbed, we employ system prompts to guide LLM agents in gameplay. While previous studies have touched on gameplay with LLM agents, research on their social behaviors is lacking. We propose a novel framework, tailored for Avalon, which features a multi-agent system facilitating efficient communication and interaction. We evaluate its performance based on game success and analyze LLM agents’ social behaviors. Results affirm the framework’s effectiveness in creating adaptive agents and suggest LLM-based agents’ potential in navigating dynamic social interactions. By examining collaboration and confrontation behaviors, we offer insights into this field’s research and applications. Our code is publicly available at https://github.com/3DAgentWorld/LLM-Game-Agent | The Hong Kong University of Science and Technology (Guangzhou), Singapore University of Technology and Design, Singapore Management University, Verily Life Sciences, Tencent |
| 26 | 24 | ./images/llm-based_agent_society_investigation_20231023.png | LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration | Jun Zhao, Can Zu, Hao Xu, Yi Lu, Wei He, Yiwen Ding, Tao Gui, Qi Zhang, Xuanjing Huang | Large language models (LLMs) have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows have been notorious for their expensive training costs and high inference latency. Even the most advanced models such as GPT-4 and Claude2 often make mistakes when processing inputs of over 100k tokens, a phenomenon also known as lost in the middle. In this paper, we propose LONGAGENT, a method based on multi-agent collaboration, which scales LLMs (e.g., LLaMA) to a context of 128K and demonstrates potential superiority in long-text processing compared to GPT-4. | Fudan University |
| 27 | 25 | ./images/longagent_scaling_language_models_20240218.png | MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents | Yuan Li, Yixuan Zhang, Lichao Sun | Significant advancements have occurred in the application of Large Language Models (LLMs) for various tasks and social simulations. Despite this, their capacities to coordinate within task-oriented social contexts are under-explored. Such capabilities are crucial if LLMs are to effectively mimic human-like social behavior and produce meaningful results. To bridge this gap, we introduce collaborative generative agents, endowing LLM-based Agents with consistent behavior patterns and task-solving abilities. We situate these agents in a simulated job fair environment as a case study to scrutinize their coordination skills. We propose a novel framework that equips collaborative generative agents with human-like reasoning abilities and specialized skills. Our evaluation demonstrates that these agents show promising performance. However, we also uncover limitations that hinder their effectiveness in more complex coordination tasks. Our work provides valuable insights into the role and evolution of LLMs in task-oriented social simulations. | University of Cambridge, William & Mary, Lehigh University |
| 28 | 26 | ./images/metaagents_simulating_interactions_of_20231010.png | MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework | Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, Jürgen Schmidhuber | Remarkable progress has been made on automated problem solving through societies of agents based on large language models (LLMs). Existing LLM-based multi-agent systems can already solve simple dialogue tasks. Solutions to more complex tasks, however, are complicated through logic inconsistencies due to cascading hallucinations caused by naively chaining LLMs. Here we introduce MetaGPT, an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. MetaGPT encodes Standardized Operating Procedures (SOPs) into prompt sequences for more streamlined workflows, thus allowing agents with human-like domain expertise to verify intermediate results and reduce errors. MetaGPT utilizes an assembly line paradigm to assign diverse roles to various agents, efficiently breaking down complex tasks into subtasks involving many agents working together. On collaborative software engineering benchmarks, MetaGPT generates more coherent solutions than previous chat-based multi-agent systems. Our project can be found at https://github.com/geekan/MetaGPT | DeepWisdom, King Abdullah University of Science and Technology, Xiamen University, The Chinese University of Hong Kong (Shenzhen), Nanjing University, University of Pennsylvania, University of California, Berkeley, The Swiss AI Lab IDSIA/USI/SUPSI |
| 29 | 27 | ./images/metagpt_meta_programming_for_20230801.png | Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | Zhengqing Yuan, Ruoxi Chen, Zhaoxu Li, Haolong Jia, Lifang He, Chi Wang, Lichao Sun | Sora is the first large-scale generalist video generation model that garnered significant attention across society. Since its launch by OpenAI in February 2024, no other video generation models have paralleled {Sora}'s performance or its capacity to support a broad spectrum of video generation tasks. Additionally, there are only a few fully published video generation models, with the majority being closed-source. To address this gap, this paper proposes a new multi-agent framework Mora, which incorporates several advanced visual AI agents to replicate generalist video generation demonstrated by Sora. In particular, Mora can utilize multiple visual agents and successfully mimic Sora's video generation capabilities in various tasks, such as (1) text-to-video generation, (2) text-conditional image-to-video generation, (3) extend generated videos, (4) video-to-video editing, (5) connect videos and (6) simulate digital worlds. Our extensive experimental results show that Mora achieves performance that is proximate to that of Sora in various tasks. However, there exists an obvious performance gap between our work and Sora when assessed holistically. In summary, we hope this project can guide the future trajectory of video generation through collaborative AI agents. | Lehigh University, Microsoft Research |
| 30 | 28 | ./images/mora_enabling_generalist_video_20240320.png | Multi-Agent Software Development through Cross-Team Collaboration | Zhuoyun Du, Chen Qian, Wei Liu, Zihao Xie, Yifei Wang, Yufan Dang, Weize Chen, Cheng Yang | The latest breakthroughs in Large Language Models (LLMs), e.g., ChatDev, have catalyzed profound transformations, particularly through multi-agent collaboration for software development. LLM agents can collaborate in teams like humans, and follow the waterfall model to sequentially work on requirements analysis, development, review, testing, and other phases to perform autonomous software generation. However, for an agent team, each phase in a single development process yields only one possible outcome. This results in the completion of only one development chain, thereby losing the opportunity to explore multiple potential decision paths within the solution space. Consequently, this may lead to obtaining suboptimal results. To address this challenge, we introduce Cross-Team Collaboration (CTC), a scalable multi-team framework that enables orchestrated teams to jointly propose various decisions and communicate with their insights in a cross-team collaboration environment for superior content generation. Experimental results in software development reveal a notable increase in quality compared to state-of-the-art baselines, underscoring the efficacy of our framework. The significant improvements in story generation demonstrate the promising generalization ability of our framework across various domains. We anticipate that our work will guide LLM agents towards a cross-team paradigm and contribute to their significant growth in, but not limited to, software development. The code and data will be available at https://github.com/OpenBMB/ChatDev. | Zhejiang University, Tsinghua University, Beijing University of Posts and Telecommunications |
| 31 | 29 | ./images/multi-agent_software_development_through_20240613.png | MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate | Alfonso Amayuelas, Xianjun Yang, Antonis Antoniades, Wenyue Hua, Liangming Pan, William Wang | Large Language Models (LLMs) have shown exceptional results on current benchmarks when working individually. The advancement in their capabilities, along with a reduction in parameter size and inference times, has facilitated the use of these models as agents, enabling interactions among multiple models to execute complex tasks. Such collaborations offer several advantages, including the use of specialized models (e.g. coding), improved confidence through multiple computations, and enhanced divergent thinking, leading to more diverse outputs. Thus, the collaborative use of language models is expected to grow significantly in the coming years. In this work, we evaluate the behavior of a network of models collaborating through debate under the influence of an adversary. We introduce pertinent metrics to assess the adversary’s effectiveness, focusing on system accuracy and model agreement. Our findings highlight the importance of a model’s persuasive ability in influencing others. Additionally, we explore inference-time methods to generate more compelling arguments and evaluate the potential of prompt-based mitigation as a defensive strategy. | UC Santa Barbara, Rutgers University |
| 32 | 30 | ./images/multiagent_collaboration_attack_investigating_20240620.png | ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs | Justin Chih-Yao Chen, Swarnadeep Saha, Mohit Bansal | Large Language Models (LLMs) still struggle with natural language reasoning tasks. Motivated by the society of minds (Minsky, 1988), we propose ReConcile, a multi-model multi-agent framework designed as a round table conference among diverse LLM agents. ReConcile enhances collaborative reasoning between LLM agents via multiple rounds of discussion, learning to convince other agents to improve their answers, and employing a confidence-weighted voting mechanism that leads to a better consensus. In each round, ReConcile initiates discussion between agents via a 'discussion prompt' that consists of (a) grouped answers and explanations generated by each agent in the previous round, (b) their confidence scores, and (c) demonstrations of answer-rectifying human explanations, used for convincing other agents. Experiments on seven benchmarks demonstrate that ReConcile significantly improves LLMs' reasoning -- both individually and as a team -- surpassing prior single-agent and multi-agent baselines by up to 11.4% and even outperforming GPT-4 on three datasets. ReConcile also flexibly incorporates different combinations of agents, including API-based, open-source, and domain-specific models, leading to an 8% improvement on MATH. Finally, we analyze the individual components of ReConcile, demonstrating that the diversity originating from different models is critical to its superior performance. | UNC Chapel Hill |
| 33 | 31 | ./images/reconcile_round-table_conference_improves_20230922.png | Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key? | Qineng Wang, Zihao Wang, Ying Su, Hanghang Tong, Yangqiu Song | Recent progress in LLM discussions suggests that multi-agent discussion improves the reasoning abilities of LLMs. In this work, we reevaluate this claim through systematic experiments, where we propose a novel group discussion framework to enrich the set of discussion mechanisms. Interestingly, our results show that a single-agent LLM with strong prompts can achieve almost the same performance as the best existing discussion approach on a wide range of reasoning tasks and backbone LLMs. We observe that multi-agent discussion performs better than a single agent only when there is no demonstration in the prompt. Further study reveals the common interaction mechanisms of LLMs during the discussion. | Zhejiang University, HKUST, UIUC |
| 34 | 32 | ./images/rethinking_the_bounds_of_20240228.png | Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems? | Yongchao Chen, Jacob Arkin, Yang Zhang, Nicholas Roy, Chuchu Fan | A flurry of recent work has demonstrated that pre-trained large language models (LLMs) can be effective task planners for a variety of single-robot tasks. The planning performance of LLMs is significantly improved via prompting techniques, such as in-context learning or re-prompting with state feedback, placing new importance on the token budget for the context window. An under-explored but natural next direction is to investigate LLMs as multi-robot task planners. However, long-horizon, heterogeneous multi-robot planning introduces new challenges of coordination while also pushing up against the limits of context window length. It is therefore critical to find token-efficient LLM planning frameworks that are also able to reason about the complexities of multi-robot coordination. In this work, we compare the task success rate and token efficiency of four multi-agent communication frameworks (centralized, decentralized, and two hybrid) as applied to four coordination-dependent multi-agent 2D task scenarios for increasing numbers of agents. We find that a hybrid framework achieves better task success rates across all four tasks and scales better to more agents. We further demonstrate the hybrid frameworks in 3D simulations where the vision-to-text problem and dynamical errors are considered. | Massachusetts Institute of Technology, Harvard University, MIT-IBM Watson AI Lab |
| 35 | 33 | ./images/scalable_multi-robot_collaboration_with_20230927.png | Scaling Large-Language-Model-based Multi-Agent Collaboration | Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun | Pioneering advancements in large language model-powered agents have underscored the design pattern of multi-agent collaboration, demonstrating that collective intelligence can surpass the capabilities of each individual. Inspired by the neural scaling law, which posits that increasing neurons leads to emergent abilities, this study investigates whether a similar principle applies to increasing agents in multi-agent collaboration. Technically, we propose multi-agent collaboration networks (MACNET), which utilize directed acyclic graphs to organize agents and streamline their interactive reasoning via topological ordering, with solutions derived from their dialogues. Extensive experiments show that MACNET consistently outperforms baseline models, enabling effective agent collaboration across various network topologies and supporting cooperation among more than a thousand agents. Notably, we observed a small-world collaboration phenomenon, where topologies resembling small-world properties achieved superior performance. Additionally, we identified a collaborative scaling law, indicating that normalized solution quality follows a logistic growth pattern as agents scale, with collaborative emergence occurring much earlier than previously observed instances of neural emergence. The code and data will be available at https://github.com/OpenBMB/ChatDev. | Tsinghua University, Beijing University of Posts and Telecommunications |
| 36 | 34 | ./images/scaling_large-language-model-based_multi-agent_collaboration_20240611.png | Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization | Yoichi Ishibashi, Yoshimasa Nishimura | Recent advancements in automatic code generation using large language model (LLM) agents have brought us closer to the future of automated software development. However, existing single-agent approaches face limitations in generating and improving large-scale, complex codebases due to constraints in context length. To tackle this challenge, we propose the Self-Organized multi-Agent framework (SoA), a novel multi-agent framework that enables the scalable and efficient generation and optimization of large-scale code. In SoA, self-organized agents operate independently to generate and modify code components while seamlessly collaborating to construct the overall codebase. A key feature of our framework is the automatic multiplication of agents based on problem complexity, allowing for dynamic scalability. This enables the overall code volume to be increased indefinitely according to the number of agents, while the amount of code managed by each agent remains constant. We evaluate SoA on the HumanEval benchmark and demonstrate that, compared to a single-agent system, each agent in SoA handles significantly less code, yet the overall generated code is substantially greater. Moreover, SoA surpasses the powerful single-agent baseline by 5%. | TsukushiAI |
| 37 | 35 | ./images/self-organized_agents_a_llm_20240402.png | StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving | Chang Gao, Haiyun Jiang, Deng Cai, Shuming Shi, Wai Lam | Most existing prompting methods suffer from the issues of generalizability and consistency, as they often rely on instance-specific solutions that may not be applicable to other instances and lack task-level consistency across the selected few-shot examples. To address these limitations, we propose a comprehensive framework, StrategyLLM, allowing LLMs to perform inductive reasoning, deriving general strategies from specific task instances, and deductive reasoning, applying these general strategies to particular task examples, for constructing generalizable and consistent few-shot prompts. It employs four LLM-based agents: strategy generator, executor, optimizer, and evaluator, working together to generate, evaluate, and select promising strategies for a given task. Experimental results demonstrate that StrategyLLM outperforms the competitive baseline CoT-SC that requires human-annotated solutions on 13 datasets across 4 challenging tasks without human involvement, including math reasoning (34.2% → 38.8%), commonsense reasoning (70.3% → 72.5%), algorithmic reasoning (73.7% → 85.0%), and symbolic reasoning (30.0% → 79.2%). Further analysis reveals that StrategyLLM is applicable to various LLMs and demonstrates advantages across numerous scenarios. | The Chinese University of Hong Kong, Sun Yat-sen University, Tencent AI Lab |
| 38 | 36 | ./images/strategyllm_large_language_models_20231115.png | TraveLER: A Multi-LMM Agent Framework for Video Question-Answering | Chuyi Shang, Amos You, Sanjay Subramanian, Trevor Darrell, Roei Herzig | Recently, Large Multimodal Models (LMMs) have made significant progress in video question-answering using a frame-wise approach by leveraging large-scale, image-based pretraining in a zero-shot manner. While image-based methods for videos have shown impressive performance, a current limitation is that they often overlook how key timestamps are selected and cannot adjust when incorrect timestamps are identified. Moreover, they are unable to extract details relevant to the question, instead providing general descriptions of the frame. To overcome this, we design a multi-LMM agent framework that travels along the video, iteratively collecting relevant information from keyframes through interactive question-asking until there is sufficient information to answer the question. Specifically, we propose TraveLER, a model that can create a plan to “Traverse” through the video, ask questions about individual frames to “Locate” and store key information, and then “Evaluate” if there is enough information to answer the question. Finally, if there is not enough information, our method is able to “Replan” based on its collected knowledge. Through extensive experiments, we find that the proposed TraveLER approach improves performance on several video question-answering benchmarks, such as NExT-QA, STAR, and Perception Test, without the need to fine-tune on specific datasets. | University of California, Berkeley |
| 39 | 37 | ./images/traveler_a_multi-lmm_agent_20240401.png | Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration | Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, Heng Ji | Human intelligence thrives on cognitive synergy, where collaboration among different minds yields superior outcomes compared to isolated individuals. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist is an intelligent agent that collaboratively combines multiple minds’ strengths and knowledge to enhance problem-solving in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works, such as Chain-of-Thought, that solely enhance the reasoning abilities in LLMs, experimental results demonstrate that SPP effectively reduces factual hallucination, and maintains strong reasoning capabilities. Additionally, comparative experiments show that cognitive synergy only emerges in GPT-4 and does not appear in less capable models, such as GPT-3.5. | University of Illinois Urbana-Champaign, Microsoft Research Asia |
| 40 | 38 | ./images/unleashing_the_emergent_cognitive_20230711.png | User Behavior Simulation with Large Language Model based Agents | Lei Wang, Jingsen Zhang, Hao Yang, Zhiyuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, Ji-Rong Wen | Simulating high quality user behavior data has always been a fundamental problem in human-centered applications, where the major difficulty originates from the intricate mechanism of human decision process. Recently, substantial evidence has suggested that by learning huge amounts of web knowledge, large language models (LLMs) can achieve human-like intelligence. We believe these models can provide significant opportunities to more believable user behavior simulation. To inspire such direction, we propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors. Based on extensive experiments, we find that the simulated behaviors of our method are very close to the ones of real humans. Concerning potential applications, we simulate and study two social phenomena including (1) information cocoons and (2) user conformity behaviors. This research provides novel simulation paradigms for human-centered applications. | Renmin University of China, Beijing Key Laboratory of Big Data Management and Analysis Methods, University College London |
| 41 | 39 | ./images/user_behavior_simulation_with_20230605.png | War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars | Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang | Can we avoid wars at the crossroads of history? This question has been pursued by individuals, scholars, policymakers, and organizations throughout human history. In this research, we attempt to answer the question based on the recent advances of Artificial Intelligence (AI) and Large Language Models (LLMs). We propose WarAgent, an LLM-powered multi-agent AI system, to simulate the participating countries, their decisions, and the consequences, in historical international conflicts, including the World War I (WWI), the World War II (WWII), and the Warring States Period (WSP) in Ancient China. By evaluating the simulation effectiveness, we examine the advancements and limitations of cutting-edge AI systems’ abilities in studying complex collective human behaviors such as international conflicts under diverse settings. In these simulations, the emergent interactions among agents also offer a novel perspective for examining the triggers and conditions that lead to war. Our findings offer data-driven and AI-augmented insights that can redefine how we approach conflict resolution and peacekeeping strategies. The implications stretch beyond historical analysis, offering a blueprint for using AI to understand human history and possibly prevent future international conflicts. Code and data are available at https://github.com/agiresearch/WarAgent. | Rutgers University |
| 42 | 40 | ./images/war_and_peace_(waragent)_20231128.png | To be Continued... | Your Contributions are Welcome! |