Poster
in
Workshop: Workshop on Open-World Agents: Synergizing Reasoning and Decision-Making in Open-World Environments (OWA-2024)
Advancing Agentic Systems: Dynamic Task Decomposition, Tool Integration and Evaluation using Novel Metrics and Dataset
Shankar Kumar Jeyakumar · Alaa Ahmad · Adrian Gabriel
Keywords: [ Workflow Automation ] [ LLM ] [ Agentic Systems ] [ Dataset ] [ AI Agents ] [ Evaluation ] [ Autonomous Agents ]
The rapid advancement of Large Language Models (LLMs) and their enhanced reasoning capabilities is opening new avenues for dynamic, context-aware task decomposition and automated tool selection. These developments lay the groundwork for sophisticated autonomous agentic systems powered by LLMs, which hold significant potential for process automation across industries. Such systems demonstrate remarkable abilities in performing complex tasks, interacting with external systems to augment LLMs' knowledge, and executing actions autonomously. To address the challenges and harness the opportunities presented by these advances, this paper makes three key contributions:

- We propose an advanced agentic framework designed to autonomously process multi-hop user queries by dynamically generating and executing task graphs, selecting appropriate tools, and adapting to real-time changes in task requirements or tool availability.
- We introduce novel evaluation metrics tailored for assessing agentic frameworks across diverse domains and tasks, namely Node F1 Score, Structural Similarity Index, and Tool F1 Score.
- We develop a specialized dataset based on the AsyncHow dataset to enable in-depth analysis of agentic behavior across varying task complexities.

Our findings demonstrate that asynchronous, dynamic task-graph decomposition significantly improves system responsiveness and scalability, particularly for complex, multi-step tasks. Through detailed analysis, we observe that structural and node-level metrics are more critical in sequential tasks, whereas tool-related metrics dominate in parallel tasks. In particular, the Structural Similarity Index (SSI) emerged as the strongest predictor of performance in sequential tasks, while the Tool F1 Score proved essential in parallel tasks. These findings highlight the need for balanced evaluation methods that capture both the structural and operational aspects of agentic systems.
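To make the metric names concrete, here is a minimal sketch of how set-based F1 scores over a predicted versus reference task graph could be computed. The abstract does not define Node F1 or Tool F1, so this matching scheme (exact label overlap between node and tool sets) and all example task/tool names are illustrative assumptions, not the paper's actual formulation.

```python
# Hedged sketch: a plausible set-based formulation of Node F1 and Tool F1.
# Exact definitions (e.g. how nodes are matched) are assumptions for illustration.

def f1(predicted: set, reference: set) -> float:
    """F1 score between a predicted and a reference label set."""
    if not predicted and not reference:
        return 1.0  # both empty: treat as a perfect match
    tp = len(predicted & reference)  # true positives: shared labels
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)

# Hypothetical task graphs for a multi-hop query, reduced to node/tool label sets.
pred_nodes = {"search_flights", "book_hotel", "summarize"}
gold_nodes = {"search_flights", "book_hotel", "send_email"}
node_f1 = f1(pred_nodes, gold_nodes)  # tp=2, precision=2/3, recall=2/3 -> 2/3

pred_tools = {"web_search", "calendar"}
gold_tools = {"web_search", "email"}
tool_f1 = f1(pred_tools, gold_tools)  # tp=1, precision=1/2, recall=1/2 -> 1/2
```

A structural metric such as SSI would additionally compare edges (task dependencies) rather than node labels alone, which is why it can dominate in sequential tasks where ordering matters.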
Our specialized dataset enables comprehensive evaluation of these behaviors, providing valuable insights for improving overall system performance; the importance of both structural and tool-related metrics is validated through empirical analysis and statistical testing. Evaluating agentic systems presents unique challenges due to the intricate relationships between task execution, tool usage, and goal achievement. Our evaluation framework, validated through empirical analysis, offers valuable insights for improving the adaptability and reliability of agentic systems in dynamic environments.