
OpenHands Microagents And Delegation

OpenHands microagents deep dive: architecture, delegation workflow, trigger mechanisms, and implementation details.


0x00 Overview

Many agent systems employ a multi-agent architecture, dividing the system into different sub-modules/sub-agents, each with its own specific function, while a central scheduling agent manages the entire lifecycle. This modular architecture allows complex tasks to be broken down and assigned to the modules best suited for each sub-task, leveraging the strengths of each model and avoiding the weaknesses of a single model in certain tasks.

In OpenHands, microagents are essentially tailored instruction modules. Their core function is to inject more focused capabilities into OpenHands, whether specialized knowledge in a particular field or a standardized process for a particular task. For developers, these small modules act like dedicated assistants: in scenarios such as Git operations or code reviews, there is no need to walk through the steps manually, because the microagents supply ready-made guidance. They can automate repetitive tasks and keep operational logic consistent across projects, which saves considerable time.

Microagents allow us to “plug in” domain knowledge to the Agent without modifying the Agent’s core code or prompt. The Memory component automatically loads the corresponding file content at the start of a task or when specific keywords are detected in a conversation, then provides it to the LLM as context. This enables the Agent to:

  • Quickly adapt to specific projects: By loading project-specific information in repo.md, the Agent can understand the project’s architecture, coding standards, and testing methods.
  • Leverage domain knowledge: For example, when a user mentions “Python”, a guide on Python best practices can be automatically injected from knowledge.md.


0x01 Requirements

1.1 Current Issues

Since a single agent can handle many complex tasks, why bother with multi-agent collaboration? The core reason is simple: once a task exceeds a certain level of complexity, a single agent easily becomes “overwhelmed”.

Just as a person trying to handle project planning, data research, mathematical calculations, and copywriting simultaneously is prone to neglecting some tasks and being overwhelmed by massive amounts of information, an intelligent agent faces the same challenge. With too many tools to choose from and too much context to remember, its “thinking space” is simply insufficient, leading to a significant decline in decision-making efficiency and accuracy. Furthermore, some tasks span multiple professional fields; expecting an intelligent agent to be proficient in everything is like expecting a doctor to be an engineer simultaneously—clearly unrealistic.

When a single, large, “all-purpose” agent handles multiple tasks, its context window and toolset become abnormally bloated, leading to “context pollution” and ultimately affecting the reliability and efficiency of the system.

LangChain’s benchmark study shows that when the number of distractor domains increases from 0 to 8, the performance of the single-agent architecture drops from 0.67 to 0.34, a decrease of roughly 50%.

1.2 Multi-agent systems

Multi-agent systems can precisely address these pain points. Their core logic is “division of labor and collaboration”: when faced with complex tasks, large tasks are first broken down into smaller modules. For example, “completing a market analysis report” is broken down into sub-tasks such as “data collection,” “statistical calculation,” “report writing,” and “compliance review.” Then, each sub-task is assigned a specialized “expert” agent—some specialize in data scraping, some excel in mathematical modeling, and some are adept at text polishing.

These agents don’t need to do everything; they only need to excel in their respective areas. They can also communicate with each other at any time: the data crawling agent, after obtaining the materials, will simultaneously share them with the statistical calculation agent; once the calculation results are obtained, they will be sent to the copywriting agent. If problems arise during the process, they can coordinate and adjust accordingly. This model not only allows each agent to leverage its expertise but may also generate unexpected “collaboration dividends”—like a highly efficient team, the collective accomplishment far exceeds the sum of the individual members’ abilities.

From a practical development perspective, multi-agent systems are also more practical: each agent is an independent module, so there is no need to consider the whole system during development, and testing and maintenance are simpler; if an area needs to be upgraded, the corresponding agent can be replaced directly without affecting the entire system; moreover, the rules for how agents communicate and transmit information can be set in advance, which is much more controllable than a single agent randomly calling tools.

1.3 Multi-Agent

The core difference between multi-agent systems and single-agent workflows lies in breaking away from the “sequential relay” task execution model and achieving efficient parallel collaboration among agents. Some researchers also believe that sub-agents are an architectural pattern within the multi-agent framework, rather than a general paradigm opposed to multi-agent systems.

Multi-Agent systems are essentially a group of relatively independent intelligent agents collaborating. They may have their own goals, states, and contexts, and coordinate through communication protocols. Like a real engineering team, with product managers, architects, engineers, and testers, each has a clear division of labor, responsible for their own part, and collaborates through meetings, documents, and tools.

Sub-agents, in essence, represent a division of labor within a centralized architecture. A main agent controls the overall picture and delegates tasks to several specialized sub-agents. These sub-agents are more like tools, stateless, and only handle their assigned sub-tasks. It’s similar to a project manager having several specialized executors. The project manager understands the overall goals and context, distributes tasks to different people as needed, and then integrates the results.

The key difference lies in control and context.

In the multi-agent model, control is distributed, each agent has a certain degree of autonomy, and the context is isolated. In the sub-agent model, control is centralized, the main agent has the final say, and the context is shared.

1.4 Connection Mechanism

The core idea of multi-agent systems is to let specialists do what they are good at. We create multiple agents with independent prompts and tools, and then connect them through some mechanism. There are two basic connection mechanisms: Handoffs (routing) and Orchestrator-Workers (command/distribution).

Handoffs refer to the transfer of execution context and control from one agent to another. Handoffs require two basic elements:

  • Destination: The next intelligent agent.
  • State: Information passed to the next agent.

Tool invocation is also a connection mechanism: one agent (e.g., a supervisor) invokes another agent as a tool. While handoffs are better suited to autonomous collaboration scenarios, tool invocation provides clearer hierarchical control and interface constraints.
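
The two elements of a handoff can be made concrete with a small sketch. Everything here is illustrative: the `Handoff` class, the `researcher` and `writer` agents, and the `run` loop are hypothetical names chosen for this example, not part of OpenHands or any particular framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Handoff:
    """A handoff names the next agent and carries the state it needs."""
    destination: str                            # the next agent
    state: dict = field(default_factory=dict)   # information passed along


# Hypothetical agents: each receives the state and returns either another
# Handoff (pass control on) or None (the work is finished).
def researcher(state: dict) -> Optional[Handoff]:
    state["notes"] = f"findings about {state['topic']}"
    return Handoff(destination="writer", state=state)


def writer(state: dict) -> Optional[Handoff]:
    state["report"] = f"Report based on {state['notes']}"
    return None


AGENTS: dict = {"researcher": researcher, "writer": writer}


def run(first: Handoff, max_hops: int = 10) -> dict:
    """Follow handoffs until an agent returns None or the hop limit is hit."""
    current: Optional[Handoff] = first
    state = first.state
    for _ in range(max_hops):
        if current is None:
            break
        state = current.state
        agent: Callable[[dict], Optional[Handoff]] = AGENTS[current.destination]
        current = agent(state)
    return state


final_state = run(Handoff("researcher", {"topic": "microagents"}))
print(final_state["report"])
```

Control moves with the handoff itself, so no central agent decides what happens next. The hop limit is a simple guard against two agents handing work back and forth indefinitely.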

1.5 Sub-agent

The core design of the Sub-agent architecture is to introduce an “Orchestrator Agent” as the global management core. Its core responsibility is to first deeply understand the overall task objectives, and then, through a reasonable task decomposition strategy, break down the complex task into multiple independently executable sub-tasks, which are then delegated to multiple “sub-agents” working in parallel.

The implementation logic of this system is as follows:

  • From the perspective of the coordinating agent, the interaction pattern of calling a sub-agent is completely consistent with that of calling a regular tool. The coordinator issues explicit instructions to the sub-agent in the form of prompts through a tool calling mechanism.
  • After receiving instructions, the sub-agent autonomously completes the assigned tasks in an independent execution environment without interacting with other sub-agents, and finally only feeds back the results to the coordinator.
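
The two points above can be sketched as follows. This is a minimal illustration under stated assumptions: `run_sub_agent` is a hypothetical stand-in for a real sub-agent (which would run its own LLM loop), and `orchestrate` is a hypothetical coordinator that treats each sub-agent like a tool call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sub_agent(prompt: str) -> str:
    """Stand-in for a sub-agent: it sees only its prompt and returns only a result."""
    # A real sub-agent would run its own model loop with its own tools here.
    return f"result for: {prompt}"


def orchestrate(task: str, subtasks: list) -> str:
    """Fan out subtasks as tool-like calls, then merge the results."""
    prompts = [f"{task} / {sub}" for sub in subtasks]
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves input order, so results line up with subtasks.
        results = list(pool.map(run_sub_agent, prompts))
    return "\n".join(results)


summary = orchestrate(
    "market analysis report",
    ["data collection", "statistics", "writing"],
)
print(summary)
```

Each worker receives a single prompt and shares nothing with its siblings, which mirrors the isolation described above. Only the coordinator sees all of the results.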

The essence of the efficient implementation of the sub-agent architecture lies in the successful practice of context engineering. Its core idea is to precisely control the “timing and content of information supply”—creating a focused and isolated execution environment for each sub-agent to ensure that it receives the most suitable information and tool support when processing its corresponding sub-task. This design not only significantly improves the overall system’s task processing performance but also reduces the cost of achieving complex goals through responsibility decomposition and environment isolation, becoming a highly efficient architectural solution for handling large-scale, multi-dimensional complex tasks.

1.6 Microagents

A Google Cloud article describes the concept of “Agent as Tool.”

Agent as Tool acts like an expert advisor. When the main agent invokes it, it provides explicit input and receives explicit output, much like calling an API. This expert advisor has its own logic, but the main agent doesn’t need to know the details.

Sub-Agent (delegated sub-agent) is more like a project manager’s clone. It works in the global context of the main agent, handles complex multi-step processes, and can access the main agent’s dialogue history and state.

Based on Google Cloud’s core definitions of the two agent models and the functional characteristics of OpenHands’ microagents, microagents are essentially Agent as Tool sub-agents. The following analysis will examine the core differences between the two and the specific performance of microagents:

  1. From the perspective of core definitions and control logic

    • Google Cloud clarifies that Agent as Tool is a pre-packaged expert for specific tasks. When the main agent calls it, it only needs to pass clear input and get direct output, similar to a transactional API, without needing to care about its internal logic. Sub-Agent, on the other hand, is a delegated role that needs to handle complex, multi-step processes autonomously. It has a hierarchical collaborative relationship with the main agent and has a certain degree of autonomy in decision-making and process management.
    • OpenHands’ microagents, whether knowledge agents, task agents, or repository agents, all execute fixed functions in response to specific triggering conditions. For example, knowledge agents are triggered by keywords such as “docker” and “container,” providing standardized support for corresponding domains; task agents receive parameters according to preset interactive templates and perform operations such as generating PR descriptions. They do not autonomously plan complex task flows; they are entirely triggered and called by the system or main process, which aligns with the “passive response, execution of specific functions” control logic of Agent as Tool.
  2. From the perspective of context and state characteristics

    • Agent as Tool features context isolation and statelessness, running in its own independent session. It cannot access the caller’s dialogue history and state, and information for each interaction is transmitted via a single request. Sub-Agent, on the other hand, can share the main agent’s context and exist within the same session, making it suitable for stateful processes that require multiple steps.
    • Microagents are independently packaged modules, with different agents isolated from each other. For example, the repository agent only loads the specific specifications of its own project, and the knowledge agent focuses on the output of knowledge in a single domain. Their operation does not depend on the historical state of other agents. Each trigger executes a task and returns a result based on the current input information. There is no situation where they share context with the main process or other agents to advance multi-step tasks, which conforms to the stateless and context-isolated characteristics of Agent as Tool.
  3. From the perspective of reusability and coupling

    • One major advantage of Agent as Tool is its high reusability; it can be repeatedly invoked in different agents or systems with low coupling to the caller. In contrast, Sub-Agent is tightly coupled with the main agent, is part of a specific process, has weaker reusability, and is mostly adapted to its respective hierarchical collaboration system.
    • Microagents are designed with high reusability in mind. Agents in the public microagent library can be used across different projects, while agents in the private repository, though exclusive to a team, are used repeatedly in fixed scenarios within a project. Furthermore, they can be easily integrated into the system without deep hierarchical binding with the main workflow. This high reusability and low coupling characteristic contrasts with the strong coupling of Sub-Agents, instead aligning with the core features of Agent as Tool.

1.7 Principle

1.7.1 Collaboration

Multi-agent systems do more than just break down tasks; they introduce a completely new dimension of optimization: collaboration.

Researchers have asked a profound question: Why do two agents often work better together than a single super agent? The answer lies in a new probability term—the probability of cooperation: P(C_L | a_L).

In a multi-agent system, when Agent A (such as a product manager) performs action a_L, it does not just produce a result; it also creates a context C_L and passes that context to Agent B (such as a programmer).

This may sound abstract, but understand it this way: Collaboration and negotiation are essentially about searching for the optimal communication context.

  • Single agent: works alone and must choose its action a under a given state S.
  • Multi-agent: Agent A’s task becomes “finding the best way to say it (C_L)” so as to maximize the probability of Agent B’s success.

Researchers point out that this ability to “dynamically adjust the context through dialogue” actually involves dynamically fine-tuning the system’s parameters at runtime without retraining the model. This is the powerful mathematical foundation of multi-agent systems, adding a huge, optimizable parameter space.

1.7.2 Tokens

In a blog post, Anthropic points out that the effectiveness of multi-agent systems stems primarily from their spending enough tokens to solve the problem. In its analysis, three factors explained 95% of the performance variance in the BrowseComp evaluation (which tests a browsing agent’s ability to locate hard-to-find information). Token usage alone explained 80% of the variance, with the number of tool calls and the choice of model being the other two factors.

1.7.3 Cost

While multi-agent collaboration sounds appealing, researchers add a sobering caveat: collaboration comes at a cost.

Each new Agent added by a user, and each interaction, will bring about:

  • Latency: The time taken for network requests and generation.
  • Computing power consumption (Tokens): The cost in real money.
  • Complexity: The more complex a system is, the more prone it is to errors.

0x02 Basic Overview

2.1 Base Class Definition

Microagents are a modular knowledge injection mechanism in OpenHands. They are usually Markdown files containing knowledge, guidelines, or code snippets for a specific domain, repository, or task.

From a system architecture perspective, microagents are essentially lightweight “specialized executors”—they are not responsible for the overall planning and coordination of tasks, but focus on a specific type of work, such as handling single responsibilities like code formatting and data validation. Unlike the “commander-in-chief” role of the main agent, they are more like “specialized teams” on standby, not consuming too many system resources under normal circumstances. When needed by the main agent, they are either directly summoned or take over the subdivided tasks assigned by the main agent, making them flexible and efficient.

These specialized executors are not isolated “solo soldiers.” The system provides a unified integration path, centered on the get_microagents_from_selected_repo method. Usage is simple: a user or team creates a dedicated folder in the code repository for storing microagents, whether self-developed or adapted from elsewhere, and manages them all there. When the system sets this repository as the current working repository, it scans the folder and loads every microagent in it, building a reserve of specialized knowledge for the system. Later, when the main agent reaches a stage that calls for, say, log analysis or interface debugging, the corresponding microagents can be drawn from this reserve.

class BaseMicroagent(BaseModel):
    """Base class for all microagents."""

    name: str
    content: str
    metadata: MicroagentMetadata
    source: str  # path to the file
    type: MicroagentType

    PATH_TO_THIRD_PARTY_MICROAGENT_NAME: ClassVar[dict[str, str]] = {
        '.cursorrules': 'cursorrules',
        'agents.md': 'agents',
        'agent.md': 'agents',
    }
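
One plausible use of the PATH_TO_THIRD_PARTY_MICROAGENT_NAME mapping is to recognize instruction files written for other tools and give them a stable microagent name. The sketch below assumes the lookup is done on the lowercased file name; the `third_party_name` helper is hypothetical and is not taken from the OpenHands source.

```python
from pathlib import Path
from typing import Optional

# Copied from the class attribute shown above.
PATH_TO_THIRD_PARTY_MICROAGENT_NAME = {
    '.cursorrules': 'cursorrules',
    'agents.md': 'agents',
    'agent.md': 'agents',
}


def third_party_name(path: Path) -> Optional[str]:
    """Return the microagent name for a known third-party file, else None.

    Assumption: matching is case-insensitive on the file name only.
    """
    return PATH_TO_THIRD_PARTY_MICROAGENT_NAME.get(path.name.lower())


print(third_party_name(Path("my-repo/AGENTS.md")))
print(third_party_name(Path("my-repo/.cursorrules")))
print(third_party_name(Path("my-repo/README.md")))
```

Under this assumption, both agents.md and agent.md resolve to the same name, so a repository can use either spelling.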

2.2 Types of Micro-Agents

Most micro-agents use Markdown files with a YAML header. For repository agents (repo.md), the header is optional - if not provided, the file will be loaded as a repository agent using the default settings.

KnowledgeMicroagent and RepoMicroagent are both subclasses of BaseMicroagent, but they have different uses and activation mechanisms. These two micro-agent types together constitute the flexible and powerful knowledge management mechanism in the OpenHands system, allowing for both on-demand access to specialized knowledge and continuously available repository-specific knowledge.

2.2.1 KnowledgeMicroagent

Knowledge agents provide specialized skills triggered by keywords in the conversation. They help with:

  • Language best practices
  • Framework guide
  • Common patterns
  • Tool usage

Basic features:

  • Type: MicroagentType.KNOWLEDGE or MicroagentType.TASK
  • Activation method: Keyword trigger; activated when a message contains a specific trigger word.

Main functions:

  • Provide professional knowledge and specific skills guidance.
  • For language best practices, framework guidelines, common patterns and tool usage.
  • Match trigger words in messages using the match_trigger method.

Activation mechanism:

  • An array of triggers needs to be defined in the frontmatter.
  • It will only be activated when the user inputs text that contains the trigger word.

Applicable scenarios:

  • User guide for specific technology stacks.
  • Best practices for frameworks or tools.
  • Specialized knowledge in a specific field.
  • Task-based micro-agents that require user input.

class KnowledgeMicroagent(BaseMicroagent):
    """Knowledge micro-agents provide specialized expertise that's triggered by keywords in conversations.

    They help with:
    - Language best practices
    - Framework guidelines
    - Common patterns
    - Tool usage
    """

    def __init__(self, **data):
        super().__init__(**data)
        if self.type not in [MicroagentType.KNOWLEDGE, MicroagentType.TASK]:
            raise ValueError('KnowledgeMicroagent must have type KNOWLEDGE or TASK')

    def match_trigger(self, message: str) -> str | None:
        """Match a trigger in the message.

        It returns the first trigger that matches the message.
        """
        message = message.lower()
        for trigger in self.triggers:
            if trigger.lower() in message:
                return trigger

        return None

    @property
    def triggers(self) -> list[str]:
        return self.metadata.triggers

A knowledge-based agent example can be seen in OpenHands’ GitHub micro-agent.
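
To see how match_trigger behaves across several knowledge agents, here is a self-contained sketch. `KnowledgeNote` is a simplified, hypothetical stand-in for KnowledgeMicroagent (no Pydantic, no metadata object), and `collect_context` is a hypothetical helper; only the matching logic mirrors the method shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnowledgeNote:
    """Simplified stand-in for KnowledgeMicroagent (not the real class)."""
    name: str
    triggers: list
    content: str

    def match_trigger(self, message: str) -> Optional[str]:
        # Same logic as above: case-insensitive substring match,
        # returning the first trigger that appears in the message.
        lowered = message.lower()
        for trigger in self.triggers:
            if trigger.lower() in lowered:
                return trigger
        return None


def collect_context(message: str, notes: list) -> list:
    """Return the content of every note whose trigger appears in the message."""
    return [n.content for n in notes if n.match_trigger(message) is not None]


notes = [
    KnowledgeNote("docker", ["docker", "container"], "Docker guidance..."),
    KnowledgeNote("git", ["git", "github"], "Git guidance..."),
]
print(collect_context("How do I rebuild the Container image?", notes))
print(collect_context("Count the digits in this string", notes))
```

Because the match is a plain substring test, the second message also activates the git note: "digits" contains "git". Short trigger words are therefore worth choosing carefully.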

2.2.2 RepoMicroagent

Repository agents provide repository-specific knowledge and guidelines. These include:

  • Loading from .openhands/microagents/repo.md
  • Specific to individual repositories
  • Automatically activated for that repository
  • Ideal for team practices and project routines

Basic features:

  • Type is MicroagentType.REPO_KNOWLEDGE
  • Activation method: Always active, associated with a specific repository.

Main functions:

  • Provide repository-specific knowledge and guidelines.
  • Includes private, repository-specific instructions.
  • Automatically load and associate with the current repository.

Activation mechanism:

  • Automatic activation, no trigger word required.
  • Always available when processing related repository tasks.
  • Usually from .openhands/microagents/repo.md files.

Applicable scenarios:

  • Repository-specific development specifications.
  • Team practices and agreements.
  • Project-specific workflow.
  • Custom documentation references.
  • General repository guide.

class RepoMicroagent(BaseMicroagent):
    """Microagent specialized for repository-specific knowledge and guidelines.

    RepoMicroagents are loaded from `.openhands/microagents/repo.md` files within repositories
    and contain private, repository-specific instructions that are automatically loaded when
    working with that repository. They are ideal for:
        - Repository-specific guidelines
        - Team practices and conventions
        - Project-specific workflows
        - Custom documentation references
    """

    def __init__(self, **data):
        super().__init__(**data)
        if self.type != MicroagentType.REPO_KNOWLEDGE:
            raise ValueError(
                f'RepoMicroagent initialized with incorrect type: {self.type}'
            )

An example of a repository agent can be seen in the OpenHands repository itself.

2.2.3 Comparison

The following is a comprehensive comparison of the two types of micro-agents, covering core differences such as activation, functionality, and applicable scenarios, to facilitate quick differentiation and selection:

| Comparison Dimension | KnowledgeMicroagent | RepoMicroagent (Repository-type Microagent) |
|---|---|---|
| Core type | General knowledge/skill carrier | Repository-specific knowledge carrier |
| Activation method | Keyword trigger (must match preset trigger words) | Automatic activation (always effective after being associated with the repository) |
| Scope of application | Universal across repositories (can be used in all scenarios) | Bound to a specific repository (only available in the current repository) |
| Triggering condition | Users need to input content containing the trigger word | No additional steps required; takes effect as soon as the repository is linked |
| Frequency of use | On demand (user-initiated) | Continuously available (automatically invoked when processing repository tasks) |
| Configuration requirement | Must define a triggers list | No triggers configuration required |
| Content source | General technical documents / public knowledge base | .openhands/microagents/repo.md documents in the repository |
| Typical content | Technology stack guide, framework usage methods, domain expertise | Repository development guidelines, team collaboration agreements, project workflows |
| Applicable scenario | General technical Q&A and tool/framework guidance | Repository constraints, team practice alignment, project process guidance |

2.2.4 TaskMicroagent

TaskMicroagent is a subclass of KnowledgeMicroagent and has special task-oriented characteristics, requiring user input to execute.

Core features:

  • User input required
    • Variable extraction: extract_variables extracts variables from the content (format: ${variable_name}).
    • Input detection: requires_user_input checks whether the content contains variables.
    • Input definition: the inputs property returns the predefined input metadata.
  • Special triggering method
    • Command-style triggering: triggered by the /agent_name format, such as /test_task.
    • Automatic trigger: if no trigger is defined in the frontmatter, the system automatically adds a trigger consisting of / followed by the agent name.

Applicable scenarios:

  • Tasks requiring parameters from the user.
  • Interactive operations requiring user input.
  • Templated tasks that execute by filling variables.
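
The variable handling described above can be sketched with a regular expression. The function names `extract_variables` and `requires_user_input` come from the text, but the bodies and the exact pattern are assumptions; `fill` is a hypothetical helper added only to show what a filled template looks like.

```python
import re

# Assumed pattern: ${name}, where name is a valid identifier.
VARIABLE_PATTERN = re.compile(r"\$\{([a-zA-Z_][a-zA-Z0-9_]*)\}")


def extract_variables(content: str) -> list:
    """Return variable names in order of appearance."""
    return VARIABLE_PATTERN.findall(content)


def requires_user_input(content: str) -> bool:
    """True when the template still contains at least one variable."""
    return bool(extract_variables(content))


def fill(content: str, values: dict) -> str:
    """Hypothetical helper: substitute known variables, leave unknown ones as-is."""
    return VARIABLE_PATTERN.sub(
        lambda m: values.get(m.group(1), m.group(0)), content
    )


template = "Write a PR description for ${branch} targeting ${base}."
print(extract_variables(template))
print(requires_user_input(template))
print(fill(template, {"branch": "feature/x", "base": "main"}))
```

Leaving unknown variables untouched makes missing input visible: requires_user_input still returns True for a partially filled template.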

2.3 Sources of Micro-Agents

OpenHands loads micro-agents from two sources.

2.3.1 Shared Micro-Agent (Public)

This directory (OpenHands/microagents) contains shareable micro-agents that are available to all OpenHands users:

  • Maintained by the OpenHands repository.
  • Ideal for reusing knowledge and common workflows.

Directory structure:

OpenHands/microagents/
├── # Keyword-triggered expertise
│   ├── git.md         # Git operations
│   ├── testing.md     # Testing practices
│   └── docker.md      # Docker guide
└── # These microagents are always loaded
    ├── pr_review.md   # PR review process
    ├── bug_fix.md     # Bug-fix workflow
    └── feature.md     # Feature implementation

2.3.2 Repository Instructions (Private)

Each repository can have its own instructions in .openhands/microagents/repo.md. These instructions are:

  • Privately owned by this repository.
  • Automatically loaded when using this repository.
  • Ideal for repository-specific guidelines and team practices.

Example repository structure:

your-repository/
└── .openhands/
    └── microagents/
        ├── repo.md    # Repository-specific instructions
        └── ...        # Private microagents available only in this repository

When OpenHands works with a repository, it will:

  • If it exists, load repository-specific instructions from .openhands/microagents/repo.md.
  • Load relevant knowledge agents based on keywords in the dialogue.
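
The first of these steps can be sketched as a directory scan. This is a simplified illustration, not the OpenHands loader: `load_repo_microagents` is a hypothetical function, frontmatter parsing is omitted, and only the `.openhands/microagents/repo.md` path comes from the text above.

```python
import tempfile
from pathlib import Path
from typing import Optional


def load_repo_microagents(repo_root: Path) -> tuple:
    """Return (repo instructions or None, {name: content} for other .md files)."""
    directory = repo_root / ".openhands" / "microagents"
    repo_instructions: Optional[str] = None
    others: dict = {}
    if not directory.is_dir():
        return repo_instructions, others
    for path in sorted(directory.glob("*.md")):
        text = path.read_text(encoding="utf-8")
        if path.name == "repo.md":
            repo_instructions = text   # always-active repository instructions
        else:
            others[path.stem] = text   # candidates for keyword-triggered loading
    return repo_instructions, others


# Demo against a throwaway repository layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    agents_dir = root / ".openhands" / "microagents"
    agents_dir.mkdir(parents=True)
    (agents_dir / "repo.md").write_text("Run tests with pytest.", encoding="utf-8")
    (agents_dir / "deploy.md").write_text("Deployment checklist.", encoding="utf-8")
    repo_instructions, private_agents = load_repo_microagents(root)

print(repo_instructions)
print(sorted(private_agents))
```

A missing directory is treated as "no microagents" rather than as an error, which matches the "if it exists" wording above.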

2.4 Upgraded to Skill

Note: In the latest OpenHands code, microagents have been renamed and upgraded to Skills. Skills will be analyzed separately in another series.

https://github.com/OpenHands/extensions

https://docs.openhands.dev/overview/skills

https://docs.openhands.dev/sdk/arch/skill

KnowledgeMicroagent

KnowledgeMicroagent is the legacy name for what is now called a Knowledge Skill (keyword-triggered skill).

Knowledge Skills are keyword-triggered skills that activate when specific keywords are detected in user messages. They use a KeywordTrigger with regex patterns to match against user input, and when matched, inject domain-specific knowledge into the agent’s context.

RepoMicroagent

RepoMicroagent is the legacy term for what is now called a Repository Skill (or “General Skill” / “Permanent Context”). These are always-active, repository-specific guidelines that are automatically loaded into the agent’s context at conversation start.

The recommended approach is to create an AGENTS.md file at your repository root. This file contains project purpose, setup instructions, repo structure, and development guidelines. It has no trigger — it’s always injected into the system prompt.

You can also use model-specific variants like GEMINI.md or CLAUDE.md. Legacy paths (.openhands/microagents/) are still supported but deprecated in favor of .agents/skills/.

0x03 Implementation

3.1 Processing Flow

CodeActAgent.response_to_actions converts the LLM’s delegation tool call into an AgentDelegateAction.

The AgentController receives the AgentDelegateAction request and calls start_delegate to create a new agent controller to handle the delegated task.

3.2 Triggering Conditions

AgentDelegateAction will be generated under the following conditions:

  • The LLM decides to delegate the task to another specialized agent, for example by calling a tool named delegate_to_browsing_agent.

This tool call carries the following parameters:

  • agent: The name of the agent to delegate to.
  • task: A detailed description of the task being delegated.
  • Optional inputs: Additional input parameters passed to the delegate agent.

            # ================================================
            # AgentDelegateAction (Delegation to another agent)
            # ================================================
            elif tool_call.function.name == 'delegate_to_browsing_agent':
                action = AgentDelegateAction(
                    agent='BrowsingAgent',
                    inputs=arguments,
                )

3.3 AgentDelegateAction

An AgentDelegateAction is produced when the LLM decides to delegate a task by calling the corresponding tool. The response_to_actions method converts that tool call into the action and adds it to the queue of pending actions, and the step method then returns it for execution.

@dataclass
class AgentDelegateAction(Action):
    agent: str
    inputs: dict
    thought: str = ''
    action: str = ActionType.DELEGATE

    @property
    def message(self) -> str:
        return f"I'm asking {self.agent} for help with this task."
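
The path from a raw tool call to a delegate action can be sketched in isolation. `DelegateAction` below is a simplified mirror of AgentDelegateAction without the Action base class, and `to_delegate_action` is a hypothetical helper; the tool name and the hard-coded 'BrowsingAgent' target follow the branch shown in 3.2. Treating the arguments as a JSON string is an assumption about the tool-call format.

```python
import json
from dataclasses import dataclass


@dataclass
class DelegateAction:
    """Simplified mirror of AgentDelegateAction (no Action base class)."""
    agent: str
    inputs: dict
    thought: str = ''

    @property
    def message(self) -> str:
        return f"I'm asking {self.agent} for help with this task."


def to_delegate_action(tool_name: str, raw_arguments: str) -> DelegateAction:
    """Hypothetical helper: map one delegation tool call to an action."""
    if tool_name != 'delegate_to_browsing_agent':
        raise ValueError(f'not a delegation tool: {tool_name}')
    arguments = json.loads(raw_arguments)  # assumed: arguments arrive as JSON text
    return DelegateAction(agent='BrowsingAgent', inputs=arguments)


action = to_delegate_action(
    'delegate_to_browsing_agent',
    '{"task": "Find the latest release notes"}',
)
print(action.message)
print(action.inputs)
```

The target agent is fixed by the tool name here, so the model only supplies the task description.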

3.4 Features

3.4.1 Delegation Mechanism

AgentController achieves full lifecycle control of the microAgent (sub-agent) through a “delegation mechanism,” and the core process is as follows:

  1. Trigger startup: The main agent generates an AgentDelegateAction (including the sub-agent name and task parameters), and the main controller’s start_delegate() method initializes the sub-agent controller (an AgentController instance with is_delegate=True).
  2. Resource and state isolation: The sub-agent controller inherits resources such as the event stream and file storage from the main controller, but has its own State object, including an independent iteration count, budget limit, and task context, to avoid interfering with the main agent.
  3. Event forwarding and independent execution: During operation of the sub-agent, the main controller forwards all events (such as user messages and tool feedback) to the sub-controller for processing, and the sub-agent executes tasks independently (without intervention from the main agent).
  4. Status monitoring and termination: The main controller checks the status of sub-agents in real time. When a sub-agent reaches FINISHED, REJECTED, or encounters an error (ERROR), the main controller terminates it through end_delegate(), reclaims resources, and receives its execution result.
  5. Result integration: After the sub-agent terminates, the main controller encapsulates its output as an AgentDelegateObservation event and sends it back to the main agent. The main agent then continues to execute subsequent tasks based on this result.
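
The five steps above can be modeled with a toy controller. This is a deliberately simplified sketch, not the real AgentController: `ToyController` and `ToyState` are hypothetical, events are plain strings, and the event "done" is an invented signal that marks a task as finished. Only the method names start_delegate and end_delegate and the is_delegate flag come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyState:
    iteration: int = 0
    status: str = "RUNNING"
    inputs: dict = field(default_factory=dict)


class ToyController:
    """Toy model of the delegate lifecycle (not the real AgentController)."""

    def __init__(self, name: str, event_log: list, is_delegate: bool = False):
        self.name = name
        self.event_log = event_log   # shared with the parent (inherited resource)
        self.state = ToyState()      # private to this controller (isolated state)
        self.is_delegate = is_delegate
        self.delegate: Optional["ToyController"] = None

    def start_delegate(self, agent: str, inputs: dict) -> None:
        self.delegate = ToyController(agent, self.event_log, is_delegate=True)
        self.delegate.state.inputs = dict(inputs)
        self.event_log.append(f"{self.name}: delegated to {agent}")

    def on_event(self, event: str) -> None:
        if self.delegate is not None:
            self.delegate.on_event(event)   # forward while a delegate is active
            if self.delegate.state.status in ("FINISHED", "REJECTED", "ERROR"):
                self.end_delegate()
            return
        self.state.iteration += 1
        self.event_log.append(f"{self.name}: handled {event!r}")
        if event == "done":                 # invented completion signal
            self.state.status = "FINISHED"

    def end_delegate(self) -> None:
        finished = self.delegate
        self.delegate = None
        # Stands in for emitting an AgentDelegateObservation to the main agent.
        self.event_log.append(
            f"{self.name}: observation from {finished.name} -> {finished.state.status}"
        )


log: list = []
parent = ToyController("CodeActAgent", log)
parent.start_delegate("BrowsingAgent", {"task": "look up docs"})
parent.on_event("page loaded")
parent.on_event("done")
parent.on_event("next user message")
print("\n".join(log))
```

The parent's iteration counter only advances once the delegate has ended, which shows the isolation in step 2: the delegate's two iterations are counted in its own state, not the parent's.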

3.4.2 Functional Overview

AgentController is the core control component of the agent in the OpenHands framework. It is responsible for managing the agent’s lifecycle (startup, running, termination), event handling (actions/observations), state maintenance, resource scheduling, and delegation and collaboration of sub-agents. It is the core hub for realizing “master-sub-agent collaboration” and “task splitting and execution” in multi-agent systems.

  1. Master-Sub Agent Interaction Mechanism:

    • The main agent uses AgentDelegateAction as the “call interface” and delegates subtasks to the micro agent, analogous to the concise model of tool calling.
    • Each sub-agent has an independent controller instance. During operation, all events are forwarded and all states are isolated. After execution, results are returned through AgentDelegateObservation to achieve a closed loop of “delegation-execution-callback”.
  2. Event handling mechanism:

    • Event triage: Based on the presence of active sub-agents, determine whether an event is forwarded to a sub-controller or handled by the main controller itself.
    • Type adaptation: Distinguish between Action and Observation events and call the corresponding processing method for each, supporting multiple event types such as AgentDelegateAction, user messages, and tool feedback.
    • Step triggering: The should_step() method determines whether to trigger the next step of agent execution (for example, it triggers automatically when a user message or a sub-agent result callback is received).
  3. Full lifecycle state management:

    • It supports multiple state transitions such as RUNNING, AWAITING_USER_INPUT, AWAITING_USER_CONFIRMATION, and FINISHED, and automatically synchronizes each state change to the event stream and persists it.
    • The status of sub-agents is monitored in real time, and they are automatically terminated and resources are reclaimed when abnormalities occur, ensuring system stability.
  4. Robust design:

    • Anti-freezing mechanism: A built-in StuckDetector detects agent loops and freezes and triggers exception handling.
    • Budget and iteration limits: Control the maximum number of iterations and the task budget through iteration_flag and budget_flag to avoid resource exhaustion.
    • Fault tolerance: Provides degradation strategies (such as history truncation and retry) for scenarios such as LLM errors (e.g., context window overflow, API timeout) and sub-agent execution errors.
  5. Multi-agent collaborative support:

    • Sub-agents inherit the resources of the main agent (event stream, file storage, security analyzer), but their states are independent, and multi-level delegation is supported (sub-agents can further delegate to other micro-agents).
    • The main controller centrally aggregates the execution metrics (cost, token consumption) of all agents for global monitoring.
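
To make the budget and iteration limits concrete, here is a minimal sketch of a limit tracker. LimitFlag and run_until_limit are hypothetical names invented for illustration; the real iteration_flag and budget_flag objects may expose a different interface. The only detail taken from the code in this article is that a flag carries a current_value.

```python
from dataclasses import dataclass


@dataclass
class LimitFlag:
    """Hypothetical limit tracker; the real control flags may differ."""
    current_value: float
    max_value: float

    def reached_limit(self) -> bool:
        return self.current_value >= self.max_value

    def step(self, amount: float = 1) -> None:
        # Refuse to advance once the limit has been reached.
        if self.reached_limit():
            raise RuntimeError(
                f"limit reached: {self.current_value} >= {self.max_value}"
            )
        self.current_value += amount


def run_until_limit(
    iteration_flag: LimitFlag, budget_flag: LimitFlag, cost_per_step: float
) -> int:
    """Run toy steps until either limit stops the loop; return steps completed."""
    steps = 0
    while True:
        try:
            iteration_flag.step()
            budget_flag.step(cost_per_step)
        except RuntimeError:
            return steps
        steps += 1


steps_done = run_until_limit(
    LimitFlag(current_value=0, max_value=10),
    LimitFlag(current_value=0.0, max_value=1.0),
    cost_per_step=0.25,
)
```

Here the budget runs out after four steps, well before the iteration limit of ten, so the loop stops on the budget flag.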

3.4.3 Hierarchical Cooperation Model

The collaboration between Agent and Microagent follows a hierarchical model.

Hierarchical structure:

  • The agent is the primary decision-maker and is responsible for overall execution.
  • Microagents are specialized assistants designed to handle specific subtasks.
  • The agent can invoke a microagent, but a microagent cannot directly invoke the agent.

Specific implementation:

  • Microagents are loaded into memory and exposed as tools for the agent.
  • In openhands/server/session/agent_session.py, microagents are loaded via get_microagents_from_selected_repo and added through memory.load_user_workspace_microagents.
  • The agent invokes these microagents as needed.

Two types of microagents:

  • Repo agents: repository-specific task handlers.
  • Knowledge agents: domain-specific knowledge providers.

Workflow:

  • The agent decides whether to invoke a specific microagent.
  • The microagent completes the delegated task and returns its output.
  • The agent continues subsequent operations based on that output.

Loading mechanism:

  • Microagents can be loaded from .openhands/microagents in the selected repository.
  • Or from organization/user-level repositories (e.g., github.com/acme-co/.openhands/microagents).

In short, this is a hierarchical cooperation model in which the agent acts as the main controller and the microagent as a specialized assistant.
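
As a rough illustration of the loading mechanism and the two microagent types, the sketch below scans a .openhands/microagents folder and classifies each markdown file. Everything here is an assumption made for the example: the helper names, the simplified "key: value" frontmatter parsing (real frontmatter is YAML), and the rule that a file with triggers counts as a knowledge agent while a file without them counts as a repo agent.

```python
import tempfile
from pathlib import Path


def parse_microagent(text: str) -> dict:
    """Split a markdown file into simple 'key: value' frontmatter and a body.

    Simplified on purpose: handles only single-line pairs and a
    comma-separated 'triggers' value.
    """
    meta: dict = {}
    body = text
    if text.startswith("---\n"):
        header, _, body = text[4:].partition("\n---\n")
        for line in header.splitlines():
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    triggers = [t.strip() for t in meta.get("triggers", "").split(",") if t.strip()]
    # Assumption: files with triggers act as knowledge agents, others as repo agents.
    kind = "knowledge" if triggers else "repo"
    return {"kind": kind, "triggers": triggers, "content": body.strip()}


def load_microagents(repo_root: Path) -> dict:
    """Load every *.md file under <repo_root>/.openhands/microagents."""
    agents = {}
    for path in sorted((repo_root / ".openhands" / "microagents").glob("*.md")):
        agents[path.stem] = parse_microagent(path.read_text(encoding="utf-8"))
    return agents


def triggered(agents: dict, message: str) -> list:
    """Return names of agents with a trigger keyword present in the message."""
    lowered = message.lower()
    return [
        name for name, agent in agents.items()
        if any(t.lower() in lowered for t in agent["triggers"])
    ]


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / ".openhands" / "microagents"
    folder.mkdir(parents=True)
    (folder / "repo.md").write_text("Run tests with pytest.\n", encoding="utf-8")
    (folder / "git.md").write_text(
        "---\ntriggers: git, rebase\n---\nPrefer small commits.\n", encoding="utf-8"
    )
    agents = load_microagents(Path(tmp))
    matches = triggered(agents, "Please rebase my branch")
```

The repo-level file is always available as project context, while the keyword-triggered file is only selected when a message mentions one of its triggers.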

3.4.4 Flowchart

Master-Sub Agent Interaction and Control Flowchart

Figure 14-1

AgentController Core Event Handling Flowchart

Figure 14-2

3.4.5 Fault Tolerance and Isolation Mechanisms

Independent Controller Instance:

When the parent agent creates a delegate, it creates a separate AgentController instance for the child agent, ensuring independent state management and event flow.

        # Create the delegate with is_delegate=True so it does NOT subscribe directly
        self.delegate = AgentController(
            sid=self.id + '-delegate',
            file_store=self.file_store,
            user_id=self.user_id,
            agent=delegate_agent,
            event_stream=self.event_stream,
            conversation_stats=self.conversation_stats,
            iteration_delta=self._initial_max_iterations,
            budget_per_task_delta=self._initial_max_budget_per_task,
            agent_to_llm_config=self.agent_to_llm_config,
            agent_configs=self.agent_configs,
            initial_state=state,
            is_delegate=True,
            headless_mode=self.headless_mode,
            security_analyzer=self.security_analyzer,
        )

Independent state management:

Each AgentController has its own State instance, so changes in the child agent's state do not affect the parent agent.

Event stream isolation:

Although the parent and child share the same EventStream, their events are distinguished by different IDs and event sources.

Exception handling:

The system handles exceptions through _step_with_exception_handling, and child-agent exceptions do not interrupt parent-agent operation.

Error state propagation:

When a child agent encounters an error, it sends error messages to the parent through AgentDelegateObservation.

State recovery mechanism:

The _react_to_exception function allows the system to recover from an error state.
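
The isolation idea can be sketched as follows. This is not the OpenHands implementation: ToyDelegate, run_parent, and the dictionary used as an observation are hypothetical. The sketch only shows the pattern of catching a child failure inside the child's step wrapper and reporting it to the parent as data instead of letting the exception propagate.

```python
import asyncio


class ToyDelegate:
    """Hypothetical child agent whose step may raise."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail
        self.state = "running"
        self.last_error = ""

    async def _step(self) -> None:
        if self.fail:
            raise ValueError("context window exceeded")
        self.state = "finished"

    async def step_with_exception_handling(self) -> None:
        # Catch the failure inside the child so it never propagates upward.
        try:
            await self._step()
        except Exception as exc:
            self.state = "error"
            self.last_error = f"{type(exc).__name__}: {exc}"


async def run_parent(delegate: ToyDelegate) -> dict:
    await delegate.step_with_exception_handling()
    # The parent only inspects the child's final state and builds an observation.
    if delegate.state == "error":
        return {"observation": "delegate_result", "error": delegate.last_error}
    return {"observation": "delegate_result", "error": None}


failed = asyncio.run(run_parent(ToyDelegate(fail=True)))
succeeded = asyncio.run(run_parent(ToyDelegate(fail=False)))
```

In both runs the parent coroutine completes normally; the failure shows up only as an error field in the returned observation.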

3.5 Code

The specific workflow is as follows.

3.5.1 Calling the Large Model

class CodeActAgent(Agent):
    def step(self, state: State) -> 'Action':
        """Performs one step using the CodeAct Agent."""
        initial_user_message = self._get_initial_user_message(state.history)
        # condensed_history is produced by a history-condensation step omitted from this excerpt
        messages = self._get_messages(condensed_history, initial_user_message)
        params: dict = {
            'messages': messages,
        }
        params['tools'] = check_tools(self.tools, self.llm.config)
        params['extra_body'] = {
            'metadata': state.to_llm_metadata(
                model_name=self.llm.config.model, agent_name=self.name
            )
        }
        response = self.llm.completion(**params)
        actions = self.response_to_actions(response)  # the LLM response is converted to actions here
        for action in actions:
            self.pending_actions.append(action)
        return self.pending_actions.popleft()

3.5.2 Analysis

If the model decides delegate_to_browsing_agent should be called, an AgentDelegateAction is generated.

def response_to_actions(
    response: ModelResponse, mcp_tool_names: list[str] | None = None
) -> list[Action]:
            # ================================================
            # AgentDelegateAction (Delegation to another agent)
            # ================================================
            elif tool_call.function.name == 'delegate_to_browsing_agent':
                action = AgentDelegateAction(
                    agent='BrowsingAgent',
                    inputs=arguments,
                )
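
The mapping from a tool call to a delegation action can be sketched in isolation. DelegateAction, DELEGATE_TOOLS, and tool_call_to_action are hypothetical names for this example; the sketch assumes, as is common for function-calling APIs, that tool arguments arrive as a JSON string.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DelegateAction:
    """Hypothetical stand-in for AgentDelegateAction."""
    agent: str
    inputs: dict = field(default_factory=dict)


# Assumed mapping from tool name to the agent that should receive the subtask.
DELEGATE_TOOLS = {"delegate_to_browsing_agent": "BrowsingAgent"}


def tool_call_to_action(name: str, raw_arguments: str) -> DelegateAction:
    """Convert one tool call (name plus JSON argument string) into an action."""
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid tool arguments: {raw_arguments!r}") from exc
    if name not in DELEGATE_TOOLS:
        raise ValueError(f"unknown delegate tool: {name}")
    return DelegateAction(agent=DELEGATE_TOOLS[name], inputs=arguments)


action = tool_call_to_action(
    "delegate_to_browsing_agent", '{"task": "Find the latest release notes"}'
)
```

The parsed arguments become the inputs of the action, which is what the controller later passes to the delegate as its subtask description.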

3.5.3 Execution

The AgentController handles the AgentDelegateAction and executes the microagent.

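    async def start_delegate(self, action: AgentDelegateAction) -> None:
        """Start a delegate agent to handle a subtask.

        OpenHands is a multi-agent system:
        - A "task" is the complete conversation between the system and the user. It starts with
          the user's initial input (usually a task description) and ends with a finish action
          from the agent, a stop from the user, or an error.
        - A "subtask" is a conversation between an agent and the user or another agent.
          If a single agent can complete the task, the task and the subtask are the same;
          otherwise the task consists of multiple subtasks, each handled by its own agent.

        Args:
            action (AgentDelegateAction): The action describing the delegate agent to start.
        """
        # Look up the agent class by the agent name given in the action
        agent_cls: Type[Agent] = Agent.get_cls(action.agent)
        # Get the agent config: prefer the entry for this agent name, else reuse the current agent's config
        agent_config = self.agent_configs.get(action.agent, self.agent.config)
        # Create the delegate agent instance (parent and child share the same LLM registry)
        # Note: parent and child share metrics, so metrics accumulate globally
        delegate_agent = agent_cls(
            config=agent_config, llm_registry=self.agent.llm_registry
        )

        # Before starting the delegate, create its initial state (inheriting key settings from the parent)
        state = State(
            session_id=self.id.removesuffix('-delegate'),  # session ID: strip the '-delegate' suffix from this controller's ID
            user_id=self.user_id,  # inherit the user ID to keep the user association
            inputs=action.inputs or {},  # subtask inputs (defaults to an empty dict)
            iteration_flag=self.state.iteration_flag,  # inherit the iteration control flag (limits the number of iterations)
            budget_flag=self.state.budget_flag,  # inherit the budget control flag (limits resource usage)
            delegate_level=self.state.delegate_level + 1,  # delegation depth + 1 (marks the delegate's level)
            metrics=self.state.metrics,  # share global metrics (parent and child accumulate into one object)
            start_id=self.event_stream.get_latest_event_id() + 1,  # starting event ID: record from just after the latest event
            parent_metrics_snapshot=self.state_tracker.get_metrics_snapshot(),  # snapshot of the parent's metrics (for later comparison)
            parent_iteration=self.state.iteration_flag.current_value,  # the parent's current iteration count
        )
        # Debug log: record that the delegate is being started
        self.log(
            'debug',
            f'start delegate, creating agent {delegate_agent.name}',
        )

        # Create the delegate's controller (key point: is_delegate=True so it does not subscribe to the event stream directly)
        self.delegate = AgentController(
            sid=self.id + '-delegate',  # controller ID: the parent's ID plus a '-delegate' suffix, as a unique identifier
            file_store=self.file_store,  # inherit the file store (used to persist state)
            user_id=self.user_id,  # inherit the user ID
            agent=delegate_agent,  # the delegate agent instance to manage
            event_stream=self.event_stream,  # shared event stream (parent and child see the same events)
            conversation_stats=self.conversation_stats,  # inherit conversation statistics
            iteration_delta=self._initial_max_iterations,  # iteration delta (the subtask's maximum iteration limit)
            budget_per_task_delta=self._initial_max_budget_per_task,  # per-task budget delta (the subtask's resource limit)
            agent_to_llm_config=self.agent_to_llm_config,  # inherit the agent-to-LLM config mapping
            agent_configs=self.agent_configs,  # inherit the agent config dictionary
            initial_state=state,  # initial state (built above from the parent's settings)
            is_delegate=True,  # mark as a delegate (key point: avoids a duplicate event-stream subscription)
            headless_mode=self.headless_mode,  # inherit headless mode (no interactive UI)
            security_analyzer=self.security_analyzer,  # inherit the security analyzer (used for security checks)
        )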
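
The ID handling in start_delegate is easy to misread, so here is a small standalone sketch of the two string operations it uses. The helper names delegate_sid and state_session_id are hypothetical; only the '-delegate' suffix and the use of str.removesuffix (Python 3.9+) come from the code above.

```python
def delegate_sid(creator_sid: str) -> str:
    """Controller ID for a new delegate: the creator's ID plus a suffix."""
    return creator_sid + "-delegate"


def state_session_id(creator_sid: str) -> str:
    """Session ID stored in the delegate's State: strip one trailing suffix
    from the ID of the controller that creates the delegate."""
    return creator_sid.removesuffix("-delegate")


root = "abc123"
child = delegate_sid(root)        # controller ID of the first-level delegate
grandchild = delegate_sid(child)  # controller ID of a second-level delegate

# Session IDs recorded in the State objects created by root and by child.
session_for_child = state_session_id(root)
session_for_grandchild = state_session_id(child)
```

Under these assumptions, the State created by the root controller and the State created by its first-level delegate both carry the root session ID, while the controller IDs grow by one suffix per level.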

3.5.4 Microagent Memory Prompt Template

openhands/microagent/prompts/generate_remember_prompt.j2 is a Jinja2 template file whose main function is to generate prompts for updating a specific reference file. That reference file stores important information and lessons learned for performing specific tasks and can be expanded over time to incorporate new knowledge and experience.

Core functions:

  • Event analysis: Analyze the new subset of events provided to it.
  • Update decision: Determine whether specific reference documents need to be updated.
  • Prompt generation: Generate prompts to guide another AI to perform these updates correctly and efficiently.

Processing flow:

  • Receive event data via {{events}}.
  • Analyze event content and determine which parts of file should be updated.
  • Generate structured update instructions.

Guiding principles:

Content requirements:

  • Clearly specify: which parts need update or new sections.
  • Provide context: why updates are required.
  • Specific information: exactly what should be added/modified.

Formatting requirements:

  • Maintain structure: preserve existing structure and format.
  • Maintain consistency: ensure no contradictions with existing content.

Technical implementation:

Template structure:

<update_prompt></update_prompt>

Data stream:

  • Input: new event data through the events variable.
  • Processing: the template logic analyzes the events and generates a prompt.
  • Output: the generated prompt inside the <update_prompt> tags.

Application scenarios:

  • Memory maintenance: continuously update important knowledge files during task execution.
  • Automated updates: enable AI to maintain its own knowledge base with less manual effort.

Relationship with other components:

  • Integration with micro-agent system: supports dynamic knowledge updates.
  • Works with KnowledgeMicroagent and RepoMicroagent.
  • Connects to event system: uses EventStream and works with AgentController for state/history maintenance.

This template is a key component for continuous learning and knowledge management in OpenHands, allowing AI systems to learn from interactions and persist that knowledge.

3.5.5 The get_prompt function

The get_prompt function is a FastAPI route handler located in /openhands/server/routes/manage_conversations.py, at path /conversations/{conversation_id}/remember-prompt. Its primary function is to generate a prompt template based on a specific event for updating a reference file. This is one of the key components for continuous learning and knowledge management in OpenHands.

Parameter processing:

  • conversation_id: Conversation ID verified via dependency injection.
  • event_id: Query parameter specifying the event ID used to retrieve context.
  • Other dependencies: user settings storage, conversation metadata, etc.

Core processing steps:

Get Event Storage:

event_store = EventStore(
    sid=conversation_id, file_store=file_store, user_id=metadata.user_id
)

Create an event storage instance to access the event history of a specific conversation.

Extract contextual events:

Call _get_contextual_events(event_store, event_id):

  • Obtain 4 events before and 4 events after the target event (about 9 total).
  • Filter meaningless event types (NullAction, NullObservation, etc.).
  • Return formatted event string.
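
The context-window idea can be sketched as below. ToyEvent, SKIPPED_KINDS, and contextual_events are hypothetical names for this example, and the order of operations (filter first, then take the window) is an assumption; the real _get_contextual_events may window first and filter afterwards.

```python
from dataclasses import dataclass


@dataclass
class ToyEvent:
    """Hypothetical event record; real events carry much more data."""
    id: int
    kind: str
    text: str = ""


SKIPPED_KINDS = {"NullAction", "NullObservation"}  # assumed filter set


def contextual_events(events: list, target_id: int, window: int = 4) -> str:
    """Format the target event with up to `window` useful events on each side."""
    useful = [e for e in events if e.kind not in SKIPPED_KINDS]
    before = [e for e in useful if e.id < target_id][-window:]
    after = [e for e in useful if e.id > target_id][:window]
    target = [e for e in useful if e.id == target_id]
    return "\n".join(f"[{e.id}] {e.kind}: {e.text}" for e in before + target + after)


# Twenty toy events; every fifth one is a NullObservation that should be dropped.
history = [
    ToyEvent(i, "NullObservation" if i % 5 == 0 else "MessageAction", f"msg {i}")
    for i in range(20)
]
context = contextual_events(history, target_id=9)
```

With a window of 4 on each side plus the target itself, the formatted string contains at most 9 events, which matches the "about 9 total" figure.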

Generate prompt template:

  • Load LLM config from user settings.
  • Use generate_prompt_template to build template from event content.
  • The template uses generate_remember_prompt.j2.

Generate final prompt:

  • Use generate_prompt (LLM call).
  • Extract the content between the <update_prompt> tags from the response.
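
The extraction step can be sketched with a regular expression. This is an illustrative helper, not the function from the OpenHands source; it assumes the LLM response wraps the generated prompt in <update_prompt> tags, as the template's output format describes.

```python
import re


def extract_update_prompt(raw_response: str) -> str:
    """Return the text inside the first <update_prompt>...</update_prompt> pair."""
    # re.DOTALL lets '.' match newlines, since the prompt usually spans several lines.
    match = re.search(
        r"<update_prompt>(.*?)</update_prompt>", raw_response, re.DOTALL
    )
    if match is None:
        raise ValueError("no <update_prompt> block found in the response")
    return match.group(1).strip()


raw = "Analysis done.\n<update_prompt>\nAdd a 'Testing' section.\n</update_prompt>\n"
prompt = extract_update_prompt(raw)
```

Raising an error when the tags are missing keeps a malformed LLM response from being passed on as if it were a valid prompt.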

Relationship with other system components:

Template system:

  • Uses generate_remember_prompt.j2 (in /openhands/microagent/prompts/).
  • Designed specifically for generating prompts to update reference files.

Event system:

  • Integrates with EventStore and event filtering via EventFilter.
  • Works with conversation_manager for LLM completion requests.

Application scenarios:

  • Memory maintenance: generate prompts for AI memory file updates.
  • Knowledge accumulation: build/update knowledge from specific event sequences.
  • Automated learning: allow AI to learn from interactions and update references.

Return value:

Returns JSON containing:

  • status: request status (success/failure)
  • prompt: generated prompt content used to update reference files
