CoMLRL supports both homogeneous and heterogeneous models.
Set agent_model/critic_model to a Hugging Face model identifier for homogeneous setups, or provide agents/critics lists for heterogeneous setups.
Homogeneous Agents
The simplest way to get started with CoMLRL is to load num_agents homogeneous agents from a single model identifier.
Users can set agent_model.name to a single model identifier while keeping agents: null.
For example, to load 3 Qwen/Qwen2.5-1.5B agents:
trainer = MAGRPOTrainer(
    agent_model="Qwen/Qwen2.5-1.5B",
    agents=None,
    num_agents=3,
)
Heterogeneous Agents
Although homogeneous LLM agents can be assigned different roles through prompting, heterogeneous LLMs with different skills can make multi-agent collaboration more effective.
Users can load a list of heterogeneous agents in agents, where the length of the list should match num_agents. Each entry should specify a model identifier and optional tokenizer/model kwargs.
When agents is provided, agent_model should be set to null or left unset. If both are provided, they must match (same names, correct length); otherwise training will raise an error.
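The matching rule above can be sketched as a small resolution step. This is a minimal illustration, not CoMLRL's actual implementation: resolve_agent_names is a hypothetical helper, entries are assumed to be plain model-identifier strings, and "match" is assumed to mean that every entry in agents equals agent_model.

```python
from typing import List, Optional, Sequence


def resolve_agent_names(
    agent_model: Optional[str],
    agents: Optional[Sequence[str]],
    num_agents: int,
) -> List[str]:
    """Hypothetical helper: return one model identifier per agent."""
    if agents is None:
        if agent_model is None:
            raise ValueError("Provide either agent_model or agents.")
        # Homogeneous shortcut: repeat the single identifier num_agents times.
        return [agent_model] * num_agents

    names = list(agents)
    if len(names) != num_agents:
        raise ValueError(
            f"agents has {len(names)} entries but num_agents is {num_agents}."
        )
    # Assumed meaning of "match": every entry agrees with agent_model.
    if agent_model is not None and any(name != agent_model for name in names):
        raise ValueError("agent_model and agents do not match.")
    return names
```

Under these assumptions, passing only agent_model yields a list of identical names, while passing a mismatched pair fails before any model would be loaded.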
For example, to load a Qwen/Qwen2.5-Coder-3B and a Qwen/Qwen2.5-Coder-7B:
trainer = MAGRPOTrainer(
    agent_model=None,
    agents=["Qwen/Qwen2.5-Coder-3B", "Qwen/Qwen2.5-Coder-7B"],
    num_agents=2,
)
Loading LLM Critics
The loading of critics depends on the algorithm and the use_separate_critic setting.
In Multi-Agent Actor-Critic (MAAC), a single, separate centralized critic is used, so exactly one model should be provided in critic_model or critics. There are no constraints on the critic's model type.
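As a rough sketch of the "exactly one critic" rule, the check could look like the following. The helper name resolve_central_critic is hypothetical and not part of CoMLRL; it assumes critics, when given, is a one-element list of identifier strings.

```python
from typing import Optional, Sequence


def resolve_central_critic(
    critic_model: Optional[str],
    critics: Optional[Sequence[str]],
) -> str:
    """Hypothetical helper: return the single centralized critic identifier."""
    if critics is not None:
        if len(critics) != 1:
            raise ValueError("A centralized critic needs exactly one entry in critics.")
        # If both are given, they must name the same model.
        if critic_model is not None and critics[0] != critic_model:
            raise ValueError("critic_model and critics do not match.")
        return critics[0]
    if critic_model is None:
        raise ValueError("Provide critic_model or critics.")
    return critic_model
```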
For example, to load a Qwen/Qwen2.5-Coder-3B and a Qwen/Qwen2.5-Coder-1.5B with a centralized Qwen/Qwen2.5-Coder-7B critic:
trainer = MAACTrainer(
    agent_model=None,
    agents=["Qwen/Qwen2.5-Coder-3B", "Qwen/Qwen2.5-Coder-1.5B"],
    critic_model="Qwen/Qwen2.5-Coder-7B",
    critics=None,
    num_agents=2,
)
In Independent Actor-Critic (IAC), each agent can have its own critic. When use_separate_critic=true, users should provide critic_model.name or critics with length num_agents to load a separate critic for each agent.
When use_separate_critic=false, each agent shares its LLM backbone with its critic, and the critic is loaded together with the agent. In this case, critic_model and critics should not be provided; leave both as null (None in Python).
Similarly to agents, if both critic_model and critics are provided, they must match (same names, correct length) or training will raise an error.
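The IAC rules above can be summarized in one resolution function. This is a sketch under stated assumptions rather than the library's code: resolve_iac_critics is a hypothetical name, entries are assumed to be plain identifier strings, and a return value of None stands for "critics share the agent backbones".

```python
from typing import List, Optional, Sequence


def resolve_iac_critics(
    use_separate_critic: bool,
    critic_model: Optional[str],
    critics: Optional[Sequence[str]],
    num_agents: int,
) -> Optional[List[str]]:
    """Hypothetical helper: one critic identifier per agent, or None if shared."""
    if not use_separate_critic:
        # Shared backbone: no critic identifiers may be supplied.
        if critic_model is not None or critics is not None:
            raise ValueError(
                "critic_model and critics must be None when use_separate_critic is False."
            )
        return None

    if critics is None:
        if critic_model is None:
            raise ValueError("Separate critics need critic_model or critics.")
        # Homogeneous shortcut: the same critic model for every agent.
        return [critic_model] * num_agents

    names = list(critics)
    if len(names) != num_agents:
        raise ValueError(
            f"critics has {len(names)} entries but num_agents is {num_agents}."
        )
    # Assumed meaning of "match": every entry agrees with critic_model.
    if critic_model is not None and any(name != critic_model for name in names):
        raise ValueError("critic_model and critics do not match.")
    return names
```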
For example, to load a Qwen/Qwen2.5-Coder-3B and a Qwen/Qwen2.5-Coder-7B with separate critics of the same models:
trainer = IACTrainer(
    agent_model=None,
    agents=["Qwen/Qwen2.5-Coder-3B", "Qwen/Qwen2.5-Coder-7B"],
    critic_model=None,
    critics=["Qwen/Qwen2.5-Coder-3B", "Qwen/Qwen2.5-Coder-7B"],
    num_agents=2,
)
Alternatively, each actor can share its model with its critic:
trainer = IACTrainer(
    agent_model=None,
    agents=["Qwen/Qwen2.5-Coder-3B", "Qwen/Qwen2.5-Coder-7B"],
    critic_model=None,
    critics=None,
    num_agents=2,
)
Internally, trainers always work with agents/critics lists. agent_model and critic_model are convenience shortcuts for homogeneous settings; if both are provided, they must be consistent.
Tokenizers are loaded per agent by default. If your models use incompatible vocabularies, training may fail (e.g., in shared-critic settings). Prefer models from the same family or ensure tokenizer compatibility.
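One way to check vocabulary compatibility before training is to compare the token-to-id mappings of two tokenizers. The sketch below is illustrative only: vocab_mismatches is a hypothetical helper, and it treats two vocabularies as compatible only when every token maps to the same id in both, which may be stricter than a given setup requires.

```python
from typing import Dict, List, Optional, Tuple


def vocab_mismatches(
    vocab_a: Dict[str, int],
    vocab_b: Dict[str, int],
    limit: int = 5,
) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """Return up to `limit` tokens whose ids differ (or are missing) between vocabularies."""
    mismatches: List[Tuple[str, Optional[int], Optional[int]]] = []
    # Sort tokens so the reported mismatches are deterministic.
    for token in sorted(set(vocab_a) | set(vocab_b)):
        id_a, id_b = vocab_a.get(token), vocab_b.get(token)
        if id_a != id_b:
            mismatches.append((token, id_a, id_b))
            if len(mismatches) >= limit:
                break
    return mismatches
```

With Hugging Face tokenizers, the mappings can be obtained from AutoTokenizer.from_pretrained(name).get_vocab(); an empty result from the helper suggests the two tokenizers assign identical ids.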