Changelog | CoMLRL

Version 1.3.6#

Fixed critical bug of loading heterogeneous models and reform the model loading logics
Polish the docs

Version 1.3.5#

Add unit tests for hyperparameter constraints.
Clean legacy interfaces.

Version 1.3.4#

Fix the bug of loading heterogeneous models and reform the loading logics.
Enable MBGD in MAGRPO to align with MAAC and IAC.
Remove redundant and legacy hyperparameters (e.g., model kwargs, patching hyperparameters).
Clean multi-device legacy, like drop last and num_workers.
Add unit tests for model loading and separate it from CI as a badge.
Clean short functions.
Reorganize the docs and align the parameters.

Version 1.3.3#

Compact MAREINFORCETrainer derivation, and move to the new folder.
Unify the interface for different trainers.
Remove redundant patches and wrappers.
Reorganize the variables in the config yamls.

Version 1.3.2#

Fix wandb logging issue in MAGRPOTrainer

Version 1.3.1#

Allow batch training in MAGRPOTrainer, IACTrainer and MAACTrainer
Allow multi-turn training in IACTrainer and MAACTrainer
Change the x-axis from data_step to env_step

Version 1.3.0#

Use TD error as critic update target in IACTrainer and MAACTrainer.

Version 1.2.9#

Add MAACTrainer (separated centralized critic), now both IACTrainer and MAACTrainer can support single-turn training.

Version 1.2.8#

The critic in IACTrainer now estimate V rather than Q.

Version 1.2.7#

Change the IPPOTrainer to be IACTrainer.

Version 1.2.6#

The first release of CoMLRL:

Including MAGRPO, MAREINFORCE, MARLOO, MAREMAX, and IPPO trainers for multi-agent reinforcement learning with LLMs.
Support for multi-turn training with custom external feedback mechanisms.
LLM collaboration environments for various tasks.
Comprehensive documentation and examples for getting started.