Version 1.3.1
- Allow batch training in MAGRPOTrainer, IACTrainer and MAACTrainer
- Allow multi-turn training in IACTrainer and MAACTrainer
- Change the x-axis from data_step to env_step
Version 1.3.0
Use the TD error as the critic update target in IACTrainer and MAACTrainer.
Version 1.2.9
Add MAACTrainer (with a separate centralized critic). Both IACTrainer and MAACTrainer now support single-turn training.
Version 1.2.8
The critic in IACTrainer now estimates V (the state value) rather than Q (the action value).
Version 1.2.7
Replace IPPOTrainer with IACTrainer.
Version 1.2.6
The first release of CoMLRL:
- MAGRPO, MAREINFORCE, MARLOO, MAREMAX, and IPPO trainers for multi-agent reinforcement learning with LLMs.
- Support for multi-turn training with custom external feedback mechanisms.
- LLM collaboration environments for various tasks.
- Comprehensive documentation and examples for getting started.