Environments that simulate real-world tasks for training and evaluating LLM collaboration:
Writing: Multiple LLM agents collaborate on processing articles.
- TLDR - Summarizing Reddit posts.
- ArXiv - Expanding abstracts into introductions.
Coding: Agents generate code solutions for programming problems.
- MBPP - Mostly Basic Python Problems.
- HumanEval - Handwritten programming problems for evaluation.
- CoopHumanEval - A HumanEval variant with problems of a cooperative nature.
Minecraft: Multi-agent building environments.
- StrBuild - Structured builds from text.
- HouseBuild - Coordinated house construction.
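
The grouping above can be sketched as a small lookup table, which may be convenient when selecting an environment by name in a script. This is only an illustration: the category and environment names come from the list, while the `ENVIRONMENTS` dictionary and the `find_category` helper are hypothetical and not part of any library's API.

```python
# Hypothetical catalog mirroring the list above. The layout and the helper
# below are assumptions made for illustration only.
ENVIRONMENTS = {
    "Writing": {
        "TLDR": "Summarizing Reddit posts",
        "ArXiv": "Expanding abstracts into introductions",
    },
    "Coding": {
        "MBPP": "Mostly basic Python problems",
        "HumanEval": "Handwritten evaluation problems",
        "CoopHumanEval": "HumanEval with cooperative nature",
    },
    "Minecraft": {
        "StrBuild": "Structured builds from text",
        "HouseBuild": "Coordinated house construction",
    },
}


def find_category(env_name: str) -> str:
    """Return the category containing env_name (case-insensitive match)."""
    wanted = env_name.lower()
    for category, envs in ENVIRONMENTS.items():
        if any(name.lower() == wanted for name in envs):
            return category
    raise KeyError(f"unknown environment: {env_name}")


if __name__ == "__main__":
    # Print each category with its environments, in listing order.
    for category, envs in ENVIRONMENTS.items():
        print(f"{category}: {', '.join(envs)}")
```

Calling `find_category("mbpp")` returns the category name for that environment, and an unknown name raises `KeyError` rather than silently returning a default.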