Environments that simulate real-world tasks for training and evaluating LLM collaboration:

  • Writing: Multiple LLM agents collaborate on processing articles.

    • TLDR - Summarizing Reddit posts.
    • ArXiv - Expanding abstracts into introductions.
  • Coding: Generate code solutions for programming problems.

    • MBPP - Mostly Basic Python Problems.
    • HumanEval - Handwritten evaluation problems.
    • CoopHumanEval - HumanEval problems with a cooperative nature.
  • Minecraft: Multi-agent building environments.

    • StrBuild - Structured builds from text.
    • HouseBuild - Coordinated house construction.
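
As a rough illustration only, the catalog above could be represented as a small lookup table so that environments can be selected by category. The `Environment` class, the `ENVIRONMENTS` tuple, and the `by_category` helper are hypothetical names introduced for this sketch; they are not part of any library API described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical record for one entry in the list above."""
    name: str
    category: str
    description: str


# Entries mirror the list above; the registry structure itself is an assumption.
ENVIRONMENTS = (
    Environment("TLDR", "Writing", "Summarizing Reddit posts"),
    Environment("ArXiv", "Writing", "Expanding abstracts into introductions"),
    Environment("MBPP", "Coding", "Mostly Basic Python Problems"),
    Environment("HumanEval", "Coding", "Handwritten evaluation problems"),
    Environment("CoopHumanEval", "Coding", "HumanEval with cooperative nature"),
    Environment("StrBuild", "Minecraft", "Structured builds from text"),
    Environment("HouseBuild", "Minecraft", "Coordinated house construction"),
)


def by_category(category: str) -> list[str]:
    """Return environment names in the given category, in listed order."""
    return [env.name for env in ENVIRONMENTS if env.category == category]


if __name__ == "__main__":
    for category in ("Writing", "Coding", "Minecraft"):
        print(category, by_category(category))
```

Keeping the entries in an immutable tuple of frozen records makes the catalog easy to filter without risk of accidental modification; a real project may organize its environments differently.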