Environments that simulate real-world tasks for training and evaluating LLM collaboration:

  • Writing Collaboration: Multiple LLM agents collaborate to summarize or expand text.

    • TLDR - Summarizing Reddit posts.
    • ArXiv - Expanding abstracts into introductions.
  • Code Generation: Generate code solutions for programming problems.

    • MBPP - Mostly Basic Python Problems.
    • HumanEval - Hand-written evaluation problems.
    • CoopHumanEval - A cooperative variant of HumanEval.
  • Code Completion: Complete code snippets based on given contexts.

    • ClassEval - Complete class-level code based on method stubs and docstrings.
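Code benchmarks such as MBPP and HumanEval judge a solution by running it against unit tests. Below is a minimal sketch of that pass/fail check. It assumes a HumanEval-style record made of four parts: a function-signature prompt, a completion, test code that defines `check(candidate)`, and an entry-point name. In a multi-agent setting the completion could be the combined output of several agents, but that is an assumption here. `passes_tests` is a hypothetical helper and not this project's API. It runs untrusted code with `exec`, so a real environment would need a sandbox and timeouts.

```python
def passes_tests(prompt: str, completion: str, test_code: str, entry_point: str) -> bool:
    """Hypothetical helper: True if prompt + completion passes check(), else False.

    WARNING: exec() runs arbitrary code; use a sandbox for model outputs.
    """
    namespace: dict = {}
    try:
        exec(prompt + completion, namespace)     # define the candidate function
        exec(test_code, namespace)               # define check(candidate)
        namespace["check"](namespace[entry_point])
    except Exception:                            # syntax errors, failed asserts, runtime errors
        return False
    return True


# Toy HumanEval-style record (illustrative, not taken from the dataset).
prompt = 'def add(a: int, b: int) -> int:\n    """Return the sum of a and b."""\n'
completion = "    return a + b\n"
test_code = (
    "def check(candidate):\n"
    "    assert candidate(2, 3) == 5\n"
    "    assert candidate(-1, 1) == 0\n"
)

print(passes_tests(prompt, completion, test_code, "add"))  # True
```

A completion such as `"    return a - b\n"` fails the first assertion, so the helper returns `False`. A binary result like this is a natural per-problem reward signal for both training and evaluation.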