Interactive Analysis of Reinforcement Learning in Multi-Agent Systems

This paper presents an analysis of multi-agent reinforcement learning within controlled environments, exploring the impact of different state space complexities and learning policies. Along with the paper, a user interface is provided, making the experiments replicable and interactive.

While the paper gives a much more comprehensive explanation of the technical aspects and results of the work, I will provide a general overview here. This is a simulation of agents in a 2D environment, attempting to learn the routes between a pickup and drop-off point.

Below is a preview of what a 2D gridworld might look like. P and D mark the pickup and drop-off points, respectively.

2D Gridworld Preview

Example of a more complex gridworld

These pickup and drop-off points are configurable: they can be moved randomly, given maximum storage capacities, or restricted to accept packages only from certain agents. Many other options are available for testing highly specific scenarios.
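To make these options concrete, here is a minimal sketch of how a pickup or drop-off point with a storage capacity and an agent whitelist could be modeled. The `Station` class and all of its field names are hypothetical and chosen for illustration; they are not taken from the project code.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Station:
    """A pickup or drop-off cell (hypothetical structure, not the project's own)."""
    position: Tuple[int, int]
    kind: str                                   # "pickup" or "dropoff"
    capacity: int                               # maximum number of packages the cell can hold
    stored: int = 0                             # packages currently held
    allowed_agents: Optional[Set[str]] = None   # None means any agent may use it

    def can_serve(self, agent_id: str) -> bool:
        """Return True if this agent may pick up from or drop off at this cell."""
        if self.allowed_agents is not None and agent_id not in self.allowed_agents:
            return False
        if self.kind == "pickup":
            return self.stored > 0              # an empty pickup point has nothing to give
        return self.stored < self.capacity      # a full drop-off point accepts nothing

    def serve(self, agent_id: str) -> bool:
        """Transfer one package if allowed; return whether the transfer happened."""
        if not self.can_serve(agent_id):
            return False
        self.stored += -1 if self.kind == "pickup" else 1
        return True
```

With a structure like this, an empty pickup point simply refuses service, which is the kind of situation an agent has to learn to handle.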

Simulation Configuration

Below is a screenshot from the "simulation configuration" section of the software.

Simulation Configuration Interface

Once the simulation has been configured, you can move step-by-step to see how the agents act. Each agent's estimated value for a given square is shown on the right of the screen. You can also see each agent's entire Q-table in the "charts and data" tab.
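For readers unfamiliar with Q-tables, the sketch below shows one common way such a table is stored and updated. It assumes standard tabular Q-learning with a state made up only of the agent's grid position; the paper's actual state space and learning policies may differ (for example, the state may also encode whether the agent is carrying a package). The function names and the default `alpha` and `gamma` values are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Tuple

ACTIONS = ("north", "south", "east", "west")
State = Tuple[int, int]  # assumed state: (row, column) only


def make_q_table() -> Dict[State, Dict[str, float]]:
    """Create a Q-table whose entries default to 0.0 for every action."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def q_update(q, state, action, reward, next_state, alpha=0.3, gamma=0.5):
    """Apply one tabular Q-learning update:
    Q(s, a) += alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def square_value(q, state) -> float:
    """Estimated value of a square, taken here as the best Q-value over actions."""
    return max(q[state].values())
```

Taking the maximum over actions is one reasonable way to collapse a row of the Q-table into a single per-square number for display; the interface may compute its displayed value differently.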

Simulation View with Agent Actions and Values

Performance Metrics

As the simulation runs, certain metrics should improve, such as the total number of steps an agent needs to complete all payload deliveries. If these values are not improving, try adjusting the settings or environment to find which variable is restricting the learning process.

Total number of steps per episode. Note the downward trend.
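Per-episode step counts are usually noisy, so a downward trend is easier to judge after smoothing. The following is a rough sketch of one way to check for improvement, assuming the step counts have been collected into a plain list; `moving_average`, `is_improving`, and the window size are hypothetical names and choices, not part of the software.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Average each run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def is_improving(steps_per_episode: Sequence[float], window: int = 5) -> bool:
    """Return True if the smoothed step count ends lower than it starts."""
    smoothed = moving_average(steps_per_episode, window)
    return len(smoothed) >= 2 and smoothed[-1] < smoothed[0]


# Made-up step counts for illustration only, not results from the paper.
example_steps = [900, 850, 700, 600, 500, 420, 400, 390]
print(is_improving(example_steps, window=3))
```

Comparing only the first and last smoothed values is a crude test; it is meant as a quick sanity check rather than a statistical measure of learning.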

A "blockage" occurs when an agent wants to move to a given square, but that square is occupied by another agent.
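As an illustration of that definition, the sketch below counts blockages in a single step. It assumes that each agent's intended move is checked against the positions all agents hold at the start of the step; how the simulation actually orders and resolves moves is not stated here, and `count_blockages` and `MOVES` are hypothetical names.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (row, column)

# Row/column offsets for each action (assumed four-direction movement).
MOVES: Dict[str, Cell] = {
    "north": (-1, 0),
    "south": (1, 0),
    "east": (0, 1),
    "west": (0, -1),
}


def count_blockages(positions: Dict[str, Cell], intended: Dict[str, str]) -> int:
    """Count agents whose intended target square is occupied by another agent."""
    occupied = set(positions.values())
    blocked = 0
    for agent, action in intended.items():
        row, col = positions[agent]
        d_row, d_col = MOVES[action]
        target = (row + d_row, col + d_col)
        # Every move changes the cell, so an occupied target always belongs
        # to a different agent.
        if target in occupied:
            blocked += 1
    return blocked
```

For example, if A1 at (0, 0) tries to move east while A2 stands at (0, 1), that counts as one blockage under this assumption.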

Agent Behavior Visualization

Agents moving around in the simulated world with minimal training. Note the behavior of A2 when it encounters the empty pickup point.

By contrast, the movement of agents with more training.

To learn more, read the paper or get the project code on GitHub.