🎮 What is Neural Arena?
Neural Arena is a browser-based reinforcement learning playground: train a mouse to find cheese using different RL algorithms, compare their performance, and compete on the leaderboard! Each arena runs for 50 steps.
🧠 How Reinforcement Learning Works
The agent (mouse) learns through trial and error:
- Observe - The agent sees its current position on the grid
- Act - It chooses an action (up, down, left, right)
- Reward - Gets +10 for finding cheese, -0.1 per step (encourages efficiency)
- Learn - Updates its strategy based on the reward received
Over many episodes, the agent learns which actions lead to higher rewards!
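A minimal sketch of that loop in TypeScript (the GridEnv and Agent interfaces here are illustrative, not Neural Arena's actual API):

```typescript
// Illustrative observe-act-reward-learn loop; interface names are assumptions.
interface GridEnv {
  reset(): number;                                        // returns the starting state id
  step(action: number): { nextState: number; reward: number; done: boolean };
}

interface Agent {
  act(state: number): number;                             // pick up/down/left/right (0-3)
  learn(s: number, a: number, r: number, s2: number, done: boolean): void;
}

function runEpisode(env: GridEnv, agent: Agent, maxSteps = 50): number {
  let state = env.reset();                                // Observe the starting position
  let totalReward = 0;
  for (let t = 0; t < maxSteps; t++) {
    const action = agent.act(state);                      // Act
    const { nextState, reward, done } = env.step(action); // Reward: +10 cheese, -0.1 per step
    agent.learn(state, action, reward, nextState, done);  // Learn from the outcome
    totalReward += reward;
    state = nextState;                                    // Observe the new position
    if (done) break;                                      // cheese found
  }
  return totalReward;
}
```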
📊 Algorithm Tiers
Tier 1: Foundations
Q-Learning - Classic off-policy algorithm. Learns Q(s,a) values representing expected future reward for each state-action pair.
SARSA - On-policy variant that bootstraps from the action the agent actually takes next rather than the greedy one, so exploratory mistakes feed back into its value estimates and it behaves more conservatively.
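To make the off-policy vs. on-policy difference concrete, here is a sketch of the two tabular update rules (Q is assumed to be a state × action table; alpha and gamma are the learning rate and discount factor explained below):

```typescript
// Q-Learning (off-policy): bootstrap from the best next action, regardless of
// what the agent will actually do next.
function qLearningUpdate(
  Q: number[][], s: number, a: number, r: number, s2: number,
  alpha: number, gamma: number
): void {
  const bestNext = Math.max(...Q[s2]);
  Q[s][a] += alpha * (r + gamma * bestNext - Q[s][a]);
}

// SARSA (on-policy): bootstrap from the action a2 the agent actually takes next,
// so exploratory moves influence the value estimates.
function sarsaUpdate(
  Q: number[][], s: number, a: number, r: number, s2: number, a2: number,
  alpha: number, gamma: number
): void {
  Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a]);
}
```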
Tier 2: Deep RL
PPO - Policy gradient method with clipped objectives. Industry standard for complex tasks.
SAC - Maximum entropy RL that balances exploration and exploitation automatically.
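To show what "clipped objectives" means, here is a sketch of PPO's clipped surrogate term for a single sample (epsilonClip = 0.2 is a common default, used here as an assumption):

```typescript
// ratio = piNew(a|s) / piOld(a|s); advantage estimates how much better the
// action was than average. epsilonClip limits how far one update moves the policy.
function ppoClippedObjective(ratio: number, advantage: number, epsilonClip = 0.2): number {
  const clippedRatio = Math.min(Math.max(ratio, 1 - epsilonClip), 1 + epsilonClip);
  // Take the pessimistic (smaller) of the unclipped and clipped terms, so large
  // policy changes are not rewarded even when the advantage is big.
  return Math.min(ratio * advantage, clippedRatio * advantage);
}
```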
Tier 3: Cutting Edge (2025)
EQO - Exploration via Quasi-Optimism. Uses UCB-style bonuses for principled exploration.
ICM - Intrinsic Curiosity Module. Generates internal rewards for visiting novel states.
RND - Random Network Distillation. Measures novelty via prediction error on random features.
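These three differ in how they measure novelty, but the shared idea can be sketched with a simple count-based bonus (a stand-in for EQO's UCB-style bonus; ICM and RND replace the visit count with a learned prediction error, and the beta scale here is an assumed value):

```typescript
// Rarely seen states earn an extra "curiosity" reward on top of the environment reward.
class ExplorationBonus {
  private visits = new Map<number, number>();

  constructor(private beta = 0.5) {}                 // bonus scale (assumed value)

  bonus(state: number): number {
    const n = (this.visits.get(state) ?? 0) + 1;
    this.visits.set(state, n);
    return this.beta / Math.sqrt(n);                 // shrinks as the state becomes familiar
  }
}

// Usage: add the bonus to the extrinsic reward before the learning update, e.g.
// const shapedReward = reward + explorationBonus.bonus(nextState);
```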
Tier 4: Experimental
TabPFN-RL - Transformer-based Q-learning using attention over past experiences (in-context learning).
ADEPT - Adaptive multi-armed bandit that selects between multiple learning strategies dynamically.
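A rough sketch of the bandit idea behind ADEPT, using a UCB1-style rule over strategy indices (the actual selection rule and strategy set are assumptions):

```typescript
// Pick a learning strategy each episode, favoring those with high average return
// or few trials, then feed the episode's return back into the bandit.
class StrategyBandit {
  private counts: number[];
  private meanReturn: number[];

  constructor(private numStrategies: number) {
    this.counts = new Array(numStrategies).fill(0);
    this.meanReturn = new Array(numStrategies).fill(0);
  }

  select(): number {
    const total = this.counts.reduce((a, b) => a + b, 0) + 1;
    let best = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < this.numStrategies; i++) {
      const bonus = Math.sqrt((2 * Math.log(total)) / (this.counts[i] + 1e-6));
      const score = this.meanReturn[i] + bonus;
      if (score > bestScore) { bestScore = score; best = i; }
    }
    return best;
  }

  update(strategy: number, episodeReturn: number): void {
    this.counts[strategy] += 1;
    this.meanReturn[strategy] +=
      (episodeReturn - this.meanReturn[strategy]) / this.counts[strategy];
  }
}
```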
⚙️ Hyperparameters Explained
- Learning Rate (α) - How quickly the agent updates its knowledge. Higher = faster learning but less stable.
- Discount Factor (γ) - How much future rewards matter. 0.99 = long-term thinking, 0.5 = short-term focus.
- Epsilon (ε) - Exploration rate. Higher = more random exploration, lower = more exploitation of learned knowledge.
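A sketch of where these three knobs typically appear in a tabular agent (the names and the epsilon-greedy rule are illustrative; alpha and gamma also appear in the update rules shown earlier):

```typescript
interface Hyperparams {
  alpha: number;   // learning rate: step size of each Q-value update
  gamma: number;   // discount factor: weight on future rewards
  epsilon: number; // exploration rate for epsilon-greedy action selection
}

// Epsilon-greedy: explore with probability epsilon, otherwise exploit the best known action.
function epsilonGreedy(Q: number[][], state: number, epsilon: number, numActions = 4): number {
  if (Math.random() < epsilon) {
    return Math.floor(Math.random() * numActions);   // random move: up/down/left/right
  }
  return Q[state].indexOf(Math.max(...Q[state]));    // greedy move
}
```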
🏆 Scoring & Leaderboard
After training completes (50 episodes), your score is automatically submitted:
- Success Rate - Percentage of episodes where the mouse found the cheese
- Best Steps - Fewest steps taken to reach the cheese in any episode
Compete with others to find the best algorithm and hyperparameter combinations!
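For reference, a sketch of how these two metrics could be computed from per-episode results (field names are assumptions, not the playground's actual schema):

```typescript
interface EpisodeResult {
  foundCheese: boolean;
  steps: number;
}

function scoreRun(episodes: EpisodeResult[]): { successRate: number; bestSteps: number | null } {
  const successes = episodes.filter(e => e.foundCheese);
  const successRate = (successes.length / episodes.length) * 100;  // percentage of episodes
  const bestSteps = successes.length
    ? Math.min(...successes.map(e => e.steps))                     // fewest steps in any success
    : null;                                                        // never found the cheese
  return { successRate, bestSteps };
}
```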
💡 Tips for Success
- Start with Q-Learning to understand the basics
- Try a higher epsilon (0.3-0.5) early and drop to a lower value (0.1) once patterns emerge (see the decay sketch after this list)
- Compare algorithms side-by-side using multiple arenas
- Curiosity-based methods (ICM, RND) excel in sparse reward environments
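One simple way to act on the epsilon tip above is to anneal epsilon from 0.5 down to 0.1 over the run; the linear schedule below is just one reasonable choice, not the playground's built-in behavior:

```typescript
// Linearly interpolate epsilon from `start` to `end` as training progresses.
function epsilonSchedule(episode: number, totalEpisodes = 50, start = 0.5, end = 0.1): number {
  const progress = Math.min(episode / totalEpisodes, 1);
  return start + (end - start) * progress;
}
```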