🎮 What is Neural Arena?
Neural Arena is a browser-based reinforcement learning playground: train a mouse to find cheese using different RL algorithms, compare their performance, and compete on the leaderboard! Each arena runs for 50 steps.
🧠 How Reinforcement Learning Works
The agent (mouse) learns through trial and error:
- Observe - The agent sees its current position on the grid
- Act - It chooses an action (up, down, left, right)
- Reward - Gets +10 for finding cheese, -0.1 per step (encourages efficiency)
- Learn - Updates its strategy based on the reward received
Over many episodes, the agent learns which actions lead to higher rewards!
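A minimal sketch of that loop in TypeScript (the GridEnv and Agent interfaces here are illustrative, not Neural Arena's actual API):

```typescript
// Illustrative observe-act-reward-learn loop; interface names are assumptions.
interface GridEnv {
  reset(): number;                                        // returns the starting state id
  step(action: number): { nextState: number; reward: number; done: boolean };
}

interface Agent {
  act(state: number): number;                             // pick up/down/left/right (0-3)
  learn(s: number, a: number, r: number, s2: number, done: boolean): void;
}

function runEpisode(env: GridEnv, agent: Agent, maxSteps = 50): number {
  let state = env.reset();                                // Observe the starting position
  let totalReward = 0;
  for (let t = 0; t < maxSteps; t++) {
    const action = agent.act(state);                      // Act
    const { nextState, reward, done } = env.step(action); // Reward: +10 cheese, -0.1 per step
    agent.learn(state, action, reward, nextState, done);  // Learn from the outcome
    totalReward += reward;
    state = nextState;                                    // Observe the new position
    if (done) break;                                      // cheese found
  }
  return totalReward;
}
```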
📊 Algorithm Tiers
Tier 1: Foundations
Q-Learning - Classic off-policy algorithm. Learns Q(s,a) values representing expected future reward for each state-action pair.
SARSA - On-policy variant that bootstraps from the action the agent actually takes next rather than the greedy one, so exploratory mistakes feed back into its value estimates and it behaves more conservatively.
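To make the off-policy vs. on-policy difference concrete, here is a sketch of the two tabular update rules (Q is assumed to be a state × action table; alpha and gamma are the learning rate and discount factor explained below):

```typescript
// Q-Learning (off-policy): bootstrap from the best next action, regardless of
// what the agent will actually do next.
function qLearningUpdate(
  Q: number[][], s: number, a: number, r: number, s2: number,
  alpha: number, gamma: number
): void {
  const bestNext = Math.max(...Q[s2]);
  Q[s][a] += alpha * (r + gamma * bestNext - Q[s][a]);
}

// SARSA (on-policy): bootstrap from the action a2 the agent actually takes next,
// so exploratory moves influence the value estimates.
function sarsaUpdate(
  Q: number[][], s: number, a: number, r: number, s2: number, a2: number,
  alpha: number, gamma: number
): void {
  Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a]);
}
```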
Tier 2: Deep RL
PPO - Policy gradient method with clipped objectives. Industry standard for complex tasks.
SAC - Maximum entropy RL that balances exploration and exploitation automatically.
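To show what "clipped objectives" means, here is a sketch of PPO's clipped surrogate term for a single sample (epsilonClip = 0.2 is a common default, used here as an assumption):

```typescript
// ratio = piNew(a|s) / piOld(a|s); advantage estimates how much better the
// action was than average. epsilonClip limits how far one update moves the policy.
function ppoClippedObjective(ratio: number, advantage: number, epsilonClip = 0.2): number {
  const clippedRatio = Math.min(Math.max(ratio, 1 - epsilonClip), 1 + epsilonClip);
  // Take the pessimistic (smaller) of the unclipped and clipped terms, so large
  // policy changes are not rewarded even when the advantage is big.
  return Math.min(ratio * advantage, clippedRatio * advantage);
}
```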
Tier 3: Cutting Edge (2025)
EQO - Exploration via Quasi-Optimism. Uses UCB-style bonuses for principled exploration.
ICM - Intrinsic Curiosity Module. Generates internal rewards for visiting novel states.
RND - Random Network Distillation. Measures novelty via prediction error on random features.
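These three differ in how they measure novelty, but the shared idea can be sketched with a simple count-based bonus (a stand-in for EQO's UCB-style bonus; ICM and RND replace the visit count with a learned prediction error, and the beta scale here is an assumed value):

```typescript
// Rarely seen states earn an extra "curiosity" reward on top of the environment reward.
class ExplorationBonus {
  private visits = new Map<number, number>();

  constructor(private beta = 0.5) {}                 // bonus scale (assumed value)

  bonus(state: number): number {
    const n = (this.visits.get(state) ?? 0) + 1;
    this.visits.set(state, n);
    return this.beta / Math.sqrt(n);                 // shrinks as the state becomes familiar
  }
}

// Usage: add the bonus to the extrinsic reward before the learning update, e.g.
// const shapedReward = reward + explorationBonus.bonus(nextState);
```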
Tier 4: Experimental
TabPFN-RL - Transformer-based Q-learning using attention over past experiences (in-context learning).
ADEPT - Adaptive multi-armed bandit that selects between multiple learning strategies dynamically.
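A rough sketch of the bandit idea behind ADEPT, using a UCB1-style rule over strategy indices (the actual selection rule and strategy set are assumptions):

```typescript
// Pick a learning strategy each episode, favoring those with high average return
// or few trials, then feed the episode's return back into the bandit.
class StrategyBandit {
  private counts: number[];
  private meanReturn: number[];

  constructor(private numStrategies: number) {
    this.counts = new Array(numStrategies).fill(0);
    this.meanReturn = new Array(numStrategies).fill(0);
  }

  select(): number {
    const total = this.counts.reduce((a, b) => a + b, 0) + 1;
    let best = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < this.numStrategies; i++) {
      const bonus = Math.sqrt((2 * Math.log(total)) / (this.counts[i] + 1e-6));
      const score = this.meanReturn[i] + bonus;
      if (score > bestScore) { bestScore = score; best = i; }
    }
    return best;
  }

  update(strategy: number, episodeReturn: number): void {
    this.counts[strategy] += 1;
    this.meanReturn[strategy] +=
      (episodeReturn - this.meanReturn[strategy]) / this.counts[strategy];
  }
}
```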
⚙️ Hyperparameters Explained
- Learning Rate (α) - How quickly the agent updates its knowledge. Higher = faster learning but less stable.
- Discount Factor (γ) - How much future rewards matter. 0.99 = long-term thinking, 0.5 = short-term focus.
- Epsilon (ε) - Exploration rate. Higher = more random exploration, lower = more exploitation of learned knowledge.
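A sketch of where these three knobs typically appear in a tabular agent (the names and the epsilon-greedy rule are illustrative; alpha and gamma also appear in the update rules shown earlier):

```typescript
interface Hyperparams {
  alpha: number;   // learning rate: step size of each Q-value update
  gamma: number;   // discount factor: weight on future rewards
  epsilon: number; // exploration rate for epsilon-greedy action selection
}

// Epsilon-greedy: explore with probability epsilon, otherwise exploit the best known action.
function epsilonGreedy(Q: number[][], state: number, epsilon: number, numActions = 4): number {
  if (Math.random() < epsilon) {
    return Math.floor(Math.random() * numActions);   // random move: up/down/left/right
  }
  return Q[state].indexOf(Math.max(...Q[state]));    // greedy move
}
```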
🏆 Scoring & Leaderboard
After training completes (50 episodes), your score is automatically submitted:
- Success Rate - Percentage of episodes where the mouse found the cheese
- Best Steps - Fewest steps taken to reach the cheese in any episode
Compete with others to find the best algorithm and hyperparameter combinations!
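For reference, a sketch of how these two metrics could be computed from per-episode results (field names are assumptions, not the playground's actual schema):

```typescript
interface EpisodeResult {
  foundCheese: boolean;
  steps: number;
}

function scoreRun(episodes: EpisodeResult[]): { successRate: number; bestSteps: number | null } {
  const successes = episodes.filter(e => e.foundCheese);
  const successRate = (successes.length / episodes.length) * 100;  // percentage of episodes
  const bestSteps = successes.length
    ? Math.min(...successes.map(e => e.steps))                     // fewest steps in any success
    : null;                                                        // never found the cheese
  return { successRate, bestSteps };
}
```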
💡 Tips for Success
- Start with Q-Learning to understand the basics
- Try a higher epsilon (0.3-0.5) early and drop to a lower value (0.1) once patterns emerge (see the decay sketch after this list)
- Compare algorithms side-by-side using multiple arenas
- Curiosity-based methods (ICM, RND) excel in sparse reward environments
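One simple way to act on the epsilon tip above is to anneal epsilon from 0.5 down to 0.1 over the run; the linear schedule below is just one reasonable choice, not the playground's built-in behavior:

```typescript
// Linearly interpolate epsilon from `start` to `end` as training progresses.
function epsilonSchedule(episode: number, totalEpisodes = 50, start = 0.5, end = 0.1): number {
  const progress = Math.min(episode / totalEpisodes, 1);
  return start + (end - start) * progress;
}
```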