Abstract
We evolved train driver brains with structured regional architecture (24 pre-allocated hidden neurons across 3 functional regions) and runtime Hebbian plasticity, comparing against Rail-003b's flat topology (7 evolved hidden neurons, no structure, no plasticity). The structured brain achieved lower peak fitness (87.99 vs 99.69) but exhibited qualitatively different behavior: 15 sensors influenced throttle control (vs 4), dedicated reflex nodes produced binary signal responses, and the fatigue management region developed connections to AWS acknowledgment. The flat brain's superior score came from a degenerate speed-governor strategy that avoids signal encounters entirely - a mathematical shortcut unavailable to the structured brain whose pre-wired regions forced it to actively process signal information. These findings suggest that architectural constraints channel evolution toward naturalistic behavior at the cost of raw optimality, and that fitness score alone is an inadequate measure of behavioral realism.
1. Introduction
1.1 Background
Experiments Rail-001 through Rail-003b evolved train driver brains using flat topologies - all hidden neurons structurally identical, no predefined grouping, no runtime learning. The best result (Rail-003b, fitness 99.69) produced a brain with 7 hidden neurons and 36 connections that discovered emergent human factors phenomena: complacency countermeasures, dead man's switch patterns, and dual-pathway signal processing.
However, Rail-003b's strategy was fundamentally un-human. Its primary safety mechanism was a proportional speed governor (current_speed -> brake, weight +1.29) that kept speed permanently below the SPAD threshold. A real driver cannot operate this way - they must drive at operational speed and actively manage signal compliance through perception, judgment, and timely braking.
1.2 What Changed from Rail-003b to Rail-004
This experiment introduces two new Quale v0.2 features:
Regions - Named clusters of hidden neurons with distinct structural properties:
| Region | Nodes | Density | Activation | Purpose |
|---|---|---|---|---|
| reflex | 6 | 0.7 | Step (binary) | Fast signal responses |
| situational_awareness | 12 | 0.3 | Sigmoid (graded) | Pattern recognition |
| fatigue_management | 6 | 0.4 | Sigmoid (graded) | Driver state tracking |
Total: 24 pre-allocated hidden neurons and ~130 initial connections (intra-region wiring plus sparse input-to-output links). The NEAT mutation engine respects region boundaries: new nodes inherit their region's activation function, and 80% of new connections are intra-region.
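The 80% intra-region preference can be illustrated with a small sketch. This is not the Quale mutation engine: `Region`, `propose_connection`, and `region_of` are hypothetical names, the source region is assumed to be drawn uniformly, and only the 24 pre-allocated hidden nodes are considered (node ids as reported in Section 3.3).

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    name: str
    nodes: Tuple[int, ...]  # hidden-node ids in this region
    density: float          # initial intra-region connection density
    activation: str         # "step" or "sigmoid"


# Sizes, densities and activations from the region table above.
REGIONS = (
    Region("reflex", tuple(range(23, 29)), 0.7, "step"),
    Region("situational_awareness", tuple(range(29, 41)), 0.3, "sigmoid"),
    Region("fatigue_management", tuple(range(41, 47)), 0.4, "sigmoid"),
)

INTRA_REGION_PROB = 0.8  # 80% of new connections stay inside one region


def region_of(node: int, regions=REGIONS) -> str:
    """Return the name of the region that owns a hidden node."""
    for region in regions:
        if node in region.nodes:
            return region.name
    raise KeyError(node)


def propose_connection(rng: random.Random, regions=REGIONS):
    """Pick (source, target) for a new connection (assumed selection scheme)."""
    src_region = rng.choice(regions)
    src = rng.choice(src_region.nodes)
    if rng.random() < INTRA_REGION_PROB:
        # Intra-region: any other node in the same region.
        candidates = [n for n in src_region.nodes if n != src]
    else:
        # Inter-region: any node in a different region.
        candidates = [n for r in regions if r is not src_region for n in r.nodes]
    return src, rng.choice(candidates)


rng = random.Random(0)
pairs = [propose_connection(rng) for _ in range(10_000)]
intra_fraction = sum(region_of(s) == region_of(t) for s, t in pairs) / len(pairs)
print(f"intra-region fraction: {intra_fraction:.3f}")
```

Over many proposals the intra-region fraction should settle close to 0.8, which is why inter-region pathways are comparatively rare and tend to appear only when evolution finds them useful.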
Plasticity - Runtime weight adaptation during scenarios:
| Mechanism | Parameters | Effect |
|---|---|---|
| Hebbian learning | rate: 0.005, max_weight: 2.0 | Co-active connections strengthen during the scenario |
| Synaptic decay | rate: 0.0005, min_weight: 0.0 | Inactive connections weaken toward zero |
| Homeostatic regulation | target_activity: 0.3, adjustment_rate: 0.003 | Per-region gain adjustment to prevent saturation or silence |
Plasticity changes persist across scenarios within a genome evaluation but reset between genomes. This creates within-lifetime learning without Lamarckian inheritance.
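The three mechanisms can be sketched as per-tick update rules. The exact rules used by Quale are not given in this report, so everything below is an assumption for illustration: the product form of the Hebbian term, the co-activity threshold, applying the bounds to the weight magnitude (so that negative weights also decay toward zero), and the function names. Only the numeric parameters come from the table above.

```python
from dataclasses import dataclass


@dataclass
class PlasticityParams:
    hebbian_rate: float = 0.005
    max_weight: float = 2.0
    decay_rate: float = 0.0005
    min_weight: float = 0.0
    target_activity: float = 0.3
    adjustment_rate: float = 0.003


def update_weight(w, pre, post, p, active_threshold=0.5):
    """One plasticity tick for one connection (assumed rule, not Quale's)."""
    sign = 1.0 if w >= 0 else -1.0
    magnitude = abs(w)
    if pre > active_threshold and post > active_threshold:
        # Hebbian: co-active endpoints strengthen the connection, capped at max_weight.
        magnitude = min(p.max_weight, magnitude + p.hebbian_rate * pre * post)
    else:
        # Decay: otherwise the connection weakens toward min_weight.
        magnitude = max(p.min_weight, magnitude - p.decay_rate)
    return sign * magnitude


def update_gain(gain, region_activity, p):
    """Homeostatic nudge: raise gain in a quiet region, lower it in a busy one."""
    return gain + p.adjustment_rate * (p.target_activity - region_activity)


def evaluate_genome(weights, scenarios, p):
    """Plasticity persists across scenarios but starts fresh for each genome."""
    live = dict(weights)  # copy, so the genome's inherited weights are untouched
    for scenario in scenarios:
        for conn, pre, post in scenario:  # (connection id, pre, post) per tick
            live[conn] = update_weight(live[conn], pre, post, p)
    return live
```

Under these assumptions a fully co-active connection gains 0.005 per tick, so moving from 0 to the 2.0 cap takes on the order of 400 ticks, while an idle connection loses only 0.0005 per tick. Copying the weights in `evaluate_genome` is what keeps the learning within-lifetime: the adapted weights are never written back to the genome.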
1.3 Hypothesis
A structured brain with dedicated functional regions and runtime plasticity will produce more naturalistic driving behavior than a flat topology, even if raw fitness is lower. The architectural constraints will channel evolution toward strategies that actively process signal information rather than degenerate speed-limiting shortcuts.
2. Materials and Methods
2.1 Experimental Configuration
| Parameter | Rail-003b | Rail-004 | Change |
|---|---|---|---|
| Population | 300 | 300 | Same |
| Generations | 2000 max (converged 349) | 500 max (converged 227) | Lower max; both converged before the limit |
| Sensors | 18 | 18 | Same |
| Actuators | 5 | 5 | Same |
| Initial hidden nodes | 0 | 24 (3 regions) | New |
| Initial connections | ~10 (sparse input->output) | ~130 (intra-region + sparse IO) | New |
| Plasticity | None | Hebbian + decay + homeostatic | New |
| Region-aware mutations | No | Yes (80% intra-region preference) | New |
| Signal system | QLD 5-aspect | QLD 5-aspect | Same |
| Attention gating | Yes | Yes | Same |
2.2 Region Design Rationale
Region sizes and properties were chosen to model a simplified human driver cognitive architecture:
- Reflex (6 nodes, Step activation): Analogous to brainstem reflexes. Binary fire/don't-fire responses. High density (0.7) for fast internal processing. Intended for immediate signal reactions.
- Situational awareness (12 nodes, Sigmoid activation): Analogous to parietal cortex spatial processing. Graded responses for nuanced assessment. Lower density (0.3) for selective, pattern-based computation. Largest region because situational assessment is the most complex task.
- Fatigue management (6 nodes, Sigmoid activation): Analogous to hypothalamic fatigue monitoring. Medium density (0.4). Intended for tracking driver state over time.
2.3 Plasticity Parameters
Conservative learning rates were chosen to avoid catastrophic weight instability:
- Hebbian rate 0.005 (half of whitepaper default 0.01) - gradual strengthening
- Decay rate 0.0005 (half of whitepaper default 0.001) - slow forgetting
- Homeostatic target 0.3 (30% of region nodes active) - moderate activity level
3. Results
3.1 Fitness Progression
| Generation | Best Fitness | Avg Fitness | Species | Topology | Survival | Idle |
|---|---|---|---|---|---|---|
| 0 | 37.47 | 7.48 | 1 | 47n/130c | 100% | 100% |
| 5 | 78.99 | 30.61 | 1 | 47n/130c | 78% | 70% |
| 50 | 81.49 | 64.92 | 15 | 49n/135c | 93% | 73% |
| 100 | 82.46 | 59.16 | 13 | 51n/133c | 93% | 75% |
| 200 | 86.13 | 58.55 | 16 | 56n/257c | 95% | 75% |
| 227 (converged) | 87.99 | - | - | 56n/197c | - | - |
3.2 Comparison with Rail-003b
| Metric | Rail-003b (flat) | Rail-004 (structured) | Interpretation |
|---|---|---|---|
| Best fitness | 99.69 | 87.99 | Flat brain found a higher-scoring strategy |
| Convergence gen | 349 | 227 | Structured brain converged faster |
| Total hidden neurons | 7 | 33 (24 initial + 9 evolved) | Structured brain is much larger |
| Enabled connections | 36 | 197 | 5.5x more wiring |
| Sensors influencing throttle | 4 | 15 | Structured brain uses far more information |
| Sensors influencing brake | 3 | 15 | Likewise - richer braking decisions |
| Sensors influencing attention | 4 | 8 | More attention inputs |
| Emergency brake wired | No (bias only) | No (bias only) | Same - neither brain uses it |
| Minimum idle rate | 48% | 71% | Flat brain drove more actively |
| Functional hidden neurons | 5 of 7 | 33 of 33 | All structured nodes participate |
3.3 Evolved Topology Analysis
56 total nodes: 18 input + 33 hidden + 5 output
The 33 hidden neurons break down as:
- 6 reflex-region nodes (Step activation, nodes 23-28)
- 12 awareness-region nodes (Sigmoid, nodes 29-40)
- 6 fatigue-region nodes (Sigmoid, nodes 41-46)
- 9 evolution-added nodes (Sigmoid, nodes 119-1586)
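The node counts can be cross-checked against the topology column in Section 3.1 (47n at generation 0, 56n at convergence). The snippet below only restates figures from this report; the variable names are illustrative.

```python
N_SENSORS = 18
N_ACTUATORS = 5
REGION_NODES = {"reflex": 6, "situational_awareness": 12, "fatigue_management": 6}
EVOLVED_NODE_IDS = [119, 269, 438, 597, 1021, 1130, 1383, 1565, 1586]

initial_hidden = sum(REGION_NODES.values())
final_hidden = initial_hidden + len(EVOLVED_NODE_IDS)
initial_total = N_SENSORS + initial_hidden + N_ACTUATORS
final_total = N_SENSORS + final_hidden + N_ACTUATORS

print(initial_hidden, final_hidden, initial_total, final_total)  # 24 33 47 56
```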
Region specialisation:
Reflex region (nodes 23-28):
- Feeds into attention (nodes 23, 25 -> attention with weights -0.72, -1.09)
- Dense inter-node wiring (26 intra-region connections)
- Receives from braking_distance, crossing_ahead, route_familiarity
- Step activation produces binary signals: "danger/no danger"
- Node 27 connects to acknowledge_aws (-0.79) - a reflex that suppresses AWS acknowledgment
Situational awareness region (nodes 29-40):
- Largest region, handles primary throttle/brake computation
- Node 32: stress -> H(32) (+1.62) and H(32) -> brake (+2.00, max weight) - stress triggers maximum braking
- Node 33: central hub receiving from 8 sensors, feeding 8 other nodes - acts as a situation integrator
- Node 34: receives cognitive_load (-1.45), at_station (+0.16), feeds brake (-1.98) - cognitive overload suppresses braking (dangerous but schedule-optimal)
- Node 37: cognitive_load (+1.74), crossing_ahead (+1.83) -> throttle (-1.48), brake (-1.96) - high cognitive load near crossings suppresses both throttle and brake (freeze response)
Fatigue management region (nodes 41-46):
- Node 41: fatigue (+1.77), aws_alert (+1.41) - fatigue and AWS converge
- Node 44: feeds acknowledge_aws (-0.88) - fatigue-influenced AWS response
- Node 46: fatigue (-1.11) -> modulates other fatigue nodes
- The region developed an internal circuit where fatigue level modulates AWS acknowledgment timing
Evolution-added nodes (119, 269, 438, 597, 1021, 1130, 1383, 1565, 1586):
- All Sigmoid activation (inter-region default)
- Node 269: speed_limit (-1.61) -> feeds both throttle (-1.98) and brake (+0.25) - a speed limit processor
- Node 1130: feeds throttle (-1.48), brake (+1.44) - evolved a throttle/brake coordinator
- Node 1383: current_speed (+1.46) -> H(38) - a speed-to-awareness bridge
- These nodes bridge between regions, creating inter-regional pathways that evolution discovered were necessary but the initial structure didn't provide
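To make the weight listings concrete, the following sketch evaluates node 37's contribution to throttle and brake using only its four reported weights. It is a simplification under stated assumptions: a standard logistic sigmoid, no bias, sensor values in [0, 1], a homeostatic gain of 1, and every other connection into node 37 or the actuators ignored. The function name is hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Reported weights for node 37 (situational awareness region).
W_IN = {"cognitive_load": 1.74, "crossing_ahead": 1.83}
W_OUT = {"throttle": -1.48, "brake": -1.96}


def node37_contribution(cognitive_load: float, crossing_ahead: float) -> dict:
    """Signed input that node 37 alone adds to each actuator (assumed model)."""
    h = sigmoid(W_IN["cognitive_load"] * cognitive_load
                + W_IN["crossing_ahead"] * crossing_ahead)
    return {actuator: w * h for actuator, w in W_OUT.items()}


calm = node37_contribution(cognitive_load=0.0, crossing_ahead=0.0)
overloaded = node37_contribution(cognitive_load=1.0, crossing_ahead=1.0)

for label, result in (("calm", calm), ("overloaded", overloaded)):
    print(label, {k: round(v, 3) for k, v in result.items()})
```

Under these assumptions the node already pulls both actuators down at rest (a sigmoid outputs 0.5 at zero input), and the suppression nearly doubles when cognitive load and crossing proximity are both high. Whether the full network freezes depends on the other inputs to throttle and brake, which this sketch does not model.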
4. Discussion
4.1 Why the Structured Brain Scored Lower
Three factors contributed to the 11.7-point fitness gap (99.69 vs 87.99):
Over-parameterisation: 197 connections means ~197 weights to optimise simultaneously. With 300 genomes evaluated over 227 generations, evolution had ~68,100 genome evaluations in which to tune those weights. Rail-003b had both more evaluations before convergence (300 genomes over 349 generations, ~104,700) and 5.5x fewer parameters to tune - a much easier optimisation surface.
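The search budget per weight can be worked out from the population size, convergence generations, and connection counts reported in Sections 2.1 and 3.2. Genome evaluations per connection is only a crude proxy for optimisation difficulty, and the helper names are illustrative.

```python
POPULATION = 300

RUNS = {
    "Rail-003b": {"converged_gen": 349, "connections": 36},
    "Rail-004": {"converged_gen": 227, "connections": 197},
}


def evaluations(run: dict) -> int:
    """Genome evaluations performed up to convergence."""
    return POPULATION * run["converged_gen"]


def evals_per_weight(run: dict) -> float:
    """Evaluations available per connection weight (crude difficulty proxy)."""
    return evaluations(run) / run["connections"]


for name, run in RUNS.items():
    print(f"{name}: {evaluations(run):,} evaluations, "
          f"{evals_per_weight(run):,.0f} per weight")

ratio = evals_per_weight(RUNS["Rail-003b"]) / evals_per_weight(RUNS["Rail-004"])
print(f"budget ratio (flat / structured): {ratio:.1f}x")
```

On these figures the flat brain had roughly 2,900 evaluations per weight against roughly 350 for the structured brain, about an 8x difference in search budget per parameter.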
Dense initial wiring creates noise: The reflex region started with density 0.7, meaning roughly 70% of the possible connections among its 6 nodes exist from generation 0 with random weights. Most of these are evolutionary garbage - random weights that inject noise into the signal path. Evolution must either repurpose or suppress them, which consumes generations that could be spent discovering useful pathways.
The speed governor is unavailable: Rail-003b's winning strategy was a simple current_speed -> brake proportional governor that kept speed below the SPAD threshold at all times. With 24 pre-wired hidden neurons between inputs and outputs, the direct input-to-output pathway is buried under layers of regional processing. The brain can't easily implement the trivial speed governor because signals must traverse regional nodes first.
4.2 Why the Structured Brain is More Realistic
Despite lower fitness, the structured brain's behavior is arguably more human:
Richer information processing: 15 sensors influence throttle (vs 4). The driver "considers" inputs including signal aspect, distance, speed, crossing proximity, station proximity, stress, cognitive load, fatigue, route familiarity, visibility, gradient, and pre-shift fatigue before deciding on throttle. A flat brain that reads only 4 sensors is not driving - it is applying a mathematical formula.
Region specialisation matches human cognition:
- Reflex region (Step activation) produces binary danger assessments - analogous to the amygdala's threat detection
- Awareness region (Sigmoid) produces graded situational assessments - analogous to cortical processing
- Fatigue region modulates AWS response based on fatigue level - analogous to how fatigue degrades procedural compliance
Cognitive overload produces realistic failure modes: Node 37's response to high cognitive load near crossings - suppressing both throttle and brake simultaneously (freeze response) - is a documented human factors phenomenon. Under cognitive overload, drivers sometimes fail to act at all. The flat brain never exhibited this because it didn't process cognitive load in the context of crossings.
Fatigue affects specific behaviors, not general performance: The fatigue region's connection to AWS acknowledgment (but not to throttle or brake) suggests the brain learned that fatigue degrades procedural responses (acknowledging warnings) before it degrades operational responses (speed management). This matches real-world observations: fatigued drivers miss procedural checks before they miss signals.
4.3 The Optimality-Realism Trade-off
Rail-003b's flat brain achieved near-perfect fitness (99.69) through a strategy no human driver would use: maintain permanently low speed to avoid all signal encounters. Rail-004's structured brain achieved lower fitness (87.99) through a strategy that actively processes signals, manages cognitive load, and degrades realistically under fatigue.
This reveals a fundamental tension in connectome-based behavior evolution: unconstrained evolution finds degenerate shortcuts that maximise fitness without producing realistic behavior. Architectural constraints (regions) channel evolution away from these shortcuts and toward strategies that must actually process information through structured pathways - the way biological brains do.
The implication for fitness function design: if the goal is realistic behavior, the fitness function must make degenerate strategies impossible (as we did with the terminus penalty and idle-as-death rule in Rail-002), AND the brain architecture must force information through structured processing (as we did with regions in Rail-004).
4.4 Plasticity Impact
The plasticity mechanisms (Hebbian, decay, homeostatic) were active, but their impact is difficult to isolate in this experiment because plasticity and regions were introduced together and are therefore confounded. A clean comparison would require:
- Rail-004b: regions WITHOUT plasticity
- Rail-004c: plasticity WITHOUT regions (flat brain + learning)
These controlled experiments would determine whether the behavioral differences come from structure, learning, or their interaction.
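The proposed ablation is a 2x2 factorial design over regions and plasticity. The sketch below enumerates the four cells and shows how main effects and an interaction term could be separated once all four results exist. No results for Rail-004b or Rail-004c are available, so the function takes measured values as input and none are supplied here. Treating Rail-003b as the baseline cell is itself an assumption, since it also ran with a higher generation cap; `factorial_effects` is a hypothetical helper.

```python
from itertools import product

# (regions, plasticity) -> experiment label
NAMES = {
    (False, False): "Rail-003b",
    (True, False): "Rail-004b",
    (False, True): "Rail-004c",
    (True, True): "Rail-004",
}


def design():
    """Enumerate the four cells of the regions x plasticity design."""
    return [
        {"experiment": NAMES[(r, p)], "regions": r, "plasticity": p}
        for r, p in product((False, True), repeat=2)
    ]


def factorial_effects(score: dict) -> dict:
    """Split a metric into simple effects and an interaction term.

    `score` maps (regions, plasticity) to one measured value per cell, for
    example best fitness or the number of sensors influencing throttle.
    """
    base = score[(False, False)]
    regions_only = score[(True, False)] - base
    plasticity_only = score[(False, True)] - base
    interaction = score[(True, True)] - base - regions_only - plasticity_only
    return {
        "regions": regions_only,
        "plasticity": plasticity_only,
        "interaction": interaction,
    }


for cell in design():
    print(cell)
```

A non-zero interaction term would indicate that regions and plasticity do more (or less) together than the sum of their separate effects, which is the question the two follow-up runs are meant to answer.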
4.5 Evolution-Added Neurons
9 neurons evolved on top of the 24 regional nodes. All 9 use Sigmoid activation (the inter-region mutation default). Their primary role is bridging between regions: node 269 bridges speed_limit to awareness-region processing, node 1383 bridges current_speed to the awareness region, and node 1130 evolved into a dedicated throttle/brake coordinator.
This suggests that the initial regional structure was incomplete - evolution needed inter-regional pathways that the intra-region density didn't provide. A future experiment could add initial inter-region connections (pathway hints from the whitepaper) to see if this reduces the need for evolution-added bridge nodes.
5. Emergent Behaviors: What the Structured Brain Invented
| Designed (in .quale) | Emergent (evolved by brain) |
|---|---|
| 3 named regions with specific node counts | Region specialisation matching intended function |
| Step activation for reflex nodes | Binary danger assessment feeding attention |
| Sigmoid for awareness nodes | Central situation integrator (node 33) processing 8 sensors |
| Fatigue region exists | Fatigue modulates AWS acknowledgment timing specifically |
| Plasticity parameters set | Cannot isolate plasticity effect (confounded with regions) |
| Hebbian rate 0.005 | Connections between co-active pathways strengthened during scenarios |
| Inter-region connections evolve | Bridge neurons connecting speed processing to awareness |
| Cognitive load as a sensor | Freeze response under cognitive overload near crossings |
6. Cross-Experiment Summary
| Feature | Rail-001 | Rail-002 | Rail-003 | Rail-003b | Rail-004 |
|---|---|---|---|---|---|
| Throttle | Binary | Continuous | Continuous | Continuous | Continuous |
| Attention | Unrewarded | Rewarded | Causal | Causal | Causal |
| Regions | No | No | No | No | Yes (3) |
| Plasticity | No | No | No | No | Yes (all 3) |
| Best fitness | 81.70 | 99.58 | 99.03 | 99.69 | 87.99 |
| Hidden neurons | 0 | 0 | 1 | 7 | 33 |
| Connections | 11 | 13 | 22 | 36 | 197 |
| Sensors -> throttle | 0 | 2 | 2 | 4 | 15 |
| Primary strategy | Don't move | Speed governor | Active vigilance | Speed gov + vigilance | Multi-region processing |
| Human-like? | No | Partially | More | More | Most |
7. Conclusion
Rail-004 demonstrated that structured brain architecture produces qualitatively different evolved behavior than flat topologies. The 24-neuron, 3-region brain with plasticity achieved lower fitness (87.99 vs 99.69) but exhibited richer, more realistic behavior: processing 15 sensors for speed decisions, developing region-specific specialisation matching intended cognitive functions, and producing realistic failure modes (cognitive overload freeze response, fatigue-degraded procedural compliance).
The central finding: architectural constraints trade optimality for realism. Unconstrained flat brains find degenerate shortcuts. Structured brains must process information through functional pathways, producing behavior that more closely resembles human cognition - including its failure modes.
Design principle: To evolve realistic behavior, constrain the brain's architecture to match the target organism's cognitive structure. Accept lower fitness scores as the cost of authenticity. Fitness measures how well the agent games the fitness function; behavioral analysis measures how realistically it performs the task.
8. Future Directions
- Controlled ablation: Rail-004b (regions without plasticity) and Rail-004c (plasticity without regions) to isolate individual feature contributions.
- Pathway hints: Pre-seed inter-region connections (sensor -> reflex, awareness -> actuator) to reduce evolution's need for bridge neurons.
- Extended evolution: Run Rail-004 for 2000+ generations to determine if the structured brain eventually matches or exceeds flat brain fitness while retaining its richer behavior.
- Signal speed: Add multi-tick signal delay through slow regions (v0.3) to model processing latency differences between reflexive and deliberative circuits.
- Recurrence: Enable feedback loops in the fatigue management region (v0.3) to model fatigue memory - tracking accumulated fatigue over time rather than just current fatigue level.