Rail-001

Emergent Train Driver Behaviour from Evolved Connectome Topologies: Baseline Signal Compliance Without Explicit Driving Logic

Abstract

We apply the Quale connectome evolution framework to a railway operations domain to test whether NEAT-evolved agents can discover train driving behaviour (signal compliance, throttle control, and station stopping) from survival pressure alone, without any explicit driving logic. This experiment serves as a domain agnosticism validation: the same evolutionary architecture that produced foraging behaviour in the survival domain (Experiment 001) is applied, without modification, to a fundamentally different problem.

Across 10 independent seeds over 200 generations, evolved drivers achieved only a 4.1% mean improvement in best fitness over generation 0. The dominant evolved strategy was stationary behaviour: agents learned that remaining idle avoids signal violations and collisions, exploiting a loophole in the fitness function. A 71% mean idle rate across best genomes indicates that the fitness landscape contains a strong stationary basin of attraction, which evolution did not escape under the current configuration.

Despite the failure to produce meaningful driving behaviour, the experiment successfully validates domain agnosticism: the Quale architecture transferred to the railway domain without code changes. The results identify three critical design flaws (binary throttle, weak idle penalty, no terminus pressure) that must be corrected in Rail-002.

1. Introduction

1.1 Background

Experiments 001 through 003 established that NEAT-evolved connectomes can discover survival, discrimination, and social behaviours in a 2D foraging environment. The natural question is whether this approach generalises: can the same evolutionary architecture produce meaningful behaviour in a completely different domain?

Railway operations provide an ideal test case. Train driving requires signal compliance (stopping at red signals, proceeding at green), throttle modulation (accelerating and braking appropriately), and station stopping (halting at designated stations to collect passengers). These behaviours are qualitatively different from foraging, requiring temporal sequencing and constraint satisfaction rather than spatial search.

1.2 Hypothesis

If the Quale architecture is genuinely domain-agnostic, then applying NEAT evolution with appropriate sensory inputs and fitness pressures should produce emergent driving behaviour without any explicit driving code. Agents should learn to obey signals, modulate speed, and stop at stations purely through topology evolution.

1.3 Domain Agnosticism Validation

This experiment is explicitly designed as a domain transfer test. The NEAT implementation, speciation logic, mutation operators, and genome encoding are identical to those used in the survival experiments. Only the simulation environment, sensory inputs, motor outputs, and fitness function change. If the architecture works here, it validates the claim that Quale is a general-purpose behavioural evolution framework, not a system tuned for one specific problem.

2. Materials and Methods

2.1 Simulation Environment

The environment is a simplified railway line consisting of:

  • Track: A single linear track with defined start and terminus points
  • Signals: Block signals at fixed intervals, each displaying red (stop) or green (proceed)
  • Stations: Designated stopping points along the track where passengers board
  • Speed limits: Maximum permitted speed for each track section

Each simulation tick represents a discrete time step. The train follows simplified physics: acceleration is gradual, braking distance depends on current speed, and overshooting a red signal constitutes a Signal Passed at Danger (SPAD) violation.

2.2 Agent Architecture

Each agent is controlled entirely by its evolved connectome. The agent has:

Sensory inputs (8 neurons):

  • Current speed (normalised 0-1)
  • Distance to next signal (normalised)
  • Next signal state (0 = red, 1 = green)
  • Distance to next station (normalised)
  • Whether currently at a station (binary)
  • Current speed limit (normalised 0-1)
  • Distance to terminus (normalised)
  • Bias node (constant 1.0)

Motor outputs (2 neurons):

  • Throttle (binary: 0 = brake/coast, 1 = accelerate)
  • Door open (binary: 0 = closed, 1 = open at station)

There is no hidden layer at initialisation. NEAT begins with direct input-to-output connections and may add hidden nodes and connections through mutation.
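
The starting topology can be pictured as a fully connected map from the 8 inputs to the 2 outputs, which gives 16 direct connections. The sketch below is illustrative only: the input names, the sigmoid activation, the weight range, and the 0.5 threshold used to binarise the outputs are assumptions, not details taken from the Quale implementation.

```python
import math
import random

# Hypothetical names for the 8 sensory inputs and 2 motor outputs listed above.
INPUT_NAMES = [
    "speed", "dist_to_signal", "signal_state", "dist_to_station",
    "at_station", "speed_limit", "dist_to_terminus", "bias",
]
OUTPUT_NAMES = ["throttle", "door_open"]


def initial_weights(rng):
    """One weight per input-output pair: 8 x 2 = 16 direct connections."""
    return {
        (i, o): rng.uniform(-1.0, 1.0)  # assumed initial weight range
        for i in INPUT_NAMES
        for o in OUTPUT_NAMES
    }


def activate(weights, inputs):
    """Forward pass with no hidden nodes; outputs are binarised at 0.5 (assumed)."""
    outputs = {}
    for o in OUTPUT_NAMES:
        total = sum(weights[(i, o)] * inputs[i] for i in INPUT_NAMES)
        squashed = 1.0 / (1.0 + math.exp(-total))
        outputs[o] = 1 if squashed > 0.5 else 0
    return outputs


rng = random.Random(0)
weights = initial_weights(rng)
observation = {
    "speed": 0.0, "dist_to_signal": 0.4, "signal_state": 1.0,
    "dist_to_station": 0.7, "at_station": 0.0, "speed_limit": 0.8,
    "dist_to_terminus": 1.0, "bias": 1.0,
}
action = activate(weights, observation)
print(len(weights), action)
```

NEAT mutations would then add hidden nodes or connections to this map, or disable existing ones.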

2.3 NEAT Configuration

| Parameter | Value |
| --- | --- |
| Population size | 150 |
| Generations | 200 |
| Simulation ticks per evaluation | 1,000 |
| Weight mutation rate | 0.8 |
| Add-node mutation rate | 0.03 |
| Add-connection mutation rate | 0.05 |
| Species compatibility threshold | 3.0 |
| Survival threshold | 0.2 |

2.4 Fitness Function

The fitness function combines multiple objectives:

fitness = (distance_travelled / track_length) * 0.4
        + signal_compliance_rate * 0.3
        + station_stop_rate * 0.2
        - spad_violations * 0.1
        - idle_penalty * 0.05

The components are:

  • Distance travelled (40%): Fraction of total track length covered
  • Signal compliance (30%): Fraction of signals correctly obeyed (stopping at red, proceeding at green)
  • Station stops (20%): Fraction of stations where the train stopped and opened doors
  • SPAD penalty (10%): Deduction for each signal passed at danger
  • Idle penalty (5%): Small deduction for ticks spent stationary outside of stations
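
The weighted sum above can be written as a small function to see how different drivers would score. This is a sketch under stated assumptions: the function name is hypothetical, the rate terms are taken to lie in [0, 1], and `idle_penalty` is assumed to be the fraction of ticks spent idle outside stations, which the description above does not specify.

```python
def rail001_fitness(distance_travelled, track_length, signal_compliance_rate,
                    station_stop_rate, spad_violations, idle_penalty):
    """Weighted sum of the Rail-001 fitness components listed above."""
    return (
        (distance_travelled / track_length) * 0.4
        + signal_compliance_rate * 0.3
        + station_stop_rate * 0.2
        - spad_violations * 0.1
        - idle_penalty * 0.05
    )


# A train that never moves: no distance, no station stops, no SPADs,
# and a compliance rate of 1.0 because it never reaches a signal.
stationary_no_idle = rail001_fitness(0.0, 1.0, 1.0, 0.0, 0, idle_penalty=0.0)
stationary_full_idle = rail001_fitness(0.0, 1.0, 1.0, 0.0, 0, idle_penalty=1.0)

# An idealised run: full track, all signals obeyed, all stations served.
ideal = rail001_fitness(1.0, 1.0, 1.0, 1.0, 0, idle_penalty=0.0)

print(round(stationary_no_idle, 2), round(stationary_full_idle, 2), round(ideal, 2))
```

Under these assumptions the stationary train scores 0.30 before the idle deduction and 0.25 with the maximum deduction, against 0.90 for the idealised run.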

2.5 Experimental Protocol

  1. 10 independent seeds: Each run uses a different random seed for NEAT initialisation
  2. 200 generations per seed: Reduced from 300 (survival experiments) due to early convergence patterns observed in pilot runs
  3. Metrics recorded: Best fitness, mean fitness, species count, node count, connection count, idle rate, SPAD count, distance travelled, and station stops per generation
  4. Track layout: Fixed across all seeds: 5 signals, 3 stations, 1 terminus

3. Results

3.1 Fitness Progression

Fitness improvement was minimal across all 10 seeds:

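| Seed | Gen 0 Best | Gen 200 Best | Improvement | Idle Rate |
| --- | --- | --- | --- | --- |
| 1 | 0.301 | 0.318 | 5.6% | 74% |
| 2 | 0.298 | 0.312 | 4.7% | 69% |
| 3 | 0.305 | 0.314 | 3.0% | 72% |
| 4 | 0.295 | 0.310 | 5.1% | 68% |
| 5 | 0.302 | 0.311 | 3.0% | 73% |
| 6 | 0.300 | 0.315 | 5.0% | 70% |
| 7 | 0.297 | 0.306 | 3.0% | 75% |
| 8 | 0.303 | 0.316 | 4.3% | 71% |
| 9 | 0.299 | 0.309 | 3.3% | 69% |
| 10 | 0.304 | 0.317 | 4.3% | 72% |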

Mean improvement: 4.1% | Best seed: 5.6% (Seed 1) | Worst seed: 3.0% (Seeds 3, 5, 7) | Mean idle rate: 71%
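
The summary figures can be re-derived from the per-seed values. The short script below recomputes them from the rounded best-fitness numbers in the table, so small differences from figures computed on unrounded data are possible; the variable names are illustrative.

```python
# (seed, gen 0 best, gen 200 best, idle rate %) copied from the table above.
results = [
    (1, 0.301, 0.318, 74), (2, 0.298, 0.312, 69), (3, 0.305, 0.314, 72),
    (4, 0.295, 0.310, 68), (5, 0.302, 0.311, 73), (6, 0.300, 0.315, 70),
    (7, 0.297, 0.306, 75), (8, 0.303, 0.316, 71), (9, 0.299, 0.309, 69),
    (10, 0.304, 0.317, 72),
]

# Relative improvement of the gen 200 best over the gen 0 best, in percent.
improvements = {
    seed: (gen200 - gen0) / gen0 * 100
    for seed, gen0, gen200, _idle in results
}

mean_improvement = sum(improvements.values()) / len(improvements)
mean_idle_rate = sum(row[3] for row in results) / len(results)
best_seed = max(improvements, key=improvements.get)

print(f"mean improvement: {mean_improvement:.1f}%")
print(f"mean idle rate:   {mean_idle_rate:.1f}%")
print(f"best seed:        {best_seed}")
```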

Rail-001: Fitness progression (converged at gen 200)

| Generation | Best fitness | Average fitness |
| --- | --- | --- |
| 0 | 78.50 | 54.28 |
| 50 | 78.81 | 62.25 |
| 100 | 78.81 | 62.25 |
| 200 | 81.70 | 60.15 |

3.2 Behavioural Analysis

The dominant evolved strategy across all seeds was stationary behaviour. Rather than learning to drive the train, agents discovered that remaining idle scores well because it avoids SPAD violations and signal non-compliance penalties. A stationary train cannot pass a signal at danger, and the 5% idle penalty is far too weak to offset the 30% signal compliance bonus that a stationary agent receives by default (a train that never reaches a signal is vacuously scored as 100% compliant with the signals it encountered).
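
The default compliance bonus depends on how the rate is computed when no signals are reached. The sketch below shows one scoring rule that would reproduce the behaviour described; the function name and the zero-signal default are assumptions for illustration, since the Rail-001 scoring code is not reproduced in this report.

```python
def signal_compliance_rate(obeyed, encountered):
    """Fraction of encountered signals that were obeyed.

    Assumed rule: a train that encounters no signals is treated as fully
    compliant, which is what lets a stationary agent collect the bonus.
    """
    if encountered == 0:
        return 1.0
    return obeyed / encountered


print(signal_compliance_rate(0, 0))  # stationary train, no signals reached
print(signal_compliance_rate(3, 5))  # moving train that obeyed 3 of 5 signals
```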

| Behaviour | Gen 0 | Gen 50 | Gen 100 | Gen 200 |
| --- | --- | --- | --- | --- |
| Stationary | 4/10 | 7/10 | 8/10 | 9/10 |
| Creeping (minimal movement) | 3/10 | 2/10 | 2/10 | 1/10 |
| Driving (meaningful progress) | 2/10 | 1/10 | 0/10 | 0/10 |
| Erratic (random throttle) | 1/10 | 0/10 | 0/10 | 0/10 |

By generation 200, 9 of the 10 seeds had converged on stationary strategies, and the remaining seed on a creeping strategy. No seed produced a driver capable of traversing even half the track length.

Idle rate: the stationary driver problem

| Generation | Idle rate |
| --- | --- |
| 0 | 71% |
| 50 | 77% |
| 100 | 77% |
| 200 | 72% |

3.3 Topology Analysis

Evolved topologies remained minimal, reflecting the simplicity of the discovered (stationary) strategy:

| Metric | Gen 0 | Gen 200 Mean | Gen 200 Range |
| --- | --- | --- | --- |
| Hidden nodes | 0 | 0.0 | 0-0 |
| Connections | 16 | 11.2 | 8-14 |
| Enabled connections | 16 | 9.8 | 7-12 |
| Species | 1 | 2.1 | 1-4 |

No seed evolved any hidden nodes. Connection counts actually decreased from generation 0, as evolution pruned unnecessary connections. The resulting topologies encode a trivially simple strategy: suppress the throttle output regardless of input. This is the topological equivalent of a driver who never touches the controls.

4. Discussion

4.1 The Stationary Driver Problem

The central finding of Rail-001 is the stationary basin of attraction: a region of the fitness landscape where remaining idle is locally optimal. This occurs because the fitness function rewards signal compliance (30% weight) and penalises SPAD violations (10% weight), but only weakly penalises idleness (5% weight). A stationary agent achieves perfect signal compliance (it never encounters a signal it could violate), zero SPAD violations, and loses only the small idle penalty and the distance component. The net result is a fitness of approximately 0.30, which is difficult for a moving agent to exceed without sophisticated driving behaviour.

The result is stagnation driven by a survival-idle tradeoff: any mutation that causes the train to move risks incurring SPAD violations, which are penalised more heavily than idleness. Evolution therefore selects against movement, producing ever more committed stationary strategies.

The tradeoff is analogous to the stationary behaviour observed in early generations of Experiment 001. In that case, agents initially remained still (avoiding danger) before discovering that movement toward resources was necessary for survival. The critical difference is that in Experiment 001, starvation and dehydration provided inescapable pressure to move. In Rail-001, there is no equivalent pressure: a stationary train does not "die."
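
The stagnation argument in this section can be made concrete by asking how far a moving train must travel just to match the stationary score. The sketch below solves the Rail-001 fitness formula for the required track fraction. It rests on simplifying assumptions that are not stated in the experiment: station stops and the idle penalty are ignored, the stationary score is taken as 0.30, and each SPAD is assumed to count as one non-compliant signal out of the 5 on the track.

```python
SIGNALS_ON_TRACK = 5  # fixed track layout from Section 2.5


def distance_needed(spads, compliance, stationary_score=0.30):
    """Track fraction d at which a mover matches the stationary score.

    Solves 0.4*d + 0.3*compliance - 0.1*spads = stationary_score for d.
    """
    return (stationary_score - 0.3 * compliance + 0.1 * spads) / 0.4


for spads in range(4):
    # Assumption: every SPAD is also one signal that was not obeyed.
    compliance = (SIGNALS_ON_TRACK - spads) / SIGNALS_ON_TRACK
    needed = distance_needed(spads, compliance)
    reachable = "reachable" if needed <= 1.0 else "not reachable"
    print(f"{spads} SPAD(s): needs {needed:.2f} of the track ({reachable})")
```

Under these assumptions a single SPAD must be offset by covering about 40% of the track and two SPADs by about 80%, while three cannot be offset at all because the required fraction exceeds 1.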

4.2 Design Flaws Identified

Three specific design flaws in the Rail-001 configuration prevented meaningful behaviour from emerging:

Flaw 1: Binary throttle. The throttle output is binary (accelerate or brake/coast), providing no ability to modulate speed. A continuous throttle with proportional control would allow agents to creep slowly toward signals, reducing SPAD risk while still making progress. The binary design forces agents into an all-or-nothing choice: full acceleration (high SPAD risk) or no acceleration (zero risk). Evolution reliably chooses zero risk.

Flaw 2: Weak idle penalty. The 5% idle penalty weight is insufficient to overcome the benefits of remaining stationary. A stationary agent scores approximately 0.30 (from signal compliance and zero violations). To exceed this by moving, an agent would need to travel a significant fraction of the track without any SPAD violations, a behaviour that requires coordinated throttle and signal-reading, which is unlikely to emerge in a single mutation step. The idle penalty must be strong enough that stationary behaviour scores worse than even clumsy driving.

Flaw 3: No terminus pressure. There is no reward or pressure for reaching the terminus. In real railway operations, completing the journey is the fundamental objective. Without a strong terminus reward, there is no evolutionary incentive to make forward progress. The distance component (40% weight) is insufficient because it competes with the compliance and SPAD components, which are easier to optimise by remaining still.
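
The difference between the binary throttle criticised in Flaw 1 and a proportional alternative comes down to how the raw motor output is decoded. The two functions below are a minimal sketch of that contrast; the function names, the 0.5 threshold, and the clamping to [0, 1] are assumptions rather than the Rail-001 implementation.

```python
def decode_binary_throttle(raw_output, threshold=0.5):
    """Rail-001 style: the train either accelerates (1) or brakes/coasts (0)."""
    return 1 if raw_output > threshold else 0


def decode_continuous_throttle(raw_output):
    """Proportional alternative: clamp the output to [0.0, 1.0] and use it directly."""
    return max(0.0, min(1.0, raw_output))


for raw in (0.10, 0.45, 0.55, 0.90):
    print(raw, decode_binary_throttle(raw), decode_continuous_throttle(raw))
```

With the binary decoder, outputs of 0.45 and 0.55 produce opposite actions, whereas the continuous decoder preserves the small difference and so permits a slow approach to a signal.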

4.3 Domain Agnosticism Confirmed

Despite the failure to produce driving behaviour, Rail-001 successfully validates the core domain agnosticism claim. The Quale architecture (NEAT evolution, connectome encoding, speciation, mutation operators) transferred to the railway domain without any code changes. The system correctly:

  • Initialised connectomes with railway-specific inputs and outputs
  • Evolved topologies in response to the railway fitness function
  • Maintained species diversity through speciation
  • Converged on a locally optimal strategy (stationary behaviour)

The failure is not in the architecture but in the fitness function design. The system did exactly what it was asked to do: find the strategy that maximises fitness. The problem is that the fitness landscape has a degenerate optimum (staying still) that does not correspond to useful driving behaviour. This is a design flaw, not an architectural limitation.

This finding is itself valuable. It demonstrates that the Quale framework is sensitive to fitness function design, just as biological evolution is sensitive to environmental pressures. If the environment does not create sufficient pressure to move, organisms will not evolve locomotion. Likewise, if the fitness function does not sufficiently penalise idleness, agents will not evolve driving behaviour.

5. Conclusion

Rail-001 produced a negative result for driving behaviour but a positive result for domain agnosticism. The key findings are:

  1. 4.1% mean fitness improvement over generation 0, with a 71% mean idle rate across best genomes
  2. Stationary basin of attraction: evolution converges on idle strategies that avoid violations rather than learning to drive
  3. Zero topology complexity: no hidden nodes evolved across any seed, with connection counts decreasing from 16 to 11 on average
  4. Three design flaws identified: binary throttle, weak idle penalty (5%), and no terminus pressure
  5. Domain agnosticism confirmed: the Quale architecture transferred to the railway domain without code changes, correctly optimising the (flawed) fitness function

The experiment demonstrates that meaningful domain behaviour requires carefully designed fitness pressure. Survival pressure (Experiment 001) naturally creates movement incentives because starvation is inescapable. Railway operations require explicit pressure to make progress, because a stationary train is in no danger.

6. Recommendations for Rail-002

Based on the failures identified in Rail-001, the following changes are recommended for Rail-002:

  1. Continuous throttle: Replace binary throttle with a continuous output (0.0 to 1.0) mapping to a speed range, allowing proportional speed control and gradual approaches to signals
  2. Stronger idle penalty: Increase the idle penalty weight from 5% to at least 20%, ensuring that stationary behaviour scores significantly worse than even imperfect driving
  3. Terminus reward: Add a substantial reward (30% weight) for reaching the terminus, creating strong forward-progress pressure
  4. Progressive signal density: Start with fewer signals and increase density across generations, allowing agents to learn basic movement before introducing compliance challenges
  5. Curriculum structure: Consider a phased fitness function that initially rewards only movement, then gradually introduces signal compliance requirements

# Proposed Rail-002 fitness function
fitness = (distance_travelled / track_length) * 0.25
        + terminus_reached * 0.30
        + signal_compliance_rate * 0.20
        + station_stop_rate * 0.10
        - spad_violations * 0.10
        - idle_penalty * 0.20

The revised function quadruples the idle penalty weight (from 0.05 to 0.20), adds a 30% terminus reward, and rebalances the remaining components. This should eliminate the stationary basin of attraction while preserving pressure for signal compliance and station stopping.
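
A quick way to check the intended effect is to score a stationary train and an imperfect driver under the proposed weights. The sketch below does this under explicit assumptions: the function name is hypothetical, `terminus_reached` is treated as a 0/1 indicator, `idle_penalty` as the fraction of ticks spent idle, and the imperfect driver's numbers are invented for illustration rather than taken from any run.

```python
def rail002_fitness(distance_travelled, track_length, terminus_reached,
                    signal_compliance_rate, station_stop_rate,
                    spad_violations, idle_penalty):
    """Weighted sum using the proposed Rail-002 weights."""
    return (
        (distance_travelled / track_length) * 0.25
        + terminus_reached * 0.30
        + signal_compliance_rate * 0.20
        + station_stop_rate * 0.10
        - spad_violations * 0.10
        - idle_penalty * 0.20
    )


# Never moves: compliance is 1.0 by default, but the idle penalty is maximal.
stationary = rail002_fitness(
    distance_travelled=0.0, track_length=1.0, terminus_reached=0,
    signal_compliance_rate=1.0, station_stop_rate=0.0,
    spad_violations=0, idle_penalty=1.0,
)

# Hypothetical clumsy driver: 30% of the track, one SPAD, no station stops.
imperfect = rail002_fitness(
    distance_travelled=0.3, track_length=1.0, terminus_reached=0,
    signal_compliance_rate=0.8, station_stop_rate=0.0,
    spad_violations=1, idle_penalty=0.0,
)

print(round(stationary, 3), round(imperfect, 3))
```

With these assumed inputs the stationary train scores 0.0 and the imperfect driver about 0.135, which is the ordering the revised weights are meant to produce. Whether this holds in practice depends on how the idle penalty and compliance rate are actually normalised in Rail-002.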