Scale from 1 to 100 Test Stations

Architecture guide for scaling test infrastructure from 1 to 100 stations, covering naming, script distribution, config management, networking, and monitoring.

Julien Buteau
Advanced · 12 min read · March 14, 2026

Run one station or a hundred: TofuPilot handles both, but the architecture decisions you make at 10 stations determine whether you succeed at 100. This guide walks through the changes needed at each scale threshold.

Scaling Challenges by Stage

| Stations | What breaks | Root cause |
| --- | --- | --- |
| 1 | Nothing | You're fine |
| 5-10 | Config drift | Manual setup per station |
| 10-25 | Script version mismatch | No deployment pipeline |
| 25-50 | Debugging blindness | No centralized logs |
| 50-100 | Network saturation | All stations hit cloud simultaneously |
| 100+ | Identity collision | Non-unique station names |

Architecture at Each Scale

1 Station

A single machine running your test script. No special infrastructure needed.

[DUT] -> [Test Station] -> [TofuPilot Cloud]

10 Stations

At 10 stations, the bottleneck is configuration. You need a shared config source, consistent naming, and a way to push script updates.

[10 DUTs] -> [10 Test Stations] -> [TofuPilot Cloud]
                     |
          [Git repo for scripts]

100 Stations

At 100, you need a deployment pipeline, centralized monitoring, and staggered upload scheduling.

[100 DUTs] -> [100 Test Stations] -> [TofuPilot Cloud]
                      |                      |
               [CI/CD pipeline]      [Health dashboard]
               [Config server]

Station Naming and Registration

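TofuPilot identifies stations by name, so every name must be unique. If two stations share a name, results from both are attributed to the same station, which corrupts per-station analytics. Use a fixed naming pattern:

{site}-{line}-{station_number}

# Examples:
taipei-line1-001
munich-eol-001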

Set the station name as an environment variable:

/etc/environment
TOFUPILOT_STATION_ID=taipei-line1-001
TOFUPILOT_API_KEY=tp_live_xxxxxxxxxxxx
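
Because a malformed or duplicated name only shows up later as skewed analytics, it can help to check names before a station goes live. The following is a minimal sketch, not part of TofuPilot: the regular expression is an assumption inferred from the two example names (lowercase alphanumeric site and line, three-digit station number), and `parse_station_id` and `find_duplicates` are hypothetical helper names.

```python
import re

# Assumed shape of {site}-{line}-{station_number}: lowercase alphanumerics for
# site and line, and a zero-padded three-digit station number.
STATION_ID_PATTERN = re.compile(r"^([a-z0-9]+)-([a-z0-9]+)-(\d{3})$")


def parse_station_id(station_id):
    """Split a station ID into (site, line, number) or raise ValueError."""
    match = STATION_ID_PATTERN.match(station_id)
    if match is None:
        raise ValueError(f"Invalid station ID: {station_id!r}")
    site, line, number = match.groups()
    return site, line, int(number)


def find_duplicates(station_ids):
    """Return the sorted IDs that appear more than once in an inventory list."""
    seen, duplicates = set(), set()
    for station_id in station_ids:
        if station_id in seen:
            duplicates.add(station_id)
        seen.add(station_id)
    return sorted(duplicates)
```

A station could call `parse_station_id(os.environ["TOFUPILOT_STATION_ID"])` at startup to fail fast, and `find_duplicates` could run against the fleet inventory before each rollout.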

Test Script Distribution

Option 1: Git Pull (1-20 stations)

/etc/cron.d/update-test-script
0 6 * * * testuser cd /opt/testscripts && git pull origin main

Pros: Simple. Cons: Requires network access to Git host. Fails silently.
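
One way to address the silent failure is to have cron run a small wrapper instead of a bare `git pull`. This is a sketch under assumptions: the function names are hypothetical, the repository path and branch are taken from the cron entry above, and `--ff-only` is added so that a diverged checkout is reported rather than merged.

```python
import subprocess
import sys


def update_scripts(repo_dir, run=subprocess.run):
    """Fast-forward the checkout and return (ok, message)."""
    try:
        result = run(
            ["git", "-C", repo_dir, "pull", "--ff-only", "origin", "main"],
            capture_output=True,
            text=True,
            timeout=120,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # git is missing, not executable, or the pull hung.
        return False, f"git pull could not run: {exc}"
    if result.returncode != 0:
        return False, f"git pull failed ({result.returncode}): {result.stderr.strip()}"
    return True, result.stdout.strip()


def main(repo_dir="/opt/testscripts"):
    """Report the outcome and return an exit code that cron or a monitor can act on."""
    ok, message = update_scripts(repo_dir)
    print(message, file=sys.stdout if ok else sys.stderr)
    return 0 if ok else 1
```

Calling `sys.exit(main())` from the cron job makes a failed update visible through the exit code and stderr. The `run` parameter exists only so the function can be exercised without a real repository.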

Option 2: PyInstaller Binary (20-50 stations)

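build_and_deploy.sh
#!/bin/bash
# Abort on the first error so a failed build is never pushed to stations.
set -euo pipefail

# Build a single-file executable at dist/test_runner.
pyinstaller --onefile test_main.py --name test_runner

for station in taipei-line1-{001..020}; do
  # Copy under a temporary name, then rename, so the running binary is swapped atomically.
  rsync -az dist/test_runner "${station}:/opt/testscripts/test_runner_new"
  ssh "${station}" "mv /opt/testscripts/test_runner_new /opt/testscripts/test_runner"
done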

Pros: No Python on stations. Cons: Slower iteration, larger binary.

Option 3: Docker (50-100 stations)

Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY test_main.py .
CMD ["python", "test_main.py"]

docker-compose.yml
version: "3.9"
services:
  test_runner:
    image: your-registry/test-runner:latest
    restart: unless-stopped
    environment:
      - TOFUPILOT_STATION_ID
      - TOFUPILOT_API_KEY
    devices:
      - /dev/ttyUSB0:/dev/ttyUSB0

Update all stations:

deploy.sh
#!/bin/bash
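# Point at the compose file explicitly; the SSH session does not start in /opt/testrunner.
parallel-ssh -h stations.txt "docker compose -f /opt/testrunner/docker-compose.yml pull && docker compose -f /opt/testrunner/docker-compose.yml up -d"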

Distribution Method Comparison

| Method | Stations | Python on station | Update speed | Rollback |
| --- | --- | --- | --- | --- |
| Git pull | 1-20 | Yes | Fast | git revert |
| PyInstaller | 20-50 | No | Medium | Replace binary |
| Docker | 50-100 | No | Fast | Previous image tag |

Centralized Configuration

Never hardcode values that vary per station or product.

| Config tier | Scope | Example |
| --- | --- | --- |
| Environment variable | Per station | TOFUPILOT_STATION_ID, TOFUPILOT_API_KEY |
| Config file | Per product | Voltage limits, timing thresholds |
| Test script | Per test phase | OpenHTF phase logic |

config/product_v2.yaml
voltage_rail_3v3:
  min: 3.2
  max: 3.4
voltage_rail_5v:
  min: 4.8
  max: 5.2
boot_time_ms:
  min: 0
  max: 3000
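
test_main.py
import yaml
import openhtf as htf
from openhtf.util import units
from tofupilot.openhtf import TofuPilot

# Load the per-product limits once, before the phases are defined.
with open("config/product_v2.yaml") as f:
    cfg = yaml.safe_load(f)

def read_voltage_rail(rail):
    # Placeholder: replace with the call into your instrument driver.
    raise NotImplementedError(f"No instrument driver wired up for rail {rail}")

@htf.measures(
    htf.Measurement("voltage_3v3").in_range(
        cfg["voltage_rail_3v3"]["min"],
        cfg["voltage_rail_3v3"]["max"]
    ).with_units(units.VOLT)
)
def phase_power_check(test):
    voltage = read_voltage_rail("3v3")
    test.measurements.voltage_3v3 = voltage

if __name__ == "__main__":
    test = htf.Test(phase_power_check)
    with TofuPilot(test):
        # Placeholder serial number: replace with your scan or prompt.
        test.execute(lambda: "SN-PLACEHOLDER")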
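
Once limits live in a shared file, a typo such as swapped `min` and `max` values reaches every station at once, so a sanity check before the first phase runs can be worthwhile. The sketch below assumes the config has the flat shape shown in `product_v2.yaml` (each top-level key maps to a `min` and a `max`). `validate_limits` is a hypothetical helper, not an OpenHTF or TofuPilot function.

```python
def validate_limits(cfg):
    """Return a list of problems found in a {name: {"min": x, "max": y}} mapping."""
    problems = []
    for name, limits in cfg.items():
        if not isinstance(limits, dict) or not {"min", "max"}.issubset(limits):
            problems.append(f"{name}: expected 'min' and 'max' keys")
            continue
        low, high = limits["min"], limits["max"]
        # bool is a subclass of int in Python, so reject it explicitly.
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in (low, high)
        )
        if not numeric:
            problems.append(f"{name}: limits must be numeric")
        elif low > high:
            problems.append(f"{name}: min {low} is greater than max {high}")
    return problems
```

Passing the dictionary returned by `yaml.safe_load` to `validate_limits` and refusing to start when the list is non-empty keeps a bad config from producing misleading pass or fail results.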

Network Topology

Stations communicate only outbound to TofuPilot. No inbound ports required.

| Destination | Port | Purpose |
| --- | --- | --- |
| tofupilot.app | 443 | Test result upload |
| Your Git host | 443 | Script updates |
| Your container registry | 443 | Image pulls |

At 100 stations, stagger uploads to avoid thundering herd:

upload_utils.py
import time
import random

def upload_with_jitter(upload_fn, max_jitter_seconds=10):
    time.sleep(random.uniform(0, max_jitter_seconds))
    upload_fn()
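
Random jitter spreads uploads on average, but the delay changes on every run. An alternative is to derive a fixed offset from the station name so each station always lands in the same slot. This is a sketch of that variation, not something TofuPilot provides: `stagger_offset` and the 60-second default window are assumptions.

```python
import hashlib


def stagger_offset(station_id, window_seconds=60):
    """Map a station ID to a stable delay in [0, window_seconds)."""
    digest = hashlib.sha256(station_id.encode("utf-8")).digest()
    # Interpret the first 4 bytes as an integer and scale it into the window.
    fraction = int.from_bytes(digest[:4], "big") / 2**32
    return fraction * window_seconds
```

A station would call `time.sleep(stagger_offset(os.environ["TOFUPILOT_STATION_ID"]))` before uploading. Because the offset depends only on the name, it is reproducible across restarts, which makes upload timing easier to reason about when debugging network load.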

Monitoring Station Health

TofuPilot tracks per-station yield, throughput, and test duration automatically. Open the Analytics tab filtered by station to spot degradation.

Station Health Checklist

| Check | Interval | Action on failure |
| --- | --- | --- |
| Heartbeat to TofuPilot | 1 min | Alert on-call, check network |
| Test script version | On start | Auto-update via pipeline |
| Disk space | 1 hour | Archive old logs, alert if under 5 GB |
| USB fixture connectivity | Before each run | Fail test with clear error |
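
The disk space and fixture checks from the checklist can be sketched with the Python standard library alone. The function names are hypothetical, the device path is borrowed from the Docker Compose example, and the 5 GB threshold is interpreted here as 5 GiB.

```python
import os
import shutil

MIN_FREE_BYTES = 5 * 1024**3  # checklist threshold, taken as 5 GiB


def disk_space_ok(path="/", min_free_bytes=MIN_FREE_BYTES):
    """True if the filesystem holding `path` has at least min_free_bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes


def fixture_connected(device="/dev/ttyUSB0"):
    """True if the device node exists; the fixture may still be unresponsive."""
    return os.path.exists(device)


def preflight(path="/", device="/dev/ttyUSB0", min_free_bytes=MIN_FREE_BYTES):
    """Return human-readable failures; an empty list means the run can start."""
    failures = []
    if not disk_space_ok(path, min_free_bytes):
        failures.append(f"Less than {min_free_bytes / 1024**3:.0f} GB free on {path}")
    if not fixture_connected(device):
        failures.append(f"Fixture not found at {device}")
    return failures
```

Running `preflight()` before each test and failing early with the returned messages gives operators the clear error the checklist asks for, instead of a serial timeout halfway through a run.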

Station Bootstrap Script

bootstrap.sh
#!/bin/bash
set -e

STATION_ID=$1
API_KEY=$2

if [ -z "$STATION_ID" ] || [ -z "$API_KEY" ]; then
  echo "Usage: bootstrap.sh <station-id> <api-key>"
  exit 1
fi

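# Remove any earlier entries first so re-running the script does not append duplicates.
touch /etc/environment
sed -i '/^TOFUPILOT_STATION_ID=/d; /^TOFUPILOT_API_KEY=/d' /etc/environment
echo "TOFUPILOT_STATION_ID=${STATION_ID}" >> /etc/environment
echo "TOFUPILOT_API_KEY=${API_KEY}" >> /etc/environment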

curl -fsSL https://get.docker.com | sh

mkdir -p /opt/testrunner
cat > /opt/testrunner/docker-compose.yml << EOF
version: "3.9"
services:
  test_runner:
    image: your-registry/test-runner:latest
    restart: unless-stopped
    env_file: /etc/environment
EOF

docker compose -f /opt/testrunner/docker-compose.yml up -d
echo "Station ${STATION_ID} is running."

terminal
sudo bash bootstrap.sh taipei-line1-042 tp_live_xxxxxxxxxxxx
