A test station that's silently degrading is worse than one that's down. Slow tests eat throughput. Dropping yields hide root causes. You need visibility into station health before problems hit your production line.
This guide covers what to monitor, how to capture station health metrics inside your OpenHTF tests, and how to detect drift using TofuPilot's analytics.
## What to Monitor
Four metrics tell you most of what you need to know about a test station.
- Test throughput. Units per hour, per station. A drop means something changed: slower tests, more retests, or operator delays.
- Pass rate trends. First pass yield (FPY) over time, not just today's number. A slow decline from 97% to 93% over two weeks is easy to miss in daily reports.
- Average test duration. Track per-phase and total. If your calibration phase went from 4s to 12s, the instrument connection is probably degrading.
- Station errors. Uncaught exceptions, instrument timeouts, fixture faults. These don't always fail the DUT, but they signal trouble.
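These metrics can also be computed directly from per-run records. A minimal sketch, assuming a hypothetical list of run dicts; the field names (`outcome`, `duration_s`, `retest`) are illustrative, not a TofuPilot schema:

```python
from statistics import mean

# Hypothetical run records as a station log might store them;
# the field names are illustrative, not a TofuPilot schema.
runs = [
    {"outcome": "PASS", "duration_s": 41.2, "retest": False},
    {"outcome": "FAIL", "duration_s": 43.8, "retest": False},
    {"outcome": "PASS", "duration_s": 44.1, "retest": True},
    {"outcome": "PASS", "duration_s": 42.5, "retest": False},
]

# First pass yield: passes on the first attempt over all first attempts.
first_attempts = [r for r in runs if not r["retest"]]
fpy = sum(r["outcome"] == "PASS" for r in first_attempts) / len(first_attempts)

# Average test duration across all runs, retests included.
avg_duration = mean(r["duration_s"] for r in runs)

print(f"FPY: {fpy:.1%}, avg duration: {avg_duration:.1f}s")
```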
## Capture Station Health Metrics
You can log station health data (CPU, memory, disk) as OpenHTF measurements alongside your DUT tests. This gives you a per-run snapshot of station condition.
```python
import openhtf as htf
import psutil


@htf.measures(
    htf.Measurement("cpu_percent").in_range(maximum=90),
    htf.Measurement("memory_percent").in_range(maximum=85),
    htf.Measurement("disk_percent").in_range(maximum=90),
    htf.Measurement("disk_read_mb"),
    htf.Measurement("cpu_temp"),
)
def station_health_check(test):
    """Capture station health metrics before running DUT tests."""
    test.measurements.cpu_percent = psutil.cpu_percent(interval=1)
    test.measurements.memory_percent = psutil.virtual_memory().percent
    test.measurements.disk_percent = psutil.disk_usage("/").percent
    test.measurements.disk_read_mb = psutil.disk_io_counters().read_bytes / (1024 * 1024)

    # CPU temperature (Linux only; sensors_temperatures() returns an
    # empty dict on unsupported platforms)
    temps = psutil.sensors_temperatures()
    if temps and "coretemp" in temps:
        test.measurements.cpu_temp = temps["coretemp"][0].current
    else:
        test.measurements.cpu_temp = 0.0
```

Add this phase at the start of your test sequence. If CPU or memory is pegged, you'll see it in TofuPilot before it causes flaky test results.
```python
import openhtf as htf
from openhtf.util import units
from tofupilot.openhtf import TofuPilot

from station_health import station_health_check


# Your DUT test phases
@htf.measures(htf.Measurement("voltage_3v3").in_range(3.1, 3.5).with_units(units.VOLT))
def test_power_rail(test):
    test.measurements.voltage_3v3 = 3.28


def main():
    test = htf.Test(
        station_health_check,
        test_power_rail,
    )
    with TofuPilot(test):
        test.execute(test_start=lambda: "DUT-001")


if __name__ == "__main__":
    main()
```

## Detect Performance Drift in TofuPilot
TofuPilot tracks test duration, pass rates, and measurement trends per station automatically. Use the Analytics tab to spot drift:
- Test duration trend. Filter by station and check whether average test time is increasing. A 15%+ increase over baseline signals instrument connection degradation, fixture wear, or background process interference.
- FPY by station. Compare yield across stations running the same procedure. A station with 3+ points lower FPY than its neighbors needs fixture inspection.
- Measurement histograms. Check whether station health measurements (CPU, memory, disk) are creeping toward their limits over time.
- Failure Pareto. If one station accounts for a disproportionate share of failures, investigate that station's fixture and connections.
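The duration-trend rule above can also be checked locally against exported run data. A minimal sketch, assuming a plain list of per-run durations; the window sizes and the 15% threshold are illustrative defaults:

```python
from statistics import mean

def duration_drift(durations, baseline_window=20, recent_window=10, threshold=0.15):
    """Flag drift when the recent average exceeds baseline by more than threshold.

    Window sizes and threshold are illustrative, matching the 15%-over-baseline
    rule described above.
    """
    baseline = mean(durations[:baseline_window])
    recent = mean(durations[-recent_window:])
    return (recent - baseline) / baseline > threshold

# 20 runs near 40s as baseline, then 10 runs creeping up toward 49s.
history = [40.0] * 20 + [45.0, 46.0, 47.0, 48.0, 49.0] * 2
print(duration_drift(history))  # recent avg 47.0s vs 40.0s baseline -> True
```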
## Monitoring Checklist
| Metric | Frequency | Threshold | Action |
|---|---|---|---|
| Test throughput (units/hr) | Hourly | Below 80% of target | Check for operator delays, instrument timeouts |
| First pass yield | Per shift | Below 95% (or your target) | Investigate top failing phases |
| Average test duration | Daily | More than 15% above baseline | Check instrument connections, fixture wear |
| CPU usage | Per run | Above 90% | Close background processes, check for memory leaks |
| Memory usage | Per run | Above 85% | Restart station, check for leaking test processes |
| Disk usage | Daily | Above 90% | Clean logs, archive old data |
| Station errors | Per run | Any uncaught exception | Fix root cause, add error handling |
| Instrument timeout rate | Daily | Above 1% | Check cables, GPIB/USB connections |
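The per-run rows of this checklist can be enforced in code as well. A small sketch; the metric names and dict shape are assumptions, only the thresholds come from the table:

```python
# Per-run limits mirroring the checklist above; the metric names
# and the dict-based input format are illustrative assumptions.
PER_RUN_LIMITS = {
    "cpu_percent": 90,
    "memory_percent": 85,
}

def check_station(metrics):
    """Return the names of metrics that exceed their checklist limit."""
    return [name for name, limit in PER_RUN_LIMITS.items()
            if metrics.get(name, 0) > limit]

violations = check_station({"cpu_percent": 94, "memory_percent": 60})
print(violations)  # ['cpu_percent']
```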