
Run-over-Run Test Comparison with TofuPilot

Learn how to compare hardware test runs side by side in TofuPilot to diagnose failures and track measurement changes across units.

Julien Buteau
Beginner · 8 min read · March 13, 2026

When a unit fails, the first question is always: "What's different from a passing unit?" Run-over-run comparison in TofuPilot lets you put two or more test runs side by side and see exactly where they diverge.

When to Use Run Comparison

  • Diagnosing a failure: Compare a failed run to a recent passing run for the same procedure
  • Investigating a retest: Compare first test to retest for the same unit
  • Validating a fix: Compare runs before and after a corrective action
  • Tracking unit history: Compare the same unit's results across different test stages
  • Benchmarking stations: Compare the same unit tested on different stations

How Run Comparison Works

TofuPilot stores every measurement from every run. When you compare runs, the system aligns measurements by name and shows the values side by side with their limits.

Measurement     Run A (Pass)     Run B (Fail)
────────────    ────────────     ────────────
vcc_3v3         3.30 V    ✓      3.28 V     ✓
vcc_1v8         1.81 V    ✓      1.74 V     ✗
clk_freq        24.00 MHz ✓      23.98 MHz  ✓
boot_time       320 ms    ✓      1,240 ms   ✗
current_idle    45 mA     ✓      78 mA      ✗

Three measurements differ significantly. The 1.8V rail is low, boot time is 4x longer, and idle current is 73% higher. These symptoms together point to a partial short on the 1.8V power rail causing excess current draw and slow boot.
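The arithmetic behind that diagnosis is simple percentage change. Here is a minimal sketch with the table's values hard-coded (the data is illustrative, not pulled from TofuPilot), flagging anything that moved by more than a few percent:

```python
# (Run A, Run B) value pairs from the comparison table above
measurements = {
    "vcc_3v3":      (3.30, 3.28),
    "vcc_1v8":      (1.81, 1.74),
    "clk_freq":     (24.00, 23.98),
    "boot_time":    (320.0, 1240.0),
    "current_idle": (45.0, 78.0),
}

# Percentage change from Run A to Run B for each measurement
pct_change = {name: (b - a) / a * 100 for name, (a, b) in measurements.items()}

for name, pct in pct_change.items():
    flag = "<-- investigate" if abs(pct) > 3 else ""
    print(f"{name:14s} {pct:+7.1f}% {flag}")
```

Running this flags exactly the three suspects: vcc_1v8 (-3.9%), boot_time (+287.5%), and current_idle (+73.3%), while the 3.3 V rail and clock frequency move well under 1%.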

Comparing Runs in TofuPilot

Step 1: Find the Runs

Navigate to the procedure page and filter to find the runs you want to compare. Common filters:

Filter            Use case
──────            ────────
Status: Failed    Find failing runs to diagnose
Serial number     Find all runs for a specific unit
Date range        Narrow to a time period
Station           Compare across stations

Step 2: Select Runs for Comparison

Select two or more runs from the run list. TofuPilot aligns their measurements by step name and measurement name.

Step 3: Read the Comparison

Focus on measurements where the values differ significantly. Small variations (3.30V vs. 3.31V) are normal measurement noise. Large deviations (1.81V vs. 1.74V) indicate a real difference.

Color coding helps:

  • Green: Both values within limits
  • Red: Value outside limits
  • Yellow: Value within limits but significantly different from the reference
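As a sketch of that logic, a small classifier can reproduce the three colors. The function, its 2% noise threshold, and the limit values below are illustrative assumptions, not TofuPilot's actual rules:

```python
def classify(value, low, high, reference=None, noise_pct=2.0):
    """Classify a measurement the way the comparison view colors it.

    Hypothetical logic: thresholds and behavior are illustrative only.
    """
    if not (low <= value <= high):
        return "red"      # outside limits
    if reference is not None and abs(value - reference) / abs(reference) * 100 > noise_pct:
        return "yellow"   # within limits, but notably different from the reference
    return "green"        # within limits and close to the reference

# Example: 1.8 V rail with limits 1.75-1.85 V, reference run measured 1.81 V
print(classify(1.81, 1.75, 1.85, reference=1.81))  # green
print(classify(1.76, 1.75, 1.85, reference=1.81))  # yellow (in limits, -2.8% off)
print(classify(1.74, 1.75, 1.85, reference=1.81))  # red (below lower limit)
```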

Common Comparison Patterns

Pattern 1: Single Measurement Failure

One measurement fails while everything else is identical. This usually means:

  • Component value out of tolerance
  • Solder defect on that specific circuit
  • Test probe contact issue (retest to confirm)

Pattern 2: Correlated Failures

Multiple related measurements fail together (e.g., voltage low + current high + boot slow). This points to a systemic issue:

  • Power rail problem affecting multiple circuits
  • Firmware crash causing downstream test failures
  • Fixture contact issue on a shared connection

Pattern 3: All Measurements Shifted

Every measurement is slightly different from the reference, but most are still within limits. This suggests:

  • Different environmental conditions (temperature affecting all measurements)
  • Different station (instrument calibration differences)
  • Different component lot (systematic parameter shift)

Pattern 4: Intermittent Failure

Same unit, same station, same procedure. Sometimes passes, sometimes fails. Compare the passing and failing runs:

  • If the failing measurement is always the same one, it's a marginal value near a limit
  • If different measurements fail each time, it's likely a contact issue (pogo pin, cable)
  • If the pattern is time-dependent, check for thermal effects
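The first two checks in that triage can be automated. Below is a minimal sketch, using made-up pass/fail records rather than real TofuPilot data, that tallies which measurements fail across repeated runs of one unit:

```python
from collections import Counter

# Hypothetical pass/fail outcomes for repeated runs of the same unit
runs = [
    {"vcc_1v8": "PASS", "boot_time": "FAIL"},
    {"vcc_1v8": "PASS", "boot_time": "PASS"},
    {"vcc_1v8": "PASS", "boot_time": "FAIL"},
    {"vcc_1v8": "PASS", "boot_time": "FAIL"},
]

# Count how often each measurement fails across the runs
fail_counts = Counter(
    name for run in runs for name, outcome in run.items() if outcome == "FAIL"
)

if len(fail_counts) == 1:
    only = next(iter(fail_counts))
    print(f"Only '{only}' ever fails -> marginal value near a limit")
elif len(fail_counts) > 1:
    print("Different measurements fail across runs -> suspect fixture contact")
```

Here only `boot_time` ever fails, so the script points at a marginal value near a limit rather than a contact problem.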

Comparing Across Production Batches

Run comparison isn't just for debugging. Use it to validate that a new production batch matches the previous one.

  1. Select a representative passing run from batch N
  2. Select the first runs from batch N+1
  3. Compare measurement distributions

If batch N+1 measurements are systematically shifted (even if still within limits), investigate before the full batch runs through production.
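A quick way to check for such a systematic shift is to compare batch means for each measurement. A minimal sketch with hypothetical vcc_1v8 readings; the 1% alert threshold is an arbitrary example, not a TofuPilot default:

```python
from statistics import mean

# Hypothetical vcc_1v8 readings (volts) from two batches
batch_n  = [1.80, 1.81, 1.80, 1.82, 1.81]   # batch N reference runs
batch_n1 = [1.78, 1.77, 1.78, 1.79, 1.78]   # first runs of batch N+1

# Signed shift of the batch N+1 mean relative to batch N
shift_pct = (mean(batch_n1) - mean(batch_n)) / mean(batch_n) * 100
print(f"Mean shift: {shift_pct:+.2f}%")

if abs(shift_pct) > 1.0:  # illustrative threshold
    print("Systematic shift detected -- investigate before ramping the batch")
```

Even though every individual reading in batch N+1 is still within a typical 1.75–1.85 V limit window, the ~1.5% downward shift in the mean is exactly the kind of signal worth chasing before full production.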

Using the API for Programmatic Comparison

compare_runs.py
from tofupilot import TofuPilotClient

client = TofuPilotClient()

# Get two runs to compare
run_pass = client.get_run(run_id="run-id-pass")
run_fail = client.get_run(run_id="run-id-fail")

# Index the failing run's measurements by (step name, measurement name)
# so the comparison doesn't depend on step ordering
fail_values = {
    (step["name"], m["name"]): m["value"]
    for step in run_fail["steps"]
    for m in step["measurements"]
}

# Compare measurements, flagging anything that differs by more than 5%
for step in run_pass["steps"]:
    for m in step["measurements"]:
        key = (step["name"], m["name"])
        if key not in fail_values:
            continue  # measurement absent from the failing run
        v_pass, v_fail = m["value"], fail_values[key]
        pct = abs(v_pass - v_fail) / abs(v_pass) * 100 if v_pass != 0 else float("inf")
        status = "DIFF" if pct > 5 else "ok"
        print(f"{m['name']:30s} {v_pass:10.3f} {v_fail:10.3f} {pct:6.1f}% {status}")

This script highlights measurements that differ by more than 5%, giving you a quick programmatic way to identify where two runs diverge.
