Run-over-Run Test Comparison with TofuPilot
When a unit fails, the first question is always: "What's different from a passing unit?" Run-over-run comparison in TofuPilot lets you put two or more test runs side by side and see exactly where they diverge.
When to Use Run Comparison
- Diagnosing a failure: Compare a failed run to a recent passing run for the same procedure
- Investigating a retest: Compare first test to retest for the same unit
- Validating a fix: Compare runs before and after a corrective action
- Tracking unit history: Compare the same unit's results across different test stages
- Benchmarking stations: Compare the same unit tested on different stations
How Run Comparison Works
TofuPilot stores every measurement from every run. When you compare runs, the system aligns measurements by name and shows the values side by side with their limits.
| Measurement | Run A (Pass) | Run B (Fail) |
|---|---|---|
| vcc_3v3 | 3.30 V ✓ | 3.28 V ✓ |
| vcc_1v8 | 1.81 V ✓ | 1.74 V ✗ |
| clk_freq | 24.00 MHz ✓ | 23.98 MHz ✓ |
| boot_time | 320 ms ✓ | 1,240 ms ✗ |
| current_idle | 45 mA ✓ | 78 mA ✗ |
Three measurements differ significantly: the 1.8 V rail is low, boot time is nearly 4× longer (320 ms vs. 1,240 ms), and idle current is 73% higher. Together, these symptoms point to a partial short on the 1.8 V power rail causing excess current draw and a slow boot.
Comparing Runs in TofuPilot
Step 1: Find the Runs
Navigate to the procedure page and filter to find the runs you want to compare. Common filters (a scripted version follows the table):
| Filter | Use case |
|---|---|
| Status: Failed | Find failing runs to diagnose |
| Serial number | Find all runs for a specific unit |
| Date range | Narrow to a time period |
| Station | Compare across stations |
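If you're pulling run data through the API instead of the UI, the same filters are straightforward to apply client-side. This is a minimal sketch assuming each run record carries `status`, `serial_number`, `station`, and `created_at` fields; these field names are assumptions, not TofuPilot's documented schema, so check the API reference.

```python
from datetime import datetime, timedelta, timezone

# Illustrative run records; real TofuPilot field names may differ.
runs = [
    {"id": "run-001", "status": "FAIL", "serial_number": "SN-0042",
     "station": "ST-1", "created_at": datetime.now(timezone.utc)},
    {"id": "run-002", "status": "PASS", "serial_number": "SN-0042",
     "station": "ST-1", "created_at": datetime.now(timezone.utc) - timedelta(days=2)},
]

cutoff = datetime.now(timezone.utc) - timedelta(days=7)

# Same filters as the UI: failed runs for one unit within a date range
candidates = [
    r for r in runs
    if r["status"] == "FAIL"
    and r["serial_number"] == "SN-0042"
    and r["created_at"] >= cutoff
]
print([r["id"] for r in candidates])  # -> ['run-001']
```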
Step 2: Select Runs for Comparison
Select two or more runs from the run list. TofuPilot aligns their measurements by step name and measurement name.
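One way to picture this alignment: key every measurement by its (step name, measurement name) pair, so values match up even when step or measurement order differs between runs. A minimal sketch of that idea, using an assumed payload shape rather than TofuPilot's exact schema:

```python
# Illustrative run payloads; the field names are assumptions.
run_a = {"steps": [{"name": "power", "measurements": [
    {"name": "vcc_3v3", "value": 3.30},
    {"name": "vcc_1v8", "value": 1.81},
]}]}
run_b = {"steps": [{"name": "power", "measurements": [
    {"name": "vcc_1v8", "value": 1.74},  # listed in a different order
    {"name": "vcc_3v3", "value": 3.28},
]}]}

def index_measurements(run):
    """Key every measurement by its (step name, measurement name) pair."""
    return {(s["name"], m["name"]): m["value"]
            for s in run["steps"] for m in s["measurements"]}

a, b = index_measurements(run_a), index_measurements(run_b)
for key in sorted(a.keys() & b.keys()):
    print(key, a[key], b[key])  # values pair up regardless of order
```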
Step 3: Read the Comparison
Focus on measurements where the values differ significantly. Small variations (3.30 V vs. 3.31 V) are normal measurement noise. Large deviations (1.81 V vs. 1.74 V, a roughly 4% drop) indicate a real difference.
Color coding helps (a classification sketch follows this list):
- Green: Both values within limits
- Red: Value outside limits
- Yellow: Value within limits but significantly different from the reference
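The same triage rules translate directly into code. Here is a sketch of one possible classifier; the 5% "significantly different" threshold and the function itself are illustrative choices, not a TofuPilot API:

```python
def classify(value, reference, low_limit, high_limit, rel_threshold=0.05):
    """Green/red/yellow triage for one compared measurement."""
    if not (low_limit <= value <= high_limit):
        return "red"      # outside limits
    if abs(value - reference) > rel_threshold * abs(reference):
        return "yellow"   # in spec, but far from the reference run
    return "green"        # in spec and close to the reference

# 1.74 V against an assumed 1.76-1.84 V spec, with 1.81 V as the reference
print(classify(1.74, reference=1.81, low_limit=1.76, high_limit=1.84))  # red
```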
Common Comparison Patterns
Pattern 1: Single Measurement Failure
One measurement fails, everything else is identical. This usually means:
- Component value out of tolerance
- Solder defect on that specific circuit
- Test probe contact issue (retest to confirm)
Pattern 2: Correlated Failures
Multiple related measurements fail together (e.g., voltage low + current high + boot slow). This points to a systemic issue:
- Power rail problem affecting multiple circuits
- Firmware crash causing downstream test failures
- Fixture contact issue on a shared connection
Pattern 3: All Measurements Shifted
Every measurement is slightly different from the reference, but most are still within limits. This suggests:
- Different environmental conditions (temperature affecting all measurements)
- Different station (instrument calibration differences)
- Different component lot (systematic parameter shift)
Pattern 4: Intermittent Failure
Same unit, same station, same procedure. Sometimes it passes, sometimes it fails. Compare the passing and failing runs (see the tally sketch after this list):
- If the failing measurement is always the same one, it's a marginal value near a limit
- If different measurements fail each time, it's likely a contact issue (pogo pin, cable)
- If the pattern is time-dependent, check for thermal effects
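To separate the first two cases programmatically, tally which measurement fails across a series of failing runs. A minimal sketch over assumed data:

```python
from collections import Counter

# Names of the failed measurements in each failing run (illustrative data)
failing_runs = [
    ["vcc_1v8"],
    ["vcc_1v8"],
    ["vcc_1v8", "boot_time"],
]

tally = Counter(name for run in failing_runs for name in run)
print(tally.most_common())  # [('vcc_1v8', 3), ('boot_time', 1)]
# One dominant name -> marginal value near a limit.
# Failures spread across many names -> suspect contacts or fixtures.
```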
Comparing Across Production Batches
Run comparison isn't just for debugging. Use it to validate that a new production batch matches the previous one.
- Select a representative passing run from batch N
- Select the first runs from batch N+1
- Compare measurement distributions
If batch N+1 measurements are systematically shifted (even if still within limits), investigate before the full batch runs through production.
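A simple way to check for such a shift is to compare per-measurement means between a batch-N baseline and the first batch-N+1 runs. This is a sketch over assumed data, using an arbitrary 2% mean-shift flag:

```python
from statistics import mean

# Per-measurement values collected from each batch (illustrative data)
batch_n  = {"vcc_1v8": [1.80, 1.81, 1.81], "current_idle": [45.0, 46.0, 44.0]}
batch_n1 = {"vcc_1v8": [1.75, 1.74, 1.76], "current_idle": [45.5, 45.0, 46.0]}

for name in sorted(batch_n.keys() & batch_n1.keys()):
    m0, m1 = mean(batch_n[name]), mean(batch_n1[name])
    shift_pct = (m1 - m0) / m0 * 100
    flag = "SHIFT" if abs(shift_pct) > 2 else "ok"
    print(f"{name:15s} {m0:8.3f} -> {m1:8.3f}  {shift_pct:+6.2f}%  {flag}")
```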
Using the API for Programmatic Comparison
```python
from tofupilot import TofuPilotClient

client = TofuPilotClient()

# Fetch the two runs to compare (use the run IDs from the run list)
run_pass = client.get_run(run_id="run-id-pass")
run_fail = client.get_run(run_id="run-id-fail")

# Walk the two runs in parallel. This assumes both runs executed the same
# steps and measurements in the same order; align by (step, measurement)
# name, as in the earlier sketch, if that assumption doesn't hold.
for step_p, step_f in zip(run_pass["steps"], run_fail["steps"]):
    for m_p, m_f in zip(step_p["measurements"], step_f["measurements"]):
        diff = abs(m_p["value"] - m_f["value"])
        if diff == 0:
            continue
        pct = diff / abs(m_p["value"]) * 100 if m_p["value"] != 0 else float("inf")
        status = "DIFF" if pct > 5 else "ok"
        print(f"{m_p['name']:30s} {m_p['value']:10.3f} {m_f['value']:10.3f} "
              f"{pct:6.1f}% {status}")
```
This script flags measurements that differ by more than 5%, giving you a quick programmatic way to identify where two runs diverge.