
Production Test Validation at Scale

Learn how to validate your production test process at scale using TofuPilot's statistical analysis, Cpk tracking, and yield monitoring.

Julien Buteau
Advanced · 11 min read · March 14, 2026


Running a test that works for 10 prototypes is different from running it for 10,000 production units. At scale, you need to validate not just the product, but the test process itself. Are your limits correct? Is your test repeatable? Are you catching real defects without creating false failures? TofuPilot's analytics help answer these questions.

What Production Test Validation Means

Production test validation answers three questions:

  1. Are the test limits correct? Limits that are too tight cause false failures. Limits that are too loose let defective units ship.
  2. Is the test repeatable? The same unit tested twice should give the same result.
  3. Is the test effective? Does it catch the defects it's supposed to catch?

Step 1: Analyze Measurement Distributions

After running your test on the first 100-200 production units, analyze the measurement distributions in TofuPilot.

distribution_analysis.py
import numpy as np
from tofupilot import TofuPilotClient

client = TofuPilotClient()

runs = client.get_runs(
    procedure_id="FINAL-FUNCTIONAL-V3",
    limit=200,
)

# Extract measurement values
vcc_values = []
for run in runs:
    for step in run.get("steps", []):
        for m in step.get("measurements", []):
            if m["name"] == "vcc_3v3":
                vcc_values.append(m["value"])

values = np.array(vcc_values)
print(f"N:      {len(values)}")
print(f"Mean:   {np.mean(values):.4f} V")
print(f"Std:    {np.std(values, ddof=1):.4f} V")
print(f"Min:    {np.min(values):.4f} V")
print(f"Max:    {np.max(values):.4f} V")
print(f"Range:  {np.max(values) - np.min(values):.4f} V")

What to look for:

| Observation | Action |
| --- | --- |
| Distribution centered within limits, Cpk > 1.33 | Limits are well-set |
| Distribution skewed toward one limit | Investigate process bias |
| Distribution wider than expected | Tighten process or widen limits |
| Outliers beyond 3-sigma | Investigate those specific units |
| Bimodal distribution | Two populations, likely mixed lots |
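
The outlier check in particular is easy to script against the same `values` array. A sketch (function and filename are illustrative) using a median/MAD estimate rather than the sample standard deviation, since in a sample of 100-200 units a single extreme reading can inflate the plain std enough to mask itself:

outlier_check.py
```python
import numpy as np

def robust_outliers(values, k=3.0):
    """Return indices of points more than k robust-sigmas from the median.
    Uses the median absolute deviation (MAD) instead of the sample std,
    so one extreme reading cannot inflate the spread and hide itself."""
    v = np.asarray(values, dtype=float)
    med = np.median(v)
    mad = np.median(np.abs(v - med))
    sigma = 1.4826 * mad  # scales MAD to sigma for a normal distribution
    return np.where(np.abs(v - med) > k * sigma)[0]

vals = [3.30, 3.31, 3.29, 3.32, 3.30, 3.31, 3.29, 3.30, 3.33, 3.45]
print(robust_outliers(vals))  # [9]: the 3.45 reading
```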

Step 2: Calculate Process Capability

Cpk tells you how well your process fits within the test limits. TofuPilot provides the measurement data; you calculate the Cpk.

cpk_validation.py
import numpy as np

def calculate_cpk(values, lsl, usl):
    mean = np.mean(values)
    std = np.std(values, ddof=1)
    cpu = (usl - mean) / (3 * std)
    cpl = (mean - lsl) / (3 * std)
    cpk = min(cpu, cpl)
    return cpk, mean, std

# From TofuPilot data
vcc_values = [3.30, 3.31, 3.29, 3.32, 3.30, 3.31, 3.29, 3.30, 3.33, 3.31]
lsl, usl = 3.25, 3.35

cpk, mean, std = calculate_cpk(vcc_values, lsl, usl)
print(f"Cpk: {cpk:.2f}")
print(f"Mean: {mean:.3f} V")
print(f"Std: {std:.4f} V")

if cpk >= 1.67:
    print("Excellent process capability")
elif cpk >= 1.33:
    print("Acceptable process capability")
elif cpk >= 1.0:
    print("Marginal. Consider tightening process or widening limits")
else:
    print("Poor capability. Action required")

| Cpk | Meaning | DPMO (approx.) |
| --- | --- | --- |
| 2.0 | Excellent | 0.002 |
| 1.67 | Very good | 0.6 |
| 1.33 | Good | 63 |
| 1.0 | Marginal | 2,700 |
| 0.67 | Poor | 45,500 |
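
The DPMO column follows from the normal CDF, assuming a centered, normally distributed process (real processes drift, so treat these as best-case numbers). A sketch of the conversion:

cpk_to_dpmo.py
```python
import math

def cpk_to_dpmo(cpk):
    """Approximate two-sided DPMO for a centered process with a given Cpk.
    The defect rate is the tail probability beyond 3*Cpk sigma on each
    side, computed from the standard normal CDF via erfc."""
    tail = 0.5 * math.erfc(3 * cpk / math.sqrt(2))
    return 2 * tail * 1_000_000

for cpk in (2.0, 1.67, 1.33, 1.0, 0.67):
    print(f"Cpk {cpk:.2f} -> {cpk_to_dpmo(cpk):,.1f} DPMO")
```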

Step 3: Validate Test Repeatability (Gauge R&R)

Test the same unit multiple times to measure your test system's repeatability.

repeatability_test.py
from tofupilot import TofuPilotClient

client = TofuPilotClient()

# Test the same golden unit 30 times
serial = "GRR-GOLDEN-001"
LIMIT_LOW, LIMIT_HIGH = 3.25, 3.35

for i in range(30):
    vcc = measure_voltage()  # placeholder: your instrument read
    passed = LIMIT_LOW <= vcc <= LIMIT_HIGH
    client.create_run(
        procedure_id="GRR-FUNCTIONAL-V3",
        unit_under_test={"serial_number": serial},
        run_passed=passed,
        steps=[{
            "name": "Power Rail",
            "step_type": "measurement",
            "status": passed,
            "measurements": [{
                "name": "vcc_3v3",
                "value": vcc,
                "unit": "V",
                "limit_low": LIMIT_LOW,
                "limit_high": LIMIT_HIGH,
            }],
        }],
    )

After 30 runs, analyze the spread in TofuPilot:

| Metric | Target | Meaning |
| --- | --- | --- |
| GR&R % of tolerance | < 10% | Excellent measurement system |
| GR&R % of tolerance | 10-30% | Acceptable, monitor |
| GR&R % of tolerance | > 30% | Measurement system needs improvement |

If your test measurement varies by 0.04V on the same unit and your tolerance is 0.10V, that's 40% GR&R. Your test is too noisy to reliably distinguish good from bad units.

Step 4: Optimize Test Limits

Use production data to optimize limits. The goal: catch real defects without rejecting good units.

Tightening Limits

If Cpk > 2.0 and you're seeing no false failures, your limits might be too loose. Tighter limits catch marginal units before they become field failures.

Widening Limits

If Cpk < 1.0 and you're seeing false failures (units that fail test but work fine in the field), your limits are too tight for your current process capability.

Dynamic Limits

Some teams use TofuPilot data to set limits based on the production distribution:

dynamic_limits.py
# Calculate limits from production data
mean = 3.310
std = 0.015

# 4-sigma limits for Cpk = 1.33
dynamic_low = mean - 4 * std   # 3.250
dynamic_high = mean + 4 * std  # 3.370
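
Since a drifting process could push mean ± 4-sigma past what the product must actually guarantee, it is worth clamping dynamic limits to the datasheet spec. A sketch (the clamping policy is a suggestion, not a TofuPilot feature):

clamped_limits.py
```python
def dynamic_limits(mean, std, spec_low, spec_high, k=4.0):
    """Mean +/- k*sigma limits, clamped to the datasheet spec so a
    drifting process can never widen the test limits past what the
    product must guarantee."""
    low = max(mean - k * std, spec_low)
    high = min(mean + k * std, spec_high)
    return low, high

low, high = dynamic_limits(mean=3.310, std=0.015, spec_low=3.25, spec_high=3.35)
print(f"{low:.3f} .. {high:.3f}")  # 3.250 .. 3.350
```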

Step 5: Monitor at Scale

Once your test is validated, monitor it continuously. Scale introduces new variables:

  • Different operators
  • Different component lots across months
  • Fixture wear over thousands of cycles
  • Environmental changes (season, humidity)
  • Equipment calibration drift

TofuPilot's trend dashboards surface these changes. Set up monitoring for:

  1. FPY trend: Catch yield drops within hours
  2. Cpk trend: Catch process capability degradation within days
  3. Measurement mean shift: Catch drift before it causes failures
  4. Failure pareto changes: Catch new failure modes early
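
The FPY check reduces to a few lines once you have run records. A sketch with synthetic records; it assumes each record exposes the same `run_passed` boolean that `create_run` accepts, and the 95% threshold is only an example:

fpy_monitor.py
```python
def first_pass_yield(runs):
    """Share of run records that passed, as a percentage.
    Assumes each record carries a boolean "run_passed" field, mirroring
    the argument passed to create_run. Returns None for an empty window."""
    if not runs:
        return None
    passed = sum(1 for r in runs if r.get("run_passed"))
    return 100.0 * passed / len(runs)

# With live data, feed this the result of
# client.get_runs(procedure_id="FINAL-FUNCTIONAL-V3", limit=100)
sample = [{"run_passed": True}] * 97 + [{"run_passed": False}] * 3
fpy = first_pass_yield(sample)
if fpy < 95.0:  # example alert threshold
    print(f"ALERT: FPY dropped to {fpy:.1f}%")
else:
    print(f"FPY OK: {fpy:.1f}%")
```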

Validation Checklist

Before approving a test for production volume:

  • Measurement distributions are normal (or expected shape)
  • Cpk > 1.33 for all critical measurements
  • GR&R < 30% for all measurements
  • No false failures in the last 200 units
  • Test catches known defect modes (verified with known-bad units)
  • Test cycle time meets throughput requirements
  • All stations produce equivalent results (cross-station correlation)
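
The last item, cross-station correlation, can be scripted by grouping one measurement's values per station. A sketch with hypothetical station names; comparing against the median station mean keeps a single offset station from dragging the reference with it, and the 10%-of-tolerance threshold is an example, not a standard:

station_correlation.py
```python
import numpy as np

def station_offsets(values_by_station, lsl, usl, max_fraction=0.1):
    """Flag stations whose mean deviates from the median station mean
    by more than a fraction of the tolerance band. Returns a dict of
    {station: offset} for the flagged stations only."""
    means = {s: float(np.mean(v)) for s, v in values_by_station.items()}
    reference = float(np.median(list(means.values())))
    tol = usl - lsl
    return {s: round(m - reference, 4)
            for s, m in means.items()
            if abs(m - reference) > max_fraction * tol}

data = {
    "ST-01": [3.300, 3.302, 3.301],
    "ST-02": [3.301, 3.299, 3.300],
    "ST-03": [3.330, 3.332, 3.331],  # reads ~30 mV high
}
print(station_offsets(data, lsl=3.25, usl=3.35))  # flags ST-03 only
```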
