# Production Test Validation at Scale with TofuPilot
Running a test that works for 10 prototypes is different from running it for 10,000 production units. At scale, you need to validate not just the product, but the test process itself. Are your limits correct? Is your test repeatable? Are you catching real defects without creating false failures? TofuPilot's analytics help answer these questions.
## What Production Test Validation Means
Production test validation answers three questions:
- Are the test limits correct? Limits that are too tight cause false failures. Limits that are too loose let defective units ship.
- Is the test repeatable? The same unit tested twice should give the same result.
- Is the test effective? Does it catch the defects it's supposed to catch?
## Step 1: Analyze Measurement Distributions
After running your test on the first 100-200 production units, analyze the measurement distributions in TofuPilot.
```python
import numpy as np
from tofupilot import TofuPilotClient

client = TofuPilotClient()

runs = client.get_runs(
    procedure_id="FINAL-FUNCTIONAL-V3",
    limit=200,
)

# Extract measurement values
vcc_values = []
for run in runs:
    for step in run.get("steps", []):
        for m in step.get("measurements", []):
            if m["name"] == "vcc_3v3":
                vcc_values.append(m["value"])

values = np.array(vcc_values)
print(f"N: {len(values)}")
print(f"Mean: {np.mean(values):.4f} V")
print(f"Std: {np.std(values, ddof=1):.4f} V")
print(f"Min: {np.min(values):.4f} V")
print(f"Max: {np.max(values):.4f} V")
print(f"Range: {np.max(values) - np.min(values):.4f} V")
```

What to look for:
| Observation | Action |
|---|---|
| Distribution centered within limits, Cpk > 1.33 | Limits are well-set |
| Distribution skewed toward one limit | Investigate process bias |
| Distribution wider than expected | Tighten process or widen limits |
| Outliers beyond 3-sigma | Investigate those specific units |
| Bimodal distribution | Two populations, likely mixed lots |
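Two of the rows above, outliers and bimodality, are easy to screen for numerically before eyeballing histograms. A minimal sketch (the `values` array is assumed to hold the measurements extracted in Step 1; the bimodality heuristic is a crude hint, not a statistical test):

```python
import numpy as np

def screen_distribution(values, z_thresh=3.0):
    """Flag 3-sigma outliers and a crude bimodality hint (sparse bin near the mean)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std(ddof=1)
    z = (values - mean) / std
    outliers = values[np.abs(z) > z_thresh]
    # Crude bimodality hint: with two well-separated modes, the histogram
    # bin containing the mean is sparsely populated relative to the peaks.
    counts, edges = np.histogram(values, bins=10)
    mean_bin = min(np.searchsorted(edges, mean) - 1, len(counts) - 1)
    bimodal_hint = counts[mean_bin] < 0.5 * counts.max()
    return outliers, bimodal_hint

# Illustrative data: one clean population plus a single stray unit
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(3.31, 0.01, 199), [3.42]])
outliers, bimodal = screen_distribution(sample)
print(f"Outliers: {outliers}")
print(f"Bimodal hint: {bimodal}")
```

Units flagged here are worth pulling by serial number in TofuPilot to see whether the rest of their measurements look off too.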
## Step 2: Calculate Process Capability
Cpk tells you how well your process fits within the test limits. TofuPilot provides the measurement data; you calculate the Cpk.
```python
import numpy as np

def calculate_cpk(values, lsl, usl):
    mean = np.mean(values)
    std = np.std(values, ddof=1)
    cpu = (usl - mean) / (3 * std)
    cpl = (mean - lsl) / (3 * std)
    cpk = min(cpu, cpl)
    return cpk, mean, std

# From TofuPilot data
vcc_values = [3.30, 3.31, 3.29, 3.32, 3.30, 3.31, 3.29, 3.30, 3.33, 3.31]
lsl, usl = 3.25, 3.35

cpk, mean, std = calculate_cpk(vcc_values, lsl, usl)
print(f"Cpk: {cpk:.2f}")
print(f"Mean: {mean:.3f} V")
print(f"Std: {std:.4f} V")

if cpk >= 1.67:
    print("Excellent process capability")
elif cpk >= 1.33:
    print("Acceptable process capability")
elif cpk >= 1.0:
    print("Marginal. Consider tightening process or widening limits")
else:
    print("Poor capability. Action required")
```

| Cpk | Meaning | DPMO (approx) |
|---|---|---|
| 2.0 | Excellent | 0.002 |
| 1.67 | Very good | 0.6 |
| 1.33 | Good | 63 |
| 1.0 | Marginal | 2,700 |
| 0.67 | Poor | 45,500 |
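The DPMO column follows directly from the normal distribution: for a centered process, the two-sided defect rate is 2·P(Z > 3·Cpk). A stdlib-only sketch reproduces the table (small differences come from the table rounding Cpk values like 1.33 to an even 4 sigma):

```python
import math

def cpk_to_dpmo(cpk):
    """Two-sided defect rate of a centered normal process, in parts per million.
    P(defect) = 2 * P(Z > 3 * Cpk) = erfc(3 * Cpk / sqrt(2))."""
    return math.erfc(3 * cpk / math.sqrt(2)) * 1e6

for cpk in (2.0, 1.67, 1.33, 1.0, 0.67):
    print(f"Cpk {cpk:.2f} -> {cpk_to_dpmo(cpk):.4g} DPMO")
```

Note this assumes the distribution is centered and normal; a shifted or skewed process fails at a higher rate than the table suggests.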
## Step 3: Validate Test Repeatability (Gauge R&R)
Test the same unit multiple times to measure your test system's repeatability.
```python
from tofupilot import TofuPilotClient

client = TofuPilotClient()

# Test the same unit 30 times
serial = "GRR-GOLDEN-001"
for i in range(30):
    vcc = measure_voltage()  # your instrument driver
    passed = 3.25 <= vcc <= 3.35  # record the real result, even on a golden unit
    client.create_run(
        procedure_id="GRR-FUNCTIONAL-V3",
        unit_under_test={"serial_number": serial},
        run_passed=passed,
        steps=[{
            "name": "Power Rail",
            "step_type": "measurement",
            "status": passed,
            "measurements": [{
                "name": "vcc_3v3",
                "value": vcc,
                "unit": "V",
                "limit_low": 3.25,
                "limit_high": 3.35,
            }],
        }],
    )
```

After 30 runs, analyze the spread in TofuPilot:
| Metric | Target | Meaning |
|---|---|---|
| GR&R % of tolerance | < 10% | Excellent measurement system |
| GR&R % of tolerance | 10-30% | Acceptable, monitor |
| GR&R % of tolerance | > 30% | Measurement system needs improvement |
If your test measurement varies by 0.04V on the same unit and your tolerance is 0.10V, that's 40% GR&R. Your test is too noisy to reliably distinguish good from bad units.
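A simplified repeatability-only estimate, ignoring the operator-to-operator reproducibility a full Gauge R&R study would add, compares the 6-sigma spread of the repeated measurements against the tolerance band (illustrative values below):

```python
import statistics

def grr_percent_of_tolerance(repeat_values, lsl, usl):
    """Repeatability as a percentage of tolerance: 6-sigma spread / tolerance band."""
    sigma = statistics.stdev(repeat_values)
    return 6 * sigma / (usl - lsl) * 100

# 30 repeated measurements of the same golden unit (illustrative values)
repeats = [3.310, 3.312, 3.309, 3.311, 3.310,
           3.308, 3.311, 3.310, 3.312, 3.309] * 3
pct = grr_percent_of_tolerance(repeats, lsl=3.25, usl=3.35)
print(f"GR&R: {pct:.1f}% of tolerance")
```

With the values above the spread comes out under 10% of tolerance, which the table classifies as an excellent measurement system.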
## Step 4: Optimize Test Limits
Use production data to optimize limits. The goal: catch real defects without rejecting good units.
### Tightening Limits
If Cpk > 2.0 and you're seeing no false failures, your limits might be too loose. Tighter limits catch marginal units before they become field failures.
### Widening Limits
If Cpk < 1.0 and you're seeing false failures (units that fail test but work fine in the field), your limits are too tight for your current process capability.
### Dynamic Limits
Some teams use TofuPilot data to set limits based on the production distribution:
```python
# Calculate limits from production data
mean = 3.310
std = 0.015

# 4-sigma limits for Cpk = 1.33
dynamic_low = mean - 4 * std   # 3.250
dynamic_high = mean + 4 * std  # 3.370
```

## Step 5: Monitor at Scale
Once your test is validated, monitor it continuously. Scale introduces new variables:
- Different operators
- Different component lots across months
- Fixture wear over thousands of cycles
- Environmental changes (season, humidity)
- Equipment calibration drift
TofuPilot's trend dashboards surface these changes. Set up monitoring for:
- FPY trend: Catch yield drops within hours
- Cpk trend: Catch process capability degradation within days
- Measurement mean shift: Catch drift before it causes failures
- Failure pareto changes: Catch new failure modes early
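The mean-shift monitor can be sketched with an Xbar-chart-style rule: alarm when a recent window of measurements drifts more than a few standard errors from the validated baseline. The windowing logic below is an assumption for illustration, not a built-in TofuPilot feature; in practice the recent values would come from `get_runs` as in Step 1:

```python
import statistics

def mean_shift_alarm(baseline_mean, baseline_std, recent_values, k=2.0):
    """Alarm when the recent window mean drifts more than k standard
    errors away from the validated baseline mean."""
    n = len(recent_values)
    stderr = baseline_std / (n ** 0.5)
    shift = statistics.mean(recent_values) - baseline_mean
    return abs(shift) > k * stderr, shift

# Baseline from validation: mean 3.310 V, std 0.015 V (illustrative window)
window = [3.324, 3.326, 3.323, 3.327, 3.325,
          3.326, 3.324, 3.325, 3.326, 3.324]
alarm, shift = mean_shift_alarm(3.310, 0.015, window)
print(f"Shift: {shift * 1000:.1f} mV, alarm: {alarm}")
```

Catching a 15 mV drift like this one flags the process days before units start landing outside the 3.25-3.35 V limits.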
## Validation Checklist
Before approving a test for production volume:
- Measurement distributions are normal (or expected shape)
- Cpk > 1.33 for all critical measurements
- GR&R < 30% for all measurements
- No false failures in the last 200 units
- Test catches known defect modes (verified with known-bad units)
- Test cycle time meets throughput requirements
- All stations produce equivalent results (cross-station correlation)
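For the last checklist item, a minimal cross-station comparison can be sketched as follows (station grouping is assumed here as a plain dict; in practice it would come from run metadata in TofuPilot): check that every station's mean sits within a small fraction of the pooled standard deviation of the grand mean.

```python
import statistics

def stations_equivalent(station_values, max_shift_sigma=0.5):
    """Check that each station's mean is within max_shift_sigma pooled
    standard deviations of the grand mean across all stations."""
    all_values = [v for vals in station_values.values() for v in vals]
    grand_mean = statistics.mean(all_values)
    pooled_std = statistics.stdev(all_values)
    worst = max(abs(statistics.mean(vals) - grand_mean)
                for vals in station_values.values())
    return worst <= max_shift_sigma * pooled_std, worst

# Illustrative per-station measurements of the same parameter
stations = {
    "ST-01": [3.309, 3.311, 3.310, 3.312, 3.308],
    "ST-02": [3.311, 3.310, 3.309, 3.312, 3.310],
}
ok, worst_shift = stations_equivalent(stations)
print(f"Stations equivalent: {ok} (worst mean shift {worst_shift * 1000:.2f} mV)")
```

The 0.5-sigma threshold is a judgment call; a stricter program would back it with a proper two-sample test per station pair.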