Culling bad parts from good is a necessary step in any manufacturing process, be it a spark plug for a car, the highchair for your infant child or a computer chip for your mobile device. Semiconductor testing moved from explicitly testing for good parts to identifying bad parts decades ago. The exponential growth in the number of circuits on a computer chip (a.k.a. Moore’s Law) motivated this shift. Instead of checking a circuit’s correct functionality, tests detect bad parts by checking for correct circuit manufacturing. Such a test approach assumes a valid design and requires fault models to generate test vectors that meet fault coverage targets. In the mid-’80s, test generation tools relied upon the stuck-at fault model (s@0, s@1). With a stable fab process and a robust design, not many parts fail. Hence, a semiconductor test pattern looks for that needle (faulty circuit) in a haystack (fully manufactured die).

Belton Toy and I created an evaluation plan for comparing Weighted Random Patterns (WRP) versus conventional stuck-at fault (S@F) generated patterns. Essentially, we compared finding needles in haystacks and used an actual design as the haystack. Belton focused on the test generation comparison, and I focused on the empirical comparison.

For Belton’s part, he could assess the fault coverage achieved via the two methods and look deeper into the detected, undetected, undetectable and redundant faults. In the academic world, researchers compare test generation approaches using benchmark circuits; they compare s@ coverage, time to generate, and the number of patterns needed to achieve coverage. Over the last three decades, more sophisticated fault models for digital circuitry have evolved due to the subtle complexity of fault behaviors with advanced fab processing. I recall Belton poring over 11-by-17-inch printouts from a dot-matrix printer, the paper’s edges perforated with the tiny holes that engaged the printer’s feed gears. Once he had the circuit design, he applied the test generation algorithms; then it was just him and the results. My evaluation involved statistical comparisons and the logistics of collecting the data on actual product at wafer-level test (aka Sort).

To start off this process, I met with one of the team members, Rubin, whose responsibilities included supporting the empirical study. As I proceeded to ask a lot of technical questions, he quickly decided that I would need to meet more of the team. This meeting gave me a better understanding of the differences between the two approaches and raised some considerations for the data collection. It would require a two-pass test, meaning the wafers would be retested with the second test method. I had to determine the number of parts that would be tested with both methods–the sample size.

The majority of parts would pass both methods. Units that passed both tests held no interest; this study was all about the fails. We expected that the majority of failing parts would fail both WRP and S@; the unique fails would be of most interest. The diagram below conceptually captures the failing parts’ relationships to each other. The sample size (haystack) needed to be large enough not only to contain fails (needles); it also had to be large enough to distinguish any significant differences between the two results.

Statistics would be involved. I could estimate the parts expected to fail from previous test results, i.e. the parts that don’t yield. Semiconductor companies closely guard these figures; the information is sensitive because competitors could use it to determine the cost of doing business. As IBM ASIC die often went into large multi-chip modules (9-100 die on a package), the manufacturing facility required a minimum of 95% S@ fault coverage. An escape from wafer sort would be costly at the module level. WRP tests could detect additional failures at Sort; they could also miss failures detected with S@F. These possibilities motivated the sizing of the sample.

The empirical study required that each unit receive both tests. Calculating the sample size would require the following steps:

- Estimate the fraction of parts expected to fail; let’s choose 20%.
- Determine the confidence level; let’s choose 95%.
- Determine the type II error; let’s choose 10%.
- Decide the resolution of difference to distinguish between the various groups. Typically, the resolution is expressed in parts per million (ppm); let’s choose 200 ppm (0.02%).
- Derive the sample size from the number of failing parts needed; applying the two-sided interval formula results in a sample size of 60,392,052.
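The steps above can be sketched in Python using a common normal-approximation formula for comparing two proportions. This is only a sketch under assumptions: the exact formula variant used in the original study isn’t specified, so the result lands in the same ballpark as the figure above rather than reproducing it exactly.

```python
from math import ceil
from statistics import NormalDist

def two_proportion_sample_size(p, delta, confidence=0.95, power=0.90):
    """Approximate per-group sample size to detect a difference `delta`
    between two failure rates near a baseline rate `p`, via the common
    pooled normal-approximation formula for two proportions.
    (A sketch only; the study's exact formula variant is unknown.)"""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - (1 - confidence) / 2)  # two-sided confidence
    z_beta = nd.inv_cdf(power)                      # power = 1 - type II error
    n = (z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / delta ** 2
    return ceil(n)

# 20% expected fail rate, 200 ppm resolution, 95% confidence, 10% type II error
n = two_proportion_sample_size(p=0.20, delta=200e-6)
print(f"{n:,} die per group")  # tens of millions with these inputs
```

Note that the sample size scales with 1/resolution²: relaxing the resolution from 200 ppm to 2,000 ppm shrinks the sample roughly a hundredfold, into the hundreds of thousands of die.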

My, that’s a large number–most likely, the actual sample was smaller by two orders of magnitude, i.e. hundreds of thousands of die. Getting down to such a sample size would require relaxing the confidence level, the resolution, and maybe the type II error. Honestly, I can’t recall the exact number; I’ll play with this later. As you can see, to compare needles in haystacks, the haystacks need to be fairly large–perhaps there exists a more precise analogy.

I reviewed the plan with Belton, our respective managers and the engineering team that created WRP. Other than having a statistician confirm the calculations, I don’t recall any major feedback. The next step—arrange for the empirical study to be done on the manufacturing test floor. Check back later for the next installment of the WRP Chronicles.

Have a productive day,

Anne Meixner

Dear Reader, what memory or question does this piece spark in you? Have you had to find a needle in a haystack? Please share your comments or stories below. You too can write for the Engineers’ Daughter–see Contribute for more information.

Additional Reading

Many sources exist which describe the prediction that Gordon Moore made for the scaling of integrated circuits. I found this one to be succinct.

For some basics on determining Sample Size go here.

Z-scores came up in my research, so I needed to determine those as well.

For this posting, I used this two-sided interval statistical formula for determining the sample size.