Background & Challenge
The client, a stealth-mode AI semiconductor startup, had designed a custom systolic-array matrix-multiply unit (MMU) optimised for transformer-based LLM inference workloads. The design targeted 8-bit (INT8) and 4-bit (INT4) integer quantised inference, a technique validated extensively in the literature: Nagel et al. (2021) report that INT8 post-training quantisation keeps model accuracy within about 1% of FP32 baselines on a range of standard benchmarks (A White Paper on Neural Network Quantization, Qualcomm AI Research).
The client required a production-grade verification environment before taping out on TSMC N5 (5 nm FinFET). The primary challenge was the combinatorial explosion of legal input stimuli: a 128×128 systolic array with configurable precision modes, sparsity support, and a custom DMA interface created an enormous verification space that directed testing alone could not adequately cover.
Industry data from the 2022 Wilson Research Group Functional Verification Study confirms that insufficient functional coverage is the leading cause of post-silicon bugs in custom accelerators, with 67% of design teams reporting coverage closure as their top verification challenge.
Verification Methodology
SNS deployed a layered Universal Verification Methodology (UVM) environment per the IEEE 1800.2-2020 standard. The testbench architecture comprised a reference model written in C++ (bit-accurate to the RTL), a SystemVerilog UVM agent for the AXI4-Stream data interface, a coverage-driven verification (CDV) plan aligned to the design specification, and a constrained-random stimulus generator targeting corner cases identified through formal analysis.
Constrained-Random Verification (CRV): Following the methodology described by Bergeron in Writing Testbenches Using SystemVerilog (Springer, 2006), we defined 47 functional coverage groups covering precision modes (INT4/INT8/FP16), matrix dimensions (1×1 to 128×128), sparsity ratios (0–90%), and DMA burst patterns. Constraints were iteratively refined using coverage feedback, a technique reported to reduce time-to-closure by 35–50% versus purely random approaches (Tasiran & Keutzer, A Functional Validation Technique, IEEE TCAD 2001).
Formal Property Verification: Concurrent SystemVerilog Assertions (SVA) were written for all handshake protocols on the AXI4-Stream interface and for the internal accumulator overflow conditions. Formal tools exhaustively proved 23 properties, eliminating an entire class of potential corner-case bugs that simulation alone would need millions of cycles to expose.
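Two of the property classes named above can be sketched in Python. The first function checks a recorded trace against the AXI4-Stream rule that, once TVALID is asserted, TVALID must stay high and TDATA must stay unchanged until TREADY completes the transfer. The second computes the narrowest signed accumulator that cannot overflow for a given dot-product depth. Both are illustrative only: a trace check is not a formal proof, the names are hypothetical, and the depth of 128 used in the usage note is an assumption based on the array size, since the actual accumulator width is not stated.

```python
from typing import NamedTuple


class Beat(NamedTuple):
    """One sampled clock cycle on a (hypothetical) AXI4-Stream port."""
    tvalid: bool
    tready: bool
    tdata: int


def handshake_violations(trace):
    """Return the cycle indices where the stall-stability rule is broken.

    Rule: if TVALID was high and TREADY low in cycle n-1 (a stalled
    transfer), then in cycle n TVALID must still be high and TDATA unchanged.
    """
    violations = []
    for n in range(1, len(trace)):
        prev, cur = trace[n - 1], trace[n]
        stalled = prev.tvalid and not prev.tready
        if stalled and (not cur.tvalid or cur.tdata != prev.tdata):
            violations.append(n)
    return violations


def min_signed_accumulator_bits(depth, operand_bits):
    """Smallest two's-complement width holding any depth-long dot product."""
    lo = -(1 << (operand_bits - 1))      # most negative operand
    hi = (1 << (operand_bits - 1)) - 1   # most positive operand
    worst_pos = depth * lo * lo          # every product is lo * lo
    worst_neg = depth * lo * hi          # every product is lo * hi
    bits = 1
    while not (-(1 << (bits - 1)) <= worst_neg
               and worst_pos <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits
```

For example, `min_signed_accumulator_bits(128, 8)` returns 23: the worst case is 128 products of (-128) × (-128) = 16,384, which sum to 2^21, one more than a 22-bit signed value can hold. The same call with 4-bit operands returns 15.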
Reference Model Correlation: Every simulation cycle, the UVM scoreboard compared RTL outputs against the C++ golden reference model. Numerical accuracy was validated against IEEE 754-2019 rounding semantics for FP16 paths and against the ARM ACLE intrinsic specifications for INT8/INT4 dot-product operations.
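The scoreboard comparison can be pictured with the following minimal sketch. It is written in Python for brevity, whereas the engagement's golden model was C++, and it covers only an exact signed INT8 matrix multiply. The class and function names are hypothetical, and accumulator width, saturation, and the FP16 path are deliberately left out because the source does not specify them.

```python
def ref_dot_int8(a, b):
    """Exact signed dot product of two equal-length INT8 vectors."""
    if len(a) != len(b):
        raise ValueError("vector lengths differ")
    for v in (*a, *b):
        if not -128 <= v <= 127:
            raise ValueError(f"{v} is not a signed 8-bit value")
    # Python integers do not overflow, so the sum is mathematically exact.
    return sum(x * y for x, y in zip(a, b))


def ref_matmul_int8(A, B):
    """Reference C = A x B for row-major lists of INT8 rows."""
    columns = list(zip(*B))
    return [[ref_dot_int8(row, col) for col in columns] for row in A]


class Scoreboard:
    """Compares device-under-test outputs against the reference model."""

    def __init__(self):
        self.checked = 0
        self.mismatches = []  # (row, col, expected, actual)

    def check(self, A, B, dut_out):
        expected = ref_matmul_int8(A, B)
        for i, (exp_row, dut_row) in enumerate(zip(expected, dut_out)):
            for j, (exp, act) in enumerate(zip(exp_row, dut_row)):
                self.checked += 1
                if exp != act:
                    self.mismatches.append((i, j, exp, act))
        return not self.mismatches


# Usage: one correct result and one with a single corrupted element.
A = [[1, -2], [3, 4]]
B = [[5, 6], [7, -8]]
sb = Scoreboard()
sb.check(A, B, [[-9, 22], [43, -14]])   # matches the reference
sb.check(A, B, [[-9, 22], [43, -13]])   # element (1, 1) is wrong
```

Recording the expected and actual value with each mismatch, rather than a bare pass/fail flag, is what makes a failure traceable to a specific array position during debug.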
Results & Outcomes
Over 12 weeks, the SNS team executed 2,400+ directed and constrained-random test cases, achieving 98.7% functional coverage across all defined coverage groups. The remaining 1.3% of bins were formally proven unreachable due to architectural constraints; these exclusions were documented and signed off by the client's design team.
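The relationship between the raw coverage figure and the unreachable-bin exclusion can be made concrete with a short calculation. The bin counts below are hypothetical, chosen only so the percentages match those reported; the actual number of bins is not given.

```python
def coverage_report(total_bins, hit_bins, unreachable_bins):
    """Return (raw %, % of reachable bins), each rounded to one decimal."""
    if hit_bins + unreachable_bins > total_bins:
        raise ValueError("hit and unreachable bins exceed the total")
    reachable = total_bins - unreachable_bins
    if reachable <= 0:
        raise ValueError("no reachable bins")
    raw = 100.0 * hit_bins / total_bins
    adjusted = 100.0 * hit_bins / reachable
    return round(raw, 1), round(adjusted, 1)


# Hypothetical counts: 1,000 bins, 987 hit, 13 proven unreachable.
raw_pct, adjusted_pct = coverage_report(1000, 987, 13)
```

With these counts the raw figure is 98.7% and the figure over reachable bins is 100.0%, which is why excluding proven-unreachable bins closes the coverage plan.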
Fourteen RTL bugs were discovered and resolved during the engagement: 3 critical (would have caused incorrect computation results in production), 8 major (protocol violations detectable post-silicon), and 3 minor (performance degradation under specific sparsity patterns). Zero design verification escapes reached the tape-out netlist.
The client successfully taped out on schedule. Post-silicon bring-up confirmed correct INT8 inference on ResNet-50 and BERT-base benchmarks, with measured throughput within 2% of the RTL performance model, a result consistent with the accuracy of the verification environment.
Scientific & Standards References
- IEEE 1800.2-2020: Universal Verification Methodology (UVM) Standard
- IEEE 1800-2017: SystemVerilog Unified Hardware Design, Specification, and Verification Language
- Nagel et al. (2021). A White Paper on Neural Network Quantization. Qualcomm AI Research.
- Wilson Research Group (2022). Functional Verification Study. Siemens EDA.
- Bergeron, J. (2006). Writing Testbenches Using SystemVerilog. Springer.
- Tasiran, S. & Keutzer, K. (2001). A Functional Validation Technique. IEEE TCAD, 20(12).
- IEEE 754-2019: Standard for Floating-Point Arithmetic
- ARM ACLE Q3 2023: ARM C Language Extensions for NEON/SVE dot-product intrinsics