# An Approach to Single-Event Testing AMD-Xilinx Versal's Most Important New Architectural Feature: Network-on-Chip (NoC)



Sebastián E. García, sebastian@slabs.com.ar, +54 911 4403 1725, Coronel Suárez, BA, Argentina.

Gary M. Swift, gary.m.swift@ieee.org, +1 408 679 3785, San José, CA.



Microelectronics Reliability and Qualification Workshop

February 7–9, 2023 | The Aerospace Corporation | El Segundo, CA



- What's a Network-on-Chip? NOT just marketing.
- Why NoCs? Revolution, not merely evolution.
- Versal VC1902's NoC Overview
  - Three blocks: two AXI4 endpoints (NMU, NSU) & packet switch (NPS).
- How to test NoCs? Best and ready: XRTC Test Platform
  - Custom DUT Board on Gen-4 apparatus.
  - 3 stripchart logs: error counters, CRAM upsets, and all power rails.
- How to test NoCs? Custom in-beam DUT design(s).
- Conclusion



### Concept: from wires to switched packet network.

- Replace signal wires with shared "virtual highways."
  - Reduces area and power use.
- Communication via "info convoys."
- Priority rules for sharing the highway.

### NoC Implementation in the Versal Architecture





A NoC solves the cross-chip high-speed communication problem for a large heterogeneous SoC plus programmable fabric (ACAP).

García, S.E. and Swift, G.M.

### NoC Implementation in the VC1902



(Diagram based on figure taken from [2])

NMU: NoC Master Unit (28 on PL, 10 on PS, 16 on AIEs; 54 total);

NSU: NoC Slave Unit (28 on PL, 6 on PS, 16 on AIEs, 16 in DDRMCs; 66 total);

NPS: NoC Packet Switch (24 on North-HNoC, 66 on South-HNoC, 56 on VNoCs; 146 total);

NCRB: NoC Clock Re-convergent Buffer (6 total);

RPTR: NoC Repeater (used in devices with large distances between NPSs);

DDRMC: DDR Memory Controller (tightly integrated in the NoC);

AIE: Artificial Intelligence Engine.




- A packet-switching network infrastructure enables efficient data movement (*vs.* wiring scaling challenges) in modern ACAP devices like Versal.
- *Ad-hoc* network topology, optimized for ACAP's components and use models. The endpoints implement AXI4 & AXI4-Stream interfaces.
- Atomic packet units are called *flits*.
  - All buffers and datapaths are flit-sized.
  - 128-bit data payload (182-bit total physical width in each direction).
- The NoC is full duplex and has its own clock domain; max. data throughput is ~128 Gb/s at 1 GHz (1 clock cycle to move between adjacent buffers).
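The throughput figure follows directly from the flit width and the NoC clock. A minimal sketch of that arithmetic, assuming one flit moves per clock cycle on a link (the constant and function names are illustrative, not from any AMD tool):

```python
# Per-link, per-direction throughput implied by the flit figures above.
PAYLOAD_BITS = 128             # data payload per flit (from the slide)
PHYSICAL_BITS = 182            # total physical width per direction (from the slide)
NOC_CLOCK_HZ = 1_000_000_000   # 1 GHz NoC clock (from the slide)


def gbps(bits_per_cycle: int, clock_hz: int) -> float:
    """Assumes one flit per clock cycle, so throughput = width x clock."""
    return bits_per_cycle * clock_hz / 1e9


payload_gbps = gbps(PAYLOAD_BITS, NOC_CLOCK_HZ)   # useful data rate
raw_gbps = gbps(PHYSICAL_BITS, NOC_CLOCK_HZ)      # rate including non-payload bits
payload_fraction = PAYLOAD_BITS / PHYSICAL_BITS   # share of each flit that is data

print(f"payload: {payload_gbps:.0f} Gb/s, raw: {raw_gbps:.0f} Gb/s, "
      f"payload fraction: {payload_fraction:.1%}")
```

The remaining 54 bits of each 182-bit flit are not payload; for beam testing they are still upsettable storage in every buffer the flit occupies.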



- *Wormhole* routing: Packets flow through the network like carriages of a train; it's possible for the header flit to arrive at the destination endpoint before the last flit has left the source endpoint.
- QoS scheme: 8 virtual channels per physical link, balancing latency and bandwidth requirements of the communication tasks.
- In Vivado, the NoC Compiler gathers connections and configurations from the user's design, and outputs a bitstream segment with the connections and parameter values for the NoC and DDRMCs. This is the "NPI" part of the PDI image file.
- Data integrity: Flits are encoded with SECDED ECC, checked at endpoints. A parity bit is included in the destination ID; this is checked at switches. Errors are signaled via processor interrupts.
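To illustrate what SECDED means for upsets in flit storage, here is a toy sketch using an extended Hamming (8,4) code on a 4-bit value. This is an assumption-laden illustration only: the source does not state the code parameters or layout used for Versal flits, and the function names are hypothetical.

```python
# Toy SECDED: extended Hamming (8,4). Index 0 holds the overall parity bit;
# indices 1..7 form a Hamming(7,4) word with check bits at positions 1, 2, 4.
# NOT the Versal flit code, whose parameters are not given in the source.

def encode(nibble: int) -> list[int]:
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):  # each check bit covers positions whose index has bit p set
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity enables double-error detection
    return code


def decode(code: list[int]) -> tuple[int, str]:
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i          # XOR of set positions is 0 for a valid word
    overall = sum(code) % 2
    word = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:             # odd number of flips: treat as a single error
        word[syndrome] ^= 1        # syndrome 0 means the overall parity bit itself
        status = "corrected"
    else:                          # even flips with nonzero syndrome: double error
        status = "uncorrectable"
    nibble = word[3] | word[5] << 1 | word[6] << 2 | word[7] << 3
    return nibble, status


word = encode(0b1011)
single = list(word); single[5] ^= 1                   # one upset bit
double = list(word); double[2] ^= 1; double[6] ^= 1   # two upset bits
print(decode(word), decode(single), decode(double)[1])
```

A single upset is repaired silently from the data's point of view, while a double upset in the same word is only flagged, which is the case that would surface as an error signature at an endpoint.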

### NoC Configuration with an Independent Mini NoC (NPI)






### **Estimated bit inventories**

1. In visible NPI (NoC/DDRMCs) 32-bit status/control registers: 1,261,568 bits [4].
2. In hidden memory cells (estimated):
   - NoC routing: routing-table configuration of the NPSs (switches)
     - 5,376 bits per NPS in use (upper bound: $784{,}896 = 146 \times 5{,}376$).
   - NoC buffers
     - Virtual channel buffers per NPS in HNoCs: 40,768 (= 4×7×8×182).
     - Virtual channel buffers per NPS in VNoCs: 29,120 (= 4×5×8×182).
     - Registered outputs per NPS: 728 (= 4×182).
     - Pipeline registers: 182 bits per unit in use.
   - NMU & NSU modules: TBD
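The hidden-cell estimates above can be reproduced with a few lines of arithmetic. In this sketch the factor labels (4 ports, a depth of 7 or 5, 8 virtual channels, 182-bit flits) are an assumed reading of the products on the slide, and the device-wide virtual-channel total is a derived upper bound that holds only if every NPS is in use.

```python
# Reproduce the estimated hidden-bit inventory. Factor names are assumptions.
FLIT_BITS = 182                 # physical flit width
PORTS_PER_NPS = 4               # assumed meaning of the leading factor 4
VIRTUAL_CHANNELS = 8            # 8 virtual channels per physical link
ROUTING_BITS_PER_NPS = 5_376    # routing-table bits per switch in use
NPS_COUNT = {"hnoc": 24 + 66, "vnoc": 56}   # North + South HNoC, VNoCs
VC_DEPTH = {"hnoc": 7, "vnoc": 5}           # assumed meaning of the 7 and 5


def vc_buffer_bits(region: str) -> int:
    """Virtual-channel buffer bits in one NPS of the given region."""
    return PORTS_PER_NPS * VC_DEPTH[region] * VIRTUAL_CHANNELS * FLIT_BITS


routing_bound = sum(NPS_COUNT.values()) * ROUTING_BITS_PER_NPS
output_reg_bits = PORTS_PER_NPS * FLIT_BITS
vc_bound = sum(NPS_COUNT[r] * vc_buffer_bits(r) for r in NPS_COUNT)

print("VC buffer bits per HNoC NPS:", vc_buffer_bits("hnoc"))
print("VC buffer bits per VNoC NPS:", vc_buffer_bits("vnoc"))
print("Routing-table upper bound:  ", routing_bound)
print("Registered outputs per NPS: ", output_reg_bits)
print("VC buffer bound, all NPSs:  ", vc_bound)
```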



### Testing the NoC – XRTC's Gen-4 Test Platform





- Shown is UltraScale KU060 DUT on Gen-4 tester in beam (2021). Gen-0 tested Virtex in 1999.
- Very flexible, fine-grained visibility, high-speed test apparatus.
- Modular passive backplane based: DUT board, FuncMon (vector exerciser/logger) board on back.


### XRTC's Custom VC1902 DUT Board





- Now ready for beam. Shown is the Versal DUT card in benchtop checkout using the ConfigMon daughter module.




1. How many NPI configuration bits are involved? How many of them are critical?
2. How to scrub the NPI registers after errors are caught by an NPI-scan? (XilSEM)
3. How to put the NoC into a "quiescent" state in order to scrub the NPI registers' bits? NoC\_RESET bit [5]? NoC power-down?
4. What are the possible ways to rewrite the NPI registers (besides a master reset of the whole device by POR\_B assertion)?
5. How many different ways can the beam produce a lockup, including deadlocks?
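As a concrete picture of what a register scan has to detect, the following sketch compares a readback snapshot of 32-bit registers against a golden copy and counts flipped bits. It is a hypothetical host-side illustration, not XilSEM's implementation; the addresses and values are made up, and a real scan would also need to mask status fields that change legitimately.

```python
# Hypothetical golden-vs-readback comparison for 32-bit registers.
# Addresses and values are illustrative only.

def diff_registers(golden: dict[int, int], readback: dict[int, int]) -> dict[int, int]:
    """Map each mismatching address to the XOR mask of its flipped bits."""
    diffs = {}
    for addr, expected in golden.items():
        mask = (expected ^ readback[addr]) & 0xFFFF_FFFF
        if mask:
            diffs[addr] = mask
    return diffs


def count_upset_bits(diffs: dict[int, int]) -> int:
    """Total number of flipped bits across all mismatching registers."""
    return sum(bin(mask).count("1") for mask in diffs.values())


golden = {0x0000: 0x0000_00FF, 0x0004: 0x8000_0001, 0x0008: 0x0000_0000}
readback = {0x0000: 0x0000_00FF, 0x0004: 0x8000_0003, 0x0008: 0x0001_0000}

diffs = diff_registers(golden, readback)
for addr, mask in sorted(diffs.items()):
    print(f"addr 0x{addr:04X}: flipped bits mask 0x{mask:08X}")
print("upset bits:", count_upset_bits(diffs))
```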

- Test goals:
  - Separate cross-sections for master and slave endpoints, and packet switches.
  - Understand the individual sources of different upset signatures.
- Test constraints:
  - High beam efficiency $\Rightarrow$ max. target area $\Rightarrow$ max. number of NoC elements.
- Test approach:
  - Start as simple as possible.
    - Direct traffic (1-input port, 1-output port) through switches.
    - Shortest route: One master, 2 switches, 1 slave.
  - Capture error counts and signatures.
    - Upset buffer bits  $\Rightarrow$  Bad packet  $\Rightarrow$  PMC interrupt.
    - Latency increases.
    - Traffic lockups.
    - Others.
  - Investigate mitigation and recovery mechanisms.
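The separate cross-sections named in the test goals reduce to the standard ratio of observed events to beam fluence, divided by the number of identical elements exercised. A minimal sketch, assuming each event has already been attributed to one element type; the counts and fluence below are made-up placeholders, not measurements.

```python
# Cross-section bookkeeping: sigma = events / fluence (cm^2), optionally
# normalized per element. All numbers here are hypothetical placeholders.

def cross_section(events: int, fluence: float) -> float:
    """Device-level cross-section in cm^2 for a fluence in particles/cm^2."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence


def per_element(events: int, fluence: float, n_elements: int) -> float:
    """Cross-section per element, assuming identical and independent elements."""
    return cross_section(events, fluence) / n_elements


# Shortest route from the test approach: 1 master, 2 switches, 1 slave.
elements_in_route = {"NMU": 1, "NPS": 2, "NSU": 1}
events_by_type = {"NMU": 3, "NPS": 12, "NSU": 4}   # hypothetical attributed counts
fluence = 1.0e7                                     # hypothetical, particles/cm^2

sigma = {
    name: per_element(events_by_type[name], fluence, elements_in_route[name])
    for name in elements_in_route
}
for name, value in sigma.items():
    print(f"{name}: {value:.2e} cm^2 per element")
```

Attribution is the hard part in practice: an error signature seen at an endpoint may originate in any element along the route, which is one reason the approach starts with the shortest route.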







- Deterministic latency is considered in the comparison.



### NoC complexity will require:

- Capable test apparatus.
  - XRTC's Gen-4 test infrastructure was designed to handle as much complexity as possible.
- Complex HDL test designs (even for fundamental NoC components)
- Multiple tests and multiple test trips
  - When testing goes as expected, next test will add complexity.
  - When testing doesn't go well, then troubleshoot and re-test.

## Thank you!

#### References



- [1] AMD, Versal ACAP Programmable NoC and Integrated Memory Controller, PG313 (v1.0), 2022.12.14.
- [2] I. E. Lang, Worst-Case Latency Analysis for the Versal Network-on-Chip, MS Thesis, University of Waterloo, Canada, 2021.
- [3] I. A. Swarbrick and D. P. Schultz, Peripheral Interconnect for Configurable Slave Endpoint Circuits, US Patent 10621129, 2020.04.14.
- [4] AMD, NoC and Integrated Memory Controller NPI Register Reference, AM019 (v1.0), 2021.09.09.
- [5] —, Versal ACAP Technical Reference Manual, AM011 (v1.5), 2022.12.16.
- [6] —, Versal AI Core Series Data Sheet: DC and AC Switching Characteristics, DS957 (v1.4), 2022.05.03.
- [7] I. A. Swarbrick, D. Gaitonde, S. Ahmad, B. Gaide, and Y. Arbel, Network-on-Chip Programmable Platform in Versal ACAP Architecture, FPGA Conference, 2019.
- [8] AMD, Xilinx Standalone Library Documentation BSP and Libraries Document Collection, UG643 (v2022.2), 2022.10.19.
- [9] D. P. Schultz, I. A. Swarbrick, and D. Nagendra, Circuit for and Method of Configuring and Partially Reconfiguring Function Blocks of an Integrated Circuit Device, US Patent 10680615, 2020.06.09.

## **Backup Slides**

#### **NoC Master Unit (NMU)**





(Figure taken from [1])

### **NoC Slave Unit (NSU)**





(Figure taken from [1])

### NPI interface/interconnect: The big picture



NPI: NoC Programming Interface (a.k.a. NoC Peripheral Interconnect). This is an auxiliary and independent NoC (32-bit RD/WR [5], 300 MHz [6]) to access config./status registers for the main NoC, DDRMCs, MGTs, etc. (block type details on [5], Chapter 21). It's memory-mapped on PMC's processing unit.




NPI interface tree (independent NoC for accessing main NoC's registers)





- Paper [7] mentions a "tree-structured peripheral bus" implemented to program the NoC.
- UG [8] says: "This memory (NPI registers) is physically distributed throughout the device".



### NPI's place on device's floorplan (as per patent)

(Figure: device floorplan as drawn in the patent. Legible block labels include AIE, TX/RX, PLR, CFRAME, ROUTING, NoC, NPI, CPM, PS, and HDIO.)

### Features visible in Vivado's device view (vc1902)





- Four VNoC (full) columns clearly identified, each immediately adjacent to an unlabeled column (NPI?).
- Two HNoC rows (NPI infrastructure there, too?).
- "PNoC" block (tightly coupled to the PS) contains an "NPI\_NIR" box. This is the NPI root node [1].




- CMT\_MMCM module type: 2688 bits (12 modules; 7 registers/module)
- CMT\_XPLL module type: 5376 bits (24 modules; 7 registers/module)
- DDRMC\_DDR4\_XRAM module type: 399616 bits (4 modules; 3122 registers/module)
- DDRMC\_LPDDR4\_XRAM module type: 390144 bits (4 modules; 3048 registers/module)
- DDRMC\_MAIN module type: 9216 bits (4 modules; 72 registers/module)
- DDRMC\_NOC module type: 31744 bits (4 modules; 248 registers/module)
- DDRMC\_UB module type: 1408 bits (4 modules; 11 registers/module)
- NOC\_NCRB module type: 5760 bits (6 modules; 30 registers/module)
- NOC\_NMU module type: 115776 bits (54 modules; 67 registers/module)
- NOC\_NPS module type: 210240 bits (146 modules; 45 registers/module)
- NOC\_NSU module type: 89600 bits (50 modules; 56 registers/module)

#### TOTAL NUMBER OF BITS: 1,261,568

NOTE:

1. Every register is 32 bits wide.
2. The numbers above were taken from [4].
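Each line in the inventory is modules × registers per module × 32 bits, and the lines sum to the stated total. A short sketch that recomputes the list from the module and register counts above (only the variable names are invented here):

```python
# Recompute the visible NPI register bit inventory from the counts above.
REGISTER_BITS = 32

# (module type, number of modules, registers per module)
NPI_MODULES = [
    ("CMT_MMCM", 12, 7),
    ("CMT_XPLL", 24, 7),
    ("DDRMC_DDR4_XRAM", 4, 3122),
    ("DDRMC_LPDDR4_XRAM", 4, 3048),
    ("DDRMC_MAIN", 4, 72),
    ("DDRMC_NOC", 4, 248),
    ("DDRMC_UB", 4, 11),
    ("NOC_NCRB", 6, 30),
    ("NOC_NMU", 54, 67),
    ("NOC_NPS", 146, 45),
    ("NOC_NSU", 50, 56),
]

bits_by_type = {
    name: modules * registers * REGISTER_BITS
    for name, modules, registers in NPI_MODULES
}
total_bits = sum(bits_by_type.values())

for name, bits in bits_by_type.items():
    print(f"{name:<20} {bits:>9,}")
print(f"{'TOTAL':<20} {total_bits:>9,}")
```

The two DDRMC XRAM module types alone account for well over half of the total, so a test that leaves the DDRMCs unused exposes far fewer NPI register bits than the headline figure suggests.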





- Each switch using only 1 input and 1 output port; no switch repeated.
- To achieve this, we'd need to force the following settings:
  - Specify the particular NPS instances to be used;
  - On each NPS, specify the routing table for the input port in use.
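Before building such a design, a planned route can be sanity-checked against the constraint that no switch is repeated. The sketch below is a hypothetical host-side check: the instance names are placeholders rather than real Vivado site names, and the port range of 0 to 3 is an assumption based on the four-port NPS figures used earlier.

```python
# Hypothetical check that a planned route visits each NPS at most once, with
# one input and one output port per switch. Names and port range are assumed.
PORTS_PER_NPS = 4  # assumed port count per switch

Hop = tuple[str, int, int]  # (nps_instance, input_port, output_port)


def check_route(route: list[Hop]) -> list[str]:
    """Return a list of problems; an empty list means the route is acceptable."""
    problems = []
    seen = set()
    for name, in_port, out_port in route:
        if name in seen:
            problems.append(f"{name}: switch repeated")
        seen.add(name)
        for label, port in (("input", in_port), ("output", out_port)):
            if not 0 <= port < PORTS_PER_NPS:
                problems.append(f"{name}: {label} port {port} out of range")
    return problems


good_route = [("nps_a", 0, 2), ("nps_b", 1, 3)]
bad_route = [("nps_a", 0, 2), ("nps_a", 2, 0), ("nps_c", 1, 4)]

print(check_route(good_route))
print(check_route(bad_route))
```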