### Plan for new device's R&D using Versal for the experiments at KEK

Yun-Tsung Lai

**KEK IPNS** 

ytlai@post.kek.jp

Workshop on Realtime Machine Learning

11<sup>th</sup> Apr., 2024





### Outline

- Application of FPGA in HEP experiments
  - DAQ, L1 Trigger, HLT systems
- Versal project @ KEK IPNS, Collider Electronics Forum:
  - Introduction & Overview
  - Progress on functionality study: PAM4, PCIe, AI engine, DPU
  - HLS and ML inference study plan
  - Algorithm implementation
- Summary & To do

## Application of FPGA in HEP experiments

- Here we use the Belle II Central Drift Chamber (CDC) as an example.



## Application of FPGA in HEP experiments (cont'd)

[Diagram labels: FPGA (FEE), Data Link, High-level]

- Hardware acceleration:
  - Not only CPU, but also GPU and FPGA.
  - Acceleration of software-based calculation.

- FPGA-to-FPGA transmission:
  - Optical link with FPGA MGT and optical modules.
  - Non-Return-to-Zero (NRZ).
  - Different encodings depending on the protocol design purpose, e.g. 8B/10B and 64B/66B.
  - <10 Gbps for DAQ.
  - <25 Gbps for TRG.
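As a rough illustration of what the 8B/10B and 64B/66B encodings cost in bandwidth, the payload rate is the line rate times the ratio of data bits to transmitted bits. The sketch below is plain arithmetic; the function name and the example line rates are chosen here for illustration and are not taken from any experiment's firmware.

```python
from fractions import Fraction

# Data bits / transmitted bits for the two encodings named on this slide.
ENCODINGS = {
    "8B/10B": Fraction(8, 10),
    "64B/66B": Fraction(64, 66),
}


def payload_rate_gbps(line_rate_gbps: float, encoding: str) -> float:
    """Usable payload rate for a given line rate and encoding."""
    return float(line_rate_gbps * ENCODINGS[encoding])


if __name__ == "__main__":
    for line_rate in (10.0, 25.0):  # example line rates only
        for name in ENCODINGS:
            rate = payload_rate_gbps(line_rate, name)
            print(f"{line_rate:5.1f} Gbps line, {name}: {rate:.2f} Gbps payload")
```

8B/10B spends 20% of the line rate on encoding, while 64B/66B spends about 3%.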

- Powerful FPGA devices with:
  - A larger number of cells.
  - Larger data bandwidth.

are critical for use in:

- **TRG**: implementation of complicated algorithms.
- **DAQ**: collection and processing of large data volumes.

[Diagram labels: FPGA (Trigger), FPGA (Readout)]

- **FPGA-to-server transmission:**
  - Data transmission and system slow control.
  - GbE, PCI-Express, VME, etc.
  - PCI-Express is the most popular one nowadays: PCIe40 in ALICE, LHCb, and Belle II.

## DAQ system

- Readout: PCIe has been the most popular solution for the electronics → server interface.





### PCIe40: PCIe Gen3







## L1 Trigger system

- Provides the L1 trigger signal to the DAQ, using FPGA chips for real-time processing of detector raw data.
- Reason for L1: buffer storage is not sufficient to hold all data, owing to the high event rate and short bunch spacing in collider experiments.





#### Yun-Tsung Lai (KEK IPNS) @ Workshop on Realtime ML


### Trigger device for Belle II and ATLAS

- For TRG purposes, complicated algorithms are implemented to process detector raw data in real time. Using machine learning in the logic design has recently become a trend.
- A powerful FPGA with large resources makes it possible to improve the logic itself and the trigger resolution, reduce the background rate, and perform everything within the latency limit.



- **Belle II UT3**: Xilinx Virtex-6 xc6vhx380t, xc6vhx565t; 11.2 Gbps with 64B/66B.
- **Belle II UT4**: Xilinx UltraScale XCVU080, XCVU160; 25 Gbps with 64B/66B.
- **ATLAS Muon Trigger processor**: Xilinx UltraScale+ XCVU13P, XCZU5EV; GTH, GTY: 16.8 Gbps with 64B/66B.

#### 2024/04/11

### HLT

- HLT: Computing servers with reconstruction software.
  - In Belle II: HLT software = offline software.
- What about options other than CPU?
  - GPU? FPGA for hardware acceleration?



Source: Qi-Dong Zhou, Shandong Univ.



### Versal project @ KEK IPNS

- "Collider Electronics Forum": a new platform for electronics-related technical communication and common device R&D in the Japanese HEP community.
  - KEK IPNS: E-sys, Belle II, Energy Frontier groups.
  - Experiment groups (Belle II, ATLAS, ALICE, nuclear physics) in Japan.
- We purchased a few evaluation kits of the Xilinx Versal series ACAP for joint study.
  - Plan: common, general studies of new technologies for the R&D of future electronics devices. We now plan to use Versal for L1 TRG, DAQ, or HLT purposes.




### Versal project: General plan and roadmap

- Our goal: R&D of a new general-purpose FPGA device using the Versal ACAP.
  - An L1 TRG, DAQ, or HLT device that is also general enough for different experiments.
  - One clear target is UT5 for the L1 TRG of both Belle II and ATLAS.



## New technology in Versal FPGA: PAM4, PCIe, AI engine

- Pulse Amplitude Modulation with four levels (PAM4):
  - Four distinct voltage levels to break through the limit of Non-Return-to-Zero (NRZ), which is ~25 Gbps.
  - We use the VPK120 to study it.
  - Suitable for high-speed links in the L1 TRG. We hope to pioneer its use in a future TRG board.
- PCIe Gen5:
  - PCIe has been a popular option in HEP.
    - ALICE, LHCb, and Belle II have been using PCIe40 (Gen3).
  - Studying the properties of newer PCIe generations is beneficial for the development of future readout devices.
  - We use the VPK120.
- AI engine: a new technology for data processing.
  - Helps with our algorithm construction in TRG.
  - C programmable.
  - Together with it, we study many options for HLS and ML inference in FPGA, and their performance in different TRG algorithms.
  - We will use the VCK190.








[Figure: NRZ (limit: ~25 Gbps) compared with PAM4 (Pulse Amplitude Modulation): four distinct voltage levels, two bits per clock cycle.]
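The difference between NRZ and PAM4 can be sketched as a mapping from bits to symbols: NRZ sends one bit per symbol on two levels, while PAM4 packs two bits into each symbol on four levels. The Gray-coded mapping below is a common choice and is assumed here purely for illustration; the slides do not state which mapping the transceiver uses, and the function names are hypothetical.

```python
# Gray-coded mapping of bit pairs to the four PAM4 levels (an assumption
# made for illustration; adjacent levels differ by exactly one bit).
GRAY_PAM4 = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
INVERSE_PAM4 = {level: pair for pair, level in GRAY_PAM4.items()}


def nrz_symbols(bits):
    """NRZ: one bit per symbol, two levels."""
    return list(bits)


def pam4_symbols(bits):
    """PAM4: two bits per symbol, four levels."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_decode(symbols):
    """Recover the bit stream from PAM4 levels."""
    bits = []
    for level in symbols:
        bits.extend(INVERSE_PAM4[level])
    return bits


if __name__ == "__main__":
    bits = [1, 0, 0, 1, 1, 1, 0, 0]
    print("NRZ symbols :", nrz_symbols(bits))
    print("PAM4 symbols:", pam4_symbols(bits))
```

For the same symbol rate, the PAM4 stream carries twice as many bits, which is how it gets past the NRZ limit.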








## Test bench setup @ KEK E-sys group

- The VPK120 test bench has been built at the E-sys group and made available to our members for dedicated studies.
- The VCK190 also arrived at KEK in March. Preparatory study is ongoing, and the board will be ready soon.
- Special thanks to Mathis Maurice, an intern in the E-sys group in summer 2023, for helping with the VPK120 preparation work!

PC side: PCIe Gen5 x16 slot

VPK120 test bench: 2023 summer

PC side: PCIe Gen4 x8 slot







### Firmware making with Versal: PS, CIPS and NOC

- In our experience, FPGA firmware making means:
  - Writing HDL code and using IP cores to control all of the **Programmable Logic (PL)**.
- But Versal is an ACAP that contains many sub-systems together with the FPGA.
  - Not only the PL, but also the **Processing System (PS)**.
  - Firmware making tends to rely on automatic block design rather than the traditional code-writing approach.
  - For now, we still have a limited understanding of the PS.



### Latency measurement with NOC



### Versal transceivers: GTYP and GTM

- GTYP: PCIe 5.0 (16) and FMC+ (8)
  - 1.25 ~ 32.75 Gb/s.
  - Various encodings are supported.
- GTM: QSFP-DD (8\*2)
  - NRZ:
    - 9.5 ~ 15, 19 ~ 29 Gb/s.
  - PAM4:
    - 19 ~ 30, 38 ~ 60 Gb/s.
    - 76 ~ 112 Gb/s: "Half density mode" by combining two lanes.
    - No encoding is supported. Encodings need to be implemented manually in RTL.
- Our test setup for transceiver study:





## PAM4 56 Gbps with GTM IBERT, QSFP-DD loopback

- PAM4, 56 Gbps per lane, with a QSFP-DD loopback module.
  - Tuning of parameters such as cursor position and termination voltage is necessary for stable transmission (0 bit errors).

Source: DesignCon 2019, "Enabling IBIS-AMI Simulations for Systems Containing PAM4 Retimers at 112Gbps"



### Plan

- Further study with real QSFP-DD modules and MPO-16 is ongoing.
  - Much higher BER (~10<sup>-6</sup>).
  - Forward error correction (FEC) will be implemented in our protocol.
  - Other types of PAM4-capable modules will also be studied: FireFly, etc.
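To get a feeling for why a BER of ~10<sup>-6</sup> calls for error correction, the expected number of bit errors per second is simply the line rate times the BER. The sketch below assumes the 56 Gbps per-lane rate from the loopback test and independent bit errors; the 256-bit frame length is a hypothetical value used only for the example.

```python
def bit_errors_per_second(line_rate_gbps: float, ber: float) -> float:
    """Expected number of bit errors per second on one lane."""
    return line_rate_gbps * 1e9 * ber


def frame_error_probability(frame_bits: int, ber: float) -> float:
    """Probability that a frame contains at least one bit error,
    assuming independent bit errors."""
    return 1.0 - (1.0 - ber) ** frame_bits


if __name__ == "__main__":
    rate_gbps = 56.0   # per-lane PAM4 rate from the loopback test
    ber = 1e-6         # order of magnitude quoted for the optical modules
    frame_bits = 256   # hypothetical frame length, for illustration only

    print(f"bit errors per second : {bit_errors_per_second(rate_gbps, ber):.0f}")
    print(f"frame error probability: {frame_error_probability(frame_bits, ber):.2e}")
```

At this rate, tens of thousands of bit errors per second would reach the trigger logic without correction.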



### QSFP-DD-SR8




### Protocol development and connection test

- Both 8B/10B and 64B/66B (sync. gearbox) have been tested with GTM.
- Raw mode with no encoding: a new generalized protocol has also been made.
  - Similar logic to my Belle II TRG protocol design.
  - (De)scrambler for DC balance.
  - Tested to be stable for both NRZ and PAM4.
- Using this new generalized protocol, a connection test (25 Gbps x4, NRZ) between the Belle II UT4 and the VPK120 has also been done. It was stable over a few hours.
  - We will test with the ATLAS muon board soon.
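The role of a (de)scrambler in a raw-mode link can be sketched with a self-synchronous scrambler. The slides do not say which polynomial the generalized protocol uses, so the example assumes the 1 + x^39 + x^58 polynomial known from 64B/66B; the bit-serial Python form and the function names are for illustration only and do not represent the RTL.

```python
import random

MASK = (1 << 58) - 1  # 58-bit shift register holding the last scrambled bits


def _taps(state: int) -> int:
    """XOR of the scrambled bits sent 39 and 58 bit times ago."""
    return ((state >> 38) & 1) ^ ((state >> 57) & 1)


def _shift(state: int, scrambled_bit: int) -> int:
    return ((state << 1) | scrambled_bit) & MASK


def scramble(bits, state=MASK):
    """Self-synchronous scrambler: output feeds the shift register."""
    out = []
    for b in bits:
        s = b ^ _taps(state)
        state = _shift(state, s)
        out.append(s)
    return out, state


def descramble(bits, state=0):
    """Matching descrambler: the received bits feed the shift register."""
    out = []
    for s in bits:
        out.append(s ^ _taps(state))
        state = _shift(state, s)
    return out, state


if __name__ == "__main__":
    rng = random.Random(0)
    data = [0] * 128 + [rng.randint(0, 1) for _ in range(128)]
    tx, _ = scramble(data)
    rx, _ = descramble(tx, state=0)  # receiver starts with unknown state
    print("ones in first 128 scrambled bits:", sum(tx[:128]))
    print("recovered after 58 bits:", rx[58:] == data[58:])
```

Because the descrambler is driven by the received bits, it recovers the data after 58 bits even when its initial state differs from the transmitter's, so no separate seed exchange is needed.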




### Latency for Versal GTYP and UltraScale(+) GTY

- Latency is a big concern for the L1 TRG system.
  - Since the beginning of the Belle II TRG preparation, we have been studying latency reduction in data links.
  - Now we have 25 Gbps running.
- The following are the simulation values from the Xilinx website with the internal encoder.
  - UT4: Virtex UltraScale
- Measured latency in **bold**: based on the Belle II TRG protocol.

|                   | Raw (UI) | Raw + Async. 64B/66B (UI) | 10 Gbps, Raw (ns) | 10 Gbps, 64B/66B (ns) | 25 Gbps, Raw (ns) | 25 Gbps, 64B/66B (ns) |
|-------------------|----------|---------------------------|-------------------|-----------------------|-------------------|-----------------------|
| Versal GTYP 64/64 | 1127     |                           |                   |                       |                   |                       |
| Versal GTYP 64/32 | 688      |                           |                   |                       |                   |                       |
| UT4 GTY 64/64     | 768      | 1458                      | 77 <b>115</b>     | 146 <b>147</b>        | 31 <b>33</b>      | 58 <b>58</b>          |
| UT4 GTY 64/32     | 414      | 990                       | 41 <b>90</b>      | 99 <b>122</b>         |                   |                       |

Typical value for 1 link in the current Belle II TRG: **50~100 ns**
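The simulated latencies in nanoseconds follow directly from the values in unit intervals (UI): one UI is the duration of one bit on the line, so at a line rate given in Gbps the latency in ns is the UI count divided by the rate. The short sketch below reproduces the simulated UT4 GTY numbers; the function name is chosen here for illustration.

```python
def ui_to_ns(latency_ui: float, line_rate_gbps: float) -> float:
    """Convert a latency in unit intervals to nanoseconds.

    One UI lasts 1 / line_rate nanoseconds when the rate is in Gbps.
    """
    return latency_ui / line_rate_gbps


if __name__ == "__main__":
    # Simulated UI values for UT4 GTY 64/64 taken from the table.
    for label, ui in (("Raw", 768), ("Raw + Async. 64B/66B", 1458)):
        for rate in (10.0, 25.0):
            print(f"{label:22s} {ui:5d} UI @ {rate:4.1f} Gbps = "
                  f"{ui_to_ns(ui, rate):6.1f} ns")
```

The same UI count costs 2.5 times less time at 25 Gbps than at 10 Gbps, which is why higher line rates help the latency budget.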

- If we adopt the Versal GTM, larger latency will be introduced.
- The following are the max. simulation values from the Xilinx website with no encoding.
  - Measured latency in **bold**: based on our generalized protocol.
- For the same setup, the latency in terms of clock cycles is basically the same.
  - Higher speed is preferred, as the processing latency is much smaller.
  - In general, the latency of GTM is much larger than that of UltraScale(+) GTY.
- If we use GTM, we should go with PAM4 at > 50 Gbps.

| Versal GTM | Unit Interval<br>(UI) | 10 Gbps<br>(ns) | 25 Gbps<br>(ns) | 56 Gbps<br>(ns) | 106 Gbps<br>(ns) |
|------------|-----------------------|-----------------|-----------------|-----------------|------------------|
| NRZ 64b    | 5833                  | 583 <b>640</b>  | 233 <b>256</b>  |                 |                  |
| NRZ 160b   | 4964                  | 496 <b>730</b>  | 198 <b>237</b>  |                 |                  |
| PAM4 160b  | 2957                  |                 |                 | 53 <b>97</b>    |                  |
| PAM4 256b  | 3233                  |                 |                 | 57 <b>133</b>   |                  |
| PAM4 320b  | 3095                  |                 |                 |                 | 29 <b>66</b>     |
| PAM4 512b  | 3690                  |                 |                 |                 | 35               |

### **PCIe-CPM** test

- CPM-PCIe example from Xilinx: XTP712
  - CPM: a building-block design for PCIe that integrates DMA, CIPS, NOC, etc.
  - PCIe Gen4 x8: GTYP links are up. 16 Gbps per lane.
- Driver software: QDMA, also a Xilinx IP.
- Data exchange test with the QDMA software:
- We spent much time on mine-sweeping.
  - We will start to make a real protocol for event data readout.
    - Similar to the one in Belle II DAQ.
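For planning the readout protocol, it helps to know the raw bandwidth ceiling of the link. PCIe Gen3, Gen4, and Gen5 run at 8, 16, and 32 GT/s per lane with 128b/130b encoding, so the upper bound is a short calculation. The sketch below ignores packet and DMA overhead, so real throughput will be lower; the function name is hypothetical.

```python
# Per-lane signalling rate in GT/s for each PCIe generation.
RAW_GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# Gen3 to Gen5 all use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Upper bound on one-direction link bandwidth in GB/s,
    before packet and DMA overhead."""
    return RAW_GT_PER_S[gen] * lanes * ENCODING_EFFICIENCY / 8.0


if __name__ == "__main__":
    # Slot configurations mentioned for the test bench PCs.
    for gen, lanes in ((4, 8), (5, 16)):
        print(f"PCIe Gen{gen} x{lanes}: "
              f"{link_bandwidth_gbytes(gen, lanes):.2f} GB/s")
```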

[Screenshot: hardware view listing Quad_102 (4) with CH_0 at 15.987 Gbps, CH_1 at 15.977 Gbps, CH_2 at 15.97 Gbps, and CH_3 at 15.954 Gbps, followed by Quad_103 (4); DDRMC_2 margin analysis tab.]
                                                                                                                                                                                                         | Gate Tracking St        |               |                     |          |                     |                      | Nibble 0                                                   |                | 61             |     |
|                                                                                                                               | P⊲ CH_0      |               | 15.973 Gbps                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                         | Message:                | No errors o   | detected during cal | ibrat    | ion.                | Nibble 1<br>V Byte 1 |                                                            |                | 61             |     |
|                                                                                                                               | Pig CH_1     |               | 15.963 Gbps                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                         | Error:                  |               |                     |          |                     |                      |                                                            |                |                |     |
|                                                                                                                               | ₽d CH_2      |               | 15.98 Gbps                 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                         |                         |               |                     |          |                     |                      | Nibble 0                                                   |                | 62             |     |
|                                                                                                                               | № CH_3       |               | 15.984 Gbps                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                         |                         |               |                     |          |                     |                      | Nibble 1                                                   |                | 62             |     |
|                                                                                                                               | 1 DDRMC_1 (L | PDDR4) (x0y0) | PASS                       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                         | Calibration             |               |                     |          |                     | ~ B                  | te 2                                                       |                |                |     |
|                                                                                                                               | DDRMC_2 (L   | PDDR4) (x1y0) | PASS                       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                         | Stage                   |               |                     |          | Status              |                      | Nibble 0                                                   |                | 62             |     |
|                                                                                                                               | DDRMC_3      |               | DISABLED                   | ~                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                         | Stage<br>SCAL STAGE.01  |               |                     |          | Status ^            |                      | Nibble 1                                                   |                | 62             |     |
| _                                                                                                                             |              |               |                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                         |                         |               |                     |          | S Pass              | ~ B                  | te 3                                                       |                |                |     |
| Properties 2 – C C X                                                                                                          |              |               |                            | CAL_STAGE.02_F0_MEM_INIT                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                         |                         |               |                     | Nibble 0 |                     | 61                   |                                                            |                |                |     |
|                                                                                                                               |              |               |                            | CAL STAGE.04 F0 DQS GATE CAL SPASS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                         |                         |               |                     | Nibble 1 |                     | 60                   |                                                            |                |                |     |
|                                                                                                                               |              |               | $\leftarrow$ $\rightarrow$ |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                         |                         |               |                     |          | % Pass              | ~ B)                 | te 4                                                       |                |                |     |
| Select an object to see properties                                                                                            |              |               |                            | \CAL_STAGE.05_F0_WRITE_LEVELING             \CAL_STAGE.06_F0_READ_D0_CAL             \CAL_STAGE.07_F0_NRITE_D0_D0L_CAL             \CAL_STAGE.07_F0_NRITE_D0_D0L_CAL             \CAL_STAGE.07_F0_NRITE_D0_D0L_CAL             \CAL_STAGE.07_F0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRIT
E_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_D0_NRITE_ |                         |               |                     |          | Nibble 0            |                      | 64                                                         |                |                |     |
Figure: Serial I/O Links view. Most visible links report RX PLL Status and TX PLL Status as Locked, Loopback Mode as User Design, and Termination Voltage as 800 mV. The legible TXUSERCLK Freq values lie between 498.413 and 499.512.

| [root@cef01 linux-kernel]# ./bin/dma-ctl dev list |
|---|
| qdma02000 0000:02:00.0 max QP: 8, 0~7 |
| qdma02001 0000:02:00.1 max QP: 0, -~- |
| qdma02002 0000:02:00.2 max QP: 0, -~- |
| [root@cef01 linux-kernel]# ./bin/dma-ctl qdma02000 q add idx 0 dir bi |
| dma-ctl: Warn: Default mode set to 'mm' |
| qdma02000-MM-0 H2C added. |
| qdma02000-MM-0 C2H added. |
| Added 1 Queues. |
| [root@cef01 linux-kernel]# ./bin/dma-ctl qdma02000 q start idx 0 dir bi |
| dma-ctl: Info: Default ring size set to 2048 |
| 1 Queues started, idx 0 ~ 0. |
| [root@cef01 linux-kernel]# ./bin/dma-to-device -d /dev/qdma02000-MM-0 -s 32 |
| size=32 Average BW = 177.377688 KB/sec |
| [root@cef01 linux-kernel]# ./bin/dma-from-device -d /dev/qdma02000-MM-0 -s 32 |
| size=32 Average BW = 132.445391 KB/sec |
| [root@cef01 linux-kernel]# ./bin/dma-ctl qdma02000 q stop idx 0 dir bi |
| Stopped Queues 0 -> 0. |
| [root@cef01 linux-kernel]# ./bin/dma-ctl qdma02000 q del idx 0 dir bi |
| Deleted Queues 0 -> 0. |

- The Xilinx PCIe-CPM IP provides two modes:
  - Memory-Mapped (MM)
  - Streaming (ST)
- Next, we started developing the firmware and software for continuous event readout under realistic experimental conditions.
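The queue life cycle in the terminal log above (add, start, transfer in both directions, stop, delete) can be summarised as an ordered command list. The sketch below only rebuilds the command lines that appear in the log; the function name and its defaults are illustrative, and nothing is executed.

```python
import shlex


def queue_lifecycle_commands(dev="qdma02000", idx=0, size=32, bin_dir="./bin"):
    """Return, in order, the command lines for one MM-mode transfer test.

    The arguments mirror the terminal log: device name, queue index,
    transfer size in bytes, and the directory holding the tools.
    """
    node = f"/dev/{dev}-MM-{idx}"  # device node name as it appears in the log
    ctl = f"{bin_dir}/dma-ctl"
    queue = ["idx", str(idx), "dir", "bi"]
    return [
        [ctl, dev, "q", "add"] + queue,
        [ctl, dev, "q", "start"] + queue,
        [f"{bin_dir}/dma-to-device", "-d", node, "-s", str(size)],
        [f"{bin_dir}/dma-from-device", "-d", node, "-s", str(size)],
        [ctl, dev, "q", "stop"] + queue,
        [ctl, dev, "q", "del"] + queue,
    ]


if __name__ == "__main__":
    # Print the sequence instead of running it: this needs no hardware.
    for argv in queue_lifecycle_commands():
        print(shlex.join(argv))
```

On a host with the driver loaded, each list could be passed to `subprocess.run(argv, check=True)` so that a failing step stops the sequence.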



### PCIe-CPM firmware: Event readout using MM mode

• New firmware based on MM mode.





### PCIe-CPM firmware: Event exchange using MM mode

- A data exchange flow between firmware and software has also been implemented.
- One event in, one event out.
- The purpose is to test the algorithm core logic that will be implemented on the Versal kits.
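The "one event in, one event out" flow can be sketched from the software side as follows. This is a minimal illustration and not the actual readout software: it assumes the MM-mode device node can be used with seek, write, and read like a regular file, and the event size and address offset are hypothetical. A temporary file stands in for the device, so the event simply comes back unchanged, whereas on real hardware the returned bytes would be whatever the firmware wrote.

```python
import os
import tempfile

EVENT_SIZE = 32  # bytes per event; hypothetical, matches the "-s 32" test transfers


def exchange_event(fd, event, addr=0):
    """Send one event to `addr`, then read one event of the same size back."""
    os.lseek(fd, addr, os.SEEK_SET)
    written = os.write(fd, event)
    if written != len(event):
        raise IOError(f"short write: {written} of {len(event)} bytes")
    os.lseek(fd, addr, os.SEEK_SET)
    return os.read(fd, len(event))


if __name__ == "__main__":
    # Stand-in for the device node: a temporary file acts as a loopback.
    with tempfile.TemporaryFile() as stand_in:
        event_in = bytes(range(EVENT_SIZE))
        event_out = exchange_event(stand_in.fileno(), event_in)
        print("event returned unchanged:", event_out == event_in)
```

With real hardware the file descriptor would come from something like `os.open("/dev/qdma02000-MM-0", os.O_RDWR)`, using the node name shown in the terminal log.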



### Plan

- Further optimize the design and measure the throughput.
- Try ST mode, in consultation with Xilinx engineers.
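For the throughput measurement, a simple timing harness is sketched below. With 32-byte transfers, figures such as those in the earlier terminal log most likely reflect per-transfer overhead rather than link capacity, so the same measurement would be repeated over a range of transfer sizes. The function names and the 1024-byte definition of a KB are assumptions of this sketch, not taken from the driver tools.

```python
import time


def bandwidth_kb_per_s(total_bytes, elapsed_s, kb=1024):
    """Average bandwidth in KB/sec (KB = 1024 bytes is an assumption here)."""
    return total_bytes / kb / elapsed_s


def measure(transfer, size, count):
    """Time `count` calls of `transfer(payload)` with a `size`-byte payload."""
    payload = bytes(size)
    start = time.perf_counter()
    for _ in range(count):
        transfer(payload)
    elapsed = time.perf_counter() - start
    return bandwidth_kb_per_s(size * count, elapsed)


if __name__ == "__main__":
    # A do-nothing transfer stands in for the real DMA write.
    for size in (32, 1024, 32 * 1024):
        rate = measure(lambda payload: None, size, 1000)
        print(f"size={size} Average BW = {rate:.1f} KB/sec")
```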

## AI engine



As VCK190 arrived at KEK in 2024 March, we started

### AI engine: test

- The workflow for building firmware that uses the AI engine has been studied.
  - PL → AI engine → PL.
- Several logic designs were tested:
  - Arithmetic calculation
  - FIR filter
  - LeNet
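When testing a design such as the FIR filter, a software reference model is useful for checking the PL → AI engine → PL output sample by sample. The sketch below is a plain direct-form FIR in Python, y[n] = Σ h[k]·x[n−k]; the taps and input samples are arbitrary illustrative values, not those used in the actual test.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * samples[n - k].

    Samples before the start of the input are treated as zero.
    """
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        output.append(acc)
    return output


if __name__ == "__main__":
    taps = [1, 2, 3]              # hypothetical integer coefficients
    impulse = [1, 0, 0, 0]
    # The impulse response of an FIR filter reproduces its taps.
    print(fir_filter(impulse, taps))
```

Integer taps keep the comparison with hardware output exact; a floating-point version would need a tolerance.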





## Vitis-AI with DPU

- The VCK190 also provides a Deep Learning Processor Unit (DPU), a configurable computation engine dedicated to convolutional neural networks.
- The design flow does not involve Vivado for PL design. The device runs a small operating system, much like a server, and jobs are executed within it.
  - A higher-level application.



### Vitis-AI with DPU: test

The Docker environment and the DPU setup for the VCK190 are ready.



### Vitis-AI within docker

Processed image shown on the display



Image processing in DPU

| examples/vai_library/samples/classification/images/002.JPEG | |
|---|---|
| XAIEFAL: INFO: Resource group Avai | |
| XAIEFAL: INFO: Resource group Stat: | |
| XAIEFAL: INFO: Resource group Gene | |
| WARNING: Logging before InitGoogle | |
| I1119 10:19:08.931777 1536 | demo.hpp:1193] batch: 0 image: /home/root/Vitis-AI/examples/vai_library/samples/classification/images/002.JPEG |
| | s_result.hpp:24] r.index 109 brain coral, r.score 0.999749 |
| | s_result.hpp:24] r.index 955 jackfruit, jak, jack, r.score 0.000158421 |
| | s_result.hpp:24] r.index 973 coral reef, r.score 5.828e-05 |
| | s_result.hpp:24] r.index 390 eel, r.score 1.66975e-05 |
| | s_result.hpp:24] r.index 50 electric ray, crampfish, numbfish, torpedo, r.score 7.88734e-06 |
| I1119 10:19:08.932798 1536 | demo.hpp:1193] batch: 1 image: /home/root/Vitis-AI/examples/vai_library/samples/classification/images/002.JPEG |
| | s_result.hpp:24] r.index 109 brain coral, r.score 0.999749 |
|                                      | s_result.hpp:24] r.index 955 jackfruit, jak, jack, r.score 0.000158421                                                                               |
|                                      | s_result.hpp:24] r.index 973 coral reef, r.score 5.828e-05                                                                                           |
|                                      | s_result.hpp:24] r.index 390 eel, r.score 1.66975e-05<br>s_result.hpp:24] r.index 5 electric ray, crampfish, numbfish, torpedo, r.score 7.88734e-06  |
| 11119 10:19:00:955192 1550 process   |                                                                                                                                                      |
|                                      |                                                                                                                                                      |
|                                      | examples/vai_library/samples/classification# ./test_video_classification_resnet18_pt (                                                               |
|                                      | cv/4_5.2-r0/git/modules/videoio/src/cap_gstreamer.cpp (1081) open OpenCV   GStreamer v                                                               |
| annot query video position: status=0 |                                                                                                                                                      |
| XAIEFAL: INFO: Resource group Avail  |                                                                                                                                                      |
| XAIEFAL: INFO: Resource group Static | is created.                                                                                                                                          |
| XAIEFAL: INFO: Resource group Generi | c is created.                                                                                                                                        |
| WARNING: Logging before InitGoogleLo | gging() is written to STDERR                                                                                                                         |
| I1119 10:18:38.351377 1517 demo.hpp  | :752] DPU model size=224x224                                                                                                                         |
| I1119 10:18:38.392418 1517 demo.hpp  | :752] DPU model size=224x224                                                                                                                         |
| I1119 10:18:38.433463 1517 demo.hpp  |                                                                                                                                                      |
| I1119 10:18:38.474534 1517 demo.hpp  |                                                                                                                                                      |
| I1119 10:18:38.515609 1517 demo.hpp  |                                                                                                                                                      |
| I1119 10:18:38.556959 1517 demo.hpp  |                                                                                                                                                      |
| I1119 10:18:38.598032 1517 demo.hpp  |                                                                                                                                                      |
|                                      |                                                                                                                                                      |
| I1119 10:18:38.639214 1517 demo.hpp  | :752] DPO model S120=224X224                                                                                                                         |

root@xilinx-vck190-20222:~/Vitis-AI/examples/vai library/samples/classification# ./test ipeg classification resnet18 pt ~/Vitis-AI/

Camera video processing in DPU
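The classification output lists the five highest-scoring classes, each with an index and a score. A minimal sketch of how such a ranking can be derived from raw network outputs is given below, assuming a softmax followed by a top-k selection. This is an illustration only, not the Vitis-AI library's code, and the logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw outputs into scores that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, k=5):
    """Return (index, score) pairs for the k largest scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


logits = [0.1, 2.0, -1.0, 5.0, 0.5, 3.0]  # hypothetical raw network outputs
for index, score in top_k(softmax(logits), k=3):
    print(f"r.index {index} r.score {score:.6g}")
```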


### Algorithm making in FPGA: HLS, ML, AI engine

- As a next step, we have many algorithms from Belle II, ATLAS, and other experiments to try on the Versal kits.
  - Before that, let's think about the methodologies to do so.
- Options for algorithm implementation:
  - HDL logic in firmware.
  - HLS: software  $\rightarrow$  firmware.
  - ML inference.
  - AI engine.

Depending on the target, our choice of FPGA differs. A strong FPGA? An ACAP with AI engine? A DPU?

- Beyond hls4ml, HLS tools offer much more for both ML and non-ML applications.
  - Similarly, the Versal AI engine requires a different design flow to make software/firmware.
- For this part of the work, we generalize the work plan into a roadmap from a technical perspective.

## HLS, ML, AI engine: roadmap

- As members of the KEK E-sys group, we hope to understand the basic usage of each tool and build a database of such technical knowledge to support our experimental colleagues.
- We are recruiting young students to learn and work with us.
  - We also plan to give a series of hands-on lectures on each of them.



## hls4ml

- hls4ml: A package for machine learning inference in FPGA.
  - Already widely used with Vivado HLS in Belle II and ATLAS.
- Yiyang Ding, our summer internship student in 2023, performed general studies on it.
  - An NN model for a simple tracker was built and tested with the VPK120!
  - Also tested on an Intel FPGA with Quartus.
  - A manual has been prepared.
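ML inference in firmware typically replaces floating-point arithmetic with fixed-point types. The sketch below emulates one dense layer with ReLU on a signed fixed-point grid, assuming 16 total bits with 6 integer bits (similar to an `ap_fixed<16,6>` type), truncation toward minus infinity, and wrap-around on overflow. The function names, weights, and inputs are hypothetical; this is not code generated by hls4ml.

```python
import math


def to_fixed(value, total_bits=16, int_bits=6):
    """Quantize to a signed fixed-point grid: truncate toward -inf, wrap on overflow."""
    frac_bits = total_bits - int_bits
    raw = math.floor(value * (1 << frac_bits))
    raw &= (1 << total_bits) - 1           # keep only the low total_bits bits
    if raw >= 1 << (total_bits - 1):       # reinterpret as two's complement
        raw -= 1 << total_bits
    return raw / (1 << frac_bits)


def dense_relu(inputs, weights, biases, **fmt):
    """One dense layer with ReLU, re-quantizing the accumulator after every multiply-add."""
    outputs = []
    for row, bias in zip(weights, biases):
        acc = to_fixed(bias, **fmt)
        for x, w in zip(inputs, row):
            acc = to_fixed(acc + to_fixed(x, **fmt) * to_fixed(w, **fmt), **fmt)
        outputs.append(max(acc, 0.0))
    return outputs


# Hypothetical inputs and weights, chosen only to show the quantization step.
print(to_fixed(1 / 3))   # 341/1024, the nearest grid point below 1/3
print(dense_relu([0.5, -0.25], [[2.0, 1.0], [-1.5, 0.3]], [0.0, 0.1]))
```

Comparing such a fixed-point emulation with the floating-point model is one way to judge whether a chosen bit width is sufficient before running synthesis.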





### Next step: what kinds of algorithms to implement?




### Next step: how to implement?




### Prospect: more new ideas

### Additional dimension: More resource in FPGA



### More than NN: CNN or GNN?

KIT, TUM, MPI: Belle II AI trigger group

#### **CNN tracking**



#### **GNN tracking**



## Summary

- The Collider Electronics Forum at KEK IPNS and the Japanese HEP community have started a project using Xilinx Versal ACAP evaluation kits, targeting future R&D of a new universal FPGA device.
- Some of the fundamental functionalities of the Versal evaluation kits have been studied.
  - Firmware making, high-speed transmission, PCIe, AI engine, DPU, and HLS for ML inference.
- Future plan:
  - More basic studies on HLS tools, ML inference packages, and AI engine will be performed.
  - Implement different physics algorithms for different experiments.
  - We will also discuss the R&D plan for the new device.
    - The next generation of Universal Trigger board (UT5).

# Backup


### **Evaluation kits for Versal**



- Features the VC1902 Versal AI Core series
- For using AI and DSP engines with greater compute performance than current server-class CPUs



- Features the VM1802 Versal<sup>™</sup> Prime series
- The world's first ACAP
- A software programmable infrastructure and connectivity



- Features the VH1582 Versal<sup>™</sup> HBM series
- Convergence of memory, compute, and connectivity with 32G HBM and 112G PAM4



- Features the Versal AI Core Series
- For (AI) Engine development with Vitis and AI Inference development
- Not flexible for FPGA firmware



- Features Versal<sup>™</sup> Premium series VP1202
- Multiple high-speed connectivity options
- Massive serial bandwidth, security, and compute density



- Features the VE2802 Versal AI Edge series
- Simpler version of VCK190
- Will come out in 2024

### **SuperKEKB**

- SuperKEKB: Upgraded from KEKB.
  - More than 30 times the luminosity of KEKB, using the nano-beam scheme.
- Asymmetric energy collider:
  - 7.0 GeV  $e^{-}$  and 4.0 GeV  $e^{+}$  for  $\Upsilon(4S) \rightarrow B\overline{B}$.
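The two beam energies fix the centre-of-mass energy and the boost of the produced system. As a rough check, assuming head-on collisions of massless beams (the crossing angle and the electron mass are neglected here), $\sqrt{s} = 2\sqrt{E_{-}E_{+}}$ and $\beta\gamma = (E_{-} - E_{+})/\sqrt{s}$. The function names below are illustrative.

```python
import math

E_ELECTRON_GEV = 7.0  # from the slide
E_POSITRON_GEV = 4.0  # from the slide


def cm_energy(e1, e2):
    """Centre-of-mass energy for head-on, massless beams."""
    return 2.0 * math.sqrt(e1 * e2)


def boost_beta_gamma(e1, e2):
    """Lorentz boost factor beta*gamma of the centre-of-mass system."""
    return abs(e1 - e2) / cm_energy(e1, e2)


print(f"sqrt(s)    = {cm_energy(E_ELECTRON_GEV, E_POSITRON_GEV):.3f} GeV")
print(f"beta*gamma = {boost_beta_gamma(E_ELECTRON_GEV, E_POSITRON_GEV):.3f}")
```

The result, about 10.58 GeV, sits on the $\Upsilon(4S)$ resonance, and the boost of about 0.28 comes from the energy asymmetry.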



- Luminosity achievement:
  - $L_{peak} = 4.65 \times 10^{34}\ \text{cm}^{-2}\text{s}^{-1}$: a world record, about twice the KEKB record, with a much smaller beam current.
  - $L_{int} \approx 427\ \text{fb}^{-1}$ up to Jun. 2022.
- Beam collisions will resume in 2024 with the full PXD installed.



### **Belle II detector**

- Belle II: a set of newly designed sub-detectors to improve detection performance.



- Physics target of Belle II:
  - Rare B,  $\tau$ , charm physics, Dark Matter search, CP Violation.
- Requirement for data taking:
  - High L1 trigger rate (~30 kHz), high background, and large event size.

## Belle II DAQ system

- Pipeline common readout system for each sub-detector.
  - Except for PXD: data reduction system with ROI.
- Target of performance: 30 kHz L1 rate, ~1% of dead time, and a raw event size of 1 MB.
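The performance targets above imply the raw data rate the readout has to sustain. A back-of-the-envelope sketch, assuming 1 MB means $10^6$ bytes and that every L1-accepted event is read out in full:

```python
L1_RATE_HZ = 30e3        # 30 kHz L1 rate, from the slide
EVENT_SIZE_BYTES = 1e6   # 1 MB raw event size, assuming 1 MB = 10^6 bytes


def raw_throughput_bytes_per_s(rate_hz, event_size_bytes):
    """Raw data rate for a given trigger rate and event size."""
    return rate_hz * event_size_bytes


total = raw_throughput_bytes_per_s(L1_RATE_HZ, EVENT_SIZE_BYTES)
print(f"{total / 1e9:.0f} GB/s, {total * 8 / 1e9:.0f} Gbps")  # 30 GB/s, 240 Gbps
```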



### Readout device and its upgrade



In total, 203 COPPER boards were used in Belle II.



In total, 21 PCIe40 boards will be used in Belle II.

### **Considerations for upgrade:**

- Difficulty of maintenance:
  - Increasing number of malfunctioning pieces.
  - Many different boards in system.
  - Parts out of production already.

- Limit of the system on further improvement:
  - Output throughput by GbE: 1 Gbps.
  - CPU usage: ~60% at 30 kHz trigger rate.

• 4 sub-trigger systems + 2 global trigger systems.



### Conditions and requirements for TRG

- Requirements:
  - Overall latency < 4.4  $\mu$ s.
  - ~100% eff. for hadronic events.
  - Max 30 kHz @ 8 × 10<sup>35</sup> cm<sup>-2</sup>s<sup>-1</sup>
  - Timing precision: < 10 ns
  - Event separation: 500 ns

- Examples of technical challenges so far:
  - Low-multiplicity trigger is mainly based on ECL, but suffers contamination from noise, beam background, or Bhabha events.
  - Energy trigger has high efficiency but also a high rate.
  - Injection background.
  - Drawback of track trigger at endcap.
  - High track trigger rate due to crosstalk noise.
  - Latency budget due to transmission or complicated logic.
  - ...
- Physics processes of interest:

| Process | C.S. (nb) | R @ L = 5.5×10<sup>33</sup> (Hz), Phase 2 luminosity record | R @ L = 8×10<sup>35</sup> (Hz) | TRG logic |
|---|---|---|---|---|
| $\Upsilon(4S)$ | 1.2 | 6.6 | 960 | CDC 3trk (fff), ECL high energy (hie), ECL 4 clusters (c4) |
| Continuum | 2.8 | 15.4 | 2200 | CDC 3trk (fff), ECL high energy (hie), ECL 4 clusters (c4) |
| $\mu\mu$ | 0.8 | 4.4 | 640 | CDC 2trk (ffo), etc. |
| $\tau\tau$ | 0.8 | 4.4 | 640 | CDC 2trk (ffo), etc. |
| Bhabha | 44 | 242 | 350 * | ECL Bhabha (bhabha, 3D bhabha) |
| $\gamma\gamma$ | 2.4 | 13.2 | 19 * | ECL Bhabha (bhabha, 3D bhabha) |
| Two photon | 13 | 71.5 | 10000 | CDC 2trk (ffo), etc. |
| Total | 67 | 357.5 | ~15000 | |
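The rates listed above follow from $R = \sigma \times L$ with 1 nb = $10^{-33}$ cm². A short sketch that reproduces them from the listed cross sections (the dictionary keys and function name are illustrative):

```python
NB_TO_CM2 = 1e-33  # 1 nb = 1e-33 cm^2

# Cross sections in nb, as listed on the slide.
CROSS_SECTIONS_NB = {
    "Upsilon(4S)": 1.2,
    "Continuum": 2.8,
    "mu mu": 0.8,
    "tau tau": 0.8,
    "Bhabha": 44.0,
    "gamma gamma": 2.4,
    "Two photon": 13.0,
}


def event_rate_hz(cross_section_nb, luminosity):
    """Rate in Hz for a cross section in nb and a luminosity in cm^-2 s^-1."""
    return cross_section_nb * NB_TO_CM2 * luminosity


for name, sigma in CROSS_SECTIONS_NB.items():
    low = event_rate_hz(sigma, 5.5e33)
    high = event_rate_hz(sigma, 8e35)
    print(f"{name:12s} {sigma:5.1f} nb  {low:7.1f} Hz  {high:8.0f} Hz")
```

For the unstarred processes, this matches the listed numbers (the two-photon rate comes out as 10400 Hz, listed as 10000). For Bhabha and $\gamma\gamma$, the plain product at $8 \times 10^{35}$ is about 35200 Hz and 1920 Hz, so the starred entries (350 and 19) appear to include a reduction by roughly a factor of 100. The slide does not state how that reduction is applied.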

### Data transmission protocol at Belle II TRG

- Data transmission in TRG: Xilinx and Altera FPGA MGT, QSFP module, and MPO cable.
- The original plan was to use the open-source Aurora protocol, but it introduced a large latency, and the L1 limit (4.4  $\mu$s) was exceeded.
- Belle II CDCTRG developed a user-defined transmission protocol:
  - Smaller latency than Aurora: latency reduction is critical for L1!
  - User-friendly interface.
  - 8B/10B and 64B/66B encoding.
  - Supports various Xilinx and Altera MGTs.
  - Bit error rate  $< 10^{-18}$ /s with a few weeks of BERT.
  - Flow control and synchronization.

#### Latency comparison using UT3 (Virtex-6 GTX and GTH)

| Protocol          | Lane rate   | user_clk   | Link type | Latency (ns) |
|-------------------|-------------|------------|-----------|--------------|
| Aurora 8B/10B     | 5.08 Gbps   | 254 MHz    | GTX-GTX   | 185~190      |
| Raw-level 8B/10B  | 5.08 Gbps   | 254 MHz    | GTX-GTX   | 132~136      |
|                   | 5.08 Gbps   | 254 MHz    | GTH-GTX   | 132~136      |
|                   | 5.08 Gbps   | 254 MHz    | GTH-GTH   | 91~95        |
|                   | 5.08 Gbps   | 254 MHz    | GTX-GTH   | 91~95        |
| Aurora 64B/66B    | 10.16 Gbps  | 158.75 MHz | GTH-GTH   | 296~302      |
| Raw-level 64B/66B | 11.176 Gbps | 169.33 MHz | GTH-GTH   | 106~112      |

For **UT4**:

- Up to 25 Gbps using 64B/66B.
- Latency: ~50 ns.
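To put the measured link latencies in context, they can be expressed in `user_clk` cycles and as a fraction of the 4.4 μs L1 latency limit quoted above. The sketch below uses a subset of the measured values; the helper names are illustrative.

```python
L1_BUDGET_NS = 4400.0  # 4.4 us overall L1 latency limit

# (protocol, link type, user_clk in MHz, (min, max) latency in ns), from the measurements
MEASUREMENTS = [
    ("Aurora 8B/10B", "GTX-GTX", 254.0, (185, 190)),
    ("Raw-level 8B/10B", "GTX-GTX", 254.0, (132, 136)),
    ("Raw-level 8B/10B", "GTH-GTH", 254.0, (91, 95)),
    ("Aurora 64B/66B", "GTH-GTH", 158.75, (296, 302)),
    ("Raw-level 64B/66B", "GTH-GTH", 169.33, (106, 112)),
]


def latency_cycles(latency_ns, clk_mhz):
    """Latency expressed in clock cycles of the given frequency."""
    return latency_ns * clk_mhz / 1000.0


def budget_fraction(latency_ns, budget_ns=L1_BUDGET_NS):
    """Share of the L1 latency budget used by one link."""
    return latency_ns / budget_ns


for protocol, link, clk_mhz, (_, worst_ns) in MEASUREMENTS:
    print(f"{protocol:18s} {link}  {latency_cycles(worst_ns, clk_mhz):5.1f} cycles  "
          f"{100 * budget_fraction(worst_ns):4.1f}% of L1 budget")
```

Since trigger data usually crosses several links in series, the per-link share adds up, which is why the reduction from the Aurora values to the raw-level ones matters.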

## Track trigger with CDC



- An alternating AUAVAUAVA wire configuration for 3D information:
  - A: Axial super-layer (SL) parallel to z-axis
  - U, V: Stereo SL with two small stereo angles.





## Neural z trigger

- In addition to the conventional 3D tracker based on a fitting method, Belle II has a Neural Network 3D tracker (NN) running in parallel in the system.

References: S. Neuhaus et al 2015 J. Phys.: Conf. Ser. 608 012052; Kai Lukas Unger et al 2023 J. Phys.: Conf. Ser. 2438 012056; F. Meggendorfer, DPG Conference 2021; theses by S. Skambraks and S. Pohl.

- Input the 2D tracker and stereo TS info: crossing angle, drift time, and  $\phi$  relative to the 2D track.
- Obtain  $z_0$  and  $\theta$.
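The data flow described above, a set of per-TS inputs mapped to two outputs, can be illustrated with a small fully connected network. The sketch below is only a toy forward pass: the layer sizes, the tanh activation, the random weights, and the input values are all assumptions for illustration and do not reproduce the network used in the Belle II trigger.

```python
import math
import random


def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_biases):
    """One hidden layer with tanh activations on the hidden and output nodes."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(hidden_weights, hidden_biases)]
    return [math.tanh(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(output_weights, output_biases)]


# Assumed sizes: 3 values (crossing angle, drift time, relative phi) for each of
# 4 stereo super-layers, a hypothetical hidden layer of 8 nodes, and 2 outputs.
N_INPUTS, N_HIDDEN, N_OUTPUTS = 12, 8, 2

rng = random.Random(0)  # random placeholder weights, not trained values
hidden_w = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
hidden_b = [0.0] * N_HIDDEN
output_w = [[rng.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_OUTPUTS)]
output_b = [0.0] * N_OUTPUTS

inputs = [rng.uniform(-1, 1) for _ in range(N_INPUTS)]  # hypothetical scaled inputs
z0_scaled, theta_scaled = mlp_forward(inputs, hidden_w, hidden_b, output_w, output_b)
print(z0_scaled, theta_scaled)  # both lie in (-1, 1) and would be rescaled to physical units
```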



# ML tau trigger

- Global trigger receives the cluster information from ECLTRG.
  - Input the position and energy information of clusters to a Neural Network, and determine if it is a tau event or not.
  - A kind of topological application.
  - Based on hls4ml.
  - Validated and will be implemented in 2024 runs.






