#### Our (more recent) involvement with XILINX/AMD FPGAs and SoCs at the University of Pittsburgh



### The past: analog interface between ATLAS calorimeters and L1 trigger / FPGA work by engineers



# Belle II @ SuperKEKB: intensity-frontier B-factory experiment, successor to Belle @ KEKB (1999-2010)






# 7 GeV e<sup>-</sup>, 4 GeV e<sup>+</sup>; E<sub>CM</sub> = 10.58 GeV

A collaboration of ~800 scientists from 25 countries



## The Belle II detector

K<sub>L</sub> and muon detector:

Resistive Plate Counter (barrel outer layers); Scintillator + WLSF + MPPC (end-caps, inner 2 barrel layers)

EM Calorimeter: CsI(Tl), waveform sampling

#### electron (7 GeV)

Beryllium beam pipe:

2 cm diameter

Vertex detector:

2 layers DEPFET + 4 layers DSSD

Central Drift Chamber: He(50%):C<sub>2</sub>H<sub>6</sub>(50%), Small cells, long lever arm, fast electronics

First new particle collider since the LHC (intensity rather than energy frontier; e<sup>+</sup>e<sup>-</sup> rather than pp)

Apr. 9, 2024 / Realtime Workshop / Giessen



#### Particle Identification:

Time-of-Propagation counter (barrel); Prox. Focusing Aerogel RICH (fwd)



#### Readout (TRG, DAQ):

Max. 30 kHz L1 trigger, ~100% efficient for hadronic events. 1 MB (PXD) + 100 kB (others) per event: over 30 GB/s to record

#### Offline computing:

Distributed over the world via the GRID

arXiv:1011.0352 [physics.ins-det]

~8m wide ~8m tall

~1,400 tons









### Some of the TOP FEE, trigger and DAQ infrastructure in the lab







#### To understand the ZYNQ-045 SoC/FPGA we also used the ZC706 evaluation board





#### Actual GTY bandwidth, AURORA performance, stability, etc. have been investigated with realistic reference and system clocks

#### When the time came to upgrade Trigger and DAQ, we used the VCU108 evaluation board



Everything we had to learn about the custom UT4 boards (used at KEK for the trigger), we learned locally using the VCU108 (UltraScale)


## Before PCIe40 (see Dmytro Levit's talk) was chosen for the Belle II DAQ upgrade, we played with FELIX



In the course of working on various projects we accumulated an arsenal of FPGA / SoC boards and built up the infrastructure

#### BNL FELIX platform got us into Kintex and KCU105



This investment didn't pay off in terms of the Belle II DAQ upgrade; however, we got a chance to play with Kintex UltraScale






Vladimir Savinov (University of Pittsburgh)

#### When an opportunity unexpectedly became available, we took the chance to get an UltraScale+ board



### ASIC characterization / connection to NALU Scientific / new sampling and digitizing electronics

Screenshot of the Nalu Scientific website ("Enabling innovation: designing cutting-edge technologies for precision measurement"). From its "About Nalu Scientific" text: Nalu Scientific specializes in advanced mixed-signal integrated circuits with applications in particle tracking and time-of-flight measurements, has been the recipient of multiple SBIR awards to develop the next generation of System-on-a-Chip based front-end electronics for particle tracking applications, and has extensive working relationships with several U.S. national labs and international collaborations (such as KEK in Japan).

00A Series Waveform Generator

# https://www.naluscientific.com/

Artix<sup>®</sup>-7 XC7A200T

#### Now we use this ecosystem to train new students in the area of FPGA FW and related matters



How we train new students / these are not professionals / usually no prior experience with FPGAs

- 1) Figure everything out by yourself / free rein to experiment with the boards
- 2) First project is usually "make an evaluation board do something for you"
- 3) After it works take a look at Methodology Report / prepare to be shocked
- 4) Explain repeatedly how HDL is not (exactly) a programming language
- 5) Ask to prepare IBERTs AND investigate the sources prepared by Core Gen
- 6) Discuss timing reports / domains / constraints / ways to deal with these
- 7) Learn as much as possible from Transceiver Wizard and such + XILINX docs

#### Some examples of how "good" we are as FW designers / how "knowledgeable" we are...





Selected examples of what we discovered (likely about our ignorance) in the course of our work

- 1) HSLB boards (receivers for FEE readout) weren't stable with our clocks in our environment. A student found a way to modify the board's firmware to make it work for us. Lesson learned: show some initiative and don't rely on someone else to fix the problem for you. This student went on to graduate school in EE and now designs space-borne instruments.
- 2) QC/QA of SCRODs (Standard Control and Read-Out Data boards) revealed that calibration of RAM was very difficult. If I recall correctly, it was traced to some issues in the memory controller on the chip. The information was obtained in a timely manner, allowing the right decision about using RAM in ops. A better calibration scheme was developed (by running it on the embedded core of the ZYNQ-045). Lesson learned: allow more time between the last design cycle and the production stage.
- 3) Verilog module instantiation in VIVADO does not always produce the desired result. Lesson learned: don't rely on Chipscope alone; sometimes the RTL design is worth looking at.
- 4) While transitioning to the newest VIVADO (2023.2.2) we discovered that its default optimization had changed. Lesson learned: be careful with version changes, don't expect the tools to be smarter than you.
- 5) Discovered an "interesting" difference in how VIVADO grabs all programming cables.

#### Verilog module port connection surprise when a generate block is used

```
// from toptrig.v (top-level module for TOP TRG FW)
...
genvar i;
generate
        for(i=0; i<8; i=i+1) begin: TOPBAR_ARRAY
                 topbar TOPBAR_i (
                 // User I/O
                 .RST_AURORA(rst_aurora_quad[i]),
...
                 // .min_hit_cut(min_hit_value),   // BAD!
                 .min_hit_cut(min_hit_value_w), // GOOD!
...
         end
endgenerate
always @ (posedge clk_127)
begin
        min_hit_value_r <= min_hit_value_i;
end
...
reg [7:0] min_hit_value_slc;
...
min_hit_value_slc <= vme_write_reg2[3][7:0];
wire [7:0] min_hit_value_local;
assign min_hit_value_local = min_hit_value_r;
wire [7:0] min_hit_value;
assign min_hit_value = (control_mode==1'b0)?(min_hit_value_slc):(min_hit_value_local);
...
// vs / March 7, 2024: critical for routing into 8 instantiated copies of topbar!
wire [7:0] min_hit_value_w;
assign min_hit_value_w = min_hit_value;
...
```

before introducing an additional wire as an input to instantiated/generated copies of TOPBAR:



after an additional wire was introduced:



same result (before / after) if an additional register (instead of a multiplexer) is used in a synchronous process

#### Before an extra redundant wire is introduced, RTL design shows no connection to instantiated modules




#### After an extra redundant wire is introduced, RTL design confirms connection to instantiated modules



same result (before / after) if an additional register (instead of a multiplexer) is used in a synchronous process

#### An interesting example (one of many) of a "better default optimization" in the newest VIVADO




Other instances were identified as well (similar situation); such signals were commented out.

## On the other hand, perhaps the newest VIVADO (with its defaults) is actually much better?

# VIVADO 2020.2



# VIVADO 2023.2.2


### Problem with several instances of VIVADO in a multiuser environment with several devices

The setup / how to experience this problem:

- There are several instances of VIVADO running on your system
- Several devices are connected to the system
- Each device uses its own programming cable
- Each instance of VIVADO is expected to use its own exclusive device

The statement of the problem:

- Devices could not be programmed / debugged simultaneously
- This is because all instances of VIVADO are using the same hardware manager

Solution which does not work:

Make each instance of VIVADO use its own dedicated hardware manager

This does not help because the hardware manager that starts first grabs all programming cables

No such problem in IMPACT/ISE because IMPACT could be told which programming cable to use!

## A working solution to the problem of simultaneous programming / debugging in VIVADO

- Disable all USB devices (for programming cables)
- Enable programming cable (USB device) #1
- Start hardware manager instance #1
- Start VIVADO instance #1
- Connect VIVADO instance #1 to hardware manager #1
- Disable all programming cables
- Enable programming cable (USB device) #2
- Start hardware manager instance #2
- Start VIVADO instance #2
- Connect VIVADO instance #2 to hardware manager #2
- Disable all programming cables
- ...
- Enable all programming cables

Of course, do all this from a script

# a working hack for two instances of VIVADO

disable\_all\_redboxes
enable\_one\_redbox 1
start\_servers\_1
vivado -source ~/bin/connect\_to\_redbox\_1 &
sleep 10

disable\_all\_redboxes
enable\_one\_redbox 2
start\_servers\_2
vivado -source ~/bin/connect\_to\_redbox\_2 &
sleep 10

enable\_all\_redboxes
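The same sequence can be written as a loop over cables. This is only a sketch: the helper names (`disable_all_redboxes`, `enable_one_redbox`, `start_servers_N`, `~/bin/connect_to_redbox_N`) are taken from the hack above and assumed to exist for every cable index, while `DRY_RUN`, `N_CABLES`, `SETTLE_SECONDS`, `run` and `launch_all` are hypothetical names introduced here. With the default `DRY_RUN=1` nothing is executed; the commands are only printed so the order can be checked first.

```shell
#!/bin/bash
# Sketch: one hardware server + one VIVADO instance per programming cable.
# DRY_RUN, N_CABLES, SETTLE_SECONDS, run() and launch_all() are hypothetical.

DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands only, 0 = execute them
N_CABLES="${N_CABLES:-2}"                # number of programming cables to serve
SETTLE_SECONDS="${SETTLE_SECONDS:-10}"   # wait so the new server can claim its cable

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

launch_all() {
    n=1
    while [ "$n" -le "$N_CABLES" ]; do
        run disable_all_redboxes          # hide every cable ...
        run enable_one_redbox "$n"        # ... then expose only cable n
        run "start_servers_$n"            # this server can only see cable n
        if [ "$DRY_RUN" = "1" ]; then
            echo "+ vivado -source $HOME/bin/connect_to_redbox_$n &"
        else
            vivado -source "$HOME/bin/connect_to_redbox_$n" &
        fi
        run sleep "$SETTLE_SECONDS"
        n=$((n + 1))
    done
    run enable_all_redboxes               # restore access once every server holds its cable
}

launch_all
```

Running it with `DRY_RUN=0` would execute the helpers for real; the sleep interval is the same 10 seconds used above and may need tuning on other systems.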

## Of course, all this is of questionable value and quality, but works well for our purposes

```
# ---- script 1: disable all redboxes ----
#!/bin/bash
# disable all redboxes
echo "Xilinx programming cables found:"
lsusb | grep -i "xilinx" | grep -v "grep" | grep -v "show_all_redboxes"
# one token per cable: spaces become "#", colons are dropped (needs the external "replace" utility)
list=`lsusb | grep -i "xilinx" | grep -v "grep" | grep -v "show_all_redboxes" | replace " " "#" -- | replace ":" "" --`
it=1
for box in ${list}
do
 # echo "found a programmer: ${box}"
 bus=`echo ${box} | awk -F# '{print($2)}'`
 device=`echo ${box} | awk -F# '{print($4)}'`
 # echo "Bus: $bus, device: $device"
 devfull="/dev/bus/usb/${bus}/${device}"
 echo "Before disabling programmer $it:"
 ls -qasl ${devfull}
 sudo chmod 0 ${devfull}      # mode 0: hide the cable from non-root processes
 echo "After disabling programmer $it:"
 ls -qasl ${devfull}
 it=$((it+1))
done

# ---- script 2: enable one redbox ----
#!/bin/bash
# enable one redbox
itenable=$1
if test "$itenable" != "1" && test "$itenable" != "2"
then
 echo "There is no programmer $itenable (?). Your choices are 1 or 2"
 exit 1
fi
list=`lsusb | grep -i "xilinx" | grep -v "grep" | grep -v "show_all_redboxes" | replace " " "#" -- | replace ":" "" --`
it=1
for box in ${list}
do
 if test "${itenable}" = "$it"
 then
  bus=`echo ${box} | awk -F# '{print($2)}'`
  device=`echo ${box} | awk -F# '{print($4)}'`
  devfull="/dev/bus/usb/${bus}/${device}"
  echo "Before enabling programmer $it:"
  ls -qasl ${devfull}
  sudo chmod 666 ${devfull}   # read/write for everyone: the cable is usable again
  echo "After enabling programmer $it:"
  ls -qasl ${devfull}
 fi
 it=$((it+1))
done
```
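Both scripts turn `lsusb` output into `/dev/bus/usb/<bus>/<device>` paths with the help of the external `replace` utility. A possible alternative that needs only `awk` and `sed` is sketched below. The function names `usb_nodes_from_lsusb` and `nth_usb_node` are hypothetical, and the sketch assumes the usual `lsusb` line layout (`Bus 001 Device 004: ID ...`).

```shell
#!/bin/bash
# Sketch: map `lsusb` lines for Xilinx cables to their USB device nodes.
# usb_nodes_from_lsusb and nth_usb_node are hypothetical helper names.

# Reads `lsusb` output on stdin, prints one /dev/bus/usb/BBB/DDD path per Xilinx cable.
usb_nodes_from_lsusb() {
    awk 'tolower($0) ~ /xilinx/ {
        bus = $2                  # field 2 is the bus number, e.g. 001
        dev = $4                  # field 4 is the device number with a trailing colon
        sub(/:$/, "", dev)        # "004:" -> "004"
        printf "/dev/bus/usb/%s/%s\n", bus, dev
    }'
}

# Reads `lsusb` output on stdin, prints the node of the N-th Xilinx cable (1-based).
nth_usb_node() {
    usb_nodes_from_lsusb | sed -n "${1}p"
}

# Example use on a live system (not executed here):
#   lsusb | usb_nodes_from_lsusb      # all cables
#   lsusb | nth_usb_node 2            # second cable only
```

The printed paths can then be passed to `sudo chmod 0` or `sudo chmod 666` exactly as in the scripts above.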