## Belle2Link implementation and data-error check

Yun-Tsung Lai

**KEK IPNS** 

ytlai@post.kek.jp

Belle II Trigger/DAQ workshop 2022 @ Nara Women's University 30th Nov., 2022





## Outline

- Data link auto-reset function in firmware
- Data-error check
- Summary

#### **Auto-reset function**

- At the beginning of the PCIe40 development work, we observed that:
  - A data link may go down when one of its ends (FEE or PCIe40) is re-programmed.
  - When a PCIe40 is re-programmed, 50% of all the links could go down.
  - To recover the links, those devices then had to be re-programmed, sometimes repeatedly.
- An effective and convenient way to recover a link is needed:
  - Issue a proper reset signal to the transceiver IPcore.
  - Do it automatically and properly.
- The design of the auto-reset function:
  - Define a link-up flag to check the link's hardware status.
  - Issue a reset whenever any instability is observed.
  - Repeat until the link is stable.
- Result:
  - Tested to give ~100% readiness whenever any device is touched.
  - Works for all detectors.
  - No manual intervention is needed.
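The reset-until-stable idea can be illustrated with a small behavioral model in Python. This is only a sketch, not the firmware: the stability window (`stable_cycles`), the retry limit (`max_resets`) and the `FakeLink` class are assumptions made for the example.

```python
from typing import Callable


def auto_reset(read_lane_up: Callable[[], bool],
               issue_reset: Callable[[], None],
               stable_cycles: int = 8,
               max_resets: int = 100) -> int:
    """Reset the link until the link-up flag stays high for `stable_cycles` samples.

    Returns the number of resets issued; raises if the link never settles.
    """
    resets = 0
    while True:
        # all() stops at the first low sample, i.e. at the first instability.
        if all(read_lane_up() for _ in range(stable_cycles)):
            return resets
        if resets == max_resets:
            raise RuntimeError("link did not become stable")
        issue_reset()
        resets += 1


class FakeLink:
    """Toy link (hypothetical) that only comes up after a given number of resets."""

    def __init__(self, resets_needed: int) -> None:
        self.resets_needed = resets_needed
        self.resets_seen = 0

    def lane_up(self) -> bool:
        return self.resets_seen >= self.resets_needed

    def reset(self) -> None:
        self.resets_seen += 1


link = FakeLink(resets_needed=3)
n_resets = auto_reset(link.lane_up, link.reset)
```

Here the loop stops as soon as the toy link reports a stable flag, without any manual step.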



## Link up flag

- The definition of the link-up flag (lane\_up) is based on the interface of the FPGA transceiver IPcore.
- Basic definition, combining:
  - gtp\_ready/gtx\_ready
  - notintable (decoding error)
  - disparity error
  - PLL locked
  - rxvalid
  - Wrong position of the K character
- The exact definition depends on the transceiver, so adaptation is needed for each FPGA type.
- In the original design, the b2l and b2tt firmware modules used gtp\_ready/gtx\_ready to confirm the link's hardware status.
  - They are now modified to use the lane\_up signal as the flag instead.
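As a rough illustration of how such a flag could be formed, the Python sketch below combines the listed conditions into one boolean. The field names and polarities are assumptions for the example; the actual signals and their combination depend on the transceiver IPcore.

```python
from dataclasses import dataclass


@dataclass
class TransceiverStatus:
    """One-clock snapshot of the status signals (illustrative names)."""
    ready: bool             # gtp_ready / gtx_ready
    pll_locked: bool
    rx_valid: bool
    decoding_error: bool    # character not in the decoding table
    disparity_error: bool
    k_char_misplaced: bool  # K character at a wrong position


def lane_up(s: TransceiverStatus) -> bool:
    """Link is up only if all 'good' flags are set and no error flag is set."""
    healthy = s.ready and s.pll_locked and s.rx_valid
    faulty = s.decoding_error or s.disparity_error or s.k_char_misplaced
    return healthy and not faulty


good = TransceiverStatus(True, True, True, False, False, False)
bad = TransceiverStatus(True, True, True, False, True, False)
```

With these inputs, `lane_up(good)` is true, while a single disparity error in `bad` is enough to drop the flag even though the ready signal is still high.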

#### Auto-reset function in PCIe40 firmware

- By slow control pio\_cm3:
  - Trigger a restart signal to the TX state machine, which generates the rst\_b signal.
- One instance of alt\_a10\_reset:
  - Shared by 6 links.
- But rst\_b is active low:
  - The 6 rst\_b signals are combined with an OR gate, so all of them have to be fired in the same clock to trigger reset\_gbt.

```
-- 1 bank (6 links)
alt_a10_reset_inst: alt_a10_reset
  generic map(
     CLK_FREQ   => 120e6) -- Comment: (Default: 120MHz)
  port map(
     -- Clock --
     ...        => CLK_A10_100MHZ_P,
     -- OR gate of 6 rst_b
     ...        => rst_b(0) or rst_b(1) or rst_b(2) or rst_b(3) or rst_b(4) or rst_b(5),
     RESET2_B_I => '1',
     RESET_O    => reset_gbt); -- 1 bit

-- 6 links
gxRstCtrl_inst: mgt_atxpll_rst
  port map (
     clock                 => XCVRCLK,
     reset                 => reset_gbt,
     pll_powerdown(0)      => s_pll_powerdown(i),
     tx_analogreset(0)     => s_tx_analogreset(i),
     tx_digitalreset(0)    => s_tx_digitalreset(i),
     tx_ready(0)           => s_tx_ready(i),
     pll_locked(0)         => s_pll_locked(0),
     pll_select            => "0",
     tx_cal_busy(0)        => or_cal_busy(i),
     rx_analogreset(0)     => s_rx_analogreset(i),
     rx_digitalreset(0)    => s_rx_digitalreset(i),
     rx_ready(0)           => s_rx_ready(i),
     rx_is_lockedtodata(0) => s_rx_is_lockedtodata(i),
     rx_cal_busy(0)        => s_rx_cal_busy(i)
     -- 6 bits
     );
     ...
```

#### Auto-reset function in PCIe40 firmware (cont'd)

- By slow control pio\_cm3:
  - Trigger a restart signal to the TX state machine, which generates the rst\_b signal.
- New design:
  - An alt\_a10\_reset instance is made for each link.
  - The reset signal can therefore be issued to each link individually.
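The difference between the shared reset and the per-link reset can be shown with a short Python model. It is a sketch under assumptions: rst\_b is active low as stated on the slides, the resulting reset is taken to be active high, and the function names are made up for the example.

```python
from typing import List


def shared_reset(rst_b: List[int]) -> List[int]:
    """Original scheme: one reset for the whole bank, driven by the OR of all rst_b.

    Because rst_b is active low, the OR is 0 only when every rst_b is 0
    in the same clock. Returns the reset seen by each link (1 = reset).
    """
    or_of_rst_b = int(any(rst_b))
    reset = 1 - or_of_rst_b
    return [reset] * len(rst_b)


def per_link_reset(rst_b: List[int]) -> List[int]:
    """New scheme: each link has its own reset, driven only by its own rst_b."""
    return [1 - b for b in rst_b]


only_link2 = [1, 1, 0, 1, 1, 1]   # only link 2 requests a reset
all_links = [0, 0, 0, 0, 0, 0]    # all six links request a reset together
```

With `only_link2`, the shared scheme resets nothing, while the per-link scheme resets exactly link 2; with `all_links`, both schemes reset every link.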



## Auto-reset function in PCIe40 firmware (cont'd)





- Modification in the Belle2Link state machine:
  - Include the lane\_up flag in the condition for state 1.
  - All other states go to state 1 when lane\_up = '0'.
  - Allow state 1 to go to state 12 to make the reset fully effective.

#### Auto-reset function in FEE firmware

- Adaptation is needed for each FPGA transceiver IPcore.
- TRG UT3 GTX:
  - The first one implemented with this function, in 2018~2019.
  - The example design originates from it.
- Finally finished in summer 2022.
- Some difficulties:
  - TRG UT3 GTH: Re-compilation of the 3D tracker firmware took a few months. It required improvement of the firmware's timing.
  - ECL collector: A firmware version problem, and improvement of the firmware's timing was also required.

| Detector FEE | Transceiver                                        |
|--------------|----------------------------------------------------|
| SVD          | Spartan-6 GTP                                      |
| CDC          | Virtex-5 GTP                                       |
| TOP          | Kintex-7 GTX                                       |
| ARICH        | Virtex-5 GTP                                       |
| ECL          | Spartan-6 GTP                                      |
| KLM          | Virtex-6 GTX                                       |
| TRG          | UT3: Virtex-6 GTX, GTH<br>UT4: UltraScale GTH, GTY |

#### Data error check

The module processes the RX data from the Belle2Link state machine.

- Types of error check:
  - crc16: Checked at the end of an event.
  - Event tag incrementation: Checked at the beginning of an event. Ignored for the first event of a run.
  - Exprun identical to that of the previous event: Checked at the beginning of an event. Ignored for the first event of a run.
  - TT ctime and type consistent between header and trailer: Checked at the end of an event.
  - Event tag consistent between header and trailer (LSB 16 bits): Checked at the end of an event.
```
B2L: '0'(1) | TT-ctime(27) | TT-type(4)      \
B2L: TT-tag(32)                              |
B2L: TT-etime(32)                            |  Header
B2L: TT-exprun(32)                           |
B2L: '0'(1) | B2L-ctime(27) | reserved(4)    /
FEE: Data #0 (32)                            \
FEE: Data #1 (32)                            |  FEE data
FEE: ...                                     |
FEE: Data #n (32)                            /
B2L: '0'(1) | TT-ctime(27) | TT-type(4)      \
B2L: TT-tag(16) | B2L-CRC16(16)              |  Trailer
B2L: X"FE00"(16) | X"FF00"(16)               /
```
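To make the frame layout concrete, the following Python sketch decodes one event given as a list of 32-bit words. It assumes that the fields within each word are listed from MSB to LSB and that an event is exactly 5 header words, the FEE data, and 3 trailer words; the function and key names are made up for the example.

```python
from typing import Dict, List

HEADER_WORDS = 5    # header length taken from the layout
TRAILER_WORDS = 3   # trailer length taken from the layout


def parse_event(words: List[int]) -> Dict[str, object]:
    """Decode one event (list of 32-bit words), assuming MSB-first fields."""
    if len(words) < HEADER_WORDS + TRAILER_WORDS:
        raise ValueError("event too short")
    hdr, trl = words[:HEADER_WORDS], words[-TRAILER_WORDS:]
    return {
        "hdr_ctime": (hdr[0] >> 4) & 0x7FFFFFF,   # 27 bits
        "hdr_type": hdr[0] & 0xF,                 # 4 bits
        "tag": hdr[1],
        "etime": hdr[2],
        "exprun": hdr[3],
        "b2l_ctime": (hdr[4] >> 4) & 0x7FFFFFF,
        "payload": words[HEADER_WORDS:-TRAILER_WORDS],
        "trl_ctime": (trl[0] >> 4) & 0x7FFFFFF,
        "trl_type": trl[0] & 0xF,
        "trl_tag16": trl[1] >> 16,
        "crc16": trl[1] & 0xFFFF,
        "end_marker": trl[2],
    }


example = [
    (0x123456 << 4) | 0x3,     # TT-ctime = 0x123456, TT-type = 3
    0x0001ABCD,                # TT-tag
    0x11111111,                # TT-etime
    0x22222222,                # TT-exprun
    (0x123460 << 4),           # B2L-ctime, reserved bits = 0
    0xDEADBEEF, 0xCAFEF00D,    # two FEE data words
    (0x123456 << 4) | 0x3,     # trailer copy of TT-ctime / TT-type
    (0xABCD << 16) | 0x5A5A,   # TT-tag LSB 16 bits | CRC16 (arbitrary value)
    0xFE00FF00,                # X"FE00" | X"FF00"
]
event = parse_event(example)
```

The decoded fields (`hdr_ctime`, `trl_tag16`, `crc16`, and so on) are the quantities that the error checks compare.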

## Data error check at the beginning of an event

- Event tag incrementation.
- Exprun identical to that of the previous event.
- The runreset signal is used to identify the first event of a run.

```
B2L: '0'(1) | TT-ctime(27) | TT-type(4)      \
B2L: TT-tag(32)                              |
B2L: TT-etime(32)                            |  Header
B2L: TT-exprun(32)                           |
B2L: '0'(1) | B2L-ctime(27) | reserved(4)    /
FEE: Data #0 (32)                            \
FEE: Data #1 (32)                            |  FEE data
FEE: ...                                     |
FEE: Data #n (32)                            /
B2L: '0'(1) | TT-ctime(27) | TT-type(4)      \
B2L: TT-tag(16) | B2L-CRC16(16)              |  Trailer
B2L: X"FE00"(16) | X"FF00"(16)               /
```
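The two beginning-of-event checks can be modelled in Python as a small stateful checker. This is a sketch under assumptions: the tag is taken to increase by exactly 1 per event and to wrap at 32 bits, and the class and method names are made up for the example.

```python
from typing import List, Optional


class EventStartChecker:
    """Remembers the tag and exprun of the previous event."""

    def __init__(self) -> None:
        self.prev_tag: Optional[int] = None
        self.prev_exprun: Optional[int] = None

    def run_reset(self) -> None:
        """Models the runreset signal: the next event is the first of a run."""
        self.prev_tag = None
        self.prev_exprun = None

    def check(self, tag: int, exprun: int) -> List[str]:
        """Return the list of failed checks for this event header."""
        errors: List[str] = []
        if self.prev_tag is not None:   # checks are skipped for the first event
            if tag != (self.prev_tag + 1) & 0xFFFFFFFF:
                errors.append("tag")
            if exprun != self.prev_exprun:
                errors.append("exprun")
        self.prev_tag, self.prev_exprun = tag, exprun
        return errors


checker = EventStartChecker()
results = [
    checker.check(5, 0x10),   # first event of the run: nothing to compare
    checker.check(6, 0x10),   # tag incremented, same exprun
    checker.check(8, 0x10),   # tag jumped by 2
    checker.check(9, 0x11),   # exprun changed within the run
]
checker.run_reset()
after_reset = checker.check(0, 0x12)
```

After `run_reset()` the first event is accepted regardless of its tag and exprun, which mirrors the rule that both checks are ignored for the first event of a run.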



#### Data error check at the end of an event

- crc16.
- tt ctime and type between header and trailer.
- Event tag between header and trailer (LSB 16 bit).
- Use wr\_en\_length\_info from SM1\_V2 to identify the end of an event.
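The three end-of-event checks reduce to simple comparisons once the header and trailer fields are available. The Python sketch below assumes the fields have already been extracted from the data stream; the function and argument names are made up for the example.

```python
from typing import List


def check_event_end(hdr_ctime: int, hdr_type: int, hdr_tag: int,
                    trl_ctime: int, trl_type: int, trl_tag16: int,
                    crc_received: int, crc_calculated: int) -> List[str]:
    """Return the list of failed end-of-event checks."""
    errors: List[str] = []
    if crc_received != crc_calculated:
        errors.append("crc16")
    if (hdr_ctime, hdr_type) != (trl_ctime, trl_type):
        errors.append("ctime/type")
    # Only the 16 least significant bits of the tag are repeated in the trailer.
    if (hdr_tag & 0xFFFF) != trl_tag16:
        errors.append("tag")
    return errors


ok = check_event_end(0x123456, 3, 0x0001ABCD, 0x123456, 3, 0xABCD, 0x5A5A, 0x5A5A)
bad = check_event_end(0x123456, 3, 0x0001ABCD, 0x123457, 3, 0xABCE, 0x5A5A, 0x0000)
```

In the second call all three comparisons fail, so `bad` lists every check.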






## Fake error problem in Sep. 2022

- We sometimes observed fake errors raised by the module:
  - The data error check detected something, but the software didn't.
- One of the reasons was found in Sep. 2022 and has been fixed.
  - It only caused fake crc16 errors.
  - Reason: the position of the event-end flag.
- Flag at the end of a data frame: trace back by 4 clocks.
- Flag at the beginning of a data frame: trace back by 13 clocks.



## Fake error problem in Sep. 2022 (cont'd)

- When a slow control frame arrives right before the last data frame of an event, we need to trace back by 29 clocks.
  - We didn't notice this while developing the module with a test bench. In a real detector system the slow control daemon software is always running, so slow control (SLC) frames are always present in Belle2Link.
- The problem was fixed by updating the code.
  - The valid crc16 pattern is kept in a pipelined shift register, so that we no longer need to trace back by so many clocks.
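One possible reading of this fix, shown here only as a Python behavioral sketch and not as the actual firmware logic, is a shift register that advances only on clocks carrying valid event data. The CRC snapshot then sits at a fixed depth when the event-end flag arrives, however many other clocks lie in between. The tuple layout, `back` and `depth` are assumptions for the example.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Clock = Tuple[int, bool, bool]   # (running crc16, data_valid, event_end)


def crc_at_event_end(clocks: Iterable[Clock], back: int = 1, depth: int = 4) -> int:
    """Return the CRC snapshot `back` valid-data clocks before the event-end flag."""
    history: Deque[int] = deque(maxlen=depth)   # shift register of CRC snapshots
    for crc, data_valid, event_end in clocks:
        if event_end:
            return history[-back]
        if data_valid:
            # Only clocks with valid event data shift the register.
            history.append(crc)
    raise ValueError("no event-end flag seen")


def gap(n: int) -> List[Clock]:
    """n clocks without event data (e.g. taken by a slow control frame)."""
    return [(0, False, False)] * n


data: List[Clock] = [(0x1111, True, False), (0x2222, True, False)]
end: List[Clock] = [(0, False, True)]

short_gap = crc_at_event_end(data + gap(4) + end)
long_gap = crc_at_event_end(data + gap(29) + end)
```

Both calls return the same snapshot, so the lookup no longer depends on the length of the gap before the flag.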



- For now, fake errors still occur from time to time, but the frequency is very low (~1/month).
  - Need to run SignalTap and keep it triggering for a long time during operation.

#### Summary

- Data link auto-reset function:
  - Implemented in all FEE firmware and in the PCIe40 firmware.
  - ~100% readiness in system initialization.

- Data error check:
  - Processes the RX data from the Belle2Link state machine to find data errors.
  - For now, fake errors still occur at a very low frequency, so we still need to keep an eye on them.

# **Backup**

#### crc16 check

- In b2l\_transmitter, crc16 is calculated on 8-bit data before it goes to the data FIFO, while at the receiver side data arrives 16 bits at a time.
  - We could process the data 8 bits at a time, but the latency would be doubled.
- Calculate the crc16 of the MSB 8 bits first (combinational, without a clock), feed the output to the LSB 8 bits (1 clock), and feed the output of the LSB part to the MSB part in the next clock.
  - The crc of both the MSB and LSB 8 bits can then be obtained in 1 clock.
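The chaining of the two 8-bit stages can be checked with a short Python model. The slides do not state the CRC polynomial or initial value, so the CRC-16/CCITT polynomial 0x1021 with initial value 0xFFFF is used here purely as an assumption; the function names are also made up for the example.

```python
def crc16_byte(crc: int, byte: int, poly: int = 0x1021) -> int:
    """Update a 16-bit CRC with one byte, MSB first (polynomial is an assumption)."""
    crc ^= byte << 8
    for _ in range(8):
        if crc & 0x8000:
            crc = ((crc << 1) ^ poly) & 0xFFFF
        else:
            crc = (crc << 1) & 0xFFFF
    return crc


def crc16_word(crc: int, word: int) -> int:
    """Update the CRC with a 16-bit word: the MSB stage feeds the LSB stage."""
    after_msb = crc16_byte(crc, (word >> 8) & 0xFF)
    return crc16_byte(after_msb, word & 0xFF)


payload = b"12345678"

# Transmitter view: one byte at a time.
crc_bytes = 0xFFFF
for b in payload:
    crc_bytes = crc16_byte(crc_bytes, b)

# Receiver view: one 16-bit word at a time; each result is the
# initial value of the MSB stage for the next word.
crc_words = 0xFFFF
for i in range(0, len(payload), 2):
    crc_words = crc16_word(crc_words, (payload[i] << 8) | payload[i + 1])
```

Both loops end with the same value, which is why the receiver can handle 16 bits at a time and still match the CRC computed byte by byte at the transmitter.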





## An example of crc16 check

- Event-end is indicated by wr\_en\_length\_info from SM1\_V2.
- 1 clock before FE00: crc16 from data.
- In each clock, the crc16 is calculated for both the MSB and LSB 8 bits.
- The crc16 is calculated from the RX data.
- The result is checked at the end of an event.
- crc16 of the LSB part → initial value of the MSB part in the next clock.