Abstract
The upgraded Large Hadron Collider beauty (LHCb) experiment is the first detector at a hadron collider to use a fully software-based trigger. The first High Level Trigger stage (HLT1) reduces the event rate from 30 MHz to approximately 1 MHz based on reconstruction criteria from the tracking system, and consists of \(\mathcal {O}(100)\) trigger selections implemented on Graphics Processing Units (GPUs). These selections are further refined following the full offline-quality reconstruction at the second stage (HLT2) before events are saved for analysis. An automated bandwidth division has been performed to equitably divide this 1 MHz HLT1 Output Rate (OR) between the signals of interest to the LHCb physics program. This was achieved by optimizing a set of trigger selections that maximize the efficiency for signals of interest to LHCb while capping the total HLT1 output rate. The bandwidth division tool has been used to determine the optimal selection for 35 selection algorithms over 80 characteristic physics channels.
Introduction
Preamble
The heavy-flavor decays prioritized at LHCb occur at rates much higher than the processes analyzed at other LHC experiments. A novel full-software trigger has been implemented, the purpose of which is to select collision events more likely to be of interest. The new trigger was implemented during the 2018–2022 detector upgrade and makes selections using event-reconstruction information. The resulting complexity presents new challenges when optimizing trigger performance. Physics retention must be managed equitably across the entirety of LHCb’s physics program while also filtering the \(\mathcal {O}(10{,}000)\) petabytes of raw data down to the \(\sim 30\) petabytes available for storage on disk [1].
In the Sects. titled "LHCb Upgrade Trigger" and "HLT1 Algorithms", the details and the challenges of LHCb’s trigger will be outlined. The "Methods" Section will describe the method used to solve these challenges at the first selection stage (HLT1). Finally, the resulting quantified improvements in trigger performance will be demonstrated and summarized in the Sects. titled "Results" and "Conclusions".
LHCb upgrade trigger
LHCb is a single-arm forward spectrometer covering the pseudorapidity range \(2< \eta < 5\), stationed at interaction point 8 of the Large Hadron Collider (LHC) at CERN, where it is one of the four largest detectors. The main goal of the detector is to discover new physics by probing differences between matter and antimatter and by studying decays of heavy-flavor hadrons. For more details on the detector layout, see Refs. [1, 2]. The upgraded LHCb detector aims to accumulate 50 \(\hbox {fb}^{-1}\) of data by 2034, including the data recorded prior to the upgrade. This is made possible by running the detector at an instantaneous luminosity around five times higher than that of the original detector, which collected 9 \(\hbox {fb}^{-1}\) between 2010 and 2018 [3].
The data read out from the detector is managed by Central Processing Unit (CPU) servers, which build this information into packets of events that can be consumed directly by the High-Level Trigger (HLT) at point 8 of the LHC. During proton–proton (pp) collisions, the total output of the detector is around 4 TB/s. This data is reconstructed and used by the trigger to select signals of interest, after which approximately 10 GB/s is written to permanent offline storage. The corresponding data flow is illustrated in Fig. 1.
The previous hardware-based first-level trigger made decisions relying on simpler, localized information, such as energy deposits in the calorimeters, prior to readout of the full detector. At the larger nominal instantaneous luminosity planned for 2022–2034 data-taking, this hardware trigger would saturate; consequently, its efficiency for hadronic final states would drop as a function of instantaneous luminosity. With the new configuration, which makes selections based on track reconstruction, the trigger efficiency scales well [6]. More details on the reconstruction are available in Refs. [1, 7].
HLT1 runs on \(\sim 500\) GPUs to reconstruct the triggerless readout from the subdetectors and frontend electronics. The HLT1 trigger menu must be able to efficiently select signals across very wide ranges of rates and energy scales, covering the full breadth of LHCb’s physics program. For nominal running conditions, \(\mathcal {O}(100)\) HLT1 selections are performed [8]. When compared to the 2010–2018 period, it is now much more straightforward for an analyst with a new physics idea to implement a corresponding HLT1 line. This is due to the significant flexibility of the full-software trigger [6]. Additionally, GPUs enable the parallelization of the event loop and parts of the track reconstruction, which makes them ideal for maximizing trigger throughput [2, 9].
HLT1 processes data in real time, the output of which is fed into the buffer before it is eventually processed by HLT2. HLT2 performs full-event reconstruction on CPUs. It uses full-particle identification from the RICH detectors and calorimeter systems to apply selections to fully aligned and calibrated physics objects.
The output of HLT1 must be kept below an upper limit, while maximizing the signal efficiency for all physics channels. In practice, this involves reducing the readout from 30 MHz to \(\sim 1\) MHz [1]. The reason for this is that HLT2 processes HLT1-filtered data from the buffer days, or sometimes weeks, after the data is processed by HLT1. Consequently, further event reconstruction becomes possible during detector downtime.
The limit of the buffer is determined by the fact that HLT2 needs to process around half of the output of HLT1 in real time to avoid falling behind, since protons are collided in the LHC for approximately half of the operational time between shutdowns [5]. This goal of reducing the HLT1 readout, while simultaneously maximizing physics retention, can be achieved by building a tool that automatically tunes an appropriate set of selections.
Software was previously developed to reduce and divide the bandwidth of the hardware trigger between physics goals for the collection of data between 2010 and 2018 [6]. The aim of this project was to develop software that equitably divides the bandwidth for the upgraded HLT1. This must be automated as the trigger selections and collision conditions can change regularly. For each change, the software must re-divide the bandwidth to optimize HLT1 data acquisition under the new conditions. This is the first automated bandwidth division applied to a software trigger at a HEP experiment.
HLT1 algorithms
The software project for HLT1, Allen, contains parameters that can be tuned to modify the output data rate of the trigger. The HLT1 trigger menu is composed of many trigger lines (decision algorithms). Most of the tuned HLT1 parameters assert requirements on the transverse momentum (\(p_{\text {T}}\)), impact parameter (IP) or impact parameter significance (\(\chi ^2_{\text {IP}}\)). \(p_{\text {T}}\) is the momentum of a track in the plane transverse to the LHCb detector’s beam axis. IP is the shortest distance between a track and a vertex, typically the primary proton–proton collision vertex, and \(\chi ^2_{\text {IP}}\) is this quantity divided by its uncertainty.
These quantities are useful for distinguishing signal from background because tracks originating from pp collisions typically have small IPs, limited only by the finite resolution of the vertex locator. Conversely, particles originating from heavy-flavor decays have larger IPs of around \(100\,\upmu \text {m}\) and typically have \(p_{\text {T}}\) greater than \(2\,\text {GeV}\!/c\). Therefore, large numbers of uninteresting events can be rejected by placing requirements on these quantities. Cutting too tightly on these quantities produces an undesirably low efficiency for certain signal channels, while cutting too loosely produces too large an OR from HLT1.
The HLT1 trigger lines can be multivariate and can be categorized into inclusive and exclusive selections. The majority of the OR comes from the inclusive hadron lines, which select most of the decay channels involving beauty hadrons. High signal efficiencies (between 65 and 95\(\%\) at 1 MHz HLT1 output) for beauty decays can be achieved using inclusive selections. Conversely, some channels cannot achieve the same quality of physics retention using only the inclusive lines. For example, it is sometimes necessary to use exclusive lines that select a single channel, or a mixture of inclusive and exclusive lines, to achieve good efficiencies when selecting certain charm and strange hadron decays. Exclusive trigger lines are made possible by the event reconstruction in HLT1, which allows many unique selections to be performed in real time.
The inclusive TrackMVA and TwoTrackMVA trigger lines select the largest number of physics signals and are responsible for the majority of the HLT1 OR. For that reason, modifying the value of their input thresholds produces the largest effect on the performance of HLT1. It is important that the OR and efficiency curves of these lines, which are instrumental to any figure-of-merit chosen for trigger optimization, are smooth, as this significantly increases the convergence rate of the optimization.
The TrackMVA algorithm performs a preselection of tracks to remove the most obvious fakes, and then applies a hyperbolic selection criterion in the two-dimensional plane of \(p_{\text {T}}\) and \(\chi ^2_{\text {IP}}\). This method captures most of the signal–background discrimination and is simple to implement. The TwoTrackMVA algorithm is designed to identify pairs of tracks originating from the decays of beauty or charm hadrons using a multivariate classifier [1, 6].
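One possible functional form of such a hyperbolic single-track criterion is sketched below in C++. It is a minimal sketch only: the parameter names and values are illustrative assumptions and do not correspond to the actual Allen thresholds or implementation.

```cpp
#include <cmath>

// Minimal sketch of a one-track hyperbolic selection in the (pT, chi2_IP) plane.
// Parameter names and values are illustrative, not the actual Allen thresholds.
struct TrackMVACuts {
  float maxPT     = 26000.f;  // pT above which no displacement requirement is applied [MeV/c]
  float minPT     = 2000.f;   // lower pT bound of the hyperbola [MeV/c]
  float minIPChi2 = 7.4f;     // chi2_IP asymptote of the hyperbola
  float alpha     = 2500.f;   // curvature parameter of the hyperbola [MeV/c]
};

// Accept a track if it lies above the hyperbola in (pT, log(chi2_IP)), i.e. either
// very high pT, or a combination of moderate pT and significant displacement.
inline bool passesTrackMVA(float pt, float ipChi2, const TrackMVACuts& c) {
  if (pt >= c.maxPT) return ipChi2 > 0.f;                    // high-pT tracks always pass
  if (pt <= c.minPT || ipChi2 <= c.minIPChi2) return false;  // preselection region rejected
  // Hyperbolic boundary: the required log(chi2_IP) grows as pT approaches minPT.
  const float required = std::log(c.minIPChi2) + c.alpha / (pt - c.minPT);
  return std::log(ipChi2) > required;
}
```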
To visualize part of the parameter space that would be explored during optimization, examples of the OR for these lines as a function of their input thresholds are provided in Fig. 2. The OR is calculated using \(\sim 9\) million events of pp collision data. This corresponds to \(\sim 0.5\) seconds of 2024 LHCb data acquisition using a minimally biased trigger, with data-quality checks, at nominal instantaneous luminosity. No other selections were applied. Examples of the efficiency to select \(B^0 \rightarrow D^+ D^-\) decays with respect to the requirements on the same lines are shown in Fig. 3.
This efficiency is calculated from a simulation sample generated under 2024 run conditions, containing simulated events that were selected by at least one representative trigger line and possessed protons in both crossing bunches. Again, no other selections were applied. Each signal sample contains sufficient simulated pp collision events (\(\sim 100\)k) that the uncertainty on the relative efficiency is much smaller than the change in efficiency when modifying the values of the discrete line input thresholds. The lines shown in Fig. 3 are the only ones that select this signal channel. The ORs and efficiencies for Figs. 5–11 are calculated in the same way.
To automatically tune these efficiencies and the HLT1 OR for the purpose of optimizing trigger performance, a pseudo-\(\chi ^2\) figure-of-merit was proposed.
Methods
Figure-of-Merit
At the beginning of this project, the LHCb Collaboration chose a set of signal channels from which the physics retention of a given trigger selection could be calculated. This ensemble is a subset of the total number of channels being studied at LHCb. Increasing the number of channels selected by a given trigger line or weighting those channels more favorably would lead to this line being loosened further during tuning. Hence, these channels and their corresponding figure-of-merit weightings were chosen carefully to best represent the physics interests of the collaboration.
A set of sixteen floating HLT1 input thresholds, denoted \({\textbf {x}}\), were chosen to modify the efficiencies of the representative channels and the HLT1 OR. The HLT1 thresholds excluded from \({\textbf {x}}\) are either fixed, or they are not applied and their corresponding trigger lines are not included in the bandwidth division. A small number of thresholds and lines are not optimized due to their negligible contribution to the total rate. A pseudo-\(\chi ^2\) figure-of-merit is minimized in order to tune \({\textbf {x}}\).
The trigger efficiency for each signal channel is measured on a simulated sample of data in which each event contains a reconstructible signal associated with that physics channel. Reconstructibility requirements are applied to the simulated events because most of them are generated without requiring that every track traverses the entire tracking system. This ensures that the bandwidth division determines thresholds based on signal efficiencies for candidates that would be selected at HLT1. A signal event is considered reconstructible when all of the charged particles from the signal decay have a corresponding reconstructible track. A track is reconstructible when it has sufficient hits in the tracking system, on both sides of LHCb’s magnet, to be reconstructed [1]. The trigger efficiency is then the number of these events that fire at least one representative trigger line, divided by the number of simulated, reconstructible events with protons in both crossing bunches.
If the OR of the trigger exceeds a certain limit (\(\text {OR}_{\text {limit}}\)), the trigger efficiencies included in the \(\chi ^2\) are penalized; this penalty accounts for the loss of physics when constraining the HLT1 OR. The OR is calculated from a sample of ‘minimally biased’ data collected by the detector during 2024, consisting of \(N^{\text {total}} = 9\) million events with at least one proton–proton collision:
\[ \text {OR}({\textbf {x}}) = \frac{N^{\text {passed any}}({\textbf {x}})}{N^{\text {total}}} \times \text {event rate}. \]
The event rate is the number of events per second that HLT1 processes at nominal luminosity during data-taking (\(\sim 30\) MHz). \(\text {OR}_{\text {limit}}\) is usually 1 MHz. A range of tunings at different rate limits (typically between 0.5 and \(1.5\,\text {MHz}\)) enables the collaboration to adapt to changing physics conditions. For example, changes to the nominal luminosity due to gains in throughput from HLT2 would necessitate the ability to interpolate from a range of thresholds.
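As a purely illustrative example with assumed round numbers: if \(3\times 10^{5}\) of the \(9\times 10^{6}\) minimally biased events were to fire at least one line, the output rate would be
\[ \text {OR} = \frac{3\times 10^{5}}{9\times 10^{6}} \times 30\,\text {MHz} = 1\,\text {MHz}, \]
exactly saturating the nominal \(\text {OR}_{\text {limit}}\).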
The events counted in \(N^{\text {passed any}}({\textbf {x}})\) can pass any or all of the HLT1 trigger algorithms included in the division, which means that \(N^{\text {passed any}}({\textbf {x}})\) depends on all of the elements of \({\textbf {x}}\). The \(\chi ^2\) represents the weighted sum of the loss of physics retention across the chosen set of signal channels and is given by
\[ \chi ^2_{\text {global}}({\textbf {x}}) = \sum _{i} \omega _{i} \left( 1 - \frac{\epsilon _i({\textbf {x}})}{\epsilon _i^{\text {max}}}\right) ^{2}, \]
where \(\omega _{i}\) represents the relative importance of each channel, and is usually set to one. The rate-penalized efficiency for the ith signal decay mode, \(\epsilon _i({\textbf {x}})\), is
\[ \epsilon _i({\textbf {x}}) = \frac{N_{i}^{\text {passed}}({\textbf {x}})}{N_{i}^{\text {total}}} \times \min \!\left( 1, \frac{\text {OR}_{\text {limit}}}{\text {OR}({\textbf {x}})}\right) , \]
where \(N_{i}^{\text {passed}}({\textbf {x}})\) is the number of reconstructible signal events passing the decisions of the HLT1 trigger selections that would typically be required by an analysis of the given signal mode. \(N_{i}^{\text {total}}\) is the total number of reconstructible signal events where there are protons in both crossing bunches. \(\epsilon _i^{\text {max}}\) is the rate-penalized efficiency for a given channel if the entirety of the bandwidth is allocated to it. This is the highest efficiency achievable at HLT1. \(\epsilon _i^{\text {max}}\) is calculated before the global minimization using a separate figure-of-merit, but the same rate-penalized efficiencies, and is given by \(\chi ^2_{\text {indiv}}({\textbf {x}}) = (1-\epsilon _{\text {indiv}}({\textbf {x}}))^2\).
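A minimal sketch of a rate-penalized figure-of-merit of this kind is shown below. The data-structure and function names, and the precise form of the rate penalty, are illustrative assumptions rather than the tool's actual implementation.

```cpp
#include <vector>

// Minimal sketch of a rate-penalized figure-of-merit of the kind described above.
// All names and the exact penalty form are assumptions, not the tool's code.
struct Channel {
  double nPassed;  // reconstructible signal events passing the relevant lines
  double nTotal;   // reconstructible signal events with protons in both crossing bunches
  double epsMax;   // best rate-penalized efficiency if all bandwidth went to this channel
  double weight;   // omega_i, usually 1
};

// No penalty below the OR limit; efficiencies are scaled down above it.
double ratePenalty(double outputRate, double orLimit) {
  return outputRate <= orLimit ? 1.0 : orLimit / outputRate;
}

double chi2Global(const std::vector<Channel>& channels,
                  double outputRate, double orLimit) {
  const double penalty = ratePenalty(outputRate, orLimit);
  double chi2 = 0.0;
  for (const auto& ch : channels) {
    const double eps  = penalty * ch.nPassed / ch.nTotal;  // rate-penalized efficiency
    const double loss = 1.0 - eps / ch.epsMax;             // loss of physics retention
    chi2 += ch.weight * loss * loss;
  }
  return chi2;
}
```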
The chosen set of continuous thresholds \({\textbf {x}}\) must be discretized to a grid of possible solutions. The step sizes were chosen to be sufficiently large to ensure statistical significance, given the finite number of available events. It was necessary to truncate the continuous thresholds to discrete values to avoid tuning on statistical noise in the data.
Tuning of the OR and signal efficiencies is currently performed over 35 trigger lines for 80 physics channels chosen by analysts. Each channel can be selected by several thresholds/trigger lines. Thresholds can be shared across multiple lines to ensure selections are consistent between control and signal modes. More than one threshold can also be input into a single trigger line. A specialized configuration of Allen was used to produce output files that contain the minimal amount of information required to reproduce and modify the settings of the trigger decision. The event information for each signal sample and the minimally biased sample is read from the Allen output files and stored in C++ objects.
The division of the new full-software trigger presents significantly greater computational complexity than the bandwidth division of the hardware trigger during 2010–2018. One of the goals of this project is to enable users to quickly run the entire minimization process and obtain an optimal set of thresholds from the bandwidth division tool using modest computing resources. The OR and signal efficiencies are recalculated tens of thousands of times. This necessitated the following optimizations, which reduced the run time of a single \(\chi ^2_{\text {indiv}}({\textbf {x}})\) minimization from several hours to one to two minutes:
- Automatic removal of candidates outside of the desired phase space: if a signal candidate is only accessible by loosening the desired lines to the point where the HLT1 OR is greater than 1.1 MHz, this candidate is removed.
- Reading tabulated ROOT event data into memory (C++ data structures) for faster event filtering during efficiency calculations [13].
- Parallelizing the \(\chi ^2\) evaluations over multiple threads using OpenMP [13]; a sketch of this pattern is shown after this list.
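The sketch below illustrates the OpenMP parallel-for reduction pattern referenced in the final item. The event representation and the single-cut decision are simplified assumptions; only the parallelization pattern is the point here.

```cpp
#include <omp.h>
#include <cstdint>
#include <vector>

// Illustrative sketch: count, in parallel, how many minimally biased events fire a
// line for a given set of thresholds. The Event layout is an assumption.
struct Event {
  float maxTrackPt;      // highest track pT in the event
  float maxTrackIpChi2;  // highest track chi2_IP in the event
};

std::uint64_t countPassing(const std::vector<Event>& events,
                           float ptThreshold, float ipChi2Threshold) {
  std::uint64_t nPassed = 0;
  // Each thread accumulates a partial count; OpenMP combines them via the reduction.
  #pragma omp parallel for reduction(+ : nPassed) schedule(static)
  for (std::int64_t i = 0; i < static_cast<std::int64_t>(events.size()); ++i) {
    const Event& ev = events[i];
    if (ev.maxTrackPt > ptThreshold && ev.maxTrackIpChi2 > ipChi2Threshold) {
      ++nPassed;
    }
  }
  return nPassed;
}
```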
Choosing a minimization algorithm
The stochastic ‘Genetic Algorithm’ (GA) [14] was employed to find the optimal selection during the 2010–2018 data-taking period, when the problem was less CPU-intensive. This method was chosen because the GA naturally operates on input parameters constrained to a discrete grid of coordinates, and therefore required almost no adaptation to minimize the \(\chi ^2\).
However, the GA requires a careful choice of hyper-parameters to balance run time with search depth for a given number of dimensions and possible solutions. This is even more important when performing a minimization for nominal running conditions during 2022–2034: there are approximately four times the number of samples included in the \(\chi ^2_{\text {global}}\) calculation, and three times the number of parameter dimensions, when compared to the version of the bandwidth division tool used during 2011–2018. Hence, the gradient-based Adaptive Moment Estimation (Adam) algorithm [15] has been chosen to improve performance and reduce the number of \(\chi ^2\) calculations required before finding the global minimum. The workflow of the Adam algorithm is illustrated in Fig. 4.
Fig. 4 Flow chart of the Adam algorithm adapted to search for the best set of truncated thresholds in a discrete \(\chi ^2\) space [16]
Adam is computationally efficient and well-suited to problems with large datasets, many parameters, and noisy gradients; all three of these challenges are encountered when minimizing the loss function. As a first-order method, Adam is less sensitive to statistical noise in gradient calculations than methods that rely on computation of the Hessian, e.g., HESSE from MINUIT [17].
The Adam algorithm is adapted to search for the best continuous selection ensemble before evaluating nearby thresholds constrained to a discrete grid. ‘Continuous’ here means the thresholds can float as any value within a certain range, i.e., not constrained to a discrete grid of possible thresholds.
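The core Adam update applied to a vector of continuous thresholds can be sketched as follows. The hyper-parameter values are the common defaults from Ref. [15], the gradient is assumed to be estimated numerically from \(\chi ^2\) evaluations, and all names are illustrative rather than taken from the tool.

```cpp
#include <cmath>
#include <vector>

// Minimal sketch of one Adam update step on a vector of continuous thresholds.
// Hyper-parameter values are the common defaults; the gradient is assumed to come
// from finite-difference evaluations of the chi2 around the current thresholds.
struct AdamState {
  std::vector<double> m, v;  // first and second moment estimates
  double beta1 = 0.9, beta2 = 0.999, eps = 1e-8, lr = 0.01;
  int t = 0;                 // time step (epoch counter)
};

void adamStep(std::vector<double>& x, const std::vector<double>& grad, AdamState& s) {
  if (s.m.empty()) { s.m.assign(x.size(), 0.0); s.v.assign(x.size(), 0.0); }
  ++s.t;
  for (std::size_t i = 0; i < x.size(); ++i) {
    s.m[i] = s.beta1 * s.m[i] + (1.0 - s.beta1) * grad[i];
    s.v[i] = s.beta2 * s.v[i] + (1.0 - s.beta2) * grad[i] * grad[i];
    const double mHat = s.m[i] / (1.0 - std::pow(s.beta1, s.t));  // bias correction
    const double vHat = s.v[i] / (1.0 - std::pow(s.beta2, s.t));
    x[i] -= s.lr * mHat / (std::sqrt(vHat) + s.eps);              // threshold update
  }
}
```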
It was found initially that the Adam algorithm outperformed the GA by at least an order of magnitude in the run time needed to find the best set of thresholds. This comparison was made after choosing appropriate GA hyper-parameters, and tuning only five thresholds instead of the final number of 16. Adam converged on the solution faster due to the gradient-based search and the fact that searching neighboring discrete thresholds takes less time in fewer dimensions.
However, at larger numbers of dimensions and possible solutions, the stochastic GA approach yields similar levels of performance. The bandwidth division tool takes a longer time evaluating the discrete grid of solutions surrounding the continuous solution found by Adam. Additionally, the gradient calculations at each epoch become more costly for higher dimensions (\(N_{\chi ^2} = 6N_{\text {dim}} = 96\) evaluations per epoch).
Figure 5 compares the performance and speed of the Adam algorithm to the GA over many minimizations. Adam takes longer than the GA but finds better solutions on average, despite starting from random sets of coordinates each time. Across all samples, the average \(\chi ^2\) value found by Adam corresponds to an average increase in efficiency of \(\sim 0.4\%\) with respect to the average \(\chi ^2\) value found by the GA. However, the standard deviation of the GA \(\chi ^2\) is \(\sim 70\%\) smaller than that of Adam, and the average GA runtime is \(\sim 30\%\) shorter than the average Adam runtime.
Adam is nevertheless preferred because the GA requires a careful choice of hyper-parameters. Another advantage of Adam is that the starting thresholds for the minimization can be chosen to be in the vicinity of the thresholds chosen by previous divisions. This increases the likelihood that the selected thresholds are relatively similar to those of divisions at slightly looser or tighter HLT1 OR limits. Conversely, the GA tends to find thresholds with vastly different values compared to divisions at looser or tighter OR limits, due to its stochastic nature.
An example of the Adam search algorithm in an individual sample minimization (\(\chi ^2_{\text {indiv}}\)) compared to a scan of the phase space is illustrated in Fig. 6. Evident in this figure is the discretization of the continuous solution through a grid search of the nearby discrete thresholds.
Convergence and discretization
The moment hyper-parameters of the Adam algorithm that are responsible for the momentum of the gradient descent are scaled by a heuristic factor of one order of magnitude in the first epoch of the \(\chi ^2\) minimization. This helps prevent the minimizer from getting stuck in a local minimum if the starting thresholds are situated within one.
After the first Adam minimization, a ‘warm-restart’ mechanism is initiated. A warm restart involves beginning a second minimization at the thresholds chosen by the first minimization, with the learning rate for each parameter reset to its starting value. This method was inspired by the minimizers benchmarked in Refs. [18, 19], and its effect is evident in Fig. 7: the \(\chi ^2_{\text {indiv}}\) minimization path falls into a local minimum and is then ‘kicked’ into a better minimum at a looser TwoTrackMVA threshold and a tighter TrackMVA threshold. Additionally, this figure illustrates the benefit of the Adam algorithm’s ability to move against positive gradients during gradient descent.
Figure 8 also shows the warm restart during the minimization of \(\chi ^2_{\text {global}}\). The bandwidth division tool artificially accelerates the search out of a local minimum halfway through the search, resulting in a superior \(\chi ^2_{\text {global}}\) at the end. The same artificial multiplication of the Adam moments occurs as it does during the first epoch. However, in this epoch the moments are increased in the direction opposite to the previously explored areas of the parameter space. This increases the range of thresholds being searched.
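A minimal sketch of the warm-restart loop is shown below, building on the AdamState/adamStep sketch above. The gradient callback and loop structure are assumptions, and the first-epoch moment boost described above is omitted for brevity.

```cpp
#include <functional>
#include <vector>

// Illustrative sketch of warm restarts: each restart begins from the thresholds found
// by the previous minimization, with a fresh optimizer state (learning rate and
// moment estimates reset). Uses AdamState/adamStep from the sketch above.
std::vector<double> minimizeWithWarmRestarts(
    std::vector<double> x, int nRestarts, int nEpochs,
    const std::function<std::vector<double>(const std::vector<double>&)>& gradientOfChi2) {
  for (int restart = 0; restart < nRestarts; ++restart) {
    AdamState state;  // fresh state each restart
    for (int epoch = 0; epoch < nEpochs; ++epoch) {
      adamStep(x, gradientOfChi2(x), state);
    }
  }
  return x;  // best continuous thresholds, to be snapped to the discrete grid afterwards
}
```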
Once the stopping condition specified in Fig. 4 is satisfied, the smallest \(\chi ^2\) value is chosen and the grid thresholds nearest to the continuous thresholds are located. A non-degenerate search of neighboring discrete selections is then performed, starting from this nearest set of thresholds, to find the smallest discrete \(\chi ^2\) value. The bandwidth division tool moves from threshold set to threshold set, searching neighboring thresholds to see whether a smaller \(\chi ^2\) value can be found; if so, these become the new central ‘base’ thresholds from which the next group of neighboring thresholds is searched. The search is performed recursively until a local \(\chi ^2\) minimum on the grid is obtained.
The first grid search is performed only over the coordinates that can be reached by making one or two ‘jumps’ (translations) to neighboring sets from the base set. However, jumping twice in the same threshold dimension is prohibited. This method significantly reduces the run time when compared to evaluating all neighboring points. However, it also means that the search is more sensitive to the resolution of the grid.
The grid must be sufficiently granular that the continuous thresholds lie close to a grid point, in order to avoid an erroneous choice of discrete solution. If the grid is too granular, however, the chance of a worse local minimum being chosen increases, since the effective depth of the grid search is reduced.
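A minimal sketch of such a neighborhood search on the discrete grid is given below. The bookkeeping that makes the search non-degenerate (avoiding repeated evaluations of the same point) is omitted, and all names are illustrative assumptions.

```cpp
#include <functional>
#include <vector>

// Illustrative sketch of the recursive grid search: from a base point, evaluate all
// points reachable by one or two unit jumps (never two in the same dimension), move
// to the best improving point, and repeat until no neighbor improves the chi2.
using Point = std::vector<int>;  // grid indices, one per threshold dimension

Point gridSearch(Point base, const std::function<double(const Point&)>& chi2OnGrid) {
  double bestChi2 = chi2OnGrid(base);
  bool improved = true;
  while (improved) {
    improved = false;
    Point bestPoint = base;
    const int nDim = static_cast<int>(base.size());
    for (int i = 0; i < nDim; ++i) {
      for (int di : {-1, +1}) {
        Point oneJump = base;
        oneJump[i] += di;  // single jump in dimension i
        if (double c = chi2OnGrid(oneJump); c < bestChi2) { bestChi2 = c; bestPoint = oneJump; }
        for (int j = i + 1; j < nDim; ++j) {  // second jump only in a different dimension
          for (int dj : {-1, +1}) {
            Point twoJumps = oneJump;
            twoJumps[j] += dj;
            if (double c = chi2OnGrid(twoJumps); c < bestChi2) { bestChi2 = c; bestPoint = twoJumps; }
          }
        }
      }
    }
    if (bestPoint != base) { base = bestPoint; improved = true; }  // recurse from the new base
  }
  return base;
}
```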
Results
Many bandwidth divisions were performed for various preliminary iterations of the trigger, and also for changes to the physics and operation conditions of the detector. This enabled the collaboration to benchmark HLT1’s performance. This occurred before the beginning of the data-taking period, and facilitated the study of trigger efficiencies of not just individual signals, but also ensembles of signals studied by different analysis groups. Consequently, refinements were made that led to improvements in trigger performance in 2024.
The results in Fig. 9 show the efficiencies of all 80 signal channels when using the thresholds tuned for a \(1\,\text {MHz}\) rate limit. The blue efficiencies were calculated using the optimal ensemble of HLT1 thresholds produced by the tool at a 1 MHz rate limit. The red efficiencies were calculated using the 80 sets of thresholds chosen to maximize the efficiency of each channel individually. Channels are loosely grouped into those involving beauty, charm, electroweak (EW), rare (RD), and semileptonic (SL) decays. These thresholds have been used to collect 9.54 \(\hbox {fb}^{-1}\) of collision data in HLT1 during 2024.
The results in Figs. 10 and 11 show the analogous results for the tightest and loosest rate limits considered (0.8 and 1.2 MHz). Between these rate limits, there is an average increase in efficiency of around \(30\%\). The increase in available bandwidth is largely allocated to the inclusive hadronic lines, which select the majority of the physics program.
An average physics retention ratio (\(\epsilon _{i}^{\text {optimal}} / \epsilon _{i}^{\text {max}}\)) of \(93\%\) is achieved for the intermediate \(p_{\text {T}}\) beauty channels, since all are selected by inclusive hadronic lines (and some also by the muonic lines). The charm channels show a reasonable average retention of \(79\%\). Charm decays are high rate and low \(p_{\text {T}}\) and so achieve a poorer efficiency for a reasonable share of the bandwidth. The electroweak channels demonstrate very efficient signatures, with an average retention of \(97\%\). Some are selected by the high \(p_{\text {T}}\) leptonic lines and some by the inclusive hadronic lines. The semileptonic channels demonstrate a reasonable average retention of \(90\%\) when selected by the inclusive hadronic and leptonic TrackMVA lines.
Some lines are not written to maximize efficiency; in other cases, the channels being selected are particularly difficult to trigger on, and the efficiency for such signals increases only very slowly as a function of rate. In these cases, the tool naturally favors the lines with larger efficiencies for a reasonable bandwidth and penalizes those without. For example, the decay \(B_{\text {s}} \rightarrow \gamma \gamma\) is challenging to trigger on as it does not produce any charged tracks and thus can only be triggered by signals from the electromagnetic calorimeter.
The lines selecting the decay \(J\!/\!\psi \rightarrow \varLambda \overline{\varLambda }\) have poor efficiency because the decay products are not particularly high in \(p_{\text {T}}\) and the lines select a high background rate relative to the signal. The lines selecting \(\Upsilon (1S) \rightarrow \ell \ell\) are fixed-rate and are not tuned by the division.
Default thresholds were chosen manually to be used for data-taking before the development of the bandwidth division tool. The automatically tuned thresholds resulted in an average increase in signal efficiency of \(\sim 30\%\) for beauty and \(\sim 70\%\) for charm and semileptonic channels, at a saving of 200 kHz in OR, with respect to the efficiencies and OR obtained with the default thresholds.
Conclusions
LHCb has pioneered readout and event reconstruction in a fully software-based trigger. This trigger reduces the event rate by a factor of 30 before a full-event reconstruction of close to offline quality is performed. The trigger system has changed considerably since the 2015–2018 data-taking period. However, one thing that remains constant is the requirement for the OR of HLT1 to be reduced to a size that HLT2 can handle, while providing an equitable division of physics data between analysis groups.
Adaptations of stochastic and gradient-based methods have been applied to a high-dimensional phase-space challenge produced by the full-software trigger of HLT1. The new bandwidth division tool enables the collaboration to adapt to this challenge quickly. The results of this optimization serve as a proof of concept that could be well-suited to other domains.
The bandwidth division tool is designed to accommodate all new requirements and features of the upgraded LHCb trigger. It adapts to the increased demand for accuracy and performance. This facilitates a fast turnaround when HLT1 needs re-optimization. This can be done under different run conditions and a different ensemble of samples, lines, and parameters.
The thresholds chosen by the bandwidth division tool during 2024 have acted to equitably minimize the overall difference between the best possible signal efficiencies and the optimized signal efficiencies. This was achieved for a range of HLT1 OR working points. The main consequence of the bandwidth division will be a significant improvement in the sensitivity of measurements performed by analysts on data collected in the next two data-taking periods.
For future upgrades, the complexity of the trigger menu will increase and the tool will scale accordingly, as it has between the 2010-2018 and 2022-2034 periods.
Data availability
No datasets were generated or analysed during the current study.
References
The LHC Experiments Committee, “LHCb Trigger and Online Upgrade Technical Design Report,” tech. rep., CERN, May (2014). https://cds.cern.ch/record/1701361
Aaij R, Akar S, et al (2024) The LHCb Upgrade I. JINST 19(05):P05065
Bediaga I, Chanal H, Hopchev P, Cadeddu S, Stoica S, Calvo Gomez M, T’Jampens S, Machikhiliyan I, Guzik Z, Alves Jr A et al (2012) “Framework TDR for the LHCb Upgrade: Technical Design Report,” tech. rep., LHCb-TDR-012. https://cds.cern.ch/record/1443882/files/LHCB-TDR-012.pdf
Gligorov VV, Rodrigues E (2020) “RTA and DPA Dataflow Diagrams for Run 1, Run 2, and the Upgraded LHCb Detector,” http://cds.cern.ch/record/2730181/
The LHC Experiments Committee, “Computing Model of the Upgrade LHCb experiment,” tech. rep., CERN, Geneva, (2018) https://cds.cern.ch/record/2319756
Aaij R, Albrecht J, Belous M, et al (2020) Allen: a high-level trigger on GPUs for LHCb. Comput Softw Big Sci 4:7. https://doi.org/10.1007/s41781-020-00039-7
Aaij R, Benson S, De Cian M, Dziurda A, Fitzpatrick C, Govorkova E, Lupton O, Matev R, Neubert S, Pearce A, Schreiner H, Stahl S, Vesterinen M (2019) A comprehensive real-time analysis model at the LHCb experiment. JINST 14(04):P04006
Aaij R, Vom Bruch D (2021) “Allen Documentation.” https://allen-doc.docs.cern.ch/
Bailly-Reyre A, Bian L, Billoir P, Pérez DHC, Gligorov VV, Pisani F, Quagliani R, Scarabotto A, Bruch DV (2024) Looking forward: a high-throughput track following algorithm for parallel architectures. IEEE Access 12:114198–114211. https://doi.org/10.1109/ACCESS.2024.3442573
Aaij R, Akar S, et al (2019) Design and performance of the LHCb trigger and full real-time reconstruction in Run 2 of the LHC. JINST 14:P04013. https://doi.org/10.1088/1748-0221/14/04/P04013
Aaij R, Benson S, De Cian M, Dziurda A, Fitzpatrick C, Govorkova E, Lupton O, Matev R, Neubert S, Pearce A, Schreiner H, Stahl S, Vesterinen M (2019) A comprehensive real-time analysis model at the LHCb experiment. JINST 14:P04006
Brun R, Rademakers F (1997) “ROOT - An Object Oriented Data Analysis Framework.” https://github.com/root-project/root/
Dagum L, Menon R (1998) OpenMP: an industry standard API for shared-memory programming. Comput Sci Eng IEEE 5(1):46–55. https://doi.org/10.1109/99.660313
Mitchell M (1996) An introduction to genetic algorithms. MIT Press. https://doi.org/10.7551/mitpress/3927.001.0001
Ketkar N (2017) Stochastic Gradient Descent. Apress, Berkeley CA
JGraph (2021) “Draw.io Online Diagram Editor,” . https://www.diagrams.net/
James F, Winkler M (2004) “MINUIT User’s Guide,” . https://inspirehep.net/literature/1258345
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR), San Diego, CA, USA
Loshchilov I, Hutter F (2017) “Decoupled Weight Decay Regularization,” in International Conference on Learning Representations. https://api.semanticscholar.org/CorpusID:53592270
Acknowledgements
The authors acknowledge the work of Miriam Gandelman on an implementation of the bandwidth division developed for the hardware trigger to take data between 2010 and 2012. The authors would also like to thank LHCb’s Real-Time Analysis Project for its support, insightful discussions, and for reviewing an early draft of this paper. The authors are grateful to the LHCb computing and simulation teams for producing the simulated LHCb samples used to calculate the physics retention of the signals of interest. They extend thanks to the LHCb Physics Planning Group and WG analysts for their expertise in selecting the physics signals and trigger lines to include in the bandwidth division. Finally, the authors show appreciation to their LHCb colleagues, particularly the RTA piquets and software maintainers, for the development and maintenance of LHCb’s nightly testing and benchmarking infrastructure. The authors have been supported by ERC-StG-2019 Beauty2Charm, and acknowledge support from STFC and UKRI.
Author information
Contributions
J.H. wrote the majority of the code for the bandwidth division tool, which produced all figures except Figs. 1 and 2. J.H. also wrote the main text. All authors reviewed the manuscript, which was also reviewed by Ross Hunter and Vava Gligorov of the Real Time Analysis Project at LHCb. T.E. contributed to making the code for the bandwidth division tool more efficient and streamlined. T.E. integrated the tuning infrastructure into the HLT1 software, Allen. T.E. communicated with the physics planning group and analysts to build and incorporate the ensemble of signal samples and the lines that would select them into the bandwidth division tool. C.F. provided guidance throughout the project, having written the previous version of the bandwidth division tool for data-taking between 2010 and 2018. C.F. also aided in liaising with the PPG, analysts and the other members of the Real Time Analysis project. Finally, C.F. obtained the research grant that funded the work for this paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Submitted to Comput Softw Big Sci.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Evans, T., Fitzpatrick, C. & Horswill, J. An automated bandwidth division for the LHCb upgrade trigger. Comput Softw Big Sci 9, 7 (2025). https://doi.org/10.1007/s41781-025-00139-2