
What is: Base Boosting?

Source: Boosting on the shoulders of giants in quantum device calibration
Year: 2000
Data Source: CC BY-SA - https://paperswithcode.com

In the setting of multi-target regression, base boosting permits us to incorporate prior knowledge into the learning mechanism of gradient boosting (or Newton boosting, etc.). Namely, from the vantage of statistics, base boosting is a way of building the following additive expansion in a set of elementary basis functions:

\begin{equation}
h_{j}(X ; \{ \alpha_{j}, \theta_{j} \}) = X_{j} + \sum_{k=1}^{K_{j}} \alpha_{j,k} b(X ; \theta_{j,k}),
\end{equation}

where $X$ is an example from the domain $\mathcal{X}$, $\{\alpha_{j}, \theta_{j}\} = \{\alpha_{j,1},\dots, \alpha_{j,K_{j}},\theta_{j,1},\dots,\theta_{j,K_{j}}\}$ collects the expansion coefficients and parameter sets, $X_{j}$ is the image of $X$ under the $j$th coordinate function (a prediction from a user-specified model), $K_{j}$ is the number of basis functions in the linear sum, and $b(X; \theta_{j,k})$ is a real-valued function of the example $X$, characterized by the parameter set $\theta_{j,k}$.
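To make the construction concrete, the following is a minimal, self-contained sketch of how such an expansion can be built for a single target $j$ with the squared-error loss. The function names (`fit_base_boosting`, `predict_base_boosting`), the choice of depth-2 trees as the basis functions $b(X; \theta_{j,k})$, and the linear model standing in for the user-specified prior are illustrative assumptions rather than the paper's implementation; the essential point is that the running prediction is seeded with $X_{j}$, the user-specified model's output, instead of a constant.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression


def fit_base_boosting(X, y, base_predictions, n_rounds=50, learning_rate=0.1):
    """Build h_j(X) = X_j + sum_k alpha_{j,k} b(X; theta_{j,k}) for one target
    with squared-error loss. `base_predictions` plays the role of X_j: the
    user-specified model's prediction for each training example."""
    learners = []
    current = base_predictions.astype(float).copy()  # seed with the prior model, not a constant
    for _ in range(n_rounds):
        residuals = y - current                       # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
        learners.append(tree)
        current += learning_rate * tree.predict(X)    # shrunken basis-function update
    return learners


def predict_base_boosting(X, base_predictions, learners, learning_rate=0.1):
    """Evaluate the fitted expansion: prior prediction plus the boosted corrections."""
    pred = base_predictions.astype(float).copy()
    for tree in learners:
        pred += learning_rate * tree.predict(X)
    return pred


# Example usage with a hypothetical prior model (a plain linear fit stands in
# for the user-specified, e.g. physics-motivated, model):
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.3 * np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=300)
prior = LinearRegression().fit(X, y)
learners = fit_base_boosting(X, y, prior.predict(X))
preds = predict_base_boosting(X, prior.predict(X), learners)
```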

The aforementioned additive expansion differs from the standard additive expansion:

\begin{equation}
h_{j}(X ; \{ \alpha_{j}, \theta_{j} \}) = \alpha_{j, 0} + \sum_{k=1}^{K_{j}} \alpha_{j,k} b(X ; \theta_{j,k}),
\end{equation}

as it replaces the constant offset value $\alpha_{j, 0}$ with a prediction from a user-specified model. In essence, this modification permits us to incorporate prior knowledge into the for loop of gradient boosting, as the for loop proceeds to build the linear sum by computing residuals that depend upon predictions from the user-specified model instead of the optimal constant model:

\begin{equation}
\operatorname*{argmin}_{c} \sum_{i=1}^{m_{\text{train}}} \ell_{j}(Y_{j}^{(i)}, c),
\end{equation}

where $m_{\text{train}}$ denotes the number of training examples, $\ell_{j}$ denotes a single-target loss function, and $c \in \mathbb{R}$ denotes a real number. For example, with the squared-error loss the optimal constant is the sample mean of the targets:

\begin{equation}
\operatorname*{argmin}_{c} \sum_{i=1}^{m_{\text{train}}} (Y_{j}^{(i)} - c)^{2} = \frac{\sum_{i=1}^{m_{\text{train}}} Y_{j}^{(i)}}{m_{\text{train}}}.
\end{equation}
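As a rough single-target analogue of this swap, scikit-learn's `GradientBoostingRegressor` exposes an `init` parameter that accepts an estimator, so passing a user-specified model there replaces the default constant initializer (the target mean under squared error) with that model's predictions. The sketch below contrasts the two initializations and numerically checks the argmin identity above; it is an illustration under these assumptions, not the paper's multi-target procedure.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=200)

# Standard boosting: the expansion starts from the optimal constant, which for
# the (default) squared-error loss is the training-set mean of y.
standard = GradientBoostingRegressor().fit(X, y)

# Base-boosting-style initialization: the expansion starts from a user-specified
# model (here a plain linear regression, standing in for prior knowledge).
with_prior = GradientBoostingRegressor(init=LinearRegression()).fit(X, y)

# Sanity check of the closed-form argmin: among a fine grid of constants c,
# the squared-error loss is minimized at (approximately) the sample mean.
grid = np.linspace(y.min(), y.max(), 1001)
losses = ((y[:, None] - grid[None, :]) ** 2).sum(axis=0)
assert abs(grid[losses.argmin()] - y.mean()) < (y.max() - y.min()) / 1000
```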