OmniGBDT Documentation
OmniGBDT packages the original GBDT-MO algorithm as a regular Python library. The native C++ training core remains in place, while the Python layer adds wheel-based installation, public custom-objective hooks, optional sklearn-compatible wrappers, and accuracy-oriented regression defaults.
At the mathematical level, GBDT-MO replaces the usual one-tree-per-target strategy with a single tree whose split score is the second-order objective gain summed across all outputs. This lets one tree reuse shared structure across correlated targets, supports optional sparse leaves that update only the most relevant outputs, and extends histogram-based split search to the multi-output case so that training cost stays practical as the number of outputs grows.
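As a rough illustration of the ideas in the paragraph above, the sketch below computes a split gain summed over outputs, scans a pre-binned feature histogram for the best threshold, and zeroes all but the top-k outputs of a leaf. It assumes the common second-order form with an L2 term `lam`. The function names, the `lam` parameter, and the top-k selection rule are illustrative assumptions; they are not taken from the OmniGBDT API or its C++ core, which may differ in detail.

```python
import numpy as np


def leaf_score(G, H, lam=1.0):
    """Per-output second-order score G_j^2 / (H_j + lam) for one node."""
    return G**2 / (H + lam)


def split_gain(g, h, left_mask, lam=1.0):
    """Gain of one candidate split, summed across all outputs.

    g, h: (n_samples, n_outputs) gradients and Hessians.
    left_mask: boolean array, True for samples sent to the left child.
    """
    G_parent, H_parent = g.sum(axis=0), h.sum(axis=0)
    G_left, H_left = g[left_mask].sum(axis=0), h[left_mask].sum(axis=0)
    G_right, H_right = G_parent - G_left, H_parent - H_left
    per_output = (
        leaf_score(G_left, H_left, lam)
        + leaf_score(G_right, H_right, lam)
        - leaf_score(G_parent, H_parent, lam)
    )
    return 0.5 * per_output.sum()


def best_histogram_split(bins, g, h, n_bins, lam=1.0):
    """Scan one pre-binned feature for the threshold with the highest summed gain.

    bins: (n_samples,) integer bin index of each sample for this feature.
    Returns (b, gain), where the left child holds samples with bin <= b.
    """
    n_outputs = g.shape[1]
    G_hist = np.zeros((n_bins, n_outputs))
    H_hist = np.zeros((n_bins, n_outputs))
    np.add.at(G_hist, bins, g)  # per-bin gradient sums, one column per output
    np.add.at(H_hist, bins, h)  # per-bin Hessian sums

    # Prefix sums give left-child statistics for every threshold in one pass.
    G_left = np.cumsum(G_hist, axis=0)[:-1]
    H_left = np.cumsum(H_hist, axis=0)[:-1]
    G_total, H_total = G_hist.sum(axis=0), H_hist.sum(axis=0)
    gains = 0.5 * (
        leaf_score(G_left, H_left, lam)
        + leaf_score(G_total - G_left, H_total - H_left, lam)
        - leaf_score(G_total, H_total, lam)
    ).sum(axis=1)
    best = int(np.argmax(gains))
    return best, float(gains[best])


def sparse_leaf_values(G, H, k, lam=1.0):
    """Keep leaf values only for the k outputs with the largest score."""
    values = -G / (H + lam)
    keep = np.argsort(leaf_score(G, H, lam))[-k:]
    sparse = np.zeros_like(values)
    sparse[keep] = values[keep]
    return sparse
```

For squared error with an all-zero starting prediction, `g` would be `-Y` and `h` an array of ones. The histogram scan touches each sample once per feature and then evaluates `n_bins - 1` thresholds, independent of the number of samples.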
The main user-facing entry points are MultiOutputGBDT and SingleOutputGBDT.
Why OmniGBDT
Joint multi-output gradient boosting from the original GBDT-MO research codebase
Standard pip and uv installation with the native library bundled inside the package
Public Python callbacks for custom gradients, Hessians, metrics, and early stopping
Fixed-thread deterministic CPU training through the public deterministic parameter
Optional sklearn-compatible wrappers for tools such as permutation importance
Accuracy-oriented regression defaults in the current fork: num_rounds=200, lr=0.05, max_bins=128, early_stop=15, and automatic mean initialization when base_score is unset
For the original project, benchmark figures, experiment scripts, and upstream research context, please see:
Installation
Install the released package:
pip install omnigbdt
or with uv:
uv add omnigbdt
For optional extras, source installs, local-path installs, and Windows build notes, see Installation.
Quick start
The example below trains one MultiOutputGBDT model on two correlated targets using only NumPy:
import numpy as np
from omnigbdt import MultiOutputGBDT, Verbosity

# Synthetic regression data: both targets share one underlying signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
shared_signal = 1.2 * X[:, 0] - 0.8 * X[:, 1] + 0.5 * X[:, 2] * X[:, 3]
Y = np.column_stack(
    [
        shared_signal + 0.3 * X[:, 4],
        shared_signal - 0.2 * X[:, 5],
    ]
)

# 240 training rows, 80 validation rows, 80 test rows.
X_train, Y_train = X[:240], Y[:240]
X_valid, Y_valid = X[240:320], Y[240:320]
X_test = X[320:]

# One model covers both output columns.
model = MultiOutputGBDT(
    out_dim=Y.shape[1],
    params={
        "loss": b"mse",
        "max_depth": 4,
        "max_bins": 128,
        "lr": 0.05,
        "early_stop": 15,
        "num_threads": 1,
        "verbosity": Verbosity.SILENT,
    },
)

# Pass the training pair and the validation pair, then request 200 boosting rounds.
model.set_data((X_train, Y_train), (X_valid, Y_valid))
model.train(200)

# Predict the first five test rows.
preds = model.predict(X_test[:5])
print(preds.shape)
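The quick start stops at printing the prediction shape. One way to check held-out quality is a per-output RMSE helper like the one below. It is a hypothetical NumPy-only utility, not part of OmniGBDT, and it assumes predictions arrive as an array of shape (n_samples, out_dim).

```python
import numpy as np


def rmse_per_output(y_true, y_pred):
    """Root-mean-squared error computed separately for each output column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))


# With the quick-start variables in scope, usage might look like:
#   test_rmse = rmse_per_output(Y[320:], model.predict(X_test))
# A constant baseline that always predicts the training mean gives a reference point:
#   baseline = np.tile(Y_train.mean(axis=0), (len(X_test), 1))
#   baseline_rmse = rmse_per_output(Y[320:], baseline)
```

Reporting the error per column rather than as one average makes it easier to see whether the joint model helps both targets or only one of them.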
For a real-world financial benchmark based on the UCI Stock Portfolio Performance dataset, a comparison with SingleOutputGBDT, and an sklearn permutation_importance example, see Examples.
Documentation guide
Installation for released-package installs, source installs, and platform notes
Examples for runnable examples, including the financial benchmark
Python API for the main Python entry points
Parameters for the parameter dictionary and callback hook signatures
Differences From Upstream for fork-specific behavior and deviations from the original package
Development for local contributor workflows
Differences from upstream
Compared with the upstream repository, OmniGBDT currently adds:
standard Python packaging and bundled native-library loading
public Python callback hooks for custom gradients, Hessians, metrics, and early stopping
public deterministic parameter for fixed-thread CPU repeatability on the same platform
optional sklearn-compatible wrappers
automatic regression mean initialization when base_score is omitted
scalar or per-output base_score values for MultiOutputGBDT
accuracy-oriented wrapper defaults for regression workflows
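The base_score items in the list above can be pictured with a small sketch. It shows one plausible way to turn an omitted, scalar, or per-output base_score into one starting value per output. The function name `resolve_base_score` and its validation rules are assumptions made for illustration; the library's internal handling is not shown in this document and may differ.

```python
import numpy as np


def resolve_base_score(base_score, Y_train):
    """Return one initial prediction per output column.

    None     -> per-output mean of the training targets
    scalar   -> the same value repeated for every output
    sequence -> one value per output (length must match)
    """
    Y_train = np.asarray(Y_train, dtype=float)
    if Y_train.ndim == 1:
        Y_train = Y_train[:, None]  # treat a 1-D target as a single output
    out_dim = Y_train.shape[1]

    if base_score is None:
        return Y_train.mean(axis=0)

    score = np.asarray(base_score, dtype=float)
    if score.ndim == 0:
        return np.full(out_dim, float(score))
    if score.shape != (out_dim,):
        raise ValueError(
            f"expected {out_dim} base_score values, got shape {score.shape}"
        )
    return score
```

The per-output mean is the constant prediction that minimizes squared error, so starting from it means the first trees only need to model deviations from the mean.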
See Differences From Upstream for a fuller summary, including native-code adjustments that can produce different same-seed trees than earlier runs affected by bugs.
Project provenance
This fork builds directly on the original GBDT-MO implementation by Zhendong Zhang and Cheolkon Jung.
OmniGBDT is intended to make the package easier to build, install, and distribute. It is not the canonical source for the paper, benchmark tables, figures, or research documentation.
If this project is used in research, please credit the original paper:
@article{zhang2020gbdt,
  title={GBDT-MO: Gradient-boosted decision trees for multiple outputs},
  author={Zhang, Zhendong and Jung, Cheolkon},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  volume={32},
  number={7},
  pages={3156--3167},
  year={2020},
  publisher={IEEE}
}