OmniGBDT Documentation

OmniGBDT packages the original GBDT-MO algorithm as a regular Python library. The native C++ training core remains in place, while the Python layer adds wheel-based installation, public custom-objective hooks, optional sklearn-compatible wrappers, and accuracy-oriented regression defaults.

At the mathematical level, GBDT-MO replaces the usual one-tree-per-target strategy with a single tree whose split score is the second-order objective gain summed across all outputs. This lets one tree reuse shared structure across correlated targets, supports optional sparse leaves that update only the most relevant outputs, and extends histogram-based split search to the multi-output case so that training cost stays practical as the number of outputs grows.
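The summed split score and the sparse-leaf idea can be illustrated with a small pure-Python sketch. It assumes the common second-order form in which each output contributes G²/(H + λ) to a leaf's score, where G and H are the sums of gradients and Hessians of the samples in that leaf. The function names and the `reg_lambda` argument are hypothetical and are not part of the OmniGBDT API; the native core may differ in details.

```python
def leaf_score(grad_sums, hess_sums, reg_lambda=1.0):
    """Second-order objective reduction of one leaf, summed over all outputs."""
    return 0.5 * sum(g * g / (h + reg_lambda) for g, h in zip(grad_sums, hess_sums))


def split_gain(left, right, reg_lambda=1.0):
    """Gain of a candidate split; `left` and `right` are (grad_sums, hess_sums) pairs."""
    (gl, hl), (gr, hr) = left, right
    # The parent statistics are the element-wise sums of the two children.
    parent_g = [a + b for a, b in zip(gl, gr)]
    parent_h = [a + b for a, b in zip(hl, hr)]
    return (
        leaf_score(gl, hl, reg_lambda)
        + leaf_score(gr, hr, reg_lambda)
        - leaf_score(parent_g, parent_h, reg_lambda)
    )


def sparse_leaf_values(grad_sums, hess_sums, top_k, reg_lambda=1.0):
    """Leaf values in which only the top_k highest-scoring outputs are updated."""
    scores = [g * g / (h + reg_lambda) for g, h in zip(grad_sums, hess_sums)]
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    keep = set(ranked[:top_k])
    return [
        -g / (h + reg_lambda) if j in keep else 0.0
        for j, (g, h) in enumerate(zip(grad_sums, hess_sums))
    ]


# Two outputs whose gradients agree on which side of the split they prefer.
left = ([-4.0, -3.0], [5.0, 5.0])
right = ([4.0, 3.0], [5.0, 5.0])
gain = split_gain(left, right)
sparse_values = sparse_leaf_values([-4.0, -3.0], [5.0, 5.0], top_k=1)
print(gain, sparse_values)
```

Because both outputs prefer the same partition, their individual gains add up in one split, which is the sense in which a single tree can reuse structure across correlated targets.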

The main user-facing entry points are MultiOutputGBDT and SingleOutputGBDT.

Why OmniGBDT

  • Joint multi-output gradient boosting from the original GBDT-MO research codebase

  • Standard pip and uv installation with the native library bundled inside the package

  • Public Python callbacks for custom gradients, Hessians, metrics, and early stopping

  • Fixed-thread deterministic CPU training through the public deterministic parameter

  • Optional sklearn-compatible wrappers for tools such as permutation importance

  • Accuracy-oriented regression defaults in the current fork: num_rounds=200, lr=0.05, max_bins=128, early_stop=15, and automatic mean initialization when base_score is unset
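As a rough illustration of what a custom objective has to supply, the sketch below computes per-sample, per-output gradients and Hessians for a plain squared-error loss. The function name and its list-based inputs are hypothetical; the actual hook signature expected by OmniGBDT is described on the Parameters page and is not reproduced here.

```python
def squared_error_grad_hess(preds, targets):
    """Gradients and Hessians of 0.5 * (pred - target) ** 2 for every sample and output."""
    grads = [
        [p - t for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(preds, targets)
    ]
    # The second derivative of the squared error is constant.
    hess = [[1.0 for _ in p_row] for p_row in preds]
    return grads, hess


preds = [[0.5, 1.0], [2.0, -1.0]]
targets = [[1.0, 1.0], [1.5, 0.0]]
grads, hess = squared_error_grad_hess(preds, targets)
print(grads)
print(hess)
```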

For the original project, benchmark figures, experiment scripts, and upstream research context, please see the original GBDT-MO repository.

Installation

Install the released package:

pip install omnigbdt

or with uv:

uv add omnigbdt

For optional extras, source installs, local-path installs, and Windows build notes, see Installation.

Quick start

The example below trains one MultiOutputGBDT model on two correlated targets using only NumPy:

import numpy as np
from omnigbdt import MultiOutputGBDT, Verbosity

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
shared_signal = 1.2 * X[:, 0] - 0.8 * X[:, 1] + 0.5 * X[:, 2] * X[:, 3]
Y = np.column_stack(
    [
        shared_signal + 0.3 * X[:, 4],
        shared_signal - 0.2 * X[:, 5],
    ]
)

X_train, Y_train = X[:240], Y[:240]
X_valid, Y_valid = X[240:320], Y[240:320]
X_test = X[320:]

model = MultiOutputGBDT(
    out_dim=Y.shape[1],
    params={
        "loss": b"mse",
        "max_depth": 4,
        "max_bins": 128,
        "lr": 0.05,
        "early_stop": 15,
        "num_threads": 1,
        "verbosity": Verbosity.SILENT,
    },
)
model.set_data((X_train, Y_train), (X_valid, Y_valid))
model.train(200)
preds = model.predict(X_test[:5])
print(preds.shape)
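
# A possible follow-up: score held-out predictions one output at a time.
# rmse_per_output is a hypothetical helper, not part of omnigbdt. It accepts
# any sequence of rows, so nested lists and 2-D NumPy arrays both work.

```python
import math


def rmse_per_output(y_true, y_pred):
    """Root-mean-squared error computed separately for each output column."""
    n_rows = len(y_true)
    n_outputs = len(y_true[0])
    squared_error = [0.0] * n_outputs
    for t_row, p_row in zip(y_true, y_pred):
        for j in range(n_outputs):
            diff = float(p_row[j]) - float(t_row[j])
            squared_error[j] += diff * diff
    return [math.sqrt(total / n_rows) for total in squared_error]


# Tiny hand-made example with two outputs.
y_true = [[1.0, 2.0], [3.0, 4.0]]
y_pred = [[1.0, 1.0], [3.0, 7.0]]
scores = rmse_per_output(y_true, y_pred)
print(scores)
```

With the quick-start arrays, the same helper could be called as rmse_per_output(Y[320:], model.predict(X_test)), assuming predict returns one row per sample and one column per output.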

For a real-world financial benchmark based on the UCI Stock Portfolio Performance dataset, a comparison with SingleOutputGBDT, and an sklearn permutation_importance example, see Examples.

Documentation guide

  • Installation for released-package installs, source installs, and platform notes

  • Examples for runnable examples, including the financial benchmark

  • Python API for the main Python entry points

  • Parameters for the parameter dictionary and callback hook signatures

  • Differences From Upstream for fork-specific behavior and deviations from the original package

  • Development for local contributor workflows

Differences from upstream

Compared with the upstream repository, OmniGBDT currently adds:

  • standard Python packaging and bundled native-library loading

  • public Python callback hooks for custom gradients, Hessians, metrics, and early stopping

  • a public deterministic parameter for fixed-thread CPU repeatability on the same platform

  • optional sklearn-compatible wrappers

  • automatic regression mean initialization when base_score is omitted

  • scalar or per-output base_score values for MultiOutputGBDT

  • accuracy-oriented wrapper defaults for regression workflows

See Differences From Upstream for a fuller summary, including native-code adjustments that can make same-seed trees differ from those produced by older, buggy builds.
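
To make the mean-initialization item concrete, the sketch below shows one plausible reading of it: when base_score is omitted, each output starts from the mean of its training targets, and boosting then fits the residuals. The helper name is hypothetical, and whether OmniGBDT computes the starting value exactly this way is an assumption.

```python
def mean_base_score(targets):
    """Per-output mean of the training targets, usable as a starting prediction."""
    n_rows = len(targets)
    n_outputs = len(targets[0])
    return [sum(row[j] for row in targets) / n_rows for j in range(n_outputs)]


# Three samples, two outputs on very different scales.
Y_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
base_score = mean_base_score(Y_train)
print(base_score)
```

A per-output starting point matters most when the targets sit on different scales, since a single scalar would leave at least one output far from its mean at round zero.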

Project provenance

This fork builds directly on the original GBDT-MO implementation by Zhendong Zhang and Cheolkon Jung.

OmniGBDT is intended to make the package easier to build, install, and distribute. It is not the canonical source for the paper, benchmark tables, figures, or research documentation.

If this project is used in research, please credit the original paper:

@article{zhang2020gbdt,
  title={GBDT-MO: Gradient-boosted decision trees for multiple outputs},
  author={Zhang, Zhendong and Jung, Cheolkon},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  volume={32},
  number={7},
  pages={3156--3167},
  year={2020},
  publisher={IEEE}
}