Examples

This page contains short, self-contained examples for the packaged OmniGBDT fork. The UCI stock portfolio benchmark is the main real-world tabular example used throughout the documentation. Installation details are available on Installation.

For benchmark scripts and extended evaluation workflows, please see the upstream repositories:

UCI stock portfolio benchmark

Install the UCI dataset helper first:

pip install ucimlrepo

The example below uses the UCI Machine Learning Repository Stock Portfolio Performance dataset. It loads one real-world financial tabular benchmark and splits it into train, validation, and test partitions.

import numpy as np
from ucimlrepo import fetch_ucirepo
from omnigbdt import MultiOutputGBDT, Verbosity

stock_portfolio = fetch_ucirepo(id=390)
frame = stock_portfolio.data.original

feature_columns = [
    "Large B/P",
    "Large ROE",
    "Large S/P",
    "Large Return Rate in the last quarter",
    "Large Market Value",
    "Small systematic Risk",
]
target_columns = [
    "Annual Return.1",
    "Excess Return.1",
    "Systematic Risk.1",
    "Total Risk.1",
    "Abs. Win Rate.1",
    "Rel. Win Rate.1",
]

X = frame.loc[:, feature_columns].to_numpy(dtype=np.float64)
Y = frame.loc[:, target_columns].to_numpy(dtype=np.float64)

rng = np.random.default_rng(0)
indices = rng.permutation(len(X))
train_end = int(len(X) * 0.6)
valid_end = int(len(X) * 0.8)
train_idx = indices[:train_end]
valid_idx = indices[train_end:valid_end]
test_idx = indices[valid_end:]

X_train, Y_train = X[train_idx], Y[train_idx]
X_valid, Y_valid = X[valid_idx], Y[valid_idx]
X_test, Y_test = X[test_idx], Y[test_idx]

params = {
    "loss": b"mse",
    "max_depth": 4,
    "max_bins": 128,
    "lr": 0.05,
    "early_stop": 15,
    "num_threads": 1,
    "verbosity": Verbosity.SILENT,
}

booster = MultiOutputGBDT(out_dim=Y.shape[1], params=params)
booster.set_data((X_train, Y_train), (X_valid, Y_valid))
booster.train(200)

preds = booster.predict(X_test[:5])
print(preds.shape)

The UCI export includes both formatted percentage columns and normalized numeric target columns. The example uses the normalized target columns from data.original, which carry the .1 suffix in the spreadsheet-derived column names. base_score remains unset, so regression training starts from the training-label mean automatically.

Comparing `SingleOutputGBDT` and `MultiOutputGBDT`

Continuing from the UCI stock portfolio example above, one simple baseline is to train:

one MultiOutputGBDT model on the full target matrix
one SingleOutputGBDT model per target column

import numpy as np
from omnigbdt import SingleOutputGBDT

multi_preds = booster.predict(X_test)

single_models = []
for col in range(Y.shape[1]):
    model = SingleOutputGBDT(params=params)
    target = np.ascontiguousarray(Y_train[:, col])
    eval_target = np.ascontiguousarray(Y_valid[:, col])
    model.set_data((X_train, target), (X_valid, eval_target))
    model.train(200)
    single_models.append(model)

single_preds = np.column_stack([model.predict(X_test) for model in single_models])

multi_rmse = np.sqrt(np.mean((multi_preds - Y_test) ** 2))
single_rmse = np.sqrt(np.mean((single_preds - Y_test) ** 2))

print("Held-out RMSE from MultiOutputGBDT:", round(float(multi_rmse), 6))
print("Held-out RMSE from stacked SingleOutputGBDT models:", round(float(single_rmse), 6))

Dumping and loading a model

Continuing from the UCI stock portfolio example above:

from pathlib import Path

model_path = Path("omnigbdt_model.txt")
booster.dump(model_path)

reloaded = MultiOutputGBDT(out_dim=Y.shape[1], params=params)
reloaded.set_booster(X.shape[1], Y.shape[1])
reloaded.load(model_path)

Advanced manual loading

Normal usage does not require manual shared-library handling, but the compatibility helper is still available:

from omnigbdt import MultiOutputGBDT, load_lib

lib = load_lib("/path/to/native/library/or/folder")
booster = MultiOutputGBDT(lib=lib, out_dim=3, params={"loss": b"mse"})

Optional plotting

Install the optional plotting dependency first:

pip install "omnigbdt[plot]"

Then render a dumped tree:

from omnigbdt import create_graph

graph = create_graph("omnigbdt_model.txt", tree_index=0, value_list=[0, 1])
graph.render("tree_0", format="pdf")

Custom objective

MultiOutputGBDT supports public callback-based custom objectives through train(..., objective=...):

Continuing from the UCI stock portfolio example above:

import numpy as np
from omnigbdt import MultiOutputGBDT, Verbosity

def mse_objective(preds, target):
    return preds - target, np.ones_like(preds)

def rmse_metric(preds, target):
    return float(np.sqrt(np.mean((preds - target) ** 2)))

booster = MultiOutputGBDT(
    out_dim=Y_train.shape[1],
    params={
        "loss": b"mse",
        "max_depth": 4,
        "max_bins": 128,
        "lr": 0.05,
        "early_stop": 15,
        "num_threads": 1,
        "verbosity": Verbosity.FULL,
    },
)
booster.set_data((X_train, Y_train), (X_valid, Y_valid))
booster.train(
    200,
    objective=mse_objective,
    eval_metric=rmse_metric,
    maximize=False,
)

This uses a Python callback to supply gradients and Hessians round by round.

The protected _set_gh(...) plus boost() workflow still exists as an advanced escape hatch:

g, h = mse_objective(booster.preds_train.copy(), booster.label.copy())
booster._set_gh(g, h)
booster.boost()

For SingleOutputGBDT, the custom-objective callback receives 1D arrays. For MultiOutputGBDT, it receives 2D arrays shaped (n_samples, out_dim).

The sklearn-compatible wrappers forward the same callback arguments:

from omnigbdt import MultiOutputGBDTRegressor

model = MultiOutputGBDTRegressor(
    num_rounds=200,
    objective=mse_objective,
    eval_metric=rmse_metric,
    maximize=False,
    max_depth=4,
    max_bins=128,
    lr=0.05,
    early_stop=15,
    num_threads=1,
)
model.fit(X_train, Y_train)

Permutation importance with sklearn

Install the optional sklearn extra and the UCI dataset helper first:

pip install "omnigbdt[sklearn]"
pip install ucimlrepo

Then use the sklearn-compatible wrapper with permutation_importance:

import numpy as np
from sklearn.inspection import permutation_importance
from ucimlrepo import fetch_ucirepo

from omnigbdt import MultiOutputGBDTRegressor

stock_portfolio = fetch_ucirepo(id=390)
frame = stock_portfolio.data.original
feature_columns = [
    "Large B/P",
    "Large ROE",
    "Large S/P",
    "Large Return Rate in the last quarter",
    "Large Market Value",
    "Small systematic Risk",
]
target_columns = [
    "Annual Return.1",
    "Excess Return.1",
    "Systematic Risk.1",
    "Total Risk.1",
    "Abs. Win Rate.1",
    "Rel. Win Rate.1",
]
X = frame.loc[:, feature_columns].to_numpy(dtype=np.float64)
Y = frame.loc[:, target_columns].to_numpy(dtype=np.float64)

rng = np.random.default_rng(0)
indices = rng.permutation(len(X))
train_end = int(len(X) * 0.8)
train_idx = indices[:train_end]
test_idx = indices[train_end:]

X_train, Y_train = X[train_idx], Y[train_idx]
X_test, Y_test = X[test_idx], Y[test_idx]

model = MultiOutputGBDTRegressor(
    num_rounds=200,
    max_depth=4,
    max_bins=128,
    lr=0.05,
    early_stop=15,
    num_threads=1,
)
model.fit(X_train, Y_train)

result = permutation_importance(
    model,
    X_test,
    Y_test,
    scoring="r2",
    n_repeats=5,
    random_state=42,
    n_jobs=1,
)

print(result.importances_mean)