Examples
This page contains short, self-contained examples for the packaged OmniGBDT fork. The UCI stock portfolio benchmark is the main real-world tabular example used throughout the documentation. For installation details, see the Installation page.
For benchmark scripts and extended evaluation workflows, please see the upstream repositories:
UCI stock portfolio benchmark
Install the UCI dataset helper first:
pip install ucimlrepo
The example below loads the UCI Machine Learning Repository Stock Portfolio Performance dataset (id 390), a real-world financial tabular benchmark, and splits it 60/20/20 into train, validation, and test partitions.
import numpy as np
from ucimlrepo import fetch_ucirepo
from omnigbdt import MultiOutputGBDT, Verbosity
stock_portfolio = fetch_ucirepo(id=390)
frame = stock_portfolio.data.original
feature_columns = [
    "Large B/P",
    "Large ROE",
    "Large S/P",
    "Large Return Rate in the last quarter",
    "Large Market Value",
    "Small systematic Risk",
]
target_columns = [
    "Annual Return.1",
    "Excess Return.1",
    "Systematic Risk.1",
    "Total Risk.1",
    "Abs. Win Rate.1",
    "Rel. Win Rate.1",
]
X = frame.loc[:, feature_columns].to_numpy(dtype=np.float64)
Y = frame.loc[:, target_columns].to_numpy(dtype=np.float64)
rng = np.random.default_rng(0)
indices = rng.permutation(len(X))
train_end = int(len(X) * 0.6)
valid_end = int(len(X) * 0.8)
train_idx = indices[:train_end]
valid_idx = indices[train_end:valid_end]
test_idx = indices[valid_end:]
X_train, Y_train = X[train_idx], Y[train_idx]
X_valid, Y_valid = X[valid_idx], Y[valid_idx]
X_test, Y_test = X[test_idx], Y[test_idx]
params = {
    "loss": b"mse",
    "max_depth": 4,
    "max_bins": 128,
    "lr": 0.05,
    "early_stop": 15,
    "num_threads": 1,
    "verbosity": Verbosity.SILENT,
}
booster = MultiOutputGBDT(out_dim=Y.shape[1], params=params)
booster.set_data((X_train, Y_train), (X_valid, Y_valid))
booster.train(200)
preds = booster.predict(X_test[:5])
print(preds.shape)
The UCI export includes both formatted percentage columns and normalized numeric target columns. The example uses the normalized target columns from data.original, which carry the .1 suffix in the spreadsheet-derived column names. base_score remains unset, so regression training starts from the training-label mean automatically.
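The split arithmetic and the training-label mean mentioned above can be checked without downloading the dataset. The sketch below uses synthetic arrays as a stand-in for the UCI matrices and assumes the starting value is a per-column mean over the training rows only. The name `baseline` is illustrative, and how OmniGBDT computes its starting score internally is not shown here.

```python
import numpy as np

# Synthetic stand-in for the UCI feature/target matrices (shapes are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
Y = rng.normal(size=(50, 6))

# Same shuffle-and-slice scheme as the example above.
indices = rng.permutation(len(X))
train_end = int(len(X) * 0.6)
valid_end = int(len(X) * 0.8)
train_idx = indices[:train_end]
valid_idx = indices[train_end:valid_end]
test_idx = indices[valid_end:]

# The three partitions are disjoint and together cover every row exactly once.
all_idx = np.concatenate([train_idx, valid_idx, test_idx])
assert np.array_equal(np.sort(all_idx), np.arange(len(X)))

# Assumption: one mean per target column, computed on training rows only.
baseline = Y[train_idx].mean(axis=0)
print(len(train_idx), len(valid_idx), len(test_idx))
print(baseline.shape)
```

Computing the mean on the training rows alone keeps validation and test labels out of the starting point.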
Comparing SingleOutputGBDT and MultiOutputGBDT
Continuing from the UCI stock portfolio example above, one simple baseline is to train:
one MultiOutputGBDT model on the full target matrix
one SingleOutputGBDT model per target column
import numpy as np
from omnigbdt import SingleOutputGBDT
multi_preds = booster.predict(X_test)
single_models = []
for col in range(Y.shape[1]):
    model = SingleOutputGBDT(params=params)
    target = np.ascontiguousarray(Y_train[:, col])
    eval_target = np.ascontiguousarray(Y_valid[:, col])
    model.set_data((X_train, target), (X_valid, eval_target))
    model.train(200)
    single_models.append(model)
single_preds = np.column_stack([model.predict(X_test) for model in single_models])
multi_rmse = np.sqrt(np.mean((multi_preds - Y_test) ** 2))
single_rmse = np.sqrt(np.mean((single_preds - Y_test) ** 2))
print("Held-out RMSE from MultiOutputGBDT:", round(float(multi_rmse), 6))
print("Held-out RMSE from stacked SingleOutputGBDT models:", round(float(single_rmse), 6))
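The two RMSE values above pool all six targets into one number, which can hide a model that does well on some columns and poorly on others. A minimal sketch of a per-column breakdown follows. The helper `per_target_rmse` is a hypothetical name, not part of OmniGBDT, and the arrays are hand-made only to exercise it.

```python
import numpy as np

def per_target_rmse(preds, targets):
    """Return one RMSE per target column for (n_samples, out_dim) arrays."""
    preds = np.asarray(preds, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    if preds.shape != targets.shape:
        raise ValueError(f"shape mismatch: {preds.shape} vs {targets.shape}")
    # Average the squared error over rows only, so each column keeps its own score.
    return np.sqrt(np.mean((preds - targets) ** 2, axis=0))

# Tiny hand-made arrays: column 0 is predicted exactly, column 1 is not.
targets = np.array([[0.0, 1.0], [0.0, 1.0]])
preds = np.array([[0.0, 2.0], [0.0, 4.0]])
print(per_target_rmse(preds, targets))
```

With the arrays from the comparison, `per_target_rmse(multi_preds, Y_test)` and `per_target_rmse(single_preds, Y_test)` would give one value per target column for each approach.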
Dumping and loading a model
Continuing from the UCI stock portfolio example above:
from pathlib import Path
model_path = Path("omnigbdt_model.txt")
booster.dump(model_path)
reloaded = MultiOutputGBDT(out_dim=Y.shape[1], params=params)
reloaded.set_booster(X.shape[1], Y.shape[1])
reloaded.load(model_path)
Advanced manual loading
Normal usage does not require manual shared-library handling, but the compatibility helper is still available:
from omnigbdt import MultiOutputGBDT, load_lib
lib = load_lib("/path/to/native/library/or/folder")
booster = MultiOutputGBDT(lib=lib, out_dim=3, params={"loss": b"mse"})
Optional plotting
Install the optional plotting dependency first:
pip install "omnigbdt[plot]"
Then render a dumped tree:
from omnigbdt import create_graph
graph = create_graph("omnigbdt_model.txt", tree_index=0, value_list=[0, 1])
graph.render("tree_0", format="pdf")
Custom objective
MultiOutputGBDT supports public callback-based custom objectives through train(..., objective=...).
Continuing from the UCI stock portfolio example above:
import numpy as np
from omnigbdt import MultiOutputGBDT, Verbosity
def mse_objective(preds, target):
    return preds - target, np.ones_like(preds)

def rmse_metric(preds, target):
    return float(np.sqrt(np.mean((preds - target) ** 2)))

booster = MultiOutputGBDT(
    out_dim=Y_train.shape[1],
    params={
        "loss": b"mse",
        "max_depth": 4,
        "max_bins": 128,
        "lr": 0.05,
        "early_stop": 15,
        "num_threads": 1,
        "verbosity": Verbosity.FULL,
    },
)
booster.set_data((X_train, Y_train), (X_valid, Y_valid))
booster.train(
    200,
    objective=mse_objective,
    eval_metric=rmse_metric,
    maximize=False,
)
This uses a Python callback to supply gradients and Hessians round by round.
The protected _set_gh(...) plus boost() workflow still exists as an advanced escape hatch:
g, h = mse_objective(booster.preds_train.copy(), booster.label.copy())
booster._set_gh(g, h)
booster.boost()
For SingleOutputGBDT, the custom-objective callback receives 1D arrays. For MultiOutputGBDT, it receives 2D arrays shaped (n_samples, out_dim).
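Before handing a custom objective to the trainer, it can help to check its gradient against a numerical derivative for both the 1D and the 2D case. The sketch below does this for `mse_objective`, assuming the matching element-wise loss is 0.5 * (preds - target) ** 2. The helpers `half_squared_error` and `max_gradient_error` are hypothetical names introduced for this check and are not part of OmniGBDT.

```python
import numpy as np

def mse_objective(preds, target):
    # Gradient and Hessian of 0.5 * (preds - target) ** 2, element-wise.
    return preds - target, np.ones_like(preds)

def half_squared_error(preds, target):
    # Element-wise loss assumed to correspond to mse_objective.
    return 0.5 * (preds - target) ** 2

def max_gradient_error(objective, loss, preds, target, eps=1e-6):
    """Largest gap between the callback gradient and a central difference."""
    grad, _ = objective(preds, target)
    # The loss is element-wise, so shifting every entry by eps at once
    # still yields the per-element derivative.
    numeric = (loss(preds + eps, target) - loss(preds - eps, target)) / (2 * eps)
    return float(np.max(np.abs(grad - numeric)))

rng = np.random.default_rng(0)
for shape in [(8,), (8, 3)]:  # 1D for single-output, 2D for multi-output
    preds = rng.normal(size=shape)
    target = rng.normal(size=shape)
    grad, hess = mse_objective(preds, target)
    assert grad.shape == hess.shape == shape
    print(shape, max_gradient_error(mse_objective, half_squared_error, preds, target))
```

The same check applies to any other element-wise loss: swap in the loss function and the callback, and a large reported gap points to a sign or scaling error in the gradient.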
The sklearn-compatible wrappers forward the same callback arguments:
from omnigbdt import MultiOutputGBDTRegressor
model = MultiOutputGBDTRegressor(
    num_rounds=200,
    objective=mse_objective,
    eval_metric=rmse_metric,
    maximize=False,
    max_depth=4,
    max_bins=128,
    lr=0.05,
    early_stop=15,
    num_threads=1,
)
model.fit(X_train, Y_train)
Permutation importance with sklearn
Install the optional sklearn extra and the UCI dataset helper first:
pip install "omnigbdt[sklearn]"
pip install ucimlrepo
Then use the sklearn-compatible wrapper with permutation_importance:
import numpy as np
from sklearn.inspection import permutation_importance
from ucimlrepo import fetch_ucirepo
from omnigbdt import MultiOutputGBDTRegressor
stock_portfolio = fetch_ucirepo(id=390)
frame = stock_portfolio.data.original
feature_columns = [
    "Large B/P",
    "Large ROE",
    "Large S/P",
    "Large Return Rate in the last quarter",
    "Large Market Value",
    "Small systematic Risk",
]
target_columns = [
    "Annual Return.1",
    "Excess Return.1",
    "Systematic Risk.1",
    "Total Risk.1",
    "Abs. Win Rate.1",
    "Rel. Win Rate.1",
]
X = frame.loc[:, feature_columns].to_numpy(dtype=np.float64)
Y = frame.loc[:, target_columns].to_numpy(dtype=np.float64)
rng = np.random.default_rng(0)
indices = rng.permutation(len(X))
train_end = int(len(X) * 0.8)
train_idx = indices[:train_end]
test_idx = indices[train_end:]
X_train, Y_train = X[train_idx], Y[train_idx]
X_test, Y_test = X[test_idx], Y[test_idx]
model = MultiOutputGBDTRegressor(
    num_rounds=200,
    max_depth=4,
    max_bins=128,
    lr=0.05,
    early_stop=15,
    num_threads=1,
)
model.fit(X_train, Y_train)
result = permutation_importance(
    model,
    X_test,
    Y_test,
    scoring="r2",
    n_repeats=5,
    random_state=42,
    n_jobs=1,
)
print(result.importances_mean)
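To make the printed importances easier to interpret, the sketch below reimplements the idea behind permutation importance in plain NumPy: shuffle one feature column at a time and record the average drop in R². It is a conceptual illustration for a single target, not sklearn's implementation. The least-squares model is a stand-in predictor rather than OmniGBDT, the data is synthetic, and `manual_permutation_importance` is a hypothetical helper name.

```python
import numpy as np

def r2_score(y_true, y_pred):
    # Coefficient of determination for a single target.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def manual_permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in R^2 when each feature column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = r2_score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between this feature and the target.
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            drops.append(baseline - r2_score(y, predict(X_perm)))
        importances[col] = np.mean(drops)
    return importances

# Synthetic data: only the first feature drives the target.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] + 0.1 * rng.normal(size=200)

# Least-squares linear model as a stand-in predictor (not OmniGBDT).
coef, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
importances = manual_permutation_importance(lambda X: X @ coef, X_demo, y_demo)

# Feature indices ranked from most to least important.
print(np.argsort(importances)[::-1])
```

A feature whose shuffling barely changes the score gets an importance near zero. In the sklearn example above, the same ranking can be obtained with `np.argsort(result.importances_mean)[::-1]` and mapped back to `feature_columns`.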