Python API

This page summarizes the main Python entry points in OmniGBDT. For installation and runnable examples, see Installation and Examples.

Common data requirements

feature arrays should be float64 and two-dimensional with shape (n_samples, n_features)
multi-output labels should be float64 or int32 and two-dimensional with shape (n_samples, out_dim)
single-output labels should be contiguous float64 or int32 one-dimensional arrays
when slicing one column out of a 2D label matrix, wrap the slice in np.ascontiguousarray(...) before passing it to SingleOutputGBDT, because a column slice such as Y[:, 0] is generally a non-contiguous view
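
As a minimal sketch of these requirements using NumPy only (the array names and sizes are illustrative, not part of OmniGBDT), the following prepares a feature matrix, a multi-output label matrix, and a contiguous single-output label vector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature matrix: float64, shape (n_samples, n_features).
X = np.asarray(rng.normal(size=(100, 5)), dtype=np.float64)

# Multi-output labels: float64 (or int32), shape (n_samples, out_dim).
Y = np.asarray(rng.normal(size=(100, 3)), dtype=np.float64)

# A column slice of a C-ordered 2D array is a strided view, not contiguous.
y_view = Y[:, 0]

# Copy it into a contiguous 1D buffer for single-output training.
y_single = np.ascontiguousarray(y_view)

print(y_view.flags["C_CONTIGUOUS"], y_single.flags["C_CONTIGUOUS"])
```

The printed flags show that the raw slice is not contiguous while the copied array is.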

Core models

MultiOutputGBDT

class MultiOutputGBDT(lib=None, out_dim=1, params=None)

Multi-output boosted tree model.

Parameters:

lib – optional handle returned by load_lib()
out_dim (int) – number of output columns
params (dict) – training parameters; missing values fall back to defaults

MultiOutputGBDT is the main entry point for learning multiple outputs jointly.

When params["base_score"] is left as None with loss=b"mse", the initial prediction is inferred from the training-label mean for each output column.
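
To make the inferred base score concrete, the per-column mean described above can be reproduced with NumPy. This only illustrates the arithmetic on a toy label matrix; it is not taken from OmniGBDT's internals.

```python
import numpy as np

# Toy label matrix with shape (n_samples, out_dim) = (4, 2).
Y = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]], dtype=np.float64)

# One initial prediction per output column: the mean over samples.
base_score = Y.mean(axis=0)   # per-column means: 2.5 and 25.0
print(base_score)
```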

When params["deterministic"] is True, CPU runs on the same platform with the same num_threads setting are intended to give reproducible results.

set_data(train_set=None, eval_set=None)

Register training and optional evaluation data.

train_set and eval_set are each an (X, y) tuple, where:

X must be a 2D float64 array
y may be None or a 2D float64 / int32 array with one column per output

train(num, objective=None, eval_metric=None, maximize=None)

Train the model for num boosting rounds.

when objective is omitted, OmniGBDT uses the built-in native loss from params["loss"]
when objective is provided, it must return (grad, hess) from the current prediction matrix and label matrix
for MultiOutputGBDT, those callback arrays are 2D with shape (n_samples, out_dim)
eval_metric may be used to report a scalar metric for the train and eval splits during custom-objective training
if early_stop > 0 and evaluation labels are registered on the custom-objective path, then eval_metric and maximize must also be provided
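
The callback contract above can be sketched with a squared-error objective written in NumPy. The argument order (preds, labels), the metric's signature, and the function names are assumptions made for illustration; only the (grad, hess) return value and the (n_samples, out_dim) shapes come from the description above.

```python
import numpy as np

def mse_objective(preds, labels):
    """Gradient and hessian of 0.5 * (preds - labels)**2, elementwise."""
    grad = preds - labels          # shape (n_samples, out_dim)
    hess = np.ones_like(preds)     # constant second derivative
    return grad, hess

def rmse_metric(preds, labels):
    """Scalar metric; lower is better, so it pairs with maximize=False."""
    return float(np.sqrt(np.mean((preds - labels) ** 2)))

# Toy check on a (4, 2) prediction matrix and label matrix.
preds = np.zeros((4, 2), dtype=np.float64)
labels = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
grad, hess = mse_objective(preds, labels)
```

Under these assumptions, a model would receive them as train(num, objective=mse_objective, eval_metric=rmse_metric, maximize=False).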

predict(x, num_trees=0)

Predict on a 2D float64 feature matrix.

when num_trees == 0, all learned trees are used
returns a 2D array with shape (n_samples, out_dim)
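
Putting set_data, train, and predict together, a hedged end-to-end sketch might look like the following. The import path omnigbdt is an assumption based on the package name, the synthetic data and round count are illustrative, and the training step is skipped when the package cannot be imported.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                                   # float64, 2D
Y = np.stack([X[:, 0] + X[:, 1], X[:, 2] - X[:, 3]], axis=1)    # (200, 2)

# Row slices of C-ordered arrays stay contiguous.
X_train, X_eval = X[:150], X[150:]
Y_train, Y_eval = Y[:150], Y[150:]

try:
    # Assumed import path; adjust to match your installation.
    from omnigbdt import MultiOutputGBDT
except ImportError:
    MultiOutputGBDT = None

if MultiOutputGBDT is not None:
    model = MultiOutputGBDT(out_dim=Y.shape[1], params={"loss": b"mse"})
    model.set_data(train_set=(X_train, Y_train), eval_set=(X_eval, Y_eval))
    model.train(20)                  # 20 boosting rounds, built-in loss
    preds = model.predict(X_eval)    # expected shape: (50, 2)
    model.close()
```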

dump(path)

Write the learned model to a text file.

path accepts str, bytes, or pathlib.Path.

load(path)

Load a text-dumped model from disk.

path accepts str, bytes, or pathlib.Path.

_set_gh(g, h): Set gradient and hessian arrays for the next call to boost(). This is an advanced escape hatch for manual custom-loss workflows.

_set_label(x, is_train): Replace labels for the training or evaluation dataset without rebuilding the feature binning.

boost(): Grow a single tree after calling _set_gh(...).

close(): Release the underlying native model explicitly. This is optional, but useful in longer-running scripts.

SingleOutputGBDT

class SingleOutputGBDT(lib=None, out_dim=1, params=None)

Single-output boosted tree model.

Parameters:

lib – optional handle returned by load_lib()
out_dim (int) – output dimension used by prediction helpers; for the common single-target case, leave this at 1
params (dict) – training parameters; missing values fall back to defaults

SingleOutputGBDT can be used to train one model per target column as a simple baseline.
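
The one-model-per-column baseline can be sketched as a small helper. Here fit_predict is a hypothetical callable that stands in for a SingleOutputGBDT set_data/train/predict cycle, and mean_predictor is only a placeholder so the sketch runs without the library.

```python
import numpy as np

def per_target_baseline(X, Y, fit_predict):
    """Fit one independent model per label column and stack predictions.

    fit_predict is a hypothetical callable (X, y) -> 1D predictions.
    """
    columns = []
    for j in range(Y.shape[1]):
        y_j = np.ascontiguousarray(Y[:, j])   # contiguous 1D labels
        columns.append(fit_predict(X, y_j))
    return np.column_stack(columns)           # shape (n_samples, out_dim)

# Placeholder "model" that predicts the label mean for every row.
def mean_predictor(X, y):
    return np.full(X.shape[0], y.mean())

X = np.zeros((3, 2))
Y = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
baseline = per_target_baseline(X, Y, mean_predictor)
```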

When params["base_score"] is left as None with loss=b"mse", the initial prediction is inferred from the training-label mean.

When params["deterministic"] is True, CPU runs on the same platform with the same num_threads setting are intended to give reproducible results.

set_data(train_set=None, eval_set=None)

Register training and optional evaluation data.

train_set and eval_set are tuples of (X, y) where:

X is a 2D float64 array
y is typically a contiguous 1D float64 or int32 array

train(num, objective=None, eval_metric=None, maximize=None)

Train a single-output model for num boosting rounds.

when objective is omitted, OmniGBDT uses the built-in native loss from params["loss"]
when objective is provided, it must return (grad, hess) from the current prediction vector and label vector
for SingleOutputGBDT, callback arrays are 1D with shape (n_samples,)
the custom-objective path is only supported for the normal out_dim == 1 workflow
if early_stop > 0 and evaluation labels are registered on the custom-objective path, then eval_metric and maximize must also be provided
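
For the single-output case, a custom objective works on 1D arrays. The sketch below uses binary log loss with an accuracy metric to show a case where maximize would be True; the argument order (preds, labels), the metric signature, and the function names are assumptions for illustration.

```python
import numpy as np

def logistic_objective(preds, labels):
    """Gradient and hessian of binary log loss w.r.t. raw scores."""
    p = 1.0 / (1.0 + np.exp(-preds))   # sigmoid of raw scores
    grad = p - labels                  # shape (n_samples,)
    hess = p * (1.0 - p)               # shape (n_samples,)
    return grad, hess

def accuracy_metric(preds, labels):
    """Fraction of correct predictions at a raw-score threshold of 0."""
    return float(np.mean((preds > 0.0) == (labels > 0.5)))

# Toy check with int32 labels and zero initial scores.
preds = np.zeros(4, dtype=np.float64)
labels = np.array([0, 1, 1, 0], dtype=np.int32)
grad, hess = logistic_objective(preds, labels)
```

Because higher accuracy is better, this metric would be paired with maximize=True, whereas an error metric such as RMSE would use maximize=False.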

predict(x, num_trees=0)

Predict on a 2D float64 feature matrix.

with out_dim == 1, the return value is a 1D array
with out_dim > 1, the return value is shaped as (n_samples, out_dim)

train_multi(num): Legacy helper used by the original code for multi-classification style workflows.

reset(): Clear learned trees and reset predictions back to the resolved base score.

close(): Release the underlying native model explicitly.

Optional sklearn wrappers

The sklearn-compatible wrappers are optional and require the sklearn extra:

pip install "omnigbdt[sklearn]"

This is a fork-specific addition intended to make OmniGBDT work with sklearn tooling such as sklearn.inspection.permutation_importance.

SingleOutputGBDTRegressor

class SingleOutputGBDTRegressor(...)

sklearn-compatible single-target regressor wrapper around SingleOutputGBDT.

It exposes fit(...), predict(...), and score(...) for use with tools such as sklearn.inspection.permutation_importance.

Its constructor also accepts objective=None, eval_metric=None, maximize=None, and deterministic=True, and forwards them to SingleOutputGBDT.

MultiOutputGBDTRegressor

class MultiOutputGBDTRegressor(...)

sklearn-compatible multi-output regressor wrapper around MultiOutputGBDT.

It exposes fit(...), predict(...), and score(...) for sklearn-style multi-output workflows.

Its constructor also accepts objective=None, eval_metric=None, maximize=None, and deterministic=True, and forwards them to MultiOutputGBDT.

Utilities

load_lib

load_lib(path=None)

Load the compiled native library and return a configured ctypes handle.

path may be:

omitted, in which case the packaged native library is loaded automatically
a direct path to the compiled library file
a directory that contains the compiled library

Most users do not need to call this directly.

Verbosity

class Verbosity

Small enum-like helper for training output levels.

Verbosity.SILENT: no native training output
Verbosity.SUMMARY: only the final best score when evaluation data is present
Verbosity.FULL: per-round metrics plus the final best score

create_graph

create_graph(file_name, tree_index=0, value_list=None)

Build a graphviz.Digraph from a dumped text model.

This helper is optional and requires the plotting dependency:

pip install "omnigbdt[plot]"

Parameters:

file_name – path to a text model dump
tree_index (int) – zero-based tree index
value_list – optional list of output indices to display in leaf nodes