Python API
==========

This page summarizes the main Python entry points in OmniGBDT. For installation and runnable examples, see :doc:`install` and :doc:`example`.

Common data requirements
------------------------

- feature arrays should be ``float64`` and two-dimensional with shape ``(n_samples, n_features)``
- multi-output labels should be ``float64`` or ``int32`` and two-dimensional with shape ``(n_samples, out_dim)``
- single-output labels should be contiguous ``float64`` or ``int32`` one-dimensional arrays
- when slicing one column out of a 2D label matrix, wrap the slice in ``np.ascontiguousarray(...)`` before passing it to ``SingleOutputGBDT``, because a column slice of a row-major array is a non-contiguous view
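
A minimal NumPy sketch of preparing arrays that meet these requirements. The array names and sizes are illustrative only, and nothing here calls OmniGBDT itself.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)

   # Features: 2D float64 with shape (n_samples, n_features).
   X = np.asarray(rng.normal(size=(100, 5)), dtype=np.float64)

   # Multi-output labels: 2D float64 with shape (n_samples, out_dim).
   Y = np.asarray(rng.normal(size=(100, 3)), dtype=np.float64)

   # Slicing one column of a row-major 2D array gives a strided view
   # that is not contiguous in memory.
   y_view = Y[:, 1]

   # Copy it into a contiguous 1D array before single-output use.
   y_col = np.ascontiguousarray(y_view)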

Core models
-----------

MultiOutputGBDT
^^^^^^^^^^^^^^^

.. class:: MultiOutputGBDT(lib=None, out_dim=1, params=None)

   Multi-output boosted tree model.

   :param lib: optional handle returned by ``load_lib()``
   :param int out_dim: number of output columns
   :param dict params: training parameters; missing values fall back to defaults

   ``MultiOutputGBDT`` is the main entry point for learning multiple outputs jointly.

   When ``params["base_score"]`` is left as ``None`` and ``params["loss"]`` is ``b"mse"``, the initial prediction for each output column is inferred from the mean of that column's training labels.

   When ``params["deterministic"]`` is ``True``, repeated CPU runs on the same platform are intended to be repeatable for a fixed ``num_threads`` setting.
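
   As a hedged illustration of the notes above, the sketch below builds a ``params`` dictionary and computes the per-column label mean that an inferred base score corresponds to. It uses only settings named on this page; treating ``num_threads`` as a ``params`` key is an assumption, the values are placeholders rather than recommendations, and the mean is computed with NumPy purely for illustration.

   .. code-block:: python

      import numpy as np

      params = {
          "loss": b"mse",         # note the bytes literal
          "base_score": None,     # leave unset so it is inferred from labels
          "deterministic": True,
          "num_threads": 4,       # placeholder; assumed to be a params key
      }

      # Two samples, two output columns.
      Y_train = np.array([[1.0, 10.0], [3.0, 30.0]], dtype=np.float64)

      # Per-column training-label mean, one value per output.
      expected_base = Y_train.mean(axis=0)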

   .. method:: set_data(train_set=None, eval_set=None)

      Register training and optional evaluation data.

      ``train_set`` and ``eval_set`` are tuples of ``(X, y)``.

      - ``X`` must be a 2D ``float64`` array
      - ``y`` may be ``None`` or a 2D ``float64`` / ``int32`` array with one column per output

   .. method:: train(num, objective=None, eval_metric=None, maximize=None)

      Train the model for ``num`` boosting rounds.

      - when ``objective`` is omitted, OmniGBDT uses the built-in native loss from ``params["loss"]``
      - when ``objective`` is provided, it must return ``(grad, hess)`` from the current prediction matrix and label matrix
      - for ``MultiOutputGBDT``, those callback arrays are 2D with shape ``(n_samples, out_dim)``
      - ``eval_metric`` may be used to report a scalar metric for the train and eval splits during custom-objective training
      - if ``early_stop > 0`` and evaluation labels are registered on the custom-objective path, then ``eval_metric`` and ``maximize`` must also be provided
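
      A minimal sketch of a custom objective and metric for the multi-output case. The argument order ``(preds, labels)`` and the metric signature are assumptions that this page does not confirm, the function names are hypothetical, and squared error is used only as an example loss.

      .. code-block:: python

         import numpy as np

         def mse_objective(preds, labels):
             """Gradient and hessian of 0.5 * (preds - labels) ** 2."""
             grad = preds - labels        # shape (n_samples, out_dim)
             hess = np.ones_like(preds)   # constant second derivative
             return grad, hess

         def rmse_metric(preds, labels):
             """Scalar root-mean-squared error over all outputs."""
             return float(np.sqrt(np.mean((preds - labels) ** 2)))

         # Shape check on toy data: 4 samples, 2 outputs.
         preds = np.zeros((4, 2))
         labels = np.ones((4, 2))
         grad, hess = mse_objective(preds, labels)

      Under these assumptions, a call might look like ``model.train(100, objective=mse_objective, eval_metric=rmse_metric, maximize=False)``, where ``model`` is a hypothetical ``MultiOutputGBDT`` instance and ``maximize=False`` states that a lower RMSE is better.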

   .. method:: predict(x, num_trees=0)

      Predict on a 2D ``float64`` feature matrix.

      - when ``num_trees == 0``, all learned trees are used
      - returns a 2D array with shape ``(n_samples, out_dim)``

   .. method:: dump(path)

      Write the learned model to a text file.

      ``path`` accepts ``str``, ``bytes``, and ``pathlib.Path``.

   .. method:: load(path)

      Load a text-dumped model from disk.

      ``path`` accepts ``str``, ``bytes``, and ``pathlib.Path``.
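
      The three accepted path types can all name the same file. The sketch below uses only the standard library to show that equivalence; it makes no claim about how OmniGBDT converts paths internally, and the file name is a placeholder.

      .. code-block:: python

         import os
         from pathlib import Path

         as_str = "model.txt"
         as_path = Path("model.txt")
         as_bytes = b"model.txt"

         # os.fsencode accepts str, bytes, and path-like objects and
         # returns the file-system bytes form of each.
         encoded = {os.fsencode(p) for p in (as_str, as_path, as_bytes)}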

   .. method:: _set_gh(g, h)

      Set gradient and hessian arrays for the next call to ``boost()``. This is an advanced escape hatch for manual custom-loss workflows.

   .. method:: _set_label(x, is_train)

      Replace labels for the training or evaluation dataset without rebuilding the feature binning.

   .. method:: boost()

      Grow a single tree from the gradients and hessians most recently supplied through ``_set_gh(...)``.

   .. method:: close()

      Release the underlying native model explicitly. This is optional, but useful in longer-running scripts.

SingleOutputGBDT
^^^^^^^^^^^^^^^^

.. class:: SingleOutputGBDT(lib=None, out_dim=1, params=None)

   Single-output boosted tree model.

   :param lib: optional handle returned by ``load_lib()``
   :param int out_dim: output dimension used by prediction helpers; for the common single-target case, leave this at ``1``
   :param dict params: training parameters; missing values fall back to defaults

   ``SingleOutputGBDT`` can be used to train one model per target column as a simple baseline.

   When ``params["base_score"]`` is left as ``None`` and ``params["loss"]`` is ``b"mse"``, the initial prediction is inferred from the mean of the training labels.

   When ``params["deterministic"]`` is ``True``, repeated CPU runs on the same platform are intended to be repeatable for a fixed ``num_threads`` setting.

   .. method:: set_data(train_set=None, eval_set=None)

      Register training and optional evaluation data.

      ``train_set`` and ``eval_set`` are tuples of ``(X, y)`` where:

      - ``X`` is a 2D ``float64`` array
      - ``y`` is typically a contiguous 1D ``float64`` or ``int32`` array

   .. method:: train(num, objective=None, eval_metric=None, maximize=None)

      Train a single-output model for ``num`` boosting rounds.

      - when ``objective`` is omitted, OmniGBDT uses the built-in native loss from ``params["loss"]``
      - when ``objective`` is provided, it must return ``(grad, hess)`` from the current prediction vector and label vector
      - for ``SingleOutputGBDT``, callback arrays are 1D with shape ``(n_samples,)``
      - the custom-objective path is only supported for the normal ``out_dim == 1`` workflow
      - if ``early_stop > 0`` and evaluation labels are registered on the custom-objective path, then ``eval_metric`` and ``maximize`` must also be provided
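
      A minimal sketch of a 1D custom objective with a metric where higher is better. The argument order ``(preds, labels)``, the metric signature, and the reading of ``preds`` as raw scores paired with 0/1 labels are assumptions that this page does not confirm; the function names are hypothetical.

      .. code-block:: python

         import numpy as np

         def logistic_objective(preds, labels):
             """Gradient and hessian of the binary log loss on raw scores."""
             prob = 1.0 / (1.0 + np.exp(-preds))
             grad = prob - labels          # shape (n_samples,)
             hess = prob * (1.0 - prob)    # shape (n_samples,)
             return grad, hess

         def accuracy_metric(preds, labels):
             """Fraction of samples whose score sign matches the label."""
             return float(np.mean((preds > 0.0) == (labels > 0.5)))

         preds = np.array([-2.0, 0.0, 3.0])
         labels = np.array([0, 1, 1], dtype=np.int32)
         grad, hess = logistic_objective(preds, labels)

      Under these assumptions, a call might look like ``model.train(100, objective=logistic_objective, eval_metric=accuracy_metric, maximize=True)``, where ``model`` is a hypothetical ``SingleOutputGBDT`` instance and ``maximize=True`` states that higher accuracy is better.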

   .. method:: predict(x, num_trees=0)

      Predict on a 2D ``float64`` feature matrix.

      - with ``out_dim == 1``, the return value is a 1D array
      - with ``out_dim > 1``, the return value is shaped as ``(n_samples, out_dim)``

   .. method:: train_multi(num)

      Legacy helper retained from the original code for multi-class classification style workflows.

   .. method:: reset()

      Clear learned trees and reset predictions back to the resolved base score.

   .. method:: close()

      Release the underlying native model explicitly.

Optional sklearn wrappers
-------------------------

The sklearn-compatible wrappers are optional and require the ``sklearn`` extra:

.. code-block:: bash

   pip install "omnigbdt[sklearn]"

These wrappers are a fork-specific addition intended to make OmniGBDT work with sklearn tooling such as ``sklearn.inspection.permutation_importance``.

SingleOutputGBDTRegressor
^^^^^^^^^^^^^^^^^^^^^^^^^

.. class:: SingleOutputGBDTRegressor(...)

   sklearn-compatible single-target regressor wrapper around ``SingleOutputGBDT``.

   It exposes ``fit(...)``, ``predict(...)``, and ``score(...)`` for use with tools such as ``sklearn.inspection.permutation_importance``.

   Its constructor also accepts ``objective``, ``eval_metric``, and ``maximize`` (each defaulting to ``None``) as well as ``deterministic`` (defaulting to ``True``), and forwards them to ``SingleOutputGBDT``.

MultiOutputGBDTRegressor
^^^^^^^^^^^^^^^^^^^^^^^^

.. class:: MultiOutputGBDTRegressor(...)

   sklearn-compatible multi-output regressor wrapper around ``MultiOutputGBDT``.

   It exposes ``fit(...)``, ``predict(...)``, and ``score(...)`` for sklearn-style multi-output workflows.

   Its constructor also accepts ``objective``, ``eval_metric``, and ``maximize`` (each defaulting to ``None``) as well as ``deterministic`` (defaulting to ``True``), and forwards them to ``MultiOutputGBDT``.

Utilities
---------

load_lib
^^^^^^^^

.. function:: load_lib(path=None)

   Load the compiled native library and return a configured ``ctypes`` handle.

   ``path`` may be:

   - omitted, in which case the packaged native library is loaded automatically
   - a direct path to the compiled library file
   - a directory that contains the compiled library

   Most users do not need to call this directly.

Verbosity
^^^^^^^^^

.. class:: Verbosity

   Small enum-like helper for training output levels.

   - ``Verbosity.SILENT``: no native training output
   - ``Verbosity.SUMMARY``: only the final best score when evaluation data is present
   - ``Verbosity.FULL``: per-round metrics plus the final best score

create_graph
^^^^^^^^^^^^

.. function:: create_graph(file_name, tree_index=0, value_list=None)

   Build a ``graphviz.Digraph`` from a dumped text model.

   This helper is optional and requires the plotting dependency:

   .. code-block:: bash

      pip install "omnigbdt[plot]"

   :param file_name: path to a text model dump
   :param int tree_index: zero-based tree index
   :param value_list: optional list of output indices to display in leaf nodes