cybooster Documentation

CyBooster - A high-performance gradient boosting implementation using Cython

This package provides:

  • BoosterRegressor: For regression tasks

  • BoosterClassifier: For classification tasks

class cybooster.BoosterClassifier(obj, int n_estimators=100, double learning_rate=0.1, int n_hidden_features=5, double reg_lambda=0.1, double alpha=0.5, double row_sample=1, double col_sample=1, double dropout=0, double tolerance=1e-6, int direct_link=1, int verbose=1, int seed=123, str backend='cpu', str activation='relu', str weights_distr='uniform')

Bases: object

Booster classifier.

n_estimators

int number of boosting iterations.

learning_rate

float controls the learning speed at training time.

n_hidden_features

int number of nodes in successive hidden layers.

reg_lambda

float L2 regularization parameter for successive errors in the optimizer (at training time).

alpha

float compromise between L1 and L2 regularization (must be in [0, 1]), for solver == ‘enet’.

row_sample

float percentage of rows chosen from the training set.

col_sample

float percentage of columns chosen from the training set.

dropout

float percentage of hidden-layer nodes dropped at training time.

tolerance

float controls early stopping in gradient descent (at training time).

direct_link

bool indicates whether the original features are included (True) in the model’s fitting or not (False).

verbose

int whether to display a progress bar (1) or not (0).

seed

int reproducibility seed for nodes_sim==’uniform’, clustering and dropout.

backend

str type of backend; must be in (‘cpu’, ‘gpu’, ‘tpu’)

solver

str type of ‘weak’ learner; currently in (‘ridge’, ‘lasso’, ‘enet’). ‘enet’ is a combination of ‘ridge’ and ‘lasso’ called Elastic Net.

activation

str activation function: currently ‘relu’, ‘relu6’, ‘sigmoid’, ‘tanh’

n_clusters

int number of clusters for clustering the features

clustering_method

str clustering method: currently ‘kmeans’, ‘gmm’

cluster_scaling

str scaling method for clustering: currently ‘standard’, ‘robust’, ‘minmax’

degree

int degree of features interactions to include in the model

weights_distr

str distribution of weights for constructing the model’s hidden layer; currently ‘uniform’, ‘gaussian’

hist

bool indicates whether histogram features are used or not (default is False)

bins

int or str number of bins for histogram features (same as numpy.histogram, default is ‘auto’)
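
To make the roles of reg_lambda and alpha concrete, here is a minimal sketch of an elastic-net penalty. It assumes the common convention reg_lambda * (alpha * L1 + (1 - alpha) / 2 * L2); that convention and the helper name enet_penalty are assumptions made for illustration, and cybooster's own solver may scale the two terms differently.

```python
def enet_penalty(weights, reg_lambda=0.1, alpha=0.5):
    """Elastic-net penalty under an assumed (scikit-learn-style) convention."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    l1 = sum(abs(w) for w in weights)   # lasso part
    l2 = sum(w * w for w in weights)    # ridge part
    return reg_lambda * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


weights = [1.0, -2.0, 0.5]
ridge_only = enet_penalty(weights, reg_lambda=0.1, alpha=0.0)  # pure L2
lasso_only = enet_penalty(weights, reg_lambda=0.1, alpha=1.0)  # pure L1
mixed = enet_penalty(weights, reg_lambda=0.1, alpha=0.5)       # blend of both
```

With alpha=0 only the ridge term remains and with alpha=1 only the lasso term remains, which is why alpha is described as a compromise between the two.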

fit(self, double[:, ::1] X, long[:] y, obj=None)
predict(self, double[:, ::1] X)
predict_proba(self, double[:, ::1] X)
update(self, double[:] X, y, double alpha=0.5)

class cybooster.BoosterRegressor(obj, int n_estimators=100, double learning_rate=0.1, int n_hidden_features=5, double reg_lambda=0.1, double alpha=0.5, double row_sample=1, double col_sample=1, double dropout=0, double tolerance=1e-6, int direct_link=1, int verbose=1, int seed=123, str backend='cpu', str activation='relu', str weights_distr='uniform')

Bases: object

Booster regressor.

n_estimators

int number of boosting iterations.

learning_rate

float controls the learning speed at training time.

n_hidden_features

int number of nodes in successive hidden layers.

reg_lambda

float L2 regularization parameter for successive errors in the optimizer (at training time).

alpha

float compromise between L1 and L2 regularization (must be in [0, 1]), for solver == ‘enet’

row_sample

float percentage of rows chosen from the training set.

col_sample

float percentage of columns chosen from the training set.

dropout

float percentage of hidden-layer nodes dropped at training time.

tolerance

float controls early stopping in gradient descent (at training time).

direct_link

bool indicates whether the original features are included (True) in the model’s fitting or not (False).

verbose

int whether to display a progress bar (1) or not (0).

seed

int reproducibility seed for nodes_sim==’uniform’, clustering and dropout.

backend

str type of backend; must be in (‘cpu’, ‘gpu’, ‘tpu’)

solver

str type of ‘weak’ learner; currently in (‘ridge’, ‘lasso’)

activation

str activation function: currently ‘relu’, ‘relu6’, ‘sigmoid’, ‘tanh’

type_pi

str type of prediction interval; currently “kde” (default) or “bootstrap”. Used only in self.predict, for self.replications > 0 and self.kernel in (‘gaussian’, ‘tophat’). Default is None.

replications

int number of replications (if needed) for predictive simulation. Used only in self.predict, for self.kernel in (‘gaussian’, ‘tophat’) and self.type_pi = ‘kde’. Default is None.

n_clusters

int number of clusters for clustering the features

clustering_method

str clustering method: currently ‘kmeans’, ‘gmm’

cluster_scaling

str scaling method for clustering: currently ‘standard’, ‘robust’, ‘minmax’

degree

int degree of features interactions to include in the model

weights_distr

str distribution of weights for constructing the model’s hidden layer; either ‘uniform’ or ‘gaussian’

hist

bool whether to use histogram features or not

bins

int or str number of bins for histogram features (same as numpy.histogram, default is ‘auto’)
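
The parameters n_hidden_features, weights_distr, activation, direct_link and seed together describe a randomized hidden layer. The sketch below shows one plausible reading of them in pure Python. The function name hidden_layer, the weight ranges and the use of ReLU are assumptions for illustration only, not the package's actual implementation.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def hidden_layer(X, n_hidden_features=5, weights_distr="uniform",
                 direct_link=True, seed=123):
    """Build randomized hidden features for each row of X (a list of lists)."""
    rng = random.Random(seed)  # the seed makes the random weights reproducible
    samplers = {
        "uniform": lambda: rng.uniform(-1.0, 1.0),
        "gaussian": lambda: rng.gauss(0.0, 1.0),
    }
    if weights_distr not in samplers:
        raise ValueError("weights_distr must be 'uniform' or 'gaussian'")
    draw = samplers[weights_distr]
    n_features = len(X[0])
    # One weight vector per hidden node, drawn once and shared across rows.
    W = [[draw() for _ in range(n_features)] for _ in range(n_hidden_features)]
    out = []
    for row in X:
        hidden = [relu(sum(w * x for w, x in zip(w_node, row))) for w_node in W]
        # direct_link=True keeps the original features next to the hidden ones.
        out.append(list(row) + hidden if direct_link else hidden)
    return out


X = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
with_link = hidden_layer(X, n_hidden_features=4, direct_link=True)
without_link = hidden_layer(X, n_hidden_features=4, direct_link=False)
```

With direct_link enabled each row has 3 + 4 columns (original plus hidden features); without it only the 4 hidden features remain.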

fit(self, double[:, ::1] X, double[:] y)
get_feature_importances(self, double[:, ::1] X, columns=None, show_progress=True)

Compute average absolute sensitivity for each feature across the dataset. This serves as a feature importance measure.
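
As a minimal sketch of this measure, the snippet below averages absolute sensitivities column by column. The helper name mean_abs_sensitivity and the toy values are hypothetical; in practice the sensitivities would come from get_sensitivities.

```python
def mean_abs_sensitivity(sensitivities):
    """Column-wise mean of absolute values for an (n_samples x n_features) table."""
    n_samples = len(sensitivities)
    n_features = len(sensitivities[0])
    return [
        sum(abs(row[j]) for row in sensitivities) / n_samples
        for j in range(n_features)
    ]


# Toy sensitivities: 3 samples, 2 features (illustrative values only).
sens = [[0.2, -1.0], [-0.4, 3.0], [0.0, -2.0]]
importances = mean_abs_sensitivity(sens)
```

Taking absolute values before averaging prevents positive and negative effects from cancelling out, so the second feature ranks as more important here.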

get_sensitivities(self, double[:, ::1] X, columns=None, show_progress=True)

Compute the gradient (sensitivity) of the response with respect to each input feature.

Parameters:
  • X (np.ndarray) – Input data of shape (n_samples, n_features).

  • columns (list, optional) – List of feature names. If None, names are generated automatically.

  • show_progress (bool, optional) – Whether to display a progress bar. Default is True.

Returns:

Array of sensitivities for each sample and feature (∂F_M/∂x_j)

Return type:

np.ndarray
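
One way to picture these sensitivities is a central finite-difference approximation of ∂F_M/∂x_j. The sketch below is an assumption for illustration: the documentation does not say how the package computes the gradient, and both finite_difference_sensitivities and toy_model are hypothetical names.

```python
def finite_difference_sensitivities(predict, X, eps=1e-5):
    """Approximate d predict / d x_j for every row of X and every feature j."""
    result = []
    for row in X:
        grads = []
        for j in range(len(row)):
            up = list(row)
            up[j] += eps
            down = list(row)
            down[j] -= eps
            # Central difference: (F(x + eps) - F(x - eps)) / (2 * eps)
            grads.append((predict(up) - predict(down)) / (2.0 * eps))
        result.append(grads)
    return result


def toy_model(x):
    # Hypothetical fitted model F_M(x) = 3 * x0 + x1 ** 2, for illustration only.
    return 3.0 * x[0] + x[1] ** 2


X = [[1.0, 2.0], [0.0, -1.0]]
sens = finite_difference_sensitivities(toy_model, X)
```

For this toy model the exact gradients are (3, 2 * x1), so the result has one row per sample and one column per feature, matching the shape described above.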

get_summary(self, double[:, ::1] X, conf_level=0.95, columns=None, show_progress=True)

Given a DataFrame of sensitivities (n_obs x n_features), this function returns a summary similar to skim in R with confidence intervals around average effects.

Parameters:
  • n_bootstrap – Number of bootstrap iterations for confidence intervals.

  • conf_level – Confidence level for the intervals (default: 95%).

Returns:

summary_df – A pandas DataFrame with feature-level summary statistics.
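
The role of n_bootstrap and conf_level can be sketched with a percentile bootstrap interval for a single feature's average effect. This is an illustration under stated assumptions: the percentile method, the helper name bootstrap_mean_ci and the toy effects are not taken from the package.

```python
import random


def bootstrap_mean_ci(values, n_bootstrap=1000, conf_level=0.95, seed=123):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_bootstrap):
        # Resample with replacement and record the resampled mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    tail = (1.0 - conf_level) / 2.0
    lo_idx = int(tail * (n_bootstrap - 1))
    hi_idx = int((1.0 - tail) * (n_bootstrap - 1))
    return means[lo_idx], means[hi_idx]


# Toy per-observation sensitivities for one feature (illustrative values only).
effects = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2]
lower, upper = bootstrap_mean_ci(effects)
```

Raising n_bootstrap makes the interval endpoints more stable, while raising conf_level widens the interval.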

predict(self, double[:, ::1] X)
update(self, double[:] X, y, double alpha=0.5)

class cybooster.NGBClassifier(obj=None, int n_estimators=500, double learning_rate=0.01, double tol=1e-4, bool early_stopping=True, int n_iter_no_change=10, int verbose=1, feature_engineering=0)

Bases: object

Optimized NGBoost Classifier with Multinomial distribution (softmax)

accuracy_score(self, ndarray X, ndarray y)

Compute classification accuracy

fit(self, ndarray X, ndarray y)

Fit NGBoost classifier with multinomial distribution

get_params(self, deep=True)

Get parameters for sklearn compatibility

predict(self, ndarray X)

Predict class labels

predict_logit(self, ndarray X)

Predict raw logit values

predict_proba(self, ndarray X)

Predict class probabilities
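
Since the classifier is described as using a multinomial distribution with softmax, the link between predict_logit, predict_proba and predict can be sketched as follows. The helper softmax and the sample logits are illustrative, not the package's internal code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, -1.0]   # raw scores for three classes
proba = softmax(logits)     # class probabilities that sum to 1
label = max(range(len(proba)), key=proba.__getitem__)  # most probable class
```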

sample(self, ndarray X, int n_samples=1)

Sample from the predicted categorical distributions

score(self, ndarray X, ndarray y)

Compute average log-likelihood score
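
A minimal sketch of an average log-likelihood for class probabilities is given below. The helper name average_log_likelihood and the clipping constant are assumptions, and the package may handle edge cases differently.

```python
import math


def average_log_likelihood(proba, y):
    """Mean log-probability assigned to the true class of each sample."""
    eps = 1e-12  # guard against log(0)
    return sum(math.log(max(p[k], eps)) for p, k in zip(proba, y)) / len(y)


proba = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]  # predicted class probabilities
y = [0, 1]                                   # true class indices
score = average_log_likelihood(proba, y)
```

The score is at most 0, and values closer to 0 mean that the model puts more probability on the observed classes.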

set_params(self, **params)

Set parameters for sklearn compatibility

class cybooster.NGBRegressor(obj=None, int n_estimators=500, double learning_rate=0.01, double tol=1e-4, bool early_stopping=True, int n_iter_no_change=10, int verbose=1, feature_engineering=0)

Bases: object

Optimized NGBoost implementation

fit(self, ndarray X, ndarray y)

Fit NGBoost model with improved numerical stability

get_params(self, deep=True)

Get parameters for sklearn compatibility

predict(self, ndarray X, bool return_std=False)

Predict distribution parameters or point estimates

predict_dist(self, ndarray X)

Predict full distribution parameters (mu, sigma)

sample(self, ndarray X, int n_samples=1)

Sample from the predicted distributions

score(self, ndarray X, ndarray y)

Compute log-likelihood score
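
For a Normal predictive distribution, a log-likelihood score can be sketched as below. Whether the package averages or sums over samples is not stated here, so the averaging and the helper name normal_log_likelihood are assumptions.

```python
import math


def normal_log_likelihood(y, mu, sigma):
    """Mean Normal log-density of observations y under per-sample (mu, sigma)."""
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        variance = si * si
        total += -0.5 * math.log(2.0 * math.pi * variance)
        total -= (yi - mi) ** 2 / (2.0 * variance)
    return total / len(y)


y = [1.0, 2.0, 3.0]
good_fit = normal_log_likelihood(y, mu=[1.1, 1.9, 3.0], sigma=[0.5, 0.5, 0.5])
poor_fit = normal_log_likelihood(y, mu=[3.0, 0.0, 1.0], sigma=[0.5, 0.5, 0.5])
```

Predictions closer to the observed targets give a higher score, so good_fit exceeds poor_fit in this toy case.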

set_params(self, **params)

Set parameters for sklearn compatibility

class cybooster.SkBoosterClassifier

Bases: BoosterClassifier, BaseEstimator, ClassifierMixin

A scikit-learn compatible wrapper for BoosterClassifier.

fit(self, double[:, ::1] X, long[:] y, obj=None)
predict(self, double[:, ::1] X)
predict_proba(self, double[:, ::1] X)
set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SkBoosterClassifier

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class cybooster.SkBoosterRegressor

Bases: BoosterRegressor, BaseEstimator, RegressorMixin

A scikit-learn compatible wrapper for BoosterRegressor.

fit(self, double[:, ::1] X, double[:] y)
predict(self, double[:, ::1] X)
set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SkBoosterRegressor

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class cybooster.SkNGBClassifier(obj=None, n_estimators=500, learning_rate=0.01, tol=0.0001, early_stopping=True, n_iter_no_change=10, feature_engineering=False, use_jax=True, verbose=False)

Bases: BaseEstimator, ClassifierMixin

fit(X, y)
predict(X)
predict_proba(X)
set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SkNGBClassifier

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class cybooster.SkNGBRegressor(obj=None, n_estimators=500, learning_rate=0.01, tol=0.0001, early_stopping=True, n_iter_no_change=10, feature_engineering=False, use_jax=True, verbose=False)

Bases: BaseEstimator, RegressorMixin

Scikit-learn compatible NGBoost regressor (Python wrapper).

This is a thin wrapper around the high-performance Cython implementation exposed as cybooster._ngboost.NGBRegressor. It mirrors a scikit-learn estimator interface while providing an optional JAX acceleration hook for internal linear algebra utilities.

Parameters:
  • obj (Any, optional) – Placeholder for future objectives (kept for backward compatibility). The current Cython backend expects this positional slot.

  • n_estimators (int, default=500) – Number of boosting iterations.

  • learning_rate (float, default=0.01) – Shrinkage applied to each boosting step.

  • tol (float, default=1e-4) – Tolerance used for early stopping monitoring in the backend.

  • early_stopping (bool, default=True) – Whether to enable early stopping based on log-likelihood improvement.

  • n_iter_no_change (int, default=10) – Number of successive iterations with change < tol to trigger stop.

  • feature_engineering (bool, default=False) – If True, enables feature engineering through nnetsauce.

  • use_jax (bool, default=True) – If True and JAX is available, enables small JIT-compiled helpers.

  • verbose (bool, default=False) – If True, prints fitting diagnostics.
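
The interaction between tol and n_iter_no_change can be sketched as a simple stopping rule over a sequence of log-likelihood scores. This is a hedged illustration of the rule as described in the parameter list; the helper name stopping_iteration and the toy scores are hypothetical, and the Cython backend may track improvement differently.

```python
def stopping_iteration(scores, tol=1e-4, n_iter_no_change=10):
    """Return the iteration index at which training would stop, or None."""
    stalled = 0
    for i in range(1, len(scores)):
        improvement = scores[i] - scores[i - 1]
        # Count successive iterations whose improvement is below tol.
        stalled = stalled + 1 if improvement < tol else 0
        if stalled >= n_iter_no_change:
            return i
    return None


# Toy log-likelihood trace that plateaus after three iterations.
trace = [-2.0, -1.5, -1.2, -1.19999, -1.19999, -1.19999]
stop_at = stopping_iteration(trace, tol=1e-4, n_iter_no_change=3)
never = stopping_iteration([-3.0, -2.0, -1.0], tol=1e-4, n_iter_no_change=3)
```

A larger n_iter_no_change tolerates longer plateaus before stopping, and a smaller tol requires the improvement to be nearly flat before an iteration counts as stalled.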

Notes

  • The underlying predictive distribution is Normal with parameters (mu, log_sigma). The backend learns both via natural gradients.

  • predict(X, return_std=False) returns point estimates. When return_std=True, it returns a 2D array with columns (mu, sigma).

  • Use predict_dist(X) to directly obtain distribution parameters as (mu, sigma).
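
Because the backend is described as learning log_sigma rather than sigma, the standard deviation reported to the user is always positive after exponentiation. The sketch below shows that conversion and the (mu, sigma) column layout; the helper name to_mu_sigma and the input values are hypothetical.

```python
import math


def to_mu_sigma(mu, log_sigma):
    """Pair each mean with sigma = exp(log_sigma), one (mu, sigma) row per sample."""
    return [[m, math.exp(ls)] for m, ls in zip(mu, log_sigma)]


# Toy distribution parameters for two samples (illustrative values only).
rows = to_mu_sigma(mu=[1.0, -0.5], log_sigma=[0.0, -2.0])
```

Each row mirrors what return_std=True is documented to produce: the first column holds mu and the second holds sigma.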

fit(X, y)

Fit the NGBoost model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target values.

Returns:

self – Fitted estimator.

Return type:

SkNGBRegressor

predict(X, return_std=False)

Predict values or distribution parameters.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input features.

  • return_std (bool, default=False) –

    • If False, returns point predictions (mu).

    • If True, returns an array of shape (n_samples, 2) with columns (mu, sigma).

Returns:

Either (n_samples,) array of means or (n_samples, 2) with (mu, sigma) per sample.

Return type:

ndarray

predict_dist(X)

Predict Normal distributions for each input sample.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input features.

Returns:

A list of Normal distributions parameterized by predicted mu and sigma for each input sample.

Return type:

list[scipy.stats.rv_continuous]
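
Once mu and sigma are known for each sample, prediction intervals follow from the Normal quantile function. The sketch below uses statistics.NormalDist from the Python standard library as a stand-in for the SciPy distribution objects mentioned above. The helper name prediction_intervals and the input values are assumptions for illustration.

```python
from statistics import NormalDist


def prediction_intervals(mu, sigma, level=0.95):
    """Central prediction interval for each (mu, sigma) pair."""
    dists = [NormalDist(m, s) for m, s in zip(mu, sigma)]
    tail = (1.0 - level) / 2.0
    # inv_cdf is the quantile function of the Normal distribution.
    return [(d.inv_cdf(tail), d.inv_cdf(1.0 - tail)) for d in dists]


# Toy predicted parameters for two samples (illustrative values only).
intervals = prediction_intervals(mu=[10.0, 0.0], sigma=[2.0, 1.0])
```

For a standard Normal the 95% interval is roughly ±1.96, and the interval width scales linearly with sigma.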

set_predict_request(*, return_std: bool | None | str = '$UNCHANGED$') → SkNGBRegressor

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

return_std (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_std parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SkNGBRegressor

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object