cybooster Documentation

CyBooster - A high-performance gradient boosting implementation using Cython

This package provides:

- BoosterRegressor: for regression tasks
- BoosterClassifier: for classification tasks

class cybooster.BoosterClassifier(obj, int n_estimators=100, double learning_rate=0.1, int n_hidden_features=5, double reg_lambda=0.1, double alpha=0.5, double row_sample=1, double col_sample=1, double dropout=0, double tolerance=1e-6, int direct_link=1, int verbose=1, int seed=123, str backend='cpu', str activation='relu', str weights_distr='uniform')

Bases: object

Booster classifier.

n_estimators: int, number of boosting iterations.

learning_rate: float, controls the learning speed at training time.

n_hidden_features: int, number of nodes in successive hidden layers.

reg_lambda: float, L2 regularization parameter for successive errors in the optimizer (at training time).

alpha: float, compromise between L1 and L2 regularization (must be in [0, 1]), for solver == ‘enet’.

row_sample: float, fraction of rows sampled from the training set (1 uses all rows).

col_sample: float, fraction of columns sampled from the training set (1 uses all columns).

dropout: float, fraction of hidden-layer nodes dropped at training time (0 drops none).

tolerance: float, controls early stopping in gradient descent (at training time).

direct_link: bool, indicates whether the original features are included (True) in the model’s fitting or not (False).

verbose: int, currently either 1 (display a progress bar) or 0 (no progress bar).

seed: int, reproducibility seed for nodes_sim == ‘uniform’, clustering and dropout.

backend: str, type of backend; must be in (‘cpu’, ‘gpu’, ‘tpu’).

solver: str, type of ‘weak’ learner; currently in (‘ridge’, ‘lasso’, ‘enet’). ‘enet’ (Elastic Net) is a combination of ‘ridge’ and ‘lasso’.

activation: str, activation function; currently ‘relu’, ‘relu6’, ‘sigmoid’, ‘tanh’.

n_clusters: int, number of clusters for clustering the features.

clustering_method: str, clustering method; currently ‘kmeans’, ‘gmm’.

cluster_scaling: str, scaling method for clustering; currently ‘standard’, ‘robust’, ‘minmax’.

degree: int, degree of feature interactions to include in the model.

weights_distr: str, distribution of weights for constructing the model’s hidden layer; currently ‘uniform’, ‘gaussian’.

hist: bool, indicates whether histogram features are used or not (default is False).

bins: int or str, number of bins for histogram features (same as numpy.histogram, default is ‘auto’).
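To make the roles of reg_lambda and alpha concrete for solver == ‘enet’, here is a minimal sketch of the usual Elastic Net penalty. It assumes the common convention in which alpha weights the L1 term and (1 - alpha) weights the L2 term; the function name enet_penalty is hypothetical, and the exact scaling used inside cybooster may differ.

```python
def enet_penalty(coefs, reg_lambda=0.1, alpha=0.5):
    """Elastic Net penalty under an assumed (glmnet-style) convention.

    alpha = 1 gives a pure L1 (lasso) penalty, alpha = 0 a pure L2 (ridge) one.
    Illustration only; this is not cybooster's internal code.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    l1 = sum(abs(c) for c in coefs)   # sum of absolute coefficients
    l2 = sum(c * c for c in coefs)    # sum of squared coefficients
    return reg_lambda * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


coefs = [1.0, -2.0, 0.0]
print(enet_penalty(coefs, reg_lambda=0.1, alpha=1.0))  # L1 term only
print(enet_penalty(coefs, reg_lambda=0.1, alpha=0.0))  # L2 term only
```

Intermediate values of alpha blend the two terms, which is why the attribute is only meaningful when the weak learner is ‘enet’.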

fit(self, double[:, ::1] X, long[:] y, obj=None)

predict(self, double[:, ::1] X)

predict_proba(self, double[:, ::1] X)

update(self, double[:] X, y, double alpha=0.5)
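The method signatures above use Cython typed memoryviews: double[:, ::1] denotes a 2D, C-contiguous buffer of float64 values, and long[:] a 1D buffer of C long integers. The sketch below shows one way to coerce NumPy inputs into those layouts before calling fit or predict. The helper name as_booster_inputs is hypothetical and is not part of cybooster.

```python
import numpy as np


def as_booster_inputs(X, y):
    """Coerce arrays to the buffer layouts named in the signatures.

    Hypothetical helper, not part of cybooster.
    """
    # double[:, ::1] -> 2D, float64, C-contiguous (row-major)
    X_c = np.ascontiguousarray(X, dtype=np.float64)
    if X_c.ndim != 2:
        raise ValueError("X must be 2-dimensional")
    # long[:] -> 1D buffer of C long; the 'l' type code is C long on every platform
    y_c = np.asarray(y).astype(np.dtype("l"), copy=False).ravel()
    return X_c, y_c


# A Fortran-ordered float32 matrix would not match double[:, ::1] as-is.
X_raw = np.asfortranarray(np.arange(6, dtype=np.float32).reshape(3, 2))
X, y = as_booster_inputs(X_raw, [0, 1, 0])
print(X.flags["C_CONTIGUOUS"], X.dtype, y.dtype == np.dtype("l"))
```

For the regressor, whose fit expects double[:] y, the target would instead be cast to float64.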

class cybooster.BoosterRegressor(obj, int n_estimators=100, double learning_rate=0.1, int n_hidden_features=5, double reg_lambda=0.1, double alpha=0.5, double row_sample=1, double col_sample=1, double dropout=0, double tolerance=1e-6, int direct_link=1, int verbose=1, int seed=123, str backend='cpu', str activation='relu', str weights_distr='uniform')

Bases: object

Booster regressor.

n_estimators: int, number of boosting iterations.

learning_rate: float, controls the learning speed at training time.

n_hidden_features: int, number of nodes in successive hidden layers.

reg_lambda: float, L2 regularization parameter for successive errors in the optimizer (at training time).

alpha: float, compromise between L1 and L2 regularization (must be in [0, 1]), for solver == ‘enet’.

row_sample: float, fraction of rows sampled from the training set (1 uses all rows).

col_sample: float, fraction of columns sampled from the training set (1 uses all columns).

dropout: float, fraction of hidden-layer nodes dropped at training time (0 drops none).

tolerance: float, controls early stopping in gradient descent (at training time).

direct_link: bool, indicates whether the original features are included (True) in the model’s fitting or not (False).

verbose: int, currently either 1 (display a progress bar) or 0 (no progress bar).

seed: int, reproducibility seed for nodes_sim == ‘uniform’, clustering and dropout.

backend: str, type of backend; must be in (‘cpu’, ‘gpu’, ‘tpu’).

solver: str, type of ‘weak’ learner; currently in (‘ridge’, ‘lasso’).

activation: str, activation function; currently ‘relu’, ‘relu6’, ‘sigmoid’, ‘tanh’.

type_pi: str, type of prediction interval; currently “kde” (default) or “bootstrap”. Used only in self.predict, for self.replications > 0 and self.kernel in (‘gaussian’, ‘tophat’). Default is None.

replications: int, number of replications (if needed) for predictive simulation. Used only in self.predict, for self.kernel in (‘gaussian’, ‘tophat’) and self.type_pi = ‘kde’. Default is None.

n_clusters: int, number of clusters for clustering the features.

clustering_method: str, clustering method; currently ‘kmeans’, ‘gmm’.

cluster_scaling: str, scaling method for clustering; currently ‘standard’, ‘robust’, ‘minmax’.

degree: int, degree of feature interactions to include in the model.

weights_distr: str, distribution of weights for constructing the model’s hidden layer; either ‘uniform’ or ‘gaussian’.

hist: bool, whether to use histogram features or not.

bins: int or str, number of bins for histogram features (same as numpy.histogram, default is ‘auto’).
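The attributes n_hidden_features, activation, weights_distr, direct_link and seed can be read together as describing a randomized hidden layer. The following is a minimal sketch of that reading, in pure Python: it draws a random weight matrix W, applies the activation to XW, and optionally keeps the original columns. The helper hidden_features, the weight ranges, and the overall construction are assumptions made for illustration; the page does not spell out how cybooster builds its hidden layer.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


ACTIVATIONS = {
    "relu": lambda z: max(z, 0.0),
    "relu6": lambda z: min(max(z, 0.0), 6.0),
    "sigmoid": _sigmoid,
    "tanh": math.tanh,
}


def hidden_features(X, n_hidden_features=5, activation="relu",
                    weights_distr="uniform", direct_link=True, seed=123):
    """Build g(XW) with random W, optionally keeping the original columns.

    Hypothetical helper illustrating the attributes; not cybooster's code.
    """
    rng = random.Random(seed)  # fixed seed makes the weights reproducible
    n_features = len(X[0])
    if weights_distr == "uniform":
        draw = lambda: rng.uniform(-1.0, 1.0)   # assumed range
    else:
        draw = lambda: rng.gauss(0.0, 1.0)      # assumed standard normal
    W = [[draw() for _ in range(n_hidden_features)] for _ in range(n_features)]
    g = ACTIVATIONS[activation]
    out = []
    for row in X:
        hidden = [g(sum(x * W[j][k] for j, x in enumerate(row)))
                  for k in range(n_hidden_features)]
        # direct_link=True keeps the original features next to the hidden ones
        out.append(list(row) + hidden if direct_link else hidden)
    return out


X = [[0.5, -1.0], [2.0, 0.3]]
H = hidden_features(X, n_hidden_features=3)
print(len(H), len(H[0]))  # 2 rows, 2 original + 3 hidden columns
```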

fit(self, double[:, ::1] X, double[:] y)

get_feature_importances(self, double[:, ::1] X, columns=None, show_progress=True): Compute the average absolute sensitivity of each feature across the dataset. This serves as a feature importance measure.

get_sensitivities(self, double[:, ::1] X, columns=None, show_progress=True)

Compute the gradient (sensitivity) of the response with respect to each input feature.

Parameters:

X (np.ndarray) – Input data of shape (n_samples, n_features).
columns (list, optional) – List of feature names. If None, features are named automatically.
show_progress (bool, optional) – Whether to display a progress bar. Default is True.

Returns:

Array of sensitivities for each sample and feature (∂F_M/∂x_j)

Return type:

np.ndarray
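The relationship between sensitivities and feature importances can be sketched without the library. The example below estimates ∂F/∂x_j by central finite differences on a toy function and then averages the absolute values per feature. The finite-difference approach, the function names, and the toy model are stand-ins chosen for illustration; cybooster may compute these gradients differently.

```python
def sensitivities(predict, X, eps=1e-6):
    """Central finite-difference estimate of d predict / d x_j per sample."""
    out = []
    for row in X:
        grads = []
        for j in range(len(row)):
            up = list(row)
            up[j] += eps
            down = list(row)
            down[j] -= eps
            grads.append((predict(up) - predict(down)) / (2.0 * eps))
        out.append(grads)
    return out


def feature_importances(S):
    """Average absolute sensitivity of each feature (column means of |S|)."""
    n = len(S)
    return [sum(abs(row[j]) for row in S) / n for j in range(len(S[0]))]


# Toy stand-in for a fitted model: F(x) = 3*x0 - 0.5*x1**2
def toy_model(x):
    return 3.0 * x[0] - 0.5 * x[1] ** 2


X = [[1.0, 2.0], [0.0, -4.0]]
S = sensitivities(toy_model, X)      # shape (n_samples, n_features)
print(feature_importances(S))        # one value per feature
```

Taking absolute values before averaging prevents positive and negative effects from cancelling, as they would for the second feature here.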

get_summary(self, double[:, ::1] X, conf_level=0.95, columns=None, show_progress=True)

Given a DataFrame of sensitivities (n_obs x n_features), this function returns a summary similar to skim in R with confidence intervals around average effects.

Parameters:

n_bootstrap – Number of bootstrap iterations for confidence intervals.
conf_level – Confidence level for the intervals (default: 0.95, i.e. 95%).

Returns:

summary_df – A pandas DataFrame with feature-level summary statistics.
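As an illustration of a confidence interval around an average effect, here is a minimal percentile-bootstrap sketch for one feature's sensitivities. How get_summary actually builds its intervals is not stated on this page, so the percentile method, the helper name bootstrap_mean_ci and the sample values are all assumptions.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_bootstrap=1000, conf_level=0.95, seed=123):
    """Percentile bootstrap interval for the mean of one feature's sensitivities."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_bootstrap)
    )
    tail = (1.0 - conf_level) / 2.0
    lo = means[int(tail * (n_bootstrap - 1))]
    hi = means[int(round((1.0 - tail) * (n_bootstrap - 1)))]
    return statistics.fmean(values), lo, hi


effects = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2]   # made-up sensitivities
mean, lo, hi = bootstrap_mean_ci(effects)
print(round(mean, 3), round(lo, 3), round(hi, 3))
```

A full summary would repeat this per column and collect the results in a table, one row per feature.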

predict(self, double[:, ::1] X)

update(self, double[:] X, y, double alpha=0.5)

class cybooster.NGBClassifier(obj=None, int n_estimators=500, double learning_rate=0.01, double tol=1e-4, bool early_stopping=True, int n_iter_no_change=10, int verbose=1, feature_engineering=0)

Bases: object

Optimized NGBoost Classifier with Multinomial distribution (softmax)

accuracy_score(self, ndarray X, ndarray y): Compute classification accuracy.

fit(self, ndarray X, ndarray y): Fit the NGBoost classifier with a multinomial distribution.

get_params(self, deep=True): Get parameters for sklearn compatibility.

predict(self, ndarray X): Predict class labels.

predict_logit(self, ndarray X): Predict raw logit values.

predict_proba(self, ndarray X): Predict class probabilities.

sample(self, ndarray X, int n_samples=1): Sample from the predicted categorical distributions.

score(self, ndarray X, ndarray y): Compute the average log-likelihood score.

set_params(self, **params): Set parameters for sklearn compatibility.
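Since the classifier is described as using a multinomial distribution with softmax, the link between predict_logit, predict_proba and predict can be sketched as follows. This is the standard softmax transform written in pure Python, with made-up logits; it is not taken from cybooster's source.

```python
import math


def softmax(logits):
    """Map one row of raw logits to class probabilities (numerically stable)."""
    m = max(logits)                       # subtract the max to avoid overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.5, -1.0]   # stand-in for one row of predict_logit output
proba = softmax(logits)     # analogous to one row of predict_proba
label = max(range(len(proba)), key=proba.__getitem__)  # analogous to predict
print([round(p, 3) for p in proba], label)
```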

class cybooster.NGBRegressor(obj=None, int n_estimators=500, double learning_rate=0.01, double tol=1e-4, bool early_stopping=True, int n_iter_no_change=10, int verbose=1, feature_engineering=0)

Bases: object

Optimized NGBoost implementation

fit(self, ndarray X, ndarray y): Fit the NGBoost model with improved numerical stability.

get_params(self, deep=True): Get parameters for sklearn compatibility.

predict(self, ndarray X, bool return_std=False): Predict distribution parameters or point estimates.

predict_dist(self, ndarray X): Predict full distribution parameters (mu, sigma).

sample(self, ndarray X, int n_samples=1): Sample from the predicted distributions.

score(self, ndarray X, ndarray y): Compute the log-likelihood score.

set_params(self, **params): Set parameters for sklearn compatibility.

class cybooster.SkBoosterClassifier

Bases: BoosterClassifier, BaseEstimator, ClassifierMixin

A scikit-learn compatible wrapper for BoosterClassifier.

fit(self, double[:, ::1] X, long[:] y, obj=None)

predict(self, double[:, ::1] X)

predict_proba(self, double[:, ::1] X)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SkBoosterClassifier

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object

class cybooster.SkBoosterRegressor

Bases: BoosterRegressor, BaseEstimator, RegressorMixin

A scikit-learn compatible wrapper for BoosterRegressor.

fit(self, double[:, ::1] X, double[:] y)

predict(self, double[:, ::1] X)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SkBoosterRegressor

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object

class cybooster.SkNGBClassifier(obj=None, n_estimators=500, learning_rate=0.01, tol=0.0001, early_stopping=True, n_iter_no_change=10, feature_engineering=False, use_jax=True, verbose=False)

Bases: BaseEstimator, ClassifierMixin

fit(X, y)

predict(X)

predict_proba(X)

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SkNGBClassifier

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object

class cybooster.SkNGBRegressor(obj=None, n_estimators=500, learning_rate=0.01, tol=0.0001, early_stopping=True, n_iter_no_change=10, feature_engineering=False, use_jax=True, verbose=False)

Bases: BaseEstimator, RegressorMixin

Scikit-learn compatible NGBoost regressor (Python wrapper).

This is a thin wrapper around the high-performance Cython implementation exposed as cybooster._ngboost.NGBRegressor. It mirrors a scikit-learn estimator interface while providing an optional JAX acceleration hook for internal linear algebra utilities.

Parameters:

obj (Any, optional) – Placeholder for future objectives (kept for backward compatibility). The current Cython backend expects this positional slot.
n_estimators (int, default=500) – Number of boosting iterations.
learning_rate (float, default=0.01) – Shrinkage applied to each boosting step.
tol (float, default=1e-4) – Tolerance used for early stopping monitoring in the backend.
early_stopping (bool, default=True) – Whether to enable early stopping based on log-likelihood improvement.
n_iter_no_change (int, default=10) – Number of successive iterations with change < tol to trigger stop.
feature_engineering (bool, default=False) – If True, enables feature engineering through nnetsauce.
use_jax (bool, default=True) – If True and JAX is available, enables small JIT-compiled helpers.
verbose (bool, default=False) – If True, prints fitting diagnostics.
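The interaction between tol and n_iter_no_change can be sketched as a simple counter over a monitored loss. The example assumes that "change" means the decrease in the loss from one iteration to the next and that the counter resets whenever an iteration improves by at least tol; the backend's exact rule is not given here, and stopping_iteration is a hypothetical name.

```python
def stopping_iteration(losses, tol=1e-4, n_iter_no_change=10):
    """Return the index at which training would stop, or None if it never does.

    Stops once the improvement in the monitored loss has been below `tol`
    for `n_iter_no_change` consecutive iterations.
    """
    stalled = 0
    for i in range(1, len(losses)):
        improvement = losses[i - 1] - losses[i]
        stalled = stalled + 1 if improvement < tol else 0
        if stalled >= n_iter_no_change:
            return i
    return None


# Made-up loss curve: drops quickly, then plateaus.
losses = [1.0, 0.5, 0.3, 0.2999, 0.29995, 0.29994, 0.29994]
print(stopping_iteration(losses, tol=1e-3, n_iter_no_change=3))  # 5
```

With early_stopping=False, the loop would simply run for all n_estimators iterations.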

Notes

The underlying predictive distribution is Normal with parameters (mu, log_sigma). The backend learns both via natural gradients.
predict(X, return_std=False) returns point estimates. When return_std=True, it returns a 2D array with columns (mu, sigma).
Use predict_dist(X) to directly obtain distribution parameters as (mu, sigma).
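For reference, the Normal negative log-likelihood under the (mu, log_sigma) parameterization mentioned in the notes can be written in a few lines. This is the textbook formula, shown as a sketch; it is not copied from the Cython backend, and normal_nll is a hypothetical name.

```python
import math


def normal_nll(y, mu, log_sigma):
    """Negative log-likelihood of y under Normal(mu, exp(log_sigma)**2)."""
    sigma = math.exp(log_sigma)   # exp keeps sigma strictly positive
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi) + log_sigma + 0.5 * z * z


print(normal_nll(y=1.0, mu=1.0, log_sigma=0.0))   # equals 0.5 * log(2 * pi)
print(normal_nll(y=3.0, mu=1.0, log_sigma=0.0))   # larger: y is 2 sigma away
```

Working with log_sigma rather than sigma lets the optimizer take unconstrained steps while the implied standard deviation stays positive.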

fit(X, y)

Fit the NGBoost model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training features.
y (array-like of shape (n_samples,)) – Target values.

Returns:

self – Fitted estimator.

Return type:

SkNGBRegressor

predict(X, return_std=False)

Predict values or distribution parameters.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input features.
return_std (bool, default=False) –
- If False, returns point predictions (mu).
- If True, returns an array of shape (n_samples, 2) with columns (mu, sigma).

Returns:

Either (n_samples,) array of means or (n_samples, 2) with (mu, sigma) per sample.

Return type:

ndarray
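Given the (mu, sigma) columns returned with return_std=True, a central prediction interval follows from Normal quantiles. The sketch below uses only the standard library and made-up predictions; normal_interval is a hypothetical helper, not a cybooster function.

```python
from statistics import NormalDist


def normal_interval(mu_sigma_rows, level=0.95):
    """Turn rows of (mu, sigma) into (lower, upper) central intervals."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for level=0.95
    return [(mu - z * sigma, mu + z * sigma) for mu, sigma in mu_sigma_rows]


# Stand-in for the (n_samples, 2) array returned with return_std=True.
preds = [(10.0, 1.0), (3.0, 0.5)]
for lower, upper in normal_interval(preds):
    print(round(lower, 2), round(upper, 2))
```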

predict_dist(X)

Predict Normal distributions for each input sample.

Parameters: X (array-like of shape (n_samples, n_features)) – Input features.
Returns: A list of Normal distributions parameterized by predicted mu and sigma for each input sample.
Return type: list[scipy.stats.rv_continuous]

set_predict_request(*, return_std: bool | None | str = '$UNCHANGED$') → SkNGBRegressor

Configure whether metadata should be requested to be passed to the predict method.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: return_std (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_std parameter in predict.
Returns: self – The updated object.
Return type: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SkNGBRegressor

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object