cybooster Documentation
CyBooster - A high-performance gradient boosting implementation using Cython
This package provides:
- BoosterRegressor: For regression tasks
- BoosterClassifier: For classification tasks
- class cybooster.BoosterClassifier(obj, int n_estimators=100, double learning_rate=0.1, int n_hidden_features=5, double reg_lambda=0.1, double alpha=0.5, double row_sample=1, double col_sample=1, double dropout=0, double tolerance=1e-6, int direct_link=1, int verbose=1, int seed=123, str backend='cpu', str activation='relu', str weights_distr='uniform')
Bases: object
Booster classifier.
- n_estimators
int number of boosting iterations.
- learning_rate
float controls the learning speed at training time.
- n_hidden_features
int number of nodes in successive hidden layers.
- reg_lambda
float L2 regularization parameter for successive errors in the optimizer (at training time).
- alpha
float compromise between L1 and L2 regularization (must be in [0, 1]), for solver == 'enet'.
- row_sample
float fraction of rows chosen from the training set.
- col_sample
float fraction of columns chosen from the training set.
- dropout
float fraction of nodes dropped at training time.
- tolerance
float controls early stopping in gradient descent (at training time).
- direct_link
bool indicates whether the original features are included (True) in the model's fitting or not (False).
- verbose
int 1 to show a progress bar, 0 to hide it (the only levels currently supported).
- seed
int reproducibility seed for nodes_sim == 'uniform', clustering and dropout.
- backend
str type of backend; must be in ('cpu', 'gpu', 'tpu').
- solver
str type of 'weak' learner; currently in ('ridge', 'lasso', 'enet'). 'enet' is Elastic Net, a combination of 'ridge' and 'lasso'.
- activation
str activation function: currently 'relu', 'relu6', 'sigmoid', 'tanh'.
- n_clusters
int number of clusters for clustering the features.
- clustering_method
str clustering method: currently 'kmeans', 'gmm'.
- cluster_scaling
str scaling method for clustering: currently 'standard', 'robust', 'minmax'.
- degree
int degree of feature interactions to include in the model.
- weights_distr
str distribution of weights for constructing the model's hidden layer; currently 'uniform', 'gaussian'.
- hist
bool indicates whether histogram features are used or not (default is False).
- bins
int or str number of bins for histogram features (same as numpy.histogram, default is 'auto').
- fit(self, double[:, ::1] X, long[:] y, obj=None)
- predict(self, double[:, ::1] X)
- predict_proba(self, double[:, ::1] X)
- update(self, double[:] X, y, double alpha=0.5)
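The alpha attribute above is described as a compromise between L1 and L2 regularization for solver == 'enet'. The sketch below shows one common Elastic Net convention in plain Python. The function name elastic_net_penalty and the exact weighting (alpha on the L1 term, 1 - alpha on the L2 term, both scaled by reg_lambda) are assumptions for illustration and are not taken from the cybooster source.

```python
def elastic_net_penalty(weights, reg_lambda=0.1, alpha=0.5):
    """Hypothetical Elastic Net penalty: a convex mix of L1 and L2 terms.

    Assumed convention (not confirmed by cybooster): alpha=1 is pure L1
    (lasso-like) and alpha=0 is pure L2 (ridge-like).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    l1 = sum(abs(w) for w in weights)   # sum of absolute values
    l2 = sum(w * w for w in weights)    # sum of squares
    return reg_lambda * (alpha * l1 + (1.0 - alpha) * l2)


w = [1.0, -2.0, 0.5]
ridge_like = elastic_net_penalty(w, reg_lambda=0.1, alpha=0.0)
lasso_like = elastic_net_penalty(w, reg_lambda=0.1, alpha=1.0)
mixed = elastic_net_penalty(w, reg_lambda=0.1, alpha=0.5)
```

Under this assumed convention, intermediate alpha values interpolate linearly between the two pure penalties.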
- class cybooster.BoosterRegressor(obj, int n_estimators=100, double learning_rate=0.1, int n_hidden_features=5, double reg_lambda=0.1, double alpha=0.5, double row_sample=1, double col_sample=1, double dropout=0, double tolerance=1e-6, int direct_link=1, int verbose=1, int seed=123, str backend='cpu', str activation='relu', str weights_distr='uniform')
Bases: object
Booster regressor.
- n_estimators
int number of boosting iterations.
- learning_rate
float controls the learning speed at training time.
- n_hidden_features
int number of nodes in successive hidden layers.
- reg_lambda
float L2 regularization parameter for successive errors in the optimizer (at training time).
- alpha
float compromise between L1 and L2 regularization (must be in [0, 1]), for solver == 'enet'.
- row_sample
float fraction of rows chosen from the training set.
- col_sample
float fraction of columns chosen from the training set.
- dropout
float fraction of nodes dropped at training time.
- tolerance
float controls early stopping in gradient descent (at training time).
- direct_link
bool indicates whether the original features are included (True) in the model's fitting or not (False).
- verbose
int 1 to show a progress bar, 0 to hide it (the only levels currently supported).
- seed
int reproducibility seed for nodes_sim == 'uniform', clustering and dropout.
- backend
str type of backend; must be in ('cpu', 'gpu', 'tpu').
- solver
str type of 'weak' learner; currently in ('ridge', 'lasso').
- activation
str activation function: currently 'relu', 'relu6', 'sigmoid', 'tanh'.
- type_pi
str type of prediction interval; currently "kde" (default) or "bootstrap". Used only in self.predict, for self.replications > 0 and self.kernel in ('gaussian', 'tophat'). Default is None.
- replications
int number of replications (if needed) for predictive simulation. Used only in self.predict, for self.kernel in ('gaussian', 'tophat') and self.type_pi = 'kde'. Default is None.
- n_clusters
int number of clusters for clustering the features.
- clustering_method
str clustering method: currently 'kmeans', 'gmm'.
- cluster_scaling
str scaling method for clustering: currently 'standard', 'robust', 'minmax'.
- degree
int degree of feature interactions to include in the model.
- weights_distr
str distribution of weights for constructing the model's hidden layer; either 'uniform' or 'gaussian'.
- hist
bool whether to use histogram features or not.
- bins
int or str number of bins for histogram features (same as numpy.histogram, default is 'auto').
- fit(self, double[:, ::1] X, double[:] y)
- get_feature_importances(self, double[:, ::1] X, columns=None, show_progress=True)
Compute the average absolute sensitivity of each feature across the dataset. This serves as a feature importance measure.
- get_sensitivities(self, double[:, ::1] X, columns=None, show_progress=True)
Compute the gradient (sensitivity) of the response with respect to each input feature.
- Parameters:
X (np.ndarray) – Input data of shape (n_samples, n_features).
columns (list, optional) – List of feature names. If None, names are generated automatically.
show_progress (bool, optional) – Whether to display a progress bar. Default is True.
- Returns:
Array of sensitivities for each sample and feature (∂F_M/∂x_j).
- Return type:
np.ndarray
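To make the sensitivity and feature-importance definitions concrete, here is a minimal sketch that approximates ∂F/∂x_j with central finite differences and then averages the absolute values per feature. The helper names and the toy predict function are hypothetical, and cybooster may well compute these gradients differently (for instance analytically).

```python
def finite_difference_sensitivities(predict, X, eps=1e-5):
    """Approximate d predict / d x_j for every sample and feature.

    `predict` is a hypothetical stand-in that maps one feature row to a float.
    """
    sensitivities = []
    for row in X:
        grads = []
        for j in range(len(row)):
            up = list(row)
            down = list(row)
            up[j] += eps
            down[j] -= eps
            # Central difference around the original value of feature j
            grads.append((predict(up) - predict(down)) / (2.0 * eps))
        sensitivities.append(grads)
    return sensitivities


def mean_absolute_sensitivity(sensitivities):
    """Average |sensitivity| per feature across samples (importance proxy)."""
    n = len(sensitivities)
    n_features = len(sensitivities[0])
    return [sum(abs(s[j]) for s in sensitivities) / n for j in range(n_features)]


# Toy model F(x) = 3*x0 - 2*x1, whose exact gradient is (3, -2) everywhere.
toy_predict = lambda x: 3.0 * x[0] - 2.0 * x[1]
X = [[0.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
S = finite_difference_sensitivities(toy_predict, X)
importances = mean_absolute_sensitivity(S)
```

Taking absolute values before averaging keeps positive and negative effects from cancelling out, which is why the second feature still registers as important in the toy model.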
- get_summary(self, double[:, ::1] X, conf_level=0.95, columns=None, show_progress=True)
Given a DataFrame of sensitivities (n_obs x n_features), this function returns a summary similar to skim in R, with confidence intervals around average effects.
- Parameters:
n_bootstrap – Number of bootstrap iterations for confidence intervals.
conf_level – Confidence level for the intervals (default: 95%).
- Returns:
summary_df – A pandas DataFrame with feature-level summary statistics.
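The summary relies on bootstrap confidence intervals around average effects. As a rough illustration of that idea, the sketch below computes a percentile bootstrap interval for the mean of one feature's sensitivities. The function name, the percentile method, and the sample values are assumptions; the source does not say which bootstrap variant get_summary uses.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_bootstrap=1000, conf_level=0.95, seed=123):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_bootstrap)
    )
    tail = (1.0 - conf_level) / 2.0
    lower = means[int(tail * (n_bootstrap - 1))]
    upper = means[int((1.0 - tail) * (n_bootstrap - 1))]
    return statistics.fmean(values), lower, upper


# Hypothetical sensitivities of one feature across eight observations
effects = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 1.05]
mean_effect, ci_low, ci_high = bootstrap_mean_ci(effects)
```

Fixing the seed makes the interval reproducible from run to run.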
- predict(self, double[:, ::1] X)
- update(self, double[:] X, y, double alpha=0.5)
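The attributes n_hidden_features, activation, weights_distr and direct_link suggest a randomized hidden layer whose outputs can be combined with the original features. The following sketch illustrates that reading in plain Python. The function name, the weight ranges, and the choice of ReLU are assumptions made for illustration, not a description of cybooster's internals.

```python
import random


def hidden_features(X, n_hidden_features=5, direct_link=True,
                    weights_distr="uniform", seed=123):
    """Hypothetical randomized hidden layer: relu(X @ W), optionally with X kept."""
    rng = random.Random(seed)
    n_features = len(X[0])
    if weights_distr == "uniform":
        draw = lambda: rng.uniform(-1.0, 1.0)   # assumed range
    else:
        draw = lambda: rng.gauss(0.0, 1.0)      # assumed standard normal
    W = [[draw() for _ in range(n_hidden_features)] for _ in range(n_features)]
    out = []
    for row in X:
        # ReLU applied to each hidden node's weighted sum
        h = [max(0.0, sum(row[i] * W[i][k] for i in range(n_features)))
             for k in range(n_hidden_features)]
        # With a direct link, the original features are kept alongside h
        out.append(list(row) + h if direct_link else h)
    return out


X = [[1.0, 2.0], [0.5, -1.0]]
Z = hidden_features(X, n_hidden_features=3, direct_link=True)
Z_no_link = hidden_features(X, n_hidden_features=3, direct_link=False)
```

With direct_link enabled each row has n_features + n_hidden_features columns; without it, only the hidden-node outputs remain.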
- class cybooster.NGBClassifier(obj=None, int n_estimators=500, double learning_rate=0.01, double tol=1e-4, bool early_stopping=True, int n_iter_no_change=10, int verbose=1, feature_engineering=0)
Bases: object
Optimized NGBoost classifier with a multinomial distribution (softmax).
- accuracy_score(self, ndarray X, ndarray y)
Compute classification accuracy.
- fit(self, ndarray X, ndarray y)
Fit the NGBoost classifier with a multinomial distribution.
- get_params(self, deep=True)
Get parameters for sklearn compatibility.
- predict(self, ndarray X)
Predict class labels.
- predict_logit(self, ndarray X)
Predict raw logit values.
- predict_proba(self, ndarray X)
Predict class probabilities.
- sample(self, ndarray X, int n_samples=1)
Sample from the predicted categorical distributions.
- score(self, ndarray X, ndarray y)
Compute the average log-likelihood score.
- set_params(self, **params)
Set parameters for sklearn compatibility.
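Because the classifier is described as multinomial with a softmax link, the relationship between raw logits, class probabilities, predicted labels and samples can be sketched as follows. The helper names are hypothetical, and the assumption that predict_proba is the softmax of predict_logit is an inference from the class description rather than something the source states.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over one row of raw logits."""
    m = max(logits)                       # subtract the max to avoid overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_labels(probs, n_samples=1, seed=123):
    """Draw class indices from one categorical distribution."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=n_samples)


logits = [2.0, 1.0, 0.1]                  # hypothetical logits for one sample
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax class
draws = sample_labels(probs, n_samples=5)
```

The predicted label is the most probable class, while sampling returns labels in proportion to their probabilities.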
- class cybooster.NGBRegressor(obj=None, int n_estimators=500, double learning_rate=0.01, double tol=1e-4, bool early_stopping=True, int n_iter_no_change=10, int verbose=1, feature_engineering=0)
Bases: object
Optimized NGBoost implementation.
- fit(self, ndarray X, ndarray y)
Fit the NGBoost model with improved numerical stability.
- get_params(self, deep=True)
Get parameters for sklearn compatibility.
- predict(self, ndarray X, bool return_std=False)
Predict distribution parameters or point estimates.
- predict_dist(self, ndarray X)
Predict full distribution parameters (mu, sigma).
- sample(self, ndarray X, int n_samples=1)
Sample from the predicted distributions.
- score(self, ndarray X, ndarray y)
Compute the log-likelihood score.
- set_params(self, **params)
Set parameters for sklearn compatibility.
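A log-likelihood score over predicted (mu, sigma) pairs can be sketched as below, assuming a Normal predictive distribution. The function name and the choice to average rather than sum over samples are assumptions; the source only says that score computes a log-likelihood.

```python
import math


def normal_log_likelihood(y, mu, sigma):
    """Average log-density of y under independent Normal(mu_i, sigma_i)."""
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        var = si * si
        total += -0.5 * math.log(2.0 * math.pi * var) - (yi - mi) ** 2 / (2.0 * var)
    return total / len(y)


y = [1.0, 2.0, 3.0]
sigma = [1.0, 1.0, 1.0]
# Means that match the targets exactly versus means that are off by one
perfect = normal_log_likelihood(y, [1.0, 2.0, 3.0], sigma)
shifted = normal_log_likelihood(y, [0.0, 1.0, 2.0], sigma)
```

Higher values indicate a better fit: the exact means score higher than the shifted ones.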
- class cybooster.SkBoosterClassifier
Bases: BoosterClassifier, BaseEstimator, ClassifierMixin
A scikit-learn compatible wrapper for BoosterClassifier.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SkBoosterClassifier
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class cybooster.SkBoosterRegressor
Bases: BoosterRegressor, BaseEstimator, RegressorMixin
A scikit-learn compatible wrapper for BoosterRegressor.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SkBoosterRegressor
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class cybooster.SkNGBClassifier(obj=None, n_estimators=500, learning_rate=0.01, tol=0.0001, early_stopping=True, n_iter_no_change=10, feature_engineering=False, use_jax=True, verbose=False)
Bases: BaseEstimator, ClassifierMixin
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SkNGBClassifier
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class cybooster.SkNGBRegressor(obj=None, n_estimators=500, learning_rate=0.01, tol=0.0001, early_stopping=True, n_iter_no_change=10, feature_engineering=False, use_jax=True, verbose=False)
Bases: BaseEstimator, RegressorMixin
Scikit-learn compatible NGBoost regressor (Python wrapper).
This is a thin wrapper around the high-performance Cython implementation exposed as cybooster._ngboost.NGBRegressor. It mirrors a scikit-learn estimator interface while providing an optional JAX acceleration hook for internal linear algebra utilities.
- Parameters:
obj (Any, optional) – Placeholder for future objectives (kept for backward compatibility). The current Cython backend expects this positional slot.
n_estimators (int, default=500) – Number of boosting iterations.
learning_rate (float, default=0.01) – Shrinkage applied to each boosting step.
tol (float, default=1e-4) – Tolerance used for early stopping monitoring in the backend.
early_stopping (bool, default=True) – Whether to enable early stopping based on log-likelihood improvement.
n_iter_no_change (int, default=10) – Number of successive iterations with change < tol to trigger stop.
feature_engineering (bool, default=False) – If True, enables feature engineering through nnetsauce.
use_jax (bool, default=True) – If True and JAX is available, enables small JIT-compiled helpers.
verbose (bool, default=False) – If True, prints fitting diagnostics.
Notes
The underlying predictive distribution is Normal with parameters (mu, log_sigma). The backend learns both via natural gradients.
predict(X, return_std=False) returns point estimates. When return_std=True, it returns a 2D array with columns (mu, sigma).
Use predict_dist(X) to directly obtain distribution parameters as (mu, sigma).
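The interaction between tol, early_stopping and n_iter_no_change can be illustrated with a small sketch. It assumes the rule "stop once the log-likelihood has changed by less than tol for n_iter_no_change successive iterations", which follows the parameter descriptions; the function name and the trace values are hypothetical, and the backend's exact bookkeeping may differ.

```python
def boosting_rounds_run(log_likelihoods, tol=1e-4, n_iter_no_change=10):
    """Return how many iterations run before the assumed stopping rule fires.

    `log_likelihoods` is a hypothetical per-iteration training trace.
    """
    stalled = 0
    for i in range(1, len(log_likelihoods)):
        change = abs(log_likelihoods[i] - log_likelihoods[i - 1])
        # Count successive small changes; any large change resets the counter
        stalled = stalled + 1 if change < tol else 0
        if stalled >= n_iter_no_change:
            return i + 1  # iterations completed, counting from 1
    return len(log_likelihoods)


# Four improving iterations followed by a flat plateau of twenty
trace = [-2.0, -1.5, -1.2, -1.1] + [-1.1] * 20
stopped_at = boosting_rounds_run(trace, tol=1e-4, n_iter_no_change=10)
never_stalls = boosting_rounds_run([-3.0, -2.0, -1.0])
```

In this trace the rule fires after 14 of the 24 iterations, once ten successive changes have stayed below tol.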
- fit(X, y)
Fit the NGBoost model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training features.
y (array-like of shape (n_samples,)) – Target values.
- Returns:
self – Fitted estimator.
- Return type:
- predict(X, return_std=False)
Predict values or distribution parameters.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input features.
return_std (bool, default=False) – If False, returns point predictions (mu). If True, returns an array of shape (n_samples, 2) with columns (mu, sigma).
- Returns:
Either an (n_samples,) array of means or an (n_samples, 2) array with (mu, sigma) per sample.
- Return type:
ndarray
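Since predict(X, return_std=True) is documented to return (mu, sigma) per sample under a Normal predictive distribution, those pairs can be turned into central prediction intervals. The sketch below uses only the standard library; the function name and the example rows are hypothetical stand-ins for real predictions.

```python
from statistics import NormalDist


def normal_intervals(mu_sigma, level=0.95):
    """Turn rows of (mu, sigma) into (lower, upper) central Normal intervals."""
    # Two-sided quantile of the standard Normal, about 1.96 for level=0.95
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return [(mu - z * sigma, mu + z * sigma) for mu, sigma in mu_sigma]


# Hypothetical output of predict(X, return_std=True) for two samples
preds = [(10.0, 2.0), (0.0, 0.5)]
intervals = normal_intervals(preds, level=0.95)
```

Each interval is centered on mu and its width scales linearly with sigma, so samples with larger predicted uncertainty get wider intervals.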
- predict_dist(X)
Predict Normal distributions for each input sample.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input features.
- Returns:
A list of Normal distributions parameterized by predicted mu and sigma for each input sample.
- Return type:
list[scipy.stats.rv_continuous]
- set_predict_request(*, return_std: bool | None | str = '$UNCHANGED$') → SkNGBRegressor
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
return_std (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_std parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SkNGBRegressor
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object