# Tutorial

Cyclic Boosting can be used in a [scikit-learn](https://scikit-learn.org/stable/)-like fashion. Several examples can be found in the [integration tests](https://github.com/Blue-Yonder-OSS/cyclic-boosting/blob/main/tests/test_integration.py). A more detailed example, including additional helper functionality, can be found [here](https://github.com/Blue-Yonder-OSS/cyclic-boosting-example).

For the simplest, default case, just do:

```python
from cyclic_boosting.pipelines import pipeline_CBPoissonRegressor

CB_est = pipeline_CBPoissonRegressor()

CB_est.fit(X_train, y)
yhat = CB_est.predict(X_test)
```

## Analysis Plots

To additionally create analysis plots of the training:

```python
from cyclic_boosting.pipelines import pipeline_CBPoissonRegressor
from cyclic_boosting import observers
from cyclic_boosting.plots import plot_analysis

def plot_CB(filename, plobs, binner):
    for i, p in enumerate(plobs):
        plot_analysis(
            plot_observer=p,
            file_obj=filename + "_{}".format(i),
            use_tightlayout=False,
            binners=[binner],
        )

plobs = [observers.PlottingObserver(iteration=-1)]
CB_est = pipeline_CBPoissonRegressor(observers=plobs)

CB_est.fit(X_train, y)
plot_CB('analysis_CB_iterlast', [CB_est[-1].observers[-1]], CB_est[-2])

yhat = CB_est.predict(X_test)
```

## Set Feature Properties

By setting feature properties/flags (all available ones can be found [here](https://cyclic-boosting.readthedocs.io/en/latest/cyclic_boosting.html#module-cyclic_boosting.flags)), you can also specify the treatment of individual features, e.g., as continuous or categorical (including the treatment of missing values):

```python
from cyclic_boosting.pipelines import pipeline_CBPoissonRegressor
from cyclic_boosting import flags

fp = {}
fp['feature1'] = flags.IS_UNORDERED
fp['feature2'] = flags.IS_CONTINUOUS | flags.HAS_MISSING | flags.MISSING_NOT_LEARNED

CB_est = pipeline_CBPoissonRegressor(feature_properties=fp)

CB_est.fit(X_train, y)
yhat = CB_est.predict(X_test)
```

Quick overview of the basic flags (see the sketch after this list for a combined example):
- **IS_CONTINUOUS**: can be used to do a binning (by default equi-statistics) of a continuous feature and smooth the factors estimated for each bin by an orthogonal polynomial.
- **IS_LINEAR**: works like **IS_CONTINUOUS**, but uses a linear function for smoothing.
- **IS_UNORDERED**: can be used for categorical features.
- **IS_ORDERED**: in principle, should smooth categorical features by weighting neighboring bins higher, but currently just points to **IS_UNORDERED**.
- **HAS_MISSING**: learns an additional, separate category for missing values (all NaN values of a feature).
- **MISSING_NOT_LEARNED**: puts all missing values (all NaN values of a feature) into an additional, separate category, which is set to the neutral factor (e.g., 1 for multiplicative or 0 for additive regression mode).
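As a combined illustration of these flags, here is a minimal, hedged sketch; the column names, toy data, and the reduced bin count are invented for illustration and are not part of the library's own examples:

```python
import numpy as np
import pandas as pd

from cyclic_boosting import flags
from cyclic_boosting.pipelines import pipeline_CBPoissonRegressor

# Hypothetical toy data: one continuous feature with missing values, one
# unordered categorical feature (integer-coded), one roughly linear feature.
X_train = pd.DataFrame({
    "price": [1.0, 2.5, np.nan, 3.0, 2.0, np.nan],
    "store": [0, 1, 2, 0, 1, 2],
    "trend": [0, 1, 2, 3, 4, 5],
})
y = np.array([3.0, 5.0, 4.0, 6.0, 5.0, 4.0])

fp = {
    # binned and smoothed with an orthogonal polynomial; NaNs get their own
    # category, which is fixed to the neutral factor
    "price": flags.IS_CONTINUOUS | flags.HAS_MISSING | flags.MISSING_NOT_LEARNED,
    "store": flags.IS_UNORDERED,  # one factor per category
    "trend": flags.IS_LINEAR,     # linear smoothing of the binned factors
}

# few bins, to match the tiny toy data set
CB_est = pipeline_CBPoissonRegressor(feature_properties=fp, number_of_bins=3)
CB_est.fit(X_train, y)
```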
## Set Features

You can also specify which columns to use as features, including interaction terms (the default is to use all available columns as individual features only):

```python
from cyclic_boosting.pipelines import pipeline_CBPoissonRegressor

features = [
    'feature1',
    'feature2',
    ('feature1', 'feature2'),
]

CB_est = pipeline_CBPoissonRegressor(feature_groups=features)

CB_est.fit(X_train, y)
yhat = CB_est.predict(X_test)
```

There is also some functionality for interaction term selection, exploiting the feature binning:

```python
from cyclic_boosting.interaction_selection import select_interaction_terms_anova

best_interaction_term_features = select_interaction_terms_anova(X_train, y, fp, 3, 5)
```

## Manual Binning

Behind the scenes, Cyclic Boosting works by combining a binning method (e.g., [BinNumberTransformer](https://github.com/Blue-Yonder-OSS/cyclic-boosting/blob/main/cyclic_boosting/binning/bin_number_transformer.py)) with a Cyclic Boosting estimator (find all estimators [here](https://github.com/Blue-Yonder-OSS/cyclic-boosting/blob/main/cyclic_boosting/__init__.py)).

If you want to use a different number of bins (the default is 100):

```python
from cyclic_boosting.pipelines import pipeline_CBPoissonRegressor

CB_est = pipeline_CBPoissonRegressor(number_of_bins=50)

CB_est.fit(X_train, y)
yhat = CB_est.predict(X_test)
```

If you want to use a different kind of binning (the default is shown below), you can combine binners and estimators manually:

```python
from sklearn.pipeline import Pipeline
from cyclic_boosting import binning, CBPoissonRegressor

binner = binning.BinNumberTransformer()
est = CBPoissonRegressor()
CB_est = Pipeline([("binning", binner), ("CB", est)])

CB_est.fit(X_train, y)
yhat = CB_est.predict(X_test)
```

## Feature Importances

To get a dictionary with the relative importances of the different model features in the training:

```python
CB_est.get_feature_importances()
```

## Individual Explainability

To get a dictionary with the contributions of the different model features to the individual predictions of a given data set:

```python
CB_est.get_feature_contributions(X_test)
```

## Quantile Regression

Below you can find an example of a quantile regression model for three different quantiles, with subsequent quantile matching (to get a full individual probability distribution from the estimated quantiles) by means of a Johnson Quantile-Parameterized Distribution (J-QPD) for each test sample:

```python
from cyclic_boosting.pipelines import pipeline_CBMultiplicativeQuantileRegressor
from cyclic_boosting.quantile_matching import J_QPD_S

CB_est_qlow = pipeline_CBMultiplicativeQuantileRegressor(quantile=0.2)
CB_est_qlow.fit(X_train, y)
yhat_qlow = CB_est_qlow.predict(X_test)

CB_est_qmedian = pipeline_CBMultiplicativeQuantileRegressor(quantile=0.5)
CB_est_qmedian.fit(X_train, y)
yhat_qmedian = CB_est_qmedian.predict(X_test)

CB_est_qhigh = pipeline_CBMultiplicativeQuantileRegressor(quantile=0.8)
CB_est_qhigh.fit(X_train, y)
yhat_qhigh = CB_est_qhigh.predict(X_test)

j_qpd_s = J_QPD_S(0.2, yhat_qlow, yhat_qmedian, yhat_qhigh)
yhat_percentile95 = j_qpd_s.ppf(0.95)
```
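Since `j_qpd_s` exposes `ppf` (as used above for the 95th percentile), the matched distribution can be evaluated on a whole grid of percentiles. A small sketch, assuming `ppf` stays vectorized over the test samples used to construct `j_qpd_s` above:

```python
import numpy as np

# Percentile grid at which to evaluate the matched distribution.
percentiles = [0.05, 0.25, 0.5, 0.75, 0.95]

# Rows: percentiles, columns: test samples.
yhat_grid = np.array([j_qpd_s.ppf(q) for q in percentiles])

# For example, a central 90% prediction interval per test sample:
interval_low, interval_high = yhat_grid[0], yhat_grid[-1]
```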
There is also a ready-made end-to-end practical training chain, employing quantile transformations to impose constraints on the target range (for bound or semi-bound scenarios) and to maintain the order of the symmetric-percentile triplet predictions (from an arbitrary quantile regression method, not restricted to Cyclic Boosting) used for J-QPD:

```python
from cyclic_boosting.pipelines import pipeline_CBAdditiveQuantileRegressor
from cyclic_boosting.quantile_matching import QPD_RegressorChain

est = QPD_RegressorChain(
    pipeline_CBAdditiveQuantileRegressor(quantile=0.2),
    pipeline_CBAdditiveQuantileRegressor(quantile=0.5),
    pipeline_CBAdditiveQuantileRegressor(quantile=0.8),
    "S",
)

est.fit(X_train, y)
yhat_qlow, yhat_qmedian, yhat_qhigh, qpd = est.predict(X_test)
```
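To sanity-check the three quantile predictions, one can compare their empirical coverage on a held-out set against the nominal quantile levels and compute the pinball (quantile) loss. A minimal sketch in plain NumPy; the held-out target `y_test` (aligned with `X_test`) is an assumption, not part of the example above:

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantile):
    # Standard quantile (pinball) loss: penalizes under- and over-prediction
    # asymmetrically according to the target quantile.
    diff = y_true - y_pred
    return np.mean(np.maximum(quantile * diff, (quantile - 1) * diff))

for q, yhat_q in [(0.2, yhat_qlow), (0.5, yhat_qmedian), (0.8, yhat_qhigh)]:
    # For a calibrated model, the coverage should be close to q.
    coverage = np.mean(y_test <= yhat_q)
    print(f"q={q}: coverage={coverage:.3f}, "
          f"pinball={pinball_loss(y_test, yhat_q, q):.3f}")
```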