feature_selection

caf.brain.ml.feature_selection(data, output_path, categorical_features, numerical_features, target, custom_index=None, is_encoded=True, weight=None, sample_size_encode=None, select_encode_values=None, encode_values_to_drop=None)

Conduct simple feature selection.

Parameters:

is_encoded (bool) – Whether the data is already encoded and scaled.
data (DataFrame) – pandas DataFrame containing your data, in a structured or semi-structured tabular format.
output_path (Path | str) – Path to output location.
target (str) – Name of the column that holds the target variable (Y, the dependent variable), i.e. what you want to predict.
weight (str | None) – Optional name of the column to use as weights.
custom_index (list[str] | None) – Columns in your data to use as the index, e.g. year, geography.
categorical_features (list[str] | None) – List of column names (strings) that are categorical variables.
numerical_features (list[str] | None) – List of column names (strings) that are continuous variables.
sample_size_encode (bool | None) – Optional bool. If True, the data is split based on sample size, and the value with the largest sample size is used as the reference class.
select_encode_values (bool | None) – Optional bool. If True, the data is split based on custom values set by the user in encode_values_to_drop.
encode_values_to_drop (list[str] | None) – Required when select_encode_values is True: a list of strings with the same length as categorical_features. The first entry applies to the first variable in categorical_features, the second entry to the second variable, and so on.
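
The two encoding options can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper names pair_reference_values and most_frequent_value and the example column names are hypothetical. It only shows the positional pairing described for encode_values_to_drop, and one plausible reading of "largest sample size" for sample_size_encode, namely the most frequent value in a column.

```python
from collections import Counter


def pair_reference_values(categorical_features, encode_values_to_drop):
    """Pair each categorical column with its value to drop, by list position.

    Hypothetical helper: mirrors the documented rule that entry i of
    encode_values_to_drop belongs to entry i of categorical_features.
    """
    if len(categorical_features) != len(encode_values_to_drop):
        raise ValueError(
            "encode_values_to_drop must have the same length as categorical_features"
        )
    return dict(zip(categorical_features, encode_values_to_drop))


def most_frequent_value(values):
    """Return the most common value in a column.

    Assumption: "largest sample size" means the value that occurs most often.
    """
    return Counter(values).most_common(1)[0][0]


# Hypothetical column names and values, for illustration only.
mapping = pair_reference_values(["region", "area_type"], ["north", "urban"])
largest = most_frequent_value(["urban", "rural", "urban"])
```

Here mapping links "region" to "north" and "area_type" to "urban", while largest is the value that a sample-size-based choice would pick under the stated assumption.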

Returns:

Feature-selected dataset.

Return type:

df_final
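
A call might be assembled as in the sketch below, which follows the signature shown above. The column names, output directory and the run_selection wrapper are hypothetical, and setting is_encoded=False alongside the encoding options is an assumption about how the flags interact, not something the documentation states. The library import is deferred into the wrapper so that the argument set can be inspected without caf.brain installed.

```python
from pathlib import Path

# Hypothetical column names, for illustration only.
categorical_features = ["region", "area_type"]
numerical_features = ["population", "income"]

# Keyword arguments named exactly as in the documented signature.
kwargs = dict(
    output_path=Path("outputs"),          # hypothetical output location
    categorical_features=categorical_features,
    numerical_features=numerical_features,
    target="trips",                        # hypothetical target column
    custom_index=["year"],
    is_encoded=False,                      # assumption: raw data still needs encoding
    weight=None,
    select_encode_values=True,
    encode_values_to_drop=["north", "urban"],  # one entry per categorical feature
)


def run_selection(data):
    """Call feature_selection on a pandas DataFrame (requires caf.brain)."""
    from caf.brain import ml  # deferred import: only needed when actually called

    return ml.feature_selection(data, **kwargs)
```

Calling run_selection(df) with a DataFrame that contains the listed columns would then return the feature-selected dataset described under Returns.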