tidy_data

caf.brain.ml.tidy_data(data_path, classification_prediction, output_path, categorical_features, numerical_features, target, custom_index=None, weight=None, column_name_to_drop_rows=None, value_in_row=None, data=None)

Converts semi-structured data into a structured format for machine learning.

Parameters:

data_path (Path | None) – Path to your structured or semi-structured tabular data.
classification_prediction (tuple[int, ...] | None) – Tuple of integers that correspond to the target column, i.e. the value(s) to predict in a classification problem.
output_path (Path | str) – Path to output location.
categorical_features (list[str] | None) – List of column names (strings) that are categorical variables.
numerical_features (list[str] | None) – List of column names (strings) that are continuous variables.
custom_index (list[str] | None) – Columns in your data to use as the index, e.g. year, geography.
target (str) – Column in your data that is the target variable (Y, dependent variable), what you want to predict.
weight (str | None) – Optional name of the column to be used as the weight.
column_name_to_drop_rows (list[str] | None) – List of column names (strings) that contain values identifying rows to drop.
value_in_row (list[str | float | int] | None) – Values to drop, corresponding to the columns in column_name_to_drop_rows.
data (DataFrame | None) – pandas DataFrame of your data, in a structured or semi-structured tabular format.

Return type:

Structured DataFrame.
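
Example:

The following is a minimal sketch of how a call might be assembled from the parameters listed above. The file name, output directory, and column names (`survey.csv`, `region`, `owns_car`, and so on) are hypothetical and chosen only for illustration. Pairing each entry of `column_name_to_drop_rows` with the entry at the same position in `value_in_row` is an assumption based on the parameter descriptions, not confirmed behaviour.

```python
from pathlib import Path

# Hypothetical inputs: none of these paths or column names come from caf.brain.
tidy_kwargs = dict(
    data_path=Path("survey.csv"),           # tabular input file (assumed to exist)
    classification_prediction=(1,),         # target value(s) to predict
    output_path=Path("output"),             # where results are written
    categorical_features=["region", "household_type"],
    numerical_features=["income", "age"],
    target="owns_car",                      # dependent variable (Y)
    custom_index=["year", "geography"],
    weight="sample_weight",
    # Assumption: these two lists are matched position by position.
    column_name_to_drop_rows=["region", "age"],
    value_in_row=["unknown", -1],
    data=None,                              # or pass a DataFrame instead of a path
)

# Keep the two row-dropping lists the same length so each column has a value.
assert len(tidy_kwargs["column_name_to_drop_rows"]) == len(tidy_kwargs["value_in_row"])

try:
    from caf.brain.ml import tidy_data
except ImportError:
    tidy_data = None  # caf.brain is not installed in this environment

# Only call the function when the library and the input file are both available.
if tidy_data is not None and tidy_kwargs["data_path"].exists():
    structured = tidy_data(**tidy_kwargs)
```

Building the arguments as a dictionary first makes it easy to check the inputs, such as the matching list lengths, before the call runs.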