tidy_data

caf.brain.ml.tidy_data(data_path, classification_prediction, output_path, categorical_features, numerical_features, target, custom_index=None, weight=None, column_name_to_drop_rows=None, value_in_row=None, data=None)

Converts semi-structured data into structured data for machine learning.

Parameters:
  • data_path (Path | None) – Path to your structured or semi-structured tabular data.

  • classification_prediction (tuple[int, ...] | None) – Tuple of integers that correspond to values in the target column: the value(s) to predict in a classification problem.

  • output_path (Path | str) – Path to output location.

  • categorical_features (list[str] | None) – List of column names (strings) that are categorical variables.

  • numerical_features (list[str] | None) – List of column names (strings) that are continuous variables.

  • custom_index (list[str] | None) – Columns in your data to use as the index, e.g. year, geography.

  • target (str) – Column in your data that is the target variable (Y, the dependent variable), i.e. what you want to predict.

  • weight (str | None) – Optional name of the column to be used as weight.

  • column_name_to_drop_rows (list[str] | None) – List of string column names that contain values to drop.

  • value_in_row (list[str | float | int] | None) – Values that mark rows to drop, corresponding to the columns in column_name_to_drop_rows.

  • data (DataFrame | None) – pandas DataFrame of your data, in structured or semi-structured tabular format.
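
The relationship between column_name_to_drop_rows and value_in_row can be illustrated with plain pandas. This is only a sketch of one plausible reading, assuming the two lists are matched by position and that a row is dropped when any listed column holds its paired value. The helper drop_matching_rows and the sample data are hypothetical and are not part of caf.brain; the library's actual behaviour may differ.

```python
import pandas as pd


def drop_matching_rows(df, columns, values):
    """Hypothetical helper: drop rows where df[col] == val for any (col, val) pair.

    Assumes `columns` and `values` are matched by position, which is one
    plausible reading of column_name_to_drop_rows / value_in_row.
    """
    if len(columns) != len(values):
        raise ValueError("columns and values must have the same length")
    # Start with "drop nothing" and flag rows that match any pair.
    mask = pd.Series(False, index=df.index)
    for col, val in zip(columns, values):
        mask |= df[col] == val
    return df.loc[~mask]


# Made-up sample data for illustration only.
df = pd.DataFrame(
    {
        "year": [2020, 2021, 2022, 2023],
        "region": ["North", "South", "unknown", "North"],
        "trips": [10.0, -1.0, 7.5, 3.0],
    }
)

# Drop rows where region is "unknown" or trips is -1.0.
cleaned = drop_matching_rows(df, ["region", "trips"], ["unknown", -1.0])
```

Under this reading, the rows for 2021 and 2022 are removed and the rows for 2020 and 2023 remain.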

Return type:

Structured DataFrame.
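
A call might be prepared along the following lines. This is a minimal sketch, not a verified recipe: the sample data, the column names and the output path are invented, and it assumes that passing data with data_path=None, and classification_prediction=None for a non-classification target, are accepted combinations (the type hints allow None, but the page does not confirm it). The call itself is left commented out because it requires caf.brain to be installed.

```python
from pathlib import Path

import pandas as pd

# Made-up sample data for illustration only.
data = pd.DataFrame(
    {
        "year": [2020, 2021, 2022],
        "region": ["North", "South", "North"],
        "population": [1200.0, 950.0, 1300.0],
        "trips": [10.0, 8.0, 11.5],
    }
)

# Keyword arguments matching the documented signature.
kwargs = dict(
    data_path=None,  # assumption: data is supplied in memory instead
    classification_prediction=None,  # assumption: not a classification problem
    output_path=Path("tidy_output"),  # hypothetical output location
    categorical_features=["region"],
    numerical_features=["population"],
    target="trips",
    custom_index=["year"],
    data=data,
)

# Check that every referenced column exists before calling the function.
referenced = (
    kwargs["categorical_features"]
    + kwargs["numerical_features"]
    + kwargs["custom_index"]
    + [kwargs["target"]]
)
missing = [col for col in referenced if col not in data.columns]
if missing:
    raise KeyError(f"Columns not found in data: {missing}")

# With caf.brain installed, the call would then be:
# from caf.brain.ml import tidy_data
# structured = tidy_data(**kwargs)
```

Checking the column names up front keeps typos in the feature lists from surfacing later as harder-to-read errors.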