Feature Engineering

feature_engineering.feature_generator.valid_features_detector(dataframe: dict, categorical_features: list, class_number_range: list) → list

Feature validator

The function checks whether one-hot encoding should be applied to each of the given features.

Parameters:
- dataframe (dict) – A pandas DataFrame that contains the dataset.
- categorical_features (list) – A list of strings with the names of the columns (features) that contain categorical data.
- class_number_range (list) – A list of two elements that define the minimum and maximum number of classes (unique values) a feature must contain for one-hot encoding to be applied to it.
Returns:	valid_features – A list of the features to which the encoding will be applied.
Return type:	list
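The validation rule described above can be illustrated with a minimal pandas sketch. It assumes that the number of classes is what `Series.nunique()` returns (missing values excluded) and that both bounds of class_number_range are inclusive; neither detail is confirmed by this reference. `valid_features_detector_sketch` is a hypothetical name, not the library's function, and the real implementation may differ.

```python
import pandas as pd


def valid_features_detector_sketch(dataframe: pd.DataFrame,
                                   categorical_features: list,
                                   class_number_range: list) -> list:
    """Hypothetical sketch: keep the categorical features whose number of
    classes lies within class_number_range (both bounds assumed inclusive)."""
    min_classes, max_classes = class_number_range
    valid_features = []
    for feature in categorical_features:
        # nunique() counts distinct values and skips NaN by default
        n_classes = dataframe[feature].nunique()
        if min_classes <= n_classes <= max_classes:
            valid_features.append(feature)
    return valid_features


df = pd.DataFrame({
    "color": ["red", "green", "blue", "red"],  # 3 classes
    "flag": ["yes", "yes", "yes", "yes"],      # 1 class
    "user_id": ["a", "b", "c", "d"],           # 4 classes
})
print(valid_features_detector_sketch(df, ["color", "flag", "user_id"], [2, 3]))
# ['color']
```

A feature with a single class carries no information after encoding, while a feature with very many classes (such as an ID column) would produce a large number of sparse columns; a range check filters out both cases.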

feature_engineering.feature_generator.encoding_features(encoding_type: str, dataframes_dict: dict, reference: str, categorical_features: list, ignore_columns: list, class_number_range: list = [0, 50], target_name: str = None) → dict

One-hot encoder

The function applies the encoding method selected by encoding_type (e.g. one-hot encoding) to the categorical features, using the scikit-learn framework implementation.

Parameters:
- encoding_type (str) – The type of encoding to apply, for example one-hot or target. For more information, see: https://contrib.scikit-learn.org/categorical-encoding/index.html
- dataframes_dict (dict) – A dictionary that contains the dataframes before encoding, e.g. dataframes_dict={'train': train_dataframe, 'test': test_dataframe}
- reference (str) – The name (key in dataframes_dict) of the dataframe used when validating the data types of the features.
- categorical_features (list) – A list of strings with the names of the columns (features) that contain categorical data.
- ignore_columns (list) – A list of column names. The encoding will not be applied to those columns.
- class_number_range (list) – A list of two integers giving the minimum and maximum number of labels (classes, categories) a feature may have. If the number of categories of a feature is outside this range, one-hot encoding is not applied to that feature. Defaults to [0, 50]. It can be ignored if the encoding type is not one-hot.
- target_name (str) – The name of the column that contains the labels the model should predict. It can be ignored if the encoding method does not require the target. Defaults to None.
Returns:	dataframes_dict_encoded – A dictionary that contains the dataframes after feature encoding, e.g. dataframes_dict_encoded={'train': train_dataframe, 'test': test_dataframe}
Return type:	dict
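One-hot encoding a dictionary of dataframes against a reference dataframe can be sketched with pandas alone. The following is a minimal sketch under assumptions that this reference does not confirm: the range check and the category list of each feature are taken only from dataframes_dict[reference], values unseen in the reference become all-zero rows, and both range bounds are inclusive. `one_hot_encode_dict_sketch` is hypothetical; it uses `pandas.get_dummies` rather than whatever the library uses internally, and it does not cover target encoding or target_name.

```python
import pandas as pd


def one_hot_encode_dict_sketch(dataframes_dict: dict, reference: str,
                               categorical_features: list, ignore_columns: list,
                               class_number_range: tuple = (0, 50)) -> dict:
    """Hypothetical sketch: one-hot encode every dataframe in dataframes_dict
    using categories learned from dataframes_dict[reference]."""
    ref = dataframes_dict[reference]
    min_classes, max_classes = class_number_range
    # Assumption: decide which features to encode from the reference dataframe only
    features = [f for f in categorical_features
                if f not in ignore_columns
                and min_classes <= ref[f].nunique() <= max_classes]
    # Assumption: the category list of each feature comes from the reference dataframe
    categories = {f: sorted(ref[f].dropna().unique()) for f in features}

    encoded = {}
    for name, df in dataframes_dict.items():
        df = df.copy()
        for f in features:
            # Fixed categories give every dataframe the same dummy columns;
            # values missing from the reference become NaN (all-zero dummies)
            df[f] = pd.Categorical(df[f], categories=categories[f])
        encoded[name] = pd.get_dummies(df, columns=features, dtype=int)
    return encoded


train = pd.DataFrame({"color": ["red", "green", "red"], "size": [1, 2, 3]})
test = pd.DataFrame({"color": ["green", "purple"], "size": [4, 5]})
out = one_hot_encode_dict_sketch({"train": train, "test": test}, reference="train",
                                 categorical_features=["color"], ignore_columns=[])
print(list(out["test"].columns))   # ['size', 'color_green', 'color_red']
print(out["test"].values.tolist())  # [[4, 1, 0], [5, 0, 0]]
```

Learning the categories from a single reference dataframe keeps the encoded train and test dataframes column-compatible, which a model fitted on one and applied to the other requires. The sketch uses a tuple default for class_number_range to avoid sharing a mutable list default between calls.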