Visualization

class visualization.visualization.CompareStatistics(dataframes_dict: dict)

Statistic explorer

The class contains methods for exploring the statistics of a given feature.

Parameters: dataframes_dict (dict) – A dictionary of pandas dataframes, e.g. {"train": train_dataframe, "test": test_dataframe}
__init__(dataframes_dict: dict)
Parameters: dataframes_dict – A dictionary of pandas dataframes, e.g. {"train": train_dataframe, "test": test_dataframe}
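
As a minimal sketch of the expected input, the dictionary maps a dataset name to a pandas dataframe. The column names and values below are made up for illustration and do not come from the package.

```python
import pandas as pd

# Hypothetical toy datasets; column names and values are illustrative only.
train_dataframe = pd.DataFrame({"age": [22, 35, 58], "city": ["Oslo", "Bergen", None]})
test_dataframe = pd.DataFrame({"age": [41, 29], "city": ["Oslo", "Tromso"]})

# Keys are dataset names, values are the dataframes themselves.
dataframes_dict = {"train": train_dataframe, "test": test_dataframe}
```

A dictionary built this way is what the signature CompareStatistics(dataframes_dict) expects as its only argument.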
check_feature_valid(feature_nr: int) → bool

Feature’s value validator

This function validates whether the statistical properties of a given feature can be derived.

Parameters: feature_nr (int) – The index of the column that holds the feature.
Returns: True if the statistical properties of the given feature can be calculated, False otherwise.
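
The criteria the validator applies are not stated here. One plausible reading, shown purely as an assumption, is that a feature is valid when its column is numeric and has at least one non-missing value in every dataset. The helper name `is_feature_numeric` is hypothetical and is not part of the package.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def is_feature_numeric(dataframes_dict: dict, feature_nr: int) -> bool:
    """Hypothetical check: column `feature_nr` is numeric and non-empty in every dataset."""
    for dataframe in dataframes_dict.values():
        column = dataframe.iloc[:, feature_nr]  # select the column by position
        if not is_numeric_dtype(column) or column.dropna().empty:
            return False
    return True


frames = {
    "train": pd.DataFrame({"age": [22, 35], "city": ["Oslo", "Bergen"]}),
    "test": pd.DataFrame({"age": [41, 29], "city": ["Oslo", "Tromso"]}),
}
```

With these toy frames, column 0 ("age") passes the check and column 1 ("city") does not.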
compare_statistics_function(feature_nr: int)

Statistic plotter

This function plots the statistical values of a given feature across all supplied datasets in a single graph.

Parameters: feature_nr (int) – The index of the column that holds the feature.
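
Which statistics are plotted is not specified here. As an illustrative assumption, the sketch below collects the summary from pandas `describe()` for one column position across all datasets into a single table, which is the kind of data such a plot could be drawn from. The helper `collect_statistics` is hypothetical.

```python
import pandas as pd


def collect_statistics(dataframes_dict: dict, feature_nr: int) -> pd.DataFrame:
    """Hypothetical helper: one column of summary statistics per dataset."""
    summaries = {
        name: dataframe.iloc[:, feature_nr].describe()
        for name, dataframe in dataframes_dict.items()
    }
    # Rows are statistics (count, mean, std, ...), columns are dataset names.
    return pd.DataFrame(summaries)


frames = {
    "train": pd.DataFrame({"age": [20.0, 30.0, 40.0]}),
    "test": pd.DataFrame({"age": [10.0, 50.0]}),
}
stats = collect_statistics(frames, 0)
```

Putting the datasets side by side as columns makes it straightforward to draw them in one graph, for example with pandas' own `DataFrame.plot.bar()` when matplotlib is installed.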
visualization.visualization.histogram(data: pandas.core.series.Series)

Histogram plotter

This function plots a histogram of the numeric values of a given feature. It is used when the interactive data explorer function in the preprocessing package is called.

Parameters: data (pd.Series) – The values of the given feature/column from the dataset.
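
The binning used by the package is not documented here. The following sketch, under the assumption that non-numeric and missing entries are simply dropped, computes the histogram counts with NumPy. The helper name `histogram_counts` and the default bin count are illustrative.

```python
import numpy as np
import pandas as pd


def histogram_counts(data: pd.Series, bins: int = 4):
    """Hypothetical helper: bin the numeric values of a Series."""
    # Coerce non-numeric entries to NaN, then drop them.
    numeric = pd.to_numeric(data, errors="coerce").dropna()
    counts, edges = np.histogram(numeric.to_numpy(), bins=bins)
    return counts, edges


values = pd.Series([1, 2, 2, 3, "n/a", None, 5])
counts, edges = histogram_counts(values)
```

Here the five numeric values are spread over four equal-width bins between 1 and 5; the string and the missing entry are ignored.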
visualization.visualization.explore_missing_values(dataframes_dict: dict, number_of_features: int)

Missing values explorer

This function plots the number of missing values for a given number of features across all datasets.

Parameters:
  • dataframes_dict (dict) – A dictionary of pandas dataframes, e.g. {"train": train_dataframe, "test": test_dataframe}
  • number_of_features (int) – The number of features to show in the plot.
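
How the function selects which features to show is not stated. Assuming it ranks features by their total missing-value count, the data behind such a plot could be computed as below. `missing_value_counts` is a hypothetical helper, not part of the package.

```python
import pandas as pd


def missing_value_counts(dataframes_dict: dict, number_of_features: int) -> pd.DataFrame:
    """Hypothetical helper: missing-value counts per feature and dataset."""
    counts = pd.DataFrame(
        {name: dataframe.isna().sum() for name, dataframe in dataframes_dict.items()}
    )
    # Assumption: keep the features with the most missing values overall.
    order = counts.sum(axis=1).sort_values(ascending=False, kind="stable").index
    return counts.loc[order].head(number_of_features)


frames = {
    "train": pd.DataFrame({"a": [1, None, None], "b": [1, 2, 3], "c": [None, 1, 2]}),
    "test": pd.DataFrame({"a": [None, 2], "b": [1, 2], "c": [1, 2]}),
}
top = missing_value_counts(frames, 2)
```

The result has one row per selected feature and one column per dataset, so the counts for "train" and "test" can be compared feature by feature.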
visualization.visualization.compare_statistics(dataframes_dict: dict)

Interactive statistic explorer

This function is designed to be run in a Jupyter notebook. The user can step through the features interactively using a slider.

Parameters: dataframes_dict (dict) – A dictionary of pandas dataframes, e.g. {"train": train_dataframe, "test": test_dataframe}