reports

TH.reports.geolocate(*args, **kwargs) → pandas.core.frame.DataFrame

Adds geolocation info to the rows of the input DataFrame.

Parameters

input – pandas DataFrame to which the geolocation info is added
ip_column – Name of the input column containing the IPs to geolocate

Returns

The input pandas DataFrame merged with the geolocation info
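The merge described above can be illustrated with plain pandas. This is only a sketch of the idea, not TH's implementation: the lookup table `GEO_LOOKUP`, its `country` column, and the helper `geolocate_sketch` are hypothetical names, and the real geolocation source and output columns are not documented here.

```python
import pandas as pd

# Hypothetical lookup table (assumption): the real geolocation source
# and the columns it provides are not documented in this reference.
GEO_LOOKUP = pd.DataFrame({
    "ip": ["203.0.113.7", "198.51.100.23"],
    "country": ["ES", "US"],
})


def geolocate_sketch(input_df: pd.DataFrame, ip_column: str) -> pd.DataFrame:
    """Left-merge geolocation columns onto input_df, keeping every input row."""
    # Align the lookup key with the caller's IP column name before merging.
    lookup = GEO_LOOKUP.rename(columns={"ip": ip_column})
    return input_df.merge(lookup, how="left", on=ip_column)


events = pd.DataFrame({
    "UserName": ["alice", "bob", "carol"],
    "ClientIP": ["203.0.113.7", "192.0.2.1", "198.51.100.23"],
})
located = geolocate_sketch(events, "ClientIP")
```

A left merge is used so that rows whose IP has no match keep their place and simply get missing values in the added columns.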

TH.reports.group(*args, **kwargs) → pandas.core.frame.DataFrame

Builds a report pandas.DataFrame from the given dataframe by grouping its rows on the given column values and counting them.

Parameters

input – the source dataframe
by – List of columns of the source dataframe used to group the rows
sum – When defined, sums the values of the specified column for each group instead of counting rows
name – Name of the additional column created with the count for each group of rows

Code example:

def do_report(source_dataframe):
    # Count the rows for each (MUID, UserName) pair into an 'Actions' column
    return reports.group(source_dataframe, by=['MUID', 'UserName'], name='Actions')

Returns: DataFrame with the resulting data or exception
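For readers who want to see what grouping and counting (or summing) looks like in plain pandas, the following sketch mirrors the parameters above. It is not TH's implementation: `group_sketch`, the `sum_column` argument name, and the sample `Bytes` column are assumptions made for illustration.

```python
from typing import List, Optional

import pandas as pd


def group_sketch(input_df: pd.DataFrame, by: List[str], name: str,
                 sum_column: Optional[str] = None) -> pd.DataFrame:
    """Group rows on `by`; count them, or sum `sum_column` when it is given."""
    grouped = input_df.groupby(by)
    if sum_column is None:
        result = grouped.size()             # number of rows per group
    else:
        result = grouped[sum_column].sum()  # total of one column per group
    # Turn the group keys back into columns and name the value column.
    return result.reset_index(name=name)


source = pd.DataFrame({
    "MUID": ["m1", "m1", "m2", "m1"],
    "UserName": ["alice", "alice", "bob", "carol"],
    "Bytes": [10, 5, 7, 1],
})
counts = group_sketch(source, by=["MUID", "UserName"], name="Actions")
totals = group_sketch(source, by=["MUID", "UserName"], name="TotalBytes",
                      sum_column="Bytes")
```

Here `counts` has one row per distinct (MUID, UserName) pair with its row count, and `totals` has the same groups with the summed `Bytes` instead.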

TH.reports.profile(left: pandas.core.frame.DataFrame, right: pandas.core.frame.DataFrame, column: str) → pandas.core.frame.DataFrame

Obtains a pandas.DataFrame as the result of profiling two (left and right) dataframes.

Both dataframes must be of the same type or else the operation will fail.

The resulting dataframe contains the source rows, excluding those whose value in the given column matches a value in the train dataframe.

Parameters

left – The source dataframe with the current data
right – The dataframe against which the profiling is performed
column – The column used to perform the profiling

Code example:

def do_profile(train_period, test_period):
    df1 = obtain_dataframe(period=train_period)
    df2 = obtain_dataframe(period=test_period)
    return reports.profile(df1, df2, 'column_name')

Returns: DataFrame with the resulting data or exception
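One way to read the description above is as an anti-join: keep the rows of `left` whose value in `column` does not appear in `right`. The sketch below shows that reading in plain pandas. It is an assumption about the behavior, not TH's implementation, and `profile_sketch` and the sample data are hypothetical.

```python
import pandas as pd


def profile_sketch(left: pd.DataFrame, right: pd.DataFrame,
                   column: str) -> pd.DataFrame:
    """Keep the rows of `left` whose `column` value never occurs in `right`."""
    for frame in (left, right):
        if column not in frame.columns:
            raise KeyError(f"column {column!r} is missing from a dataframe")
    # Boolean mask: True where the left value was already seen in right.
    seen = left[column].isin(right[column])
    return left[~seen].reset_index(drop=True)


current = pd.DataFrame({"UserName": ["alice", "bob", "mallory"],
                        "Actions": [3, 1, 9]})
baseline = pd.DataFrame({"UserName": ["alice", "bob"],
                         "Actions": [2, 2]})
unseen = profile_sketch(current, baseline, "UserName")
```

With this sample data, only the row for `mallory` remains, because that user name does not occur in `baseline`.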

TH.reports.top(*args, **kwargs) → pandas.core.frame.DataFrame

Obtains a pandas.DataFrame with the top results of a given dataframe.

Parameters

input – the source dataframe
n – Number of rows for the resulting dataframe
by – Name of the column used to order the dataframe results
ascending – True to return the ‘n’ greatest results according to column ‘by’, False for the ‘n’ lowest

Code example:

def do_top(source_dataframe):
    return reports.top(source_dataframe, n=10, by='Actions', ascending=False)

Returns: DataFrame with the resulting data or exception
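The sort-then-truncate idea behind a top-n report can be sketched in plain pandas as follows. This is an illustration only, not TH's implementation, and `top_sketch` is a hypothetical helper. The comments describe how pandas itself interprets `ascending`, which may differ from how `TH.reports.top` maps the flag.

```python
import pandas as pd


def top_sketch(input_df: pd.DataFrame, n: int, by: str,
               ascending: bool = False) -> pd.DataFrame:
    """Sort on `by` and keep the first n rows of the sorted dataframe."""
    # In pandas, ascending=False puts the largest values first,
    # and ascending=True puts the smallest values first.
    return input_df.sort_values(by=by, ascending=ascending).head(n)


report = pd.DataFrame({
    "UserName": ["alice", "bob", "carol", "dave"],
    "Actions": [5, 42, 17, 8],
})
largest = top_sketch(report, n=2, by="Actions", ascending=False)
smallest = top_sketch(report, n=2, by="Actions", ascending=True)
```

In this pandas sketch, `largest` holds the rows for `bob` and `carol`, and `smallest` holds the rows for `alice` and `dave`.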