reports

TH.reports.geolocate(*args, **kwargs) → pandas.core.frame.DataFrame

Adds geolocation info to input DataFrame rows.

Parameters
  • input – pandas DataFrame to which geolocation info is added

  • ip_column – Name of the input column containing IPs to geolocate

Returns

Merge of the input pandas DataFrame and the geolocation info
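
As a rough illustration of what a "merge of the input DataFrame and geolocation info" can look like, the sketch below uses plain pandas and a small hand-written lookup table. The GEO_TABLE lookup, the geolocate_sketch helper, and the country/city column names are assumptions made for this example; they are not the actual TH.reports.geolocate implementation or its output columns.

```python
import pandas as pd

# Hypothetical lookup table standing in for a real geolocation database.
# The addresses come from ranges reserved for documentation (RFC 5737).
GEO_TABLE = pd.DataFrame({
    "ip": ["192.0.2.1", "198.51.100.7"],
    "country": ["ES", "US"],
    "city": ["Madrid", "Boston"],
})

def geolocate_sketch(input_df: pd.DataFrame, ip_column: str) -> pd.DataFrame:
    """Left-merge geolocation columns onto the input rows (illustrative only)."""
    # Rename the lookup key so both frames share the same join column.
    lookup = GEO_TABLE.rename(columns={"ip": ip_column})
    # A left merge keeps every input row; unknown IPs get NaN geolocation values.
    return input_df.merge(lookup, how="left", on=ip_column)

events = pd.DataFrame({
    "UserName": ["alice", "bob", "carol"],
    "SourceIP": ["192.0.2.1", "203.0.113.9", "198.51.100.7"],
})
located = geolocate_sketch(events, ip_column="SourceIP")
print(located)
```

A left merge is used here so that rows with unresolvable IPs are kept rather than dropped; whether the real function behaves this way is not stated in this reference.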

TH.reports.group(*args, **kwargs) → pandas.core.frame.DataFrame

Builds a report pandas.DataFrame from the given dataframe by grouping its rows on the given columns and counting the rows in each group.

Parameters
  • input – the source dataframe

  • by – List of columns of the source dataframe used to group the rows

  • sum – When defined, sums the values of the specified column for each group instead of counting rows

  • name – Name of the additional column created with the count for each of the grouped rows

Code example:

def do_report(source_dataframe):
    return reports.group(source_dataframe, by=['MUID', 'UserName'], name='Actions')

Returns

DataFrame with the resulting data or exception
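
For readers who want to see the grouping behaviour in isolation, the following is a minimal plain-pandas sketch of the count-or-sum logic described above. The group_sketch helper and the sample data are hypothetical, and the sum parameter is renamed sum_column here only to avoid shadowing the Python builtin; the real TH.reports.group may differ in ordering and error handling.

```python
import pandas as pd

def group_sketch(input_df, by, name="count", sum_column=None):
    """Group rows on `by` and either count them or sum one column."""
    grouped = input_df.groupby(by)
    if sum_column is None:
        # Default: number of rows in each group.
        result = grouped.size()
    else:
        # Alternative: total of `sum_column` within each group.
        result = grouped[sum_column].sum()
    # Turn the group keys back into columns and name the new value column.
    return result.reset_index(name=name)

actions = pd.DataFrame({
    "MUID": ["m1", "m1", "m2", "m1"],
    "UserName": ["alice", "alice", "bob", "carol"],
    "Bytes": [10, 20, 5, 1],
})
counts = group_sketch(actions, by=["MUID", "UserName"], name="Actions")
totals = group_sketch(actions, by=["MUID", "UserName"], name="TotalBytes",
                      sum_column="Bytes")
print(counts)
print(totals)
```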

TH.reports.profile(left: pandas.core.frame.DataFrame, right: pandas.core.frame.DataFrame, column: str) → pandas.core.frame.DataFrame

Obtains a pandas.DataFrame as the result of profiling two (left and right) dataframes.

Both dataframes must be of the same type or else the operation will fail.
The resulting dataframe contains the source rows, excluding those whose value in the given column matches a value in the train dataframe.

Parameters
  • left – The source dataframe with the current data

  • right – The dataframe against which the profiling is performed

  • column – The column we will use to perform the profiling

Code example:

def do_profile(train_period, test_period):
    df1 = obtain_dataframe(period=train_period)
    df2 = obtain_dataframe(period=test_period)
    return reports.profile(df1, df2, 'column_name')

Returns

DataFrame with the resulting data or exception
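
One way to read the profiling description is as an anti-join: keep the rows of one dataframe whose value in the chosen column never appears in the other. The sketch below shows that reading with plain pandas. The profile_sketch helper and the sample data are hypothetical, and the choice of left as the dataframe being filtered is an assumption based on the parameter descriptions, not a confirmed detail of TH.reports.profile.

```python
import pandas as pd

def profile_sketch(left, right, column):
    """Keep rows of `left` whose `column` value does not occur in `right`."""
    # Values already known from the reference (right) dataframe.
    seen = right[column].unique()
    # Boolean mask: True where the left value is new.
    is_new = ~left[column].isin(seen)
    return left[is_new].reset_index(drop=True)

current = pd.DataFrame({
    "UserName": ["alice", "bob", "dave"],
    "Actions": [3, 1, 7],
})
baseline = pd.DataFrame({
    "UserName": ["alice", "bob", "carol"],
    "Actions": [5, 2, 4],
})
new_rows = profile_sketch(current, baseline, "UserName")
print(new_rows)
```

Only the values of the chosen column are compared in this sketch; other columns, such as Actions, do not affect which rows are kept.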

TH.reports.top(*args, **kwargs) → pandas.core.frame.DataFrame

Obtains a pandas.DataFrame with the top n rows of the given dataframe, ordered by the given column.

Parameters
  • input – the source dataframe

  • n – Number of rows for the resulting dataframe

  • by – Name of the column used to order the dataframe results

  • ascending – True to return the n lowest results according to column by, False to return the n greatest

Code example:

def do_top(source_dataframe):
    return reports.top(source_dataframe, n=10, by='Actions', ascending=False)

Returns

DataFrame with the resulting data or exception
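
The ordering and truncation steps can be approximated in plain pandas as a sort followed by head. The sketch below assumes the usual pandas convention, in which ascending=False puts the largest values first. The top_sketch helper and the sample data are hypothetical, and the actual semantics of ascending in TH.reports.top should be checked against the library itself.

```python
import pandas as pd

def top_sketch(input_df, n, by, ascending=False):
    """Sort by one column and keep the first n rows."""
    ordered = input_df.sort_values(by=by, ascending=ascending)
    return ordered.head(n).reset_index(drop=True)

report = pd.DataFrame({
    "UserName": ["alice", "bob", "carol", "dave"],
    "Actions": [4, 9, 1, 7],
})
busiest = top_sketch(report, n=2, by="Actions", ascending=False)
quietest = top_sketch(report, n=2, by="Actions", ascending=True)
print(busiest)
print(quietest)
```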