logml.data.readers

Functions

filter_params_for_reader(path, **kwargs)

Selects the parameters applicable to the reader of a given file.

get_file_reader(path)

Parses a given file's extension and returns the corresponding 'pd.read_XXX' reader function.

load_dataframe(path, **kwargs)

Utility for loading a given file.

parse_dates(dataframe, **kwargs)

Parses date columns of a given dataframe.

read_dataframe(global_params)

Reads a dataframe defined by the given params and applies strata filtering.

sanitize_column(col_name, **kwargs)

Replaces disallowed characters in a given column name and converts it to lower case.

sanitize_columns(dataframe, **kwargs)

Sanitizes columns of a given dataframe.

logml.data.readers.sanitize_column(col_name: str, **kwargs) → str

Replaces disallowed characters in a given column name and converts it to lower case. Additional rules or flags may be added.
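The set of characters treated as disallowed is not listed here. A minimal sketch of the idea, assuming that anything other than ASCII letters, digits, and underscores is replaced with an underscore (the name sanitize_column_sketch and this rule are illustrative, not the library's implementation):

```python
import re


def sanitize_column_sketch(col_name: str) -> str:
    # Assumption: only ASCII letters, digits and '_' are allowed;
    # every other character becomes '_'.
    cleaned = re.sub(r"[^0-9a-zA-Z_]", "_", col_name)
    # Lower-casing mirrors the documented behaviour.
    return cleaned.lower()


print(sanitize_column_sketch("Patient ID (raw)"))
```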

logml.data.readers.sanitize_columns(dataframe: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Sanitizes columns of a given dataframe.

Special flags:
  • col_prefix - prefix to prepend to all column names

  • replace_dot - whether ‘.’ should be replaced in column names or not
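How the two flags could interact is sketched below. The function name, the default values, and the exact replacement rule are assumptions made for illustration; only the flag names col_prefix and replace_dot come from the description above.

```python
import re

import pandas as pd


def sanitize_columns_sketch(dataframe, col_prefix="", replace_dot=True):
    # Assumption: '.' is kept only when replace_dot is False.
    pattern = r"[^0-9a-zA-Z_]" if replace_dot else r"[^0-9a-zA-Z_.]"

    def clean(name):
        # Replace disallowed characters, lower-case, then add the prefix.
        return col_prefix + re.sub(pattern, "_", str(name)).lower()

    # rename returns a new dataframe; the input is left untouched.
    return dataframe.rename(columns=clean)


df = pd.DataFrame({"Gene.Name": [1], "Dose (mg)": [2]})
result = sanitize_columns_sketch(df, col_prefix="rna_", replace_dot=False)
print(list(result.columns))
```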

logml.data.readers.get_file_reader(path: str) → Callable

Parses a given file’s extension and returns the corresponding ‘pd.read_XXX’ reader function.
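Extension-based dispatch of this kind can be sketched with a lookup table. The set of supported extensions below is an assumption, and get_file_reader_sketch is a hypothetical stand-in, not the library function:

```python
from pathlib import Path
from typing import Callable

import pandas as pd

# Assumption: the real mapping may cover more (or different) extensions.
_READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
    ".parquet": pd.read_parquet,
    ".json": pd.read_json,
}


def get_file_reader_sketch(path: str) -> Callable:
    # Compare extensions case-insensitively.
    suffix = Path(path).suffix.lower()
    try:
        return _READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None


reader = get_file_reader_sketch("data/Table.CSV")
print(reader.__name__)
```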

logml.data.readers.filter_params_for_reader(path: str, **kwargs) → Dict

Selects the parameters applicable to the reader of a given file.
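One plausible way to select applicable parameters is to inspect the reader's signature and drop anything it does not accept. This is an assumption about the approach, and the sketch takes the reader callable directly rather than a path:

```python
import inspect

import pandas as pd


def filter_params_sketch(reader, **kwargs):
    parameters = inspect.signature(reader).parameters
    # A reader that declares **kwargs accepts everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values()):
        return dict(kwargs)
    # Otherwise keep only the keyword arguments the reader names explicitly.
    return {key: value for key, value in kwargs.items() if key in parameters}


params = filter_params_sketch(pd.read_csv, sep=";", encoding="utf-8", col_prefix="x_")
print(params)
```

Here col_prefix is dropped because pandas.read_csv has no such argument, while sep and encoding pass through.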

logml.data.readers.parse_dates(dataframe: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame

Parses date columns using the following params:
  • parse_dates - list of columns to parse

  • dateformat - expected format

  • datetime_errors - whether to raise or ignore date-parsing exceptions
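A minimal sketch of how these three params could map onto pandas.to_datetime. Forwarding dateformat to format and datetime_errors to errors is an assumption, as are the defaults and the name parse_dates_sketch:

```python
import pandas as pd


def parse_dates_sketch(dataframe, parse_dates=(), dateformat=None, datetime_errors="raise"):
    # Work on a copy so the caller's dataframe is not modified.
    result = dataframe.copy()
    for column in parse_dates:
        # Assumption: the params are forwarded to pandas.to_datetime as-is.
        result[column] = pd.to_datetime(
            result[column], format=dateformat, errors=datetime_errors
        )
    return result


df = pd.DataFrame({"visit": ["2021-03-01", "2021-04-15"], "score": [1.5, 2.0]})
parsed = parse_dates_sketch(df, parse_dates=["visit"], dateformat="%Y-%m-%d")
print(parsed.dtypes)
```

With errors="coerce", pandas.to_datetime turns unparseable values into NaT instead of raising, which is one way to tolerate bad dates.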

logml.data.readers.load_dataframe(path: str, **kwargs)

Utility for loading a given file.

If the input is a CSV file, additional parsing flags are applied:
  • parse_dates - list of columns that contain dates

  • dateformat - expected format used for date parsing

  • datetime_errors - whether to raise or ignore date-parsing exceptions

  • sep - separator used in CSV files

  • encoding - encoding of CSV files

  • header - whether a given file has a header or not

  • sanitize_columns - whether column names need to be cleaned or not

  • col_prefix - if column sanitizing is required, a given prefix will be prepended to all ‘clean’ column names

  • replace_dot - whether ‘.’ should be replaced in column names or not
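The flags above could combine roughly as follows for a CSV input. This is a sketch under stated assumptions: load_csv_sketch is a hypothetical name, sep, encoding and header are passed straight to pandas.read_csv, and dates are parsed before column names are sanitized. The actual order and defaults in the library may differ.

```python
import io
import re

import pandas as pd


def load_csv_sketch(source, parse_dates=(), dateformat=None, datetime_errors="raise",
                    sanitize_columns=False, col_prefix="", replace_dot=True,
                    **reader_kwargs):
    # sep, encoding and header are forwarded to pandas.read_csv unchanged.
    df = pd.read_csv(source, **reader_kwargs)
    # Assumed order: parse dates first, while the original column names still apply.
    for column in parse_dates:
        df[column] = pd.to_datetime(df[column], format=dateformat, errors=datetime_errors)
    if sanitize_columns:
        # Assumed rule: keep letters, digits and '_' (and '.' unless replace_dot).
        pattern = r"[^0-9a-zA-Z_]" if replace_dot else r"[^0-9a-zA-Z_.]"
        df = df.rename(
            columns=lambda name: col_prefix + re.sub(pattern, "_", str(name)).lower()
        )
    return df


raw = io.StringIO("Visit Date;Lab.Value\n01/03/2021;4.2\n15/04/2021;5.1\n")
df = load_csv_sketch(raw, sep=";", header=0, parse_dates=["Visit Date"],
                     dateformat="%d/%m/%Y", sanitize_columns=True, col_prefix="clin_")
print(list(df.columns))
```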

logml.data.readers.read_dataframe(global_params: Dict) → pandas.core.frame.DataFrame

Reads a dataframe defined by the given params and applies strata filtering.