LogML Pipeline
When you provide LogML with data and a configuration file, it first determines which individual steps need to be executed and how they depend on each other.
A typical dependency chain for each analysis looks like:

1. Data preprocessing: prepare the data for modeling. The result is a LogML Dataset entity.
2. Specific analysis: accept the dataset and perform the analysis. For example, the Feature Importance analysis performs modeling and extracts the features essential for explaining the target.
3. Report generation: use the artifacts generated by the analysis step to render a visualization of the results.
4. Result packaging: archive the report and the important analysis artifacts.
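The dependency chain above can be sketched as a sequence of steps where each one consumes only the output of the previous one. This is a minimal illustrative sketch, not the actual LogML API; all function names and data shapes here are hypothetical.

```python
# Hypothetical sketch of the LogML dependency chain described above.
# Function names and data shapes are illustrative assumptions.

def preprocess(raw):
    """Data preprocessing: prepare raw data for modeling (yields a 'dataset')."""
    return {"dataset": [x * 2 for x in raw]}

def analyze(dataset):
    """Specific analysis: build a model and produce artifacts (e.g. feature scores)."""
    return {"feature_scores": sorted(dataset["dataset"], reverse=True)}

def render_report(artifacts):
    """Report generation: turn analysis artifacts into a human-readable result."""
    return "feature scores: " + ", ".join(map(str, artifacts["feature_scores"]))

def package(report_text, artifacts):
    """Result packaging: bundle the report together with key artifacts."""
    return {"report": report_text, "artifacts": artifacts}

# Each step depends only on the output of the step before it.
raw = [3, 1, 2]
dataset = preprocess(raw)
artifacts = analyze(dataset)
report = render_report(artifacts)
bundle = package(report, artifacts)
```

The point of the chain is that steps are independently schedulable: once a step's inputs exist, it can run, which is what lets LogML derive the execution order from the configuration.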
Essentially, an “analysis” is a set of LogML components (a module) that receives incoming data, builds a model of some sort (statistical or machine learning) and then draws a conclusion about the data.
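To make the "receive data, build a model, conclude" pattern concrete, here is a toy feature-importance analysis. The class and method names are illustrative assumptions, not the real LogML component interface, and the "model" is deliberately trivial (per-class feature means).

```python
# Hypothetical sketch of an "analysis" component: fit a model, then draw a
# conclusion about the data. Not the actual LogML interface.
from statistics import mean

class FeatureImportanceAnalysis:
    """Toy analysis: the feature whose class means differ most is deemed
    the most important for explaining the binary target."""

    def fit(self, rows, target):
        # rows: list of feature dicts; target: parallel list of 0/1 labels
        self.diffs = {}
        for name in rows[0]:
            pos = [r[name] for r, t in zip(rows, target) if t == 1]
            neg = [r[name] for r, t in zip(rows, target) if t == 0]
            self.diffs[name] = abs(mean(pos) - mean(neg))
        return self

    def conclude(self):
        # Conclusion about the data: the feature with the largest gap
        # between the two target classes.
        return max(self.diffs, key=self.diffs.get)

rows = [{"a": 1.0, "b": 5.0}, {"a": 1.1, "b": 0.2},
        {"a": 0.9, "b": 5.1}, {"a": 1.0, "b": 0.1}]
target = [1, 0, 1, 0]
best = FeatureImportanceAnalysis().fit(rows, target).conclude()
```

Here feature "b" separates the two classes far more strongly than "a", so the analysis concludes it is the essential feature.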
In general, LogML analyses try to answer three types of questions:

- What is the relation between covariates and target variables? (Modeling and Survival Analysis)
- Given groups of samples, which features define their separation? (Expression Analysis)
- Is there anything specific about the statistical properties of the data? (Exploratory Analysis)
See the following sections for details on the kinds of analyses LogML provides.