Running LogML

LogML is not a simple script: it is a full-scale data analysis framework, and it ships with a broad set of Python libraries, mostly for data analysis and for visualizing results. These libraries are installed on the local machine as a so-called conda environment, which decouples the LogML-specific Python executable and its libraries from any other Python installations on the machine.
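
How this environment is created depends on your installation. A minimal sketch, assuming the distribution ships a conda specification file (the file name environment.yml is an assumption, not a documented artifact):

conda env create -f environment.yml  # one-time setup, assumed spec file
conda activate logml                 # environment name used throughout this guide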

To run LogML directly, you must first activate its conda environment. The distribution provides two entry points:

  • log_ml.sh - a bash script that activates the conda environment and launches log_ml.py (see the sketch after this list)

  • log_ml.py - the primary LogML ‘executable’, which provides the analysis, reporting, and utility commands.
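
If you prefer not to manage the environment manually, the wrapper script can be invoked directly. A minimal sketch, assuming log_ml.sh forwards its command-line arguments to log_ml.py:

./log_ml.sh pipeline --help  # activates the environment, then runs log_ml.py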

On a local machine the environment needs to be activated only once per shell session:

# Activate environment by name, or by full path:
conda activate logml

# Go to the root folder of the LogML distribution
cd /projects/logml

# Invoke LogML
python log_ml.py pipeline --help

In the examples below we use environment variables to keep the commands short; this is purely for illustration:

export LML_OUTPUT=~/logml_output
export LML_RUN_NAME=wine_modeling_01
export LML_CFG=~/logml/examples/wine/modeling.yaml
export LML_DATASET=~/logml/examples/wine/wine.csv


python log_ml.py pipeline run \
    -o $LML_OUTPUT \
    -c $LML_CFG \
    -d $LML_DATASET \
    -n $LML_RUN_NAME

This starts step-by-step execution of the pipeline. This example completes in a few minutes, but on real datasets a run may take considerably longer.
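
For long runs you can check progress from a second shell with the pipeline state subcommand (documented under Command Line Parameters below):

python log_ml.py pipeline state \
    -o $LML_OUTPUT \
    -n $LML_RUN_NAME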

After the job completes, you should find the following artifacts in the $LML_OUTPUT folder:

  • $LML_RUN_NAME folder with all the artifacts produced during the run

  • $LML_OUTPUT/$LML_RUN_NAME/$LML_RUN_NAME.zip - the result archive containing the most important artifacts (the report plus some supporting files)

$ # examine $LML_OUTPUT folder.
$ cd ~/logml_output && tree -L 2
  .
  └── wine_modeling_01  # $LML_RUN_NAME
      ├── default  # Strata data ("default" when no explicit stratification).
      ├── analysis
      ├── analysis_metadata
      ├── logs
      ├── _logs
      ├── _dag
      ├── report
      ├── release  # final report, folder zipped as $LML_RUN_NAME.zip
      └── wine_modeling_01.zip # $LML_RUN_NAME.zip
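
To inspect the packaged report, extract the result archive with standard tools; the destination folder name below is arbitrary:

cd ~/logml_output/wine_modeling_01
unzip wine_modeling_01.zip -d release_unpacked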

Command Line Parameters

log_ml.py

LogML - Data Analysis Framework.

log_ml.py [OPTIONS] COMMAND [ARGS]...

config

Configuration commands (validation, schema, etc.)

log_ml.py config [OPTIONS] COMMAND [ARGS]...

print-schema

Print configuration file schema.

log_ml.py config print-schema [OPTIONS]

Options

-o, --output <output>

File path to dump output to

--use-json, --no-use-json

Use JSON format (default is YAML)
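
For example, to print the schema to stdout in the default YAML format, or dump it to a file as JSON:

python log_ml.py config print-schema
python log_ml.py config print-schema -o schema.json --use-json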

validate

Check configuration file for validity.

log_ml.py config validate [OPTIONS] FILE

Options

-o, --output <output>

Path to export the configuration as a single file

Arguments

FILE

Required argument
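
For example, to validate the wine config from the quick-start above and export the resolved configuration (the output file name is illustrative):

python log_ml.py config validate -o config_export.yaml $LML_CFG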

info

Print LogML version and environment info.

log_ml.py info [OPTIONS]

models

Models commands.

log_ml.py models [OPTIONS] COMMAND [ARGS]...

list

List available models.

log_ml.py models list [OPTIONS]

Options

-o, --objective <objective>

Options: classification | regression | survival

--short

Output in short format, model names only.
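
For example, to list only the names of the available regression models:

python log_ml.py models list -o regression --short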

pipeline

Pipeline DAG commands.

log_ml.py pipeline [OPTIONS] COMMAND [ARGS]...

generate_dag

Generates a DAG for the pipeline config.

log_ml.py pipeline generate_dag [OPTIONS]

Options

-n, --run-name <run_name>

LogML Run Name/Identifier

-c, --config-path <config_path>

Path to the target config.

-o, --output-path <output_path>

Folder where output artifacts will be stored.

-d, --dataset-path <dataset_path>

Path to the input dataset file.
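
For example, to generate a DAG for the wine config using the variables defined earlier:

python log_ml.py pipeline generate_dag \
    -n $LML_RUN_NAME \
    -c $LML_CFG \
    -o $LML_OUTPUT \
    -d $LML_DATASET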

run

Run LogML pipeline.

log_ml.py pipeline run [OPTIONS]

Options

--step <step>

(Optional) Target pipeline steps to run.

-n, --run-name <run_name>

LogML Run Name/Identifier

-p, --project-id <project_id>

Unique project identifier (optional).

-c, --config-path <config_path>

Path to the config file. When executing a step, should be DAG config.

--job-completion-file <job_completion_file>

Path to a file to be created after the job completes

-d, --dataset-path <dataset_path>

Path to the input dataset file.

-m, --mct-config-path <mct_config_path>

Path to the MCT configuration (metadata) file.

-o, --output-path <output_path>

Folder where output artifacts will be stored.

-j, --job-id <job_id>

Optional job id for diagnostic purposes.

-l, --log-file <log_file>

Optional log file name (full path preferred).

--profile, --no-profile

Track execution time with a stopwatch.

--sig-db-path <sig_db_path>

Path to a local export from a signature database (such as MSigDB).

--dry-run, --no-dry-run

Dry run (only output intentions).

--debug, --no-debug

Debug mode (removes some DAG constraints for debugging).

--reset, --no-reset

When set, resets the DAG state (ignored when "--step" is provided).

--njobs <njobs>

Number of jobs to execute in parallel.

Environment variables

LOGML_DEBUG

Provide a default for --debug
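
For example, to preview what a run would do without executing it, then launch it with four parallel jobs:

# Dry run: only output intentions
python log_ml.py pipeline run \
    -o $LML_OUTPUT -c $LML_CFG -d $LML_DATASET -n $LML_RUN_NAME \
    --dry-run

# Actual run with parallel execution
python log_ml.py pipeline run \
    -o $LML_OUTPUT -c $LML_CFG -d $LML_DATASET -n $LML_RUN_NAME \
    --njobs 4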

state

Describe the pipeline state (useful while a pipeline is running).

log_ml.py pipeline state [OPTIONS]

Options

-n, --run-name <run_name>

LogML Run Name/Identifier

-o, --output-path <output_path>

Folder where output artifacts will be stored.