Running LogML
LogML is not a simple script: it is a full-scale data analysis framework, and it ships with a large set of Python libraries, mostly for data analysis and for visualization of its results. These libraries are installed on the local machine as a so-called conda environment, which isolates the LogML-specific Python executable and all of its libraries from any other Python installations.
To run LogML directly, we first have to activate its conda environment.
log_ml.sh - a bash script that activates the conda environment and launches log_ml.py
log_ml.py - the primary LogML 'executable' file, from which you can run analysis, reporting, and utility routines.
On a local machine the environment needs to be activated only once per shell session:
# Activate environment by name, or by full path:
conda activate logml
# Go to the root folder of the LogML distribution
cd /projects/logml
# Invoke LogML
python log_ml.py pipeline --help
The steps below use environment variables to keep the commands readable; this is purely for illustration:
export LML_OUTPUT=~/logml_output
export LML_RUN_NAME=wine_modeling_01
export LML_CFG=~/logml/examples/wine/modeling.yaml
export LML_DATASET=~/logml/examples/wine/wine.csv
python log_ml.py pipeline run \
-o $LML_OUTPUT \
-c $LML_CFG \
-d $LML_DATASET \
-n $LML_RUN_NAME
This starts step-by-step pipeline execution, which finishes within several minutes for this example; real-world runs can take much longer.
After the job is completed, you should expect the following artifacts within the $LML_OUTPUT folder:
- $LML_OUTPUT/$LML_RUN_NAME - folder with all the artifacts produced during the run
- $LML_OUTPUT/$LML_RUN_NAME/$LML_RUN_NAME.zip - the result archive with the most important artifacts (report + some supporting artifacts)
$ # examine $LML_OUTPUT folder.
$ cd ~/logml_output && tree -L 2
.
└── wine_modeling_01 # $LML_RUN_NAME
├── default # Strata data ("default" when no explicit stratification).
├── analysis
├── analysis_metadata
├── logs
├── _logs
├── _dag
├── report
├── release # final report, folder zipped as $LML_RUN_NAME.zip
└── wine_modeling_01.zip # $LML_RUN_NAME.zip
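Once a run has finished, the result archive can be inspected without starting LogML. A minimal sketch using Python's standard-library zipfile command-line interface (the paths reuse the illustrative variables from the run example; any unzip tool works equally well):

```shell
# Sketch: inspect a finished run's result archive. Paths reuse the
# illustrative variables from the run example above, with fallback defaults
# so the commands are copy-pastable.
LML_OUTPUT="${LML_OUTPUT:-$HOME/logml_output}"
LML_RUN_NAME="${LML_RUN_NAME:-wine_modeling_01}"
ARCHIVE="$LML_OUTPUT/$LML_RUN_NAME/$LML_RUN_NAME.zip"

if [ -f "$ARCHIVE" ]; then
    # List the archive contents with Python's standard-library zipfile CLI.
    python -m zipfile -l "$ARCHIVE"
else
    echo "archive not found yet: $ARCHIVE"
fi
```

To unpack the archive, `python -m zipfile -e "$ARCHIVE" <target_folder>` extracts it to the given folder.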
Command Line Parameters
log_ml.py
LogML - Data Analysis Framework.
log_ml.py [OPTIONS] COMMAND [ARGS]...
config
Configuration commands (Validation, schema, etc.)
log_ml.py config [OPTIONS] COMMAND [ARGS]...
print-schema
Print configuration file schema.
log_ml.py config print-schema [OPTIONS]
Options
- -o, --output <output>
File path to dump output to
- --use-json, --no-use-json
Use json format (default is yaml)
validate
Check configuration file for validity.
log_ml.py config validate [OPTIONS] FILE
Options
- -o, --output <output>
Path to export the configuration as a single file
Arguments
- FILE
Required argument
info
Print LogML version and environment info.
log_ml.py info [OPTIONS]
models
Models commands.
log_ml.py models [OPTIONS] COMMAND [ARGS]...
list
List available models.
log_ml.py models list [OPTIONS]
Options
- -o, --objective <objective>
- Options
classification | regression | survival
- --short
Output in short format, model names only.
pipeline
Pipeline DAG commands.
log_ml.py pipeline [OPTIONS] COMMAND [ARGS]...
generate_dag
Generates DAG for the pipeline config.
log_ml.py pipeline generate_dag [OPTIONS]
Options
- -n, --run-name <run_name>
LogML Run Name/Identifier
- -c, --config-path <config_path>
Path to the target config.
- -o, --output-path <output_path>
Folder where output artifacts will be stored.
- -d, --dataset-path <dataset_path>
Path to the input dataset file.
run
Run LogML pipeline.
log_ml.py pipeline run [OPTIONS]
Options
- --step <step>
(Optional) Target pipeline steps to run.
- -n, --run-name <run_name>
LogML Run Name/Identifier
- -p, --project-id <project_id>
Unique project identifier (optional).
- -c, --config-path <config_path>
Path to the config file. When executing a step, should be DAG config.
- --job-completion-file <job_completion_file>
Path to a file to be created after the job is complete
- -d, --dataset-path <dataset_path>
Path to the input dataset file.
- -m, --mct-config-path <mct_config_path>
Path to the MCT configuration (metadata) file.
- -o, --output-path <output_path>
Folder where output artifacts will be stored.
- -j, --job-id <job_id>
Optional job id for diagnostic purpose.
- -l, --log-file <log_file>
Optional log file name (full path preferred).
- --profile, --no-profile
Track execution time with stopwatch.
- --sig-db-path <sig_db_path>
Path to the local export from Signature database (such as MSigDB).
- --dry-run, --no-dry-run
Dry run (only output intentions).
- --debug, --no-debug
Debug mode (removes some dag constraints for debugging).
- --reset, --no-reset
When set, resets the DAG state (ignored when "--step" is provided).
- --njobs <njobs>
Number of jobs to execute in parallel.
Environment variables
- LOGML_DEBUG
Provide a default for --debug
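The --job-completion-file option above lets an outer script detect when a run has finished: LogML creates the file at that path once the job completes. A minimal sketch of the waiting side (the function name and the timeout handling are our own additions, not part of LogML):

```shell
# Block until the completion marker file appears, or give up after
# `timeout` seconds. Pair this with a backgrounded run, e.g.:
#   python log_ml.py pipeline run ... --job-completion-file "$marker" &
wait_for_completion() {
    local marker="$1" timeout="${2:-3600}" waited=0
    until [ -f "$marker" ]; do
        sleep 1
        waited=$((waited + 1))
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # timed out
        fi
    done
    return 0            # marker found: the job is complete
}
```

The polling interval of one second is arbitrary; for long runs a coarser interval is just as good.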
state
Describe pipeline state (useful when running).
log_ml.py pipeline state [OPTIONS]
Options
- -n, --run-name <run_name>
LogML Run Name/Identifier
- -o, --output-path <output_path>
Folder where output artifacts will be stored.