logml.configuration.stratification

Functions

patch_global_params_using_strata(global_params)

Patches the given global_params using stratum information.

stratify_global_params(cfg, global_params)

Creates a global config for each stratum.

class logml.configuration.stratification.Strata

Bases: pydantic.main.BaseModel

Defines a data subset (stratum) by means of a query that filters the original dataset.

This allows you to run the analysis against different subsets of the data and then compare the results.

A typical example is a per-arm report (assuming your input file contains a column named arm):

stratification:
  # Separate the data by treatment arm into two groups.
  - strata_id: A_arm
    query: 'arm == "A"'
  - strata_id: BC_arms
    query: 'arm.isin(["B", "C"])'
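To illustrate what the two strata in the YAML example above select, here is a minimal sketch that assumes each query string is evaluated with pandas' DataFrame.query. The source only says the queries follow Python syntax, so the pandas backend is an assumption, and the toy data is made up.

```python
import pandas as pd

# Toy dataset with a treatment-arm column (made-up values for illustration).
df = pd.DataFrame({
    "sample_id": [1, 2, 3, 4, 5],
    "arm": ["A", "B", "C", "A", "B"],
})

# The two strata from the YAML example, as (strata_id, query) pairs.
strata = [
    ("A_arm", 'arm == "A"'),
    ("BC_arms", 'arm.isin(["B", "C"])'),
]

# Assumption: each query is evaluated with pandas' DataFrame.query.
# engine="python" avoids depending on the optional numexpr package.
subsets = {sid: df.query(q, engine="python") for sid, q in strata}

for sid, subset in subsets.items():
    print(sid, subset["sample_id"].tolist())
# prints: A_arm [1, 4] then BC_arms [2, 3, 5]
```

Each stratum is simply a filtered view of the same input, so the per-arm analyses can later be compared side by side.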

JSON schema
{
   "title": "Strata",
   "description": "Defines a data subset (stratum) by means of a query that filters the original dataset.\n\nThis allows you to run the analysis against different subsets of the data and then compare the results.\n\nA typical example is a per-arm report (assuming your input file contains a column named arm):\n\n.. code-block:: yaml\n\n    stratification:\n      # Separate the data by treatment arm into two groups.\n      - strata_id: A_arm\n        query: 'arm == \"A\"'\n      - strata_id: BC_arms\n        query: 'arm.isin([\"B\", \"C\"])'",
   "type": "object",
   "properties": {
      "strata_id": {
         "title": "Strata Id",
         "description": "Unique identifier for a stratum. This identifier is also used as the name of the folder in which stratum-related data is stored on disk, so it should not contain characters that are special in file paths, such as slashes. NOTE: spaces are replaced with underscores.",
         "type": "string"
      },
      "query": {
         "title": "Query",
         "description": "Query expression that selects the samples belonging to the corresponding stratum. It follows Python syntax, which differs considerably from SQL. See :ref:`Dataset Queries` for details.",
         "type": "string"
      }
   },
   "required": [
      "strata_id",
      "query"
   ]
}
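The schema asks for exactly two properties, both required and both strings. As a rough sketch of what that means for a single entry of the stratification list, here is a standard-library-only check. The helper schema_errors is hypothetical and not part of logml; it handles only the required and type keywords that this schema uses and is not a general JSON Schema validator. In logml itself, validation goes through the pydantic model this class is based on.

```python
import json

# A trimmed copy of the Strata JSON schema above (descriptions omitted).
STRATA_SCHEMA = json.loads("""
{
  "title": "Strata",
  "type": "object",
  "properties": {
    "strata_id": {"title": "Strata Id", "type": "string"},
    "query": {"title": "Query", "type": "string"}
  },
  "required": ["strata_id", "query"]
}
""")


def schema_errors(entry: dict, schema: dict = STRATA_SCHEMA) -> list:
    """Hypothetical helper: list problems; an empty list means the entry conforms.

    Only the "required" list and "string" property types are checked.
    """
    errors = []
    # Every key listed under "required" must be present.
    for key in schema["required"]:
        if key not in entry:
            errors.append(f"missing required field: {key}")
    # Present properties declared as strings must actually be strings.
    for key, spec in schema["properties"].items():
        if key in entry and spec.get("type") == "string" and not isinstance(entry[key], str):
            errors.append(f"field {key} must be a string")
    return errors


print(schema_errors({"strata_id": "A_arm", "query": 'arm == "A"'}))  # []
print(schema_errors({"strata_id": "A_arm"}))  # ['missing required field: query']
```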

Fields
field strata_id: str [Required]

Unique identifier for a stratum. This identifier is also used as the name of the folder in which stratum-related data is stored on disk, so it should not contain characters that are special in file paths, such as slashes. NOTE: spaces are replaced with underscores.
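The folder-name rules for strata_id can be sketched as follows. The function strata_folder_name is hypothetical and not a logml API. It applies the documented space-to-underscore replacement and, as an assumption of this sketch, raises an error on slashes; the description above only says slashes should not be used and does not state how logml reacts to them.

```python
def strata_folder_name(strata_id: str) -> str:
    """Hypothetical helper mirroring the documented strata_id rules."""
    # Assumption: reject path separators instead of silently accepting them.
    if "/" in strata_id or "\\" in strata_id:
        raise ValueError(f"strata_id must not contain slashes: {strata_id!r}")
    # Documented behaviour: spaces are replaced with underscores.
    return strata_id.replace(" ", "_")


print(strata_folder_name("BC arms"))  # BC_arms
```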

field query: str [Required]

Query expression that selects the samples belonging to the corresponding stratum. It follows Python syntax, which differs considerably from SQL. See Dataset Queries for details.
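To make the contrast with SQL concrete, here is a sketch that again assumes queries are evaluated with pandas' DataFrame.query (the source does not state the backend explicitly). The age column and all values are made up for illustration.

```python
import pandas as pd

# Made-up data; "age" is a hypothetical column used only for illustration.
df = pd.DataFrame({"arm": ["A", "B", "A", "C"], "age": [17, 45, 62, 30]})

# SQL:          WHERE arm = 'A' AND age >= 18
# Python-style: == for equality, lowercase "and" to combine conditions.
adults_in_a = df.query('arm == "A" and age >= 18', engine="python")

# SQL:          WHERE arm IN ('B', 'C')
# Python-style: the "in" operator with a list literal.
arms_b_or_c = df.query('arm in ["B", "C"]', engine="python")

print(adults_in_a.index.tolist())  # [2]
print(arms_b_or_c["arm"].tolist())  # ['B', 'C']
```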

logml.configuration.stratification.patch_global_params_using_strata(global_params: Dict, strata: Optional[logml.configuration.stratification.Strata] = None) → Dict

Patches the given global_params using stratum information.

logml.configuration.stratification.stratify_global_params(cfg: GlobalConfig, global_params: dict) → List[Dict]

Creates a global config for each stratum.