logml.configuration.stratification
Functions
| patch_global_params_using_strata | Patches a given global_params using strata info. |
| stratify_global_params | Creates global configs per strata. |
- class logml.configuration.stratification.Strata
Bases: pydantic.main.BaseModel
Defines a data subset (stratum) by means of a query that filters the original dataset.
This allows running the analysis against different subsets of the data and then comparing the results.
A typical example: to get a per-arm report (assuming your input file contains an arm column):
    stratification:
      # Separate the data by treatment arm into two groups.
      - strata_id: A_arm
        query: 'arm == "A"'
      - strata_id: BC_arms
        query: 'arm.isin(["B", "C"])'
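To illustrate how the two queries in the example partition a dataset, here is a minimal sketch. It assumes the query strings are evaluated the way pandas `DataFrame.query` evaluates them; this page does not confirm that, so see Dataset Queries for the authoritative rules. The toy DataFrame and the `patient_id` column are made up for illustration.

```python
import pandas as pd

# Toy dataset with an `arm` column (made-up values, for illustration only).
df = pd.DataFrame(
    {
        "patient_id": [1, 2, 3, 4, 5],
        "arm": ["A", "B", "C", "A", "B"],
    }
)

# The two strata from the example, written as plain dicts.
strata = [
    {"strata_id": "A_arm", "query": 'arm == "A"'},
    {"strata_id": "BC_arms", "query": 'arm.isin(["B", "C"])'},
]

# Assumption: each query is applied like DataFrame.query.
# engine="python" avoids depending on the optional numexpr package.
subsets = {
    s["strata_id"]: df.query(s["query"], engine="python") for s in strata
}

for strata_id, subset in subsets.items():
    print(strata_id, subset["patient_id"].tolist())
```

Each stratum gets its own filtered copy of the rows, so the same analysis can be run once per subset and the outputs compared side by side.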
JSON schema:
{
  "title": "Strata",
  "description": "Defines a data subset (stratum) by means of a query that filters the original dataset.\n\nThis allows running the analysis against different subsets of the data and then comparing the results.\n\nA typical example: to get a per-arm report (assuming your input file contains an arm column):\n\n.. code-block:: yaml\n\n    stratification:\n      # Separate the data by treatment arm into two groups.\n      - strata_id: A_arm\n        query: 'arm == \"A\"'\n      - strata_id: BC_arms\n        query: 'arm.isin([\"B\", \"C\"])'",
  "type": "object",
  "properties": {
    "strata_id": {
      "title": "Strata Id",
      "description": "Unique identifier for a stratum. This identifier is also used as the name of the folder that stores stratum-related data on disk, so it should not contain filesystem-specific characters such as slashes. NOTE: spaces will be replaced with underscores.",
      "type": "string"
    },
    "query": {
      "title": "Query",
      "description": "Query-like expression that specifies how to select samples for the corresponding stratum. The expression follows Python syntax, not SQL. See :ref:`Dataset Queries` for details.",
      "type": "string"
    }
  },
  "required": [
    "strata_id",
    "query"
  ]
}
- Fields
- field strata_id: str [Required]
Unique identifier for a stratum. This identifier is also used as the name of the folder that stores stratum-related data on disk, so it should not contain filesystem-specific characters such as slashes. NOTE: spaces will be replaced with underscores.
- field query: str [Required]
Query-like expression that specifies how to select samples for the corresponding stratum. The expression follows Python syntax, not SQL. See Dataset Queries for details.
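The folder-name constraint on strata_id can be sketched as follows. `normalize_strata_id` is a hypothetical helper, not part of logml. Only the space-to-underscore replacement is documented above; raising an error on slashes is an assumption made here to show the constraint, and logml may handle such identifiers differently.

```python
def normalize_strata_id(strata_id: str) -> str:
    """Hypothetical helper: make a strata_id usable as a folder name."""
    # Assumption: reject path separators outright, since they would
    # create nested or invalid folder names.
    if "/" in strata_id or "\\" in strata_id:
        raise ValueError(f"strata_id must not contain slashes: {strata_id!r}")
    # Documented behavior: spaces are replaced with underscores.
    return strata_id.replace(" ", "_")


print(normalize_strata_id("BC arms"))  # BC_arms
```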
- logml.configuration.stratification.patch_global_params_using_strata(global_params: Dict, strata: Optional[logml.configuration.stratification.Strata] = None) → Dict
Patches a given global_params using strata info.
- logml.configuration.stratification.stratify_global_params(cfg: GlobalConfig, global_params: dict) → List[Dict]
Creates global configs per strata.
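This page does not say how the two functions build their results, so the following is only a rough sketch of the general "one config per stratum" pattern, not logml's implementation. The function names `patch_params_sketch` and `stratify_params_sketch`, the `"stratum"` key, and the use of plain dicts in place of Strata and GlobalConfig are all invented for illustration.

```python
import copy
from typing import Dict, List, Optional


def patch_params_sketch(global_params: Dict, stratum: Optional[Dict] = None) -> Dict:
    """Hypothetical stand-in: copy global_params and patch in one stratum."""
    patched = copy.deepcopy(global_params)  # never mutate the caller's dict
    if stratum is not None:
        # The "stratum" key is invented for this sketch.
        patched["stratum"] = dict(stratum)
    return patched


def stratify_params_sketch(global_params: Dict, strata: List[Dict]) -> List[Dict]:
    """Hypothetical stand-in: build one patched config per stratum."""
    return [patch_params_sketch(global_params, s) for s in strata]


base = {"dataset_path": "data.csv"}  # made-up parameter
strata = [
    {"strata_id": "A_arm", "query": 'arm == "A"'},
    {"strata_id": "BC_arms", "query": 'arm.isin(["B", "C"])'},
]
configs = stratify_params_sketch(base, strata)
print([c["stratum"]["strata_id"] for c in configs])
```

Deep-copying before patching keeps each per-stratum config independent, so a change made for one stratum cannot leak into another or into the original parameters.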