logml.data.datasets.survival_dataset
Functions
make_survival_target | Generate survival targets in scikit-survival library style.
Classes
SurvivalDataset | Dataset extension for survival analysis.
- logml.data.datasets.survival_dataset.make_survival_target(dataframe: pandas.core.frame.DataFrame, event_query: str, time_column: str) → numpy.array
Generate survival targets in scikit-survival library style.
The result is a NumPy structured array with shape (nrows,) and two fields: event (np.bool) and time (np.float64).
- Parameters
dataframe – Dataframe to fetch columns from.
event_query – Event query. If the dtype of the query result is not bool, the result is cast to bool.
time_column – Time column. Its values are cast to float64.
- Returns
Survival targets in scikit-survival style.
- Return type
np.array
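A minimal sketch of how a target with this layout can be assembled from pandas and NumPy. This is not the library's implementation: the helper name `sketch_survival_target` is hypothetical, and evaluating `event_query` with `DataFrame.eval` is an assumption about the query syntax.

```python
import numpy as np
import pandas as pd


def sketch_survival_target(dataframe, event_query, time_column):
    """Hypothetical helper: build a structured array with (event, time) fields."""
    # Assumption: the query is a pandas-style expression such as "cens == 1".
    events = dataframe.eval(event_query).astype(bool).to_numpy()
    times = dataframe[time_column].astype(np.float64).to_numpy()

    # One record per row, with a boolean field and a float64 field.
    target = np.empty(len(dataframe), dtype=[("event", bool), ("time", np.float64)])
    target["event"] = events
    target["time"] = times
    return target


df = pd.DataFrame({"cens": [1, 0, 1], "time": [5, 12, 7], "age": [61, 54, 70]})
y = sketch_survival_target(df, "cens == 1", "time")
```

Fields of the resulting array are read by name: `y["event"]` holds the boolean indicators and `y["time"]` the float64 times.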
- class logml.data.datasets.survival_dataset.UnivariateSurvivalContainer
Bases: pydantic.main.BaseModel
Wrapper for survival target (events and times) and one feature.
Show JSON schema
{
  "title": "UnivariateSurvivalContainer",
  "description": "Wrapper for survival target (events and times) and one feature.",
  "type": "object",
  "properties": {
    "column_name": {
      "title": "Column Name",
      "description": "Name of the only column of interest.",
      "type": "string"
    },
    "events": {
      "title": "Events",
      "description": "List of indicators of whether events occurred.",
      "type": "array",
      "items": {}
    },
    "times": {
      "title": "Times",
      "description": "List of time-to-event measurements (OS, PFS, etc.).",
      "type": "array",
      "items": { "type": "number" }
    },
    "values": {
      "title": "Values",
      "description": "List of values that correspond to the only variable.",
      "type": "array",
      "items": {}
    },
    "threshold": {
      "title": "Threshold",
      "description": "Threshold that is used to split the values into Low and High groups.\n NOTE: applicable for numericals only.",
      "type": "number"
    }
  },
  "required": [ "column_name", "events", "times", "values" ]
}
- field column_name: str [Required]
Name of the only column of interest.
- field events: List [Required]
List of indicators of whether events occurred.
- field times: List[float] [Required]
List of time-to-event measurements (OS, PFS, etc.).
- field values: List [Required]
List of values that correspond to the only variable.
- field threshold: float = None
Threshold used to split the values into Low and High groups. Note: applies to numerical values only.
- discretize_values()
Binarizes values based on a threshold.
If the values are not already discrete, the median is used as the threshold to create two groups, ‘Low’ and ‘High’.
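The median split described above can be sketched in a few lines. This is an illustration only, not the method's actual code: the function name `sketch_discretize` is hypothetical, and assigning values equal to the threshold to ‘Low’ is an assumption the documentation does not confirm.

```python
from statistics import median


def sketch_discretize(values, threshold=None):
    """Hypothetical helper: label each numeric value as 'Low' or 'High'."""
    # Fall back to the median when no explicit threshold is given.
    if threshold is None:
        threshold = median(values)
    # Assumption: values strictly above the threshold count as 'High'.
    return ["High" if value > threshold else "Low" for value in values]


groups = sketch_discretize([1.0, 2.0, 3.0, 10.0])
```

With these four values the median is 2.5, so the first two fall into ‘Low’ and the last two into ‘High’.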
- groups_to_str() → str
Returns a string representation of available values.
- get_valid_cut_offs(n_percentiles: int, min_population: float) → List[float]
Returns a list of percentile cut-offs that split the range of values into valid parts.
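One plausible reading of this method, sketched under loudly stated assumptions: candidate cut-offs are taken at evenly spaced percentiles, and a cut-off counts as valid when each side keeps at least a `min_population` fraction of the samples. The function name `sketch_valid_cut_offs`, the percentile spacing, and the interpretation of `min_population` as a fraction are all guesses, not documented behaviour.

```python
import numpy as np


def sketch_valid_cut_offs(values, n_percentiles, min_population):
    """Hypothetical helper: percentile cut-offs that leave both groups large enough."""
    values = np.asarray(values, dtype=float)

    # Assumption: evenly spaced percentiles strictly between 0 and 100.
    percentiles = np.linspace(0, 100, n_percentiles + 2)[1:-1]
    candidates = np.unique(np.percentile(values, percentiles))

    valid = []
    for cut in candidates:
        # Assumption: min_population is the minimum share of samples per group.
        low_share = np.mean(values <= cut)
        high_share = 1.0 - low_share
        if low_share >= min_population and high_share >= min_population:
            valid.append(float(cut))
    return valid


cut_offs = sketch_valid_cut_offs(range(1, 11), n_percentiles=3, min_population=0.4)
```

For the values 1 to 10, the candidates are the 25th, 50th and 75th percentiles; only the median leaves at least 40% of the samples on each side.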
- property size: int
Returns the number of samples in the container.
- class logml.data.datasets.survival_dataset.SurvivalDataset(*dont_use_positional_args, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None, objective_cfg: Optional[logml.configuration.modeling.ModelingTaskSpec] = None, cross_validator: Optional[Union[sklearn.model_selection._split.BaseCrossValidator, Iterable]] = None, features: Optional[List[str]] = None, logger=None, **kwargs)
Bases: logml.data.datasets.cv_dataset.ModelingDataset, logml.data.datasets.base.CrossValidationMixin
Dataset extension for survival analysis. Expects a special field, ‘event_column’, which by default is included in the target variable in the form (event, survival_time).
Example config:

.. code-block:: yaml

    modeling:
      problems:
        y_regression:
          metadata:
            task: survival
            target: 'time'
            event_query: 'cens == 1'  # query to generate boolean value.
            # event column (not to mix it with features)
            event_column: 'cens'
            target_metric: cindex
Notes on the event column:
- Use ‘event_query’ to specify which values map to an event (for example, ‘x1 == "YES"’ or ‘zz == 0’).
- Use ‘event_column’ to specify which column was used in the event query and should therefore be excluded from the general list of features.
- The result of ‘event_query’ is cast to boolean and used as the event indicator for the downstream model.
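The interplay of the three settings can be illustrated with plain pandas. This is a sketch, not SurvivalDataset's internal logic: the data frame and variable names are made up, and evaluating the query with `DataFrame.eval` is an assumption.

```python
import pandas as pd

# Illustrative data; 'cens' is the raw event column, 'age' an ordinary feature.
df = pd.DataFrame({
    "time": [5.0, 12.0, 7.0],
    "cens": [1, 0, 1],
    "age": [61, 54, 70],
})

target = "time"
event_query = "cens == 1"
event_column = "cens"

# The query result is cast to boolean and serves as the event indicator.
events = df.eval(event_query).astype(bool)

# Both the target and the event column are kept out of the feature list.
features = [column for column in df.columns if column not in (target, event_column)]
```

Here `events` marks the first and third rows as events, and `features` contains only `age`.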
- LABEL = 'cv_survival_dataset'
- get_target_values() → numpy.array
Fetches an array of shape (rows,) in which each element is a tuple (event_column, target_column).
- property event_column
Event column for survival analysis.
- property event_query
Event query for survival analysis.
- get_target_columns() → List
Returns the list of target columns.
- get_univariate_container(column_name: str, drop_nans=True) → Optional[logml.data.datasets.survival_dataset.UnivariateSurvivalContainer]
Returns the survival targets and the values of the given column wrapped in a UnivariateSurvivalContainer.
If drop_nans is True, samples with NaN in the column's values are dropped.
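The NaN handling can be pictured as a row mask applied to events, times and values together, so the three lists stay aligned. The sketch below uses plain pandas and a dictionary whose keys mirror the container's fields; the data and the `container_like` name are illustrative and this is not the method's actual code.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [5.0, 12.0, 7.0, 3.0],
    "cens": [1, 0, 1, 1],
    "marker": [0.2, None, 1.4, 0.9],  # the second sample has no marker value
})

column_name = "marker"

# Keep only the rows where the column of interest is present.
mask = df[column_name].notna()

# Keys mirror the fields of UnivariateSurvivalContainer.
container_like = {
    "column_name": column_name,
    "events": df.loc[mask, "cens"].eq(1).tolist(),
    "times": df.loc[mask, "time"].tolist(),
    "values": df.loc[mask, column_name].tolist(),
}
```

The second sample is removed from all three lists, leaving three aligned entries.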