Welcome to i2bmi’s documentation!

i2bmi.assign_comorbidities(df, column_code, column_version, columns_id, verbose=False)

Assign Elixhauser/Charlson comorbidities and comorbidity scores from a diagnosis dataframe

Parameters:
  • df (pandas.DataFrame) – Input dataframe containing diagnosis codes
  • column_code (str) – name of the column containing diagnosis codes
  • column_version (int or str) – if int, 9 or 10, indicating the ICD version; if str, the name of the column containing the ICD version (9 or 10)
  • columns_id (list of str) – list of names of the columns to be used as identifiers
Returns:

  • df_long (pandas.DataFrame) – long-form dataframe mapping each ICD code to the comorbidity systems
  • df_wide (pandas.DataFrame) – wide-form dataframe showing comorbidities and comorbidity scores per identifier

Examples

>>> df_diagnosis_long, df_diagnosis_wide = assign_comorbidities(df_diagnosis, 'ICD_CODE', 'ICD_VERSION', ['ID'])
>>> df_diagnosis_long, df_diagnosis_wide = assign_comorbidities(df_diagnosis, 'ICD_CODE', 9, ['MRN', 'CSN'])

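The mapping from diagnosis codes to per-identifier comorbidity flags and a summed score can be sketched in plain pandas. This is a minimal illustration under assumptions, not i2bmi's implementation: `PREFIX_MAP` and `WEIGHTS` are hypothetical, heavily truncated stand-ins for the real Elixhauser/Charlson definitions, only ICD-10 codes are handled (there is no `column_version` argument), and codes are matched on their first three characters.

```python
import pandas as pd

# Hypothetical, truncated ICD-10 prefix map; NOT the full Charlson/Elixhauser definition.
PREFIX_MAP = {"I21": "MI", "I22": "MI", "I50": "CHF", "E11": "DIABETES"}
# Hypothetical weights, for illustration only.
WEIGHTS = {"MI": 1, "CHF": 1, "DIABETES": 1}


def assign_comorbidities_sketch(df, column_code, columns_id):
    """Return (long, wide) frames from an ICD-10 diagnosis table."""
    long = df[columns_id + [column_code]].copy()
    # Map each code to a comorbidity via its three-character category.
    long["comorbidity"] = long[column_code].astype(str).str[:3].map(PREFIX_MAP)
    # One boolean column per comorbidity; unmapped codes give all-False rows,
    # so identifiers without any mapped code still appear in the wide output.
    flags = pd.get_dummies(long["comorbidity"], dtype=bool)
    wide = pd.concat([long[columns_id], flags], axis=1).groupby(columns_id).any()
    # Weighted sum of the comorbidity flags that occur in this data.
    wide["score"] = sum(
        wide[name].astype(int) * weight
        for name, weight in WEIGHTS.items()
        if name in wide.columns
    )
    # The long-form output keeps only codes that mapped to a comorbidity.
    return long.dropna(subset=["comorbidity"]), wide
```

With `ID` 1 having codes `I21.4` and `E11.9`, the sketch flags `MI` and `DIABETES` for that identifier and gives it a score of 2; an identifier whose only code is unmapped keeps a score of 0.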
i2bmi.boxcox(df, invert=None)

Forward and inverse Box-Cox transformation

Parameters:
  • df (pandas.DataFrame) – Input dataframe with numeric columns to be Box-Cox transformed
  • invert (dict) – transformation dict returned by a previous forward call; if given, the inverse transformation is performed.
Returns:

  • pandas.DataFrame – Box-Cox-transformed input dataframe
  • dict – parameters of the forward transformation, which can be passed as invert to perform the inverse transformation: a dict mapping each transformed column name (str) to a dict with the keys ‘min’ and ‘lmbda’

Examples

>>> Transformed_DataFrame, Transformation_dict = boxcox(DataFrame)
>>> Inverse_Transformed_DataFrame = boxcox(Transformed_DataFrame, invert=Transformation_dict)

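A per-column forward/inverse pair can be sketched with SciPy's `scipy.stats.boxcox` (which fits λ by maximum likelihood) and `scipy.special.inv_boxcox`. Box-Cox requires strictly positive input, so this sketch assumes each column is shifted to `x - min + 1` before fitting; how i2bmi actually uses the stored ‘min’ is not documented here, so the shift is an assumption, not the library's behavior.

```python
import pandas as pd
from scipy import special, stats


def boxcox_sketch(df, invert=None):
    """Forward Box-Cox per column, or the inverse when `invert` is given."""
    if invert is not None:
        out = pd.DataFrame(index=df.index)
        for col, p in invert.items():
            # Undo the power transform, then undo the assumed positivity shift.
            restored = special.inv_boxcox(df[col].to_numpy(dtype=float), p["lmbda"])
            out[col] = restored + p["min"] - 1
        return out
    out, params = pd.DataFrame(index=df.index), {}
    for col in df.columns:
        col_min = df[col].min()
        # Assumed shift so every value is >= 1 (Box-Cox needs x > 0).
        shifted = (df[col] - col_min + 1).to_numpy(dtype=float)
        transformed, lmbda = stats.boxcox(shifted)  # lmbda fitted by maximum likelihood
        out[col] = transformed
        params[col] = {"min": col_min, "lmbda": lmbda}
    return out, params
```

Passing the returned dict back through `invert` recovers the original values up to floating-point error, mirroring the two-step usage in the example above.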
i2bmi.cohort_comparison(df, groups, include=[], p_thres=0.01, test_cat=<function _chi2>, test_cont=<function _ks>)

Generates cohort comparison table

Parameters:
  • df (pandas.DataFrame) – pandas dataframe in the form [samples x features], where the features include the group column(s) to be compared
  • groups (str or list of str) – name(s) of the column(s) to be used as groups for comparison; if a str, the function compares those with True vs. False in that column
  • include (list of str) – list of features to be compared. If empty list, all features will be compared.
  • p_thres (float) – p-value threshold for significance
  • test_cat (function) – statistical test for comparing categorical or boolean variables. Only pre-existing option is _chi2.
  • test_cont (function) – statistical test for comparing continuous (numeric) variables. Pre-existing options are _anova and _ks.
Returns:

cohort comparison table

Return type:

pandas.DataFrame

Examples

>>> cohort_comparison(processed_dataframe, 'In-hospital mortality')

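For a single boolean group column, a chi-squared test for categorical/boolean features and a two-sample Kolmogorov-Smirnov test for continuous ones (presumably what the default `_chi2` and `_ks` perform) can be sketched with `scipy.stats.chi2_contingency` and `scipy.stats.ks_2samp`. The output columns `p` and `significant` are hypothetical; i2bmi's actual table layout is not reproduced.

```python
import pandas as pd
from scipy import stats


def cohort_comparison_sketch(df, group, p_thres=0.01):
    """Compare rows with True vs. False in `group`; one p-value per feature."""
    mask = df[group].astype(bool)
    rows = []
    for col in df.columns.drop(group):
        series = df[col]
        if pd.api.types.is_bool_dtype(series) or not pd.api.types.is_numeric_dtype(series):
            # Categorical/boolean: chi-squared test on the feature x group contingency table.
            _, p, _, _ = stats.chi2_contingency(pd.crosstab(series, mask))
        else:
            # Continuous: two-sample Kolmogorov-Smirnov test between the two groups.
            p = stats.ks_2samp(series[mask].dropna(), series[~mask].dropna()).pvalue
        rows.append({"feature": col, "p": p, "significant": p < p_thres})
    return pd.DataFrame(rows).set_index("feature")
```

Keeping the test choice inside a single type check mirrors the `test_cat`/`test_cont` split in the signature; swapping in a different test only changes one branch.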
i2bmi.dataframe_summary(df, column_item, column_value, stripchars='+-<> ')

Characterize a long-form dataframe containing longitudinal variables, e.g. for mapping purposes

Parameters:
  • df (pandas.DataFrame) – dataframe containing longitudinal variables in long form
  • column_item (str) – name of column indicating measurement type
  • column_value (str) – name of column indicating measurement result
  • stripchars (str) – characters to remove prior to converting to numeric
Returns:

Summary of the input dataframe: number of measurements, % numeric, quantiles (0.1, 0.25, 0.5, 0.75, 0.9), and the 20 most common results

Return type:

pandas.DataFrame

Examples

>>> dataframe_summary(dataframe_laboratoryresults, 'LAB_TEST', 'LAB_RESULT_VALUE')

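The summary described above can be sketched with pandas: strip the listed characters, coerce to numeric, then aggregate per measurement type. This is an illustration under assumptions: `stripchars` is applied only to the ends of each value (note that with the default set a leading minus sign is removed too), and the output column names `count`, `pct_numeric` and `top_results` are hypothetical.

```python
import pandas as pd


def dataframe_summary_sketch(df, column_item, column_value, stripchars="+-<> ", top=20):
    """Per-item count, % numeric, quantiles, and most common raw results."""
    # Assumption: stripchars are removed from both ends of each value only.
    cleaned = df[column_value].astype(str).str.strip(stripchars)
    numeric = pd.to_numeric(cleaned, errors="coerce")  # unparseable values become NaN
    by_item = df[column_item]
    summary = pd.DataFrame({
        "count": df.groupby(column_item).size(),
        "pct_numeric": numeric.notna().groupby(by_item).mean() * 100,
        "top_results": df.groupby(column_item)[column_value].agg(
            lambda s: s.value_counts().head(top).to_dict()
        ),
    })
    # Quantiles of the values that could be parsed as numbers, one column per quantile.
    quantiles = numeric.groupby(by_item).quantile([0.1, 0.25, 0.5, 0.75, 0.9]).unstack()
    return summary.join(quantiles)
```

A result such as `>145` counts as numeric (145) after stripping, while free text such as `hemolyzed` lowers `pct_numeric` for its test.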
i2bmi.jupyter_widen()

Increases the width of Jupyter cells to use more of the real estate available in the browser
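A common way to widen cells in the classic Jupyter Notebook is to inject a CSS rule overriding the width of the `.container` element. The sketch below assumes that approach; it is not taken from i2bmi's source, and the selector targets the classic Notebook interface rather than JupyterLab.

```python
# Assumed CSS rule: widen the classic Notebook's main container to the full browser width.
WIDEN_CSS = "<style>.container { width: 100% !important; }</style>"


def jupyter_widen_sketch():
    """Inject the CSS rule when IPython is available; return the CSS either way."""
    try:
        from IPython.display import HTML, display
    except ImportError:
        # IPython is not installed, so there is no notebook frontend to style.
        return WIDEN_CSS
    display(HTML(WIDEN_CSS))
    return WIDEN_CSS
```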

i2bmi.onehotify(df, sep='|')

Wrapper for one-hot encoding all (explicitly) categorical columns in a dataframe

Parameters:
  • df (pandas.DataFrame) – Input dataframe with categorical columns to be one-hot encoded. Some pre-processing may be required as this function does not group low-frequency categories.
  • sep (str) – Separator. The returned dataframe will contain columns formatted as variable name followed by separator followed by category name.
Returns:

Input dataframe, but with categorical columns split out into one-hot encoded columns

Return type:

pandas.DataFrame

Examples

>>> onehotify(dataframe_demographics)

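The behavior described above maps closely onto `pandas.get_dummies`, whose `prefix_sep` argument sets the separator between variable and category names. A minimal sketch, assuming "explicitly categorical" means columns with the pandas `category` dtype, so plain object/string columns are left untouched:

```python
import pandas as pd


def onehotify_sketch(df, sep="|"):
    """One-hot encode every column whose dtype is pandas 'category'."""
    categorical = list(df.select_dtypes(include="category").columns)
    # Produces columns named <variable><sep><category>, e.g. "SEX|F".
    return pd.get_dummies(df, columns=categorical, prefix_sep=sep)
```

As the parameter description warns, rare levels are not grouped, so each one becomes its own column.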
i2bmi.performance_metrics(y_true, y_score)

Generate performance metrics dataframe with threshold as index

Parameters:
  • y_true (list-like or pandas.Series) – true y labels
  • y_score (list-like or pandas.Series) – predicted probability
Returns:

dataframe of performance metrics, indexed by threshold

Return type:

pandas.DataFrame

Examples

>>> performance_metrics(y_train, y_train_pred)

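A threshold-indexed metrics table can be sketched with NumPy and pandas. The metric columns (sensitivity, specificity, PPV, NPV) and the use of every distinct score as a threshold, with `score >= threshold` counted as positive, are assumptions; the columns i2bmi actually returns are not documented above.

```python
import numpy as np
import pandas as pd


def performance_metrics_sketch(y_true, y_score):
    """Confusion-matrix metrics at each distinct score, indexed by threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    rows = {}
    for t in np.unique(y_score):
        pred = y_score >= t  # predicted positive at this threshold
        tp = np.sum(pred & y_true)
        fp = np.sum(pred & ~y_true)
        fn = np.sum(~pred & y_true)
        tn = np.sum(~pred & ~y_true)
        rows[t] = {
            "sensitivity": tp / (tp + fn) if tp + fn else np.nan,
            "specificity": tn / (tn + fp) if tn + fp else np.nan,
            "ppv": tp / (tp + fp) if tp + fp else np.nan,
            "npv": tn / (tn + fn) if tn + fn else np.nan,
        }
    out = pd.DataFrame.from_dict(rows, orient="index")
    out.index.name = "threshold"
    return out
```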
i2bmi.plot_calibration(y_true, y_score, figpath=None)

Calibration plot

Parameters:
  • y_true (list-like or pandas.Series) – true y labels
  • y_score (list-like or pandas.Series) – predicted probability
  • figpath (str) – path for saving figure
Returns:

Return type:

None

Examples

>>> plot_calibration(y_train, y_train_pred)

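The points on a calibration plot are typically obtained by binning the predicted probabilities and comparing, per bin, the mean prediction with the observed fraction of positives. A sketch assuming equal-width bins on [0, 1]; `n_bins` is a hypothetical parameter that is not part of `plot_calibration`'s signature, and matplotlib is imported only inside the plotting helper.

```python
import numpy as np


def calibration_points(y_true, y_score, n_bins=10):
    """Mean predicted probability and observed positive rate per non-empty bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index per score; clip so a score of exactly 1.0 falls in the last bin.
    idx = np.clip(np.digitize(y_score, edges) - 1, 0, n_bins - 1)
    mean_pred, frac_pos = [], []
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            mean_pred.append(y_score[in_bin].mean())
            frac_pos.append(y_true[in_bin].mean())
    return np.array(mean_pred), np.array(frac_pos)


def plot_calibration_sketch(y_true, y_score, figpath=None, n_bins=10):
    import matplotlib.pyplot as plt  # lazy import: the helper above needs only NumPy

    mean_pred, frac_pos = calibration_points(y_true, y_score, n_bins)
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="perfectly calibrated")
    ax.plot(mean_pred, frac_pos, marker="o", label="model")
    ax.set_xlabel("Mean predicted probability")
    ax.set_ylabel("Observed fraction of positives")
    ax.legend()
    if figpath is not None:
        fig.savefig(figpath)
```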
i2bmi.plot_prc(y_true, y_score, figpath=None)

Precision-recall curve plot

Parameters:
  • y_true (list-like or pandas.Series) – true y labels
  • y_score (list-like or pandas.Series) – predicted probability
  • figpath (str) – path for saving figure
Returns:

Return type:

None

Examples

>>> plot_prc(y_train, y_train_pred)

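The precision-recall curve can be sketched by sorting samples by descending score and accumulating true and false positives, so that each prefix of the ranking corresponds to one cut-off. This is a simplified sketch: tied scores are not merged into a single point, and it is not necessarily how i2bmi computes the curve.

```python
import numpy as np


def pr_curve_points(y_true, y_score):
    """Precision and recall when each sample's score is used as the cut-off."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    # Sort by descending score; a stable sort keeps tied samples in input order.
    order = np.argsort(-y_score, kind="mergesort")
    hits = y_true[order]
    tp = np.cumsum(hits)   # true positives among the top-k predictions
    fp = np.cumsum(~hits)  # false positives among the top-k predictions
    precision = tp / (tp + fp)
    recall = tp / hits.sum()
    return precision, recall


def plot_prc_sketch(y_true, y_score, figpath=None):
    import matplotlib.pyplot as plt  # lazy import: the helper above needs only NumPy

    precision, recall = pr_curve_points(y_true, y_score)
    fig, ax = plt.subplots()
    ax.step(recall, precision, where="post")
    ax.set_xlabel("Recall")
    ax.set_ylabel("Precision")
    if figpath is not None:
        fig.savefig(figpath)
```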
i2bmi.plot_roc(y_true, y_score, figpath=None)

Receiver operating characteristic (ROC) curve plot

Parameters:
  • y_true (list-like or pandas.Series) – true y labels
  • y_score (list-like or pandas.Series) – predicted probability
  • figpath (str) – path for saving figure
Returns:

Return type:

None

Examples

>>> plot_roc(y_train, y_train_pred)

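The ROC curve plots the true positive rate against the false positive rate as the threshold is lowered; the area under it (AUC) can be computed with the trapezoidal rule. A sketch using the same ranking approach as a precision-recall computation, with the curve starting at the origin. Ties are not merged, so with tied scores the AUC depends on input order; treat this as an illustration rather than i2bmi's method.

```python
import numpy as np


def roc_points(y_true, y_score):
    """False/true positive rates (starting at the origin) and trapezoidal AUC."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score, kind="mergesort")  # descending score, stable
    hits = y_true[order]
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~hits).sum()])
    # Trapezoidal rule written out explicitly to avoid version-specific NumPy helpers.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
    return fpr, tpr, auc


def plot_roc_sketch(y_true, y_score, figpath=None):
    import matplotlib.pyplot as plt  # lazy import: the helper above needs only NumPy

    fpr, tpr, auc = roc_points(y_true, y_score)
    fig, ax = plt.subplots()
    ax.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend()
    if figpath is not None:
        fig.savefig(figpath)
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ranked correctly, and the sketch's AUC is accordingly 0.75.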
i2bmi.plot_temporal(series_value, series_time, num_bins=20, figpath=None)

Triplet plot for characterizing longitudinal variables

Parameters:
  • series_value (pandas.Series) – variable value
  • series_time (pandas.Series) – variable documentation datetime
  • num_bins (int) – number of bins for all subplots
  • figpath (str) – path for saving figure
Returns:

Return type:

None

Examples

>>> plot_temporal(df['VALUE'], df['TIME'], num_bins=30, figpath='./figure.png')

i2bmi.plot_threshold(y_true, y_score, figpath=None)

Threshold plot

Parameters:
  • y_true (list-like or pandas.Series) – true y labels
  • y_score (list-like or pandas.Series) – predicted probability
  • figpath (str) – path for saving figure
Returns:

Return type:

None

Examples

>>> plot_threshold(y_train, y_train_pred)

i2bmi.quantile(n)

Wrapper for pandas quantile for use in groupby

Parameters: n (float) – quantile to compute, between 0 and 1 (e.g., 0.25)
Returns: quantile function that can be used in a groupby
Return type: function

Examples

The series to which the returned quantile function is applied must be numeric.

>>> DataFrame.groupby('MEASURE_NAME').agg({'VALUE': ['size', quantile(0.25)]})

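A wrapper like this is useful because `groupby(...).agg` labels each output column with the aggregating function's `__name__`, so several anonymous lambdas would collide. A minimal sketch of such a closure; the name format `quantile_<n>` is an assumption, not necessarily what i2bmi uses.

```python
import pandas as pd


def quantile_sketch(n):
    """Return a named function computing the n-th quantile of a Series."""
    def _quantile(series):
        return series.quantile(n)

    # A distinct __name__ keeps aggregated column labels unique, e.g. "quantile_0.25".
    _quantile.__name__ = f"quantile_{n}"
    return _quantile


# Usage mirroring the example above.
df = pd.DataFrame({"MEASURE_NAME": ["HR", "HR", "HR", "SBP"], "VALUE": [60.0, 70.0, 80.0, 120.0]})
summary = df.groupby("MEASURE_NAME").agg({"VALUE": ["size", quantile_sketch(0.25)]})
```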
i2bmi.standardize(df, invert=None)

Forward and inverse standardization transformation (mean=0, std=1)

Parameters:
  • df (pandas.DataFrame) – Input dataframe with numeric columns to be standardized
  • invert (dict) – transformation dict returned by a previous forward call; if given, the inverse transformation is performed.
Returns:

  • pandas.DataFrame – standardized input dataframe
  • dict – parameters of the forward transformation, which can be passed as invert to perform the inverse transformation: a dict mapping each standardized column name (str) to a dict with the keys ‘std’ and ‘mean’

Examples

>>> Transformed_DataFrame, Transformation_dict = standardize(DataFrame)
>>> Inverse_Transformed_DataFrame = standardize(Transformed_DataFrame, invert=Transformation_dict)

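The forward transform is z = (x − mean) / std per column and the inverse is x = z · std + mean, using the stored ‘mean’ and ‘std’. A minimal sketch in pandas; whether i2bmi uses the sample (ddof=1, pandas' default) or population standard deviation is an assumption.

```python
import pandas as pd


def standardize_sketch(df, invert=None):
    """Forward z-scoring per column, or the inverse when `invert` is given."""
    if invert is not None:
        # Inverse: x = z * std + mean, using the stored parameters.
        return pd.DataFrame({col: df[col] * p["std"] + p["mean"] for col, p in invert.items()})
    # Assumption: pandas' .std() default, i.e. the sample standard deviation (ddof=1).
    params = {col: {"mean": df[col].mean(), "std": df[col].std()} for col in df.columns}
    out = pd.DataFrame({col: (df[col] - p["mean"]) / p["std"] for col, p in params.items()})
    return out, params
```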
i2bmi.value_counts(n)

Wrapper for pandas value_counts for use in groupby

Parameters: n (int) – Number of most common responses
Returns: value_counts function that can be used in a groupby
Return type: function

Examples

>>> DataFrame.groupby('MEASURE_NAME').agg({'VALUE': ['size', value_counts(20)]})
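The same closure pattern works for the most common values: the returned function reduces each group to a single object, here a dict of the `n` most frequent values and their counts. Both the dict return value and the name format `value_counts_<n>` are assumptions made for this sketch.

```python
import pandas as pd


def value_counts_sketch(n):
    """Return a named function giving the n most common values of a Series as a dict."""
    def _value_counts(series):
        return series.value_counts().head(n).to_dict()

    # A distinct __name__ keeps aggregated column labels unique, e.g. "value_counts_1".
    _value_counts.__name__ = f"value_counts_{n}"
    return _value_counts


# Usage mirroring the example above.
df = pd.DataFrame({
    "MEASURE_NAME": ["UA_COLOR", "UA_COLOR", "UA_COLOR", "UA_CLARITY"],
    "VALUE": ["yellow", "yellow", "amber", "clear"],
})
summary = df.groupby("MEASURE_NAME").agg({"VALUE": ["size", value_counts_sketch(1)]})
```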