model_tests.FEAT.SubgroupDisparity

SubgroupDisparity Objects

@dataclass
class SubgroupDisparity(ModelTest)

Tests whether the maximum difference or ratio of a specified metric between any two groups within a specified protected attribute exceeds the given threshold.

If method is 'chi2', the p-value from a chi-square test of independence should be greater than the significance level specified by threshold.

Arguments:

attr - Column name of the protected attribute.
metric - Performance metric for the test. For classification problems, choose from 'fpr' (false positive rate), 'fnr' (false negative rate) or 'pr' (positive rate). For regression problems, choose from 'mse' (mean squared error) or 'mae' (mean absolute error).
method - Method for the test; choose from 'chi2', 'ratio' or 'diff'.
threshold - Threshold for the maximum difference or ratio, or the significance level of the chi-square test.
test_name - Name of the test, default is 'Subgroup Disparity Test'.
test_desc - Description of the test. If none is provided, an automatic description will be generated based on the rest of the arguments passed in.
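
As a rough illustration of the 'ratio' and 'diff' methods, the sketch below computes the false positive rate per subgroup with pandas, then the largest ratio and difference between any two subgroups. It is not the library's implementation: the helper names false_positive_rate and max_disparity, the "gender" column and the toy data are hypothetical, and it assumes binary "truth" and "prediction" columns.

```python
import pandas as pd


def false_positive_rate(group: pd.DataFrame) -> float:
    """FPR = FP / (FP + TN), computed over rows whose truth label is 0."""
    negatives = group[group["truth"] == 0]
    return float((negatives["prediction"] == 1).mean())


def max_disparity(df: pd.DataFrame, attr: str, method: str = "ratio"):
    """Return per-subgroup FPRs and the largest ratio or difference between them.

    Assumes every subgroup has at least one negative example.
    """
    rates = {name: false_positive_rate(g) for name, g in df.groupby(attr)}
    hi, lo = max(rates.values()), min(rates.values())
    if method == "ratio":
        # Guard against division by zero when the lowest rate is 0.
        value = hi / lo if lo > 0 else float("inf")
    else:
        value = hi - lo
    return rates, value


# Hypothetical test data: the protected attribute is left unencoded.
df = pd.DataFrame({
    "gender":     ["F", "F", "F", "F", "F", "M", "M", "M", "M", "M"],
    "truth":      [0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
    "prediction": [1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
})

rates, ratio = max_disparity(df, "gender", method="ratio")
_, diff = max_disparity(df, "gender", method="diff")
print(rates, ratio, diff)
```

In this toy data the F and M subgroups have false positive rates of 0.25 and 0.5, so the maximum ratio is 2.0 and the maximum difference is 0.25. With a ratio threshold of 1.5, this data would exceed the threshold.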

get_metric_dict

def get_metric_dict(df: pd.DataFrame) -> Tuple[dict, list]

Calculate metric ratio / difference and size for each subgroup of the protected attribute on a given df.

Arguments:

df - Dataframe.

Returns:

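A tuple of a dictionary mapping each subgroup to its calculated ratio or difference, and a list which, going by the description above, appears to hold the subgroup sizes.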

get_contingency_table

def get_contingency_table(df: pd.DataFrame) -> list

Obtain the contingency table of the metric of interest for each subgroup of a protected attribute on a given df.

Arguments:

df - Dataframe.

Returns:

List of metric values.
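
To make the 'chi2' method more concrete, the following sketch runs a Pearson chi-square test of independence on a 2x2 contingency table using only the standard library. The table layout (one row per subgroup, columns for false positives and true negatives), the counts and the function name chi2_independence_2x2 are assumptions for illustration; the layout returned by get_contingency_table may differ. The closed-form p-value only holds for one degree of freedom, so larger tables would typically use a library routine such as scipy.stats.chi2_contingency.

```python
import math


def chi2_independence_2x2(table):
    """Pearson chi-square test (no continuity correction) on a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of subgroup and outcome.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows are subgroups, columns are
# [false positives, true negatives].
table = [[10, 30], [20, 20]]
stat, p_value = chi2_independence_2x2(table)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")
```

With these counts the p-value falls below a significance level of 0.05, so a test configured with method 'chi2' and threshold 0.05 would not pass.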

plot

def plot(alpha: float = 0.05, save_plots: bool = True)

Plot the metric of interest across the attribute subgroups, and their confidence interval bands.

Arguments:

alpha - Significance level for confidence interval.
save_plots - If True, saves the plots to the class instance.
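
The documentation does not state how the confidence interval bands are computed. As one possible reading of the alpha argument, the sketch below builds a normal-approximation (Wald) interval for a rate-type metric such as 'fpr'. The function name proportion_ci and the counts are hypothetical, and the library may use a different interval.

```python
import math
from statistics import NormalDist


def proportion_ci(successes: int, n: int, alpha: float = 0.05):
    """Two-sided (1 - alpha) Wald interval for a proportion, clipped to [0, 1]."""
    p = successes / n
    # Critical value of the standard normal for a two-sided interval.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical subgroup: 20 false positives out of 40 negatives.
low, high = proportion_ci(20, 40, alpha=0.05)
print(f"FPR = 0.50, 95% CI = ({low:.3f}, {high:.3f})")
```

A smaller alpha widens the band, and small subgroups produce wide bands, which is worth keeping in mind when comparing subgroups of very different sizes.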

get_result

def get_result(df_test_with_output: pd.DataFrame) -> Dict[str, float]

Calculate the maximum ratio or difference between any two subgroups' metrics, or run the chi-square test, on a given df.

Arguments:

df_test_with_output - Dataframe containing protected attributes with "prediction" and "truth" columns.

run

def run(df_test_with_output: pd.DataFrame) -> bool

Runs test by calculating result and evaluating if it passes a defined condition.

Arguments:

df_test_with_output - Dataframe containing protected attributes with "prediction_probas" and "truth" columns. The protected attribute should not be encoded.
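
The pass condition described for this class can be sketched as a small standalone function. The name passes is hypothetical, and the handling of values exactly equal to the threshold is an assumption, since the documentation only says the ratio or difference must not exceed the threshold and the p-value must be greater than the significance level.

```python
def passes(method: str, result: float, threshold: float) -> bool:
    """Return True if a result satisfies the condition for the given method."""
    if method == "chi2":
        # result is a p-value: it must be greater than the significance level.
        return result > threshold
    if method in ("ratio", "diff"):
        # result is the maximum ratio / difference: it must not exceed the threshold.
        return result <= threshold
    raise ValueError(f"Unknown method: {method!r}")


print(passes("ratio", 2.0, 1.5))   # maximum ratio exceeds the threshold
print(passes("diff", 0.05, 0.1))   # maximum difference within the threshold
print(passes("chi2", 0.021, 0.05)) # p-value below the significance level
```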