categoricalInformationValue
Introduced in: v20.1
Calculates the information value (IV) for categorical features in relation to a binary target variable.
For each category, the function computes: (P(tag = 1) - P(tag = 0)) × (log(P(tag = 1)) - log(P(tag = 0)))
where:
- P(tag = 1) is the probability that the target equals 1 for the given category
- P(tag = 0) is the probability that the target equals 0 for the given category
Information Value is a statistic used to measure the strength of a categorical feature's relationship with a binary target variable in predictive modeling. Higher absolute values indicate stronger predictive power.
The result indicates how much each discrete (categorical) feature [category1, category2, ...] contributes to a learning model that predicts the value of tag.
Syntax
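The function takes one or more category columns followed by the tag column:

```sql
categoricalInformationValue(category1, category2, ..., tag)
```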
Arguments
- category1, category2, ... — One or more categorical features to analyze. Each category should contain discrete values. UInt8
- tag — Binary target variable for prediction. Should contain values 0 and 1. UInt8
Returned value
Returns an array of Float64 values, one for each category argument. Each value indicates the predictive strength of that categorical feature with respect to the target variable. Array(Float64)
Examples
Basic usage analyzing age groups vs mobile usage
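A minimal sketch, assuming a hypothetical users table in which age_group_young is a 0/1 indicator for an age group and uses_mobile is the binary tag:

```sql
-- Hypothetical table: one row per user, with a 0/1 age-group indicator
-- and a 0/1 tag for whether the user browses on a mobile device.
CREATE TABLE users
(
    age_group_young UInt8,  -- 1 = user belongs to the young age group
    uses_mobile     UInt8   -- binary target: 1 = user uses mobile
)
ENGINE = Memory;

INSERT INTO users VALUES
    (1, 1), (1, 1), (1, 0), (1, 1),
    (0, 0), (0, 1), (0, 0), (0, 0);

-- One category column plus the tag; the result is an Array(Float64)
-- with a single element: the information value of age_group_young.
SELECT categoricalInformationValue(age_group_young, uses_mobile) AS iv
FROM users;
```

A larger absolute value of iv means the age-group indicator separates mobile users from non-mobile users more strongly.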
Multiple categorical features with user demographics
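A sketch with several one-hot demographic indicators, assuming a hypothetical user_demographics table where purchased is the binary tag:

```sql
-- Hypothetical table with several 0/1 demographic indicators and a binary tag.
CREATE TABLE user_demographics
(
    is_young   UInt8,  -- 1 = user is in the young age group
    is_urban   UInt8,  -- 1 = user lives in an urban area
    is_premium UInt8,  -- 1 = user has a premium subscription
    purchased  UInt8   -- binary target: 1 = user made a purchase
)
ENGINE = Memory;

INSERT INTO user_demographics VALUES
    (1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1),
    (0, 0, 0, 0), (0, 1, 0, 0), (1, 0, 0, 0), (0, 0, 1, 0);

-- The last argument is the tag; the result is an Array(Float64) with one
-- element per category column, so the features can be compared by their
-- predictive strength for purchased.
SELECT
    categoricalInformationValue(is_young, is_urban, is_premium, purchased) AS iv
FROM user_demographics;
```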