Sunday, July 20, 2025
HomeLanguagesML | Chi-square Test for feature selection

ML | Chi-square Test for feature selection

Feature selection is also known as attribute selection is a process of extracting the most relevant features from the dataset and then applying machine learning algorithms for the better performance of the model. A large number of irrelevant features increases the training time exponentially and increase the risk of overfitting.

Chi-square Test for Feature Extraction:
Chi-square test is used for categorical features in a dataset. We calculate Chi-square between each feature and the target and select the desired number of features with best Chi-square scores. It determines if the association between two categorical variables of the sample would reflect their real association in the population.
Chi- square score is given by :

where –

Observed frequency = No. of observations of class
Expected frequency = No. of expected observations of class if there was no relationship between the feature and the target.

Python Implementation of Chi-Square feature selection:




# Load libraries
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
  
# Load iris data
iris_dataset = load_iris()
  
# Create features and target
X = iris_dataset.data
y = iris_dataset.target
  
# Convert to categorical data by converting data to integers
X = X.astype(int)
  
# Two features with highest chi-squared statistics are selected
chi2_features = SelectKBest(chi2, k = 2)
X_kbest_features = chi2_features.fit_transform(X, y)
  
# Reduced features
print('Original feature number:', X.shape[1])
print('Reduced feature number:', X_kbest.shape[1])


Output:

Original feature number: 4
Reduced feature number : 2
RELATED ARTICLES

Most Popular

Dominic
32154 POSTS0 COMMENTS
Milvus
67 POSTS0 COMMENTS
Nango Kala
6529 POSTS0 COMMENTS
Nicole Veronica
11677 POSTS0 COMMENTS
Nokonwaba Nkukhwana
11737 POSTS0 COMMENTS
Shaida Kate Naidoo
6623 POSTS0 COMMENTS
Ted Musemwa
6899 POSTS0 COMMENTS
Thapelo Manthata
6590 POSTS0 COMMENTS
Umr Jansen
6583 POSTS0 COMMENTS