Graph-Based, Multi-Distance Feature Selection For Multi-Class Classification
Omer Hedvat, M.Sc. student. Advisor: Dr. Neta Rabin
Via Zoom
Feature selection is an essential process in machine learning: selecting a subset of relevant features from a larger set. It plays a crucial role in developing accurate and efficient models. By eliminating irrelevant or redundant features, feature selection reduces the dimensionality of the data, helps avoid overfitting, and allows models to generalize better to new data. It can also reduce training time and computational cost, making the modeling process more efficient, and it can improve interpretability by simplifying the model and making it easier to understand how predictions are made.

Filter-based feature selection is a popular, computationally efficient approach that evaluates each feature's statistical properties, such as correlation or mutual information, and selects the top-ranking features according to these criteria. Eliminating irrelevant or redundant features in this way reduces the model's complexity and improves its efficiency.

In this work, we focus on a graph-based filter feature-selection technique. We build on prior research that demonstrated the advantage of such an approach and extend it by incorporating three different class-separation metrics: the Jeffries–Matusita distance, the Wasserstein distance, and the Hellinger distance. The separation values that characterize each feature serve as input to diffusion maps, a dimensionality-reduction method. Feature selection is then performed in the low-dimensional space by the K-means algorithm, which decides which features should be selected.
We compare the different metrics and, in addition, provide a framework for fusing two or more separation metrics. The algorithm was tested on five public datasets, and the results were compared to known filter-based feature selection techniques. We show that the proposed method performs well, especially when the number of selected features is small.
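The pipeline described in the abstract can be sketched in code. The sketch below is illustrative only and makes several assumptions not stated in the abstract: it uses the Hellinger distance between per-class histograms as the separation metric, a basic Gaussian-kernel diffusion-maps construction with a median-distance bandwidth, and picks the feature closest to each K-means centroid as the cluster representative. All function names and parameters are hypothetical.

```python
# Illustrative sketch of the graph-based, multi-distance feature-selection
# pipeline. Names, parameters, and design choices are assumptions, not the
# authors' implementation.
import numpy as np
from sklearn.cluster import KMeans


def hellinger(p, q):
    """Hellinger distance between two discrete probability vectors."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def class_separation_profile(X, y, n_bins=10):
    """For each feature, compute the Hellinger distance between every pair of
    class-conditional histograms. Returns an (n_features, n_class_pairs) array
    that characterizes how well each feature separates the classes."""
    classes = np.unique(y)
    pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i + 1:]]
    profiles = np.zeros((X.shape[1], len(pairs)))
    for j in range(X.shape[1]):
        edges = np.histogram_bin_edges(X[:, j], bins=n_bins)
        hists = {}
        for c in classes:
            h, _ = np.histogram(X[y == c, j], bins=edges)
            hists[c] = (h + 1e-12) / (h.sum() + 1e-12 * n_bins)  # normalize
        for k, (a, b) in enumerate(pairs):
            profiles[j, k] = hellinger(hists[a], hists[b])
    return profiles


def diffusion_map(profiles, eps=None, n_components=2):
    """Embed the features (rows) with a basic diffusion-maps construction:
    Gaussian affinity -> row-stochastic Markov matrix -> top eigenvectors."""
    D = np.linalg.norm(profiles[:, None, :] - profiles[None, :, :], axis=2)
    if eps is None:
        eps = np.median(D) ** 2 + 1e-12  # median heuristic for the bandwidth
    K = np.exp(-(D ** 2) / eps)
    M = K / K.sum(axis=1, keepdims=True)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)
    idx = order[1:n_components + 1]  # skip the trivial constant eigenvector
    return vecs[:, idx].real * vals[idx].real


def select_features(X, y, n_select=3):
    """Cluster the features in the diffusion space with K-means and keep the
    feature closest to each cluster centroid."""
    prof = class_separation_profile(X, y)
    emb = diffusion_map(prof, n_components=min(2, prof.shape[1]))
    km = KMeans(n_clusters=n_select, n_init=10, random_state=0).fit(emb)
    selected = []
    for c in range(n_select):
        members = np.where(km.labels_ == c)[0]
        dist = np.linalg.norm(emb[members] - km.cluster_centers_[c], axis=1)
        selected.append(int(members[np.argmin(dist)]))
    return sorted(selected)
```

In this sketch, swapping in the Jeffries–Matusita or Wasserstein distance only requires replacing the `hellinger` call inside `class_separation_profile`; fusing metrics would amount to concatenating their separation profiles before the diffusion-maps step.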
Omer Hedvat is an M.Sc. student at the Department of Industrial Engineering at Tel Aviv University, specializing in Data Science. He holds a B.Sc. in Industrial Engineering from Tel Aviv University and currently works as a Senior Data Scientist at Bluevine. His research, supervised by Dr. Neta Rabin, focuses on feature selection and related algorithms for multi-class problems.
• E-Mail: email@example.com
• LinkedIn: www.linkedin.com/in/omerhedvat