Curiosity in Data Science: A comprehensive framework for data structures analysis, instances, and features selection to improve machine learning models’ performance.

18 June 2024, 14:00 
Zoom & Room 206

Join us on Zoom

Michal Moran, Tel-Aviv University
Advisor: Prof. Goren Gordon

 

Abstract:

In data science, building high-quality models for cutting-edge machine learning applications is challenging because of the complexity of the underlying dataset instances and features. These challenges significantly affect the efficiency, quality, and overall performance of machine learning models. Overcoming them requires human intervention to select appropriate data instances and features for model training, coupled with continuous tuning. Data scientists typically undertake two primary tasks before constructing machine learning models: instance selection and feature selection. Instance selection extracts a useful subset of instances from a dataset, aiming to balance dataset reduction against classification quality. Feature selection aims to identify a compact subset of features that improves prediction results by eliminating noisy, irrelevant, or redundant features. Existing instance selection algorithms have difficulty controlling the balance between dataset reduction and classification quality, often necessitating multiple iterations with different configurations. Because exploring all possible feature subsets is computationally prohibitive, feature selection relies on search algorithms which, although still computationally intensive, provide only locally optimal results. While various instance selection and feature selection methods have been developed to address specific challenges, there is a clear need for a comprehensive approach that integrates the two. Such a holistic approach seeks to analyze and understand the structure of the dataset, further improving the efficiency and quality of machine learning models.
This research addresses challenges in machine learning applications related to dataset complexities by introducing several methods based on concepts from intrinsically motivated computational learning, specifically artificial curiosity (AC). The challenges include dataset size, quality, high dimensionality, noisy features, and diverse sources. The proposed methods, (1) Curious Instance Selection (CIS), (2) Deep Curious Feature Selection (DCFS), and (3) Curious Feature Selection-Based Clustering (CuFSC), aim to enhance the understanding of dataset structures, provide insights, and improve modelling performance. Based on the curiosity loop, these methods demonstrate improved accuracy on both artificial and real-world datasets compared to state-of-the-art methods. They also give analysts flexibility and varied insights, offering a valuable tool for navigating complex and rapidly changing environments in data science applications.

Bio:

Michal Moran has 15 years of experience in AI and machine learning. She holds a B.Sc. in Industrial Engineering from Shenkar College and an M.Sc. in Business Analytics from the TAU Curiosity Lab, where she is currently pursuing her Ph.D. Her research focuses on curiosity in datasets. Currently, Michal is a Data Science Team Manager at Intel’s AI department. Her team develops and implements AI solutions within Intel processors to improve performance or save power. Prior to this role, Michal held several machine learning, data science, and product roles at Intel’s AI department.
