Semi-Supervised identification of drivers
Dor Bar, M.Sc. student at the Department of Industrial Engineering
Advisor: Prof. Irad Ben-Gal
Abstract:
Collaboration between edge devices has the potential to dramatically scale up machine learning (ML) through access to an unprecedented quantity of data. Federated learning (FL) is a collaborative algorithm in which clients learn from each other without sharing their private data. However, edge devices tend to have different data distributions since they are naturally exposed to different data sources. This heterogeneity of the data, also known as non-IID data distributions, has been shown to decrease FL accuracy. We propose studying how data sharing among users can mitigate this performance degradation. Data sharing among users can occur naturally on the social graph or can be incentivized by the platform based on different criteria. We test the performance gains of data sharing for several common ML models and datasets, such as MNIST, CIFAR-10, and CIFAR-100. We also test different network topologies: complete graph, clusters, and stochastic block models. We empirically show that across the different experiments, modest data sharing between neighbors on the social graph boosts learning performance significantly for the non-IID case. We also show that data sharing can, surprisingly, boost performance for the IID case. By normalizing the dataset sizes, we verify that this performance boost is significant even if data sharing does not increase the number of data points per client. Data sharing is thus a simple and efficient technique for improving SFL, where users share only part of their data with their friends, colleagues, and family.
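As a rough illustration of the data-sharing step described in the abstract, the following is a minimal sketch in which each client sends a random fraction of its local samples to its neighbors on a graph. It is not the speaker's implementation: the function name `share_with_neighbors`, the parameters `share_fraction` and `normalize`, and the toy single-label clients are all assumptions made for illustration.

```python
import random


def share_with_neighbors(client_data, neighbors, share_fraction, seed=0, normalize=False):
    """Hypothetical helper: each client sends a random `share_fraction`
    of its own samples to every neighbor on the graph."""
    rng = random.Random(seed)
    # Start from a copy so the original local datasets stay untouched.
    result = {c: list(samples) for c, samples in client_data.items()}
    for sender, samples in client_data.items():
        k = int(len(samples) * share_fraction)
        for receiver in neighbors.get(sender, ()):
            result[receiver].extend(rng.sample(samples, k))
    if normalize:
        # Subsample back to the original size, so any gain cannot be
        # attributed to a larger number of data points per client.
        for c in result:
            result[c] = rng.sample(result[c], len(client_data[c]))
    return result


# Toy non-IID setup (assumed): each client holds a single label.
client_data = {
    "a": [("x", 0)] * 10,
    "b": [("x", 1)] * 10,
    "c": [("x", 2)] * 10,
}
# Assumed social graph: a path a - b - c.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

shared = share_with_neighbors(client_data, neighbors, share_fraction=0.2)
labels_b = sorted({label for _, label in shared["b"]})
normalized = share_with_neighbors(client_data, neighbors, 0.2, normalize=True)
```

In this toy setup, client `b` starts with one label and ends up with samples from all three, which is the kind of reduction in label skew the abstract attributes to sharing. The `normalize` flag mimics the size-controlled comparison mentioned in the abstract.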
Bio:
Dor Bar is an M.Sc. student in the Department of Industrial Engineering at Tel Aviv University, conducting her research under the supervision of Prof. Irad Ben-Gal.