Heterogeneity-aware Federated Learning


Machine learning, and more specifically federated learning, is experiencing rapid growth across a variety of industries. Federated learning (FL), in which machine learning models are trained on individual users' data and the resulting models are aggregated, is an increasingly important field due to its current applications, rapidly expanding potential, and focus on user privacy. Federated learning provides the convenience of machine learning with additional privacy built in (Li 2019). As more privacy-focused industries look to machine learning for efficient solutions, federated learning becomes increasingly relevant. Our goal is to evaluate federated learning algorithms and understand how they perform in various scenarios, such as image recognition. In this Major Qualifying Project (MQP), we compare two federated learning algorithms, FedAvg and FedProx. These algorithms aggregate models in the federated learning process and can be customized to user scenarios. To compare the two FL algorithms, we tested both on two datasets: FEMNIST and CelebA. To carry out our analyses, we implemented a means of preprocessing these datasets into a form (HDF5 files) that allows them to be easily used in the environments we prepared. We then built an easily replicable testing pipeline for federated learning algorithms, using the common federated learning analysis tools Google Colaboratory and TensorFlow Federated. We show that under various circumstances, FedProx outperforms FedAvg in independent and identically distributed (IID) scenarios. In non-IID (NIID) scenarios, the relative performance of the algorithms is less predictable. For example, we found that FedAvg converges 6% faster than FedProx on CelebA with no stragglers, while there was less than a 1% difference between the two on IID data.
We also tested the impact of adding underperforming or "straggling" devices, a common problem in real-world federated learning deployments. The extra challenge created by simulating stragglers negatively impacted FedAvg's performance on the FEMNIST dataset while leaving FedProx mostly unaffected. When the same challenge was introduced with the CelebA dataset, FedAvg could not match or exceed FedProx's performance; FedProx converged in half the rounds FedAvg required. However, NIID data again proved unpredictable, with FedAvg performing 23% better than FedProx. Our analysis provides an evaluation of the performance of common federated learning algorithms as well as a means to test other algorithms by expanding the testing resources available for comparing them.
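To make the comparison concrete, the core difference between the two algorithms can be sketched as follows: FedAvg aggregates client model updates as a data-weighted average, while FedProx additionally adds a proximal term to each client's local objective that penalizes drift from the global model. This is a minimal NumPy illustration, not the project's actual implementation; the function names and the choice of `mu` are assumptions for the sake of the example.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg server step: average client weights, weighted by local dataset size.

    client_weights: list (one entry per client) of lists of numpy arrays (one per layer)
    client_sizes:   number of local training examples on each client
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    aggregated = []
    for layer in range(num_layers):
        # Each client contributes in proportion to its share of the data.
        layer_avg = sum(
            (n / total) * w[layer] for w, n in zip(client_weights, client_sizes)
        )
        aggregated.append(layer_avg)
    return aggregated

def fedprox_local_loss(local_loss, local_params, global_params, mu=0.01):
    """FedProx client objective: local loss plus (mu/2) * ||w - w_global||^2.

    The proximal term discourages stragglers' partial local updates from
    drifting far from the current global model. `mu` is a tunable constant.
    """
    prox = 0.5 * mu * sum(
        np.sum((w - wg) ** 2) for w, wg in zip(local_params, global_params)
    )
    return local_loss + prox
```

In this sketch, setting `mu = 0` recovers plain FedAvg local training, which is one reason FedProx can be seen as a generalization of FedAvg.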

  • This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.
Identifier
  • 17111
  • E-project-040621-110734
Year
  • 2021
Date created
  • 2021-04-06


Permanent link to this page: https://digital.wpi.edu/show/5425kd50z