Datasets From LEAF – 2018 – Internet of Federated Things

Overview: LEAF is one of the earliest dataset proposals for federated learning. It contains six datasets covering different domains, including image classification, sentiment analysis, and next-character prediction. A set of utilities is provided to divide the datasets among parties in an IID or non-IID way. Of the six, FEMNIST and Shakespeare are selected here because, unlike the other four, they are commonly used for testing federated learning frameworks.
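As a rough illustration of the difference between IID and non-IID partitioning, the Python sketch below splits sample indices among parties in two ways. It is not LEAF's own utility code: the function names `partition_iid` and `partition_by_label` are hypothetical, and label-sorted sharding is only one common way to produce a non-IID split.

```python
import random


def partition_iid(labels, num_parties, seed=0):
    """IID split: shuffle all sample indices, then deal them out round-robin."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    return [indices[k::num_parties] for k in range(num_parties)]


def partition_by_label(labels, num_parties):
    """Non-IID split (assumed technique: label-sorted sharding).

    Indices are sorted by label and cut into contiguous chunks, so each
    party ends up with only a few of the classes.
    """
    indices = sorted(range(len(labels)), key=lambda i: labels[i])
    n = len(indices)
    return [
        indices[k * n // num_parties:(k + 1) * n // num_parties]
        for k in range(num_parties)
    ]


# Toy data: 40 samples spread evenly over 4 classes.
labels = [i % 4 for i in range(40)]
iid_parts = partition_iid(labels, num_parties=4)
non_iid_parts = partition_by_label(labels, num_parties=4)
```

With this toy data, each party in `non_iid_parts` holds samples from a single class, while the parties in `iid_parts` draw from a shuffled pool of all classes.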

FEMNIST
- Description: An image dataset with 62 classes (10 digits, 26 lowercase letters, 26 uppercase letters). Images are 28 by 28 pixels (with an option to make them all 128 by 128 pixels). There are 3500 users in total. The dataset is primarily used for image classification.
Shakespeare
- Description: A text dataset of Shakespeare dialogues. There are 1129 users in total, which is reduced to 660 users by the choice of sequence length. The main task for this dataset is next-character prediction.
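To make the next-character prediction task concrete, here is a minimal Python sketch that turns each user's text into (input sequence, next character) pairs with a sliding window. The helper names `make_next_char_examples` and `filter_users` are hypothetical and this is not LEAF's preprocessing code. It also shows one plausible way a choice of sequence length can shrink the user count: a user whose text is no longer than the sequence length yields no examples.

```python
def make_next_char_examples(text, seq_len):
    """Slide a window of seq_len characters over text.

    Each example pairs the window with the single character that follows it.
    """
    return [
        (text[i:i + seq_len], text[i + seq_len])
        for i in range(len(text) - seq_len)
    ]


def filter_users(user_texts, seq_len):
    """Build examples per user and drop users that produce none.

    Assumption for illustration: a user is kept if they have at least
    one example; a real pipeline may use a different threshold.
    """
    kept = {}
    for user, text in user_texts.items():
        examples = make_next_char_examples(text, seq_len)
        if examples:
            kept[user] = examples
    return kept


# Toy data: two made-up users with very short texts.
user_texts = {"user_a": "hello", "user_b": "hi"}
per_user = filter_users(user_texts, seq_len=3)
```

Here `user_b` is dropped because its text is shorter than the sequence length, while `user_a` keeps two examples.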

Link to LEAF Framework and its datasets