Datasets From FedML – 2020

Overview: FedML is a research library that provides both a framework for federated learning and benchmarking functionality. As a benchmark, it provides comprehensive baseline implementations for multiple ML models and FL algorithms, including FedAvg, FedNAS, vertical FL, and split learning. It supports three computing paradigms: distributed training, mobile on-device training, and standalone simulation. The FedML framework also provides a variety of datasets for users to experiment with. A few representative, commonly used datasets are listed here.

  1. CIFAR-10
    • Description: The CIFAR-10 dataset consists of 60000 32×32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, so an individual training batch may contain more images from one class than another. Taken together, the training batches contain exactly 5000 images from each class. Link to Data
  2. FedCIFAR-100
    • Description: This dataset is similar to CIFAR-10, except that it has 100 classes containing 600 images each. There are 500 training images and 100 test images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses. Each image comes with a “fine” label (the class to which it belongs) and a “coarse” label (the superclass to which it belongs). Link to Data
  3. Autonomous Driving
    • Description: This category contains a number of autonomous driving datasets, covering tasks such as 2D object detection and multi-object tracking. Link to Data
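
The image counts quoted for CIFAR-10 and CIFAR-100 above can be cross-checked with a few lines of Python. The following is only a minimal sketch: the `DatasetLayout` class, the constants, and the helper function are hypothetical names introduced here for illustration and are not part of the FedML API. The per-class counts come from the descriptions above; the assumption that the 20 CIFAR-100 superclasses are equally sized is stated in the code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetLayout:
    """Per-class image counts for a CIFAR-style dataset (illustrative only)."""

    name: str
    num_classes: int
    train_per_class: int
    test_per_class: int

    @property
    def train_total(self) -> int:
        return self.num_classes * self.train_per_class

    @property
    def test_total(self) -> int:
        return self.num_classes * self.test_per_class

    @property
    def total(self) -> int:
        return self.train_total + self.test_total


# Counts taken from the dataset descriptions above.
CIFAR10 = DatasetLayout("CIFAR-10", num_classes=10,
                        train_per_class=5000, test_per_class=1000)
CIFAR100 = DatasetLayout("CIFAR-100", num_classes=100,
                         train_per_class=500, test_per_class=100)

# CIFAR-10 is described as five training batches of 10000 images each.
CIFAR10_TRAIN_BATCHES = 5
CIFAR10_BATCH_SIZE = 10000

# CIFAR-100 groups its 100 fine classes into 20 superclasses.
CIFAR100_SUPERCLASSES = 20


def fine_classes_per_superclass(layout: DatasetLayout, num_superclasses: int) -> int:
    """Fine classes per superclass, assuming equally sized superclasses."""
    if layout.num_classes % num_superclasses != 0:
        raise ValueError("classes do not divide evenly into superclasses")
    return layout.num_classes // num_superclasses


def summarize(layout: DatasetLayout) -> str:
    """One-line summary of the train/test split for a layout."""
    return (f"{layout.name}: {layout.train_total} train + "
            f"{layout.test_total} test = {layout.total} images")


for layout in (CIFAR10, CIFAR100):
    print(summarize(layout))
```

Both datasets come out at the same overall size; the difference is that CIFAR-100 spreads the images over ten times as many classes, leaving far fewer training images per class.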

Link to Framework

Link to Paper