Large COVID-19 CT Scan Slice Dataset

This dataset was created by merging seven public datasets. Patients are divided into three groups according to their condition:

  1. Normal: 6,893 images from 604 patients
  2. COVID-19: 7,593 images from 466 patients
  3. Community Acquired Pneumonia (CAP): 2,618 images from 60 patients
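The counts above are uneven in two ways: by class and by slices per patient. The short sketch below uses only the listed numbers to compute each class's share of all images, the average number of slices per patient, and an inverse-frequency class weight. The weighting scheme is a common choice for imbalanced classification and is an illustrative assumption, not something prescribed by the dataset authors.

```python
# Per-class counts as listed above: (images, patients)
counts = {
    "Normal": (6893, 604),
    "COVID-19": (7593, 466),
    "CAP": (2618, 60),
}

total_images = sum(images for images, _ in counts.values())
total_patients = sum(patients for _, patients in counts.values())

for name, (images, patients) in counts.items():
    share = images / total_images        # fraction of all slices in this class
    per_patient = images / patients      # average slices contributed per patient
    # Inverse-frequency weight (assumed scheme): a perfectly balanced class gets 1.0
    weight = total_images / (len(counts) * images)
    print(f"{name:9s} share={share:.3f} "
          f"slices/patient={per_patient:.1f} weight={weight:.2f}")

print("total images:", total_images, "total patients:", total_patients)
```

CAP contributes the fewest images but by far the most slices per patient (roughly 44, versus about 11 for Normal and 16 for COVID-19), so splitting data by slice rather than by patient can place slices of the same patient on both sides of a train/test split.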

Images were collected in multiple countries from patients of different genders and age groups. This diversity makes the dataset useful for evaluating federated algorithms on topics such as personalization and fairness, and for testing how well models generalize under distribution shift.
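One way to exercise federated algorithms on this data is to split patients across simulated clients with a skewed label mix. The sketch below uses Dirichlet label-skew partitioning, a common technique for simulating non-IID clients; it is not a method described by the dataset authors. It assigns whole patients (never individual slices) to clients, using the patient counts listed above. The function name `dirichlet_patient_split` and the integer patient indices are hypothetical placeholders; real patient IDs would come from the dataset's metadata.

```python
import random


def dirichlet_patient_split(patients_per_class, num_clients, alpha, seed=0):
    """Hypothetical helper: assign whole patients to clients with Dirichlet label skew.

    patients_per_class: dict mapping class name -> number of patients.
    Returns a dict client_id -> list of (class_name, patient_index).
    Smaller alpha (> 0) gives more skewed, less IID clients.
    """
    rng = random.Random(seed)
    clients = {c: [] for c in range(num_clients)}
    for cls, n_patients in patients_per_class.items():
        # Dirichlet(alpha, ..., alpha) sample via normalized Gamma draws
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        props = [d / total for d in draws]

        patient_ids = list(range(n_patients))
        rng.shuffle(patient_ids)

        # Turn cumulative proportions into cut points over the shuffled patients
        start, cum = 0, 0.0
        for client, p in enumerate(props):
            cum += p
            end = n_patients if client == num_clients - 1 else round(cum * n_patients)
            clients[client].extend((cls, pid) for pid in patient_ids[start:end])
            start = end
    return clients


if __name__ == "__main__":
    split = dirichlet_patient_split(
        {"Normal": 604, "COVID-19": 466, "CAP": 60}, num_clients=5, alpha=0.5
    )
    for client, patients in split.items():
        per_class = {}
        for cls, _ in patients:
            per_class[cls] = per_class.get(cls, 0) + 1
        print(client, per_class)
```

Keeping each patient on a single client avoids leaking near-identical slices between clients; lowering `alpha` increases the label skew between clients.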

The dataset and its metadata can be downloaded from Kaggle. For more details on the dataset, please refer to the authors’ GitHub page.



Maftouni, M., Law, A.C., Shen, B., Zhou, Y., Yazdi, N., and Kong, Z.J. “A Robust Ensemble-Deep Learning Model for COVID-19 Diagnosis based on an Integrated CT Scan Images Database,” Proceedings of the 2021 Industrial and Systems Engineering Conference, Virtual Conference, May 22-25, 2021.