Machine learning (ML) is increasingly being used to analyze videos of spermatozoa under a microscope for developing computer-aided sperm analysis (CASA) systems1,2. In the last few years, several studies have investigated the use of deep neural networks (DNNs) to automatically determine specific attributes of a semen sample, such as predicting the proportions of progressive, non-progressive, and immotile spermatozoa3,4,5,6,7. However, a major challenge in using ML for semen analysis is the general lack of data for training and validation. Only a few open labeled datasets exist (Table 1), and most of them focus on still frames of fixed and stained spermatozoa, or on very short sequences, intended for analyzing sperm morphology.
In this paper, we present a multi-modal dataset containing videos of spermatozoa with the corresponding manually annotated bounding boxes (localization) and additional clinical information about the sperm providers from the original study8. This dataset is an extension of our previously published dataset VISEM8, which included videos of spermatozoa labeled with quality metrics following the World Health Organization (WHO) recommendations9.
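To illustrate how the released annotations can be consumed, the minimal Python sketch below loads one video frame and its bounding-box labels, assuming YOLO-style per-frame text files with normalized coordinates; the file paths and directory layout shown here are illustrative placeholders and should be adjusted to the downloaded dataset.

```python
from pathlib import Path

import cv2  # OpenCV for reading video frames


def load_yolo_boxes(label_file, frame_width, frame_height):
    """Parse one YOLO-style label file ('class x_center y_center width height',
    coordinates normalized to [0, 1]) and return pixel-space boxes."""
    boxes = []
    for line in Path(label_file).read_text().splitlines():
        cls, xc, yc, w, h = line.split()
        xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
        x1 = (xc - w / 2) * frame_width
        y1 = (yc - h / 2) * frame_height
        boxes.append((int(cls), x1, y1, w * frame_width, h * frame_height))
    return boxes


# Hypothetical paths; adjust to the actual dataset layout after download.
video_path = "VISEM_Tracking/videos/11.mp4"
label_dir = Path("VISEM_Tracking/labels/11")

cap = cv2.VideoCapture(video_path)
ok, frame = cap.read()
if ok:
    h, w = frame.shape[:2]
    first_label = sorted(label_dir.glob("*.txt"))[0]
    print(load_yolo_boxes(first_label, w, h)[:3])  # first three boxes of frame 0
cap.release()
```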
Several datasets related to spermatozoa have been released. For example, Ghasemian et al.10 published an open sperm dataset called HSMA-DS (Human Sperm Morphology Analysis DataSet) containing normal and abnormal spermatozoa. Experts annotated different features, namely vacuole, tail, midpiece, and head abnormality, and the presence of an abnormality in each feature is marked with a binary label (1 for abnormal, 0 for normal). In total, the dataset contains 1,457 sperm images for morphology analysis, captured at ×400 and ×600 magnification. The Modified Human Sperm Morphology Analysis Dataset (MHSMA)11 consists of 1,540 images cropped from the HSMA-DS dataset10 and was collected for analyzing the morphology of the different parts of the sperm. The maximum image size in the dataset is 128 × 128 pixels.
The HuSHEM12 and SCIAN-MorphoSpermGS13 datasets consist of images of sperm heads captured from fixed and stained semen smears. The main purpose of these datasets is sperm morphology classification into five categories, namely normal, tapered, pyriform, small, and amorphous. SMIDS14 is another dataset consisting of 3,000 images cropped from 200 stained ocular images of 17 subjects aged 19–39 years. Of the 3,000 images, 2,027 patches were manually annotated as normal or abnormal, and the remaining 973 samples were classified as non-sperm using spatial-based automated features. McCallum et al.15 published a similar dataset of 1,064 cropped bright-field sperm images from six healthy participants, with the main purpose of finding correlations between sperm images obtained by bright-field microscopy and sperm DNA quality. However, none of these datasets provide motility or kinematic features of the spermatozoa.
Chen et al.16 introduced a sperm dataset called SVIA (Sperm Videos and Images Analysis dataset), which contains 101 short video clips of 1 to 3 seconds with corresponding manually annotated objects. The dataset is divided into three subsets, namely subset-A, B, and C. Subset-A contains 101 video clips (30 FPS) with 125,000 annotated object locations and their corresponding categories. Subset-B contains 10 videos with 451 ground-truth segmentation masks, and subset-C consists of cropped images for classification into two categories (sperm images and impurity images). The provided video clips are very short compared to those in VISEM-Tracking: our dataset17 contains 7× more annotated video frames and 2.3× more annotated objects than SVIA.
VISEM-Tracking offers annotated bounding boxes and sperm tracking information, making it more valuable for training supervised ML models than the original VISEM dataset8, which lacks these annotations. This additional data enables a variety of research possibilities in both biology (e.g., comparing with CASA tracking) and computer science (e.g., object tracking, integrating clinical and tracking data). Unlike other datasets, VISEM-Tracking’s motility features facilitate sperm identification within video sequences, resulting in a richer and more detailed dataset that supports novel research directions. Potential applications include sperm tracking, classifying spermatozoa based on motility, and analyzing movement patterns. To the best of our knowledge, this is the first open dataset of its kind.
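As a minimal sketch of one such application, assuming the per-frame bounding boxes have already been linked into tracks (for example, by an off-the-shelf multi-object tracker), the following Python snippet converts a track of centroid positions into standard CASA kinematic measures such as curvilinear velocity (VCL) and straight-line velocity (VSL); the frame rate and pixel scale used here are placeholder values, not dataset specifications.

```python
import numpy as np


def track_kinematics(centroids, fps, microns_per_pixel=1.0):
    """Simple kinematic summary for one sperm track.

    centroids: sequence of per-frame (x, y) positions in pixels.
    Returns VCL (total path length per second), VSL (net displacement
    per second), and LIN (VSL/VCL), following their standard CASA definitions.
    """
    pts = np.asarray(centroids, dtype=float) * microns_per_pixel
    duration = (len(pts) - 1) / fps
    path_length = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
    straight_length = np.linalg.norm(pts[-1] - pts[0])
    vcl = path_length / duration       # distance along the full path per second
    vsl = straight_length / duration   # net start-to-end displacement per second
    return {"VCL": vcl, "VSL": vsl, "LIN": vsl / vcl if vcl > 0 else 0.0}


# Toy example: a slightly wavy track sampled at an assumed 50 FPS.
example_track = [(10 + i, 20 + 0.5 * np.sin(i)) for i in range(50)]
print(track_kinematics(example_track, fps=50))
```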