To better understand the Food101 dataset, we analyzed its class distribution. The original dataset contains 101 food classes, each representing a type of dish such as pizza or sushi, with a balanced distribution of 1000 images per class. Fig. 3(a) shows the sample count per class in the original dataset, and Fig. 3(b) shows the class distribution for the sweet and savory categories after sampling.
Fig. 3: Class Distribution Bar Chart. (a) Original classes. (b) Sweet, Savory.
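The per-class counts and the balanced sweet/savory sampling described above could be reproduced along the following lines. This is a minimal sketch, not the procedure used in the paper: the `SWEET` set, the `to_category` rule, and the choice of random downsampling to the smaller category are all assumptions made for illustration, and the label list is a toy stand-in for the real dataset.

```python
import random
from collections import Counter

# Hypothetical assignment of a few Food101 class names to the "sweet" category.
# The actual sweet/savory mapping used in the paper is not given in this section.
SWEET = {"apple_pie", "cheesecake", "tiramisu"}


def to_category(class_name):
    """Map a fine-grained class name to 'sweet' or 'savory' (illustrative rule)."""
    return "sweet" if class_name in SWEET else "savory"


def count_per_class(labels):
    """Count how many samples carry each label."""
    return Counter(labels)


def balance_by_downsampling(labels, seed=0):
    """Return sorted sample indices so every label keeps as many samples as the rarest one."""
    rng = random.Random(seed)
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)
    target = min(len(idxs) for idxs in by_label.values())
    kept = []
    for idxs in by_label.values():
        kept.extend(rng.sample(idxs, target))
    return sorted(kept)


# Toy label list: 5 classes with 1000 images each (stand-in for the 101 classes).
classes = ["apple_pie", "cheesecake", "tiramisu", "pizza", "sushi"]
fine_labels = [c for c in classes for _ in range(1000)]
coarse_labels = [to_category(c) for c in fine_labels]

fine_counts = count_per_class(fine_labels)        # data behind a plot like Fig. 3(a)
coarse_counts = count_per_class(coarse_labels)    # imbalanced before sampling
kept = balance_by_downsampling(coarse_labels)
balanced_counts = count_per_class(coarse_labels[i] for i in kept)  # like Fig. 3(b)
```

The resulting `Counter` objects can be passed directly to any bar-chart routine to draw the two panels.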
To gain further insight into the dataset, we visualized random samples from different classes. Fig. 4 shows a grid of randomly selected images from various food categories. This visualization helps us understand the diversity within each class and across different classes.
After categorizing Food101 into sweet and savory classes and applying preprocessing, we again drew random samples from both categories (Fig. 5), highlighting their diversity and the effect of preprocessing ahead of model training. We also computed the mean, median, and standard deviation of pixel intensities for each category in the balanced dataset. Table I summarizes these statistics, offering insights into the central tendencies and variability of the image data.
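Per-category intensity statistics of the kind reported in Table I can be computed by flattening all pixel values of a category and taking the mean, median, and standard deviation. The sketch below assumes images are already loaded as floating-point arrays; the helper name `intensity_stats` and the synthetic batches are hypothetical, and the text does not state whether the reported values were computed before or after normalization.

```python
import numpy as np


def intensity_stats(images):
    """Mean, median and standard deviation over all pixel values of a batch."""
    pixels = np.asarray(images, dtype=np.float64).ravel()
    return {
        "mean": float(pixels.mean()),
        "median": float(np.median(pixels)),
        "std": float(pixels.std()),  # population standard deviation (ddof=0)
    }


# Synthetic stand-ins for the preprocessed image batches (N, H, W, C).
rng = np.random.default_rng(0)
batches = {
    "Sweet": rng.random((8, 32, 32, 3)),
    "Savory": rng.random((8, 32, 32, 3)),
}

stats = {name: intensity_stats(batch) for name, batch in batches.items()}
for name, s in stats.items():
    print(f"{name}: mean={s['mean']:.4f} median={s['median']:.4f} std={s['std']:.4f}")
```

For a dataset too large to hold in memory, the mean and standard deviation can be accumulated batch by batch, while the median would need a histogram or sampling approximation.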
We also explored dimensionality reduction techniques, namely Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE). PCA projects high-dimensional features onto a lower-dimensional space while preserving as much variance as possible; here it condenses 500 extracted features into 2 principal components for visualization (Fig. 6(a)). t-SNE preserves local relationships, revealing clusters, class separability, subgroups within classes, and potential outliers (Fig. 6(b)).
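To make the PCA step concrete, the sketch below projects a 500-dimensional feature matrix onto its first two principal components using a singular value decomposition of the centered data. It is an illustration under assumptions, not the paper's pipeline: the function `pca_project` is hypothetical, and the features are synthetic Gaussian data with an artificial offset between two groups. t-SNE is not reimplemented here; in practice it is usually run through a library implementation such as scikit-learn's `sklearn.manifold.TSNE`.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project features onto the top principal components via SVD.

    Returns the projected points and the explained-variance ratio of each
    retained component.
    """
    X = np.asarray(features, dtype=np.float64)
    Xc = X - X.mean(axis=0)                       # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                # rows are principal directions
    projected = Xc @ components.T                 # coordinates in the PC basis
    explained = (S ** 2) / np.sum(S ** 2)         # variance share per component
    return projected, explained[:n_components]


# Synthetic stand-in: 200 samples with 500 features, two groups of 100.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 500))
features[:100, 0] += 5.0                          # separate the groups along one feature

projected, ratio = pca_project(features, n_components=2)
print(projected.shape, ratio)
```

Scatter-plotting the two columns of `projected`, colored by category, gives a view comparable to Fig. 6(a).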
TABLE I: Mean, Median and Standard Deviation Statistics
| Category | Mean | Median | Standard Deviation |
|---|---|---|---|
| Sweet | 0.2379 | 0.2348 | 1.2902 |
| Savory | 0.2380 | 0.2348 | 1.2885 |
Fig. 4: Random Image Samples Grid
Fig. 5: Random Image Samples Grid of Sweet & Savory
Fig. 6: Dimensionality Reduction Techniques. (a) PCA. (b) t-SNE.