Exploratory Data Analysis (EDA)

Descriptive Statistics

To better understand the Food101 dataset, we analyzed its class distribution. The original dataset contains 101 classes, each representing a type of food, such as pizza, sushi, or dessert, with an approximately balanced distribution of 1000 images per class. Fig. 3(a) shows the sample count per class in the original dataset, and Fig. 3(b) shows the class distribution of the sweet and savory categories after sampling.
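The counting behind both panels of Fig. 3 can be sketched as follows. This is a minimal illustration, not the code used in this work: the `labels` list is toy data, and the `SWEET_CLASSES` mapping is a hypothetical assignment of classes to the sweet category.

```python
from collections import Counter

# Toy stand-in for the per-image labels (hypothetical data, 4 images per class).
labels = ["pizza"] * 4 + ["sushi"] * 4 + ["donuts"] * 4 + ["apple_pie"] * 4

# Hypothetical mapping: classes treated as sweet; everything else counts as savory.
SWEET_CLASSES = {"donuts", "apple_pie"}


def class_distribution(labels):
    """Return the number of samples per class."""
    return Counter(labels)


def coarse_distribution(labels, sweet_classes):
    """Return the number of samples per coarse category (sweet or savory)."""
    return Counter(
        "sweet" if label in sweet_classes else "savory" for label in labels
    )


per_class = class_distribution(labels)
per_category = coarse_distribution(labels, SWEET_CLASSES)
print(per_class)
print(per_category)
```

The two counters correspond to the bar heights in Fig. 3(a) and Fig. 3(b), respectively, and can be passed to any plotting library.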

Fig. 3: Class Distribution Bar Chart. (a) Original classes. (b) Sweet, Savory.

Data Visualization

To gain further insight into the dataset, we visualized random samples from different classes. Fig. 4 shows a grid of randomly selected images from various food categories. This visualization helps us understand the diversity both within each class and across classes.

After categorizing Food101 into sweet and savory classes and applying preprocessing, we again visualized random samples. Fig. 5 shows samples from both categories, highlighting their diversity and the effect of preprocessing ahead of model training. We also computed the mean, median, and standard deviation of pixel intensities for each category in the balanced dataset. Table I summarizes these statistics, offering insight into the central tendency and variability of the image data.
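A per-category summary of the kind reported in Table I could be computed along the following lines. This is a minimal sketch under stated assumptions: the images are tiny toy arrays stored as nested lists, the helper name `pixel_statistics` is hypothetical, and the population standard deviation is used because the source does not say which estimator was applied.

```python
import statistics


def pixel_statistics(images):
    """Return (mean, median, population std) over all pixel values in `images`.

    `images` is an iterable of images, each a nested list of rows of intensities.
    """
    pixels = [value for image in images for row in image for value in row]
    return (
        statistics.fmean(pixels),
        statistics.median(pixels),
        statistics.pstdev(pixels),
    )


# Hypothetical toy data: two 2x2 grayscale "images" per category.
dataset = {
    "sweet": [[[0.0, 0.5], [0.5, 1.0]], [[0.25, 0.25], [0.75, 0.75]]],
    "savory": [[[0.0, 0.0], [1.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]],
}

stats = {category: pixel_statistics(images) for category, images in dataset.items()}
for category, (mean, median, std) in stats.items():
    print(f"{category}: mean={mean:.4f} median={median:.4f} std={std:.4f}")
```

For real image data, an array library would normally replace the nested lists, but the pooled statistics are defined the same way.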

We explored two dimensionality reduction techniques: Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE). PCA projects high-dimensional features onto a lower-dimensional space while preserving as much variance as possible; here it condenses 500 extracted features into 2 principal components for visualization (Fig. 6(a)). t-SNE preserves local neighborhood relationships, which helps reveal clusters, class separability, subgroups within classes, and potential outliers (Fig. 6(b)).
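The PCA step described above can be sketched with a singular value decomposition. This is an illustrative sketch rather than the implementation used here: the feature matrix is random stand-in data with 500 columns, and `pca_project` is a hypothetical helper name.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project the rows of `features` onto their top principal components."""
    # PCA operates on mean-centered data.
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Fraction of the total variance captured by each component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return projected, explained[:n_components]


# Random stand-in for 200 samples with 500 extracted features (assumed shape).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 500))

embedding, explained_ratio = pca_project(features)
print(embedding.shape, explained_ratio)
```

The two columns of `embedding` are the coordinates that would be scattered in a plot such as Fig. 6(a). t-SNE has no comparably short closed form; a library implementation such as scikit-learn's `sklearn.manifold.TSNE` would be one common choice for a plot like Fig. 6(b).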

TABLE I: Mean, Median and Standard Deviation Statistics

Category	Mean	Median	Standard Deviation
Sweet	0.2379	0.2348	1.2902
Savory	0.2380	0.2348	1.2885

Fig. 4: Random Image Samples Grid

Fig. 5: Random Image Samples Grid of Sweet & Savory

Fig. 6: Dimensionality Reduction Techniques. (a) PCA. (b) t-SNE.