For the Food101 dataset, data cleansing and transformation were integral steps to enhance data quality and model performance. We applied several image preprocessing techniques: images were resized to 256x256 and center-cropped to 224x224 pixels, ensuring consistent input dimensions for the model. A histogram equalization transform was applied to improve image contrast, potentially improving feature detection. We further used data augmentation, including random horizontal flips, to increase dataset variability. These augmentations aimed to improve the model’s generalization ability across different conditions. Finally, the images were converted to tensors and normalized using mean and standard deviation values derived from the ImageNet dataset, facilitating model convergence.
Feature extraction is a critical step to distill complex, high-dimensional image data into meaningful, compact representations. By focusing on the most relevant aspects of the data, such as textures, shapes, and patterns, it reduces the computational burden and simplifies the learning process. The pretrained VGG16 model was used to extract high-level features from the input images. By removing the classification head and keeping only the feature extraction layers, we obtained robust, pretrained representations of the images. Feature selection then highlights the most relevant attributes in the dataset while discarding less significant ones. By applying SelectKBest with the ANOVA F-statistic, we identified the features whose values differ most strongly between the target classes. This ensures that the model focuses on the most predictive aspects of the data, reducing noise from irrelevant or redundant features. Further dimensionality reduction was achieved using PCA, which projects the selected features onto a lower-dimensional space of 100 components. PCA retains most of the variance in the data while mitigating overfitting risks and computational overhead.
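The selection and reduction stages can be illustrated with scikit-learn. The sketch below uses synthetic features in place of the VGG16 outputs, so the array sizes are placeholders. The value `k=256` is a hypothetical choice, since the number of selected features is not stated here; only the 100 PCA components come from the text.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Synthetic stand-in for extracted image features (sizes are placeholders).
rng = np.random.default_rng(0)
n_samples, n_features = 400, 512
y = rng.integers(0, 2, size=n_samples)          # binary labels
X = rng.normal(size=(n_samples, n_features))
X[:, :20] += 1.5 * y[:, None]                   # make the first 20 features informative

reducer = Pipeline([
    # Keep the k features with the highest ANOVA F-statistic (k is hypothetical).
    ("select", SelectKBest(score_func=f_classif, k=256)),
    # Project the selected features onto 100 principal components.
    ("pca", PCA(n_components=100, random_state=0)),
])

# Fit on training data only, then reuse the fitted pipeline for other subsets.
X_reduced = reducer.fit_transform(X, y)
selected_mask = reducer.named_steps["select"].get_support()
```

Wrapping both steps in a `Pipeline` makes it straightforward to fit the selector and PCA on the training set and call `reducer.transform` on validation and test features, which avoids leaking label information from held-out data.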
In today’s health-conscious society, managing sugar intake is a priority. A binary classification system was developed to distinguish sweet foods from savory ones, simulating a real-world application that identifies desserts versus main courses to support balanced diets and reduce sugar consumption. To prepare the dataset for binary classification, we implemented a custom categorization and balancing approach in which the 101 food classes were grouped into two categories: (1) Sweet (23 classes), e.g., ‘apple pie’ and ‘chocolate cake’; and (2) Savory (78 classes), e.g., ‘pizza’ and ‘sushi’. The data distribution after categorization is shown in Fig. 1. Sampling was applied to balance the dataset, resulting in 23,000 images per category, ensuring effective learning despite the initial class imbalance.
Fig. 1: Distribution of sweet vs. savory (before balancing)
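The categorization and balancing step can be sketched as follows. The text does not specify the sampling scheme, so random undersampling of the larger group to the size of the smaller one is an assumption, as are the helper names `to_binary_label` and `balance_by_undersampling`. The sweet set lists only the two example classes named above, and the toy image counts are invented for illustration.

```python
import random
from collections import Counter

# Illustrative subset of the 23 sweet classes; all other classes count as savory.
SWEET_CLASSES = {"apple pie", "chocolate cake"}

def to_binary_label(food_class):
    """Map an original food class to 'sweet' or 'savory'."""
    return "sweet" if food_class in SWEET_CLASSES else "savory"

def balance_by_undersampling(samples, seed=0):
    """Randomly reduce every group to the size of the smallest group."""
    groups = {}
    for item, label in samples:
        groups.setdefault(label, []).append((item, label))
    target = min(len(group) for group in groups.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], target))
    rng.shuffle(balanced)
    return balanced

# Toy data: 8 sweet and 20 savory images before balancing.
toy_counts = {"apple pie": 4, "chocolate cake": 4, "pizza": 10, "sushi": 10}
samples = [
    (f"{food_class}_{i}", to_binary_label(food_class))
    for food_class, n in toy_counts.items()
    for i in range(n)
]
balanced = balance_by_undersampling(samples)
counts = Counter(label for _, label in balanced)
```

Under this assumption, the 23,000 images per category reported above would correspond to keeping every sweet image and drawing an equally sized random subset of the savory images.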
The dataset was divided into three subsets: 1) Training set (80%): used to train the model. 2) Validation set (10%): used for hyperparameter tuning and to guard against overfitting. 3) Test set (10%): reserved for final model evaluation. The split preserved the class balance in every subset, so that the sweet and savory categories were represented uniformly. The distribution of samples across these subsets is shown in Fig. 2.
Fig. 2: Bar chart showing data split proportions
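An 80/10/10 split that keeps the class balance in every subset can be sketched with a per-class (stratified) shuffle. The function name `stratified_split`, the seed, and the toy label list are assumptions made for illustration; the actual splitting code is not described here.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return train/val/test index lists with per-class proportions preserved."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]   # remainder goes to the test set
    return train, val, test

# Toy balanced dataset: 100 samples per category.
labels = ["sweet"] * 100 + ["savory"] * 100
train_idx, val_idx, test_idx = stratified_split(labels)
train_counts = Counter(labels[i] for i in train_idx)
```

If the same proportions are applied to the balanced set of 46,000 images, they give 36,800 training, 4,600 validation, and 4,600 test images, split evenly between the two categories.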