Because Food-101 consists of image data, our initial approach was to classify the images with a Convolutional Neural Network (CNN). However, a custom-built 6-layer CNN did not perform well: the dataset is large, and a small CNN could not effectively capture the complex features in the images. As a result, we shifted to VGG- and ResNet-based architectures for feature extraction and image classification. The models we designed and experimented with are as follows:

VGG-16 + Logistic Regression Model

In this experiment, we used the VGG-16 [9] model for feature extraction, combined with logistic regression for binary classification between the sweet and savory classes. VGG-16 is a deep convolutional neural network whose stacked convolutional layers extract hierarchical image features. Because of this deep convolutional structure, VGG-16 captures complex patterns and textures in food images with minimal additional training. After feature extraction, we performed feature selection, keeping the top 500 features ranked by their correlation with the target variable. We then reduced these 500 features to 100 components using PCA for faster and easier processing. The resulting features are fed into a logistic regression model, which outputs whether the given image belongs to the sweet or savory class.
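The selection, PCA, and classification steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact code used in our experiments: the synthetic matrix `X` stands in for extracted VGG-16 features, the sizes are scaled down (20 selected features and 5 components instead of 500 and 100), and the helper names `select_by_correlation`, `pca_fit`, and `fit_logistic` are hypothetical.

```python
import numpy as np

def select_by_correlation(X, y, k):
    """Return indices of the k features with the largest |Pearson correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    corr = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.argsort(-np.abs(corr))[:k]

def pca_fit(X, n_components):
    """Return the feature mean and the top principal directions, shape (d, n_components)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T

def fit_logistic(X, y, lr=0.1, steps=500):
    """Fit logistic regression weights with plain batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of class 1
        grad = p - y                             # gradient of the log loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Synthetic stand-in for extracted features (assumption: 200 images, 50 features).
rng = np.random.default_rng(0)
n, d = 200, 50
y = rng.integers(0, 2, size=n).astype(float)   # 0 = savory, 1 = sweet (arbitrary coding)
X = rng.normal(size=(n, d))
X[:, :5] += 2.0 * y[:, None]                   # only the first 5 features carry signal

idx = select_by_correlation(X, y, k=20)        # the paper keeps 500 features
mean, components = pca_fit(X[:, idx], n_components=5)  # the paper keeps 100 components
Z = (X[:, idx] - mean) @ components            # projected features
w, b = fit_logistic(Z, y)
pred = (Z @ w + b > 0).astype(float)
accuracy = (pred == y).mean()
print(f"training accuracy: {accuracy:.3f}")
```

In a real pipeline, the correlation ranking and the PCA projection would be fitted on the training split only and then applied unchanged to the validation and test splits, so that label information does not leak into the evaluation.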

ResNet-50

ResNet-50 [10] is a deep convolutional neural network proposed by researchers at Microsoft in 2015. ResNet-50 mitigates the vanishing gradient problem through residual blocks, which use shortcut connections to skip one or more layers, allowing the model to learn residual functions instead of direct mappings. In every block of ResNet-50, a skip connection bypasses the next few layers and adds the input of the block directly to its output. The skip connection structure is shown in Figure 7. This structure gives the gradient a direct path through the network during backpropagation, so it is propagated effectively even through many layers. Since its release, researchers have shown that this type of architecture is well suited for image recognition, object detection, and segmentation tasks. Here, we used the ResNet-50 model trained on the Food-101 dataset, modifying the output layer for binary classification into the sweet and savory classes.

Fig. 7: ResNet-50 Skip Connections
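
The effect of the skip connection on gradient flow can be illustrated with a toy calculation. The sketch below is not ResNet-50 itself. It assumes a chain of scalar "layers" with a small weight `w`, where a plain layer computes y = w * x and a residual layer computes y = x + w * x. The depth of 20 and the weight of 0.05 are arbitrary choices, and the function names are hypothetical.

```python
def plain_gradient(weights):
    """d(output)/d(input) for a chain of scalar layers y = w * x."""
    g = 1.0
    for w in weights:
        g *= w          # each layer scales the gradient by w
    return g

def residual_gradient(weights):
    """Same chain, but each layer is y = x + w * x (identity shortcut)."""
    g = 1.0
    for w in weights:
        g *= 1.0 + w    # the shortcut contributes the constant 1
    return g

weights = [0.05] * 20   # toy depth and weight, chosen only for illustration
plain = plain_gradient(weights)
resid = residual_gradient(weights)
print(f"plain chain gradient:    {plain:.3e}")
print(f"residual chain gradient: {resid:.3f}")
```

With these values, the gradient of the plain chain shrinks to the order of 1e-26, while the residual chain keeps it at about 2.65, because the identity term in each block prevents the product from collapsing toward zero.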