Food image classification is a key task in computer vision, with applications in dietary planning, health monitoring, and menu organization. The Food-101 dataset, comprising 101,000 images across 101 categories, presents a realistic challenge due to its intentionally noisy training data, which includes mislabeled images. This project focuses on binary classification of the dataset into sweet and savory categories using the ResNet-50 architecture, a deep residual network whose skip connections mitigate the vanishing gradient problem. Through data preprocessing, augmentation, and dimensionality reduction with principal component analysis (PCA), we highlight the effectiveness of ResNet-50 over baselines such as VGG-16 with logistic regression, providing insights into the strengths of deep networks for fine-grained food classification.
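
As a concrete reference for this setup, the sketch below shows one way a pretrained ResNet-50 could be adapted for binary sweet-versus-savory classification in PyTorch; the layer-freezing strategy and hyperparameters are illustrative assumptions rather than the project's exact configuration.

```python
# Minimal sketch of the transfer-learning setup described above (assumed
# configuration, not the project's exact one).
import torch
import torch.nn as nn
from torchvision import models

# Load ResNet-50 with ImageNet weights; its residual (skip) connections are
# what mitigate the vanishing gradient problem in deep networks.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the pretrained backbone so only the new head is trained initially.
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-class ImageNet head with a 2-class (sweet vs. savory) head.
model.fc = nn.Linear(model.fc.in_features, 2)

# Optimize only the new head; learning rate is an illustrative choice.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```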

Food-101 Dataset

The Food-101 dataset is a widely used benchmark in computer vision, consisting of 101,000 images across 101 food categories, with each category containing 1,000 images divided into 750 training and 250 manually reviewed test images. Created by researchers at ETH Zurich, the images are rescaled to a maximum side length of 512 pixels, and the training data is intentionally left uncleaned, containing some noise and mislabeled images, to present a challenging and realistic scenario for image classification tasks. The dataset supports the primary task of image classification and is accessible through machine learning libraries such as the Hugging Face Datasets library, PyTorch's torchvision library, and TensorFlow Datasets. It is valuable for developing and benchmarking image classification models, researching food recognition systems, and exploring data-centric AI principles. [8]
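
For illustration, the official 750/250 per-class train/test split can be loaded directly through torchvision's Food101 wrapper; the augmentation pipeline below is an assumed, typical configuration rather than the project's exact preprocessing.

```python
# Illustrative loading of Food-101 via torchvision, one of the libraries
# mentioned above; the augmentation choices here are assumptions.
from torchvision import datasets, transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),   # source images have max side 512 px
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

# The official train/test split is exposed via the `split` argument.
train_set = datasets.Food101(root="data", split="train",
                             transform=train_transform, download=True)
test_set = datasets.Food101(root="data", split="test",
                            transform=transforms.Compose([
                                transforms.Resize(256),
                                transforms.CenterCrop(224),
                                transforms.ToTensor(),
                            ]))
```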

Literature Review

Recent advances in food image classification leverage deep learning techniques, particularly convolutional neural networks (CNNs), to achieve state-of-the-art performance. Imran and Athitsos (2020) [1] employed visual attention mechanisms with domain-adaptive transfer learning to enhance accuracy on fine-grained datasets, while Lu et al. (2021) [2] introduced Neural Architecture Transfer (NAT), which efficiently generates task-specific subnets from pre-trained supernets, achieving high accuracy across various datasets. Lee et al. (2020) [3] validated the effectiveness of assembling multiple optimization techniques in models like ResNet-50, achieving 92.5% accuracy through advanced regularization and augmentation strategies. Islam et al. (2018) [4] demonstrated the capabilities of a custom-built CNN for the Food-101 dataset, emphasizing feature extraction and dimensionality reduction, while Attokaren et al. (2017) [5] utilized transfer learning with Inception V3, combining preprocessing steps like ZCA whitening and data augmentation to improve classification robustness. These studies highlight the significance of transfer learning, feature selection, and advanced model architectures, considerations that align with our project's use of ResNet-50 and comprehensive preprocessing to tackle challenges such as noisy labels and high intra-class variability.
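
To make the dimensionality-reduction step referenced above concrete, the following sketch applies PCA to CNN feature vectors using scikit-learn; the feature dimension, placeholder data, and variance threshold are illustrative assumptions, not values taken from the cited studies or from our experiments.

```python
# Hedged sketch of PCA-based dimensionality reduction on CNN features.
import numpy as np
from sklearn.decomposition import PCA

# Suppose `features` holds penultimate-layer CNN activations, one
# 2048-dim vector per image (e.g., ResNet-50's pooled output).
features = np.random.rand(1000, 2048)  # placeholder data for illustration

# Keep enough principal components to explain 95% of the variance;
# a float n_components tells scikit-learn to choose k accordingly.
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(features)
print(reduced.shape)  # (1000, k) with k << 2048
```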