Fine-Tuned ViT on Food-101 🌮🍣🍕🍣

A ViT feature-extractor computer vision model that classifies images into the 101 classes of the Food-101 dataset.

Examples

Training Details

This model was fine-tuned on the Food-101 dataset, starting from a pretrained Vision Transformer (ViT) in PyTorch.

Final Results

  • Top-1 Accuracy: 89%
  • Total Training Time: 3:26:16 (h:mm:ss)
  • Test loss: 1.16616
  • Train loss: 1.83015
  • Batch size: 128
  • Num epochs: 40
  • Hardware: NVIDIA DGX Spark
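
For a rough sense of throughput, the total training time and epoch count above can be turned into an average time per epoch. This is a small sketch that assumes the time is formatted as h:mm:ss; the helper name `hms_to_seconds` is hypothetical.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


total_seconds = hms_to_seconds("3:26:16")  # total training time from the list above
num_epochs = 40                            # epoch count from the list above
seconds_per_epoch = total_seconds / num_epochs

print(f"{total_seconds} s total, {seconds_per_epoch:.1f} s per epoch")
# prints "12376 s total, 309.4 s per epoch"
```

That works out to a little over five minutes per epoch at batch size 128 on the listed hardware.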