Fine-Tuned ViT on Food-101 🌮🍣🍕🍣🍝

Top-1 Accuracy: 89%
Total Training Time: 3:26:16
Test loss: 1.16616
Train loss: 1.83015
Batch size: 128
Num epochs: 40
Hardware: NVIDIA DGX Spark

A computer vision model that uses a ViT feature extractor to classify images into the 101 classes of the Food-101 dataset.

Examples

Training Details

This model was fine-tuned on the Food-101 dataset using a pretrained Vision Transformer (ViT) in PyTorch.