120 Dog Breeds on Hugging Face

Duc Haba
7 min read · Jun 22, 2022

Breaking News (June 27th): “Is your baby a purebred dog?”

  • Scroll down to the “Bonus: Breaking News (June 27th)” section at the end of the article.

Hugging Face?

I am not writing about children’s books. Hugging Face is a company, founded in 2016, that started out building a social AI chatbot app. It has since grown a rich and diverse community of AI researchers and developers. Among Hugging Face’s many ventures is Hugging Face Spaces (HFS), where AI researchers and developers post or deploy their Machine Learning (ML) models.

I developed dozens of ML projects using Fastai, and with HFS and Gradio’s web interface platform, I began to publish those results.

The first is the “classification of 120 dog breeds.” Why? Because it’s fun and easy. Furthermore, a Data Scientist at the Department of Health and Social Care in London, England, United Kingdom, released the 120 dog breeds dataset on Kaggle about two months ago.

The “HFS 120dog_breeds” website has an easy-to-use interface. Click or touch the “drop image here” frame to take a photo or upload your picture. The ML model predicts your dog’s breed and shows the result in a delightful donut chart. Click the “Clear” button to reset and prepare for the next image. The example pictures are selected to show the model’s range of accuracy. You can click on the example pictures and see the result for yourself.

For the techie in all of us, read onward to see how I do it. With HFS, the deployment code is in the file “app.py,” located in the “Files and versions” tab.

For starters, create an “ADA_DOG” class that holds Fastai’s predict methods and instantiate “maxi” with the ADA_DOG class. “Maxi” is my imaginary Siberian Husky canine companion. The fancy “donut chart” is Maxi’s invention: it takes the predicted output from Fastai, imports it into a Pandas Data Frame, sorts it, and graphs it using Matplotlib. The relevant code for the methods “maxi.predict_donut()” and “maxi._draw_pred()” is here.
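The listing itself is not reproduced in this text, so what follows is only a minimal sketch of the idea, not Maxi’s actual methods. It assumes the classifier hands back one probability per breed. The helper names “top_k_with_other” and “draw_donut,” the cut-off “k,” and the sample numbers are all made up for illustration, and the sorting is done in plain Python rather than a Pandas Data Frame to keep the sketch short.

```python
from typing import Dict, List, Tuple


def top_k_with_other(probs: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Sort breeds by probability, keep the top k, and lump the rest into 'Other'."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    rest = sum(p for _, p in ranked[k:])
    if rest > 0:
        top.append(("Other", rest))
    return top


def draw_donut(slices: List[Tuple[str, float]], title: str = "Predicted breed"):
    """Draw a donut chart: a pie chart whose wedges leave a hole in the middle."""
    # Imported here so the sorting helper above works without Matplotlib installed.
    import matplotlib.pyplot as plt

    labels = [name for name, _ in slices]
    values = [p for _, p in slices]
    fig, ax = plt.subplots(figsize=(5, 5))
    # A wedge width below 1 turns the pie into a ring.
    ax.pie(values, labels=labels, autopct="%1.1f%%", startangle=90,
           wedgeprops={"width": 0.4})
    ax.set_title(title)
    return fig


# Hypothetical prediction for one photo (breed name -> probability).
sample = {"Siberian Husky": 0.81, "Alaskan Malamute": 0.12, "Eskimo Dog": 0.04,
          "Samoyed": 0.02, "Maltese": 0.01}
slices = top_k_with_other(sample, k=3)
# fig = draw_donut(slices)  # uncomment to render the chart
```

Grouping the long tail into one “Other” slice keeps the chart readable when there are 120 possible breeds.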

For a bonus, Maxi can guess which dog breed you most likely are. Just take a selfie. :-) That’s it for deploying an ML model. Maxi made it easy peasy lemon squeezy.

Fastai makes training the model super simple. The challenge is ingesting the “120 dog breeds” dataset from the Department of Health and Social Care (DHSC) in London, England, United Kingdom.

First, it is a clean, well-balanced, and organized image dataset. I would like to give a big shoutout to the DHSC for publishing the dataset for the research community on Kaggle.

The labels come from the parent folder names, which posed a little challenge because the dog breeds’ names use both hyphens and underscores. “Ada” is the object/class instantiated to house all the methods for downloading, analyzing, and training the model. Ada is naturally an imaginary Labrador Retriever canine friend.
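As a small illustration of that wrinkle, the sketch below turns a folder name into a clean label. It assumes folder names of the form “id-breed_name,” where the breed name may itself contain hyphens and underscores. That layout, the sample folder names, and the helper “breed_from_folder” are assumptions for illustration and are not taken from Ada’s code.

```python
def breed_from_folder(folder_name: str) -> str:
    """Turn a folder name such as 'n001-black-and-tan_coonhound' into a label."""
    # Split only on the FIRST hyphen so hyphens inside the breed name survive.
    _, _, raw = folder_name.partition("-")
    if not raw:
        # No id prefix present: treat the whole folder name as the breed.
        raw = folder_name
    # Underscores stand in for spaces; hyphens are part of the name.
    return raw.replace("_", " ").strip()


# Hypothetical folder names, made up for this example.
for name in ["n001-black-and-tan_coonhound", "n002-Siberian_husky", "Maltese_dog"]:
    print(name, "->", breed_from_folder(name))
```

Splitting on every hyphen would cut “black-and-tan” into pieces, which is why only the first hyphen is treated as a separator here.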

The method “ada.draw_images_spec()” sums up the download and analysis nicely.

There are a few outlier images with a very large width and height, and the “Maltese” has almost double the number of images of the “Redbone,” but overall, the 20,580 images are well-balanced.
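One way to back up a balance claim like this is to count the images per label and compare the largest class with the smallest. The sketch below does that in plain Python on a made-up list of labels. In practice the labels would come from the dataset’s Data Frame, and the function name “balance_summary” is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def balance_summary(labels: Iterable[str]) -> Dict[str, object]:
    """Count images per label and report the largest-to-smallest class ratio."""
    counts = Counter(labels)
    largest, n_max = max(counts.items(), key=lambda item: item[1])
    smallest, n_min = min(counts.items(), key=lambda item: item[1])
    return {
        "classes": len(counts),
        "images": sum(counts.values()),
        "largest": (largest, n_max),
        "smallest": (smallest, n_min),
        "ratio": n_max / n_min,
    }


# Toy labels, one entry per image (not the real dataset counts).
toy_labels = ["Maltese"] * 4 + ["Boxer"] * 3 + ["Redbone"] * 2
summary = balance_summary(toy_labels)
print(summary)
```

A ratio close to 1 means every breed has about the same number of images. A ratio near 2, as between the largest and smallest breeds here, is still mild for a 120-class dataset.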

After successfully creating the Pandas Data Frame, it’s smooth sailing in creating the data-bunch and training the model with Fastai. Ada chooses the “efficientNetV2” model based on Jeremy Howard’s research in the “Which [Timm] Model to choose” Jupyter Notebook.

The Colab-Pro account gives Ada 16 GB of GPU RAM and 24 GB of CPU RAM. When Ada chooses any of the higher-performance models, she encounters a “CUDA out of memory” error, even after lowering the “batch size” to 8. The system can train a few of the high-performance models, but it would take about 1.3 hours per epoch.

Ada uses “fastai.interpret.ClassificationInterpretation.from_learner(learn) and .print_classification_report()” to summarize the training result. The report is fantastic for displaying useful data on “F1-Score, Precision and Recall.”

The Precision is the accuracy of positive predictions.

  • Where: Precision = TP/(TP + FP)
  • TP = True Positive
  • TN = True Negative
  • FP = False Positive
  • FN = False Negative

The Recall is the fraction of actual positives that were correctly identified.

  • Where: Recall = TP/(TP+FN)

The F1 score is the harmonic mean of precision and recall, such that the best score is 1.0 and the worst is 0.0.

  • Where: F1 Score = 2*(Recall * Precision) / (Recall + Precision)
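The three formulas above translate directly into code. The sketch below uses made-up counts for a single breed, and the function names are just for this example.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); 0.0 when there are no positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """F1 = 2 * (Recall * Precision) / (Recall + Precision)."""
    return 2 * (r * p) / (r + p) if (r + p) else 0.0


# Made-up counts for one breed: 8 correct hits, 2 false alarms, 4 misses.
tp, fp, fn = 8, 2, 4
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# prints: precision=0.800 recall=0.667 f1=0.727
```

Note that TN does not appear in any of the three formulas, which is why these scores stay meaningful even when most images are, correctly, not the breed in question.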

Ada is not happy with the “printed text” output, so she hacks the above methods to return a dictionary object. She loads the output into a Pandas Data Frame, sorts it, and draws a delightful graph using Pandas and Matplotlib.
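Here is a sketch of the same idea, not Ada’s actual hack. It builds a per-breed dictionary from the true and predicted labels and then sorts the breeds by F1 score. The function name “report_as_dict” and the toy labels are made up. As a side note, scikit-learn’s “classification_report” can return a similar dictionary when called with “output_dict=True”; the plain Python version below just makes the steps visible.

```python
from collections import Counter
from typing import Dict, List


def report_as_dict(y_true: List[str], y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Return precision, recall, F1, and support per label as a nested dictionary."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # the guessed breed gets a false positive
            fn[truth] += 1   # the true breed gets a false negative
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p = tp[label] / (tp[label] + fp[label]) if (tp[label] + fp[label]) else 0.0
        r = tp[label] / (tp[label] + fn[label]) if (tp[label] + fn[label]) else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        report[label] = {"precision": p, "recall": r, "f1-score": f1,
                         "support": tp[label] + fn[label]}
    return report


# Toy validation labels, made up for this example.
y_true = ["husky", "husky", "husky", "boxer", "boxer", "maltese"]
y_pred = ["husky", "husky", "boxer", "boxer", "boxer", "maltese"]
report = report_as_dict(y_true, y_pred)

# Sort breeds from worst to best F1, ready to feed into a bar chart.
worst_first = sorted(report.items(), key=lambda item: item[1]["f1-score"])
for label, scores in worst_first:
    print(label, round(scores["f1-score"], 3))
```

A dictionary like this can be loaded with “pandas.DataFrame(report).T” to get one row per breed, which is the shape a sorted bar chart needs.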

Ada is foretelling that her next article will be a touch more serious. The report will be about “Identifying skin cancer.” California is under a heat wave this week. Thus there will be too much sun, hence worries about skin cancer. Training and deploying the model will be the same. The input data will be the challenge.

Ada hopes you visit the “HFS 120dog_breeds” website with your iPhone or Android phone [or desktop] and test-drive it. You will be surprised by how accurate it is. Ada could improve it by downloading other dog breed datasets from Kaggle and combining them into one giant “train and validate” dataset, which should yield a more accurate training result.

That concludes our discussion “120 Dog Breeds on Hugging Face.” Contact me directly if you have questions. Ada, Maxi, and I are looking forward to reading your feedback. As always, I apologize for any unintentional errors. The intentional errors are mine and mine alone. :-)

Have a great day. Please give a “thumbs up, like, or heart.”

Bonus: Breaking News! (June 27th)

“Is your baby a purebred dog?”

One unintended use is to verify if your dog is a purebred canine. A mongrel, mutt, or mixed-breed dog is a canine that does not belong to one officially recognized breed. A handful of [online] friends have asked, “Why is my baby not purebred?”

The dataset does not include mixed-breed dogs. Based on the “2017–2018 AVMA Sourcebook,” 51.3% of dogs are mutts, and the “2021–2022 APPA Survey” said 54% are mongrels. Thus, in the real world of dogs, the model should classify canine breeds with low confidence about half the time.
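One way to make “low confidence” concrete is to look at the highest probability the model assigns and compare it against a cut-off. The sketch below is an illustration only: the 0.5 threshold is an arbitrary assumption, the function name “top_prediction” and the sample probabilities are made up, and a low top score hints at an uncertain prediction rather than proving a dog is mixed-breed.

```python
from typing import Dict, Tuple


def top_prediction(probs: Dict[str, float], threshold: float = 0.5) -> Tuple[str, float, bool]:
    """Return the most likely breed, its probability, and whether it clears the threshold."""
    breed, p = max(probs.items(), key=lambda item: item[1])
    return breed, p, p >= threshold


# Hypothetical outputs for two photos (breed name -> probability).
confident = {"Siberian Husky": 0.91, "Alaskan Malamute": 0.06, "Samoyed": 0.03}
uncertain = {"Boxer": 0.34, "French Bulldog": 0.33, "Bull Mastiff": 0.33}

print(top_prediction(confident))   # ('Siberian Husky', 0.91, True)
print(top_prediction(uncertain))   # ('Boxer', 0.34, False)
```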

The central question is whether the Deep Learning model generalizes to classifying mixed-breed canines. To put it more dramatically, [because we are in California, la-la land], “Is the AI smarter than I built it?”

Fair disclosure, I am not a dog whisperer, and I can’t tell the difference between a French Bulldog and a Boxer. The Deep Learning model does not have rules and traits, like pointier ears, shinier hair, or longer noses. It has no rules, and it does not follow cause and effect. The model doesn’t even know what a dog is.

Deep Learning, also known as Artificial Neural Networks (ANN), was theorized by Warren McCulloch and Walter Pitts in 1943 in their paper “A Logical Calculus of the Ideas Immanent in Nervous Activity.” Since then, thousands upon thousands of researchers, scientists, and practitioners have worked together and freely posted their results as open source. We are truly standing on the shoulders of giants.

Behind the curtain is nothing but matrix multiplication using the GPU parallel processors and activation functions. Thus I might concede that mathematics could pick out canine physical traits and generalize on the emerging vectors.
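To make that concrete, here is one toy layer written out in plain Python: a matrix multiplication followed by a ReLU activation function. The weights and inputs are made-up numbers. A real network stacks many such layers and runs them on the GPU through a framework like Fastai.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply matrix a (n x m) by matrix b (m x k), giving an n x k matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m: Matrix) -> Matrix:
    """ReLU activation: keep positive values and replace negatives with zero."""
    return [[max(0.0, v) for v in row] for row in m]


x = [[1.0, 2.0]]              # one input with two features (1 x 2)
w = [[1.0, -1.0, 0.5],
     [0.5, -2.0, 1.0]]        # made-up weights (2 x 3)

hidden = relu(matmul(x, w))
print(hidden)  # [[2.0, 0.0, 2.5]]
```

Nothing in these few lines knows about ears or noses. Any “traits” the model picks up live only in the learned numbers inside the weight matrices.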

I am in no position to say “yes or no.” Ada, Maxi, and I clean and organize the data, select a transfer learning model, fetch the learning rate, feed it all to the Fastai framework, and pop, out comes the model.

If you feel good about the prediction, the answer is “yes.” If not, then it could be a False Positive or False Negative. Your dog may not be a mixed-breed. For model accuracy, please refer to the “F1-Score, Precision, and Recall” graph. Ada and Maxi are canines, but they are imaginary. They told me, “…personally. It doesn’t matter if I am purebred or not. I am as I am.”

••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

Demystify AI Series

  1. Hot off the press. 120 Dog Breeds on Hugging Face | on LinkedIn, on Medium (June 2022)
  2. The Healthcare Garden Architecture | on LinkedIn, on Medium (May 12, 2022)
  3. “AI Start Here (A1SH)” | on GitHub (July 2021)
  4. “Fast.ai Book Study Group #G1FA” on LinkedIn | on GitHub (January 2021)
  5. “Augmentation Data Deep Dive (AUD3)” on LinkedIn | on GitHub (December 2020)
  6. “Demystify Neural Network NLP Input-data and Tokenizer” on LinkedIn | on GitHub (November 2020)
  7. “Python 3D Visualization” on LinkedIn | on GitHub (September 2020)
  8. “Demystify Python 2D Charts” on LinkedIn | on GitHub (September 2020)
  9. “Norwegian Blue Parrot, The “k2fa” AI” on LinkedIn | on K2fa-Website (August 2020)
  10. “The Texas Two-Step, The Hero of Digital Chaos” on LinkedIn (February 2020)
  11. “Be Nice 2020” on Website (January 2020)

Duc Haba

AI Solution Architect at YML.co. Prior Xerox PARC, Oracle, GreenTomato, Viant, CEO Swiftbot, CTO RRKIDZ.com