pISSN 1598-298X
eISSN 2384-0749
J Vet Clin 2023; 40(4): 243-259
https://doi.org/10.17555/jvc.2023.40.4.243
Published online August 31, 2023
Correspondence to:*kdmin@cbnu.ac.kr
Copyright © The Korean Society of Veterinary Clinics.
Machine learning and deep learning (ML/DL) algorithms have been successfully applied in medical practice. However, their application in veterinary medicine is relatively limited, possibly due to a lack in the quantity and quality of relevant research. Because the potential demands for ML/DL applications in veterinary clinics are significant, it is important to note the current gaps in the literature and explore the possible directions for advancement in this field. Thus, a scoping review was conducted as a situation analysis. We developed a search strategy following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. PubMed and Embase databases were used in the initial search. The identified items were screened based on predefined inclusion and exclusion criteria. Information regarding model development, quality of validation, and model performance was extracted from the included studies. The current review found 55 studies that passed the criteria. In terms of target animals, the number of studies on industrial animals was similar to that on companion animals. Quantitative scarcity of prediction studies (n = 11, including duplications) was revealed in both industrial and non-industrial animal studies compared to diagnostic studies (n = 45, including duplications). Qualitative limitations were also identified, especially regarding validation methodologies. Considering these gaps in the literature, future studies examining the prediction and validation processes, which employ a prospective and multi-center approach, are highly recommended. Veterinary practitioners should acknowledge the current limitations in this field and adopt a receptive and critical attitude towards these new technologies to avoid their abuse.
Keywords: machine learning, deep learning, veterinary clinics, scoping review, validation
The application of machine learning and deep learning (ML/DL) algorithms has altered the medical landscape. With respect to medical diagnostics, such as radiology and pathology, a growing number of studies have reported that the performance of ML/DL-based automatic systems is reliable (61) and equivalent to or even better than that of human experts (40). Based on accumulated research, numerous commercialized medical devices have been officially approved for use in clinical practice, especially in Europe and the USA (49). Moreover, various digital biomarkers have been developed to predict the prognosis of chronic diseases such as cancer and cardiovascular diseases (46).
However, ML/DL applications in veterinary medicine seem to lag far behind those in human medicine in terms of both quantity and quality, especially in South Korea. One of the major reasons for this slow progress may be the lack of high-quality medical data. Although most veterinary clinics use electronic medical chart systems (38), the analyzable data they contain are insufficient. Most veterinarians have been neither educated nor motivated regarding appropriate charting, especially in South Korea, where an insurance system for animals is lacking and purchasing some medical drugs is possible without a veterinarian’s prescription. Even if medical records are accumulated appropriately, merging multi-clinic medical records remains challenging owing to the lack of standardized medical coding classification systems (80).
Nevertheless, the demand for ML/DL applications has increased with the rapid growth of the veterinary industry. Applications in industrial animal husbandry, disease screening, and medical data management have been developed, demonstrating this growth potential (26). This growth could have positive implications, such as pioneering new markets and improving the quality of veterinary medical services. However, it could also increase the possibility of misuse and abuse, considering the challenges regarding data quality. Veterinarians who are practically involved in clinics should have a proper level of ML/DL literacy to prevent misuse and abuse and to guide the development of these applications in a constructive manner.
Therefore, a scoping review was conducted to clarify current ML/DL applications in veterinary medicine and explore the directions for the advancement of this field. In this review, the application scope (specific domains in which the ML/DL methodology was applied), methodological details, and medical utility (performance of the ML/DL models) of previously published relevant studies were investigated.
A scoping review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (53). Two electronic scientific literature databases (PubMed and Embase) were used to identify published studies that used ML/DL algorithms in veterinary clinics, especially those examining diagnostics and prognosis prediction. The search terms were developed (Supplementary Tables 1-2) based on two previous studies that examined artificial intelligence in the medical field (52) and veterinary medicine (62).
An initial search using these terms was conducted on September 22, 2022, and screening processes were implemented using predetermined inclusion/exclusion criteria. The inclusion criteria were as follows: 1) ML/DL applications in veterinary medicine focusing on diagnostics and prognosis prediction; 2) applications to companion animals, industrial animals, and/or wildlife; 3) studies written in English; and 4) original articles. The exclusion criteria were as follows: 1) applications to experimental animals; 2) population-level studies whose analysis unit was the aggregate level; 3) applications to animal husbandry or management (e.g., pregnancy detection); 4) applications to drug discovery; 5) applications to biosecurity; and 6) methodological studies that focused on algorithm development, optimization, or automatic data labelling. During the first screening process, the titles and abstracts of each searched paper were checked. In the second screening process, the full text was examined, and the final list of the included studies was determined.
For each included study, the following information was extracted: 1) authors, publication year, and title; 2) purpose of the study; 3) target animals; 4) number of samples used for the ML/DL models; 5) algorithm types (e.g., artificial neural network, recurrent neural network [RNN], or other types); 6) whether the study used a cross-validation framework to assess model performance; 7) whether the study collected the test dataset prospectively or retrospectively; 8) whether the study used multi-center datasets for model building or validation; 9) measurements used to assess model performance (e.g., sensitivity, specificity, or other measurements); and 10) model performance.
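Because several of the extracted performance measurements (sensitivity, specificity, PPV, NPV, and accuracy) are derived from the same 2 × 2 confusion matrix, a minimal sketch of how they relate may be helpful. The class and function names and the counts below are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts from comparing model output with the reference diagnosis."""
    tp: int  # diseased animals flagged as diseased
    fp: int  # healthy animals flagged as diseased
    fn: int  # diseased animals flagged as healthy
    tn: int  # healthy animals flagged as healthy


def diagnostic_metrics(cm: ConfusionMatrix) -> dict:
    """Return the common binary performance measures as fractions."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    return {
        "sensitivity": cm.tp / (cm.tp + cm.fn),
        "specificity": cm.tn / (cm.tn + cm.fp),
        "ppv": cm.tp / (cm.tp + cm.fp),
        "npv": cm.tn / (cm.tn + cm.fn),
        "accuracy": (cm.tp + cm.tn) / total,
    }


# Hypothetical counts, chosen only for illustration.
example = ConfusionMatrix(tp=40, fp=10, fn=10, tn=140)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are computed within the diseased and healthy groups, respectively, whereas PPV and NPV also depend on how common the condition is in the test set, which is one reason why the composition of the test set matters.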
The screening process for this scoping review is illustrated in Fig. 1. In the initial search, 598 and 376 studies were found in the PubMed and Embase databases, respectively. After removing duplicate studies, 699 studies were identified. A total of 532 and 112 papers were excluded after the first and second screenings, respectively. The remaining 55 studies were included. Detailed information on the included papers is provided in Tables 1 and 2.
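The screening counts can be checked with simple arithmetic. The short sketch below uses only the numbers reported above; the number of duplicates removed is not stated in the text and is derived here as an assumption from the difference between the total hits and the deduplicated count.

```python
# Counts reported for the screening process.
pubmed_hits = 598
embase_hits = 376
after_deduplication = 699
excluded_first_screening = 532   # title and abstract screening
excluded_second_screening = 112  # full-text screening

total_hits = pubmed_hits + embase_hits
# Derived value (not reported explicitly): duplicates removed.
duplicates_removed = total_hits - after_deduplication
included = (after_deduplication
            - excluded_first_screening
            - excluded_second_screening)

print(f"Total hits: {total_hits}")
print(f"Duplicates removed (derived): {duplicates_removed}")
print(f"Included studies: {included}")
```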
Table 1 General information regarding included studies in the review
Author and year | Animal type | Target animals | Sample size | Algorithm |
---|---|---|---|---|
G. Theodoropoulos et al., 2000 (71) | Domestic | Sheep | 255 images of 57 individual larvae (5 genera) | ANN (artificial neural network; feature selection by manual, 16 features were measured) |
W. B. Roush et al., 2001 (63) | Domestic | Chicken | Case 6-40, normal 33-91 | BP3 (back propagation neural network), WardBP (Ward back propagation neural network), PNN (probabilistic neural network), GRNN (general regression neural network) |
H. Schobesberger and C. Peham, 2002 (66) | Domestic | Horse | 175 (42 control/ 133 low to medium grade lame) | ANN (feature selection by manual) |
K. G. Keegan et al., 2003 (32) | Domestic | Horse | 12 adult horse | ANN (feature selection by manual) |
M. E. Pastell and M. Kujalaf, 2007 (56) | Domestic | Dairy cow | 73 cows (training 37 cows, 5,074 observation, validation 36 cows, 4,868 measurements) | Probabilistic Neural Network Model (feature selection by manual) |
S. M. Ghotoorlar et al., 2012 (25) | Domestic | Dairy cow | 105 dairy cows | ANN (feature selection by manual) |
T. Banzato et al., 2018 (4) | Companion | Canine | 80 (56 meningioma, 24 glioma) | Convolutional neural networks (CNN), GoogleNet |
T. Banzato et al., 2018 (5) | Companion | Canine | 48 (32 case, 16 control) | Deep neural networks (DNN), especially AlexNet |
T. Banzato et al., 2018 (6) | Companion | Canine | 56 (grade 1 = 26, grade 2 = 22, grade 3 = 8) | AlexNet, DNN |
A. Yakubu et al., 2018 (73) | Domestic | Chicken | 167 | ANN |
Y. Yoon et al., 2018 (75) | Companion | Dogs | 3,142 for cardiomegaly (1,571 normal and 1,571 abnormal from 1,143 dogs), 2,086 for lung pattern (1,043 normal and 1,043 abnormal from 1,247 dogs), 892 for mediastinal shift (446 normal and 446 abnormal from 387 dogs), 940 for pleural effusion (470 normal and 470 abnormal from 284 dogs), and 78 for pneumothorax (39 normal and 39 abnormal from 61 dogs) | Bag-of-features (BOF) and CNN |
R. Bradley et al., 2019 (15) | Companion | Cat | 106,251 cats | Recurrent Neural Network (RNN) |
M. Ebrahimi et al., 2019 (20) | Domestic | Cow | 297,004 milking samples each with eight milking features | ANN, Naïve Bayes, GLM, Decision tree, Random forest, Gradient boosted tree |
J. Y. Kim et al., 2019 (35) | Companion | Dogs | 1,040 images | CNN (GoogLeNet, ResNet, and VGGNet) |
M. Aubreville et al., 2020 (3) | Companion | Dogs | 32 whole slide images | CNN, RetinaNet, ResNet-18, Unet |
V. Biourge et al., 2020 (12) | Companion | Cats | 218 | ANN |
L. E. Broughton-Neiswanger et al., 2020 (16) | Companion | Cats | 12 | Partial least squares discriminant analysis, Random forest |
S. Burti et al., 2020 (17) | Companion | Dogs | 1,465 images | CNN |
E. Fernández-Carrión et al., 2020 (22) | Etc. | Wild boar | 8 | CNN |
M. A. Fraiwan and S. M. Abutarbush, 2020 (24) | Domestic | Horse | 285 horses | Bayes Network, Naïve Bayes, DNN, Random forest |
X. Kang et al., 2020 (30) | Domestic | Cow | 100 cows | RFB_NET_SSD deep learning network |
N. Kil et al., 2020 (33) | Domestic | Horse | 34 horses (65 video) | CNN |
S. Li et al., 2020 (39) | Companion | Dogs | 792 radiographs | CNN |
C. Marzahl et al., 2020 (42) | Domestic | Horse | 17 completely annotated cytology whole slide images (WSI) containing 78,047 hemosiderophages | CNN (RetinaNet) |
S. Mouloodi et al., 2020 (47) | Domestic | Horse | 3 third metacarpal bones from 3 racehorses | ANN |
S. Mouloodi et al., 2020 (48) | Domestic | Horse | 9 equine third metacarpal bones from 9 thoroughbred horses | ANN |
Y. Nagamori et al., 2020 (50) | Companion | Cat, dogs | 100 | CNN |
C. Post et al., 2020 (59) | Domestic | Cow | 167 cows | Logistic Regression (LR), Support Vector Machine (SVM), K-nearest neighbors (KNN), Gaussian Naïve Bayes (GNB), Extra Trees Classifier (ET), Random forest |
A. R. Trachtman et al., 2020 (72) | Domestic | Pigs | 5,902 images | CNN |
T. Banzato et al., 2021 (7) | Companion | Dogs | 3,839 latero-lateral radiographs | CNN (ResNet-50, DenseNet-121) |
T. Banzato et al., 2021 (8) | Companion | Cat | 1,062 latero-lateral radiographs | CNN (ResNet 50 and Inception V3) |
A. Biercher et al., 2021 (11) | Companion | Dogs | Thoracolumbar MR images from 500 dogs | CNN |
E. Boissady et al., 2021 (13) | Companion | Cat, dogs | 30 canine and 30 feline thoracic lateral radiographs | CNN |
L. Bonicelli et al., 2021 (14) | Domestic | Pigs | 7,564 pictures | CNN |
V. Kittichai et al., 2021 (36) | Domestic | Poultry | 12,761 single cell images | CNN (Darknet, Darknet19, Darknet19-448 and Densenet201) |
Y. Nagamori et al., 2021 (51) | Companion | Cat, dogs | 460 samples for 4 parasites (80-200 per parasite) | You only look once (YOLOv3) model |
J. Park et al., 2021 (54) | Companion | Dogs | 90 dogs | HA, DLBAS, and the readjustment of the predicted data obtained via the DLBAS of the clinical test sets (HA_DLBAS) |
I. R. Porter et al., 2021 (58) | Domestic | Cattle | A total of 398 digital images from dairy cows’ udders | CNN (GoogLeNet) |
M. Salvi et al., 2021 (64) | Companion | Dogs | 416 canine cutaneous round cell tumors (RCT) (117 cases) | AlexNet, Inceptionv3, ResNet, Ensemble |
S. Shahinfar et al., 2021 (68) | Domestic | Cattle | 2,535 lameness scores (2,248 sound and 287 unsound) | Naïve Bayes (NB), Random Forest (RF), Multilayer Perceptron (MLP), and logistic regression (LR), used to predict cases of lameness from milk production and conformation traits |
Y. Ye et al., 2021 (74) | Companion | Dogs | 220 images | CNN (ResNet-50) |
M. Zhang et al., 2021 (79) | Companion | Dogs | 2,670 lateral X-ray images | CNN (HRNet) |
A.N. ELKhamary et al., 2022 (21) | Domestic | Horse | 16 horses, 32 limbs (16 normal tendons and 16 abnormal tendons) | C4.5 algorithm (Quinlan), a decision tree classifier of Weka software package |
E. A. Bauer and W. Jagusiak, 2022 (9) | Domestic | Cattle | 168 cows | ANN |
K. Benfodil et al., 2022 (10) | Domestic | Dromedaries | 115 dromedaries | ANN |
L. Dumortier et al., 2022 (19) | Companion | Cat | 500 annotated thoracic radiograph images (348 veterinary visits, 296 cats) | CNN (ResNet50V2) |
P. Figueirinhas et al., 2022 (23) | Companion | Dogs | 15 working dogs (pilot study) | LSTM |
Y. Kokkinos et al., 2022 (37) | Companion | Dogs | 57,402 dogs | RNN |
A. Mao et al., 2022 (41) | Domestic | Chicken | 5,336 voice calls (3,363 distress calls and 1,973 natural barn sound) | CNN (light-VGG11) |
A. May et al., 2022 (43) | Domestic | Horse | 2,607 images | CNN |
T. R. Müller et al., 2022 (45) | Companion | Dogs | 62 canine (41 case 21 control) 4,000 images (2,000 case 2,000 control) | CNN (VGG16) |
C. Parra et al., 2022 (55) | Etc. | Reptile | 3,616 images data samples and 26 videos (4,849 frames) | CNN (MobileNet) |
T. Rai et al., 2022 (60) | Companion | Dogs | 32 patients | CNN (DenseNet-161) |
V. A. Teixeira et al., 2022 (70) | Domestic | Cattle | 55 Holstein calves | RNN |
M. ZareBidaki et al., 2022 (77) | Domestic | Goat, sheep cows | 200 paired sample (100 blood, 100 milk) 100 animals | ANN |
Table 2 Validation methodologies and model performance of the included studies in the review
Author and year | CV | Prospective | Multi-center approach (training set) | Multi-center approach (test set) | Model performance (index) | Model performance (value) | Purpose | |
---|---|---|---|---|---|---|---|---|
G. Theodoropoulos et al., 2000 (71) | Yes | No | No | No | Sensitivity | 42.4-80.7% | Diagnostics | |
W. B. Roush et al., 2001 (63) | Yes | No | No | No | Sensitivity | 0-100% | Prediction | |
H. Schobesberger and C. Peham, 2002 (66) | Yes | No | No | No | Agreement | 78.60% | Diagnostics | |
K. G. Keegan et al., 2003 (32) | Yes | No | No | No | Agreement | 85% | Diagnostics | |
M. E. Pastell and M. Kujalaf, 2007 (56) | Yes | No | No | No | Agreement and sensitivity | Agreement = 96.2% Sensitivity = 100% | Diagnostics | |
S. M. Ghotoorlar et al., 2012 (25) | Yes | No | No | No | Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), Pearson correlation coefficient | Sensitivity = 0.5-1 Specificity = 0.91-1 PPV = 0.76-1 NPV = 0.92-1 Pearson correlation coefficient = 0.94 | Diagnostics | |
T. Banzato et al., 2018 (4) | Yes | No | Yes | No | Agreement, Matthews correlation coefficient (MCC) | Agreement = 90-94% MCC = 0.8-0.88 | Diagnostics | |
T. Banzato et al., 2018 (5) | Yes | No | No | No | AUC, sensitivity, specificity | AUC = 0.91 Sensitivity = 100% Specificity = 82.8% | Diagnostics | |
T. Banzato et al., 2018 (6) | Yes | No | Yes | No | Agreement, multi-class Matthew’s correlation coefficient (MCMCC) | Agreement = 65.2-82.2% MCMCC = 0.44-0.68 | Diagnostics | |
A. Yakubu et al., 2018 (73) | Yes | No | No | No | r, R2, RMSE | r = 0.983 R2 = 0.966 RMSE = 0.04806 | Prediction | |
Y. Yoon et al., 2018 (75) | Yes | No | No | No | Accuracy, sensitivity | Accuracy(CNN; 92.9-96.9% and BOF; 79.6-96.9%) and sensitivity (CNN; 92.1-100% and BOF; 74.1-94.8%) | Prediction | |
R. Bradley et al., 2019 (15) | Yes | No | No | No | Sensitivity, specificity | (1 year before) sensitivity 63.0%; (2 year before) sensitivity 44.2% specificity remaining around 99% | Prediction | |
M. Ebrahimi et al., 2019 (20) | Yes | No | No | No | AUC | 0.826 | Prediction | |
J. Y. Kim et al., 2019 (35) | Yes | No | Yes | No | Sensitivity | 79.4-100% | Diagnostics | |
M. Aubreville et al., 2020 (3) | Yes | No | No | No | Correlation coefficient | 0.868-0.979 | Diagnostics | |
V. Biourge et al., 2020 (12) | Yes | Yes | No | Yes | Accuracy, sensitivity, specificity, PPV, NPV | Accuracy = 88% Sensitivity = 87% Specificity = 70% PPV = 53% NPV = 92% | Prediction | |
L. E. Broughton-Neiswanger et al., 2020 (16) | Yes | No | No | No | Sensitivity, specificity, AUC | AUC = 0.87-1 Sensitivity = 0-100% Specificity = 50-100% | Diagnostics | |
S. Burti et al., 2020 (17) | Yes | No | No | No | AUC | 0.904-0.973 | Diagnostics | |
E. Fernández-Carrión et al., 2020 (22) | Yes | No | No | No | Agreement | 95.4-97.2% | Diagnostics | |
M. A. Fraiwan and S. M. Abutarbush, 2020 (24) | Yes | No | No | No | Precision, recall, F-measure, Accuracy | (need for surgery) Precision = 69.5-74.1% Recall = 72.4-99.3% F-measure = 72.2-81.8% Accuracy = 69.0-76.0%; (survival) Precision = 87.5-97.4% Recall = 80.5-87.8% F-measure = 87.2-89.1% Accuracy = 83.9-85.2% | Prediction | |
X. Kang et al., 2020 (30) | Yes | No | No | No | Sensitivity, specificity | Sensitivity = 0.83-1 Specificity = 0.95-1 | Diagnostics | |
N. Kil et al., 2020 (33) | Yes | No | No | No | Sensitivity, accuracy | Sensitivity = 0.79-0.94 Accuracy = 0.82-0.94 | Diagnostics | |
S. Li et al., 2020 (39) | Yes | No | No | No | Accuracy, sensitivity, and specificity | Accuracy = 82.71% Sensitivity = 68.42% Specificity = 87.09% | Diagnostics | |
C. Marzahl et al., 2020 (42) | Yes | No | No | No | Precision | 0.64-0.66 | Diagnostics | |
S. Mouloodi et al., 2020 (47) | Yes | No | No | No | Determination coefficient (R2) | 0.9116-0.9599 | Prediction | |
S. Mouloodi et al., 2020 (48) | Yes | No | No | No | Determination coefficient (R2) | 0.9999 | Prediction | |
Y. Nagamori et al., 2020 (50) | Yes | No | No | No | Pearson correlation coefficient, sensitivity, specificity | Pearson correlation coefficient = 0.89-0.99 Sensitivity = 0.758-1 Specificity = 0.918-1 | Diagnostics | |
C. Post et al., 2020 (59) | Yes | No | No | No | AUC | 0.71-0.79 | Diagnostics | |
A. R. Trachtman et al., 2020 (72) | Yes | No | No | No | Accuracy, sensitivity, specificity | Accuracy = 62-96% Sensitivity = 84-100% Specificity = 92-96% | Diagnostics | |
T. Banzato et al., 2021 (7) | Yes | No | No | No | AUC | 0.8 | Diagnostics | |
T. Banzato et al., 2021 (8) | Yes | No | Yes | No | AUC | 0.58-0.97 | Diagnostics | |
A. Biercher et al., 2021 (11) | Yes | No | Yes | Yes | Sensitivity, specificity | IVDE sens 73.46-90.1/spec 67.6-99.0; IVDP sens 67.86-100/spec 74.9-96.4; FCE/ANNPE sens 62.2-90.1/spec 90.1-97.9; Syringomyelia sens 0-10/spec 100; Neoplasia sens 0-37.5/spec 60-94.7 | Diagnostics | |
E. Boissady et al., 2021 (13) | NA | No | No | No | ICC | 0.998-0.999 | Diagnostics | |
L. Bonicelli et al., 2021 (14) | Yes | No | Yes | Yes | Sensitivity, specificity, Pearson correlation coefficient | Sensitivity = 81.25-100 % Specificity = 99.38 % Pearson correlation coefficient = 0.96 | Diagnostics | |
V. Kittichai et al., 2021 (36) | Yes | No | NA | NA | Accuracy | 99% | Diagnostics | |
Y. Nagamori et al., 2021 (51) | NA | NA | Yes | NA | Sensitivity, specificity | Sensitivity = 75.8-100% Specificity = 93.1-100% | Diagnostics | |
J. Park et al., 2021 (54) | Yes | No | No | No | Dice similarity coefficient (DSC) and the Hausdorff distance (HD) | DSC 0.78-0.94 HD 2.30-4.30 mm | Diagnostics | |
I. R. Porter et al., 2021 (58) | Yes | No | Yes | Yes | AUC | 0.542-0.920 | Diagnostics | |
M. Salvi et al., 2021 (64) | Yes | No | Yes | Yes | Accuracy | 91.66%-100% | Diagnostics | |
S. Shahinfar et al., 2021 (68) | Yes | No | Yes | Yes | AUC, F1 | AUC = 0.61-0.67 F1 = 0.01-0.27 | Diagnostics | |
Y. Ye et al., 2021 (74) | Yes | No | NA | NA | AUC, accuracy, F1 score | AUC = 99.37 Accuracy = 97.62 F1 score = 96.7 | Diagnostics | |
M. Zhang et al., 2021 (79) | Yes | No | Yes | Yes | Sensitivity | 86.40% | Diagnostics | |
A.N. ELKhamary et al., 2022 (21) | Yes | No | No | No | Accuracy, PPV, sensitivity, kappa | Accuracy = 93.7% PPV = 93.80% Sensitivity = 93.80% Kappa = 0.88 | Diagnostics | |
E. A. Bauer and W. Jagusiak, 2022 (9) | Yes | No | Yes | Yes | AUC | 0.82-0.89 | Diagnostics | |
K. Benfodil et al., 2022 (10) | Yes | No | NA | NA | Pearson correlation coefficient | 0.943 | Diagnostics | |
L. Dumortier et al., 2022 (19) | Yes | No | No | No | Accuracy, F1-Score, Specificity, Positive Predictive Value and Sensitivity | Accuracy = 82% F1-Score = 85% Specificity = 75% PPV = 81% Sensitivity = 88% | Diagnostics | |
P. Figueirinhas et al., 2022 (23) | Yes | No | No | No | Accuracy | Accuracy = 60% | Diagnostics | |
Y. Kokkinos et al., 2022 (37) | Yes | No | No | No | Sensitivity, PPV, NPV | Sensitivity = 44.8-68.8% PPV = 15-23% NPV > 99% | Prediction | |
A. Mao et al., 2022 (41) | Yes | No | Yes | Yes | Precision, recall, F1-score and accuracy | Precision = 94.58% Recall = 94.89% F1-score = 94.73% Accuracy = 95.07% | Diagnostics | |
A. May et al., 2022 (43) | Yes | No | No | No | Accuracy, cross entropy | Accuracy = 96.66% Cross entropy = 0.02 | Diagnostics | |
T. R. Müller et al., 2022 (45) | Yes | No | No | No | Accuracy, sens, spec, PPV, NPV | Accuracy 88.7% Sensitivity 90.2% Specificity 81.8% PPV 92.5% NPV 81.8% | Diagnostics | |
C. Parra et al., 2022 (55) | Yes | NA | NA | NA | Accuracy, AUC | Accuracy = 94.26 AUC = 0.996 | Diagnostics | |
T. Rai et al., 2022 (60) | Yes | No | No | No | F1-score | 0.708 | Diagnostics | |
V. A. Teixeira et al., 2022 (70) | Yes | No | No | No | Accuracy, sensitivity, and specificity, PPV, NPV | Accuracy = 85-98, Sensitivity = 87-96 Specificity = 78-100 PPV = 85-100 NPV = 88-96 | Prediction & diagnosis | |
M. ZareBidaki et al., 2022 (77) | Yes | No | NA | NA | Sensitivity, specificity, AUC | Sensitivity = 81% Specificity = 62% AUC = 0.799 | Diagnostics |
The temporal trends in ML/DL-related publications are illustrated in Fig. 2. Although most of these studies were published after 2000, a rapid growth in their quantity began in 2018. Before this surge, the applications of ML/DL were concentrated in industrial animals; however, their applications in companion animals have been expanding since 2018. Only a few studies on other animal species (wildlife and exotic animals) have been published, even after 2020.
Fig. 3 shows the proportion of the specific purposes of each study, such as target animal species and domains of application (whether ML/DL was used for predictive or diagnostic purposes). While the numbers of studies on industrial and non-industrial animals were similar (31 and 30 for non-industrial and industrial animals, respectively, including duplicates), the number of diagnostic studies was higher than that of prediction studies (45 and 11, respectively, including duplicates). In terms of specific animal species, studies on dogs were generally dominant among studies on non-industrial animals (70.3% of diagnostic studies and 50.0% of prediction studies), while studies on cows (39.1% of diagnostic studies and 28.6% of prediction studies) and horses (26.1% of diagnostic studies and 42.9% of prediction studies) were dominant among studies on industrial animals.
Table 3 shows details regarding the identified studies, including the sample size used for model development and validation, the algorithm used, whether the authors employed prospective data collection for validation, whether they used multi-center data for model development and validation, and model performance. In terms of validation, almost every publication stated that cross-validation was implemented (splitting data into training and test sets to avoid over-evaluation), although the relevant descriptions were insufficient in some of the studies (n = 2). However, only a minority of the studies employed a multi-center approach for model development (n = 13) and validation (n = 9), and only one study prospectively collected the test datasets. The majority of the identified studies used neural network-based algorithms, such as RNN and convolutional neural network, and most of the studies targeted binary problems rather than continuous outcomes. Although the amount of data used for model development was relatively small in several studies (16,22,33), the reported model performance of most studies tended to be within an acceptable range (e.g., Area Under the Receiver Operating Characteristic Curve (AUC) value >0.9).
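As an illustration of the training/test splitting described above, the following is a minimal sketch of k-fold cross-validation index generation in plain Python. The function name `k_fold_indices` is hypothetical, the folds are contiguous for readability, and in practice the sample order would normally be shuffled first; none of this is taken from a specific included study.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Every sample appears in exactly one test fold, so performance is
    always measured on data that were not used for fitting.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


# Hypothetical example: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train={train_idx} test={test_idx}")
```

Note that this kind of internal splitting reuses data from the same source for training and testing, which is distinct from validation on prospectively collected or multi-center data.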
Table 3 Profile of included studies
Target animal type | Study purposes | N* | NN† | CV‡ | Pros§ | Multi∥ |
---|---|---|---|---|---|---|
Industrial animals | Diagnostics | 21 | 19 | 21 | 0 | 5 |
Industrial animals | Prediction | 7 | 7 | 7 | 0 | 0 |
Companion animals | Diagnostics | 22 | 21 | 20 | 0 | 2 |
Companion animals | Prediction | 4 | 4 | 4 | 1 | 1 |
Others¶ | Diagnostics | 2 | 2 | 2 | 0 | 0 |
*Number of studies.
†Number of studies that used neural network-based algorithm.
‡Number of studies that conducted cross-validation approach to measure performance.
§Number of studies that employed prospective approach for collecting dataset for testing.
∥Number of studies that used multi-center data for validation.
¶The others group includes wildlife and exotic animals.
Note: The numbers include duplication. For example, one study on industrial animals has both purposes, diagnostics and prediction. There are no prediction studies for the other animals.
A scoping review was conducted as a situation analysis to identify the current gaps in ML/DL application research in veterinary clinics and suggest directions for further improvement in this field. The review found that the history of ML/DL applications in veterinary medicine is relatively short compared to that in human medicine and the healthcare sector (31). Possibly due to its short history, quantitative scarcity and methodological gaps were identified, especially regarding the validation and data collection framework, although the reported model performance was generally within acceptable levels.
The first gap that must be highlighted is quantitative scarcity. Although the current review may have missed some published papers, it seems clear that the relevant papers are fewer than those in the human medical field (2,52,67,69). Specifically, prediction studies were scarce, possibly because of their technical difficulties. They usually involve extrapolation because the prediction target is future data. Considering that extrapolation is more sensitive to overfitting and a lack of variables, the performance of such models tends to be lower than that of models for interpolation (57). However, prediction studies are practically useful because they can be employed for optimal treatment recommendations and prognostic assessment, which are the most frequent practices in veterinary clinics. Scarcity has also been observed in studies on wildlife. Lack of data may explain this scarcity. Compared with medicine for companion and industrial animals, wildlife medicine covers more animal species with fewer resources. Therefore, the quantity of data for each species is usually lower than that for other medical areas, even though large amounts of data regarding specific species and medical problems are required for ML/DL applications.
Qualitative gaps in model validation should be emphasized. Considering that ML/DL approaches cannot inherently employ physiological or pathological mechanisms, an innate limitation of this data-driven approach is overfitting and induction. The issues can be practically addressed by demonstrating acceptable performance in an independent dataset, which is called cross-validation. Most of the studies identified in this review employed this approach. However, the current review found that only a few of them have obtained appropriate test sets. As the selection of the test set is essential for its validation, the representativeness of the test set must be ensured (27). Therefore, prospective data collection from multiple centers is the best way to ensure this representativeness (34,78). Veterinary clinicians should be aware of the qualitative gaps in current ML/DL application studies to avoid possible misuse of these models in clinical practice.
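One way to make the multi-center idea concrete is a leave-one-center-out split, in which each test set comes entirely from a clinic that contributed no training data. The sketch below is an illustration under assumptions: the record layout, the `center` field, and the clinic names are hypothetical, and the split is retrospective, so it does not by itself provide the prospective collection recommended above.

```python
from collections import defaultdict


def leave_one_center_out(records):
    """Yield (held_out_center, train, test) for every center in the data.

    The test set of each split contains only records from one center,
    and that center contributes nothing to the training set.
    """
    by_center = defaultdict(list)
    for record in records:
        by_center[record["center"]].append(record)

    for held_out in sorted(by_center):
        test = by_center[held_out]
        train = [record
                 for center, center_records in by_center.items()
                 if center != held_out
                 for record in center_records]
        yield held_out, train, test


# Hypothetical records from three clinics.
records = [
    {"center": "clinic_A", "id": 1},
    {"center": "clinic_A", "id": 2},
    {"center": "clinic_B", "id": 3},
    {"center": "clinic_C", "id": 4},
    {"center": "clinic_C", "id": 5},
]

splits = list(leave_one_center_out(records))
for held_out, train, test in splits:
    print(held_out, "train:", len(train), "test:", len(test))
```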
From the veterinary clinicians’ point of view, excellent model performance alone is not sufficient to recommend a model for practical use. For instance, even if some ML/DL models show a very high AUC, representing great performance in diagnostics, the operation of the model could require a significant amount of manpower, time, or cost, making its usage unaffordable, especially for single-veterinarian clinics. In this regard, successful future studies need to consider practical applicability as well (29).
Despite these gaps, there are prominent opportunities to improve research on ML/DL applications in veterinary medicine. First, privacy issues are relatively minor compared with human medicine, where data merging between hospitals and clinics is challenging owing to these issues. Therefore, the major approach in human medicine is the common data model, which standardizes the data structure of each institution and facilitates meta-analysis (1,76) rather than merged big data analysis. In contrast, multi-clinic data can be merged without privacy issues in veterinary sectors, and the veterinary compass (44) and Small Animal Veterinary Surveillance Network (28,65) have demonstrated these opportunities. Furthermore, the cost of data collection in veterinary medicine, especially for continuous data, may be lower than that in human medicine. Recently, the collection of continuous data and extraction of significant signals using wearable devices (18) has become a leading research topic. In these research areas, veterinary medicine has more opportunities than human medicine, because employing animal subjects costs less than employing human participants; additionally, compliance in applying the device could be higher in animal subjects than in human participants.
Improving the application of ML/DL in veterinary clinics necessitates the fulfillment of two essential conditions. First, the establishment of a standardized encoding system is crucial. To achieve reliable prediction performance, high-quality big data is indispensable. Considering that medical big data should be collected by multiple institutions, a unified coding system for disease diagnosis and prescription is essential to successfully amalgamate data from various sources. However, medical records currently rely predominantly on free text-based descriptions, which are difficult to standardize. Although automatic encoding systems that translate free text to medical codes have been developed (78), no such system is currently customized for this purpose. Second, fostering sustainable motivation among veterinarians for accurate recording is important. The absence of a national insurance system for animal medicine has led to a lack of incentives for veterinarians to ensure precise encoding. Addressing this challenge entails appropriately valuing the medical records provided by veterinary clinicians. Currently, the value of such data is not accurately evaluated, and most data utilized in ML/DL models have been acquired without sufficient compensation to veterinarians. Offering proper remuneration for their data contributions could incentivize them to maintain accurate recording practices (Fig. 4).
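To illustrate why a unified coding system matters when amalgamating records, the sketch below maps free-text diagnoses from two clinics to a shared code before merging. Everything here is an assumption made for illustration: the codes (`DX001`, `DX002`), the lookup table, and the record layout are hypothetical and do not correspond to any existing veterinary coding standard or to the automatic encoding systems cited above.

```python
# Hypothetical lookup from normalized free-text diagnoses to shared codes.
SHARED_CODES = {
    "otitis externa": "DX001",
    "ear infection": "DX001",
    "chronic kidney disease": "DX002",
    "ckd": "DX002",
}


def encode_diagnosis(free_text: str):
    """Return the shared code for a free-text diagnosis, or None if unmapped."""
    return SHARED_CODES.get(free_text.strip().lower())


def merge_clinic_records(*clinic_records):
    """Merge records from several clinics, keeping only those that can be coded.

    Returns (merged, unmapped) so that uncodable free text stays visible
    instead of being silently dropped.
    """
    merged, unmapped = [], []
    for records in clinic_records:
        for record in records:
            code = encode_diagnosis(record["diagnosis"])
            if code is None:
                unmapped.append(record)
            else:
                merged.append({**record, "code": code})
    return merged, unmapped


# Hypothetical records: the same conditions written differently per clinic.
clinic_a = [{"patient": "A-1", "diagnosis": "Otitis externa"},
            {"patient": "A-2", "diagnosis": "CKD"}]
clinic_b = [{"patient": "B-1", "diagnosis": " ear infection "},
            {"patient": "B-2", "diagnosis": "limping, cause unclear"}]

merged, unmapped = merge_clinic_records(clinic_a, clinic_b)
print([(r["patient"], r["code"]) for r in merged])
print([r["patient"] for r in unmapped])
```

In this toy example, records that describe the same condition in different words end up under one code, while the uncodable record is set aside for manual review, which is the step that free text-based charting makes costly at scale.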
This study has some limitations. First, the review was conducted by a single researcher. Because the standard review process generally requires at least two researchers to increase the sensitivity and specificity of screening, some studies that should have been included may have been excluded. Second, this study included only original papers; other types of publications were excluded. Because studies on state-of-the-art methodologies are often published as conference abstracts, some studies may not have been reviewed here. Although this preliminary review revealed current gaps, especially in validation methodologies, further studies are highly recommended to address these limitations, confirm the gaps, and support the suggestions made in this study. Follow-up studies should employ the standard review process with at least two independent researchers and include grey literature that reports up-to-date technologies.
In this review, I examined studies on the application of ML/DL in veterinary clinics. The review revealed several gaps in methodology and validation; recognizing them could help future studies improve their quality and help readers better screen veterinary studies. In the era of artificial intelligence, growing demand for its application in veterinary clinics is unavoidable. Furthermore, demand-driven research using proper methodologies can fundamentally improve clinical services. In this regard, researchers should keep practical feasibility in mind when addressing methodology and model performance, and veterinary clinicians should adopt a receptive yet critical stance towards these new changes.
This work was supported by funding from the academic research program of Chungbuk National University in 2022. In addition, this work was carried out with the support of the “Cooperative Research Program for Agriculture Science and Technology Development” (Project No. RS-2023-00232301), Rural Development Administration, Republic of Korea.
The author has no conflicting interests.
College of Veterinary Medicine, Chungbuk National University, Cheongju 28644, Korea
This is an open access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: machine learning, deep learning, veterinary clinics, scoping review, validation
The application of machine learning and deep learning (ML/DL) algorithms has altered the medical landscape. In medical diagnostics, such as radiology and pathology, a growing number of studies have reported reliable performance of ML/DL-based automatic systems (61), equivalent to or even better than that of human experts (40). Based on this accumulated research, numerous commercial medical devices have been officially approved for use in clinical practice, especially in Europe and the USA (49). Moreover, various digital biomarkers have been developed to predict the prognosis of chronic diseases such as cancer and cardiovascular diseases (46).
However, ML/DL applications in veterinary medicine seem to lag far behind those in human medicine in terms of both quantity and quality, especially in South Korea. One of the major reasons for this slow progress may be the lack of high-quality medical data. Although most veterinary clinics use electronic medical chart systems (38), the analyzable data in them are insufficient. Most veterinarians have been neither trained nor motivated to chart appropriately, especially in South Korea, where there is no insurance system for animals and some medical drugs can be purchased without a veterinarian’s prescription. Even if medical records are accumulated appropriately, merging multi-clinic medical records remains challenging owing to the lack of standardized medical coding classification systems (80).
Nevertheless, the demand for ML/DL applications has increased with the rapid growth of the veterinary industry. Applications in industrial animal husbandry, disease screening, and medical data management have been developed, demonstrating this growth potential (26). This growth could have positive implications, such as opening new markets and improving the quality of veterinary medical services. However, given the challenges regarding data quality, it could also increase the possibility of misuse and abuse. Veterinarians working in clinical practice should have an adequate level of ML/DL literacy to prevent misuse and abuse and to guide development in a constructive manner.
Therefore, a scoping review was conducted to clarify current ML/DL applications in veterinary medicine and explore directions for the advancement of this field. In this review, the application scope (specific domains in which the ML/DL methodology was applied), methodological details, and medical utility (performance of the ML/DL models) of previously published relevant studies were investigated.
A scoping review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (53). Two electronic scientific literature databases (PubMed and Embase) were used to identify published studies that used ML/DL algorithms in veterinary clinics, especially those examining diagnostics and prognosis prediction. The search terms were developed (Supplementary Tables 1-2) based on two previous studies that examined artificial intelligence in the medical field (52) and veterinary medicine (62).
An initial search using these terms was conducted on September 22, 2022, and screening processes were implemented using predetermined inclusion/exclusion criteria. The inclusion criteria were as follows: 1) ML/DL applications in veterinary medicine focusing on diagnostics and prognosis prediction; 2) applications to companion animals, industrial animals, and/or wildlife; 3) studies written in English; and 4) original articles. The exclusion criteria were as follows: 1) applications to experimental animals; 2) population-level studies whose analysis unit was the aggregate level; 3) applications to animal husbandry or management (e.g., pregnancy detection); 4) applications to drug discovery; 5) applications to biosecurity; and 6) methodological studies that focused on algorithm development, optimization, or automatic data labelling. During the first screening process, the titles and abstracts of each searched paper were checked. In the second screening process, the full text was examined, and the final list of the included studies was determined.
For each included study, the following information was extracted: 1) authors, publication year, and title; 2) purpose of the study; 3) target animals; 4) number of samples used for ML/DL models; 5) algorithm types (e.g., artificial neural network, recurrent neural network [RNN], or other types); 6) whether the study used a cross-validation framework to assess model performance; 7) whether the study collected the test dataset prospectively or retrospectively; 8) whether the study used multi-center datasets for model building or validation; 9) measurements used to assess model performance (e.g., sensitivity, specificity, or other measurements); and 10) model performance.
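Several of the performance measurements extracted in item 9 derive from the same two-by-two confusion matrix. The sketch below uses the standard definitions of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy. The function names and the counts in the example are hypothetical and are not taken from any included study.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def diagnostic_metrics(tp, fp, fn, tn):
    """Compute common diagnostic metrics from confusion-matrix counts.

    tp: diseased animals correctly flagged    fp: healthy animals flagged
    fn: diseased animals missed               tn: healthy animals cleared
    """
    return {
        "sensitivity": safe_ratio(tp, tp + fn),  # true positive rate
        "specificity": safe_ratio(tn, tn + fp),  # true negative rate
        "ppv": safe_ratio(tp, tp + fp),          # positive predictive value
        "npv": safe_ratio(tn, tn + fn),          # negative predictive value
        "accuracy": safe_ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical counts for 50 diseased and 50 healthy animals.
example = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on how common the disease is in the test set, two studies with the same sensitivity and specificity can still report different predictive values.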
The screening process for this scoping review is illustrated in Fig. 1. In the initial search, 598 and 376 studies were found in the PubMed and Embase databases, respectively. After removing duplicate studies, 699 studies were identified. A total of 532 and 112 papers were excluded after the first and second screenings, respectively. The remaining 55 studies were included. Detailed information on the included papers is provided in Tables 1 and 2.
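The screening counts reported above can be checked with simple arithmetic. The short sketch below recomputes each stage from the reported numbers. The variable and function names are illustrative only, and the number of duplicates removed is derived here rather than stated in the text.

```python
def screening_flow(pubmed, embase, after_dedup, excluded_first, excluded_second):
    """Recompute the stages of the screening flow from the reported counts."""
    total_hits = pubmed + embase
    duplicates_removed = total_hits - after_dedup
    after_first_screening = after_dedup - excluded_first
    included = after_first_screening - excluded_second
    return {
        "total_hits": total_hits,
        "duplicates_removed": duplicates_removed,
        "after_first_screening": after_first_screening,
        "included": included,
    }


# Counts as reported in the text.
flow = screening_flow(
    pubmed=598,
    embase=376,
    after_dedup=699,
    excluded_first=532,
    excluded_second=112,
)
print(flow)
```

The result agrees with the 55 included studies and implies that 275 duplicate records were removed from the 974 initial hits.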
Table 1. General information regarding the studies included in the review.
Author and year | Animal type | Target animals | Sample size | Algorithm |
---|---|---|---|---|
G. Theodoropoulos et al., 2000 (71) | Domestic | Sheep | 255 images of 57 individual larvae (5 genera) | ANN (artificial neural network; manual feature selection, 16 features measured) |
W. B. Roush et al., 2001 (63) | Domestic | Chicken | Case 6-40, normal 33-91 | BP3 (back propagation neural network), WardBP (Ward back propagation neural network), PNN (probabilistic neural network), GRNN (general regression neural network) |
H. Schobesberger and C. Peham, 2002 (66) | Domestic | Horse | 175 (42 control/133 low to medium grade lame) | ANN (manual feature selection) |
K. G. Keegan et al., 2003 (32) | Domestic | Horse | 12 adult horses | ANN (manual feature selection) |
M. E. Pastell and M. Kujalaf, 2007 (56) | Domestic | Dairy cow | 73 cows (training: 37 cows, 5,074 observations; validation: 36 cows, 4,868 measurements) | Probabilistic neural network model (manual feature selection) |
S. M. Ghotoorlar et al., 2012 (25) | Domestic | Dairy cow | 105 dairy cows | ANN (manual feature selection) |
T. Banzato et al., 2018 (4) | Companion | Canine | 80 (56 meningioma, 24 glioma) | Convolutional neural networks (CNN), GoogleNet |
T. Banzato et al., 2018 (5) | Companion | Canine | 48 (32 case, 16 control) | Deep neural networks (DNN), especially AlexNet |
T. Banzato et al., 2018 (6) | Companion | Canine | 56 (grade 1 = 26, grade 2 = 22, grade 3 = 8) | AlexNet, DNN |
A. Yakubu et al., 2018 (73) | Domestic | Chicken | 167 | ANN |
Y. Yoon et al., 2018 (75) | Companion | Dogs | 3,142 for cardiomegaly (1,571 normal and 1,571 abnormal from 1,143 dogs), 2,086 for lung pattern (1,043 normal and 1,043 abnormal from 1,247 dogs), 892 for mediastinal shift (446 normal and 446 abnormal from 387 dogs), 940 for pleural effusion (470 normal and 470 abnormal from 284 dogs), and 78 for pneumothorax (39 normal and 39 abnormal from 61 dogs) | Bag-of-features (BOF) and CNN |
R. Bradley et al., 2019 (15) | Companion | Cat | 106,251 cats | Recurrent Neural Network (RNN) |
M. Ebrahimi et al., 2019 (20) | Domestic | Cow | 297,004 milking samples each with eight milking features | ANN, Naïve Bayes, GLM, Decision tree, Random forest, Gradient boosted tree |
J. Y. Kim et al., 2019 (35) | Companion | Dogs | 1,040 images | CNN (GoogLeNet, ResNet, and VGGNet) |
M. Aubreville et al., 2020 (3) | Companion | Dogs | 32 whole slide images | CNN, RetinaNet, ResNet-18, Unet |
V. Biourge et al., 2020 (12) | Companion | Cats | 218 | ANN |
L. E. Broughton-Neiswanger et al., 2020 (16) | Companion | Cats | 12 | Partial least squares discriminant analysis, Random forest |
S. Burti et al., 2020 (17) | Companion | Dogs | 1,465 images | CNN |
E. Fernández-Carrión et al., 2020 (22) | Etc. | Wild boar | 8 | CNN |
M. A. Fraiwan and S. M. Abutarbush, 2020 (24) | Domestic | Horse | 285 horses | Bayes Network, Naïve Bayes, DNN, Random forest |
X. Kang et al., 2020 (30) | Domestic | Cow | 100 cows | RFB_NET_SSD deep learning network |
N. Kil et al., 2020 (33) | Domestic | Horse | 34 horses (65 video) | CNN |
S. Li et al., 2020 (39) | Companion | Dogs | 792 radiographs | CNN |
C. Marzahl et al., 2020 (42) | Domestic | Horse | 17 completely annotated cytology whole slide images (WSI) containing 78,047 hemosiderophages | CNN (RetinaNet) |
S. Mouloodi et al., 2020 (47) | Domestic | Horse | 3 third metacarpal bones from 3 racehorses | ANN |
S. Mouloodi et al., 2020 (48) | Domestic | Horse | 9 equine third metacarpal bones from 9 thoroughbred horses | ANN |
Y. Nagamori et al., 2020 (50) | Companion | Cat, dogs | 100 | CNN |
C. Post et al., 2020 (59) | Domestic | Cow | 167 cows | Logistic Regression (LR), Support Vector Machine (SVM), K-nearest neighbors (KNN), Gaussian Naïve Bayes (GNB), Extra Trees Classifier (ET), Random forest |
A. R. Trachtman et al., 2020 (72) | Domestic | Pigs | 5,902 images | CNN |
T. Banzato et al., 2021 (7) | Companion | Dogs | 3,839 latero-lateral radiographs | CNN (ResNet-50, DenseNet-121) |
T. Banzato et al., 2021 (8) | Companion | Cat | 1,062 latero-lateral radiographs | CNN (ResNet 50 and Inception V3) |
A. Biercher et al., 2021 (11) | Companion | Dogs | Thoracolumbar MR images from 500 dogs | CNN |
E. Boissady et al., 2021 (13) | Companion | Cat, dogs | 30 canine and 30 feline thoracic lateral radiographs | CNN |
L. Bonicelli et al., 2021 (14) | Domestic | Pigs | 7,564 pictures | CNN |
V. Kittichai et al., 2021 (36) | Domestic | Poultry | 12,761 single cell images | CNN (Darknet, Darknet19, Darknet19-448 and Densenet201) |
Y. Nagamori et al., 2021 (51) | Companion | Cat, dogs | 460 samples for 4 parasites (80-200 per parasite) | You only look once (YOLOv3) model |
J. Park et al., 2021 (54) | Companion | Dogs | 90 dogs | HA, DLBAS, and the readjustment of the predicted data obtained via the DLBAS of the clinical test sets (HA_DLBAS) |
I. R. Porter et al., 2021 (58) | Domestic | Cattle | A total of 398 digital images from dairy cows’ udders | CNN (GoogLeNet) |
M. Salvi et al., 2021 (64) | Companion | Dogs | 416 canine cutaneous round cell tumors (RCT) (117 cases) | AlexNet, Inceptionv3, ResNet, Ensemble |
S. Shahinfar et al., 2021 (68) | Domestic | Cattle | 2,535 lameness scores (2,248 sound and 287 unsound) | Naïve Bayes (NB), Random Forest (RF), Multilayer Perceptron (MLP), and logistic regression (LR) |
Y. Ye et al., 2021 (74) | Companion | Dogs | 220 images | CNN (ResNet-50) |
M. Zhang et al., 2021 (79) | Companion | Dogs | 2,670 lateral X-ray images | CNN (HRNet) |
A.N. ELKhamary et al., 2022 (21) | Domestic | Horse | 16 horses, 32 limbs (16 normal tendons and 16 abnormal tendons) | C4.5 algorithm (Quinlan), a decision tree classifier of the Weka software package |
E. A. Bauer and W. Jagusiak, 2022 (9) | Domestic | Cattle | 168 cows | ANN |
K. Benfodil et al., 2022 (10) | Domestic | Dromedaries | 115 dromedaries | ANN |
L. Dumortier et al., 2022 (19) | Companion | Cat | 500 annotated thoracic radiograph images (348 veterinary visits, 296 cats) | CNN (ResNet50V2) |
P. Figueirinhas et al., 2022 (23) | Companion | Dogs | 15 working dogs (pilot study) | LSTM |
Y. Kokkinos et al., 2022 (37) | Companion | Dogs | 57,402 dogs | RNN |
A. Mao et al., 2022 (41) | Domestic | Chicken | 5,336 voice calls (3,363 distress calls and 1,973 natural barn sound) | CNN (light-VGG11) |
A. May et al., 2022 (43) | Domestic | Horse | 2,607 images | CNN |
T. R. Müller et al., 2022 (45) | Companion | Dogs | 62 dogs (41 case, 21 control); 4,000 images (2,000 case, 2,000 control) | CNN (VGG16) |
C. Parra et al., 2022 (55) | Etc. | Reptile | 3,616 image data samples and 26 videos (4,849 frames) | CNN (MobileNet) |
T. Rai et al., 2022 (60) | Companion | Dogs | 32 patients | CNN (DenseNet-161) |
V. A. Teixeira et al., 2022 (70) | Domestic | Cattle | 55 Holstein calves | RNN |
M. ZareBidaki et al., 2022 (77) | Domestic | Goat, sheep, cows | 200 paired samples (100 blood, 100 milk) from 100 animals | ANN |
Table 2. Validation methodologies and model performance of the studies included in the review.
Author and year | CV | Prospective | Multi-center approach: training set | Multi-center approach: test set | Model performance: index | Model performance: value | Purpose | |
---|---|---|---|---|---|---|---|---|
G. Theodoropoulos et al., 2000 (71) | Yes | No | No | No | Sensitivity | 42.4-80.7% | Diagnostics | |
W. B. Roush et al., 2001 (63) | Yes | No | No | No | Sensitivity | 0-100% | Prediction | |
H. Schobesberger and C. Peham, 2002 (66) | Yes | No | No | No | Agreement | 78.60% | Diagnostics | |
K. G. Keegan et al., 2003 (32) | Yes | No | No | No | Agreement | 85% | Diagnostics | |
M. E. Pastell and M. Kujalaf, 2007 (56) | Yes | No | No | No | Agreement and sensitivity | Agreement = 96.2% Sensitivity = 100% | Diagnostics | |
S. M. Ghotoorlar et al., 2012 (25) | Yes | No | No | No | Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), Pearson correlation coefficient | Sensitivity = 0.5-1; Specificity = 0.91-1; PPV = 0.76-1; NPV = 0.92-1; Pearson correlation coefficient = 0.94 | Diagnostics | |
T. Banzato et al., 2018 (4) | Yes | No | Yes | No | Agreement, Matthews correlation coefficient (MCC) | Agreement = 90-94% MCC = 0.8-0.88 | Diagnostics | |
T. Banzato et al., 2018 (5) | Yes | No | No | No | AUC, sensitivity, specificity | AUC = 0.91 Sensitivity = 100% Specificity = 82.8% | Diagnostics | |
T. Banzato et al., 2018 (6) | Yes | No | Yes | No | Agreement, multi-class Matthew’s correlation coefficient (MCMCC) | Agreement = 65.2-82.2% MCMCC = 0.44-0.68 | Diagnostics | |
A. Yakubu et al., 2018 (73) | Yes | No | No | No | r, R2, RMSE | r = 0.983 R2 = 0.966 RMSE = 0.04806 | Prediction | |
Y. Yoon et al., 2018 (75) | Yes | No | No | No | Accuracy, sensitivity | Accuracy (CNN: 92.9-96.9%; BOF: 79.6-96.9%); sensitivity (CNN: 92.1-100%; BOF: 74.1-94.8%) | Prediction | |
R. Bradley et al., 2019 (15) | Yes | No | No | No | Sensitivity, specificity | Sensitivity = 63.0% (1 year before), 44.2% (2 years before); specificity remaining around 99% | Prediction | |
M. Ebrahimi et al., 2019 (20) | Yes | No | No | No | AUC | 0.826 | Prediction | |
J. Y. Kim et al., 2019 (35) | Yes | No | Yes | No | Sensitivity | 79.4-100% | Diagnostics | |
M. Aubreville et al., 2020 (3) | Yes | No | No | No | Correlation coefficient | 0.868-0.979 | Diagnostics | |
V. Biourge et al., 2020 (12) | Yes | Yes | No | Yes | Accuracy, sensitivity, specificity, PPV, NPV | Accuracy = 88% Sensitivity = 87% Specificity = 70% PPV = 53% NPV = 92% | Prediction | |
L. E. Broughton-Neiswanger et al., 2020 (16) | Yes | No | No | No | Sensitivity, specificity, AUC | AUC = 0.87-1; Sensitivity = 0-100%; Specificity = 50-100% | Diagnostics | |
S. Burti et al., 2020 (17) | Yes | No | No | No | AUC | 0.904-0.973 | Diagnostics | |
E. Fernández-Carrión et al., 2020 (22) | Yes | No | No | No | Agreement | 95.4-97.2% | Diagnostics | |
M. A. Fraiwan and S. M. Abutarbush, 2020 (24) | Yes | No | No | No | Precision, recall, F-measure, accuracy | Need for surgery: Precision = 69.5-74.1%, Recall = 72.4-99.3%, F-measure = 72.2-81.8%, Accuracy = 69.0-76.0%; Survival: Precision = 87.5-97.4%, Recall = 80.5-87.8%, F-measure = 87.2-89.1%, Accuracy = 83.9-85.2% | Prediction | |
X. Kang et al., 2020 (30) | Yes | No | No | No | Sensitivity, specificity | Sensitivity = 0.83-1; Specificity = 0.95-1 | Diagnostics | |
N. Kil et al., 2020 (33) | Yes | No | No | No | Sensitivity, accuracy | Sensitivity = 0.79-0.94; Accuracy = 0.82-0.94 | Diagnostics | |
S. Li et al., 2020 (39) | Yes | No | No | No | Accuracy, sensitivity, and specificity | Accuracy = 82.71% Sensitivity = 68.42% Specificity = 87.09% | Diagnostics | |
C. Marzahl et al., 2020 (42) | Yes | No | No | No | Precision | 0.64-0.66 | Diagnostics | |
S. Mouloodi et al., 2020 (47) | Yes | No | No | No | Determination coefficient (R2) | 0.9116-0.9599 | Prediction | |
S. Mouloodi et al., 2020 (48) | Yes | No | No | No | Determination coefficient (R2) | 0.9999 | Prediction | |
Y. Nagamori et al., 2020 (50) | Yes | No | No | No | Pearson correlation coefficient, sensitivity, specificity | Pearson correlation coefficient = 0.89-0.99; Sensitivity = 0.758-1; Specificity = 0.918-1 | Diagnostics | |
C. Post et al., 2020 (59) | Yes | No | No | No | AUC | 0.71-0.79 | Diagnostics | |
A. R. Trachtman et al., 2020 (72) | Yes | No | No | No | Accuracy, sensitivity, specificity | Accuracy = 62-96%; Sensitivity = 84-100%; Specificity = 92-96% | Diagnostics | |
T. Banzato et al., 2021 (7) | Yes | No | No | No | AUC | 0.8 | Diagnostics | |
T. Banzato et al., 2021 (8) | Yes | No | Yes | No | AUC | 0.58-0.97 | Diagnostics | |
A. Biercher et al., 2021 (11) | Yes | No | Yes | Yes | Sensitivity, specificity | IVDE: sens 73.46-90.1, spec 67.6-99.0; IVDP: sens 67.86-100, spec 74.9-96.4; FCE/ANNPE: sens 62.2-90.1, spec 90.1-97.9; Syringomyelia: sens 0-10, spec 100; Neoplasia: sens 0-37.5, spec 60-94.7 | Diagnostics | |
E. Boissady et al., 2021 (13) | NA | No | No | No | ICC | 0.998-0.999 | Diagnostics | |
L. Bonicelli et al., 2021 (14) | Yes | No | Yes | Yes | Sensitivity, specificity, Pearson correlation coefficient | Sensitivity = 81.25-100%; Specificity = 99.38%; Pearson correlation coefficient = 0.96 | Diagnostics | |
V. Kittichai et al., 2021 (36) | Yes | No | NA | NA | Accuracy | 99% | Diagnostics | |
Y. Nagamori et al., 2021 (51) | NA | NA | Yes | NA | Sensitivity, specificity | Sensitivity = 75.8-100%; Specificity = 93.1-100% | Diagnostics | |
J. Park et al., 2021 (54) | Yes | No | No | No | Dice similarity coefficient (DSC) and the Hausdorff distance (HD) | DSC 0.78-0.94; HD 2.30-4.30 mm | Diagnostics | |
I. R. Porter et al., 2021 (58) | Yes | No | Yes | Yes | AUC | 0.542-0.920 | Diagnostics | |
M. Salvi et al., 2021 (64) | Yes | No | Yes | Yes | Accuracy | 91.66%-100% | Diagnostics | |
S. Shahinfar et al., 2021 (68) | Yes | No | Yes | Yes | AUC, F1 | AUC = 0.61-0.67; F1 = 0.01-0.27 | Diagnostics | |
Y. Ye et al., 2021 (74) | Yes | No | NA | NA | AUC, accuracy, F1 score | AUC = 99.37; Accuracy = 97.62; F1 score = 96.7 | Diagnostics | |
M. Zhang et al., 2021 (79) | Yes | No | Yes | Yes | Sensitivity | 86.40% | Diagnostics | |
A.N. ELKhamary et al., 2022 (21) | Yes | No | No | No | Accuracy, PPV, sensitivity, kappa | Accuracy = 93.7%; PPV = 93.80%; Sensitivity = 93.80%; Kappa = 0.88 | Diagnostics | |
E. A. Bauer and W. Jagusiak, 2022 (9) | Yes | No | Yes | Yes | AUC | 0.82-0.89 | Diagnostics | |
K. Benfodil et al., 2022 (10) | Yes | No | NA | NA | Pearson correlation coefficient | 0.943 | Diagnostics | |
L. Dumortier et al., 2022 (19) | Yes | No | No | No | Accuracy, F1-score, specificity, PPV, and sensitivity | Accuracy = 82%; F1-score = 85%; Specificity = 75%; PPV = 81%; Sensitivity = 88% | Diagnostics | |
P. Figueirinhas et al., 2022 (23) | Yes | No | No | No | Accuracy | Accuracy = 60% | Diagnostics | |
Y. Kokkinos et al., 2022 (37) | Yes | No | No | No | Sensitivity, PPV, NPV | Sensitivity = 44.8-68.8% PPV = 15-23% NPV > 99% | Prediction | |
A. Mao et al., 2022 (41) | Yes | No | Yes | Yes | Precision, recall, F1-score, and accuracy | Precision = 94.58%; Recall = 94.89%; F1-score = 94.73%; Accuracy = 95.07% | Diagnostics | |
A. May et al., 2022 (43) | Yes | No | No | No | Accuracy, cross entropy | Accuracy = 96.66%; Cross entropy = 0.02 | Diagnostics | |
T. R. Müller et al., 2022 (45) | Yes | No | No | No | Accuracy, sensitivity, specificity, PPV, NPV | Accuracy = 88.7%; Sensitivity = 90.2%; Specificity = 81.8%; PPV = 92.5%; NPV = 81.8% | Diagnostics | |
C. Parra et al., 2022 (55) | Yes | NA | NA | NA | Accuracy, AUC | Accuracy = 94.26; AUC = 0.996 | Diagnostics | |
T. Rai et al., 2022 (60) | Yes | No | No | No | F1-score | 0.708 | Diagnostics | |
V. A. Teixeira et al., 2022 (70) | Yes | No | No | No | Accuracy, sensitivity, specificity, PPV, NPV | Accuracy = 85-98; Sensitivity = 87-96; Specificity = 78-100; PPV = 85-100; NPV = 88-96 | Prediction & diagnostics | |
M. ZareBidaki et al., 2022 (77) | Yes | No | NA | NA | Sensitivity, specificity, AUC | Sensitivity = 81%; Specificity = 62%; AUC = 0.799 | Diagnostics |
The temporal trends in ML/DL-related publications are illustrated in Fig. 2. Although most of these studies were published after 2000, a rapid growth in their quantity began in 2018. Before this surge, the applications of ML/DL were concentrated in industrial animals; however, their applications in companion animals have been expanding since 2018. Only a few studies on other animal species (wildlife and exotic animals) have been published, even after 2020.
Fig. 3 shows the distribution of the included studies by target animal species and domain of application (whether ML/DL was used for predictive or diagnostic purposes). While the numbers of studies on non-industrial and industrial animals were similar (31 and 30, respectively, including duplicates), diagnostic studies outnumbered prediction studies (45 and 11, respectively, including duplicates). In terms of specific animal species, studies on dogs were dominant among studies on non-industrial animals (70.3% of diagnostic studies and 50.0% of prediction studies), while studies on cows (39.1% of diagnostic studies and 28.6% of prediction studies) and horses (26.1% of diagnostic studies and 42.9% of prediction studies) were dominant among studies on industrial animals.
Table 3 shows details regarding the identified studies, including the sample size used for model development and validation, the algorithm used, whether the authors employed prospective data collection for validation, whether they used multi-center data for model development and validation, and model performance. In terms of validation, almost every publication stated that cross-validation was implemented (splitting data into training and test sets to avoid overestimating performance), although the relevant descriptions were insufficient in some studies (n = 2). However, only a minority of the studies employed a multi-center approach for model development (n = 13) and validation (n = 9), and only one study prospectively collected the test dataset. The majority of the identified studies used neural network-based algorithms, such as RNNs and convolutional neural networks, and most of the studies targeted binary problems rather than continuous outcomes. Although the amount of data used for model development was relatively small in several studies (16,22,33), the reported model performance of most studies tended to be within an acceptable range (e.g., area under the receiver operating characteristic curve [AUC] >0.9).
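For readers less familiar with the validation frameworks summarized above, the following sketch shows one common form of cross-validation, k-fold splitting. The included studies did not necessarily use this scheme. The function name `k_fold_indices` is illustrative, and the split is unshuffled for simplicity, whereas real analyses usually shuffle or stratify the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k consecutive, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


# Each sample is used for testing exactly once across the folds.
folds = list(k_fold_indices(n_samples=10, k=3))
for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train={train_idx} test={test_idx}")
```

Every fold still draws its training and test data from the same pool, which is why prospective and multi-center test sets provide a stronger check than cross-validation alone.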
Table 3. Profile of included studies.
Target animal type | Study purposes | N* | NN† | CV‡ | Pros§ | Multi∥ |
---|---|---|---|---|---|---|
Industrial animals | Diagnostics | 21 | 19 | 21 | 0 | 5 |
Industrial animals | Prediction | 7 | 7 | 7 | 0 | 0 |
Companion animals | Diagnostics | 22 | 21 | 20 | 0 | 2 |
Companion animals | Prediction | 4 | 4 | 4 | 1 | 1 |
Others¶ | Diagnostics | 2 | 2 | 2 | 0 | 0 |
*Number of studies.
†Number of studies that used a neural network-based algorithm.
‡Number of studies that used a cross-validation approach to measure performance.
§Number of studies that employed a prospective approach to collect the test dataset.
∥Number of studies that used multi-center data for validation.
¶The others group includes wildlife and exotic animals.
Note: The numbers include duplication. For example, one study on industrial animals had both purposes, diagnostics and prediction. There were no prediction studies for the other animals.
A scoping review was conducted as a situation analysis to identify the current gaps in ML/DL application research in veterinary clinics and suggest directions for further improvement in this field. The review found that the history of ML/DL applications in veterinary medicine is relatively short compared to that in human medicine and the healthcare sector (31). Possibly due to its short history, quantitative scarcity and methodological gaps were identified, especially regarding the validation and data collection framework, although the reported model performance was generally within acceptable levels.
The first gap that must be highlighted is quantitative scarcity. Although the current review may have missed some published papers, it seems clear that the relevant papers are fewer than those in the human medical field (2,52,67,69). Specifically, prediction studies were scarce, possibly because of their technical difficulty. They usually involve extrapolation because the prediction target is future data. Because extrapolation is more sensitive to overfitting and to missing variables, the performance of such models tends to be lower than that of models for interpolation (57). However, prediction studies are practically useful because they can be employed for optimal treatment recommendations and prognostic assessment, which are the most frequent practices in veterinary clinics. Scarcity has also been observed in studies on wildlife. Lack of data may explain this discrepancy. Compared with medicine for companion and industrial animals, wildlife medicine covers more animal species with fewer resources. Therefore, the quantity of data for each species is usually lower than that in other medical areas, even though large amounts of data regarding specific species and medical problems are required for ML/DL applications.
Qualitative gaps in model validation should also be emphasized. Because ML/DL approaches do not inherently incorporate physiological or pathological mechanisms, innate limitations of this data-driven approach are overfitting and reliance on induction. These issues can be practically addressed by demonstrating acceptable performance on data that were not used for model training, an approach referred to in this review as cross-validation. Most of the studies identified in this review employed this approach. However, only a few of them obtained appropriate test sets. Because the selection of the test set is essential for validation, its representativeness must be ensured (27), and prospective data collection from multiple centers is the best way to ensure this representativeness (34,78). Veterinary clinicians should be aware of the qualitative gaps in current ML/DL application studies to avoid possible misuse of these models in clinical practice.
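One way to approximate multi-center validation with existing data is to hold out every record from one center at a time, so that the test set always comes from a clinic the model has never seen. The sketch below is a minimal illustration under stated assumptions: the function `leave_one_center_out`, the `clinic` field, and the toy records are hypothetical. It does not replace prospective data collection, which requires gathering new cases after the model is fixed.

```python
from collections import defaultdict


def leave_one_center_out(records, center_key="clinic"):
    """Yield (held_out_center, train_records, test_records) for each center."""
    by_center = defaultdict(list)
    for record in records:
        by_center[record[center_key]].append(record)
    for held_out in sorted(by_center):
        test = by_center[held_out]
        train = [
            record
            for center, center_records in by_center.items()
            if center != held_out
            for record in center_records
        ]
        yield held_out, train, test


# Toy records from three hypothetical clinics.
records = [
    {"clinic": "A", "id": 1}, {"clinic": "A", "id": 2},
    {"clinic": "B", "id": 3}, {"clinic": "B", "id": 4},
    {"clinic": "C", "id": 5}, {"clinic": "C", "id": 6},
]

splits = list(leave_one_center_out(records))
for held_out, train, test in splits:
    print(f"test on clinic {held_out}: {len(train)} training and {len(test)} test records")
```

A model that performs well only when the held-out clinic resembles the training clinics would be exposed by such a split, which is the representativeness concern raised above.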
From the veterinary clinicians’ point of view, excellent model performance alone is not sufficient to recommend its practical use. For instance, even if some ML/DL models show very high AUC, representing great performance in diagnostics, the operation of the model could require a significant amount of manpower, time, or cost, making its usage unaffordable, especially for single-veterinarian clinics. In this regard, successful future studies need to consider the practical applicability as well (29).
Despite these gaps, there are prominent opportunities to improve research on ML/DL applications in veterinary medicine. First, privacy issues are relatively minor, when compared with human medicine. In it, data merging between hospitals and clinics is challenging owing to these issues. Therefore, the major approach in human medicine is the common data model which standardizes the data structure of each institution, facilitating meta-analysis (1,76) rather than merged big data analysis. In contrast, multi-clinic data can be merged without privacy issues in veterinary sectors, and the veterinary compass (44) and Small Animal Veterinary Surveillance Network (28,65) showed these opportunities. Furthermore, the cost of data collection in veterinary medicine, especially for continuous data, may be lower than that in human medicine. Recently, the collection of continuous data and extraction of significant signals using wearable devices (18) has become a leading research topic. In these research areas, veterinary medicine has more opportunities than in human medicine, because employing animal subjects costs less than employing human participants; additionally, compliance in applying the device could be higher in animal subjects than in human participants.
Improving the application of ML/DL in veterinary clinics requires two essential conditions to be fulfilled. First, the establishment of a standardized encoding system is crucial. To achieve reliable prediction performance, high-quality big data are indispensable. Because medical big data must be collected by multiple institutions, a unified coding system for disease diagnosis and prescription is essential to successfully amalgamate data from various sources. However, medical records currently rely predominantly on free-text descriptions, which are difficult to standardize. Although automatic encoding systems that translate free text into medical codes have been developed (78), no customized system is currently available. Second, fostering sustainable motivation among veterinarians for accurate recording is important. The absence of a national insurance system for animal medicine has led to a lack of incentives for veterinarians to ensure precise encoding. Addressing this challenge entails appropriately valuing the medical records provided by veterinary clinicians. Currently, the value of such data is not accurately evaluated, and most data used in ML/DL models have been acquired without adequate compensation to veterinarians. Offering proper remuneration for their data contributions could incentivize them to maintain accurate recording practices (Fig. 4).
This study has some limitations. First, the review was conducted by a single researcher. Because the standard review process generally requires at least two researchers to increase the sensitivity and specificity of screening, several studies that should have been included could have been excluded. Second, this study included only original papers; other types of publications were excluded. Because studies on state-of-the-art methodologies can be published as conference abstracts, several such studies may not have been reviewed. Although this preliminary review revealed current gaps, especially in validation methodologies, further studies are highly recommended to address these limitations, confirm the gaps, and support the suggestions made here. Follow-up studies should employ a standard review process with at least two independent researchers and include grey literature that reports up-to-date technologies.
In this review, I examined studies that covered the application of ML/DL in veterinary clinics. The review revealed several gaps in methodology and validation; recognizing these gaps could help future studies improve their quality and allow readers to better screen appropriate veterinary studies. In the era of artificial intelligence, the expanding demand for its application in veterinary clinics is unavoidable. Furthermore, demand-driven active research using proper methodologies can fundamentally improve clinical services. In this regard, researchers should keep practical feasibility in mind when tackling methodology and model performance; moreover, veterinary clinicians should adopt a receptive and critical stance towards these new changes.
This work was supported by funding for the academic research program of Chungbuk National University in 2022. In addition, this work was carried out with the support of the “Cooperative Research Program for Agriculture Science and Technology Development (Project No. RS-2023-00232301),” Rural Development Administration, Republic of Korea.
The author has no conflicting interests.
Table 1 General information regarding included studies in the review
Author and year | Animal type | Target animals | Sample size | Algorithm |
---|---|---|---|---|
G. Theodoropoulos et al., 2000 (71) | Domestic | Sheep | 255 images of 57 individual larvae (5 genera) | ANN (artificial neural network; manual feature selection, 16 features measured) |
W. B. Roush et al., 2001 (63) | Domestic | Chicken | Case 6-40, normal 33-91 | BP3 (back propagation neural network), WardBP (Ward back propagation neural network), PNN (probabilistic neural network), GRNN (general regression neural network) |
H. Schobesberger and C. Peham, 2002 (66) | Domestic | Horse | 175 (42 control/ 133 low to medium grade lame) | ANN (feature selection by manual) |
K. G. Keegan et al., 2003 (32) | Domestic | Horse | 12 adult horses | ANN (feature selection by manual) |
M. E. Pastell and M. Kujalaf, 2007 (56) | Domestic | Dairy cow | 73 cows (training 37 cows, 5,074 observation, validation 36 cows, 4,868 measurements) | Probabilistic Neural Network Model (feature selection by manual) |
S. M. Ghotoorlar et al., 2012 (25) | Domestic | Dairy cow | 105 dairy cows | ANN (feature selection by manual) |
T. Banzato et al., 2018 (4) | Companion | Canine | 80 (56 meningioma, 24 glioma) | Convolutional neural networks (CNN), GoogleNet |
T. Banzato et al., 2018 (5) | Companion | Canine | 48 (32 case, 16 control) | Deep neural networks (DNN), especially AlexNet |
T. Banzato et al., 2018 (6) | Companion | Canine | 56 (grade 1 = 26, grade 2 = 22, grade 3 = 8) | AlexNet, DNN |
A. Yakubu et al., 2018 (73) | Domestic | Chicken | 167 | ANN |
Y. Yoon et al., 2018 (75) | Companion | Dogs | 3,142 for cardiomegaly (1,571 normal and 1,571 abnormal from 1,143 dogs), 2,086 for lung pattern (1,043 normal and 1,043 abnormal from 1,247 dogs), 892 for mediastinal shift (446 normal and 446 abnormal from 387 dogs), 940 for pleural effusion (470 normal and 470 abnormal from 284 dogs), and 78 for pneumothorax (39 normal and 39 abnormal from 61 dogs) | Bag-of-features (BOF) and CNN |
R. Bradley et al., 2019 (15) | Companion | Cat | 106,251 cats | Recurrent Neural Network (RNN) |
M. Ebrahimi et al., 2019 (20) | Domestic | Cow | 297,004 milking samples each with eight milking features | ANN, Naïve Bayes, GLM, Decision tree, Random forest, Gradient boosted tree |
J. Y. Kim et al., 2019 (35) | Companion | Dogs | 1,040 images | CNN (GoogLeNet, ResNet, and VGGNet) |
M. Aubreville et al., 2020 (3) | Companion | Dogs | 32 whole slide images | CNN, RetinaNet, ResNet-18, Unet |
V. Biourge et al., 2020 (12) | Companion | Cats | 218 | ANN |
L. E. Broughton-Neiswanger et al., 2020 (16) | Companion | Cats | 12 | Partial least squares discriminant analysis, Random forest |
S. Burti et al., 2020 (17) | Companion | Dogs | 1,465 images | CNN |
E. Fernández-Carrión et al., 2020 (22) | Etc. | Wild boar | 8 | CNN |
M. A. Fraiwan and S. M. Abutarbush, 2020 (24) | Domestic | Horse | 285 horses | Bayes Network, Naïve Bayes, DNN, Random forest |
X. Kang et al., 2020 (30) | Domestic | Cow | 100 cows | RFB_NET_SSD deep learning network |
N. Kil et al., 2020 (33) | Domestic | Horse | 34 horses (65 video) | CNN |
S. Li et al., 2020 (39) | Companion | Dogs | 792 radiographs | CNN |
C. Marzahl et al., 2020 (42) | Domestic | Horse | 17 completely annotated cytology whole slide images (WSI) containing 78,047 hemosiderophages | CNN (RetinaNet) |
S. Mouloodi et al., 2020 (47) | Domestic | Horse | 3 third metacarpal bones from 3 racehorses | ANN |
S. Mouloodi et al., 2020 (48) | Domestic | Horse | 9 equine third metacarpal bones from 9 thoroughbred horses | ANN |
Y. Nagamori et al., 2020 (50) | Companion | Cat, dogs | 100 | CNN |
C. Post et al., 2020 (59) | Domestic | Cow | 167 cows | Logistic Regression (LR), Support Vector Machine (SVM), K-nearest neighbors (KNN), Gaussian Naïve Bayes (GNB), Extra Trees Classifier (ET), Random forest |
A. R. Trachtman et al., 2020 (72) | Domestic | Pigs | 5,902 images | CNN |
T. Banzato et al., 2021 (7) | Companion | Dogs | 3,839 latero-lateral radiographs | CNN (ResNet-50, DenseNet-121) |
T. Banzato et al., 2021 (8) | Companion | Cat | 1,062 latero-lateral radiographs | CNN (ResNet 50 and Inception V3) |
A. Biercher et al., 2021 (11) | Companion | Dogs | Thoracolumbar MR images from 500 dogs | CNN |
E. Boissady et al., 2021 (13) | Companion | Cat, dogs | 30 canine and 30 feline thoracic lateral radiographs | CNN |
L. Bonicelli et al., 2021 (14) | Domestic | Pigs | 7,564 pictures | CNN |
V. Kittichai et al., 2021 (36) | Domestic | Poultry | 12,761 single cell images | CNN (Darknet, Darknet19, Darknet19-448 and Densenet201) |
Y. Nagamori et al., 2021 (51) | Companion | Cat, dogs | 460 samples for 4 parasites (80-200 per parasite) | You only look once (YOLOv3) model |
J. Park et al., 2021 (54) | Companion | Dogs | 90 dogs | HA, DLBAS, and the readjustment of the predicted data obtained via the DLBAS of the clinical test sets (HA_DLBAS) |
I. R. Porter et al., 2021 (58) | Domestic | Cattle | A total of 398 digital images from dairy cows’ udders | CNN (GoogLeNet) |
M. Salvi et al., 2021 (64) | Companion | Dogs | 416 canine cutaneous round cell tumors (RCT) (117 cases) | AlexNet, Inceptionv3, ResNet, Ensemble |
S. Shahinfar et al., 2021 (68) | Domestic | Cattle | 2,535 lameness scores (2,248 sound and 287 unsound) | Naïve Bayes (NB), Random Forest (RF), Multilayer Perceptron (MLP), and logistic regression (LR), used to predict cases of lameness from milk production and conformation traits |
Y. Ye et al., 2021 (74) | Companion | Dogs | 220 images | CNN (ResNet-50) |
M. Zhang et al., 2021 (79) | Companion | Dogs | 2,670 lateral X-ray images | CNN (HRNet) |
A.N. ELKhamary et al., 2022 (21) | Domestic | Horse | 16 horses, 32 limbs (16 normal tendons and 16 abnormal tendons) | C4.5 algorithm (Quinlan), a decision tree classifier of the Weka software package |
E. A. Bauer and W. Jagusiak, 2022 (9) | Domestic | Cattle | 168 cows | ANN |
K. Benfodil et al., 2022 (10) | Domestic | Dromedaries | 115 dromedaries | ANN |
L. Dumortier et al., 2022 (19) | Companion | Cat | 500 annotated thoracic radiograph images (348 veterinary visits, 296 cats) | CNN (ResNet50V2) |
P. Figueirinhas et al., 2022 (23) | Companion | Dogs | 15 working dogs (pilot study) | LSTM |
Y. Kokkinos et al., 2022 (37) | Companion | Dogs | 57,402 dogs | RNN |
A. Mao et al., 2022 (41) | Domestic | Chicken | 5,336 voice calls (3,363 distress calls and 1,973 natural barn sound) | CNN (light-VGG11) |
A. May et al., 2022 (43) | Domestic | Horse | 2,607 images | CNN |
T. R. Müller et al., 2022 (45) | Companion | Dogs | 62 dogs (41 cases, 21 controls); 4,000 images (2,000 case, 2,000 control) | CNN (VGG16) |
C. Parra et al., 2022 (55) | Etc. | Reptile | 3,616 images data samples and 26 videos (4,849 frames) | CNN (MobileNet) |
T. Rai et al., 2022 (60) | Companion | Dogs | 32 patients | CNN (DenseNet-161) |
V. A. Teixeira et al., 2022 (70) | Domestic | Cattle | 55 Holstein calves | RNN |
M. ZareBidaki et al., 2022 (77) | Domestic | Goats, sheep, cows | 200 paired samples (100 blood, 100 milk) from 100 animals | ANN |
Table 2 Validation methodologies and model performance of the included studies in the review
Author and year | CV | Prospective | Multi-center approach (training set) | Multi-center approach (test set) | Model performance (index) | Model performance (value) | Purpose | |
---|---|---|---|---|---|---|---|---|
G. Theodoropoulos et al., 2000 (71) | Yes | No | No | No | Sensitivity | 42.4-80.7% | Diagnostics | |
W. B. Roush et al., 2001 (63) | Yes | No | No | No | Sensitivity | 0-100% | Prediction | |
H. Schobesberger and C. Peham, 2002 (66) | Yes | No | No | No | Agreement | 78.60% | Diagnostics | |
K. G. Keegan et al., 2003 (32) | Yes | No | No | No | Agreement | 85% | Diagnostics | |
M. E. Pastell and M. Kujalaf, 2007 (56) | Yes | No | No | No | Agreement and sensitivity | Agreement = 96.2% Sensitivity = 100% | Diagnostics | |
S. M. Ghotoorlar et al., 2012 (25) | Yes | No | No | No | Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), Pearson correlation coefficient | Sensitivity = 0.5-1; Specificity = 0.91-1; PPV = 0.76-1; NPV = 0.92-1; Pearson correlation coefficient = 0.94 | Diagnostics | |
T. Banzato et al., 2018 (4) | Yes | No | Yes | No | Agreement, Matthews correlation coefficient (MCC) | Agreement = 90-94% MCC = 0.8-0.88 | Diagnostics | |
T. Banzato et al., 2018 (5) | Yes | No | No | No | AUC, sensitivity, specificity | AUC = 0.91 Sensitivity = 100% Specificity = 82.8% | Diagnostics | |
T. Banzato et al., 2018 (6) | Yes | No | Yes | No | Agreement, multi-class Matthew’s correlation coefficient (MCMCC) | Agreement = 65.2-82.2% MCMCC = 0.44-0.68 | Diagnostics | |
A. Yakubu et al., 2018 (73) | Yes | No | No | No | r, R2, RMSE | r = 0.983 R2 = 0.966 RMSE = 0.04806 | Prediction | |
Y. Yoon et al., 2018 (75) | Yes | No | No | No | Accuracy, sensitivity | Accuracy (CNN: 92.9-96.9%; BOF: 79.6-96.9%); Sensitivity (CNN: 92.1-100%; BOF: 74.1-94.8%) | Prediction | |
R. Bradley et al., 2019 (15) | Yes | No | No | No | Sensitivity, specificity | Sensitivity = 63.0% (1 year before), 44.2% (2 years before); specificity remaining around 99% | Prediction | |
M. Ebrahimi et al., 2019 (20) | Yes | No | No | No | AUC | 0.826 | Prediction | |
J. Y. Kim et al., 2019 (35) | Yes | No | Yes | No | Sensitivity | 79.4-100% | Diagnostics | |
M. Aubreville et al., 2020 (3) | Yes | No | No | No | Correlation coefficient | 0.868-0.979 | Diagnostics | |
V. Biourge et al., 2020 (12) | Yes | Yes | No | Yes | Accuracy, sensitivity, specificity, PPV, NPV | Accuracy = 88% Sensitivity = 87% Specificity = 70% PPV = 53% NPV = 92% | Prediction | |
L. E. Broughton-Neiswanger et al., 2020 (16) | Yes | No | No | No | Sensitivity, specificity, AUC | AUC = 0.87-1; Sensitivity = 0-100%; Specificity = 50-100% | Diagnostics | |
S. Burti et al., 2020 (17) | Yes | No | No | No | AUC | 0.904-0.973 | Diagnostics | |
E. Fernández-Carrión et al., 2020 (22) | Yes | No | No | No | Agreement | 95.4-97.2% | Diagnostics | |
M. A. Fraiwan and S. M. Abutarbush, 2020 (24) | Yes | No | No | No | Precision, recall, F-measure, Accuracy | Need for surgery: Precision = 69.5-74.1%, Recall = 72.4-99.3%, F-measure = 72.2-81.8%, Accuracy = 69.0-76.0%; Survival: Precision = 87.5-97.4%, Recall = 80.5-87.8%, F-measure = 87.2-89.1%, Accuracy = 83.9-85.2% | Prediction | |
X. Kang et al., 2020 (30) | Yes | No | No | No | Sensitivity, specificity | Sensitivity = 0.83-1; Specificity = 0.95-1 | Diagnostics | |
N. Kil et al., 2020 (33) | Yes | No | No | No | Sensitivity, accuracy | Sensitivity = 0.79-0.94; Accuracy = 0.82-0.94 | Diagnostics | |
S. Li et al., 2020 (39) | Yes | No | No | No | Accuracy, sensitivity, and specificity | Accuracy = 82.71% Sensitivity = 68.42% Specificity = 87.09% | Diagnostics | |
C. Marzahl et al., 2020 (42) | Yes | No | No | No | Precision | 0.64-0.66 | Diagnostics | |
S. Mouloodi et al., 2020 (47) | Yes | No | No | No | Determination coefficient (R2) | 0.9116-0.9599 | Prediction | |
S. Mouloodi et al., 2020 (48) | Yes | No | No | No | Determination coefficient (R2) | 0.9999 | Prediction | |
Y. Nagamori et al., 2020 (50) | Yes | No | No | No | Pearson correlation coefficient, sensitivity, specificity | Pearson correlation coefficient = 0.89-0.99; Sensitivity = 0.758-1; Specificity = 0.918-1 | Diagnostics | |
C. Post et al., 2020 (59) | Yes | No | No | No | AUC | 0.71-0.79 | Diagnostics | |
A. R. Trachtman et al., 2020 (72) | Yes | No | No | No | Accuracy, sensitivity, specificity | Accuracy = 62-96%; Sensitivity = 84-100%; Specificity = 92-96% | Diagnostics | |
T. Banzato et al., 2021 (7) | Yes | No | No | No | AUC | 0.8 | Diagnostics | |
T. Banzato et al., 2021 (8) | Yes | No | Yes | No | AUC | 0.58-0.97 | Diagnostics | |
A. Biercher et al., 2021 (11) | Yes | No | Yes | Yes | Sensitivity, specificity | IVDE: sens 73.46-90.1, spec 67.6-99.0; IVDP: sens 67.86-100, spec 74.9-96.4; FCE/ANNPE: sens 62.2-90.1, spec 90.1-97.9; Syringomyelia: sens 0-10, spec 100; Neoplasia: sens 0-37.5, spec 60-94.7 | Diagnostics | |
E. Boissady et al., 2021 (13) | NA | No | No | No | ICC | 0.998-0.999 | Diagnostics | |
L. Bonicelli et al., 2021 (14) | Yes | No | Yes | Yes | Sensitivity, specificity, Pearson correlation coefficient | Sensitivity = 81.25-100 % Specificity = 99.38 % Pearson correlation coefficient = 0.96 | Diagnostics | |
V. Kittichai et al., 2021 (36) | Yes | No | NA | NA | Accuracy | 99% | Diagnostics | |
Y. Nagamori et al., 2021 (51) | NA | NA | Yes | NA | Sensitivity, specificity | Sensitivity = 75.8-100% Specificity = 93.1-100% | Diagnostics | |
J. Park et al., 2021 (54) | Yes | No | No | No | Dice similarity coefficient (DSC) and the Hausdorff distance (HD) | DSC 0.78-0.94 HD 2.30-4.30 mm | Diagnostics | |
I. R. Porter et al., 2021 (58) | Yes | No | Yes | Yes | AUC | 0.542-0.920 | Diagnostics | |
M. Salvi et al., 2021 (64) | Yes | No | Yes | Yes | Accuracy | 91.66-100% | Diagnostics | |
S. Shahinfar et al., 2021 (68) | Yes | No | Yes | Yes | AUC, F1 | AUC = 0.61-0.67 F1 = 0.01-0.27 | Diagnostics | |
Y. Ye et al., 2021 (74) | Yes | No | NA | NA | AUC, accuracy, F1 score | AUC = 99.37 Accuracy = 97.62 F1 score = 96.7 | Diagnostics | |
M. Zhang et al., 2021 (79) | Yes | No | Yes | Yes | Sensitivity | 86.40% | Diagnostics | |
A.N. ELKhamary et al., 2022 (21) | Yes | No | No | No | Accuracy, PPV, sensitivity, kappa | Accuracy = 93.7% PPV = 93.80% Sensitivity = 93.80% Kappa = 0.88 | Diagnostics | |
E. A. Bauer and W. Jagusiak, 2022 (9) | Yes | No | Yes | Yes | AUC | 0.82-0.89 | Diagnostics | |
K. Benfodil et al., 2022 (10) | Yes | No | NA | NA | Pearson correlation coefficient | 0.943 | Diagnostics | |
L. Dumortier et al., 2022 (19) | Yes | No | No | No | Accuracy, F1-score, specificity, PPV, and sensitivity | Accuracy = 82% F1-score = 85% Specificity = 75% PPV = 81% Sensitivity = 88% | Diagnostics | |
P. Figueirinhas et al., 2022 (23) | Yes | No | No | No | Accuracy | Accuracy = 60% | Diagnostics | |
Y. Kokkinos et al., 2022 (37) | Yes | No | No | No | Sensitivity, PPV, NPV | Sensitivity = 44.8-68.8% PPV = 15-23% NPV > 99% | Prediction | |
A. Mao et al., 2022 (41) | Yes | No | Yes | Yes | Precision, recall, F1-score and accuracy | Precision = 94.58% Recall = 94.89% F1-score = 94.73% Accuracy = 95.07% | Diagnostics | |
A. May et al., 2022 (43) | Yes | No | No | No | Accuracy, cross entropy | Accuracy = 96.66% Cross entropy = 0.02 | Diagnostics | |
T. R. Müller et al., 2022 (45) | Yes | No | No | No | Accuracy, sensitivity, specificity, PPV, NPV | Accuracy = 88.7% Sensitivity = 90.2% Specificity = 81.8% PPV = 92.5% NPV = 81.8% | Diagnostics | |
C. Parra et al., 2022 (55) | Yes | NA | NA | NA | Accuracy, AUC | Accuracy = 94.26 AUC = 0.996 | Diagnostics | |
T. Rai et al., 2022 (60) | Yes | No | No | No | F1-score | 0.708 | Diagnostics | |
V. A. Teixeira et al., 2022 (70) | Yes | No | No | No | Accuracy, sensitivity, and specificity, PPV, NPV | Accuracy = 85-98, Sensitivity = 87-96 Specificity = 78-100 PPV = 85-100 NPV = 88-96 | Prediction & diagnosis | |
M. ZareBidaki et al., 2022 (77) | Yes | No | NA | NA | Sensitivity, specificity, AUC | Sensitivity = 81% Specificity = 62% AUC = 0.799 | Diagnostics |
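
Most of the performance indices listed in Table 2 derive from the same four confusion-matrix counts. The following minimal Python sketch shows how they relate to one another; the function name `diagnostic_metrics` and the example counts are hypothetical and are not drawn from any included study.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Compute common diagnostic indices from confusion-matrix counts."""

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),  # also called precision
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts, invented for illustration only.
example = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity do not depend on disease prevalence in the test set, whereas PPV, NPV, and accuracy do, which is one reason a representative test set matters when comparing the values reported across studies.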
Table 3 Profile of included studies
Target animal type | Study purposes | N* | NN† | CV‡ | Pros§ | Multi∥ |
---|---|---|---|---|---|---|
Industrial animals | Diagnostics | 21 | 19 | 21 | 0 | 5 |
Industrial animals | Prediction | 7 | 7 | 7 | 0 | 0 |
Companion animals | Diagnostics | 22 | 21 | 20 | 0 | 2 |
Companion animals | Prediction | 4 | 4 | 4 | 1 | 1 |
Others¶ | Diagnostics | 2 | 2 | 2 | 0 | 0 |
*Number of studies.
†Number of studies that used a neural network-based algorithm.
‡Number of studies that used a cross-validation approach to measure performance.
§Number of studies that employed a prospective approach to collect the test dataset.
∥Number of studies that used multi-center data for validation.
¶The others group includes wildlife and exotic animals.
Note: The numbers include duplications. For example, one study on industrial animals had both purposes, diagnostics and prediction. There are no prediction studies for the other animals.