Performance Comparison of Various Feature Extraction Methods for Object Recognition on Caltech-101 Image Dataset
Monika, Munish Kumar, and Manish Kumar
Abstract  An object recognition system helps to find the label of the object in an image. The identification of the object depends on the features extracted from the image, and features therefore play a very important role in an object recognition system: the more relevant features an object has, the better the recognition will be. An object recognition system works mainly in two major phases, feature extraction and image classification. Features may be the color, shape, texture, or other information about the object. The various feature extraction methods used in object recognition are classified as handcrafted feature extraction methods and deep learning feature extraction methods. This article contains a comprehensive study of several popular feature extraction methods used in object recognition systems. The handcrafted methods used in the paper are the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), oriented FAST and rotated BRIEF (ORB), the Shi-Tomasi corner detector, and the Haralick texture descriptor. The deep learning feature extraction methods used in the paper are ResNet50, Xception, and VGG19. A comparative study of these feature extraction methods is also presented for object recognition using five multi-class classification methods: Gaussian Naive Bayes, k-NN, decision tree, random forest, and the XGBoosting classifier. The performance analysis is conducted in terms of recognition accuracy, precision, F1-score, area under curve, false positive rate, root mean square error, and CPU elapsed time. The experimental results are evaluated on a standard benchmark image dataset, Caltech-101, which comprises 8677 images grouped into 101 classes.
Keywords  Object recognition · Feature extraction · Image classification · Deep learning
Monika
Department of Computer Science, Punjabi University, Patiala, India
M. Kumar (✉)
Department of Computational Sciences, Maharaja Ranjit Singh Punjab Technical University,
Bathinda, Punjab, India
M. Kumar
Department of Computer Science, Baba Farid College, Bathinda, Punjab, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Choudhary et al. (eds.), Applications of Artificial Intelligence and Machine Learning,
Lecture Notes in Electrical Engineering 778,
https://doi.org/10.1007/978-981-16-3067-5_22






Fig. 1  Basic diagram of convolutional neural network (CNN) model

millions of images. In the experiment, the authors have reduced the size of the feature vector by 8 dimensions. This section describes the architectures of the models as follows.

3.2.1 ResNet50

ResNet50, proposed by He et al. [13], is designed in four stages. It takes an input image of size 224 × 224 × 3. Initially, it performs convolution and max-pooling using 7 × 7 and 3 × 3 kernel sizes, respectively. Afterward, Stage 1 starts, consisting of 3 residual blocks, each containing 3 layers. Stage 2 then begins with a first residual block that uses stride 2; this halves the spatial size of the input but doubles the channel width. The next 3 residual blocks follow it. Stage 3 has 1 residual block with stride 2 and 5 normal residual blocks. Stage 4 has 3 residual blocks, of which the first uses stride 2. Each residual block contains 3 convolution layers with kernel sizes 1 × 1, 3 × 3, and 1 × 1 in sequence. Finally, there is an average pooling layer followed by a fully connected layer.

3.2.2 Xception

Xception was developed by Chollet [14]. The architecture of the model is based on depthwise separable convolution layers and consists of three sections: entry flow, middle flow, and exit flow. The entry flow section starts with 2 convolution layers and then contains 3 residual blocks. At the start of each residual block, there is a convolution layer with stride 2 that halves the size of the input; each of the three residual blocks then contains 2 separable convolution layers followed by a max-pooling layer. The middle flow section contains one residual block with 3 separable convolution layers. The exit flow section again has 2 residual blocks and starts with a convolution layer with stride 2. The first residual block consists of 2 convolution layers and ends with a max-pooling layer; the second comprises 2 convolution layers and ends with global average pooling. Finally, a fully connected layer is added at the end of the model.

3.2.3 VGG19

VGG is named after the Visual Geometry Group at the University of Oxford and was proposed by Simonyan et al. [15]. VGG19 contains 5 convolutional blocks, each followed by a max-pooling layer. The first two blocks consist of 2 convolution layers of 3 × 3 kernel size.
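As a rough illustration of how such pre-trained networks can serve as fixed feature extractors, the sketch below removes the classification head and pools the last convolutional feature map into a single vector per image. This is a minimal sketch only, not the authors' implementation; TensorFlow/Keras, ImageNet weights, "avg" pooling, and 224 × 224 preprocessing are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): using pre-trained CNNs as fixed
# feature extractors. TensorFlow/Keras, ImageNet weights, and 224 x 224
# preprocessing are assumptions made for illustration.
import numpy as np
from tensorflow.keras.applications import ResNet50  # Xception / VGG19 are analogous
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing import image

# Drop the classification head; "avg" pooling turns the last convolutional
# feature map into one feature vector per image (2048-dimensional for ResNet50).
extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(img_path):
    # ResNet50 and VGG19 expect 224 x 224 RGB inputs; Xception uses 299 x 299.
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)[np.newaxis, ...]   # add a batch dimension
    return extractor.predict(preprocess_input(x), verbose=0)[0]

features = extract_features("some_image.jpg")      # hypothetical image path
print(features.shape)                              # -> (2048,)
```

Swapping in Xception or VGG19 changes only the imported class, the matching preprocess_input function, and the expected input size; the pooled feature vector is 512-dimensional for VGG19 and 2048-dimensional for Xception.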



library are used for coding. Table 1 depicts the recognition accuracy of each feature extraction method using the various classifiers. It shows that the pre-trained model VGG19 achieved the maximum accuracy (85.64%) for object recognition using the XGBoosting classifier. Table 2 gives a comparative view of the various feature extraction techniques in terms of precision, and Table 3 presents the F1-score of each technique. Comparisons of the other parameters, AUC, FPR, RMSE, and CPU elapsed time, are presented in Tables 4, 5, 6 and 7, respectively. For all parameters, the pre-trained VGG19 model shows the best results in comparison to the other feature extraction methods. The experiment exhibits that feature extraction with the pre-trained VGG19 model and the XGBoosting classifier performs best among all the classifiers used, except for FPR, RMSE, and CPU elapsed time. In this case, the XGBoosting algorithm takes more time (2.63 min) than the others, but the remaining parameters, precision (84.76%), F1-score (84.78%), and AUC (92.73%), are the maximum among all combinations. A minimal sketch of such a classifier comparison appears after Table 2.

Table 1  Feature extraction technique-wise recognition accuracy for object recognition (classifier-wise accuracy, %)

Feature extraction algorithm | Gaussian Naive Bayes | k-NN | Decision tree | Random forest | XGBoosting
SIFT | 54.20 | 53.89 | 55.17 | 57.43 | 64.43
SURF | 48.32 | 49.72 | 47.84 | 51.85 | 59.58
ORB | 56.68 | 57.79 | 57.96 | 61.04 | 72.01
Haralick texture | 61.00 | 17.94 | 69.26 | 55.93 | 80.73
Shi-Tomasi corner detector | 50.17 | 51.84 | 52.53 | 58.62 | 64.84
ResNet50 | 50.19 | 49.97 | 51.01 | 54.69 | 60.25
Xception | 54.66 | 52.46 | 54.73 | 56.92 | 65.27
VGG19 | 63.45 | 75.23 | 74.71 | 82.06 | 85.64

Table 2  Feature extraction technique-wise precision for object recognition (classifier-wise precision, %)

Feature extraction algorithm | Gaussian Naive Bayes | k-NN | Decision tree | Random forest | XGBoosting
SIFT | 52.23 | 54.94 | 54.56 | 56.41 | 63.52
SURF | 45.32 | 50.92 | 46.00 | 50.22 | 57.66
ORB | 54.70 | 60.63 | 57.97 | 60.54 | 70.68
Haralick texture | 60.20 | 20.93 | 70.39 | 57.58 | 81.38
Shi-Tomasi corner detector | 47.74 | 55.60 | 51.94 | 57.99 | 63.63
ResNet50 | 47.24 | 51.35 | 50.02 | 52.99 | 59.09
Xception | 53.12 | 52.69 | 53.83 | 55.35 | 64.12
VGG19 | 61.85 | 77.31 | 73.36 | 83.47 | 84.76
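For reference, the following minimal sketch shows how the five classifiers compared in these tables can be trained and scored on a given feature matrix. It uses scikit-learn and XGBoost with illustrative default settings and stand-in data, not the authors' exact experimental configuration.

```python
# Minimal sketch (not the authors' setup): comparing the five classifiers on a
# feature matrix X with labels y. Stand-in data is generated here so the
# example runs; in the paper X would come from one of the extraction methods.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=64, n_informative=40,
                           n_classes=5, random_state=0)   # placeholder features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

classifiers = {
    "Gaussian Naive Bayes": GaussianNB(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "XGBoosting": XGBClassifier(eval_metric="mlogloss"),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: {acc:.2%}")
```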
Table 3  Feature extraction technique-wise F1-score for object recognition (classifier-wise F1-score, %)

Feature extraction algorithm | Gaussian Naive Bayes | k-NN | Decision tree | Random forest | XGBoosting
SIFT | 52.32 | 52.43 | 54.35 | 56.41 | 63.60
SURF | 45.77 | 47.98 | 46.26 | 49.90 | 57.59
ORB | 54.85 | 57.36 | 57.03 | 59.90 | 70.72
Haralick texture | 59.00 | 17.40 | 68.82 | 55.61 | 80.31
Shi-Tomasi corner detector | 48.05 | 51.37 | 51.64 | 57.42 | 63.35
ResNet50 | 47.78 | 48.87 | 49.36 | 52.51 | 59.16
Xception | 52.86 | 50.73 | 53.80 | 55.26 | 64.19
VGG19 | 61.58 | 74.91 | 73.49 | 81.51 | 84.78

Table 4  Feature extraction technique-wise area under curve (AUC) for object recognition (classifier-wise AUC, %)

Feature extraction algorithm | Gaussian Naive Bayes | k-NN | Decision tree | Random forest | XGBoosting
SIFT | 52.48 | 76.65 | 52.59 | 52.66 | 81.96
SURF | 50.16 | 67.28 | 50.19 | 50.24 | 79.53
ORB | 53.68 | 78.63 | 53.57 | 54.02 | 85.82
Haralick texture | 80.36 | 57.08 | 84.52 | 77.83 | 90.30
Shi-Tomasi corner detector | 52.72 | 75.62 | 52.64 | 52.77 | 82.20
ResNet50 | 50.14 | 74.70 | 50.34 | 50.36 | 67.27
Xception | 58.55 | 75.73 | 58.34 | 59.04 | 82.42
VGG19 | 50.22 | 87.49 | 50.33 | 50.43 | 92.73

Table 5  Feature extraction technique-wise false positive rate (FPR) for object recognition (classifier-wise FPR, %)

Feature extraction algorithm | Gaussian Naive Bayes | k-NN | Decision tree | Random forest | XGBoosting
SIFT | 0.57 | 0.60 | 0.56 | 0.53 | 0.50
SURF | 0.58 | 0.58 | 0.61 | 0.54 | 0.53
ORB | 0.47 | 0.54 | 0.46 | 0.41 | 0.38
Haralick texture | 0.27 | 0.60 | 0.21 | 0.28 | 0.12
Shi-Tomasi corner detector | 0.53 | 0.60 | 0.53 | 0.45 | 0.45
ResNet50 | 0.55 | 0.57 | 0.55 | 0.50 | 0.49
Xception | 0.48 | 0.51 | 0.49 | 0.45 | 0.42
VGG19 | 0.50 | 0.26 | 0.29 | 0.17 | 0.17
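The paper does not state how the multi-class metrics are averaged over the 101 classes, so the sketch below shows one plausible reading: macro averaging for precision, F1-score, and AUC, a one-vs-rest FPR derived from the confusion matrix, and RMSE computed between one-hot labels and predicted class probabilities. These definitional choices are assumptions, not the authors' stated method.

```python
# Minimal sketch of the evaluation metrics reported in Tables 2-6, assuming
# macro averaging; the FPR and RMSE definitions here are one plausible reading
# (one-vs-rest FPR from the confusion matrix, RMSE against one-hot labels),
# not necessarily the authors' exact formulas.
import numpy as np
from sklearn.metrics import precision_score, f1_score, roc_auc_score, confusion_matrix
from sklearn.preprocessing import label_binarize

def evaluate(y_true, y_pred, y_prob, classes):
    """y_prob has shape (n_samples, n_classes) with per-class probabilities."""
    precision = precision_score(y_true, y_pred, average="macro", zero_division=0)
    f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
    auc = roc_auc_score(y_true, y_prob, average="macro",
                        multi_class="ovr", labels=classes)

    # One-vs-rest false positive rate, averaged over classes.
    cm = confusion_matrix(y_true, y_pred, labels=classes)
    fp = cm.sum(axis=0) - np.diag(cm)                                # false positives per class
    tn = cm.sum() - (cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm))  # true negatives per class
    fpr = np.mean(fp / (fp + tn))

    # RMSE between one-hot ground truth and predicted probabilities.
    y_onehot = label_binarize(y_true, classes=classes)
    rmse = np.sqrt(np.mean((y_onehot - y_prob) ** 2))
    return precision, f1, auc, fpr, rmse
```

Used with the previous sketch, y_prob for AUC and RMSE would come from clf.predict_proba(X_test).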
Table 6  Feature extraction technique-wise root mean square error (RMSE) for object recognition (classifier-wise RMSE, %)

Feature extraction algorithm | Gaussian Naive Bayes | k-NN | Decision tree | Random forest | XGBoosting
SIFT | 30.02 | 32.97 | 30.88 | 29.58 | 28.60
SURF | 33.31 | 33.90 | 33.47 | 32.46 | 31.20
ORB | 29.71 | 32.79 | 29.24 | 27.63 | 26.14
Haralick texture | 18.67 | 31.56 | 17.09 | 20.36 | 11.76
Shi-Tomasi corner detector | 30.24 | 34.85 | 30.93 | 29.39 | 28.72
ResNet50 | 31.53 | 33.31 | 31.07 | 29.98 | 30.42
Xception | 26.85 | 29.48 | 27.55 | 26.86 | 24.56
VGG19 | 30.37 | 20.03 | 23.00 | 17.80 | 16.82

Table 7  Feature extraction technique-wise CPU elapsed time for object recognition (classifier-wise CPU elapsed time, min)

Feature extraction algorithm | Gaussian Naive Bayes | k-NN | Decision tree | Random forest | XGBoosting
SIFT | 0.00 | 0.01 | 0.00 | 0.17 | 2.95
SURF | 0.00 | 0.01 | 0.01 | 0.31 | 3.26
ORB | 0.00 | 0.01 | 0.01 | 0.26 | 2.67
Haralick texture | 0.00 | 0.02 | 0.01 | 0.25 | 2.49
Shi-Tomasi corner detector | 0.00 | 0.01 | 0.01 | 0.25 | 2.89
ResNet50 | 0.00 | 0.01 | 0.01 | 0.25 | 3.04
Xception | 0.00 | 0.01 | 0.01 | 0.22 | 2.90
VGG19 | 0.00 | 0.01 | 0.00 | 0.15 | 2.63

However, the results obtained for FPR and RMSE with the VGG19 model are not satisfactory compared with the Haralick texture descriptor, which achieves an FPR of 0.12% and an RMSE of 11.76%. This experimental study of various feature extraction methods with various classifiers may help other researchers in selecting an object recognition technique.

6 Conclusion

In this article, the authors have presented a comparison among various feature extraction techniques and multi-class classification methods for object recognition. The authors have taken some well-known handcrafted methods, i.e., SIFT, SURF, ORB,