Abstract
Computer vision, or image recognition, analyses and interprets visual data from real-world sources such as
images and videos. AI and ML research focuses on object, scene, action, and feature identification because of its
usefulness in image processing. Neural networks and deep learning have substantially improved image recognition
systems in recent years.
Early image recognition relied on template matching to identify objects: an input image is compared with a stored
template using similarity measures such as correlation, and the best-scoring position is taken as the match. The
approach has several limitations, particularly with distorted, scaled, or noisy images, and it is computationally
expensive.
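As a hedged illustration of the correlation-based comparison described above (not code from this article), the sketch below uses OpenCV's matchTemplate with normalized cross-correlation; the image paths and the 0.8 score threshold are illustrative assumptions.

```python
# Hedged sketch: template matching via normalized cross-correlation with OpenCV.
# The file names and the 0.8 threshold are illustrative assumptions.
import cv2

image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)        # search image (assumed path)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)  # stored template (assumed path)

# Slide the template over the image and score every position by correlation.
scores = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
_, best_score, _, best_loc = cv2.minMaxLoc(scores)

h, w = template.shape
if best_score > 0.8:  # arbitrary confidence threshold
    top_left = best_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)
    print(f"Match at {top_left}-{bottom_right}, score {best_score:.2f}")
else:
    print("No confident match; the method degrades under scaling, distortion, and noise.")
```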
Convolutional neural networks (CNNs) are the most common deep learning architecture for computer vision. CNNs
automatically learn hierarchical representations of visual input and are composed of convolutional, pooling, and
fully connected layers. Convolutional layers filter the input image for local patterns such as edges, textures, and
shapes. Pooling layers reduce the spatial dimensions of the feature maps, making feature extraction more consistent
and robust. Dense layers, or fully connected layers, perform classification or regression on the learned features.
Training a CNN requires a large labelled dataset.
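The sketch below illustrates the layer types just described, assuming PyTorch, a 32x32 RGB input, and ten output classes; these choices are illustrative and are not the configuration used in this article.

```python
# Hedged sketch: a minimal CNN with convolutional, pooling, and fully connected layers.
# Framework (PyTorch), layer sizes, and the 10-class output are illustrative assumptions.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolution: edges and textures
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling: halve spatial dimensions
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper convolution: larger shapes
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, num_classes),           # fully connected: classification
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# A 32x32 RGB batch is mapped 3 -> 16 -> 32 channels while pooling shrinks 32 -> 16 -> 8 pixels.
logits = SmallCNN()(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```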
This research article introduces a deep learning model for the classification and recognition of images. The model
takes the Images Data Set as input, and the GenNet algorithm is employed to derive critical features. Preprocessing
and feature extraction improve the accuracy of the classification results. The classification models are built using
a Convolutional Neural Network, AlexNet, ResNet, and VGG.
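As a hedged sketch of the classification stage only, the example below adapts a standard torchvision ResNet-18 backbone to an assumed number of classes; the GenNet feature-extraction step and the actual Images Data Set are not reproduced here, and the class count, input size, and optimiser settings are assumptions.

```python
# Hedged sketch: fitting a standard CNN backbone for image classification.
# The backbone choice (ResNet-18), class count, and hyperparameters are illustrative
# assumptions; the article's GenNet feature extraction is not shown here.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # assumed number of image categories

# Start from a standard ResNet-18 and replace its final fully connected layer
# so it predicts NUM_CLASSES categories instead of the 1000 ImageNet classes.
model = models.resnet18()
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a random batch standing in for real, preprocessed images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```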