Enhanced Image Captioning Using CNN and Transformers with Attention Mechanism

International Journal of Engineering Innovations and Management Strategies 1 (1):1-12 (2024)

Abstract

Image captioning has advanced considerably with deep learning, notably through models that pair Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) networks to generate descriptive captions for images. Despite these improvements, capturing fine detail and context remains a challenge. This project introduces an enhanced image captioning model that integrates transformers with an attention mechanism to address these limitations. The model uses CNNs for feature extraction and LSTMs for sequence generation, while transformers apply attention to the most significant image regions, with the aim of generating more contextually rich and coherent captions. Experimental results indicate that incorporating transformers with attention mechanisms significantly improves caption accuracy and descriptiveness over traditional CNN-LSTM models. Potential applications include assistive technologies for the visually impaired, content-based image retrieval, automatic image annotation for digital asset management, and improved human-computer interaction. The approach is a step toward more precise and detailed image captioning, with potential impact across these fields.
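
The abstract does not specify how attention over image regions is computed. As an illustration only, the sketch below shows plain scaled dot-product attention, the standard building block of transformers, applied by a single decoder state to a grid of CNN region features. The names `attend`, `region_features`, and `decoder_state`, the 49-region grid, and the feature size are all assumptions made for this example, and the learned query, key, and value projections of a real transformer layer are left out for brevity. It is not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, region_features):
    """Scaled dot-product attention of one query over image regions.

    query: list of floats (hypothetical decoder state).
    region_features: list of equally sized float lists, one per region.
    Returns (context vector, attention weights).
    """
    scale = math.sqrt(len(query))
    # Similarity between the query and each region, scaled by sqrt(dim).
    scores = [
        sum(q * r for q, r in zip(query, region)) / scale
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(w * region[i] for w, region in zip(weights, region_features))
        for i in range(dim)
    ]
    return context, weights


# Assumed setup: a 7x7 CNN feature map flattened to 49 regions of size 8.
random.seed(0)
NUM_REGIONS, DIM = 49, 8
region_features = [
    [random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_REGIONS)
]
decoder_state = [random.gauss(0.0, 1.0) for _ in range(DIM)]

context, weights = attend(decoder_state, region_features)
print(len(context), len(weights))
```

The weights are non-negative and sum to one, so the context vector is a weighted average that leans toward the regions most similar to the current decoder state. This is one common way to let a caption generator focus on significant image regions.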



Analytics

Added to PP
2025-01-09



