Image Captioning using Transformer

Rasik Maharjan;
Rizen Bikram Prajapati;
Shreejan Hakuduwal;
Swostika Shrestha;

Please use this identifier to cite or link to this item: https://elibrary.khec.edu.np:8080/handle/123456789/874

Title:	Image Captioning using Transformer
Authors:	Rasik Maharjan; Rizen Bikram Prajapati; Shreejan Hakuduwal; Swostika Shrestha
Advisor:	Er. Milan Chikanbanjar
Keywords:	Computer Vision; Convolutional Neural Networks; Flickr8k; Inception
Issue Date:	2024
College Name:	Khwopa Engineering College
Level:	Bachelor's Degree
Degree:	BE Computer
Department Name:	Department of Computer Engineering
Abstract:	Image captioning is a multidisciplinary artificial intelligence (AI) research field that combines computer vision, natural language processing (NLP), and machine learning techniques. It aims to automatically generate textual descriptions of images, bridging the semantic gap between visual content and natural language. The field has gained significant attention because of its potential applications in assistive technologies for visually impaired individuals, content-based image retrieval, and improving the accessibility of visual content on the web. Most previous work follows the CNN-RNN approach, which produces inferior results compared to Transformer-based image captioning. In this paper, we propose an image captioning model that combines a CNN with a Transformer architecture. Image features are extracted with the convolutional neural network Inception V3. Instead of a traditional recurrent neural network (RNN) as the decoder, we use a Transformer decoder. The Transformer decoder leverages self-attention mechanisms for caption generation, enabling it to recognize important objects, their attributes, and the relationships among objects in an image. The model is trained on the Flickr8k dataset with the cross-entropy loss function. Our approach aims to generate syntactically and semantically correct sentences that accurately describe the image content.
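The abstract names three mechanisms: attention inside the decoder, attention from the caption to the image features, and a cross-entropy training loss. The following is a minimal sketch of those ideas in plain Python, not the authors' implementation. It assumes single-head attention without learned projection matrices, and the vectors `caption_embeddings` and `image_features` are hypothetical toy values standing in for token embeddings and Inception V3 feature vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=False):
    """Single-head scaled dot-product attention on lists of vectors.

    With causal=True, position i may only attend to positions <= i,
    as in decoder self-attention over the partially generated caption.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        limit = i + 1 if causal else len(keys)
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for k in keys[:limit]
        ]
        # Masked (future) positions receive exactly zero weight.
        w = softmax(scores) + [0.0] * (len(keys) - limit)
        out = [
            sum(w[j] * values[j][c] for j in range(len(values)))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def cross_entropy(probabilities, target_ids):
    """Mean negative log-likelihood of the target token at each step."""
    total = sum(math.log(p[t]) for p, t in zip(probabilities, target_ids))
    return -total / len(target_ids)


# Hypothetical toy inputs: three caption tokens, two image regions.
caption_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_features = [[0.5, 0.5], [1.0, 0.0]]

# Decoder self-attention over the caption (causal mask applied).
self_out, self_w = attention(
    caption_embeddings, caption_embeddings, caption_embeddings, causal=True
)
# Cross-attention: caption positions query the image features.
cross_out, cross_w = attention(self_out, image_features, image_features)

# Loss for two decoding steps over a two-word toy vocabulary.
loss = cross_entropy([[0.5, 0.5], [0.25, 0.75]], [0, 1])
```

A full model would add learned query, key, and value projections, multiple heads, positional encodings, and a final vocabulary projection; a deep learning framework would normally supply those pieces.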
URI:	https://elibrary.khec.edu.np:8080/handle/123456789/874
Appears in Collections:	PU Computer Report

Files in This Item:

File	Description	Size	Format
Image captioning using Transformer.pdf (Restricted Access)		13.12 MB	Adobe PDF
