dc.description.abstract | Companies and developers around the world are looking to embrace
artificial intelligence (AI), machine learning (ML), and deep
learning (DL). A deep learning model passes its input data
through layers to predict and classify information. Assertive Vision likewise draws on
artificial intelligence and machine learning to
identify the images and objects in a scene from their attributes and features,
generate a caption for them, and convert the generated caption text
to voice using APIs. Computer-vision-based assistive devices for the blind
are a promising and efficient technology that helps blind people understand their
surroundings. The purpose of this model is to generate captions for an image
automatically using deep learning
techniques. Initially, the objects in the image are detected using a Convolutional
Neural Network (InceptionV3). From the detected objects, a syntactically and
semantically correct caption for the image is generated using a recurrent neural
network (LSTM) with an attention mechanism. Computer vision has become
ubiquitous in our society, with applications in several fields. In this project, we focus
on one of the visual recognition facets of computer vision, i.e. image captioning. The
problem of generating language descriptions for visual data has long been studied,
but mainly for video. In recent years, emphasis has been placed on
describing still images in natural language. Thanks to recent advances in
object detection, describing the scene in an image has become easier.
| en_US |