13.6.9 Captioning, Image Captioning

Image Captioning. Captioning. Annotation. Fine-Grained. The most important object or concept in the image. See also Video Retrieval, Video Annotation, Video Categorization, Genre.

Feng, Y.S.[Yan-Song], Lapata, M.,
Automatic Caption Generation for News Images,
PAMI(35), No. 4, April 2013, pp. 797-812.
IEEE DOI 1303
Use existing captions and tags, expand to similar images. BibRef

Nakayama, H.[Hideki], Harada, T.[Tatsuya], Kuniyoshi, Y.[Yasuo],
Dense Sampling Low-Level Statistics of Local Features,
IEICE(E93-D), No. 7, July 2010, pp. 1727-1736.
WWW Link. 1008
BibRef
Earlier: CIVR09(Article No 17).
DOI Link 0907
BibRef
And:
Global Gaussian approach for scene categorization using information geometry,
CVPR10(2336-2343).
IEEE DOI 1006
BibRef
Earlier:
AI Goggles: Real-time Description and Retrieval in the Real World with Online Learning,
CRV09(184-191).
IEEE DOI 0905
local features. Scalability of matching for large-scale indexing. Boost global features with sampled statistics of local features. BibRef

Ushiku, Y.[Yoshitaka], Yamaguchi, M.[Masataka], Mukuta, Y.[Yusuke], Harada, T.[Tatsuya],
Common Subspace for Model and Similarity: Phrase Learning for Caption Generation from Images,
ICCV15(2668-2676)
IEEE DOI 1602
Feature extraction BibRef

Jin, J.[Jiren], Nakayama, H.[Hideki],
Annotation order matters: Recurrent Image Annotator for arbitrary length image tagging,
ICPR16(2452-2457)
IEEE DOI 1705
Correlation, Feature extraction, Indexes, Predictive models, Recurrent neural networks, Training BibRef

Harada, T.[Tatsuya], Nakayama, H.[Hideki], Kuniyoshi, Y.[Yasuo],
Improving Local Descriptors by Embedding Global and Local Spatial Information,
ECCV10(IV: 736-749).
Springer DOI 1009
BibRef
Earlier: A2, A1, A3:
Evaluation of dimensionality reduction methods for image auto-annotation,
BMVC10(xx-yy).
HTML Version. 1009
BibRef

Verma, Y.[Yashaswi], Jawahar, C.V.,
A support vector approach for cross-modal search of images and texts,
CVIU(154), No. 1, 2017, pp. 48-63.
Elsevier DOI 1612
Image search BibRef

Xue, J.F.[Jian-Fei], Eguchi, K.[Koji],
Video Data Modeling Using Sequential Correspondence Hierarchical Dirichlet Processes,
IEICE(E100-D), No. 1, January 2017, pp. 33-41.
WWW Link. 1701
Multimodal data, such as a mixture of visual words and speech words extracted from video files. BibRef

Tariq, A.[Amara], Foroosh, H.[Hassan],
A Context-Driven Extractive Framework for Generating Realistic Image Descriptions,
IP(26), No. 2, February 2017, pp. 619-632.
IEEE DOI 1702
image annotation BibRef

Vinyals, O.[Oriol], Toshev, A.[Alexander], Bengio, S.[Samy], Erhan, D.[Dumitru],
Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge,
PAMI(39), No. 4, April 2017, pp. 652-663.
IEEE DOI 1703
BibRef
Earlier:
Show and tell: A neural image caption generator,
CVPR15(3156-3164)
IEEE DOI 1510
Computational modeling BibRef

Gao, L., Guo, Z., Zhang, H., Xu, X., Shen, H.T.,
Video Captioning With Attention-Based LSTM and Semantic Consistency,
MultMed(19), No. 9, September 2017, pp. 2045-2055.
IEEE DOI 1708
Computational modeling, Correlation, Feature extraction, Neural networks, Semantics, Two dimensional displays, Visualization, Attention mechanism, embedding, long short-term memory (LSTM), video, captioning BibRef

Hu, M., Yang, Y., Shen, F., Zhang, L., Shen, H.T., Li, X.,
Robust Web Image Annotation via Exploring Multi-Facet and Structural Knowledge,
IP(26), No. 10, October 2017, pp. 4871-4884.
IEEE DOI 1708
image annotation, image retrieval, iterative methods, learning (artificial intelligence), multimedia systems, optimisation, pattern classification, RMSL, data structural information, digital technologies, image semantic indexing, image semantic retrieval, robust multiview semi-supervised learning, visual features, Manifolds, Multimedia communication, Semantics, Semisupervised learning, Supervised learning, Image annotation, l2,p-norm, multi-view learning, semi-supervised learning BibRef

Wang, J.Y.[Jing-Ya], Zhu, X.T.[Xia-Tian], Gong, S.G.[Shao-Gang],
Discovering visual concept structure with sparse and incomplete tags,
AI(250), No. 1, 2017, pp. 16-36.
Elsevier DOI 1708
Automatically discovering the semantic structure of tagged visual data (e.g. web videos and images). BibRef

Kilickaya, M.[Mert], Akkus, B.K.[Burak Kerim], Cakici, R.[Ruket], Erdem, A.[Aykut], Erdem, E.[Erkut], Ikizler-Cinbis, N.[Nazli],
Data-driven image captioning via salient region discovery,
IET-CV(11), No. 6, September 2017, pp. 398-406.
DOI Link 1709
BibRef

Fu, K.[Kun], Jin, J.Q.[Jun-Qi], Cui, R.P.[Run-Peng], Sha, F.[Fei], Zhang, C.S.[Chang-Shui],
Aligning Where to See and What to Tell: Image Captioning with Region-Based Attention and Scene-Specific Contexts,
PAMI(39), No. 12, December 2017, pp. 2321-2334.
IEEE DOI 1711
Adaptation models, Computational modeling, Context modeling, Data mining, Feature extraction, Image classification, Visualization, Image captioning, LSTM, visual attention. BibRef

Liu, A.A.[An-An], Xu, N.[Ning], Wong, Y.[Yongkang], Li, J.[Junnan], Su, Y.T.[Yu-Ting], Kankanhalli, M.[Mohan],
Hierarchical & multimodal video captioning: Discovering and transferring multimodal knowledge for vision to language,
CVIU(163), No. 1, 2017, pp. 113-125.
Elsevier DOI 1712
Video to text BibRef

Nian, F.D.[Fu-Dong], Li, T.[Teng], Wang, Y.[Yan], Wu, X.Y.[Xin-Yu], Ni, B.B.[Bing-Bing], Xu, C.S.[Chang-Sheng],
Learning explicit video attributes from mid-level representation for video captioning,
CVIU(163), No. 1, 2017, pp. 126-138.
Elsevier DOI 1712
Mid-level video representation BibRef

He, X.D.[Xiao-Dong], Deng, L.[Li],
Deep Learning for Image-to-Text Generation: A Technical Overview,
SPMag(34), No. 6, November 2017, pp. 109-116.
IEEE DOI 1712
BibRef
And: Errata: SPMag(35), No. 1, January 2018, pp. 178.
IEEE DOI
Artificial intelligence, Computer vision, Image classification, Natural language processing, Pediatrics, Semantics, Training data, Visualization BibRef

Li, L.H.[Ling-Hui], Tang, S.[Sheng], Zhang, Y.D.[Yong-Dong], Deng, L.X.[Li-Xi], Tian, Q.[Qi],
GLA: Global-Local Attention for Image Description,
MultMed(20), No. 3, March 2018, pp. 726-737.
IEEE DOI 1802
Computational modeling, Decoding, Feature extraction, Image recognition, Natural language processing, recurrent neural network BibRef

Guan, J.N.[Jin-Ning], Wang, E.[Eric],
Repeated review based image captioning for image evidence review,
SP:IC(63), 2018, pp. 141-148.
Elsevier DOI 1804
Repeated review, Image captioning, Encoder-decoder, Multimodal layer BibRef

Lu, X., Wang, B., Zheng, X., Li, X.,
Exploring Models and Data for Remote Sensing Image Caption Generation,
GeoRS(56), No. 4, April 2018, pp. 2183-2195.
IEEE DOI 1804
Computer vision, Feature extraction, Image representation, Recurrent neural networks, Remote sensing, Semantics, semantic understanding BibRef

Cheng, Q.[Qimin], Zhang, Q.[Qian], Fu, P.[Peng], Tu, C.H.[Cong-Huan], Li, S.[Sen],
A survey and analysis on automatic image annotation,
PR(79), 2018, pp. 242-259.
Elsevier DOI 1804
Automatic image annotation, Generative model, Nearest-neighbor model, Discriminative model, Tag-completion, Deep learning BibRef

Ben Rejeb, I.[Imen], Ouni, S.[Sonia], Barhoumi, W.[Walid], Zagrouba, E.[Ezzeddine],
Fuzzy VA-Files for multi-label image annotation based on visual content of regions,
SIViP(12), No. 5, July 2018, pp. 877-884.
Springer DOI 1806
Vector Approximation Files. BibRef

Helmy, T.[Tarek],
A Generic Framework for Semantic Annotation of Images,
IJIG(18), No. 3, July 2018, Article 1850013.
DOI Link 1807
BibRef

Wu, C.L.[Chun-Lei], Wei, Y.[Yiwei], Chu, X.L.[Xiao-Liang], Su, F.[Fei], Wang, L.[Leiquan],
Modeling visual and word-conditional semantic attention for image captioning,
SP:IC(67), 2018, pp. 100-107.
Elsevier DOI 1808
Image captioning, Word-conditional semantic attention, Visual attention, Attention variation BibRef


Gomez-Garay, A.[Alejandro], Raducanu, B.[Bogdan], Salas, J.[Joaquín],
Dense Captioning of Natural Scenes in Spanish,
MCPR18(145-154).
Springer DOI 1807
BibRef

Zhang, Z., Wu, Q., Wang, Y., Chen, F.,
Fine-Grained and Semantic-Guided Visual Attention for Image Captioning,
WACV18(1709-1717)
IEEE DOI 1806
feedforward neural nets, image representation, image resolution, image segmentation, convolutional neural network, Visualization BibRef

Yao, L.[Li], Ballas, N.[Nicolas], Cho, K.[Kyunghyun], Smith, J.[John], Bengio, Y.[Yoshua],
Oracle Performance for Visual Captioning,
BMVC16(xx-yy).
HTML Version. 1805
BibRef

Shin, A.[Andrew], Ushiku, Y.[Yoshitaka], Harada, T.[Tatsuya],
Image Captioning with Sentiment Terms via Weakly-Supervised Sentiment Dataset,
BMVC16(xx-yy).
HTML Version. 1805
BibRef

Khatchatoorian, A.G., Jamzad, M.,
Post Rectifying Methods to Improve the Accuracy of Image Annotation,
DICTA17(1-7)
IEEE DOI 1804
feature extraction, image annotation, image classification, image retrieval, matrix algebra, Class-tag relation matrix, Time division multiplexing BibRef

Dong, H.[Hao], Zhang, J.Q.[Jing-Qing], McIlwraith, D.[Douglas], Guo, Y.[Yike],
I2T2I: Learning text to image synthesis with textual data augmentation,
ICIP17(2015-2019)
IEEE DOI 1803
Birds, Generators, Image generation, Recurrent neural networks, Shape, Training, Deep learning, GAN, Image Synthesis BibRef

Pellegrin, L.[Luis], Escalante, H.J.[Hugo Jair], Montes-y-Gómez, M.[Manuel], Villegas, M.[Mauricio], González, F.A.[Fabio A.],
A Flexible Framework for the Evaluation of Unsupervised Image Annotation,
CIARP17(508-516).
Springer DOI 1802
BibRef

Jia, Y.H.[Yu-Hua], Bai, L.[Liang], Wang, P.[Peng], Guo, J.L.[Jin-Lin], Xie, Y.X.[Yu-Xiang],
Deep Convolutional Neural Network for Correlating Images and Sentences,
MMMod18(I:154-165).
Springer DOI 1802
BibRef

Liu, J.Y.[Jing-Yu], Wang, L.[Liang], Yang, M.H.[Ming-Hsuan],
Referring Expression Generation and Comprehension via Attributes,
ICCV17(4866-4874)
IEEE DOI 1802
Language Descriptions for objects. learning (artificial intelligence), object detection, RefCOCO, RefCOCO+, RefCOCOg, attribute learning model, common space model, Visualization BibRef

Dai, B., Fidler, S., Urtasun, R., Lin, D.,
Towards Diverse and Natural Image Descriptions via a Conditional GAN,
ICCV17(2989-2998)
IEEE DOI 1802
image retrieval, image sequences, inference mechanisms, learning (artificial intelligence), Visualization BibRef

Niu, Z.X.[Zhen-Xing], Zhou, M.[Mo], Wang, L.[Le], Gao, X.[Xinbo], Hua, G.[Gang],
Hierarchical Multimodal LSTM for Dense Visual-Semantic Embedding,
ICCV17(1899-1907)
IEEE DOI 1802
map sentences and images. document image processing, image representation, recurrent neural nets, HM-LSTM, Hierarchical Multimodal LSTM, Recurrent neural networks BibRef

Liang, X., Hu, Z., Zhang, H., Gan, C., Xing, E.P.,
Recurrent Topic-Transition GAN for Visual Paragraph Generation,
ICCV17(3382-3391)
IEEE DOI 1802
document image processing, inference mechanisms, natural scenes, recurrent neural nets, text analysis, RTT-GAN, Visualization BibRef

Shetty, R., Rohrbach, M., Hendricks, L.A., Fritz, M., Schiele, B.,
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training,
ICCV17(4155-4164)
IEEE DOI 1802
image matching, learning (artificial intelligence), sampling methods, vocabulary, adversarial training, Visualization BibRef

Liu, S., Zhu, Z., Ye, N., Guadarrama, S., Murphy, K.,
Improved Image Captioning via Policy Gradient optimization of SPIDEr,
ICCV17(873-881)
IEEE DOI 1802
Maximum likelihood estimation, Measurement, Mixers, Robustness, SPICE, Training BibRef

Gu, J., Wang, G., Cai, J., Chen, T.,
An Empirical Study of Language CNN for Image Captioning,
ICCV17(1231-1240)
IEEE DOI 1802
computer vision, convolution, learning (artificial intelligence), natural language processing, recurrent neural nets, Recurrent neural networks BibRef

Pedersoli, M., Lucas, T., Schmid, C., Verbeek, J.,
Areas of Attention for Image Captioning,
ICCV17(1251-1259)
IEEE DOI 1802
image segmentation, inference mechanisms, natural language processing, object detection, Visualization BibRef

Li, Y., Ouyang, W., Zhou, B., Wang, K., Wang, X.,
Scene Graph Generation from Objects, Phrases and Region Captions,
ICCV17(1270-1279)
IEEE DOI 1802
graph theory, image classification, image representation, neural nets, object detection, Visualization BibRef

Zhang, Z., Wu, J.J., Li, Q., Huang, Z., Traer, J., McDermott, J.H., Tenenbaum, J.B., Freeman, W.T.,
Generative Modeling of Audible Shapes for Object Perception,
ICCV17(1260-1269)
IEEE DOI 1802
audio recording, audio signal processing, audio-visual systems, feature extraction, inference mechanisms, interactive systems, Visualization BibRef

Wu, J.J.[Jia-Jun], Lim, J.[Joseph], Zhang, H.Y.[Hong-Yi], Tenenbaum, J.B.[Joshua B.], Freeman, W.T.[William T.],
Physics 101: Learning Physical Object Properties from Unlabeled Videos,
BMVC16(xx-yy).
HTML Version. 1805
BibRef

Tavakoli, H.R., Shetty, R., Borji, A., Laaksonen, J.,
Paying Attention to Descriptions Generated by Image Captioning Models,
ICCV17(2506-2515)
IEEE DOI 1802
feature extraction, image processing, human descriptions, human-written descriptions, image captioning model, Visualization BibRef

Chen, T.H., Liao, Y.H., Chuang, C.Y., Hsu, W.T., Fu, J., Sun, M.,
Show, Adapt and Tell: Adversarial Training of Cross-Domain Image Captioner,
ICCV17(521-530)
IEEE DOI 1802
image processing, inference mechanisms, text analysis, MSCOCO, adversarial training procedure, captioner act, critic networks, Training data BibRef

Tripathi, A.[Anurag], Gupta, A.[Abhinav], Chaudhary, S.[Santanu], Lall, B.[Brejesh],
Image Annotation Using Latent Components and Transmedia Association,
PReMI17(493-500).
Springer DOI 1711
BibRef

Pini, S.[Stefano], Cornia, M.[Marcella], Baraldi, L.[Lorenzo], Cucchiara, R.[Rita],
Towards Video Captioning with Naming: A Novel Dataset and a Multi-modal Approach,
CIAP17(II:384-395).
Springer DOI 1711
BibRef

Wu, B.Y.[Bao-Yuan], Jia, F.[Fan], Liu, W.[Wei], Ghanem, B.[Bernard],
Diverse Image Annotation,
CVPR17(6194-6202)
IEEE DOI 1711
Correlation, Feature extraction, Measurement, Redundancy, Semantics BibRef

Krause, J.[Jonathan], Johnson, J.[Justin], Krishna, R.[Ranjay], Fei-Fei, L.[Li],
A Hierarchical Approach for Generating Descriptive Image Paragraphs,
CVPR17(3337-3345)
IEEE DOI 1711
Feature extraction, Natural languages, Pragmatics, Recurrent neural networks, Speech, Visualization BibRef

Vedantam, R., Bengio, S., Murphy, K., Parikh, D., Chechik, G.,
Context-Aware Captions from Context-Agnostic Supervision,
CVPR17(1070-1079)
IEEE DOI 1711
Birds, Cats, Cognition, Context modeling, Pragmatics, Training BibRef

Gan, Z., Gan, C., He, X., Pu, Y., Tran, K., Gao, J., Carin, L., Deng, L.,
Semantic Compositional Networks for Visual Captioning,
CVPR17(1141-1150)
IEEE DOI 1711
Feature extraction, Mouth, Pediatrics, Semantics, Tensile stress, Training, Visualization BibRef

Ren, Z., Wang, X., Zhang, N., Lv, X., Li, L.J.,
Deep Reinforcement Learning-Based Image Captioning with Embedding Reward,
CVPR17(1151-1159)
IEEE DOI 1711
Decision making, Learning (artificial intelligence), Measurement, Neural networks, Training, Visualization BibRef

Rennie, S.J., Marcheret, E., Mroueh, Y., Ross, J., Goel, V.,
Self-Critical Sequence Training for Image Captioning,
CVPR17(1179-1195)
IEEE DOI 1711
Inference algorithms, Learning (artificial intelligence), Logic gates, Measurement, Predictive models, Training BibRef

Yang, L., Tang, K., Yang, J., Li, L.J.,
Dense Captioning with Joint Inference and Visual Context,
CVPR17(1978-1987)
IEEE DOI 1711
Bioinformatics, Genomics, Object detection, Proposals, Semantics, Training, Visualization BibRef

Lu, J., Xiong, C., Parikh, D., Socher, R.,
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning,
CVPR17(3242-3250)
IEEE DOI 1711
Adaptation models, Computational modeling, Context modeling, Decoding, Logic gates, Mathematical model, Visualization BibRef

Yao, T., Pan, Y., Li, Y., Mei, T.,
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects,
CVPR17(5263-5271)
IEEE DOI 1711
Decoding, Hidden Markov models, Object recognition, Recurrent neural networks, Standards, Training, Visualization BibRef

Chen, L., Zhang, H., Xiao, J., Nie, L., Shao, J., Liu, W., Chua, T.S.,
SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning,
CVPR17(6298-6306)
IEEE DOI 1711
Detectors, Feature extraction, Image coding, Neural networks, Semantics, Visualization BibRef

Park, C.C., Kim, B., Kim, G.,
Attend to You: Personalized Image Captioning with Context Sequence Memory Networks,
CVPR17(6432-6440)
IEEE DOI 1711
Pattern recognition BibRef

Sun, Q., Lee, S., Batra, D.,
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning,
CVPR17(7215-7223)
IEEE DOI 1711
Approximation algorithms, Computational modeling, Decoding, History, Inference algorithms, Recurrent neural networks BibRef

Wang, Y., Lin, Z., Shen, X., Cohen, S., Cottrell, G.W.,
Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition,
CVPR17(7378-7387)
IEEE DOI 1711
Measurement, Recurrent neural networks, SPICE, Semantics, Skeleton, Training BibRef

Zanfir, M.[Mihai], Marinoiu, E.[Elisabeta], Sminchisescu, C.[Cristian],
Spatio-Temporal Attention Models for Grounded Video Captioning,
ACCV16(IV: 104-119).
Springer DOI 1704
BibRef

Chen, T.H.[Tseng-Hung], Zeng, K.H.[Kuo-Hao], Hsu, W.T.[Wan-Ting], Sun, M.[Min],
Video Captioning via Sentence Augmentation and Spatio-Temporal Attention,
Assist16(I: 269-286).
Springer DOI 1704
BibRef

Tan, Y.H.[Ying Hua], Chan, C.S.[Chee Seng],
phi-LSTM: A Phrase-Based Hierarchical LSTM Model for Image Captioning,
ACCV16(V: 101-117).
Springer DOI 1704
BibRef

Weiland, L.[Lydia], Hulpus, I.[Ioana], Ponzetto, S.P.[Simone Paolo], Dietz, L.[Laura],
Using Object Detection, NLP, and Knowledge Bases to Understand the Message of Images,
MMMod17(II: 405-418).
Springer DOI 1701
BibRef

Liu, Y.[Yu], Guo, Y.M.[Yan-Ming], Lew, M.S.[Michael S.],
What Convnets Make for Image Captioning?,
MMMod17(I: 416-428).
Springer DOI 1701
BibRef

Tran, K., He, X., Zhang, L., Sun, J.,
Rich Image Captioning in the Wild,
DeepLearn-C16(434-441)
IEEE DOI 1612
BibRef

Wang, Y.L.[Yi-Lin], Wang, S.H.[Su-Hang], Tang, J.L.[Ji-Liang], Liu, H.[Huan], Li, B.X.[Bao-Xin],
PPP: Joint Pointwise and Pairwise Image Label Prediction,
CVPR16(6005-6013)
IEEE DOI 1612
BibRef

Yatskar, M.[Mark], Ordonez, V., Zettlemoyer, L.[Luke], Farhadi, A.[Ali],
Commonly Uncommon: Semantic Sparsity in Situation Recognition,
CVPR17(6335-6344)
IEEE DOI 1711
BibRef
Earlier: A1, A3, A4, Only:
Situation Recognition: Visual Semantic Role Labeling for Image Understanding,
CVPR16(5534-5542)
IEEE DOI 1612
Image recognition, Image representation, Predictive models, Semantics, Tensile stress, Training BibRef

Kottur, S.[Satwik], Vedantam, R.[Ramakrishna], Moura, J.M.F.[José M. F.], Parikh, D.[Devi],
VisualWord2Vec (Vis-W2V): Learning Visually Grounded Word Embeddings Using Abstract Scenes,
CVPR16(4985-4994)
IEEE DOI 1612
BibRef

Zhu, Y., Groth, O., Bernstein, M., Fei-Fei, L.,
Visual7W: Grounded Question Answering in Images,
CVPR16(4995-5004)
IEEE DOI 1612
BibRef

Zhang, P., Goyal, Y., Summers-Stay, D., Batra, D., Parikh, D.,
Yin and Yang: Balancing and Answering Binary Visual Questions,
CVPR16(5014-5022)
IEEE DOI 1612
BibRef

Venugopalan, S.[Subhashini], Hendricks, L.A.[Lisa Anne], Rohrbach, M.[Marcus], Mooney, R.[Raymond], Darrell, T.J.[Trevor J.], Saenko, K.[Kate],
Captioning Images with Diverse Objects,
CVPR17(1170-1178)
IEEE DOI 1711
BibRef
Earlier: A2, A1, A3, A4, A6, A5:
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data,
CVPR16(1-10)
IEEE DOI 1612
Data models, Image recognition, Predictive models, Semantics, Training, Visualization. Novel objects not in training data. BibRef

Johnson, J.[Justin], Karpathy, A.[Andrej], Fei-Fei, L.[Li],
DenseCap: Fully Convolutional Localization Networks for Dense Captioning,
CVPR16(4565-4574)
IEEE DOI 1612
Both localize and describe salient regions in images in natural language. BibRef

Wang, M.[Minsi], Song, L.[Li], Yang, X.K.[Xiao-Kang], Luo, C.F.[Chuan-Fei],
A parallel-fusion RNN-LSTM architecture for image caption generation,
ICIP16(4448-4452)
IEEE DOI 1610
Computational modeling, deep convolutional networks and recurrent neural networks. BibRef

Lin, X.[Xiao], Parikh, D.[Devi],
Leveraging Visual Question Answering for Image-Caption Ranking,
ECCV16(II: 261-277).
Springer DOI 1611
BibRef
Earlier:
Don't just listen, use your imagination: Leveraging visual common sense for non-visual tasks,
CVPR15(2984-2993)
IEEE DOI 1510
BibRef

You, Q.Z.[Quan-Zeng], Jin, H.L.[Hai-Lin], Wang, Z.W.[Zhao-Wen], Fang, C.[Chen], Luo, J.B.[Jie-Bo],
Image Captioning with Semantic Attention,
CVPR16(4651-4659)
IEEE DOI 1612
BibRef

Jia, X.[Xu], Gavves, E.[Efstratios], Fernando, B.[Basura], Tuytelaars, T.[Tinne],
Guiding the Long-Short Term Memory Model for Image Caption Generation,
ICCV15(2407-2415)
IEEE DOI 1602
Computer architecture BibRef

Chen, X.L.[Xin-Lei], Zitnick, C.L.[C. Lawrence],
Mind's eye: A recurrent visual representation for image caption generation,
CVPR15(2422-2431)
IEEE DOI 1510
BibRef

Vedantam, R.[Ramakrishna], Zitnick, C.L.[C. Lawrence], Parikh, D.[Devi],
CIDEr: Consensus-based image description evaluation,
CVPR15(4566-4575)
IEEE DOI 1510
BibRef

Fang, H.[Hao], Gupta, S.[Saurabh], Iandola, F.[Forrest], Srivastava, R.K.[Rupesh K.], Deng, L.[Li], Dollar, P.[Piotr], Gao, J.F.[Jian-Feng], He, X.D.[Xiao-Dong], Mitchell, M.[Margaret], Platt, J.C.[John C.], Zitnick, C.L.[C. Lawrence], Zweig, G.[Geoffrey],
From captions to visual concepts and back,
CVPR15(1473-1482)
IEEE DOI 1510
BibRef

Ramnath, K.[Krishnan], Baker, S.[Simon], Vanderwende, L.[Lucy], El-Saban, M.[Motaz], Sinha, S.N.[Sudipta N.], Kannan, A.[Anitha], Hassan, N.[Noran], Galley, M.[Michel], Yang, Y.[Yi], Ramanan, D.[Deva], Bergamo, A.[Alessandro], Torresani, L.[Lorenzo],
AutoCaption: Automatic caption generation for personal photos,
WACV14(1050-1057)
IEEE DOI 1406
Clouds BibRef

Pan, J.Y.[Jia-Yu], Yang, H.J.[Hyung-Jeong], Faloutsos, C.[Christos],
MMSS: Graph-based Multi-modal Story-oriented Video Summarization and Retrieval,
CMU-CS-TR-04-114.
HTML Version. 0501
BibRef

Pan, J.Y.[Jia-Yu], Yang, H.J.[Hyung-Jeong], Faloutsos, C.[Christos], Duygulu, P.[Pinar],
GCap: Graph-based Automatic Image Captioning,
MMDE04(146).
IEEE DOI 0406
BibRef

Pan, J.Y.[Jia-Yu],
Advanced Tools for Video and Multimedia Mining,
Ph.D.Thesis, CMU-CS-06-126, May 2006.
HTML Version. 0605
BibRef

Chapter on Matching and Recognition Using Volumes, High Level Vision Techniques, Invariants continues in General References for Matching.


Last update: Aug 16, 2018 at 18:22:30