13.6.9 Captioning, Image Captioning

Image Captioning. Captioning. Fine-Grained. The most important object or concept in the image.

Feng, Y.S.[Yan-Song], Lapata, M.,
Automatic Caption Generation for News Images,
PAMI(35), No. 4, April 2013, pp. 797-812.
IEEE DOI 1303
Use existing captions and tags, expand to similar images. BibRef

Nakayama, H.[Hideki], Harada, T.[Tatsuya], Kuniyoshi, Y.[Yasuo],
Dense Sampling Low-Level Statistics of Local Features,
IEICE(E93-D), No. 7, July 2010, pp. 1727-1736.
WWW Link. 1008
BibRef
Earlier: CIVR09(Article No 17).
DOI Link 0907
BibRef
And:
Global Gaussian approach for scene categorization using information geometry,
CVPR10(2336-2343).
IEEE DOI 1006
BibRef
Earlier:
AI Goggles: Real-time Description and Retrieval in the Real World with Online Learning,
CRV09(184-191).
IEEE DOI 0905
Local features; scalability of matching for large-scale indexing. Boosts global features with sampled statistics of local features. BibRef

Ushiku, Y.[Yoshitaka], Yamaguchi, M.[Masataka], Mukuta, Y.[Yusuke], Harada, T.[Tatsuya],
Common Subspace for Model and Similarity: Phrase Learning for Caption Generation from Images,
ICCV15(2668-2676)
IEEE DOI 1602
Feature extraction BibRef

Jin, J.[Jiren], Nakayama, H.[Hideki],
Annotation order matters: Recurrent Image Annotator for arbitrary length image tagging,
ICPR16(2452-2457)
IEEE DOI 1705
Correlation, Feature extraction, Indexes, Predictive models, Recurrent neural networks, Training BibRef

Harada, T.[Tatsuya], Nakayama, H.[Hideki], Kuniyoshi, Y.[Yasuo],
Improving Local Descriptors by Embedding Global and Local Spatial Information,
ECCV10(IV: 736-749).
Springer DOI 1009
BibRef
Earlier: A2, A1, A3:
Evaluation of dimensionality reduction methods for image auto-annotation,
BMVC10(xx-yy).
HTML Version. 1009
BibRef

Verma, Y.[Yashaswi], Jawahar, C.V.,
A support vector approach for cross-modal search of images and texts,
CVIU(154), No. 1, 2017, pp. 48-63.
Elsevier DOI 1612
Image search BibRef

Xue, J.F.[Jian-Fei], Eguchi, K.[Koji],
Video Data Modeling Using Sequential Correspondence Hierarchical Dirichlet Processes,
IEICE(E100-D), No. 1, January 2017, pp. 33-41.
WWW Link. 1701
Multimodal data, such as a mixture of visual words and speech words extracted from video files. BibRef

Tariq, A.[Amara], Foroosh, H.[Hassan],
A Context-Driven Extractive Framework for Generating Realistic Image Descriptions,
IP(26), No. 2, February 2017, pp. 619-632.
IEEE DOI 1702
image annotation BibRef

Vinyals, O.[Oriol], Toshev, A.[Alexander], Bengio, S.[Samy], Erhan, D.[Dumitru],
Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge,
PAMI(39), No. 4, April 2017, pp. 652-663.
IEEE DOI 1703
BibRef
Earlier:
Show and tell: A neural image caption generator,
CVPR15(3156-3164)
IEEE DOI 1510
Computational modeling BibRef
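
The underlying generator maximizes the likelihood of the correct caption given the image: an LSTM language model conditioned on a CNN encoding of the image. In roughly the paper's notation:

    \theta^{*} = \arg\max_{\theta} \sum_{(I,S)} \log p(S \mid I; \theta), \qquad
    \log p(S \mid I; \theta) = \sum_{t=0}^{N} \log p(S_t \mid I, S_0, \ldots, S_{t-1}),

where S = (S_0, ..., S_N) is the caption and each conditional is computed by the LSTM, which takes the CNN image encoding as its first input.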

Gao, L., Guo, Z., Zhang, H., Xu, X., Shen, H.T.,
Video Captioning With Attention-Based LSTM and Semantic Consistency,
MultMed(19), No. 9, September 2017, pp. 2045-2055.
IEEE DOI 1708
Computational modeling, Correlation, Feature extraction, Neural networks, Semantics, Two dimensional displays, Visualization, Attention mechanism, embedding, long short-term memory (LSTM), video, captioning BibRef
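
The attention mechanism cited here follows the usual soft-attention pattern: at each decoding step, the LSTM re-weights per-frame features before emitting a word. A generic sketch (the symbols are illustrative, not the paper's exact notation):

    e_i^{(t)} = w^{\top} \tanh(W_a h_{t-1} + U_a v_i), \qquad
    \alpha_i^{(t)} = \frac{\exp(e_i^{(t)})}{\sum_k \exp(e_k^{(t)})}, \qquad
    \phi_t = \sum_i \alpha_i^{(t)} v_i,

where v_i are frame features, h_{t-1} is the previous decoder state, and \phi_t is the attended context vector fed to the decoder at step t.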

Hu, M., Yang, Y., Shen, F., Zhang, L., Shen, H.T., Li, X.,
Robust Web Image Annotation via Exploring Multi-Facet and Structural Knowledge,
IP(26), No. 10, October 2017, pp. 4871-4884.
IEEE DOI 1708
image annotation, image retrieval, iterative methods, learning (artificial intelligence), multimedia systems, optimisation, pattern classification, RMSL, data structural information, digital technologies, image semantic indexing, image semantic retrieval, robust multiview semi-supervised learning, visual features, Manifolds, Multimedia communication, Semantics, Semisupervised learning, Supervised learning, Image annotation, l2,p-norm, multi-view learning, semi-supervised learning BibRef

Wang, J.Y.[Jing-Ya], Zhu, X.T.[Xia-Tian], Gong, S.G.[Shao-Gang],
Discovering visual concept structure with sparse and incomplete tags,
AI(250), No. 1, 2017, pp. 16-36.
Elsevier DOI 1708
Automatically discovering the semantic structure of tagged visual data (e.g. web videos and images). BibRef

Zanfir, M.[Mihai], Marinoiu, E.[Elisabeta], Sminchisescu, C.[Cristian],
Spatio-Temporal Attention Models for Grounded Video Captioning,
ACCV16(IV: 104-119).
Springer DOI 1704
BibRef

Chen, T.H.[Tseng-Hung], Zeng, K.H.[Kuo-Hao], Hsu, W.T.[Wan-Ting], Sun, M.[Min],
Video Captioning via Sentence Augmentation and Spatio-Temporal Attention,
Assist16(I: 269-286).
Springer DOI 1704
BibRef

Tan, Y.H.[Ying Hua], Chan, C.S.[Chee Seng],
phi-LSTM: A Phrase-Based Hierarchical LSTM Model for Image Captioning,
ACCV16(V: 101-117).
Springer DOI 1704
BibRef

Weiland, L.[Lydia], Hulpus, I.[Ioana], Ponzetto, S.P.[Simone Paolo], Dietz, L.[Laura],
Using Object Detection, NLP, and Knowledge Bases to Understand the Message of Images,
MMMod17(II: 405-418).
Springer DOI 1701
BibRef

Liu, Y.[Yu], Guo, Y.M.[Yan-Ming], Lew, M.S.[Michael S.],
What Convnets Make for Image Captioning?,
MMMod17(I: 416-428).
Springer DOI 1701
BibRef

Tran, K., He, X., Zhang, L., Sun, J.,
Rich Image Captioning in the Wild,
DeepLearn-C16(434-441)
IEEE DOI 1612
BibRef

Wang, Y.L.[Yi-Lin], Wang, S.H.[Su-Hang], Tang, J.L.[Ji-Liang], Liu, H.[Huan], Li, B.X.[Bao-Xin],
PPP: Joint Pointwise and Pairwise Image Label Prediction,
CVPR16(6005-6013)
IEEE DOI 1612
BibRef

Yatskar, M.[Mark], Zettlemoyer, L.[Luke], Farhadi, A.[Ali],
Situation Recognition: Visual Semantic Role Labeling for Image Understanding,
CVPR16(5534-5542)
IEEE DOI 1612
BibRef

Kottur, S.[Satwik], Vedantam, R.[Ramakrishna], Moura, J.M.F.[José M. F.], Parikh, D.[Devi],
VisualWord2Vec (Vis-W2V): Learning Visually Grounded Word Embeddings Using Abstract Scenes,
CVPR16(4985-4994)
IEEE DOI 1612
BibRef

Zhu, Y., Groth, O., Bernstein, M., Fei-Fei, L.,
Visual7W: Grounded Question Answering in Images,
CVPR16(4995-5004)
IEEE DOI 1612
BibRef

Zhang, P., Goyal, Y., Summers-Stay, D., Batra, D., Parikh, D.,
Yin and Yang: Balancing and Answering Binary Visual Questions,
CVPR16(5014-5022)
IEEE DOI 1612
BibRef

Hendricks, L.A.[Lisa Anne], Venugopalan, S.[Subhashini], Rohrbach, M.[Marcus], Mooney, R.[Raymond], Saenko, K.[Kate], Darrell, T.J.[Trevor J.],
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data,
CVPR16(1-10)
IEEE DOI 1612
Novel objects not in training data. BibRef

Johnson, J.[Justin], Karpathy, A.[Andrej], Fei-Fei, L.[Li],
DenseCap: Fully Convolutional Localization Networks for Dense Captioning,
CVPR16(4565-4574)
IEEE DOI 1612
Both localizes and describes salient regions in images, in natural language. BibRef

Wang, M.[Minsi], Song, L.[Li], Yang, X.K.[Xiao-Kang], Luo, C.F.[Chuan-Fei],
A parallel-fusion RNN-LSTM architecture for image caption generation,
ICIP16(4448-4452)
IEEE DOI 1610
Computational modeling. Deep convolutional networks and recurrent neural networks. BibRef

Lin, X.[Xiao], Parikh, D.[Devi],
Leveraging Visual Question Answering for Image-Caption Ranking,
ECCV16(II: 261-277).
Springer DOI 1611
BibRef
Earlier:
Don't just listen, use your imagination: Leveraging visual common sense for non-visual tasks,
CVPR15(2984-2993)
IEEE DOI 1510
BibRef

You, Q.Z.[Quan-Zeng], Jin, H.L.[Hai-Lin], Wang, Z.W.[Zhao-Wen], Fang, C.[Chen], Luo, J.B.[Jie-Bo],
Image Captioning with Semantic Attention,
CVPR16(4651-4659)
IEEE DOI 1612
BibRef

Jia, X.[Xu], Gavves, E.[Efstratios], Fernando, B.[Basura], Tuytelaars, T.[Tinne],
Guiding the Long-Short Term Memory Model for Image Caption Generation,
ICCV15(2407-2415)
IEEE DOI 1602
Computer architecture BibRef

Chen, X.L.[Xin-Lei], Zitnick, C.L.[C. Lawrence],
Mind's eye: A recurrent visual representation for image caption generation,
CVPR15(2422-2431)
IEEE DOI 1510
BibRef

Vedantam, R.[Ramakrishna], Zitnick, C.L.[C. Lawrence], Parikh, D.[Devi],
CIDEr: Consensus-based image description evaluation,
CVPR15(4566-4575)
IEEE DOI 1510
BibRef
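
CIDEr scores a candidate caption c against the reference set S = {s_1, ..., s_m} by averaged cosine similarity between TF-IDF-weighted n-gram vectors; in essentially the paper's notation:

    \mathrm{CIDEr}_n(c, S) = \frac{1}{m} \sum_{j=1}^{m}
      \frac{g^{n}(c) \cdot g^{n}(s_j)}{\lVert g^{n}(c) \rVert \, \lVert g^{n}(s_j) \rVert}, \qquad
    \mathrm{CIDEr}(c, S) = \sum_{n=1}^{4} w_n \, \mathrm{CIDEr}_n(c, S),

where g^{n}(\cdot) is the vector of TF-IDF weights over all n-grams and w_n = 1/4 uniformly.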

Fang, H.[Hao], Gupta, S.[Saurabh], Iandola, F.[Forrest], Srivastava, R.K.[Rupesh K.], Deng, L.[Li], Dollar, P.[Piotr], Gao, J.F.[Jian-Feng], He, X.D.[Xiao-Dong], Mitchell, M.[Margaret], Platt, J.C.[John C.], Zitnick, C.L.[C. Lawrence], Zweig, G.[Geoffrey],
From captions to visual concepts and back,
CVPR15(1473-1482)
IEEE DOI 1510
BibRef

Ramnath, K.[Krishnan], Baker, S.[Simon], Vanderwende, L.[Lucy], El-Saban, M.[Motaz], Sinha, S.N.[Sudipta N.], Kannan, A.[Anitha], Hassan, N.[Noran], Galley, M.[Michel], Yang, Y.[Yi], Ramanan, D.[Deva], Bergamo, A.[Alessandro], Torresani, L.[Lorenzo],
AutoCaption: Automatic caption generation for personal photos,
WACV14(1050-1057)
IEEE DOI 1406
Clouds BibRef

Pan, J.Y.[Jia-Yu], Yang, H.J.[Hyung-Jeong], Faloutsos, C.[Christos],
MMSS: Graph-based Multi-modal Story-oriented Video Summarization and Retrieval,
CMU-CS-TR-04-114.
HTML Version. 0501
BibRef

Pan, J.Y.[Jia-Yu], Yang, H.J.[Hyung-Jeong], Faloutsos, C.[Christos], Duygulu, P.[Pinar],
GCap: Graph-based Automatic Image Captioning,
MMDE04(146).
IEEE DOI 0406
BibRef

Pan, J.Y.[Jia-Yu],
Advanced Tools for Video and Multimedia Mining,
CMU-CS-06-126, May 2006. Ph.D. Thesis.
HTML Version. 0605
BibRef

Last update: Sep 18, 2017 at 11:34:11