25.2 Analysis Systems Applied to Documents, Document Analysis

Document Analysis. Application, Document Analysis.
For a comparison of some of these techniques, see also Evaluation of Binarization Methods for Document Images.
See also Historical Document Analysis, Ancient Documents.

Deutsch, S.,
A Note on Some Statistics Concerning Typewritten or Printed Material,
IT(3), No. 2, June 1957, pp. 147-149. BibRef 5706

Schurmann, J., Bartneck, N., Bayer, T.A., Franke, J., Mandler, E., and Oberlander, M.,
Document Analysis: From Pixels to Contents,
PIEEE(80), No. 7, July 1992, pp. 1101-1119.
IEEE Top Reference. In Special issue on OCR. BibRef 9207

Bayer, T.A., Franke, J., Kressel, U., Mandler, E., Oberlaender, M., Schuermann, J.,
Towards the Understanding of Printed Documents,
SDIA92(xx-yy). BibRef 9200

Hershey, A.V.[Allen V.],
A Computer System for Scientific Typography,
CGIP(1), No. 4, December 1972, pp. 373-385.
Elsevier DOI BibRef 7212

Johnston, E.G.[Emily G.],
Printed Text Discrimination,
CGIP(3), No. 1, March 1974, pp. 83-89.
Elsevier DOI 0501
BibRef

Gard, R.L.[Robert L.],
Digital Picture Processing Techniques for the Publishing Industry,
CGIP(5), No. 2, June 1976, pp. 151-171.
Elsevier DOI BibRef 7606

Wong, K.Y., Casey, R.G., and Wahl, F.M.,
Document Analysis System,
IBMRD(26), No. 6, November 1982, pp. 647-656. BibRef 8211

Inagaki, K.[Kosaku], Kato, T.[Toshikazu], Hiroshima, T.[Tadashi], Sakai, T.[Toshiyuki],
MACSYM: A Hierarchical Parallel Image Processing System for Event Driven Pattern Understanding of Documents,
PR(17), No. 1, 1984, pp. 85-108.
Elsevier DOI BibRef 8400

Baird, H.S., and Thompson, K.,
Reading Chess,
PAMI(12), No. 6, June 1990, pp. 552-559.
IEEE DOI BibRef 9006
Earlier: CVWS87(277-282). Skew Correction. Text Analysis. This system combines several basic ideas and techniques to read the printed text of chess games and recover their meaning. 98% of the games are read correctly, implying a much higher accuracy at the character/word level. It also corrects for skew in the printing. BibRef

Baird, H.S.[Henry S.], Fortune, S.J.[Steven J.], Jones, S.E.[Susan E.],
Image segmenting apparatus and methods,
US_Patent5,430,808, July 4, 1995.
WWW Link. BibRef 9507
Earlier: A1, A3, A2:
Image Segmentation by Shape-Directed Covers,
ICPR90(I: 820-825).
IEEE DOI Application, Document Analysis. BibRef

Srikantan, G., Srihari, S.N.,
A Study Relating Image Sampling Rate and Image Pattern Recognition,
CVPR94(709-712).
IEEE DOI BibRef 9400

Akiyama, T.[Teruo], Hagita, N.[Norihiro],
Automated entry system for printed documents,
PR(23), No. 11, 1990, pp. 1141-1154.
Elsevier DOI Japanese and English, Headlines, text lines, graphics. BibRef 9000

Masuda, I., Hagita, N., Akiyama, T., Takahashi, T., Naito, S.,
Approach to Smart Document Reader System,
CVPR85(550-557). BibRef 8500

Brandt, J.W., Jain, A.K., Algazi, V.R.,
Medial Axis Representation and Encoding of Scanned Documents,
JVCIR(2), 1991, pp. 151-165. BibRef 9100

Story, G.A., O'Gorman, L., Fox, D., Schaper, L.L., and Jagadish, H.V.,
The RightPages Image-Based Electronic Library for Alerting and Browsing,
Computer(25), No. 9, September 1992, pp. 17-26. BibRef 9209

O'Gorman, L.,
Image and document processing techniques for the RightPages electronic library system,
ICPR92(II:260-263).
IEEE DOI 9208
BibRef

Dengel, A.R.[Andreas R.], Bleisinger, R., Hoch, R., Fein, F.[Frank], Hönes, F.[Frank],
From Paper to Office Document Standard Representation,
Computer(25), No. 7, July 1992, pp. 63-67. BibRef 9207

Dengel, A.R.[Andreas R.],
ANASTASIL: A System for Low-Level and High-Level Geometric Analysis of Printed Documents,
SDIA92(xx-yy). BibRef 9200

Maio, D., and Rizzi, S.,
MAP Learning and Clustering in Autonomous Systems,
PAMI(15), No. 12, December 1993, pp. 1286-1297.
IEEE DOI BibRef 9312

Dengel, A.R., and Barth, G.,
High Level Document Analysis Guided by Geometric Aspects,
PRAI(2), No. 4, December 1988, pp. 641-656. Hierarchical document model, document tree. BibRef 8812

de Silva, G.L., Hull, J.J.,
Proper Noun Detection in Document Images,
PR(27), No. 2, February 1994, pp. 311-320.
Elsevier DOI BibRef 9402

Chen, F.R.[Francine R.], Bloomberg, D.S.[Dan S.],
Summarization of Imaged Documents without OCR,
CVIU(70), No. 3, June 1998, pp. 307-320.
DOI Link BibRef 9806
Earlier:
Extraction of Indicative Summary Sentences from Imaged Documents,
ICDAR97(227-232).
IEEE DOI 9708
BibRef
Earlier: A2, A1:
Document Image Summarization without OCR,
ICIP96(II: 229-232).
IEEE DOI BibRef

Chen, F.R.[Francine R.], Bloomberg, D.S.[Dan S.],
Extraction Of Thematically Relevant Text From Images,
SDAIR96(XX) Xerox Palo Alto Research Center. BibRef 9600

Spitz, A.L.[A. Lawrence], Wilcox, L.D.[Lynn D.],
Method and apparatus for classifying documents,
US_Patent5,414,781, May 9, 1995.
WWW Link. BibRef 9505

Ozaki, M.[Masaharu],
Method and apparatus for document element classification by analysis of major white region geometry,
US_Patent5,574,802, Nov 12, 1996.
WWW Link. BibRef 9611

McLean, G.F.,
Geometric Correction of Digitized Art,
GMIP(58), No. 2, March 1996, pp. 142-154. BibRef 9603

Yamashita, A., Amano, T., Hirayama, Y., Itoh, N., Katoh, S., Mano, T., and Toyokawa, K.,
A document recognition system and its applications,
IBMRD(40), No. 3, May 1996, pp. 341-352.
WWW Link. BibRef 9605

Maderlechner, G.[Gerd], Suda, P.[Peter], Bruckner, T.,
Classification of Documents by Form and Content,
PRL(18), No. 11-13, November 1997, pp. 1225-1231. 9806
BibRef

Nishida, H.[Hirobumi],
A Note on Practical Uses of Gray-Scale Image Analysis in Document Recognition,
PRL(19), No. 9, 31 July 1998, pp. 889-897. BibRef 9807

Nishida, H.[Hirobumi],
Boundary Extraction from Gray-Scale Document Images Based on Surface Data Structures,
GMIP(60), No. 1, January 1998, pp. 35-45. BibRef 9801
Earlier:
Boundary Feature Extraction From Gray-Scale Document Images,
ICDAR97(132-136).
IEEE DOI 9708
BibRef

Chauvet, P., Lopez Krahe, J., Taflin, E., Maître, H.,
System for an intelligent office document analysis, recognition and description,
SP(32), No. 1-2, 1993, pp. 161-190. BibRef 9300

Kundu, S.[Sukhamay],
A better fitness measure of a text-document for a given set of keywords,
PR(33), No. 5, May 2000, pp. 841-848.
Elsevier DOI 0003
BibRef

Kenney, A.R.[Anne R.], and Rieger, O.Y.[Oya Y.], (editors)
Moving Theory into Practice: Digital Imaging for Libraries and Archives,
Mountain View, CA: Research Libraries Group, 2000. ISBN 0-9700225-0-6. A how-to guide for converting library and archive documents to digital form (not for analyzing them). BibRef 0001

Lee, W.L.[Win-Long], Fan, K.C.[Kuo-Chin],
Document image preprocessing based on optimal Boolean filters,
SP(80), No. 1, January 2000, pp. 45-55. 0005
BibRef

Caere,
Company Information.
WWW Link. Vendor, OCR. OCR, document analysis, etc.

ScanSoft,
Company Information.
WWW Link. OCR, Document analysis, etc. Vendor, OCR.

Wenzel, C.[Claudia], Maus, H.[Heiko],
Leveraging corporate context within knowledge-based document analysis and understanding,
IJDAR(3), No. 4, 2001, pp. 248-260.
Springer DOI 0106
BibRef

Chan, W.[Woei], Coghill, G.[George],
Text analysis using local energy,
PR(34), No. 12, December 2001, pp. 2523-2532.
Elsevier DOI 0110
Text in clutter. BibRef

Chang, F.[Fu],
Retrieving Information from Document Images: Problems and Solutions,
IJDAR(4), No. 1, 2001, pp. 46-55.
Springer DOI 0111
BibRef

Le Cun, Y.L.[Yann L.], Bottou, L.[Leon], Bengio, Y.[Yoshua], Haffner, P.,
Gradient-Based Learning Applied to Document Recognition,
PIEEE(86), No. 11, November 1998, pp. 2278-2324.
IEEE Top Reference. BibRef 9811

Aiello, M.[Marco], Monz, C.[Christof], Todoran, L.[Leon], Worring, M.[Marcel],
Document understanding for a broad class of documents,
IJDAR(5), No. 1, 2002, pp. 1-16.
PDF File. 0211
BibRef

Juola, P.[Patrick],
Document categorization and evaluation via cross-entropy,
US_Patent6,397,205, May 28, 2002.
WWW Link. BibRef 0205

Klein, B.[Bertin], Dengel, A.R.[Andreas R.],
Problem-adaptable document analysis and understanding for high-volume applications,
IJDAR(6), No. 3, March 2004, pp. 167-180.
Springer DOI 0406
BibRef
Earlier: A2, A1:
smartFIX: A Requirements-Driven System for Document Analysis and Understanding,
DAS02(433 ff.).
Springer DOI 0303
BibRef

Dengel, A.R.,
Learning of Pattern-Based Rules for Document Classification,
ICDAR07(123-127).
IEEE DOI 0709
BibRef

ReadSoft International,
2007. Document processing, OCR.
WWW Link. Vendor, Document Analysis. Vendor, OCR.

Aseervatham, S.[Sujeevan], Bennani, Y.[Younes],
Semi-structured document categorization with a semantic kernel,
PR(42), No. 9, September 2009, pp. 2067-2076.
Elsevier DOI 0905
Mercer kernel; Support vector machine; Text categorization; Semantic similarity; Semi-structured data BibRef

Aseervatham, S.[Sujeevan], Antoniadis, A.[Anestis], Gaussier, E.[Eric], Burlet, M.[Michel], Denneulin, Y.[Yves],
A sparse version of the ridge logistic regression for large-scale text categorization,
PRL(32), No. 2, 15 January 2011, pp. 101-106.
Elsevier DOI 1101
Logistic regression; Model selection; Text categorization; Large scale categorization BibRef

Sharan, A.[Aditi], Joshi, M.L.[Manju L.],
An algorithm for finding document concepts using semantic similarities from WordNet ontology,
IJCVR(1), No. 2, 2010, pp. 147-157.
DOI Link 1011
BibRef

Ferilli, S.[Stefano],
Automatic Digital Document Processing and Management: Problems, Algorithms and Techniques,
Springer, 2011, ISBN: 978-0-85729-197-4.
WWW Link. 1101
See also Automatic Content-based Indexing of Digital Documents through Intelligent Processing Techniques. BibRef

Bunke, H.[Horst], Riesen, K.[Kaspar],
Recent Advances in Graph-Based Pattern Recognition with Applications in Document Analysis,
PR(44), No. 5, May 2011, pp. 1057-1067.
Elsevier DOI 1101
Graph-based representation; Graph kernel; Graph embedding; Graph classification
See also Recent Advances in Structural Pattern Recognition with Applications to Visual Form Analysis. BibRef

Fischer, A.[Andreas], Riesen, K.[Kaspar], Bunke, H.[Horst],
Graph Similarity Features for HMM-Based Handwriting Recognition in Historical Documents,
FHR10(253-258).
IEEE DOI 1011
BibRef

Tsimboukakis, N., Tambouratzis, G.,
Word-Map Systems for Content-Based Document Classification,
SMC-C(41), No. 5, September 2011, pp. 662-673.
IEEE DOI 1109
BibRef

Medvet, E.[Eric], Bartoli, A.[Alberto], Davanzo, G.[Giorgio],
A probabilistic approach to printed document understanding,
IJDAR(14), No. 4, December 2011, pp. 335-347.
WWW Link. 1112
BibRef

Liu, Q.[Qiong], Liao, C.Y.[Chun-Yuan],
PaperUI,
CBDAR11(83-100).
Springer DOI 1204
An interface concept that uses paper as the display and mobile devices as the mouse. A long conceptual discussion. BibRef

Chen, F.[Francine], Girgensohn, A.[Andreas], Cooper, M.[Matthew], Lu, Y.J.[Yi-Juan], Filby, G.[Gerry],
Genre identification for office document search and browsing,
IJDAR(15), No. 3, September 2012, pp. 167-182.
WWW Link. 1209
BibRef

de Oliveira Mendes, A.[António], Torrão Fiadeiro, P.[Paulo], Matos Ramos, A.M.[Ana Maria], Lopes de Sousa, S.C.[Sónia Cristina],
Development of an optical system for analysis of the ink-paper interaction,
MVA(24), No. 8, November 2013, pp. 1733-1750.
Springer DOI 1310
BibRef

Gaceb, D.[Djamel], Eglin, V.[Véronique], Lebourgeois, F.[Frank],
Classification of business documents for real-time application,
RealTimeIP(9), No. 2, June 2014, pp. 329-345.
WWW Link. 1407
BibRef

Gaceb, D.[Djamel], Lebourgeois, F.[Frank], Duong, J.,
Adaptative Smart-Binarization Method: For Images of Business Documents,
ICDAR13(118-122).
IEEE DOI 1312
business data processing BibRef

Gaceb, D.[Djamel], Eglin, V.[Véronique], Le Bourgeois, F.[Frank], Emptoz, H.[Hubert],
Graph b-Coloring for Automatic Recognition of Documents,
ICDAR09(261-265).
IEEE DOI 0907
BibRef
Earlier:
Application of graph coloring in physical layout segmentation,
ICPR08(1-4).
IEEE DOI 0812
See also Improvement of postal mail sorting system. BibRef

Kezzoula, Z.[Zakia], Gaceb, D.[Djamel],
A new fast DBSCAN using dual-space analysis and colour integral volume for document image segmentation,
IJCVR(15), No. 3, 2025, pp. 395-416.
DOI Link 2505
BibRef

Liu, D.[Ding], Jiang, M.[Minghu], Yang, X.F.[Xiao-Fang], Li, H.[Hui],
Analyzing documents with Quantum Clustering: A novel pattern recognition algorithm based on quantum mechanics,
PRL(77), No. 1, 2016, pp. 8-13.
Elsevier DOI 1606
Quantum clustering BibRef

Chen, J.[Jin], Lopresti, D.P.[Daniel P.], Nagy, G.[George],
Conservative preprocessing of document images,
IJDAR(19), No. 4, December 2016, pp. 321-333.
Springer DOI 1611
BibRef

Bhushan, S.N.B.[S.N. Bharath], Danti, A.[Ajit],
Classification of text documents based on score level fusion approach,
PRL(94), No. 1, 2017, pp. 118-126.
Elsevier DOI 1708
Text, classification BibRef

Song, L.Y.[Ling-Yun], Liu, J.[Jun], Luo, M.[Minnan], Qian, B.[Buyue], Yang, K.[Kuan],
Sparse Relational Topical Coding on multi-modal data,
PR(72), No. 1, 2017, pp. 368-380.
Elsevier DOI 1708
Multi-modal documents and the links between them. BibRef

Pushpalatha, K., Ananthanarayana, V.S.,
A tree based representation for effective pattern discovery from multimedia documents,
PRL(93), No. 1, 2017, pp. 143-153.
Elsevier DOI 1706
Multimedia document BibRef

Nguyen, K.C.[Kha Cong], Nguyen, C.T.[Cuong Tuan], Nakagawa, M.[Masaki],
Nom document digitalization by deep convolution neural networks,
PRL(133), 2020, pp. 8-16.
Elsevier DOI 2005
BibRef

das Neves Junior, R.B.[Ricardo Batista], Lima, E.[Estanislau], Bezerra, B.L.D.[Byron L.D.], Zanchettin, C.[Cleber], Toselli, A.H.[Alejandro H.],
HU-PageScan: A fully convolutional neural network for document page crop,
IET-IPR(14), No. 15, 15 December 2020, pp. 3890-3898.
DOI Link 2103
BibRef

Zhang, H.[Hao], Chen, B.[Bo], Cong, Y.L.[Yu-Lai], Guo, D.D.[Dan-Dan], Liu, H.W.[Hong-Wei], Zhou, M.Y.[Ming-Yuan],
Deep Autoencoding Topic Model With Scalable Hybrid Bayesian Inference,
PAMI(43), No. 12, December 2021, pp. 4306-4322.
IEEE DOI 2112
Analytical models, Probabilistic logic, Artificial neural networks, Decoding, Bayes methods, feature extraction BibRef

Appalaraju, S.[Srikar], Jasani, B.[Bhavan], Kota, B.U.[Bhargava Urala], Xie, Y.S.[Yu-Sheng], Manmatha, R.,
DocFormer: End-to-End Transformer for Document Understanding,
ICCV21(973-983).
IEEE DOI 2203
Visualization, Computational modeling, Layout, Transformers, Task analysis, Vision + language BibRef

Mondal, T.[Tanmoy], Das, A.[Abhijit], Ming, Z.[Zuheng],
Exploring multi-tasking learning in document attribute classification,
PRL(157), 2022, pp. 49-59.
Elsevier DOI 2205
Multi-tasks Learning, Multi-instance Learning, Weighted Multi-task Learning, Convolutional Neural Networks, Scanning Resolution Recognition BibRef

Bakkali, S.[Souhail], Ming, Z.[Zuheng], Coustaty, M.[Mickael], Rusiñol, M.[Marçal], Terrades, O.R.[Oriol Ramos],
VLCDoC: Vision-Language contrastive pre-training model for cross-Modal document classification,
PR(139), 2023, pp. 109419.
Elsevier DOI 2304
Multimodal document representation learning, Document classification, Contrastive learning, Self-Attention, Transformers BibRef

Cao, P.F.[Pan-Feng], Wu, J.[Jian],
GraphRevisedIE: Multimodal information extraction with graph-revised network,
PR(140), 2023, pp. 109542.
Elsevier DOI 2305
Document information extraction, Graph convolutional network, Transformer BibRef

Voerman, J.[Joris], Souleiman-Mahamoud, I.[Ibrahim], Coustaty, M.[Mickael], Joseph, A.[Aurélie], Poulain-d'Andecy, V.[Vincent], Ogier, J.M.[Jean-Marc],
Automatic classification of company's document stream: Comparison of two solutions,
PRL(172), 2023, pp. 181-187.
Elsevier DOI 2309
Document processing, Imbalanced classification, Neural network BibRef

Bi, H.Y.[Heng-Yue], Xu, C.H.[Can-Hui], Shi, C.[Cao], Liu, G.Z.[Guo-Zhu], Li, Y.T.[Yu-Teng], Zhang, H.H.[Hong-Hong], Qu, J.[Jing],
SRRV: A Novel Document Object Detector Based on Spatial-Related Relation and Vision,
MultMed(25), 2023, pp. 3788-3798.
IEEE DOI 2310
BibRef

Zhang, Z.R.[Zhen-Rong], Ma, J.F.[Jie-Feng], Du, J.[Jun], Wang, L.C.[Li-Cheng], Zhang, J.S.[Jian-Shu],
Multimodal Pre-Training Based on Graph Attention Network for Document Understanding,
MultMed(25), 2023, pp. 6743-6755.
IEEE DOI 2311
BibRef

Fu, W.L.[Wen-Long], Xue, B.[Bing], Gao, X.Y.[Xiao-Ying], Zhang, M.J.[Meng-Jie],
Genetic Programming for Document Classification: A Transductive Transfer Learning System,
Cyber(54), No. 2, February 2024, pp. 1119-1132.
IEEE DOI 2402
Transfer learning, Training, Training data, Task analysis, Feature extraction, Support vector machines, Data models, transductive transfer learning BibRef

Liu, T.F.[Teng-Fei], Hu, Y.L.[Yong-Li], Gao, J.B.[Jun-Bin], Sun, Y.F.[Yan-Feng], Yin, B.C.[Bao-Cai],
Hierarchical Multi-Modal Prompting Transformer for Multi-Modal Long Document Classification,
CirSysVideo(34), No. 7, July 2024, pp. 6376-6390.
IEEE DOI 2407
BibRef
And:
Hierarchical Multi-Modal Transformer for Cross-Modal Long Document Classification,
MultMed(27), 2025, pp. 8981-8994.
IEEE DOI 2601
Transformers, Task analysis, Feature extraction, Visualization, Adaptation models, Computational modeling, prompt learning. Visualization, Text analysis, dynamic mask transfer BibRef

Xu, Z.W.[Zhe-Wei], Iwaihara, M.[Mizuho],
Confidence-Driven Contrastive Learning for Document Classification without Annotated Data,
IEICE(E108-D), No. 8, August 2024, pp. 1029-1039.
WWW Link. 2408
BibRef

Abramovich, O.[Ofir], Nayman, N.[Niv], Fogel, S.[Sharon], Lavi, I.[Inbal], Litman, R.[Ron], Tsiper, S.[Shahar], Tichauer, R.[Royee], Appalaraju, S.[Srikar], Mazor, S.[Shai], Manmatha, R.,
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding,
ECCV24(VIII: 241-259).
Springer DOI 2412
BibRef

Cheng, Y.F.[Yu-Feng], Wang, D.X.[Dong-Xue], Bai, S.[Shuang], Ma, J.K.[Jing-Kai], Liang, C.[Chen], Liu, K.[Kailong], Deng, T.[Tao],
Understanding document images by introducing explicit semantic information and short-range information interaction,
IVC(154), 2025, pp. 105392.
Elsevier DOI 2502
DocVQA, Document semantic segmentation, Explicit semantic information, Information interaction BibRef

Zhang, L.L.[Ling-Ling], Zhong, Y.J.[Yu-Jie], Zheng, Q.H.[Qing-Hua], Liu, J.[Jun], Wang, Q.Y.[Qian-Ying], Wang, J.X.[Jia-Xin], Chang, X.J.[Xiao-Jun],
TDGI: Translation-Guided Double-Graph Inference for Document-Level Relation Extraction,
PAMI(47), No. 4, April 2025, pp. 2647-2659.
IEEE DOI 2503
Cognition, Translation, Semantics, Transformers, Data mining, Context modeling, Vectors, Training, Robustness, relation direction BibRef

Wang, J.W.[Jia-Wei], Hu, K.[Kai], Huo, Q.[Qiang],
UniHDSA: A unified relation prediction approach for hierarchical document structure analysis,
PR(165), 2025, pp. 111617.
Elsevier DOI 2505
Document layout analysis, Relation prediction, Unified label space BibRef

Liu, T.F.[Teng-Fei], Hu, Y.L.[Yong-Li], Li, M.J.[Ming-Jie], Yi, J.F.[Jun-Fei], Chang, X.J.[Xiao-Jun], Gao, J.B.[Jun-Bin], Yin, B.C.[Bao-Cai],
Tackling Real-World Complexity: Hierarchical Modeling and Dynamic Prompting for Multimodal Long Document Classification,
CirSysVideo(35), No. 6, June 2025, pp. 5776-5790.
IEEE DOI 2506
Adaptation models, Transformers, Data models, Correlation, Robustness, Uncertainty, Training, hierarchical heterogeneous graph BibRef

Duan, Y.C.[Yu-Chen], Chen, Z.[Zhe], Hu, Y.[Yusong], Wang, W.Y.[Wei-Yun], Ye, S.L.[Sheng-Long], Shi, B.[Botian], Lu, L.W.[Le-Wei], Hou, Q.[Qibin], Lu, T.[Tong], Li, H.S.[Hong-Sheng], Dai, J.F.[Ji-Feng], Wang, W.H.[Wen-Hai],
Docopilot: Improving Multimodal Models for Document-Level Understanding,
CVPR25(4026-4037).
IEEE DOI Code:
WWW Link. 2508
Training, Adaptation models, Costs, Computational modeling, Large language models, Retrieval augmented generation, long context BibRef

de Rodrigo, I.[Ignacio], Sanchez-Cuadrado, A.[Alberto], Boal, J.[Jaime], Lopez-Lopez, A.J.[Alvaro J.],
The MERIT dataset: Modelling and efficiently rendering interpretable transcripts,
PR(172), 2026, pp. 112502.
Elsevier DOI 2512
Synthetic dataset, Multimodal dataset, Visually-rich document understanding, Vision-language models BibRef

Liu, J.[Jiang], Li, B.[Bobo], Yang, X.R.[Xin-Ran], Yang, N.[Na], Fei, H.[Hao], Zhang, M.Y.[Ming-Yao], Li, F.[Fei], Ji, D.H.[Dong-Hong],
M3D: A Multimodal, Multilingual and Multitask Dataset for Grounded Document-Level Information Extraction,
PAMI(48), No. 1, January 2026, pp. 807-823.
IEEE DOI 2512
Videos, Visualization, Information retrieval, Grounding, Data mining, Multilingual, Feature extraction, Brain modeling, Annotations, large language model BibRef

Shen, Y.[Yehu], Wei, J.[Jikun], Niu, X.M.[Xue-Mei], Fu, G.Z.[Gui-Zhong], Cao, Z.[Zihe],
Efficient ultra-lightweight convolutional attention network for embedded identity document recognition system,
IVC(168), 2026, pp. 105930.
Elsevier DOI Code:
WWW Link. 2603
Lightweight architecture, Document recognition, Attention mechanism, Model deployment, Recognition system BibRef

Liu, Y.L.[Yu-Liang], Yang, B.[Biao], Liu, Q.[Qiang], Li, Z.[Zhang], Ma, Z.[Zhiyin], Zhang, S.[Shuo], Bai, X.[Xiang],
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document,
PAMI(48), No. 5, May 2026, pp. 6008-6019.
IEEE DOI 2604
Visualization, Image resolution, Training, Grounding, Layout, Computational modeling, Transformers, Text analysis, OCRBench BibRef

Yang, F.X.[Fu-Xiang], Hou, W.[Wendi], Fan, L.[Lei], Su, T.[Tonghua], He, L.X.[Ling-Xiao], Li, C.Z.[Cheng-Zhou], Wang, M.[Meng], Xie, Q.L.[Qian-Long], Wang, X.X.[Xing-Xing], Di, D.L.[Dong-Lin], Yang, X.[Xun],
Learning priority-aware controllable poster layout generation,
PR(179), 2026, pp. 113497.
Elsevier DOI 2606
Poster layout generation, Optimal transport, Flow matching BibRef

Le, B.M.[Binh M.], Xu, S.Y.[Shao-Yuan], Fu, J.M.[Jin-Miao], Huang, Z.S.[Zhi-Shen], Li, M.[Moyan], Guo, Y.H.[Yan-Hui], Li, H.D.[Hong-Dong], Ramasinghe, S.[Sameera], Wang, B.[Bryan],
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-Free Visual Document Understanding,
MULA25(86-96).
IEEE DOI 2512
Visualization, Adaptation models, Computational modeling, Semantics, Layout, Performance gain, Streaming media, Vectors BibRef

Xiao, H.[Han], Xie, Y.[Yina], Tan, G.X.[Guan-Xin], Chen, Y.H.[Ying-Hao], Hu, R.[Rui], Wang, K.[Ke], Zhou, A.[Aojun], Li, H.[Hao], Shao, H.[Hao], Lu, X.D.[Xu-Dong], Gao, P.[Peng], Wen, Y.F.[Ya-Fei], Chen, X.X.[Xiao-Xin], Ren, S.[Shuai], Li, H.S.[Hong-Sheng],
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding,
CVPR25(29558-29568).
IEEE DOI Code:
WWW Link. 2508
Visualization, Adaptation models, Computational modeling, Pipelines, Layout, Cognition, HTML, Visual perception BibRef