24.2.2.2.2 Find Text in Documents

Document Analysis. Generally documents designed for text. For general scenes, see also Text Detection, Find Text in General Scenes, Scene Text.

Fuller, P.[Paul],
Character reader,
US_Patent4,292,621, Sep 29, 1981
WWW Link. BibRef 8109

Beato, L.J.[Louis J.],
Bi-tonal image non-text matter removal with run length and connected component analysis,
US_Patent5,048,096, Sep 10, 1991
WWW Link. BibRef 9109

Amano, T.[Tomio],
Method for detecting character strings,
US_Patent5,033,104, Jul 16, 1991
WWW Link. Text in documents. BibRef 9107

Chen, S., Haralick, R.M., Phillips, I.T.,
Extraction of Text Words in Document Images Based on a Statistical Characterization,
JEI(5), No. 1, January 1996, pp. 25-36. BibRef 9601

Chen, F.R., Bloomberg, D.S., Wilcox, L.D.,
Detection and Location of Multicharacter Sequences in Lines of Imaged Text,
JEI(5), No. 1, January 1996, pp. 37-49. BibRef 9601
And:
Spotting Phrases in Lines of Imaged Text,
SPIE(2422), February 1995, pp. 256-269. BibRef

Suen, H.M., Wang, J.F.,
Text String Extraction from Images of Color-Printed Documents,
VISP(143), No. 4, August 1996, pp. 210-216. 9611
BibRef

Suen, H.M., Wang, J.F.,
Segmentation of Uniform Colored Text from Color Graphics Background,
VISP(144), No. 6, December 1997, pp. 317-322. 9806
BibRef

Aas, K.[Kjersti], Eikvil, L.[Line],
Text Page Recognition Using Grey-Level Features and Hidden Markov-Models,
PR(29), No. 6, June 1996, pp. 977-985.
Elsevier DOI 9606
BibRef

Aas, K.[Kjersti], Eikvil, L.[Line], Andersen, T.[Tove],
Text recognition from grey level images using hidden Markov models,
CAIP95(503-508).
Springer DOI 9509
BibRef

Shinghal, R.[Rajjan],
A Hybrid Algorithm for Contextual Text Recognition,
PR(16), No. 2, 1983, pp. 261-267.
Elsevier DOI 9611
BibRef

Lu, Z.Y.[Zhao-Yang],
Detection of text regions from digital engineering drawings,
PAMI(20), No. 4, April 1998, pp. 431-439.
IEEE DOI 0401
BibRef

Tan, C.L., Ng, P.O.,
Text Extraction Using Pyramid,
PR(31), No. 1, January 1998, pp. 63-72.
Elsevier DOI 9802
BibRef

Hwang, W.L.[Wen L.], Chang, F.[Fu],
Character extraction from documents using wavelet maxima,
IVC(16), No. 5, April 27 1998, pp. 307-315.
Elsevier DOI 0401
BibRef

Strouthopoulos, C., Papamarkos, N.,
Text Identification for Document Image Analysis Using a Neural Network,
IVC(16), No. 12-13, 24 August 1998, pp. 879-896.
Elsevier DOI BibRef 9808

Parodi, P.[Pietro], Fontana, R.[Roberto],
Efficient and flexible text extraction from document pages,
IJDAR(2), No. 2/3, 1999, pp. 67-79. 9912
BibRef

Parodi, P., Piccioli, G.,
An Efficient Preprocessing of Mixed-Content Document Images for OCR Systems,
ICPR96(III: 778-782).
IEEE DOI 9608
(Univ. di Genova, I) BibRef

Parodi, P.[Pietro], Piccioli, G.[Giulia],
A Fast and Flexible Statistical Method for Text Extraction in Document Pages,
CVPR96(619-624).
IEEE DOI BibRef 9600

Liang, J., Phillips, I.T., Haralick, R.M.,
Consistent Partition and Labelling of Text Blocks,
PAA(3), No. 2, 2000, pp. 196-208. 0010
BibRef

Hase, H.[Hiroyuki], Shinokawa, T.[Toshiyuki], Yoneda, M.[Masaaki], Suen, C.Y.[Ching Y.],
Character string extraction from color documents,
PR(34), No. 7, July 2001, pp. 1349-1365.
Elsevier DOI 0105
BibRef

Hase, H., Shinokawa, T., Yoneda, M., Sakai, M., Maruyama, H.,
Character String Extraction by Multi-Stage Relaxation,
ICDAR97(298-302).
IEEE DOI 9708
BibRef

Strouthopoulos, C., Papamarkos, N., Atsalakis, A.E.,
Text extraction in complex color documents,
PR(35), No. 8, August 2002, pp. 1743-1758.
Elsevier DOI 0206
BibRef

Xiao, Y.[Yi], Yan, H.[Hong],
Text region extraction in a document image based on the Delaunay tessellation,
PR(36), No. 3, March 2003, pp. 799-809.
Elsevier DOI 0301

See also Location of title and author regions in document images based on the Delaunay triangulation. BibRef

Nishida, H.[Hirobumi], Suzuki, T.[Takeshi],
Correcting Show-Through Effects on Scanned Color Document Images by Multiscale Analysis,
PR(36), No. 12, December 2003, pp. 2835-2847.
Elsevier DOI 0310
BibRef
Earlier:
Correcting Show-Through Effects on Document Images by Multiscale Analysis,
ICPR02(III: 65-68).
IEEE DOI 0211

See also Adaptive Inverse Halftoning for Scanned Document Images Through Multiresolution and Multiscale Analysis. BibRef

Kumar, S.[Sunil], Gupta, R., Khanna, N.[Nitin], Chaudhury, S.[Santanu], Joshi, S.D.[Shiv Dutt],
Text Extraction and Document Image Segmentation Using Matched Wavelets and MRF Model,
IP(16), No. 8, August 2007, pp. 2117-2128.
IEEE DOI 0709
BibRef
Earlier: A1, A3, A4, A5, Only:
Locating text in images using matched wavelets,
ICDAR05(II: 595-599).
IEEE DOI 0508
BibRef

Mukherjee, D.[Debargha],
Enhancing text-like edges in digital images,
US_Patent7,433,535, Oct 7, 2008
WWW Link. BibRef 0810

Liu, Z.Y.[Zong-Yi], Zhou, H.N.[Han-Ning], Yang, N.[Ning],
Semi-supervised learning for text-line detection,
PRL(31), No. 11, 1 August 2010, pp. 1260-1273.
Elsevier DOI 1008
Document segmentation; Semi-supervised learning; Text-line detection; Language adaptiveness BibRef

Zhao, M.[Ming], Li, S.T.[Shu-Tao], Kwok, J.[James],
Text detection in images using sparse representation with discriminative dictionaries,
IVC(28), No. 12, December 2010, pp. 1590-1599.
Elsevier DOI 1003
Text detection; Sparse representation; Discriminative dictionary BibRef

Marinai, S.[Simone],
Text retrieval from early printed books,
IJDAR(14), No. 2, June 2011, pp. 117-129.
WWW Link. 1106
BibRef

Peng, X.J.[Xu-Jun], Setlur, S.[Srirangaraj], Govindaraju, V.[Venu], Ramachandrula, S.[Sitaram],
Using a boosted tree classifier for text segmentation in hand-annotated documents,
PRL(33), No. 7, 1 May 2012, pp. 943-950.
Elsevier DOI 1203
Classification; Text separation; Document analysis; Decision tree BibRef

Peng, X.J.[Xu-Jun], Setlur, S.[Srirangaraj], Govindaraju, V.[Venu], Ramachandrula, S.[Sitaram],
Handwritten Text Separation from Annotated Machine Printed Documents Using Markov Random Fields,
IJDAR(16), No. 1, March 2013, pp. 1-16.
WWW Link. 1303
BibRef
Earlier:
Text Separation from Mixed Documents Using a Tree-Structured Classifier,
ICPR10(241-244).
IEEE DOI 1008
Award, ICPR.
See also Preprocessing of Low-Quality Handwritten Documents Using Markov Random Fields. BibRef

Peng, X.J.[Xu-Jun], Setlur, S.[Srirangaraj], Govindaraju, V.[Venu], Ramachandrula, S.[Sitaram], Bhuvanagiri, K.[Kiran],
Markov Random Field Based Text Identification from Annotated Machine Printed Documents,
ICDAR09(431-435).
IEEE DOI 0907
BibRef

Pan, Z.T.[Zhao-Tai], Shen, H.F.[Hui-Feng], Lu, Y.[Yan], Li, S.P.[Shi-Peng], Yu, N.H.[Neng-Hai],
A Low-Complexity Screen Compression Scheme for Interactive Screen Sharing,
CirSysVideo(23), No. 6, 2013, pp. 949-960.
IEEE DOI 1307
BibRef
Earlier: A1, A2, A3, A5, A4:
A low-complexity screen compression scheme,
VCIP12(1-6).
IEEE DOI 1302
H.264 intra coding; multiple block modes. Text vs. images. BibRef

Singh, B.M.[Brij Mohan], Sharma, R.[Rahul], Ghosh, D.[Debashis], Mittal, A.[Ankush],
Multi-Oriented Text Extraction in Stylistic Documents,
IJIG(15), No. 01, 2015, pp. 1550002.
DOI Link 1503
BibRef

Bhowmik, S.[Showmik], Sarkar, R.[Ram], Nasipuri, M.[Mita], Doermann, D.[David],
Text and non-text separation in offline document images: a survey,
IJDAR(21), No. 1-2, June 2018, pp. 1-20.
Springer DOI 1806
BibRef

Moysset, B.[Bastien], Kermorvant, C.[Christopher], Wolf, C.[Christian],
Learning to detect, localize and recognize many text objects in document images from few examples,
IJDAR(21), No. 3, September 2018, pp. 161-175.
Springer DOI 1810
BibRef

Rajesh, B.[Bulla], Javed, M.[Mohammed], Nagabhushan, P.,
Automatic tracing and extraction of text-line and word segments directly in JPEG compressed document images,
IET-IPR(14), No. 9, 20 July 2020, pp. 1909-1919.
DOI Link 2007
BibRef

Carbonell, M.[Manuel], Fornés, A.[Alicia], Villegas, M.[Mauricio], Lladós, J.[Josep],
A neural model for text localization, transcription and named entity recognition in full pages,
PRL(136), 2020, pp. 219-227.
Elsevier DOI 2008
Document image analysis, Information extraction, Text detection, Handwritten text recognition, Multi-task learning BibRef


Qin, S., Bissacco, A., Raptis, M., Fujii, Y., Xiao, Y.,
Towards Unconstrained End-to-End Text Spotting,
ICCV19(4703-4713)
IEEE DOI 2004
document image processing, feature extraction, image classification, image coding, image segmentation, Training BibRef

Wei, H., Zhang, H., Gao, G.,
Word Image Representation Based on Visual Embeddings and Spatial Constraints for Keyword Spotting on Historical Documents,
ICPR18(3616-3621)
IEEE DOI 1812
Visualization, Semantics, Euclidean distance, Histograms, Image representation, Image segmentation, Training, visual word, query-by-example BibRef

Puybareau, É., Géraud, T.,
Real-Time Document Detection in Smartphone Videos,
ICIP18(1498-1502)
IEEE DOI 1809
Image segmentation, Videos, Real-time systems, Transforms, Robustness, Morphology, Detectors, Image processing, Real-time video processing BibRef

Xiong, H.[Huaixin],
Specific Document Sign Location Detection Based on Point Matching and Clustering,
ISVC18(180-190).
Springer DOI 1811
BibRef

Baek, Y., Nam, D., Park, S., Lee, J., Shin, S., Baek, J., Lee, C.Y., Lee, H.,
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks,
WTDDL20(2404-2412)
IEEE DOI 2008
Measurement, Text recognition, Task analysis, Character recognition, Reliability, Optical character recognition software BibRef

Böschen, F.[Falk], Scherp, A.[Ansgar],
A Comparison of Approaches for Automated Text Extraction from Scholarly Figures,
MMMod17(I: 15-27).
Springer DOI 1701
BibRef

Hofmann, S., Gropp, M., Bernecker, D., Pollin, C., Maier, A., Christlein, V.,
Vesselness for text detection in historical document images,
ICIP16(3259-3263)
IEEE DOI 1610
Encoding BibRef

Hedjam, R.[Rachid], Nafchi, H.Z.[Hossein Ziaei], Moghaddam, R.F.[Reza Farrahi], Kalacska, M.[Margaret], Cheriet, M.[Mohamed],
ICDAR 2015 contest on MultiSpectral Text Extraction (MS-TEx 2015),
ICDAR15(1181-1185)
IEEE DOI 1511
Document Image Binarization BibRef

Chiba, N.,
Text Image Classifier Using Image-Wise Annotation,
ACPR13(877-881)
IEEE DOI 1408
document image processing BibRef

Marder, M.[Mattias], Geva, A.B.[Amir B.], Ruan, Y.P.[Yao-Ping],
Lightweight searchable screen video recording,
VCIP12(1-6).
IEEE DOI 1302
Video monitoring of computer screens. BibRef

Zagoris, K.[Konstantinos], Pratikakis, I.[Ioannis], Antonacopoulos, A.[Apostolos], Gatos, B.[Basilis], Papamarkos, N.[Nikos],
Handwritten and Machine Printed Text Separation in Document Images Using the Bag of Visual Words Paradigm,
FHR12(103-108).
IEEE DOI 1302
BibRef

Lin, X.R.[Xiao-Rong], Guo, C.Y.[Chien-Yang], Chang, F.[Fu],
Classifying Textual Components of Bilingual Documents with Decision-Tree Support Vector Machines,
ICDAR11(498-502).
IEEE DOI 1111
BibRef

Wang, X.F.[Xiu-Fei], Huang, L.[Lei], Liu, C.P.[Chang-Ping],
A Novel Method for Embedded Text Segmentation Based on Stroke and Color,
ICDAR11(151-155).
IEEE DOI 1111
BibRef

Fan, J.[Jian],
Text Segmentation of Consumer Magazines in PDF Format,
ICDAR11(794-798).
IEEE DOI 1111
BibRef

Nirmala, S., Nagabhushan, P.,
Foreground Text Extraction in Color Document Images for Enhanced Readability,
PReMI09(387-392).
Springer DOI 0912
BibRef

Zhou, H.N.[Han-Ning], Liu, Z.Y.[Zong-Yi],
Page frame segmentation for contextual advertising in print on demand books,
InterNet09(17-22).
IEEE DOI 0906
BibRef

Strouthopoulos, C.[Charalambos], Nikolaidis, A.[Athanasios],
A robust technique for text extraction in mixed-type binary documents,
ICPR08(1-4).
IEEE DOI 0812
BibRef

Kandan, R., Reddy, N.K.[Nirup Kumar], Arvind, K.R., Ramakrishnan, A.G.,
A Robust Two Level Classification Algorithm for Text Localization in Documents,
ISVC07(II: 96-105).
Springer DOI 0711
BibRef

Ma, Y., Wang, C., Xiao, B., Dai, R.,
Usage-Oriented Performance Evaluation for Text Localization Algorithms,
ICDAR07(1033-1037).
IEEE DOI 0709
BibRef

Ar, I.[Ilktan], Karsligil, M.E.[M. Elif],
Text Area Detection in Digital Documents Images Using Textural Features,
CAIP07(555-562).
Springer DOI 0708
BibRef

Farooq, F.[Faisal], Sridharan, K.[Karthik], Govindaraju, V.[Venu],
Identifying Handwritten Text in Mixed Documents,
ICPR06(II: 1142-1145).
IEEE DOI 0609
BibRef

Nakano, Y., Kashio, K., Yoshida, T.,
HMM-Based Approach for Text Region Detection in Coded Video Bitstreams,
ICIP06(3209-3212).
IEEE DOI 0610
BibRef

Lucas, S.M.,
ICDAR 2005 text locating competition results,
ICDAR05(I: 80-84).
IEEE DOI 0508
BibRef

Coutinho, D.P.[David Pereira], Figueiredo, M.A.T.[Mário A.T.],
Information Theoretic Text Classification Using the Ziv-Merhav Method,
IbPRIA05(II:355).
Springer DOI 0509
BibRef

Zhang, X.[Xian], Zhu, X.Y.[Xiao-Yan],
A New Type of Feature: Loose N-Gram Feature in Text Categorization,
IbPRIA07(I: 378-385).
Springer DOI 0706
BibRef
Earlier:
Extended Bi-gram Features in Text Categorization,
IbPRIA05(II:379).
Springer DOI 0509
BibRef

Gllavata, J.[Julinda], Freisleben, B.[Bernd],
Adaptive Fuzzy Text Segmentation in Images with Complex Backgrounds Using Color and Texture,
CAIP05(756).
Springer DOI 0509
BibRef

Song, Y.J., Kim, K.C., Choi, Y.W., Byun, H.R., Kim, S.H., Chi, S.Y., Jang, D.K., Chung, Y.K.,
Text region extraction and text segmentation on camera-captured document style images,
ICDAR05(I: 172-176).
IEEE DOI 0508
BibRef

Kim, K.C., Byun, H.R., Song, Y.J., Choi, Y.W., Chi, S.Y., Kim, K.K., Chung, Y.K.,
Scene text extraction in natural scene images using hierarchical feature combining and verification,
ICPR04(II: 679-682).
IEEE DOI 0409
BibRef

Byun, H.R.[Hye-Ran], Roh, M.C.[Myung-Cheol], Kim, K.C.[Kil-Cheon], Choi, Y.W.[Yeong-Woo], Lee, S.W.[Seong-Whan],
Scene Text Extraction in Complex Images,
DAS02(329 ff.).
Springer DOI 0303
BibRef

Gllavata, J.[Julinda], Ewerth, R.[Ralph], Stefi, T.[Teuta], Freisleben, B.[Bernd],
Unsupervised Text Segmentation Using Color and Wavelet Features,
CIVR04(216-224).
Springer DOI 0505
BibRef

Gllavata, J., Ewerth, R., Freisleben, B.,
Text detection in images based on unsupervised classification of high-frequency wavelet coefficients,
ICPR04(I: 425-428).
IEEE DOI 0409
BibRef

Sabari Raju, S., Pati, P.B.[Peeta Basa], Ramakrishnan, A.G.,
Gabor filter based block energy analysis for text extraction from digital document images,
DIAL04(233-243).
IEEE DOI 0404
BibRef

Pinto, J.R.C.[João R. Caldas], Pina, P.[Pedro], Bandeira, L.[Lourenço], Pimentel, L.[Luís], Ramalho, M.[Mário],
Underline Removal on Old Documents,
ICIAR04(II: 226-233).
Springer DOI 0409
BibRef

Lu, Y.[Yue], Tan, C.L.,
Constructing area Voronoi diagram in document images,
ICDAR05(I: 342-346).
IEEE DOI 0508
BibRef

Lu, Y.[Yue], Wang, Z.[Zhe], Tan, C.L.[Chew Lim],
Word Grouping in Document Images Based on Voronoi Tessellation,
DAS04(147-157).
Springer DOI 0505
BibRef

Hu, Y.[Yi], Nagao, T.,
Matching of characters in scene images by using local shape feature vectors,
CIAP03(207-212).
IEEE DOI 0310
BibRef

Kim, E.Y.[Eun Yi], Chang, J.S.[Jae Sik], Kim, H.J.[Hang Joon],
Automatic text location using cluster-based template matching,
ICPR02(III: 423-426).
IEEE DOI 0211
BibRef

Bres, S.[Stéphane], Eglin, V.[Véronique], Gagneux, A.,
Unsupervised clustering of text entities in heterogeneous grey level documents,
ICPR02(III: 224-227).
IEEE DOI 0211
BibRef

Sin, B.K.[Bong-Kee], Kim, S.K.[Seon-Kyu], Cho, B.J.[Beom-Joon],
Locating characters in scene images using frequency features,
ICPR02(III: 489-492).
IEEE DOI 0211
BibRef

Kim, S.K.[Seon-Kyu], Sin, B.K.[Bong-Kee], Lee, S.W.[Seong-Whan],
Character spotting using image-based stochastic models,
ICDAR01(60-63).
IEEE DOI 0109
BibRef

Okun, O., Yan, Y.[Yu], Pietikainen, M.,
Robust text detection from binarized document images,
ICPR02(III: 61-64).
IEEE DOI 0211
BibRef

Pietikäinen, M.[Matti], Okun, O.[Oleg],
Edge-based method for text detection from complex document images,
ICDAR01(286-291).
IEEE DOI 0109
BibRef
And:
Text Extraction from Grey Scale Page Images by Simple Edge Detectors,
SCIA01(P-W3B). 0206
BibRef

Chen, X.R.[Xiang-Rong], Zhang, H.J.[Hong-Jiang],
Photo time-stamp detection and recognition,
ICDAR03(319-322).
IEEE DOI 0311
BibRef

Kise, K.[Koichi], Tsujino, M.[Masaaki], Matsumoto, K.[Keinosuke],
Spotting Where to Read on Pages: Retrieval of Relevant Parts from Page Images,
DAS02(388 ff.).
Springer DOI 0303
BibRef

Perroud, T., Sobottka, K., Bunke, H.,
Text extraction from color documents-clustering approaches in three and four dimensions,
ICDAR01(937-941).
IEEE DOI 0109
BibRef

Yuan, Q., Tan, C.L.,
Text extraction from gray scale document images using edge information,
ICDAR01(302-306).
IEEE DOI 0109
BibRef

Rennie, J.D.M.[Jason D. M.], Rifkin, R.[Ryan],
Improving Multiclass Text Classification with the Support Vector Machine,
MIT AIM-2001-026, October 2001.
WWW Link. 0205
BibRef

Rennie, J.D.M.[Jason D.M.],
Improving Multi-class Text Classification with Naive Bayes,
MIT AI-TR-2001-004, September 2001.
WWW Link. 0205
BibRef

Zagoris, K., Papamarkos, N., Chamzas, C.,
Web Document Image Retrieval System Based on Word Spotting,
ICIP06(477-480).
IEEE DOI 0610
BibRef

Strouthopoulos, C., Papamarkos, N., Atsalakis, A., Chamzas, C.,
Locating Text in Color Documents,
ICIP01(I: 1066-1069).
IEEE DOI 0108
BibRef

Dimov, D.,
Using an Exact Performance of Hough Transform for Image Text Segmentation,
ICIP01(I: 778-781).
IEEE DOI 0108
BibRef

Lin, L.[Lin], Tan, C.L.[Chew Lim],
Text extraction from name cards with complex design,
ICDAR05(II: 977-980).
IEEE DOI 0508
BibRef

Lee, H.J.[Hsi-Jian], Lee, S.H.[Shan-Hung],
Design of a Chinese name card understanding system,
ICDAR05(II: 981-985).
IEEE DOI 0508
BibRef

Park, T., Kim, D., Chung, K.,
Orientation and Scale Invariant Text Region Extraction in WWW Images,
MVA98(xx-yy). BibRef 9800

Li, Y., Jain, A.K.[Anil K.],
Classification of Text Documents,
ICPR98(Vol II: 1295-1297).
IEEE DOI 9808
BibRef

Li, J.[Jia], Gray, R.M.,
Text and picture segmentation by the distribution analysis of wavelet coefficients,
ICIP98(III: 790-794).
IEEE DOI 9810
BibRef

Zhou, J., Lopresti, D.P.,
Extracting Text from WWW Images,
ICDAR97(248-252).
IEEE DOI 9708
BibRef

Gao, J.B.[Jing-Bo], Li, X.Y.[Xin-You], Tang, Z.S.[Ze-Sheng],
Segmentation of stick text based on sub connected area analysis,
ICDAR97(417-421).
IEEE DOI 9708
BibRef

Cavnar, W., Trenkle, J.,
N-Gram-Based Text Categorization,
SDAIR94(161-169). BibRef 9400

Le Bourgeois, F., Bublinski, Z., Emptoz, H.,
A fast and efficient method for extracting text paragraphs and graphics from unconstrained documents,
ICPR92(II:272-276).
IEEE DOI 9208
BibRef

Filipski, A.,
Recognition of hand-lettered characters in the GTX 5000 drawing processor,
CVPR89(686-691).
IEEE DOI 0403
BibRef

Chapter on OCR, Document Analysis and Character Recognition Systems continues in
Text Line Extraction in Documents.


Last update: Nov 1, 2021 at 09:26:50