Nadler, M.,
Document Segmentation and Coding Techniques,
CVGIP(28), No. 2, November 1984, pp. 240-262.
Survey, Page Segmentation.
BibRef
8411
Pavlidis, T.[Theo],
Zhou, J.Y.[Jiang-Ying],
Page Segmentation and Classification,
GMIP(54), No. 6, November 1992, pp. 484-496.
Survey, Page Segmentation.
BibRef
9211
Pavlidis, T.[Theo],
Page Segmentation by White Streams,
ICDAR91(945-953).
BibRef
9100
Zlatopolsky, A.A.,
Automated Document Segmentation,
PRL(15), No. 7, July 1994, pp. 699-704.
BibRef
9407
Leng, G.W.,
Mital, D.P.,
Yong, T.S.,
Kang, T.K.,
A Differential-Processing Extraction Approach to
Text and Image Segmentation,
EngAAI(7), No. 6, December 1994, pp. 639-651.
BibRef
9412
Jain, A.K.,
Zhong, Y.,
Page Segmentation Using Texture Analysis,
PR(29), No. 5, May 1996, pp. 743-770.
WWW Version.
9605
BibRef
Earlier:
Page segmentation using texture discrimination masks,
ICIP95(III: 308-311).
WWW Version.
9510
BibRef
Jain, A.K.,
Bhattacharjee, S.,
Text Segmentation Using Gabor Filters for
Automatic Document Processing,
MVA(5), 1992, pp. 169-184.
BibRef
9200
Jain, A.K.,
Bhattacharjee, S.K.,
Chen, Y.,
On texture in document images,
CVPR92(677-680).
IEEE Abstract. IEEE Top Reference.
0403
BibRef
Venkateswarlu, N.B.,
Boyle, R.D.,
New segmentation techniques for document image analysis,
IVC(13), No. 7, September 1995, pp. 573-583.
WWW Version.
0401
BibRef
Shih, F.Y.,
Chen, S.S.,
Adaptive Document Block Segmentation and Classification,
SMC-B(26), No. 5, October 1996, pp. 797-802.
IEEE Top Reference. Segments using run-length smoothing, then applies a rule-based classification
into text, graphics, and picture.
BibRef
9610
Chen, S.,
Haralick, R.M.,
Phillips, I.T.,
Extraction of Text Lines and Text Blocks on
Document Images Based on Statistical Modeling,
IJIST(7), No. 4, Winter 1996, pp. 343-356.
9612
BibRef
Patel, D.,
Page Segmentation for Document Image-Analysis Using a Neural-Network,
OptEng(35), No. 7, July 1996, pp. 1854-1861.
9608
BibRef
Patel, D.,
Stonham, T.J.,
Texture image classification and segmentation using RANK-order
clustering,
ICPR92(III:92-95).
WWW Version.
9208
BibRef
Payne, J.S.,
Stonham, T.J.,
Patel, D.,
Document segmentation using texture analysis,
ICPR94(B:380-382).
WWW Version.
9410
BibRef
Etemad, K.,
Doermann, D.,
Chellappa, R.,
Multiscale Segmentation of Unstructured Document Pages Using
Soft Decision Integration,
PAMI(19), No. 1, January 1997, pp. 92-96.
IEEE Abstract. IEEE Top Reference.
WWW Version.
9702
BibRef
And:
Multiscale Document Page Segmentation Using Soft Decision Integration,
UMDTR3444, 1995.
WWW Version.
BibRef
Earlier:
Page Segmentation Using Decision Integration and Wavelet Packets,
ICPR94(B:345-349).
WWW Version. Classifies regions of the page image as text or image.
BibRef
Etemad, K.[Kamran],
Multi-Scale Discriminant Analysis and Recognition of Signals and Images,
Ph.D.Thesis, April 1996.
BibRef
9604
UMDTR3629.
The goal is to find efficient multi-scale representations that yield
maximum between-class separations and minimum within-class scatters.
WWW Version. Also for Faces.
BibRef
Chen, J.L.,
A Simplified Approach to the HMM Based Texture Analysis
and Its Application to Document Segmentation,
PRL(18), No. 10, October 1997, pp. 993-1007.
9802
Markov model texture analysis.
BibRef
Kise, K.[Koichi],
Sato, A.[Akinori],
Iwata, M.[Motoi],
Segmentation of Page Images Using the Area Voronoi Diagram,
CVIU(70), No. 3, June 1998, pp. 370-382.
WWW Version.
For evaluation:
See also Empirical Performance Evaluation Methodology and Its Application to Page Segmentation Algorithms.
BibRef
9806
Hobby, J.D.[John D.],
Matching Document Images with Ground Truth,
IJDAR(1), No. 1, Spring 1998, pp. xx-yy.
BibRef
9800
Earlier:
ICDAR97(Tu-2B)
9708
BibRef
Cinque, L.,
Lombardi, L.,
Manzini, G.,
A Multiresolution Approach for Page Segmentation,
PRL(19), No. 2, February 1998, pp. 217-225.
9808
See also Shape-Description and Recognition by a Multiresolution Approach.
BibRef
Cantoni, V.,
Cinque, L.,
Lombardi, L.,
Manzini, G.,
Page Segmentation Using a Pyramidal Architecture,
CAMP97(Session 6).
BibRef
9700
Cinque, L.,
Levialdi, S.,
Lombardi, L.,
Tanimoto, S.,
Segmentation of page images having artifacts of photocopying and
scanning,
PR(35), No. 5, May 2002, pp. 1167-1177.
WWW Version.
0202
BibRef
Cinque, L.,
Forino, L.,
Levialdi, S.,
Lombardi, L.,
Tanimoto, S.,
Understanding the page logical structure,
CIAP99(1003-1008).
WWW Version.
9909
BibRef
Cinque, L.,
Levialdi, S.,
Malizia, A.,
de Rosa, F.,
DAN:
An Automatic Segmentation and Classification Engine for Paper Documents,
DAS02(491 ff.).
HTML Version.
0303
BibRef
Cinque, L.,
Levialdi, S.,
Malizia, A.,
A system for the automatic layout segmentation and classification of
digital documents,
CIAP03(201-206).
IEEE Abstract. IEEE Top Reference.
0310
BibRef
Liu, J.M.,
Tang, Y.Y.,
Distributed Autonomous Agents For Chinese Document Image Segmentation,
PRAI(12), No. 1, February 1998, pp. 97-118.
9806
See also Adaptive Image Segmentation With Distributed Behavior-Based Agents.
BibRef
de Queiroz, R.L.,
Processing JPEG Compressed Images and Documents,
IP(7), No. 12, December 1998, pp. 1661-1672.
WWW Version.
9812
BibRef
de Queiroz, R.L.,
Processing JPEG-Compressed Images,
ICIP97(II: 334-337).
WWW Version.
BibRef
9700
de Queiroz, R.L.,
Eschbach, R.,
Fast Segmentation of the JPEG Compressed Documents,
JEI(7), No. 2, April 1998, pp. 367-377.
9807
BibRef
de Queiroz, R.L.,
Eschbach, R.,
Segmentation of Compressed Documents,
ICIP97(III: 70-73).
WWW Version.
BibRef
9700
de Queiroz, R.L.[Ricardo L.],
Compression of Compound Documents,
ICIP99(I:209-213).
IEEE Abstract. IEEE Top Reference.
BibRef
9900
Antonacopoulos, A.[Apostolos],
Page Segmentation Using the Description of the Background,
CVIU(70), No. 3, June 1998, pp. 350-369.
WWW Version.
BibRef
9806
Jain, A.K.,
Yu, B.,
Document Representation and Its Application to Page Decomposition,
PAMI(20), No. 3, March 1998, pp. 294-308.
IEEE Abstract. IEEE Top Reference.
WWW Version.
9805
Generates a structured version of the document for editing, storage,
retrieval, and analysis. Performs skew correction, segmentation, and
labeling (text, table, image, drawing, and ruler).
Includes some review of approaches.
BibRef
Jain, A.K.,
Yu, B.,
Model-Based Document Representation: Application to Page Segmentation,
ICDAR97(Mo-2B)
9708
BibRef
Yang, J.C.Y.[James Ching-Yu],
Tsai, W.H.[Wen-Hsiang],
Document image segmentation and quality improvement by moiré pattern
analysis,
SP:IC(15), No. 9, July 2000, pp. 781-797.
WWW Version.
0008
BibRef
Mao, S.[Song],
Kanungo, T.[Tapas],
Empirical Performance Evaluation Methodology and Its Application to
Page Segmentation Algorithms,
PAMI(23), No. 3, March 2001, pp. 242-256.
IEEE Abstract. IEEE Top Reference.
WWW Version.
0103
Survey, Page Segmentation.
Evaluation, Page Segmentation.
Creates separate training and test data and a computable performance metric,
finds the optimal parameters for each algorithm, then evaluates.
Compares
Voronoi (Kise) (
See also Segmentation of Page Images Using the Area Voronoi Diagram. );
Docstrum (O'Gorman) (
See also Document Spectrum for Page Layout Analysis, The. );
Caere (commercial system) (
See also Caere. ).
These three have about the same performance and
are better than
ScanSoft (commercial system) (
See also ScanSoft. ),
which is better than the older X-Y cut (
See also Prototype Document Image Analysis System for Technical Journals, A. ).
Similar conclusions in a later analysis:
See also Performance Evaluation and Benchmarking of Six-Page Segmentation Algorithms.
BibRef
Mao, S.[Song],
Kanungo, T.[Tapas],
Software Architecture of PSET: A Page Segmentation Evaluation Toolkit,
IJDAR(4), No. 3, 2002, pp. 205-217.
HTML Version.
0205
BibRef
Earlier:
UMD--TR4190, September 2000.
WWW Version.
Evaluation, Page Segmentation.
BibRef
Mao, S.[Song],
Kanungo, T.[Tapas],
A Methodology for Empirical Performance Evaluation of
Page Segmentation Algorithms,
UMD--TR4093, December 1999.
WWW Version.
BibRef
9912
Mao, S.,
Kanungo, T.,
Automatic Training of Page Segmentation Algorithms:
An Optimization Approach,
ICPR00(Vol IV: 531-534).
WWW Version.
HTML Version.
0009
BibRef
Kanungo, T.,
Mao, S.[Song],
Stochastic language models for style-directed layout analysis of
document images,
IP(12), No. 5, May 2003, pp. 583-596.
WWW Version.
0307
BibRef
Amin, A.[Adnan],
Shiu, R.[Ricky],
Page Segmentation And Classification Utilizing Bottom-up Approach,
IJIG(1), No. 2, April 2001, pp. 345-361.
0104
BibRef
Deng, S.[Shulan],
Latifi, S.[Shahram],
Regentova, E.E.[Emma E.],
Document segmentation using polynomial spline wavelets,
PR(34), No. 12, December 2001, pp. 2533-2545.
WWW Version.
0110
BibRef
Regentova, E.E.,
Latifi, S.,
Chen, D.,
Taghva, K.,
Yao, D.,
Document analysis by processing JBIG-encoded images,
IJDAR(7), No. 4, September 2005, pp. 260-272.
WWW Version.
0512
BibRef
Diligenti, M.[Michelangelo],
Frasconi, P.[Paolo],
Gori, M.[Marco],
Hidden Tree Markov Models for Document Image Classification,
PAMI(25), No. 4, April 2003, pp. 520-524.
IEEE Abstract. IEEE Top Reference.
0304
Learning. Learn the concept of a set of documents of similar structure.
BibRef
Diligenti, M.,
Gori, M.,
Maggini, M.,
Scarselli, F.,
Classification of HTML documents by Hidden Tree-Markov Models,
ICDAR01(849-853).
WWW Version.
0109
BibRef
Haji, M.M.,
Katebi, S.D.,
An Efficient Text Segmentation Technique Based
on Naive Bayes Classifier,
GVIP(05), No. V7, 2005, pp. 21-30.
HTML Version.
BibRef
0500
Wang, Y.[Yalin],
Phillips, I.T.[Ihsin T.],
Haralick, R.M.[Robert M.],
Document zone content classification and its performance evaluation,
PR(39), No. 1, January 2006, pp. 57-73.
WWW Version.
0512
Evaluation, Page Segmentation.
BibRef
Earlier:
A Study on the Document Zone Content Classification Problem,
DAS02(212 ff.).
HTML Version.
0303
BibRef
And:
A method for document zone content classification,
ICPR02(III: 196-199).
WWW Version.
0211
BibRef
Earlier: A1, A3, A2:
Zone content classification and its performance evaluation,
ICDAR01(540-544).
WWW Version.
0109
See also Table structure understanding and its performance evaluation.
BibRef
Leydier, Y.[Yann],
Le Bourgeois, F.[Frank],
Emptoz, H.[Hubert],
Text search for medieval manuscript images,
PR(40), No. 12, December 2007, pp. 3552-3567.
WWW Version.
0709
BibRef
Earlier:
Omnilingual segmentation-free word spotting for ancient manuscripts
indexation,
ICDAR05(I: 533-537).
WWW Version.
0508
BibRef
Earlier:
Serialized unsupervised classifier for adaptative color image
segmentation: application to digitized ancient manuscripts,
ICPR04(I: 494-497).
WWW Version.
0409
Word-spotting; Medieval manuscripts.
BibRef
Le Bourgeois, F.[Frank],
Kaileh, H.[Hala],
Automatic Metadata Retrieval from Ancient Manuscripts,
DAS04(75-89).
WWW Version.
0505
BibRef
Allier, B.,
Emptoz, H.,
Segmentation and typography extraction in document images using
geodesic active regions,
ICPR04(I: 409-412).
WWW Version.
0409
BibRef
Shafait, F.[Faisal],
Keysers, D.[Daniel],
Breuel, T.M.[Thomas M.],
Performance Evaluation and Benchmarking of Six-Page Segmentation
Algorithms,
PAMI(30), No. 6, June 2008, pp. 941-954.
WWW Version.
0804
Survey, Page Segmentation.
Evaluation, Page Segmentation.
BibRef
Earlier:
Performance Comparison of Six Algorithms for Page Segmentation,
DAS06(368-379).
WWW Version.
0602
BibRef
And:
Pixel-Accurate Representation and Evaluation of Page Segmentation in
Document Images,
ICPR06(I: 872-875).
WWW Version.
0609
Also uses a dummy program (no segmentation) as a minimum baseline.
Compares
X-Y Cut (
See also Prototype Document Image Analysis System for Technical Journals, A. ),
Run Length Smearing (
See also Document Analysis System. ),
Whitespace Analysis (
See also Two Geometric Algorithms for Layout Analysis. ), and
Constrained textline detection.
The last two,
Docstrum (
See also Document Spectrum for Page Layout Analysis, The. ) and
Voronoi (
See also Segmentation of Page Images Using the Area Voronoi Diagram. ),
are generally the best choice.
For a similar analysis:
See also Empirical Performance Evaluation Methodology and Its Application to Page Segmentation Algorithms.
BibRef
Peng, L.R.[Liang-Rui],
Chen, M.[Ming],
Liu, C.S.[Chang-Song],
Ding, X.Q.[Xiao-Qing],
Zheng, J.R.[Ji-Rong],
An automatic performance evaluation method for document page
segmentation,
ICDAR01(134-137).
WWW Version.
0109
BibRef
Fumera, G.,
Pillai, I.,
Roli, F.,
Classification with reject option in text categorisation systems,
CIAP03(582-587).
IEEE Abstract. IEEE Top Reference.
0310
BibRef
Ma, H.[Huanfeng],
Doermann, D.,
Gabor filter based multi-class classifier for scanned document images,
ICDAR03(968-972).
IEEE Abstract. IEEE Top Reference.
0311
BibRef
Allier, B.[Bénédicte],
Emptoz, H.[Hubert],
Type extraction and character prototyping using gabor filters,
ICDAR03(799-803).
IEEE Abstract. IEEE Top Reference.
0311
BibRef
And:
Character prototyping in document images using Gabor filters,
ICIP03(I: 537-540).
IEEE Abstract. IEEE Top Reference.
0312
BibRef
And:
SCIA03(28-35).
WWW Version.
0310
BibRef
Laurence, D.[Duffy],
Le Bourgeois, F.[Frank],
Emptoz, H.[Hubert],
Logical structure analysis by typographic characteristics extraction,
CIAP97(II: 639-646).
WWW Version.
9709
BibRef
Allier, B.,
Duong, J.,
Gagneux, A.,
Mallet, P.,
Emptoz, H.,
Texture feature characterization for logical pre-labeling,
ICDAR03(567-571).
IEEE Abstract. IEEE Top Reference.
0311
BibRef
Liu, L.J.[Li-Jie],
Dong, Y.[Yan],
Song, X.M.[Xiao-Mu],
Fan, G.L.[Guo-Liang],
An entropy-based segmentation algorithm for computer-generated
document images,
ICIP03(I: 541-544).
IEEE Abstract. IEEE Top Reference.
0312
BibRef
Antonacopoulos, A.,
Gatos, B.,
Bridson, D.,
ICDAR2005 page segmentation competition,
ICDAR05(I: 75-79).
WWW Version.
0508
BibRef
Earlier:
ICDAR 2003 page segmentation competition,
ICDAR03(688-692).
IEEE Abstract. IEEE Top Reference.
0311
BibRef
Leedham, G.,
Yan, C.[Chen],
Takru, K.,
Tan, J.H.N.[Joie Hadi Nata],
Mian, L.[Li],
Comparison of some thresholding algorithms for text/background
segmentation in difficult document images,
ICDAR03(859-864).
IEEE Abstract. IEEE Top Reference.
0311
BibRef
Leedham, G.,
Varma, S.,
Patankar, A.,
Govindaraju, V.,
Separating text and background in degraded document images:
A comparison of global thresholding techniques for
multi-stage thresholding,
FHR02(244-249).
IEEE Top Reference.
0209
BibRef
Kise, K.,
Miki, Y.,
Matsumoto, K.,
Stippling data on backgrounds of pages-toward seamless integration of
paper and electronic documents,
ICDAR03(1213-1217).
IEEE Abstract. IEEE Top Reference.
0311
BibRef
Kise, K.,
Yanagida, O.,
Takamatsu, S.,
Page Segmentation Based on Thinning of Background,
ICPR96(III: 788-792).
WWW Version.
9608
(Osaka Prefecture Univ., J)
BibRef
Kise, K.,
Yamaoka, M.,
Babaguchi, N.,
Tezuka, Y.,
Model based system for analyzing document images,
ICPR92(II:647-650).
WWW Version.
9208
BibRef
Suvichakorn, A.[Aimamorn],
Watcharabusaracum, S.[Sarin],
Sinthupinyo, W.[Wasin],
Simple Layout Segmentation of Gray-Scale Document Images,
DAS02(245 ff.).
HTML Version.
0303
BibRef
Caillault, E.,
Viard-Gaudin, C.,
Ahmad, A.R.,
MS-TDNN with global discriminant trainings,
ICDAR05(II: 856-860).
WWW Version.
0508
NN HMM.
BibRef
Golenzer, J.,
Viard-Gaudin, C.,
Lallican, P.M.,
Finding regions of interest in document images by planar HMM,
ICPR02(III: 415-418).
WWW Version.
0211
BibRef
Sivaramakrishnan, R.,
Phillips, I.T.,
Ha, J.,
Subramanium, S.,
Haralick, R.M.,
Zone Classification in a Document Using the Method of
Feature Vector Generation,
ICDAR95(541-544).
Pixel based, multiple classes.
BibRef
9500
Cheng, H.[Hui],
Fan, Z.G.[Zhi-Gang],
Background identification based segmentation and multilayer tree
representation of document images,
ICIP02(III: 1005-1008).
IEEE Abstract. IEEE Top Reference.
0210
BibRef
Blumenstein, M.,
Verma, B.,
Analysis of segmentation performance on the CEDAR benchmark database,
ICDAR01(1142-1146).
WWW Version.
0109
BibRef
Yang, Y.D.[Yu-Dong],
Zhang, H.J.[Hong-Jiang],
HTML page analysis based on visual cues,
ICDAR01(859-864).
WWW Version.
0109
BibRef
Mukherjee, D.P.[Dipti Prasad],
Acton, S.T.[Scott T.],
Document Page Segmentation using Multiscale Clustering,
ICIP99(I:234-238).
IEEE Abstract. IEEE Top Reference.
BibRef
9900
He, S.,
Abe, N.,
A Clustering-Based Approach to the Separation of Text Strings from
Mixed Text/Graphics Documents,
ICPR96(III: 706-710).
WWW Version.
9608
(National Univ. of Singapore, SGP)
BibRef
Randen, T.[Trygve],
Husøy, J.H.[John Håkon],
Segmentation of text/image documents using texture approaches,
Proc.
NOBIM-konferansen-94, Asker (Norway), June 1994, pp. 60-67.
HTML Version.
BibRef
9406
Fischer, S.,
Amin, A.,
Drivas, D.,
Segmentation of the Yellow Pages,
ICDAR95(605-609).
BibRef
9500
Randriamasy, S.,
Vincent, L.,
Benchmarking Page Segmentation Algorithms,
CVPR94(411-416).
IEEE Abstract. IEEE Top Reference.
BibRef
9400
Higashino, J.,
Fujisawa, H.,
Nakano, Y.,
Ejiri, M.,
A Knowledge-Based Segmentation Method for Document Understanding,
ICPR86(745-748).
Top-down layout analysis using FDL.
BibRef
8600
Makino, H.,
Representation and Segmentation of Document Images,
CVPR83(291-295).
BibRef
8300