25.2.2.3.13 Document Layout, Structure Analysis, Web Documents, Online Documents

Document Analysis. Document Layout. Web Pages. Application, Document Layout.

Diffbot,
2011.
WWW Link. Vendor, Document Analysis. Structural analysis of web pages; observes when changes occur.

Karatzas, D.[Dimosthenis], Antonacopoulos, A.[Apostolos],
Colour text segmentation in web images based on human perception,
IVC(25), No. 5, 1 May 2007, pp. 564-577.
Elsevier DOI 0703
BibRef
Earlier:
Text extraction from web images based on a split-and-merge segmentation method using colour perception,
ICPR04(II: 634-637).
IEEE DOI 0409
BibRef
Earlier:
Two approaches for text segmentation in web images,
ICDAR03(131-137).
IEEE DOI 0311
BibRef
Earlier: A2, A1:
Fuzzy Segmentation of Characters in Web Images Based on Human Colour Perception,
DAS02(295 ff.).
Springer DOI 0303
Web document image analysis; Colour document analysis; Character segmentation; Text segmentation; Colour images BibRef

Ashraf, F., Ozyer, T., Alhajj, R.,
Employing Clustering Techniques for Automatic Information Extraction From HTML Documents,
SMC-C(38), No. 5, September 2008, pp. 660-673.
IEEE DOI 0810
BibRef

Carullo, M.[Moreno], Binaghi, E.[Elisabetta], Gallo, I.[Ignazio],
An online document clustering technique for short web contents,
PRL(30), No. 10, 15 July 2009, pp. 870-876.
Elsevier DOI 0906
Online clustering; Short documents analysis; Similarity measures BibRef

Carullo, M.[Moreno], Binaghi, E.[Elisabetta], Gallo, I.[Ignazio], Lamberti, N.[Nicola],
Clustering of short commercial documents for the web,
ICPR08(1-4).
IEEE DOI 0812
BibRef

Borges, K.A.V.[Karla A.V.], Davis, C.A.[Clodoveu A.], Laender, A.H.F.[Alberto H.F.], Medeiros, C.B.[Claudia Bauzer],
Ontology-driven discovery of geospatial evidence in web pages,
GeoInfo(15), No. 4, October 2011, pp. 609-631.
WWW Link. 1110
BibRef

Lu, W.T.[Wen-Ting], Li, J.X.[Jing-Xuan], Li, T.[Tao], Guo, W.D.[Wei-Dong], Zhang, H.G.[Hong-Gang], Guo, J.[Jun],
Web Multimedia Object Classification Using Cross-Domain Correlation Knowledge,
MultMed(15), No. 8, December 2013, pp. 1920-1929.
IEEE DOI 1402
Internet BibRef

Lu, W.T.[Wen-Ting], Li, L.[Lei], Li, T.[Tao], Zhang, H.G.[Hong-Gang], Guo, J.[Jun],
Web Multimedia Object Clustering via Information Fusion,
ICDAR11(319-323).
IEEE DOI 1111
BibRef

Embley, D.W.[David W.], Krishnamoorthy, M.S.[Mukkai S.], Nagy, G.[George], Seth, S.[Sharad],
Converting heterogeneous statistical tables on the web to searchable databases,
IJDAR(19), No. 2, June 2016, pp. 119-138.
Springer DOI 1605
BibRef

Nagy, G.[George], Seth, S.[Sharad],
Table headers: An entrance to the data mine,
ICPR16(4065-4070)
IEEE DOI 1705
Aggregates, Algorithm design and analysis, HTML, Indexing, Resource description framework, Syntactics, category hierarchies, spanning cells, table, headers BibRef

Wu, O., Zuo, H., Hu, W., Li, B.,
Multimodal Web Aesthetics Assessment Based on Structural SVM and Multitask Fusion Learning,
MultMed(18), No. 6, June 2016, pp. 1062-1076.
IEEE DOI 1605
Feature extraction BibRef

Cormier, M.[Michael], Moffatt, K.[Karyn], Cohen, R.[Robin], Mann, R.[Richard],
Purely vision-based segmentation of web pages for assistive technology,
CVIU(148), No. 1, 2016, pp. 46-66.
Elsevier DOI 1606
Webpage segmentation BibRef

Cormier, M.[Michael], Mann, R.[Richard], Moffatt, K.[Karyn], Cohen, R.[Robin],
Towards an Improved Vision-Based Web Page Segmentation Algorithm,
CRV17(345-352)
IEEE DOI 1804
Internet, handicapped aids, image segmentation, visual perception, back-end process, decluttering, segmentation BibRef

Mei, T., Li, L., Tian, X., Tao, D., Ngo, C.W.,
PageSense: Toward Stylewise Contextual Advertising via Visual Analysis of Web Pages,
CirSysVideo(28), No. 1, January 2018, pp. 254-266.
IEEE DOI 1801
Advertising, Circuits and systems, Color, Layout, Visualization, Vocabulary, Contextual advertising, visual content analysis BibRef

Kim, N.[Namyun],
Extracting and searching news articles in web portal news pages,
IJCVR(10), No. 3, 2020, pp. 202-212.
DOI Link 2005
BibRef

Huang, X.[Xia], Chong, K.F.E.[Kai Fong Ernest],
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence,
IJCV(131), No. 11, November 2023, pp. 3035-3059.
Springer DOI 2310
BibRef


Sadeghi, Z.[Zahra], Homayounvala, E.[Elaheh], Borhani, M.[Mostafa],
HCI for Elderly, Measuring Visual Complexity of Webpages Based on Machine Learning,
DICTA20(1-6)
IEEE DOI 2201
Human computer interaction, Computers, Visualization, Senior citizens, Web pages, Complexity theory, Clutter, personalization BibRef

Zheng, Q.L.[Quan-Long], Jiao, J.B.[Jian-Bo], Cao, Y.[Ying], Lau, R.W.H.[Rynson W. H.],
Task-Driven Webpage Saliency,
ECCV18(XIV: 300-316).
Springer DOI 1810
Where people look on a web page. BibRef

Li, J., Su, L., Wu, B., Pang, J., Wang, C., Wu, Z., Huang, Q.,
Webpage saliency prediction with multi-features fusion,
ICIP16(674-678)
IEEE DOI 1610
Computational modeling BibRef

Baharum, A.[Aslina], Jaafar, A.[Azizah],
Identifying the Importance of Web Objects: A Study of ASEAN Perspectives,
IVIC15(464-475).
Springer DOI 1511
BibRef

Goyal, A., Jadon, M.K., Pujari, A.K.,
Spectral approach to find number of clusters of short-text documents,
NCVPRIPG13(1-4)
IEEE DOI 1408
Markov processes BibRef

Marinai, S.[Simone], Marino, E.[Emanuele], Soda, G.[Giovanni],
Conversion of PDF Books in ePub Format,
ICDAR11(478-482).
IEEE DOI 1111
BibRef

Karatzas, D., Mestre, S.R.[S. Robles], Mas, J., Nourbakhsh, F., Roy, P.P.[P. Pratim],
ICDAR 2011 Robust Reading Competition - Challenge 1: Reading Text in Born-Digital Images (Web and Email),
ICDAR11(1485-1490).
IEEE DOI 1111
BibRef

Liu, G.[Gang], Qiu, B.[Bite], Liu, W.Y.[Wen-Yin],
Automatic Detection of Phishing Target from Phishing Webpage,
ICPR10(4153-4156).
IEEE DOI 1008
BibRef

Hassan, T.[Tamir],
User-Guided Wrapping of PDF Documents Using Graph Matching Techniques,
ICDAR09(631-635).
IEEE DOI 0907
PDF does not have the structure given by HTML. BibRef

Ghosh, S.[Saptarshi], Mitra, P.[Pabitra],
Combining content and structure similarity for XML document classification using composite SVM kernels,
ICPR08(1-4).
IEEE DOI 0812
BibRef

Hirano, T.[Takashi], Okano, Y.[Yuichi], Okada, Y.[Yasuhiro], Yoda, F.[Fumio],
Text and Layout Information Extraction from Document Files of Various Formats Based on the Analysis of Page Description Language,
ICDAR07(262-266).
IEEE DOI 0709
BibRef

Burget, R.,
Layout Based Information Extraction from HTML Documents,
ICDAR07(624-628).
IEEE DOI 0709
BibRef

Guo, H., Mahmud, J., Borodin, Y., Stent, A., Ramakrishnan, I.,
A General Approach for Partitioning Web Page Content Based on Geometric and Style Information,
ICDAR07(929-933).
IEEE DOI 0709
BibRef

Yoshida, M., Nakagawa, H.,
Web Document Parsing: A New Approach to Modeling Layout-Language Relations,
ICDAR07(203-207).
IEEE DOI 0709
BibRef

Ferilli, S.[Stefano], Biba, M.[Marenglen], Basile, T.M.A.[Teresa M.A.], Esposito, F.[Floriana],
Incremental machine learning techniques for document layout understanding,
ICPR08(1-4).
IEEE DOI 0812
BibRef

Esposito, F., Ferilli, S., di Mauro, N., Basile, T.M.A.,
Incremental Learning of First Order Logic Theories for the Automatic Annotations of Web Documents,
ICDAR07(1093-1097).
IEEE DOI 0709
BibRef
Earlier: A1, A2, A4, A3:
Automatic Content-based Indexing of Digital Documents through Intelligent Processing Techniques,
DIAL06(204-219).
IEEE DOI 0604
BibRef
Earlier: A1, A2, A4, A3:
Intelligent document processing,
ICDAR05(II: 1100-1104).
IEEE DOI 0508

See also Automatic Digital Document Processing and Management: Problems, Algorithms and Techniques. BibRef

Watai, Y.[Yasuyuki], Yamasaki, T.[Toshihiko], Aizawa, K.[Kiyoharu],
View-Based Web Page Retrieval using Interactive Sketch Query,
ICIP07(VI: 357-360).
IEEE DOI 0709
BibRef

Ma, J.C.[Jun-Chang], Gu, Z.M.[Zhi-Min],
A Shared Fragments Analysis System for Large Collections of Web Pages,
DAS06(390-401).
Springer DOI 0602
BibRef

Liu, W.Y.[Wen-Yin], Huang, G.[Guanglin], Liu, X.Y.[Xiao-Yue], Deng, X.[Xiaotie], Min, Z.[Zhang],
Phishing Web page detection,
ICDAR05(II: 560-564).
IEEE DOI 0508
BibRef

Feng, J., Haffner, P., Gilbert, M.,
A learning approach to discovering Web page semantic structures,
ICDAR05(II: 1055-1059).
IEEE DOI 0508
BibRef

Chao, H.[Hui], Lin, X.F.[Xiao Fan],
Capturing the layout of electronic documents for reuse in variable data printing,
ICDAR05(II: 940-944).
IEEE DOI 0508
BibRef

Chao, H.[Hui], Fan, J.[Jian],
Layout and Content Extraction for PDF Documents,
DAS04(213-224).
Springer DOI 0505
BibRef

Behera, A., Lalanne, D., Ingold, R.,
Enhancement of layout-based identification of low-resolution documents using geometrical color distribution,
ICDAR05(I: 468-472).
IEEE DOI 0508
BibRef

Mekhaldi, D.[Dalila], Lalanne, D.[Denis], Ingold, R.[Rolf],
From searching to browsing through multimodal documents linking,
ICDAR05(II: 924-928).
IEEE DOI 0508
BibRef
Earlier:
Unity Is Strength: Coupling Media for Thematic Segmentation,
DAS04(559-562).
Springer DOI 0505
BibRef

Rigamonti, M., Bloechle, J.L., Hadjar, K., Lalanne, D., Ingold, R.,
Towards a canonical and structured representation of PDF documents through reverse engineering,
ICDAR05(II: 1050-1054).
IEEE DOI 0508
BibRef

Hadjar, K., Rigamonti, M., Lalanne, D., Ingold, R.,
Xed: a new tool for extracting hidden structures from electronic documents,
DIAL04(212-224).
IEEE DOI 0404
BibRef

Hadjar, K., Ingold, R.,
Logical labeling of Arabic newspapers using artificial neural nets,
ICDAR05(I: 426-430).
IEEE DOI 0508
BibRef

Schenker, A.[Adam], Bunke, H.[Horst], Last, M.[Mark], Kandel, A.[Abraham],
A Graph-Based Framework for Web Document Mining,
DAS04(401-412).
Springer DOI 0505
BibRef

Schenker, A.[Adam], Last, M.[Mark], Bunke, H.[Horst], Kandel, A.[Abraham],
Classification of web documents using a graph model,
ICDAR03(240-244).
IEEE DOI 0311
BibRef

Vitali, F.[Fabio], di Iorio, A.[Angelo], Campori, E.V.[Elisa Ventura],
Rule-Based Structural Analysis of Web Pages,
DAS04(425-437).
Springer DOI 0505
BibRef

Hu, J.Y.[Jian-Ying], Bagga, A.,
Identifying story and preview images in news web pages,
ICDAR03(640-644).
IEEE DOI 0311
BibRef

Ramachandran, S., Kashi, R.,
An architecture for ink annotations on web documents,
ICDAR03(256-260).
IEEE DOI 0311
BibRef

Gagneux, A., Emptoz, H.,
Web site: a structured document,
ICDAR03(1158-1162).
IEEE DOI 0311
BibRef

Mukherjee, S., Yang, G.Z.[Gui-Zhen], Tan, W.F.[Wen-Fang], Ramakrishnan, I.V.,
Automatic discovery of semantic structures in HTML documents,
ICDAR03(245-249).
IEEE DOI 0311
BibRef

Alam, H., Kumar, A., Nakamura, M., Rahman, F., Tarnikova, Y., Wilcox, C.[Che],
Structured and unstructured document summarization: Design of a commercial summarizer using Lexical chains,
ICDAR03(1147-1152).
IEEE DOI 0311
BibRef

Rahman, F., Alam, H.,
A commercial Web based digital library for sharing and distributing documents,
DIAL04(93-103).
IEEE DOI 0404
BibRef

Alam, H., Hartono, R., Kumar, A., Rahman, F., Tarnikova, Y., Wilcox, C.[Che],
Web page summarization for handheld devices: a natural language approach,
ICDAR03(1153-1158).
IEEE DOI 0311
BibRef

Rahman, A.F.R., Alam, H., Hartono, R., Ariyoshi, K.,
Automatic summarization of Web content to smaller display devices,
ICDAR01(1064-1068).
IEEE DOI 0109
BibRef

Serradura, L., Slimane, M., Vincent, N.,
Web sites thematic classification using hidden Markov models,
ICDAR01(1094-1098).
IEEE DOI 0109
BibRef

Penn, G., Hu, J.Y.[Jian-Ying], Luo, H.B.[Heng-Bin], McDonald, R.,
Flexible Web document analysis for delivery to narrow-bandwidth devices,
ICDAR01(1074-1078).
IEEE DOI 0109
BibRef

Anjewierden, A.,
AIDAS: incremental logical structure discovery in PDF documents,
ICDAR01(374-378).
IEEE DOI 0109
BibRef

Athitsos, V., Swain, M.J., Frankel, C.,
Distinguishing photographs and graphics on the World Wide Web,
CBAIVL97(10).
IEEE DOI 9706
BibRef

Chapter on OCR, Document Analysis and Character Recognition Systems continues in
Document Retrieval Systems, Databases and Issues, Libraries.


Last update: Oct 22, 2024 at 22:09:59