Diffbot,
2011.
WWW Link.
Vendor, Document Analysis. Structural analysis of web pages; detects when changes occur.
Karatzas, D.[Dimosthenis],
Antonacopoulos, A.[Apostolos],
Colour text segmentation in web images based on human perception,
IVC(25), No. 5, 1 May 2007, pp. 564-577.
Elsevier DOI
0703
BibRef
Earlier:
Text extraction from web images based on a split-and-merge segmentation
method using colour perception,
ICPR04(II: 634-637).
IEEE DOI
0409
BibRef
Earlier:
Two approaches for text segmentation in web images,
ICDAR03(131-137).
IEEE DOI
0311
BibRef
Earlier: A2, A1:
Fuzzy Segmentation of Characters in Web Images Based on Human Colour
Perception,
DAS02(295 ff.).
Springer DOI
0303
Web document image analysis; Colour document analysis;
Character segmentation; Text segmentation; Colour images
BibRef
Ashraf, F.,
Ozyer, T.,
Alhajj, R.,
Employing Clustering Techniques for Automatic Information Extraction
From HTML Documents,
SMC-C(38), No. 5, September 2008, pp. 660-673.
IEEE DOI
0810
BibRef
Carullo, M.[Moreno],
Binaghi, E.[Elisabetta],
Gallo, I.[Ignazio],
An online document clustering technique for short web contents,
PRL(30), No. 10, 15 July 2009, pp. 870-876.
Elsevier DOI
0906
Online clustering; Short documents analysis; Similarity measures
BibRef
Carullo, M.[Moreno],
Binaghi, E.[Elisabetta],
Gallo, I.[Ignazio],
Lamberti, N.[Nicola],
Clustering of short commercial documents for the web,
ICPR08(1-4).
IEEE DOI
0812
BibRef
Borges, K.A.V.[Karla A.V.],
Davis, C.A.[Clodoveu A.],
Laender, A.H.F.[Alberto H.F.],
Medeiros, C.B.[Claudia Bauzer],
Ontology-driven discovery of geospatial evidence in web pages,
GeoInfo(15), No. 4, October 2011, pp. 609-631.
WWW Link.
1110
BibRef
Lu, W.T.[Wen-Ting],
Li, J.X.[Jing-Xuan],
Li, T.[Tao],
Guo, W.D.[Wei-Dong],
Zhang, H.G.[Hong-Gang],
Guo, J.[Jun],
Web Multimedia Object Classification Using Cross-Domain Correlation
Knowledge,
MultMed(15), No. 8, December 2013, pp. 1920-1929.
IEEE DOI
1402
Internet
BibRef
Lu, W.T.[Wen-Ting],
Li, L.[Lei],
Li, T.[Tao],
Zhang, H.G.[Hong-Gang],
Guo, J.[Jun],
Web Multimedia Object Clustering via Information Fusion,
ICDAR11(319-323).
IEEE DOI
1111
BibRef
Embley, D.W.[David W.],
Krishnamoorthy, M.S.[Mukkai S.],
Nagy, G.[George],
Seth, S.[Sharad],
Converting heterogeneous statistical tables on the web to searchable
databases,
IJDAR(19), No. 2, June 2016, pp. 119-138.
Springer DOI
1605
BibRef
Nagy, G.[George],
Seth, S.[Sharad],
Table headers: An entrance to the data mine,
ICPR16(4065-4070)
IEEE DOI
1705
Aggregates, Algorithm design and analysis, HTML, Indexing,
Resource description framework, Syntactics, category hierarchies,
spanning cells, table, headers
BibRef
Wu, O.,
Zuo, H.,
Hu, W.,
Li, B.,
Multimodal Web Aesthetics Assessment Based on Structural SVM and
Multitask Fusion Learning,
MultMed(18), No. 6, June 2016, pp. 1062-1076.
IEEE DOI
1605
Feature extraction
BibRef
Cormier, M.[Michael],
Moffatt, K.[Karyn],
Cohen, R.[Robin],
Mann, R.[Richard],
Purely vision-based segmentation of web pages for assistive
technology,
CVIU(148), No. 1, 2016, pp. 46-66.
Elsevier DOI
1606
Webpage segmentation
BibRef
Cormier, M.[Michael],
Mann, R.[Richard],
Moffatt, K.[Karyn],
Cohen, R.[Robin],
Towards an Improved Vision-Based Web Page Segmentation Algorithm,
CRV17(345-352)
IEEE DOI
1804
Internet, handicapped aids, image segmentation,
visual perception, back-end process, decluttering,
segmentation
BibRef
Mei, T.,
Li, L.,
Tian, X.,
Tao, D.,
Ngo, C.W.,
PageSense: Toward Stylewise Contextual Advertising via Visual
Analysis of Web Pages,
CirSysVideo(28), No. 1, January 2018, pp. 254-266.
IEEE DOI
1801
Advertising, Circuits and systems, Color, Layout,
Visualization, Vocabulary, Contextual advertising,
visual content analysis
BibRef
Kim, N.[Namyun],
Extracting and searching news articles in web portal news pages,
IJCVR(10), No. 3, 2020, pp. 202-212.
DOI Link
2005
BibRef
Huang, X.[Xia],
Chong, K.F.E.[Kai Fong Ernest],
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label
Non-conformity in Web Images Via a New Generalized KL Divergence,
IJCV(131), No. 11, November 2023, pp. 3035-3059.
Springer DOI
2310
BibRef
Zheng, Q.L.[Quan-Long],
Jiao, J.B.[Jian-Bo],
Cao, Y.[Ying],
Lau, R.W.H.[Rynson W. H.],
Task-Driven Webpage Saliency,
ECCV18(XIV: 300-316).
Springer DOI
1810
Where people look on a web page.
BibRef
Li, J.,
Su, L.,
Wu, B.,
Pang, J.,
Wang, C.,
Wu, Z.,
Huang, Q.,
Webpage saliency prediction with multi-features fusion,
ICIP16(674-678)
IEEE DOI
1610
Computational modeling
BibRef
Baharum, A.[Aslina],
Jaafar, A.[Azizah],
Identifying the Importance of Web Objects:
A Study of ASEAN Perspectives,
IVIC15(464-475).
Springer DOI
1511
BibRef
Goyal, A.,
Jadon, M.K.,
Pujari, A.K.,
Spectral approach to find number of clusters of short-text documents,
NCVPRIPG13(1-4)
IEEE DOI
1408
Markov processes
BibRef
Marinai, S.[Simone],
Marino, E.[Emanuele],
Soda, G.[Giovanni],
Conversion of PDF Books in ePub Format,
ICDAR11(478-482).
IEEE DOI
1111
BibRef
Karatzas, D.,
Mestre, S.R.[S. Robles],
Mas, J.,
Nourbakhsh, F.,
Roy, P.P.[P. Pratim],
ICDAR 2011 Robust Reading Competition - Challenge 1: Reading Text in
Born-Digital Images (Web and Email),
ICDAR11(1485-1490).
IEEE DOI
1111
BibRef
Liu, G.[Gang],
Qiu, B.[Bite],
Liu, W.Y.[Wen-Yin],
Automatic Detection of Phishing Target from Phishing Webpage,
ICPR10(4153-4156).
IEEE DOI
1008
BibRef
Hassan, T.[Tamir],
User-Guided Wrapping of PDF Documents Using Graph Matching Techniques,
ICDAR09(631-635).
IEEE DOI
0907
PDF does not have the structure given by HTML.
BibRef
Ghosh, S.[Saptarshi],
Mitra, P.[Pabitra],
Combining content and structure similarity for XML document
classification using composite SVM kernels,
ICPR08(1-4).
IEEE DOI
0812
BibRef
Hirano, T.[Takashi],
Okano, Y.[Yuichi],
Okada, Y.[Yasuhiro],
Yoda, F.[Fumio],
Text and Layout Information Extraction from Document Files of Various
Formats Based on the Analysis of Page Description Language,
ICDAR07(262-266).
IEEE DOI
0709
BibRef
Burget, R.,
Layout Based Information Extraction from HTML Documents,
ICDAR07(624-628).
IEEE DOI
0709
BibRef
Guo, H.,
Mahmud, J.,
Borodin, Y.,
Stent, A.,
Ramakrishnan, I.,
A General Approach for Partitioning Web Page Content Based on Geometric
and Style Information,
ICDAR07(929-933).
IEEE DOI
0709
BibRef
Yoshida, M.,
Nakagawa, H.,
Web Document Parsing:
A New Approach to Modeling Layout-Language Relations,
ICDAR07(203-207).
IEEE DOI
0709
BibRef
Ferilli, S.[Stefano],
Biba, M.[Marenglen],
Basile, T.M.A.[Teresa M.A.],
Esposito, F.[Floriana],
Incremental machine learning techniques for document layout
understanding,
ICPR08(1-4).
IEEE DOI
0812
BibRef
Esposito, F.,
Ferilli, S.,
di Mauro, N.,
Basile, T.M.A.,
Incremental Learning of First Order Logic Theories for the Automatic
Annotations of Web Documents,
ICDAR07(1093-1097).
IEEE DOI
0709
BibRef
Earlier: A1, A2, A4, A3:
Automatic Content-based Indexing of Digital Documents through
Intelligent Processing Techniques,
DIAL06(204-219).
IEEE DOI
0604
BibRef
Earlier: A1, A2, A4, A3:
Intelligent document processing,
ICDAR05(II: 1100-1104).
IEEE DOI
0508
See also Automatic Digital Document Processing and Management: Problems, Algorithms and Techniques.
BibRef
Watai, Y.[Yasuyuki],
Yamasaki, T.[Toshihiko],
Aizawa, K.[Kiyoharu],
View-Based Web Page Retrieval using Interactive Sketch Query,
ICIP07(VI: 357-360).
IEEE DOI
0709
BibRef
Ma, J.C.[Jun-Chang],
Gu, Z.M.[Zhi-Min],
A Shared Fragments Analysis System for Large Collections of Web Pages,
DAS06(390-401).
Springer DOI
0602
BibRef
Liu, W.Y.[Wen-Yin],
Huang, G.[Guanglin],
Liu, X.Y.[Xiao-Yue],
Deng, X.[Xiaotie],
Min, Z.[Zhang],
Phishing Web page detection,
ICDAR05(II: 560-564).
IEEE DOI
0508
BibRef
Feng, J.,
Haffner, P.,
Gilbert, M.,
A learning approach to discovering Web page semantic structures,
ICDAR05(II: 1055-1059).
IEEE DOI
0508
BibRef
Chao, H.[Hui],
Lin, X.F.[Xiao Fan],
Capturing the layout of electronic documents for reuse in variable data
printing,
ICDAR05(II: 940-944).
IEEE DOI
0508
BibRef
Chao, H.[Hui],
Fan, J.[Jian],
Layout and Content Extraction for PDF Documents,
DAS04(213-224).
Springer DOI
0505
BibRef
Behera, A.,
Lalanne, D.,
Ingold, R.,
Enhancement of layout-based identification of low-resolution documents
using geometrical color distribution,
ICDAR05(I: 468-472).
IEEE DOI
0508
BibRef
Mekhaldi, D.[Dalila],
Lalanne, D.[Denis],
Ingold, R.[Rolf],
From searching to browsing through multimodal documents linking,
ICDAR05(II: 924-928).
IEEE DOI
0508
BibRef
Earlier:
Unity Is Strength: Coupling Media for Thematic Segmentation,
DAS04(559-562).
Springer DOI
0505
BibRef
Rigamonti, M.,
Bloechle, J.L.,
Hadjar, K.,
Lalanne, D.,
Ingold, R.,
Towards a canonical and structured representation of PDF documents
through reverse engineering,
ICDAR05(II: 1050-1054).
IEEE DOI
0508
BibRef
Hadjar, K.,
Rigamonti, M.,
Lalanne, D.,
Ingold, R.,
Xed: a new tool for extracting hidden structures from electronic
documents,
DIAL04(212-224).
IEEE DOI
0404
BibRef
Hadjar, K.,
Ingold, R.,
Logical labeling of Arabic newspapers using artificial neural nets,
ICDAR05(I: 426-430).
IEEE DOI
0508
BibRef
Schenker, A.[Adam],
Bunke, H.[Horst],
Last, M.[Mark],
Kandel, A.[Abraham],
A Graph-Based Framework for Web Document Mining,
DAS04(401-412).
Springer DOI
0505
BibRef
Schenker, A.[Adam],
Last, M.[Mark],
Bunke, H.[Horst],
Kandel, A.[Abraham],
Classification of web documents using a graph model,
ICDAR03(240-244).
IEEE DOI
0311
BibRef
Vitali, F.[Fabio],
di Iorio, A.[Angelo],
Campori, E.V.[Elisa Ventura],
Rule-Based Structural Analysis of Web Pages,
DAS04(425-437).
Springer DOI
0505
BibRef
Hu, J.Y.[Jian-Ying],
Bagga, A.,
Identifying story and preview images in news web pages,
ICDAR03(640-644).
IEEE DOI
0311
BibRef
Ramachandran, S.,
Kashi, R.,
An architecture for ink annotations on web documents,
ICDAR03(256-260).
IEEE DOI
0311
BibRef
Gagneux, A.,
Emptoz, H.,
Web site: a structured document,
ICDAR03(1158-1162).
IEEE DOI
0311
BibRef
Mukherjee, S.,
Yang, G.Z.[Gui-Zhen],
Tan, W.F.[Wen-Fang],
Ramakrishnan, I.V.,
Automatic discovery of semantic structures in HTML documents,
ICDAR03(245-249).
IEEE DOI
0311
BibRef
Alam, H.,
Kumar, A.,
Nakamura, M.,
Rahman, F.,
Tarnikova, Y.,
Wilcox, C.[Che],
Structured and unstructured document summarization: Design of a
commercial summarizer using Lexical chains,
ICDAR03(1147-1152).
IEEE DOI
0311
BibRef
Rahman, F.,
Alam, H.,
A commercial Web based digital library for sharing and distributing
documents,
DIAL04(93-103).
IEEE DOI
0404
BibRef
Alam, H.,
Hartono, R.,
Kumar, A.,
Rahman, F.,
Tarnikova, Y.,
Wilcox, C.[Che],
Web page summarization for handheld devices: a natural language
approach,
ICDAR03(1153-1158).
IEEE DOI
0311
BibRef
Rahman, A.F.R.,
Alam, H.,
Hartono, R.,
Ariyoshi, K.,
Automatic summarization of Web content to smaller display devices,
ICDAR01(1064-1068).
IEEE DOI
0109
BibRef
Serradura, L.,
Slimane, M.,
Vincent, N.,
Web sites thematic classification using hidden Markov models,
ICDAR01(1094-1098).
IEEE DOI
0109
BibRef
Penn, G.,
Hu, J.Y.[Jian-Ying],
Luo, H.B.[Heng-Bin],
McDonald, R.,
Flexible Web document analysis for delivery to narrow-bandwidth devices,
ICDAR01(1074-1078).
IEEE DOI
0109
BibRef
Anjewierden, A.,
AIDAS: incremental logical structure discovery in PDF documents,
ICDAR01(374-378).
IEEE DOI
0109
BibRef
Athitsos, V.,
Swain, M.J.,
Frankel, C.,
Distinguishing photographs and graphics on the World Wide Web,
CBAIVL97(10).
IEEE DOI
9706
BibRef
Chapter on OCR, Document Analysis and Character Recognition Systems continues in
Document Retrieval Systems, Databases and Issues, Libraries.