| _ | text | _ |
| 2D and 3D Video Scene | text | Classification |
| 2LSPE: 2D Learnable Sinusoidal Positional Encoding using Transformer for Scene | text | Recognition |
| 360PanT: Training-Free | text | -Driven 360-Degree Panorama-to-Panorama Translation |
| 3D Highlighter: Localizing Regions on 3D Shapes via | text | Descriptions |
| 3D Human Motion Generation from the | text | Via Gesture Action Classification and the Autoregressive Model |
| 3D-Aware | text | -Driven Talking Avatar Generation |
| 3D-SceneDreamer: | text | -Driven 3D-Consistent Scene Generation |
| 3D-VisTA: Pre-trained Transformer for 3D Vision and | text | Alignment |
| 4D-fy: | text | -to-4D Generation Using Hybrid Score Distillation Sampling |
| A-STAR: Test-time Attention Segregation and Retention for | text | -to-image Synthesis |
| ABCNet v2: Adaptive Bezier-Curve Network for Real-Time End-to-End | text | Spotting |
| ABCNet: Real-Time Scene | text | Spotting With Adaptive Bezier-Curve Network |
| ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene | text | Spotting |
| Ablating Concepts in | text | -to-Image Diffusion Models |
| Accurate Arbitrary-Shaped Scene | text | Detection via Iterative Polynomial Parameter Regression |
| Accurate Detection for Scene | text | s with a Cascaded CNN Networks |
| Accurate Scene | text | Detection Through Border Semantics Awareness and Bootstrapping |
| Accurate Scene | text | Detection Via Scale-Aware Data Augmentation and Shape Similarity Constraint |
| Accurate Scene | text | Recognition Based on Recurrent Neural Network |
| Accurate Scene | text | Recognition with Efficient Model Scaling and Cloze Self-Distillation |
| Accurate Segmentation-Based Scene | text | Detector with Context Attention and Repulsive Text Border, An |
| Accurate | text | localization in images based on SVM output scores |
| Accurate Threshold Insensitive Kernel Detector for Arbitrary Shaped | text | , An |
| Accurate video | text | detection through classification of low and high contrast images |
| Accurate, data-efficient, unconstrained | text | recognition with convolutional neural networks |
| ACE: Anti-Editing Concept Erasure in | text | -to-Image Models |
| Acquire and then Adapt: Squeezing out | text | -to-Image Model for Image Restoration |
| ActBERT: Learning Global-Local Video- | text | Representations |
| Active Collection of Land Cover Sample Data from Geo-Tagged Web | text | s |
| Active Contours Network to Straighten Distorted | text | Lines |
| active learning approach to frequent itemset-based | text | clustering, An |
| Active Learning With Complementary Sampling for Instructing Class-Biased Multi-Label | text | Emotion Classification |
| Activity Recognition Applications from Con | text | ual Video-Text Fusion |
| ActivityCLIP: Enhancing group activity recognition by mining complementary information from | text | to supplement image modality |
| Actor and Action Modular Network for | text | -Based Video Segmentation |
| AdaBoost for | text | Detection in Natural Scene |
| Adapting Style and Content for Attended | text | Sequence Recognition |
| Adapting | text | -to-Image Generation with Feature Difference Instruction for Generic Image Restoration |
| Adaptive Algorithm for | text | Detection from Natural Scenes, An |
| Adaptive Boundary Proposal Network for Arbitrary Shape | text | Detection |
| Adaptive Correlation Filtering Method for | text | -Based Person Search, An |
| Adaptive Fuzzy | text | Segmentation in Images with Complex Backgrounds Using Color and Texture |
| Adaptive fuzzy wavelet algorithm for | text | -independent speaker recognition |
| Adaptive Geoparsing Method for Toponym Recognition and Resolution in Unstructured | text | |
| Adaptive Latent Graph Representation Learning for Image- | text | Matching |
| Adaptive method for multi colored | text | binarization |
| Adaptive multi- | text | union for stable text-to-image synthesis learning |
| Adaptive Offline Quintuplet Loss for Image- | text | Matching |
| Adaptive Region Growing Color Segmentation for | text | Using Irregular Pyramid |
| Adaptive Scene | text | Detection Based on Transferring Adaboost |
| Adaptive scene- | text | binarisation on images captured by smartphones |
| Adaptive Script-Independent Block-Based | text | Line Extraction, An |
| Adaptive Script-Independent | text | Line Extraction |
| Adaptive | text | Recognition Through Visual Matching |
| Adding Conditional Control to | text | -to-Image Diffusion Models |
| Addressing Information Inequality for | text | -Based Person Search via Pedestrian-Centric Visual Denoising and Bias-Aware Alignments |
| ADNet: Rethinking the Shrunk Polygon-Based Approach in Scene | text | Detection |
| Advance One-Shot Multispectral Instance Detection With | text | 's Supervision |
| Advancing Zero-Shot Digital Human Quality Assessment Through | text | -Prompted Evaluation |
| Adversarial and Isotropic Gradient Augmentation for Image Retrieval With | text | Feedback |
| Adversarial Attribute- | text | Embedding for Person Search With Natural Language Query |
| Adversarial learning based attentional scene | text | recognizer |
| Adversarial Representation Learning for | text | -to-Image Matching |
| Adversarial Robustification via | text | -to-image Diffusion Models |
| Adversarial Synthesis of Human Pose from | text | |
| Adversarial | text | to Continuous Image Generation |
| Adversarial Training Lattice LSTM for Named Entity Recognition of Rail Fault | text | s |
| ADVMIX: Data Augmentation for Accurate Scene | text | Spotting |
| Ae | text | spotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting |
| AEA-FIRM: Adaptive Elastic Alignment with Fine-Grained Representation Mining for | text | -Based Aerial Pedestrian Retrieval |
| Aesthetic | text | Logo Synthesis via Content-aware Layout Inferring |
| Affective Image Editing: Shaping Emotional Factors via | text | Descriptions |
| Affective Image Filter: Reflecting Emotions from | text | to Images |
| Agent-Based Control Prompt Tuning for Video- | text | Retrieval |
| Aggregating Image and | text | Quantized Correlated Components |
| Aggregating Local and Global | text | Features for Linguistic Steganalysis |
| Aggregating Local Con | text | for Accurate Scene Text Detection |
| AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of | text | -to-Video Generation with LMM |
| AITTI: Learning Adaptive Inclusive Token for | text | -to-Image Generation |
| Aletheia: An Advanced Document Layout and | text | Ground-Truthing System for Production Environments |
| Algorithm for Colour-Based Natural Scene | text | Segmentation, An |
| Algorithm for Matching OCR-Generated | text | Strings, An |
| Algorithm for Reducing | text | Line Candidates of Incorrect Orientation, An |
| Algorithm for | text | page up/down orientation determination |
| Algorithms for compressing compound document images with large | text | /background overlap |
| ALIF: A dataset for Arabic embedded | text | recognition in TV broadcast |
| Align and Retrieve: Composition and Decomposition Learning in Image Retrieval With | text | Feedback |
| Align Your Gaussians: | text | -to-4D with Dynamic 3D Gaussians and Composed Diffusion Models |
| Aligning | text | and Document Illustrations: Towards Visually Explainable Digital Humanities |
| Aligning | text | -to-Image Diffusion Models With Constrained Reinforcement Learning |
| AlignIT: Enhancing Prompt Alignment in Customization of | text | -to-Image Models |
| Alignment and Generation Adapter for Efficient Video- | text | Understanding |
| Alignment of Curved | text | Strings for Enhanced OCR Readability |
| Alignment of free layout color | text | s for character recognition |
| Alignment of Paragraphs in Bilingual | text | s Using Bilingual Dictionaries and Dynamic Programming |
| All You Need Is a Second Look: Towards Arbitrary-Shaped | text | Detection |
| ALR-GAN: Adaptive Layout Refinement for | text | -to-Image Synthesis |
| alternative framework for univariate filter based feature selection for | text | categorization, An |
| ALTID: Arabic/Latin | text | Images Database for recognition research |
| AMITA: Attribute-Guided Masked Image- | text | Alignment for Multi-Label Image Representation |
| AMO Sampler: Enhancing | text | Rendering with Overshooting |
| Analysis of Features and Metrics for Alignment in | text | -Dependent Voice Conversion |
| Analysis of the Novel Transformer Module Combination for Scene | text | Recognition |
| Analytical evaluation of term weighting schemes for | text | categorization |
| anchor-free region proposal network for Faster R-CNN-based | text | detection approaches, An |
| Ancient document analysis based on | text | line extraction |
| AniClipart: Clipart Animation with | text | -to-Video Priors |
| Animatabledreamer: | text | -guided Non-rigid 3d Model Generation and Reconstruction with Canonical Score Distillation |
| AniMo: Species-Aware Model for | text | -Driven Animal Motion Generation |
| Annotated Databases for the Recognition of Screen-Rendered | text | |
| ANNP: a neural network parser for real world | text | s |
| Anonymizing Temporal Phrases in Natural Language | text | to be Posted on Social Networking Services |
| Anti-DreamBooth: Protecting users from personalized | text | -to-image synthesis |
| Anycontrol: Create Your Artwork with Versatile Control on | text | -to-image Generation |
| AnyFace++: A Unified Framework for Free-Style | text | -to-Face Synthesis and Manipulation |
| AnyFace: Free-style | text | -to-Face Synthesis and Manipulation |
| AON: Towards Arbitrarily-Oriented | text | Recognition |
| Application of autoregressive models to the study of the temporal structure of a handwritten | text | |
| Application of Cluster Detection to | text | and Picture Processing, An |
| Application of Novel Chaotic Neural Networks to | text | Classification Based on PCA |
| Application of Planar Motion Segmentation for Scene | text | Extraction |
| Apply Hierarchical-Chain-of-Generation to Complex Attributes | text | -to-3D Generation |
| Applying GIS and | text | Mining Methods to Twitter Data to Explore the Spatiotemporal Patterns of Topics of Interest in Kuwait |
| Applying the conjugate gradient method for | text | document categorization |
| approach for detecting and cleaning of struck-out handwritten | text | , An |
| approach for handwritten Chinese | text | recognition unifying character segmentation and recognition, An |
| Approach for Recognizing | text | Labels in Raster Maps, An |
| approach to extracting the target | text | line from a document image captured by a pen scanner, An |
| approach to get overall emotion from comment | text | towards a certain image uploaded to social network using Latent Semantic Analysis, An |
| Approximate String Match for Garbled | text | with Various Accuracies, An |
| Arabic character recognition system: A statistical approach for recognizing cursive typewritten | text | |
| Arabic hand-written | text | -line extraction |
| Arabic handwritten | text | s clusterization based on Feature Relation Graph (FRG) |
| Arabic ligatures: Analysis and application in | text | recognition |
| Arabic | text | detection in videos using neural and boosting-based approaches: Application to video indexing |
| Arbitrarily oriented | text | detection using geodesic distances between corners and skeletons |
| Arbitrarily Shaped Scene | text | Detection With a Mask Tightness Text Detector |
| Arbitrarily shaped scene | text | detection with dynamic convolution |
| Arbitrarily-Oriented | text | Detection in Low Light Natural Scene Images |
| Arbitrary Shape Scene | text | Detection With Adaptive Text Region Representation |
| Arbitrary Shape | text | Detection using Transformers |
| Arbitrary Shape | text | Detection via Boundary Transformer |
| Arbitrary Shape | text | Detection via Segmentation with Probability Maps |
| Arbitrary Style Guidance for Enhanced Diffusion-Based | text | -to-Image Generation |
| Arbitrary-Oriented Scene | text | Detection via Rotation Proposals |
| Arbitrary-Shape Scene | text | Detection via Visual-Relational Rectification and Contour Approximation |
| architecture for handwritten | text | recognition systems, An |
| Are 2D-LSTM really dead for offline | text | recognition? |
| Are All Combinations Equal? Combining | text | ual and Visual Features with Multiple Space Learning for Text-based Video Retrieval |
| Are Digraphs Good for Free- | text | Keystroke Dynamics? |
| Are They Different? Affect, Feeling, Emotion, Sentiment, and Opinion Detection in | text | |
| ARES: | text | -Driven Automatic Realistic Simulator for Autonomous Traffic |
| ARRPNGAN: | text | -to-image GAN with attention regularization and region proposal networks |
| ArtAdapter: | text | -to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation |
| ArtGlyphDiffuser: | text | -driven artistic glyph generation via Style-to-CLIP Projection and Multi-Level Controlled diffusion |
| ARTIST: Improving the Generation of | text | -Rich Images with Disentangled Diffusion Models and Large Language Models |
| Artistic Style Transfer via Fine-Grained | text | Guidance and Contrastive Semantics Similarity |
| ART•V: Auto-Regressive | text | -to-Video Generation with Diffusion Models |
| ASAYAR: A Dataset for Arabic-Latin Scene | text | Localization in Highway Traffic Panels |
| Assessing Affective Dimensions of Play in Psychodynamic Child Psychotherapy via | text | Analysis |
| Assessing Image and | text | Generation with Topological Analysis and Fuzzy Logic |
| Assessing similarity in handwritten | text | s |
| Assessing User Experience of | text | Readability with Eye Tracking in Virtual Reality |
| Assistive | text | Reading from Complex Background for Blind Persons |
| Associating | text | and graphics for scientific chart understanding |
| ASTER: An Attentional Scene | text | Recognizer with Flexible Rectification |
| ASTS: A Unified Framework for Arbitrary Shape | text | Spotting |
| Asymmetric Cross-Scale Alignment for | text | -Based Person Search |
| ATA: Adaptive Transformation Agent for | text | -Guided Subject-Position Variable Background Inpainting |
| ATM: Attentional | text | Matting |
| AToM: Aligning | text | -to-Motion Model at Event-Level with GPT-4Vision Reward |
| ATT3D: Amortized | text | -to-3D Object Synthesis |
| Attend, Correct and Focus: A Bidirectional Correct Attention Network for Image- | text | Matching |
| Attention Calibration for Disentangled | text | -to-Image Personalization |
| Attention Guidance by Cross-Domain Supervision Signals for Scene | text | Recognition |
| attention-based row-column encoder-decoder model for | text | recognition in Japanese historical documents, An |
| Attention-Bridged Modal Interaction for | text | -to-Image Generation |
| Attentionhand: | text | -driven Controllable Hand Image Generation for 3d Hand Reconstruction in the Wild |
| AttnGAN: Fine-Grained | text | to Image Generation with Attentional Generative Adversarial Networks |
| Attribute-Centric Compositional | text | -to-Image Generation |
| Attribute-Centric Cross-Modal Alignment for Weakly Supervised | text | -Based Person Re-ID |
| AttriDiffuser: Adversarially enhanced diffusion model for | text | -to-facial attribute image synthesis |
| AttT2M: | text | -Driven Human Motion Generation with Multi-Perspective Attention Mechanism |
| Audio Visual Segmentation through | text | Embeddings |
| Audio-Enhanced | text | -to-Video Retrieval using Text-Conditioned Feature Alignment |
| Authenticating Binary | text | Documents Using a Localising OMAC Watermark Robust to Printing and Scanning |
| Automated cartographic | text | placement |
| Automated Detection of Adverse Drug Events from Older Patients' Electronic Medical Records Using | text | Mining |
| automatic algorithm for | text | skew estimation in document images using recursive morphological transforms, An |
| Automatic annotation of unique locations from video and | text | |
| Automatic Chinese | text | Classification Using Character-Based and Word-Based Approach |
| Automatic Concept Discovery from Parallel | text | and Visual Corpora |
| Automatic Detection and Localization of Natural Scene | text | in Video |
| Automatic detection and recognition of Korean | text | in outdoor signboard images |
| Automatic diacritization of Arabic | text | using recurrent neural networks |
| Automatic discrimination of | text | and non-text natural images |
| Automatic document classification using | text | and images |
| Automatic dottization of Arabic | text | (Rasms) using deep recurrent neural networks |
| Automatic extraction of correlation-entropy features for | text | document analysis directly in run-length compressed domain |
| Automatic Feature Extraction and | text | Recognition From Scanned Topographic Maps |
| Automatic identification and skew estimation of | text | lines in real scene images |
| Automatic Identification of | text | in Digital Video Key Frames |
| Automatic image- | text | alignment for large-scale web image indexing and retrieval |
| Automatic Inpainting Scheme for Video | text | Detection and Removal |
| Automatic Labeling for Scene | text | Database |
| Automatic news video segmentation and categorization based on closed-captioned | text | |
| Automatic performance evaluation for video | text | detection |
| Automatic Performance Evaluation Protocol for Video | text | Detection Algorithms, An |
| Automatic recognition of printed arabic | text | using neural network classifier |
| Automatic Recognition of Printed Farsi | text | s |
| Automatic Segmentation of Printed Persian (Farsi) | text | |
| Automatic segmentation of the IAM off-line database for handwritten English | text | |
| Automatic separation of machine-printed and hand-written | text | lines |
| Automatic | text | area segmentation in natural images |
| Automatic | text | Detection and Recognition |
| Automatic | text | detection and removal in video sequences |
| Automatic | text | Detection and Tracking in Digital Video |
| Automatic | text | detection for mobile augmented reality translation |
| Automatic | text | Extraction from Arabic Newspapers |
| Automatic | text | Extraction from Video for Content-Based Annotation and Retrieval |
| Automatic | text | Extraction in Digital Video Based on Motion Analysis |
| Automatic | text | Location in Images and Video Frames |
| Automatic | text | location in natural scene images |
| Automatic | text | location using cluster-based template matching |
| Automatic | text | processing |
| Automatic | text | segmentation from complex background |
| Automatic tracing and extraction of | text | -line and word segments directly in JPEG compressed document images |
| Automatic writer identification from | text | line images |
| Autonomous Document Cleaning: A Generative Approach to Reconstruct Strongly Corrupted Scanned | text | s |
| Autonomous | text | Capturing Robot Using Improved DCT Feature and Text Tracking |
| AutoSplice: A | text | -prompt Manipulated Image Dataset for Media Forensics |
| Autostr: Efficient Backbone Search for Scene | text | Recognition |
| Auxiliary captioning: Bridging image- | text | matching and image captioning |
| AvatarCraft: Transforming | text | into Neural Human Avatars with Parameterized Shape and Pose Control |
| AvatarStudio: High-Fidelity and Animatable 3D Avatar Creation from | text | |
| Awesome Typography: Statistics-Based | text | Effects Transfer |
| Background-Insensitive Scene | text | Recognition with Text Semantic Segmentation |
| Bag of Embedded Words learning for | text | retrieval |
| Bag of features approach for offline | text | -independent Chinese writer identification |
| Balancing Optimization Strategies and Practical Goals: An Efficient Scene | text | Detector |
| BAMG: | text | -based Person Re-identification via Bottlenecks Attention and Masked Graph Modeling |
| Baseline detection of multi-lingual unconstrained handwritten | text | lines |
| BATINeT: Background-Aware | text | to Image Synthesis and Manipulation Network |
| Bayesian Similarity Model Estimation for Approximate Recognized | text | Search |
| Bayesian Super-Resolution of | text | in Video with a Text-Specific Bimodal Prior |
| Bayesian-based method of unconstrained handwritten offline Chinese | text | line recognition, A |
| BDNet: A BERT-based dual-path network for | text | -to-image cross-modal person re-identification |
| Be Yourself: Bounded Attention for Multi-subject | text | -to-image Generation |
| Beatrix: A Self-Learning System for Off-Line Recognition of Handwritten | text | s |
| Being Comes from Not-Being: Open-Vocabulary | text | -to-Motion Generation with Wordless Training |
| Belief Mining in Persian | text | s Based on Deep Learning and Users' Opinions |
| Benchmark for Chinese-English Scene | text | Image Super-resolution, A |
| Benchmark for Controllable | text | -Image-to-Video Generation, A |
| Benchmarking Robustness to | text | -Guided Corruptions |
| better fitness measure of a | text | -document for a given set of keywords, A |
| Beyond Coarse-grained Matching in Video- | text | Retrieval |
| Beyond One and Two Tower: Cross-Modal Consensus Learning for Image- | text | Retrieval |
| Beyond | text | QA: Multimedia Answer Generation by Harvesting Web Information |
| Beyond | text | : Frozen Large Language Models in Visual Signal Comprehension |
| Beyond verbs: Understanding actions in videos with | text | |
| Beyond visual semantics: Exploring the role of scene | text | in image understanding |
| Bi-Attention enhanced representation learning for image- | text | matching |
| Bi-Directional Image- | text | Retrieval With Position Attention and Similarity Filtering |
| Bi-Directional Spatial-Semantic Attention Networks for Image- | text | Matching |
| Bi-directional Training for Composed Image Retrieval via | text | Prompt Learning |
| Bi-modal Handwritten | text | Corpus: Baseline Results, A |
| Bi-modal Handwritten | text | Recognition (BiHTR) ICPR 2010 Contest Report |
| Bi-tonal image non- | text | matter removal with run length and connected component analysis |
| Bi-VLGM: Bi-Level Class-Severity-Aware Vision-Language Graph Matching for | text | Guided Medical Image Segmentation |
| Bidirectional extraction and recognition of scene | text | with layout consistency |
| Bilevel Feature Extraction-Based | text | Mining for Fault Diagnosis of Railway Systems |
| Bilingual | text | Classification |
| Bilingual, Open World Video | text | Dataset and Real-Time Video Text Spotting With Contrastive Learning, A |
| BiLMa: Bidirectional Local-Matching for | text | -based Person Re-identification |
| Bimodal beta mixture distribution for enhanced OOD inner-differentiation in multi-class | text | classification |
| Binarization and cleanup of handwritten | text | from carbon copy medical form images |
| Binarization of low quality | text | using a Markov random field model |
| Binarization-Free Clustering Approach to Segment Curved | text | Lines in Historical Manuscripts, A |
| Binary | text | image compression using overlapping rectangular partitioning |
| Binary | text | image file preprocessing to account for printer dot gain |
| Biometric Recognition Based on Free- | text | Keystroke Dynamics |
| Biometric recognition using online uppercase handwritten | text | |
| Bipartite Graph Coarsening for | text | Classification Using Graph Neural Networks |
| BiSeR-LMA: A Bidirectional Semantic Reasoning and Large Model Enhancement Approach for | text | -Video Cross-Modal Retrieval |
| BizGen: Advancing Article-level Visual | text | Rendering for Infographics Generation |
| Blended Diffusion for | text | -driven Editing of Natural Images |
| Blending-NeRF: | text | -Driven Localized Editing in Neural Radiance Fields |
| Blind Deblurring of | text | Images Using a Text-Specific Hybrid Dictionary |
| Blind deblurring | text | images via Beltrami regularization |
| blind deconvolution model for scene | text | detection and recognition in video, A |
| Blind Source Separation Techniques for Detecting Hidden | text | s and Textures in Document Images |
| Blind | text | images deblurring based on a generative adversarial network |
| BlobGEN-Vid: Compositional | text | -to-Video Generation with Blob Video Representations |
| Block Segmentation and | text | Extraction in Mixed Text/Image Documents |
| BLSTM-based handwritten | text | recognition using Web resources |
| BLTRCNN-Based 3-D Articulatory Movement Prediction: Learning Articulatory Synchronicity From Both | text | and Audio Inputs |
| Boosting SpLSA for | text | Classification |
| Boosting | text | -To-Image Person Re-Identification With Generative Hard Negative |
| Boosting Weakly-Supervised Temporal Action Localization with | text | Information |
| Boosting-based transductive learning for | text | detection |
| Bootstrapping | text | Recognition from Stop Words |
| Bordernet: An Efficient Border-attention | text | Detector |
| BOTH2Hands: Inferring 3D Hands from Both | text | Prompts and Body Dynamics |
| Bottom-Up Scene | text | Detection with Markov Clustering Networks |
| Boundary | text | Spotter: Toward Arbitrary-Shaped Scene Text Spotting |
| Boundary-Aware Arbitrary-Shaped Scene | text | Detector With Learnable Embedding Network |
| Box It to Bind It: Unified Layout Control and Attribute Binding in | text | -to-Image Diffusion Models |
| BoxDiff: | text | -to-Image Synthesis with Training-Free Box-Constrained Diffusion |
| Breaking | text | -Based CAPTCHA with Sparse Convolutional Neural Networks |
| Breaking | text | -based CAPTCHAs with variable word and character orientation |
| Breaking The Limits of | text | -conditioned 3D Motion Synthesis with Elaborative Descriptions |
| BreakingNews: Article Annotation by Image and | text | Processing |
| Breathing Life Into Sketches Using | text | -to-Video Priors |
| Bridge-GAN: Interpretable Representation Learning for | text | -to-Image Synthesis |
| Bridging Different Language Models and Generative Vision Models for | text | -to-image Generation |
| Bridging Synthetic and Real Worlds for Pre-training Scene | text | Detectors |
| Bridging the Gap Between Audio and | text | Using Parallel-Attention for User-Defined Keyword Spotting |
| Bridging the Gap Between End-to-End and Two-Step | text | Spotting |
| Bridging Video and | text | : A Two-Step Polishing Transformer for Video Captioning |
| Bridging Video- | text | Retrieval with Multiple Choice Questions |
| BRsyn-Caps: Chinese | text | Classification Using Capsule Network Based on Bert and Dependency Syntax |
| BTS: A Bi-lingual Benchmark for | text | Segmentation in the Wild |
| Building compact recognizer with recognition rate maintained for on-line handwritten Japanese | text | recognition |
| Building | text | features for object image classification |
| BURSTS: A bottom-up approach for robust spotting of | text | s in scenes |
| ByTheWay: Boost Your | text | -to-Video Generation Model to Higher Quality in a Training-free Way |
| C-CLIP: Contrastive Image- | text | Encoders to Close the Descriptive-Commentative Gap |
| C-Net: A Compression-Based Lightweight Network for Machine-Generated | text | Detection |
| C4Synth: Cross-Caption Cycle-Consistent | text | -to-Image Synthesis |
| Cache-aided cross-modal correlation correction for unsupervised cross-domain | text | -based person search |
| CAETFN: Con | text | Adaptively Enhanced Text-Guided Fusion Network for Multimodal Sentiment Analysis |
| CAMEL: CAusal Motion Enhancement Tailored for Lifting | text | -Driven Video Editing |
| Camera based degraded | text | recognition using grayscale feature |
| Camera | text | Recognition based on Perspective Invariants |
| Camera-based analysis of | text | and documents: a survey |
| CAMP: Cross-Modal Adaptive Message Passing for | text | -Image Retrieval |
| CamType: assistive | text | entry using gaze with an off-the-shelf webcam |
| Can Generative Adversarial Networks Teach Themselves | text | Segmentation? |
| Can | text | -to-Video Generation help Video-Language Alignment? |
| Canny | text | Detector: Fast and Robust Scene Text Localization Algorithm |
| Cap4Video: What Can Auxiliary Captions Do for | text | -Video Retrieval? |
| Capacity of | text | Marking Channel |
| CapsFusion: Rethinking Image- | text | Data at Scale |
| Caption | text | extraction for indexing purposes using a hierarchical region-based image model |
| Caption | text | recognition in video frames by MAP matching |
| cascade detector for | text | detection in natural scene images, A |
| Cascaded Segmentation-Detection Networks for | text | -Based Traffic Sign Detection |
| Caseg: CLIP-Based Action Segmentation with Learnable | text | Prompt |
| CAT-TPT: Class-Agnostic | text | -based Test-time Prompt Tuning for Vision-Language Models |
| CatVersion: Concatenating Embeddings for Diffusion-Based | text | -to-Image Personalization |
| Causality-Driven Explainable Multimodal Fusion With Visual- | text | Parallel Computing for Cloth-Changing Pedestrian Re-Identification |
| CBNet: A Plug-and-Play Network for Segmentation-Based Scene | text | Detection |
| CCDPlus: Towards Accurate Character to Character Distillation for | text | Recognition |
| CDistNet: Perceiving Multi-domain Character Distance for Robust | text | Recognition |
| CD | text | : Scene text detector based on context-aware deformable transformer |
| CE- | text | : A context-Aware and embedded text detector in natural scene images |
| CelebV- | text | : A Large-Scale Facial Text-Video Dataset |
| Center | text | Spotter: A Novel Text Spotter for Autonomous Unmanned Vehicles |
| CFOR: Character-First Open-Set | text | Recognition via Context-Free Learning |
| CGNN: Caption-assisted graph neural network for image- | text | retrieval |
| Challenges in Content-Based Image Indexing of Cultural Heritage Collections: Support vector machine active learning with applications to | text | classification |
| Character Energy and Link Energy-Based | text | Extraction in Scene Images |
| Character extraction in web image for | text | recognition |
| Character feature Alignment-based scene | text | spotter |
| Character Grounding and Re-identification in Story of Videos and | text | Descriptions |
| Character Position-Aware Compression Framework for Screen | text | Image, A |
| Character Region Attention for | text | Spotting |
| Character Region Awareness for | text | Detection |
| Character Segmentation of Handwritten Bangla | text | by Vertex Characterization of Isothetic Covers |
| Character Segmenting Techniques for Handwritten | text | : A Survey |
| Character-Aware Sampling and Rectification for Scene | text | Recognition |
| Character-Level Interaction in Computer-Assisted Transcription of | text | Images |
| Character-Level Interaction in Multimodal Computer-Assisted Transcription of | text | Images |
| Character-like region verification for extracting | text | in scene images |
| Character-Position-Free On-Line Handwritten Japanese | text | Recognition by Two Segmentation Methods |
| Character-Stroke Detection for | text | -Localization and Extraction |
| Characterization and classification of semantic image- | text | relations |
| Characterness: An Indicator of | text | in the Wild |
| Chat-edit-3d: Interactive 3d Scene Editing via | text | Prompts |
| ChatGen: Automatic | text | -to-Image Generation From FreeStyle Chatting |
| ChatTraffic: | text | -to-Traffic Generation via Diffusion Model |
| Check, Locate, Rectify: A Training-Free Layout Calibration System for | text | -to-Image Generation |
| Chinese Street View | text | : Large-Scale Chinese Text Reading With Partially Supervised Learning |
| Chinese | text | distinction and font identification by recognizing most frequently used characters |
| Chinese | text | Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning |
| Chinese/Kanji | text | and Data Processing |
| Choose What You Need: Disentangled Representation Learning for Scene | text | Recognition, Removal and Editing |
| Circle | text | Expansion as Low-Rank Textures |
| CiteTracker: Correlating Image and | text | for Visual Tracking |
| CKD: Cross-Task Knowledge Distillation for | text | -to-Image Synthesis |
| Class dependent feature scaling method using naive Bayes classifier for | text | datamining |
| Class-agnostic Object Counting with | text | -to-image Diffusion Model |
| Class-Aware Mask-guided feature refinement for scene | text | recognition |
| Class-Balanced | text | to Image Synthesis With Attentive Generative Adversarial Network |
| Class-dependent projection based method for | text | categorization |
| Classification Architecture Based on Connected Components for | text | Detection in Unconstrained Environments, A |
| Classification of Machine Printed and Handwritten | text | s Using Character Block Layout Variance |
| Classification of Noisy Free- | text | Prostate Cancer Pathology Reports Using Natural Language Processing |
| Classification of | text | Documents |
| Classification of | text | documents based on score level fusion approach |
| Classification with reject option in | text | categorisation systems |
| Classifying networked | text | data with positive and unlabeled examples |
| CLEval: Character-Level Evaluation for | text | Detection and Recognition Tasks |
| CLIP is Almost All You Need: Towards Parameter-Efficient Scene | text | Retrieval without OCR |
| CLIP is Also an Efficient Segmenter: A | text | -Driven Approach for Weakly Supervised Semantic Segmentation |
| CLIP-Actor: | text | -Driven Recommendation and Stylization for Animating Human Meshes |
| CLIP-Driven Fine-Grained | text | -Image Person Re-Identification |
| CLIP-Event: Connecting | text | and Images with Event Structures |
| CLIP-Forge: Towards Zero-Shot | text | -to-Shape Generation |
| CLIP-GAN: Stacking CLIPs and GAN for Efficient and Controllable | text | -to-Image Synthesis |
| CLIP-NeRF: | text | -and-Image Driven Manipulation of Neural Radiance Fields |
| CLIP2GAN: Toward Bridging | text | With the Latent Space of GANs |
| CLIP2Protect: Protecting Facial Privacy Using | text | -Guided Makeup via Adversarial Latent Search |
| Clip2Sam: Enhanced End-to-End | text | -to-Image Segmentation and Image Diffusion System |
| CLIPAG: Towards Generator-Free | text | -to-Image Generation |
| CLIPDraw++: | text | -to-Sketch Synthesis with Simple Primitives |
| CLIPstyler: Image Style Transfer with a Single | text | Condition |
| CLIPTER: Looking at the Bigger Picture in Scene | text | Recognition |
| CLIPtone: Unsupervised Learning for | text | -Based Image Tone Adjustment |
| Cloud of Line Distribution and Random Forest Based | text | Detection from Natural/Video Scene Images |
| Clustering-Based Approach to the Separation of | text | Strings from Mixed Text/Graphics Documents, A |
| CM-Net: Concentric Mask Based Arbitrary-Shaped | text | Detection |
| CMA-CLIP: Cross-Modality Attention Clip for | text | -Image Classification |
| CMFG: Cross-model Fine-grained Feature Interaction for | text | -video Retrieval |
| CMMLoc: Advancing | text | -to-PointCloud Localization with Cauchy-Mixture-Model Based Framework |
| CMPD: Using Cross Memory Network With Pair Discrimination for Image- | text | Retrieval |
| CMT-CO: Contrastive Learning with Character Movement Task for Handwritten | text | Recognition |
| CNN for | text | Detection, Convolutional Neural Network |
| CNN-based | text | image super-resolution tailored for OCR |
| CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video- | text | Dataset |
| coarse-to-fine scene | text | detection method based on Skeleton-cut detector and Binary-Tree-Search based rectification, A |
| Code-Mixing and Code-Switching on Social Media | text | : A Brief Survey |
| CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image- | text | Retrieval |
| Coding with ASCII: compact, yet | text | -based 3D content |
| Cognition Transferring and Decoupling for | text | -Supervised Egocentric Semantic Segmentation |
| Cognitive Themes Emerging from Air Photo Interpretation | text | s Published to 1960 |
| Cogview3: Finer and Faster | text | -to-image Generation via Relay Diffusion |
| Collaborative Vision- | text | Representation Optimizing for Open-vocabulary Segmentation |
| Color Based Image Segmentation and its Application to | text | Segmentation, A |
| Color segmentation for | text | extraction |
| Color structure recovering in strong specular | text | regions |
| Color | text | extraction from camera-based images: The impact of the choice of the clustering distance |
| Color | text | extraction with selective metric-based clustering |
| Color | text | image binarization based on binary texture analysis |
| Coloring with Words: Guiding Image Colorization Through | text | -Based Palette Generation |
| Colour | text | segmentation in web images based on human perception |
| colour | text | /graphics separation based on a graph representation, A |
| Combination of global and local con | text | s for text/non-text classification in heterogeneous online handwritten documents |
| combined Convolutional Neural Network and Dynamic Programming approach for | text | line normalization, A |
| Combined orientation and skew detection using geometric | text | -line modeling |
| Combining Deep and Ad-hoc Solutions to Localize | text | Lines in Ancient Arabic Document Images |
| Combining diverse on-line and off-line systems for handwritten | text | line recognition |
| Combining diverse systems for handwritten | text | line recognition |
| Combining HMM classifiers in a handwritten | text | recognition system |
| Combining Statistical Measures to Find Image | text | Regions |
| Combining Structure and Parameter Adaptation of HMMs for Printed | text | Recognition |
| Combining | text | and image information in content-based retrieval |
| Combining | text | and prosodic analysis for prominent word detection |
| COME: Clip-OCR and Master ObjEct for | text | image captioning |
| Comic | text | Detection Using Neural Network Approach |
| COMIM-GAN: Improved | text | -to-Image Generation via Condition Optimization and Mutual Information Maximization |
| CoMM: A Coherent Interleaved Image- | text | Dataset for Multimodal Understanding and Generation |
| Commercial Quality | text | : What Does it Take? |
| Commonsense-Guided Semantic and Relational Consistencies for Image- | text | Retrieval |
| comparative study of features for handwritten Bangla | text | recognition, A |
| Comparative Study of HMM and BLSTM Segmentation-Free Approaches for the Recognition of Handwritten | text | -Lines |
| Comparative Study to Evaluate a | text | -Independent Speaker Identification Engine for Arabic Speakers Using a CHMM-Based Approach, A |
| Comparing Data-driven and Phonetic N-gram Systems for | text | -Independent Speaker Verification |
| Comparison of Approaches for Automated | text | Extraction from Scholarly Figures, A |
| Comparison of clustering methods: A case study of | text | -independent speaker modeling |
| Comparison of some thresholding algorithms for | text | /background segmentation in difficult document images |
| Comparison of | text | String Similarity Algorithms for POI Name Harmonisation, A |
| comparison study on multiple binary-class SVM methods for unilabel | text | categorization, A |
| Compass Control: Multi Object Orientation Control for | text | -to-Image Generation |
| Compensating for the Incomplete With the Complete: An Efficient Scene | text | Detector |
| COMPGS: Unleashing 2D Compositionality for Compositional | text | -to-3D via Dynamically Optimizing 3D Gaussians |
| Complementarity-Aware Space Learning for Video- | text | Retrieval |
| complete OCR for printed Hindi | text | in Devanagari script, A |
| Complete Pyramidal Geometrical Scheme for | text | Based Image Description and Retrieval, A |
| Complying with Privacy Legislation: From Legal | text | to Implementation of Privacy-Aware Location-Based Services |
| component-tree based method for user-intention guided | text | extraction, A |
| Components Regulated Generation of Handwritten Chinese | text | -lines in Arbitrary Length |
| Composing Object Relations and Attributes for Image- | text | Matching |
| Composing | text | and Image for Image Retrieval - an Empirical Odyssey |
| Composite Script Identification and Orientation Detection for Indian | text | Images |
| Compositional coding capsule network with k-means routing for | text | classification |
| Compositional Image- | text | Matching and Retrieval by Grounding Entities |
| Compositional Learning of Image- | text | Query for Image Retrieval |
| Compositional Mixture Representations for Vision and | text | |
| comprehensive method for multilingual video | text | detection, localization, and extraction, A |
| comprehensive neural-based approach for | text | recognition in videos using natural language processing, A |
| Comprehensive regional guidance for attention map semantics in | text | -to-image diffusion models |
| comprehensive scheme for tattoo | text | detection, A |
| Comprehensive Study of Decoder-Only LLMs for | text | -to-Image Generation, A |
| comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese | text | recognition, A |
| Comprehensive Survey of Transformers in | text | Recognition: Techniques, Challenges, and Future Directions, A |
| Computational Topology in | text | Mining |
| Computer Assisted Transcription for Ancient | text | Images |
| Computer Assisted Transcription of Handwritten | text | Images |
| Computer Assisted Transcription of | text | Images: Results on the GERMANA Corpus and Analysis of Improvements Needed for Practical Use |
| Computer Interpretation of English | text | and Picture Patterns |
| Con- | text | : Text Detection for Fine-Grained Object Classification |
| Concept decompositions for short | text | clustering by identifying word communities |
| Concept Weaver: Enabling Multi-Concept Fusion in | text | -to-Image Models |
| ConceptCraft: One-Shot Personalized | text | -to-Image Generation via Object-Background Disentanglement |
| ConceptGuard: Continual Personalized | text | -to-Image Generation with Forgetting and Confusion Mitigation |
| Concepts-Locations-Emotions: Semantic Analysis and Visualization of Climate Change | text | s |
| Conceptual 12M: Pushing Web-Scale Image- | text | Pre-Training To Recognize Long-Tail Visual Concepts |
| Conditional Feature Learning Based Transformer for | text | -Based Person Search |
| Conditional Image- | text | Embedding Networks |
| conditional random field approach for face identification in broadcast news using overlaid | text | , A |
| Conditional random field for | text | segmentation from images with complex background |
| Conditional | text | Image Generation with Diffusion Models |
| Confidence Measures for Error Correction in Interactive Transcription Handwritten | text | |
| Configurable | text | Stamp Identification Tool with Application of Fuzzy Logic |
| CONFORM: Contrast is All You Need For High-Fidelity | text | -to-Image Diffusion Models |
| Connected and Degraded | text | Recognition Using Hidden Markov Model |
| Connected Component Level Discrimination of Handwritten and Machine-Printed | text | Using Eigenfaces |
| Connecting Consistency Distillation to Score Distillation for | text | -to-3d Generation |
| Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned | text | corpora |
| Connecting NeRFs, Images, and | text | |
| Consensus-aware Visual-semantic Embedding for Image- | text | Matching |
| Consistent Partition and Labelling of | text | Blocks |
| Consistent3D: Towards Consistent High-Fidelity | text | -to-3D Generation with Deterministic Sampling Prior |
| Constructing the Discriminative Kernels Using GMM for | text | -Independent Speaker Identification |
| Content and Style Aware Generation of | text | -Line Images for Handwriting Recognition |
| Content Based Image and Video Retrieval Using Embedded | text | |
| Content-based image retrieval with pachinko allocation model and a combination of colour, | text | ure and text features |
| Content-Based Query of Image Databases, Inspirations from | text | Retrieval: Inverted Files, Frequency-based Weights and Relevance Feedback |
| Content-based query of image databases: Inspirations from | text | retrieval |
| Con | text | Driven Text Segmentation and Recognition |
| Con | text | Perception Parallel Decoder for Scene Text Recognition |
| Con | text | Supplied by Text or Language |
| Con | text | -Aware Attention Network for Image-Text Retrieval |
| Con | text | -Aware Hierarchical Transformer for Fine-Grained Video-Text Retrieval |
| Con | text | -aware relation enhancement and similarity reasoning for image-text retrieval |
| Con | text | -Aware Text-Based Binary Image Stylization and Synthesis |
| Con | text | -based text detection in natural scenes |
| Con | text | -CIR: Learning from Concepts in Text for Composed Image Retrieval |
| Con | text | 2Rec: Leveraging comment text semantics and sequential features for enhanced recommendation systems |
| Con | text | ual Text Block Detection Towards Scene Text Understanding |
| Con | text | ual text/non-text stroke classification in online handwritten notes with conditional random fields |
| Continual Learning for Cross-Modal Image- | text | Retrieval Based on Domain-Selective Attention |
| Continuous approach to segmentation of handwritten | text | |
| Contour Restoration of | text | Components for Recognition in Video/Scene Images |
| contour-based approach to 3D | text | labeling on triangulated surfaces, A |
| Contour-Based Robust Algorithm for | text | Detection in Color Images, A |
| ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene | text | Detection |
| Contra: (con) | text | (tra)nsformer for Cross-modal Video Retrieval |
| Contrastive author-aware | text | clustering |
| Contrastive Denoising Score for | text | -Guided Latent Diffusion Image Editing |
| Contrastive Transformer Learning With Proximity Data Generation for | text | -Based Person Search |
| Contribution of recurrent connectionist language models in improving LSTM-based Arabic | text | recognition in videos |
| Contribution to the Discrimination of the Medieval Manuscript | text | s: Application in the Palaeography |
| Control4D: Efficient 4D Portrait Editing With | text | |
| Controllable Artistic | text | Style Transfer via Shape-Matching GAN |
| Controllable Multi-Lingual Multi-Speaker Multi-Style | text | -to-Speech Synthesis With Multivariate Information Minimization, A |
| Controllable | text | -to-3D Generation via Surface-Aligned Gaussian Splatting |
| Controllable | text | -to-Image Synthesis for Multi-Modality MR Images |
| Controllable Video Generation With | text | -Based Instructions |
| Controlling Human Shape and Pose in | text | -to-Image Diffusion Models via Domain Adaptation |
| Controlnet-xs: Rethinking the Control of | text | -to-image Diffusion Models as Feedback-control Systems |
| Convolutional Neural Network Based | text | Steganalysis |
| Convolutional Neural Network-Based Chinese | text | Detection Algorithm via Text Structure Modeling, A |
| Convolutional Neural Networks for Direct | text | Deblurring |
| Convolutional Recurrent Neural Network for the Handwritten | text | Recognition of Historical Greek Manuscripts, A |
| Convolutional recurrent neural networks with hidden Markov model bootstrap for scene | text | recognition |
| COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated | text | s |
| CookGAN: Causality Based | text | -to-Image Synthesis |
| Cooperative Game Modeling With Weighted Token-Level Alignment for Audio- | text | Retrieval |
| COPT: Unsupervised Domain Adaptive Segmentation Using Domain-agnostic | text | Embeddings |
| Copyright protection for the electronic distribution of | text | documents |
| Corpus-based HIT-MW database for offline recognition of general-purpose Chinese handwritten | text | |
| Correcting document image warping based on regression of curved | text | lines |
| Correlated Topic Modeling for Short | text | s in Spherical Embedding Spaces |
| CoSER: Towards Consistent Dense Multiview | text | -To-Image Generator for 3D Creation |
| CosmicMan: A | text | -to-Image Foundation Model for Humans |
| CoSMo: Content-Style Modulation for Image Retrieval with | text | Feedback |
| Cost-Effective Adversarial Attacks against Scene | text | Recognition |
| Could scene con | text | be beneficial for scene text detection? |
| Countering Personalized | text | -to-Image Generation with Influence Watermarks |
| Counting Guidance for High Fidelity | text | -to-Image Synthesis |
| Coupled Snakelets for Curled | text | -Line Segmentation from Warped Document Images |
| Coverless Image Steganography Based on Semantic-Controlled | text | -to-Image Generation |
| Cpgan: Content-parsing Generative Adversarial Networks for | text | -to-image Synthesis |
| Cps-STS: Bridging the Gap Between Content and Position for Coarse-Point-Supervised Scene | text | Spotter |
| Create Your World: Lifelong | text | -to-Image Diffusion |
| Creating generic | text | summaries |
| Creation and Analysis of a Corpus of | text | Rich Indian TV Videos |
| CRF Based Scheme for Overlapping Multi-colored | text | Graphics Separation, A |
| Crime Prediction and Monitoring in Porto, Portugal, Using Machine Learning, Spatial and | text | Analytics |
| CrisisHateMM: Multimodal Analysis of Directed and Undirected Hate Speech in | text | -Embedded Images from Russia-Ukraine Conflict |
| Cross Initialization for Face Personalization of | text | -to-Image Models |
| Cross-Dataset Study for | text | -based 3D Human Motion Retrieval, A |
| Cross-Domain detection of AI-Generated | text | : Integrating linguistic richness and lexical pair dispersion via deep learning |
| Cross-Domain Multi-Modal Few-Shot Object Detection via Rich | text | |
| Cross-Lingual | text | Image Recognition via Multi-Hierarchy Cross-Modal Mimic |
| Cross-Lingual | text | Image Recognition via Multi-Task Sequence to Sequence Learning |
| Cross-Modal Adaptive Dual Association for | text | -to-Image Person Retrieval |
| Cross-Modal and Hierarchical Modeling of Video and | text | |
| Cross-Modal Contrastive Learning for | text | -to-Image Generation |
| Cross-modal domain adaptation for | text | -based regularization of image semantics in image retrieval systems |
| Cross-Modal Dynamic Networks for Video Moment Retrieval With | text | Query |
| Cross-Modal Feature Fusion-Based Knowledge Transfer for | text | -Based Person Search |
| Cross-modal feature learning and alignment network for | text | -image person re-identification |
| Cross-Modal Implicit Relation Reasoning and Aligning for | text | -to-Image Person Retrieval |
| Cross-modal independent matching network for image- | text | retrieval |
| Cross-modal knowledge learning with scene | text | for fine-grained image classification |
| Cross-Modal Person Search: A Coarse-to-Fine Framework using Bi-Directional | text | -Image Matching |
| Cross-Modal Progressive Perspective Matching Network for Remote Sensing Image- | text | Retrieval |
| Cross-modal Scene Graph Matching for Relationship-aware Image- | text | Retrieval |
| Cross-Modal Semantic Matching Generative Adversarial Networks for | text | -to-Image Synthesis |
| Cross-Modal | text | Steganography Against Synonym Substitution-Based Text Attack |
| Cross-Modal Uncertainty Modeling With Diffusion-Based Refinement for | text | -Based Person Retrieval |
| Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and | text | Guidance |
| Crossing the lines: making optimal use of con | text | in line-based Handwritten Text Recognition |
| Crossmodal Translation Based Meta Weight Adaption for Robust Image- | text | Sentiment Analysis |
| Crypto-stego System for Securing | text | and Image Data |
| CSA: Cross-scale alignment with adaptive semantic aggregation and filter for image- | text | retrieval |
| CT-GAN: A conditional Generative Adversarial Network of transformer architecture for | text | -to-image |
| CT-Net: Arbitrary-Shaped | text | Detection via Contour Transformer |
| CTIGEN-CDM: Controlled | text | -to-Image Generation Using Cropped Diffusion Models |
| Ctrl-Room: Controllable | text | -to-3D Room Meshes Generation with Layout Constraints |
| Curriculum learning for printed | text | line recognition of ligature-based scripts |
| Cursive Script, Historical Documents, | text | Line Segmentation, Script Line, Segmentation, Text Line Extraction |
| Cursive stroke sequencing for handwritten | text | documents recognition |
| Curved scene | text | detection via transverse and longitudinal sequence connection |
| Customization Assistant for | text | -to-image Generation |
| Customize-a-video: One-shot Motion Customization of | text | -to-video Diffusion Models |
| Customizing 360-Degree Panoramas through | text | -to-Image Diffusion Models |
| CustomListener: | text | -Guided Responsive Interaction for User-Friendly Listening Head Generation |
| CycleMatch: A cycle-consistent embedding network for image- | text | matching |
| DAC-GAN: Dual Auxiliary Consistency Generative Adversarial Network for | text | -to-Image Generation |
| DAE-GAN: Dynamic Aspect-aware GAN for | text | -to-Image Synthesis |
| DALL-EVAL: Probing the Reasoning Skills and Social Biases of | text | -to-Image Generation Models |
| DART: Disease-aware Image- | text | Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation |
| Data Augmentation for Scene | text | Recognition |
| data base for arabic handwritten | text | recognition research, A |
| Data Embedding in | text | for a Copier System |
| Data-Hiding Capacity Improvement for | text | Watermarking Using Space Coding Method |
| Database for Arabic Handwritten | text | Image Recognition and Writer Identification, A |
| Database for Arabic Printed | text | Recognition Research |
| Database for Handwritten | text | Recognition Research, A |
| Database for Offline Arabic Handwritten | text | Recognition, A |
| dataset for Arabic | text | detection, tracking and recognition in news videos - AcTiV, A |
| Dataset to Support Sexist Content Detection in Arabic | text | , A |
| Datenerf: Depth-aware | text | -based Editing of Nerfs |
| DATID-3D: Diversity-Preserved Domain Adaptation Using | text | -to-Image Diffusion for 3D Generative Model |
| DCDM: Diffusion-conditioned-diffusion Model for Scene | text | Image Super-resolution |
| De-Diffusion Makes | text | a Strong Cross-Modal Interface |
| De-rendering Stylized | text | s |
| Debiased Video- | text | Retrieval via Soft Positive Sample Calibration |
| Debiasing Framework For Attribute Binding In Diffusion-Based | text | -To-Image Generation, A |
| Deblurring | text | Images via L0-Regularized Intensity and Gradient Prior |
| Deep Automated | text | Scoring Model Based on Memory Network |
| Deep Belief Networks Based Toponym Recognition for Chinese | text | |
| Deep BLSTM neural networks for unconstrained continuous handwritten | text | recognition |
| Deep Boosting Learning: A Brand-New Cooperative Approach for Image- | text | Matching |
| Deep Convolutional Deblurring and Detection Neural Network for Localizing | text | in Videos, A |
| Deep correlation for matching images and | text | |
| Deep Cross-Modal Projection Learning for Image- | text | Matching |
| Deep Direct Regression for Multi-oriented Scene | text | Detection |
| Deep feature extraction with tri-channel | text | ual feature map for text classification |
| Deep Features for | text | Spotting |
| Deep Geometric Moments Promote Shape Consistency in | text | -to-3D Generation |
| Deep image compression using scene | text | quality assessment |
| Deep learning and recurrent connectionist-based approaches for Arabic | text | recognition in videos |
| deep learning approach to handwritten | text | recognition in the presence of struck-out text, A |
| Deep Learning for Image-to- | text | Generation: A Technical Overview |
| Deep Learning in the Domain of Multi-Document | text | Summarization |
| Deep Matching Prior Network: Toward Tighter Multi-oriented | text | Detection |
| Deep Multi-Scale Con | text | Aware Feature Aggregation for Curved Scene Text Detection |
| Deep Neural Network Based 3D Articulatory Movement Prediction Using Both | text | and Audio Inputs |
| Deep neural network based hidden Markov model for offline handwritten Chinese | text | recognition |
| Deep Neural Network with Attention Model for Scene | text | Recognition |
| Deep Relational Reasoning Graph Network for Arbitrary Shape | text | Detection |
| Deep Reward Supervisions for Tuning | text | -to-image Diffusion Models |
| Deep | text | Spotter: An End-to-End Trainable Scene Text Localization and Recognition Framework |
| DeepErase: Weakly Supervised Ink Artifact Removal in Document | text | Images |
| DeepEraser: Deep Iterative Con | text | Mining for Generic Text Eraser |
| DeepSolo: Let Transformer Decoder with Explicit Points Solo for | text | Spotting |
| DeepWriterID: An End-to-End Online | text | -Independent Writer Identification System |
| Deformable scene | text | detection using harmonic features and modified pixel aggregation network |
| Deformation Robust | text | Spotting with Geometric Prior |
| Deformation-Invariant Networks for Handwritten | text | Recognition |
| Degraded Gray-Scale | text | Recognition Using Pseudo-2D Hidden Markov-Models and N-Best Hypotheses |
| Delaunay triangulation based | text | detection from multi-view images of natural scene |
| DeltaEdit: Exploring | text | -free Training for Text-Driven Image Manipulation |
| Dense Chained Attention Network for Scene | text | Recognition |
| Dense prediction for | text | line segmentation in handwritten document images |
| Dense | text | -to-Image Generation with Attention Modulation |
| density-based approach for | text | extraction in images, A |
| Dependability Feature Learning Based on Sample Generation for Unsupervised | text | -to-Image Person Re-Identification |
| Dependence Models for Searching | text | in Document Images |
| Deriving a Priori Co-occurrence Probability Estimates for Object Recognition from Social Networks and | text | Processing |
| Deriving Symbol Dependent Edit Weights for | text | Correction: The Use of Error Dictionaries |
| Design and Evaluation of Features That Best Define | text | in Complex Scene Images |
| Design and Preliminary Evaluation of a Finger-Mounted Camera and Feedback System to Enable Reading of Printed | text | for the Blind, The |
| DesignDiffusion: High-Quality | text | -to-Design Image Generation with Diffusion Models |
| Detect Arbitrary-Shaped | text | via Adaptive Thresholding and Localization Quality Estimation |
| Detect Visual Spoofing in Unicode-Based | text | |
| Detect-and-Guide: Self-regulation of Diffusion Models for Safe | text | -to-Image Generation via Guideline Token Optimization |
| Detected | text | -Based Image Retrieval Approach for Textual Images |
| Detecting and reading | text | in natural scenes |
| Detecting Arbitrarily Oriented | text | Labels in Early Maps |
| Detecting dense | text | in natural images |
| Detecting Misspelled Words in Turkish | text | Using Syllable n-gram Frequencies |
| Detecting moving | text | in video using temporal information |
| Detecting natural scenes | text | via auto image partition, two-stage grouping and two-layer classification |
| Detecting Oriented | text | in Natural Images by Linking Segments |
| Detecting Origin Attribution for | text | -to-Image Diffusion Models |
| Detecting Signs of Depression Using Social Media | text | s Through an Ensemble of Ensemble Classifiers |
| Detecting Tampered Scene | text | in the Wild |
| Detecting | text | Areas and Decorative Elements in Ancient Manuscripts |
| Detecting | text | in Natural Image with Connectionist Text Proposal Network |
| Detecting | text | in Natural Scenes Based on a Reduction of Photometric Effects: Problem of Color Invariance |
| Detecting | text | in Natural Scenes Based on a Reduction of Photometric Effects: Problem of Text Detection |
| Detecting | text | in natural scenes with stroke width transform |
| Detecting | text | in Scene and Traffic Guide Panels With Attention Anchor Mechanism |
| Detecting | text | in the Wild with Deep Character Embedding Network |
| Detecting | text | Lines in Handwritten Documents |
| Detecting | text | s of arbitrary orientations in natural images |
| Detecting Traffic Information From Social Media | text | s With Deep Learning Approaches |
| Detecting Video | text | s Using Spatial-Temporal Wavelet Transform |
| Detection and Interpretation of | text | Information in Noisy Video Sequences |
| Detection and Location of Multicharacter Sequences in Lines of Imaged | text | |
| Detection and rectification of arbitrary shaped scene | text | s by using text keypoints and links |
| Detection and Segmentation of Antialiased | text | in Screen Images |
| Detection Approaches for Table Semantics in | text | |
| Detection of Curved | text | in Video: Quad Tree Based Method |
| Detection of curved | text | path based on the fuzzy curve-tracing (FCT) algorithm |
| Detection of Data Hiding in Binary | text | Images |
| Detection of | text | marks on moving vehicles |
| Detection of | text | on road signs from video |
| Detection of | text | Region and Segmentation from Natural Scene Images |
| Detection of | text | regions from digital engineering drawings |
| Determining Number of Clusters Using Firefly Algorithm with Cluster Merging for | text | Clustering |
| Deterministic Turing Machine for Con | text | Sensitive Translation of Braille Codes to Urdu Text, A |
| Devanagari and Bangla | text | Extraction from Natural Scene Images |
| Devanagari | text | Recognition: A Transcription Based Formulation |
| Development and Evaluation of | text | Localization Techniques Based on Structural Texture Features and Neural Classifiers |
| Development of a Robust and Compact On-Line Handwritten Japanese | text | Recognizer for Hand-Held Devices |
| Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for | text | -To-Video Generation, The |
| DF-GAN: A Simple and Effective Baseline for | text | -to-Image Synthesis |
| diabetic retinopathy classification method based on image- | text | contrastive learning, A |
| Dial: Dense Image- | text | Alignment for Weakly Supervised Semantic Segmentation |
| DiCTI: Diffusion-based Clothing Designer via | text | -guided Input |
| Dictionary design for | text | image compression with JBIG2 |
| Dictionary-guided Scene | text | Recognition |
| Diff-tracker: | text | -to-image Diffusion Models are Unsupervised Trackers |
| DiffAgent: Fast and Accurate | text | -to-Image API Selection with Large Language Model |
| DiffBoost: Enhancing Medical Image Segmentation via | text | -Guided Diffusion Model |
| Different Approaches to Bilingual | text | Classification Based on Grammatical Inference Techniques |
| Differentiable Duration Refinement Using Internal Division for Non-Autoregressive | text | -to-Speech |
| Differential-Processing Extraction Approach to | text | and Image Segmentation, A |
| Differentiation of alphabets in handwritten | text | s |
| Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between | text | and Vision for Zero-Shot Image Captioning |
| Diffusion for Description or | text | to Image Generation |
| Diffusion for Layout Control in | text | to Image Generation |
| Diffusion in the Dark: A Diffusion Model for Low-Light | text | Recognition |
| Diffusion Models in 3D Synthesis, | text | to 3D Models |
| Diffusion Soup: Model Merging for | text | -to-image Diffusion Models |
| Diffusion-based Blind | text | Image Super-Resolution |
| Diffusion-Enhanced Test-Time Adaptation with | text | and Image Augmentation |
| Diffusion-SDF: | text | -to-Shape via Voxelized Diffusion |
| DiffusionCLIP: | text | -Guided Diffusion Models for Robust Image Manipulation |
| DiffusionGAN3D: Boosting | text | -guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors |
| Diffusionpen: Towards Controlling the Style of Handwritten | text | Generation |
| DiffusionRet: Generative | text | -Video Retrieval with Diffusion Model |
| DiffusionSTR: Diffusion Model for Scene | text | Recognition |
| Digital image analysis to enhance underwritten | text | in the Archimedes palimpsest |
| Digital Ink Recognition Server for Handwritten Japanese | text | , A |
| DINOv2 Meets | text | : A Unified Framework for Image- and Pixel-Level Vision-Language Alignment |
| Diphone Spanish | text | -to-speech synthesizer |
| Direct Regression Scene | text | Detector With Position-Sensitive Segmentation, A |
| Direct | text | to Speech Translation System Using Acoustic Units |
| Direct Unsupervised | text | Line Extraction from Colored Historical Manuscript Images Using DCT |
| DIRECT-3D: Learning Direct | text | -to-3D Generation on Massive Noisy 3D Data |
| Direct2.5: Diverse | text | -to-3D Generation via Multi-view 2.5D Diffusion |
| Discovering Low-Rank Shared Concept Space for Adapting | text | Mining Models |
| Discovering meaningful multimedia patterns with audio-visual concepts and associated | text | |
| DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video- | text | Retrieval |
| Discrete Joint Semantic Alignment Hashing for Cross-Modal Image- | text | Search |
| Discrete-continuous Action Space Policy Gradient-based Attention for Image- | text | Matching |
| Discrimination of machine-printed from handwritten | text | using simple structural characteristics |
| Discriminative Class Tokens for | text | -to-Image Diffusion Models |
| Discriminative Kernel-Based Approach to Rank Images from | text | Queries, A |
| Discriminative Model for On-line Handwritten Japanese | text | Retrieval, A |
| Discriminative Probing and Tuning for | text | -to-Image Generation |
| discriminative semi-Markov model for robust scene | text | recognition, A |
| Disease-Specific Extraction of | text | from Cardiac Echo Videos for Decision Support |
| DisenDreamer: Subject-Driven | text | -to-Image Generation With Sample-Aware Disentangled Tuning |
| Disentangled Clothed Avatar Generation from | text | Descriptions |
| Disentangled Contour Learning for Quadrilateral | text | Detection |
| Disentangling Inter- and Intra-Video Relations for Multi-Event Video- | text | Retrieval and Grounding |
| Disentangling Subject-Irrelevant Elements in Personalized | text | -to-Image Diffusion via Filtered Self-Distillation |
| Dissecting Deep Metric Learning Losses for Image- | text | Retrieval |
| Distilling Knowledge of Bidirectional Language Model for Scene | text | Recognition |
| Distinction between handwritten and machine-printed | text | based on the bag of visual words model |
| Distinguishing between Handwritten and Machine Printed | text | in Bank Cheque Images |
| Distinguishing mathematics notation from English | text | using computational geometry |
| Distinguishing | text | /Non-Text Natural Images with Multi-Dimensional Recurrent Neural Networks |
| Distributional semantics of objects in visual scenes in comparison to | text | |
| Diverse | text | -to-3d Synthesis with Augmented Text Embedding |
| Diversified | text | -to-image generation via deep mutual information estimation |
| DiZNet: An end-to-end | text | detection and recognition algorithm with detail in text zone |
| DM-GAN: Dynamic Memory Generative Adversarial Networks for | text | -To-Image Synthesis |
| DM-PCL: | text | -Driven Dual-Modal Prototype Consistency Learning for Weakly-Supervised Few-Shot Part Segmentation |
| DMF-GAN: Deep Multimodal Fusion Generative Adversarial Networks for | text | -to-Image Synthesis |
| Do | text | -free Diffusion Models Learn Discriminative Visual Representations? |
| DOC: | text | Recognition via Dual Adaptation and Clustering |
| DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for | text | -rich Document Understanding |
| Document Analysis System Based on | text | Line Matching of Multiple OCR Outputs, A |
| Document dewarping via | text | -line based optimization |
| Document filtering for fast approximate string matching of erroneous | text | |
| Document Image De-warping Based on Detection of Distorted | text | Lines |
| Document Image Dewarping using Robust Estimation of Curled | text | Lines |
| Document image ground truth generation from electronic | text | |
| Document Mining Based on Semantic Understanding of | text | |
| Document Rectification Approach Dealing with Both Perspective Distortion and Warping Based on | text | Flow Curve Fitting, A |
| Document segmentation and classification into musical scores and | text | |
| Document skew detection/control system for printed document images containing a mixture of pure | text | lines and non-text portions |
| Does | text | attract attention on e-commerce images: A novel saliency prediction dataset and method |
| Domain adaptive multigranularity proposal network for | text | detection under extreme traffic scenes |
| Domain Generalization in CLIP via Learning with Diverse | text | Prompts |
| Domain-Complementary Prior With Fine-Grained Feedback for Scene | text | Image Super-Resolution |
| Don't Forget Me: Accurate Background Recovery for | text | Removal via Modeling Local-Global Context |
| Dot | text | Detection Based on FAST Points |
| Double supervision for scene | text | detection and recognition based on BMINet |
| Doubly Abductive Counterfactual Inference for | text | -Based Image Editing |
| Downtown Osaka Scene | text | Dataset |
| Drag | text | : Rethinking Text Embedding in Point-Based Image Editing |
| Dream-in-Style: | text | -to-3D Generation Using Stylized Score Distillation |
| Dream3D: Zero-Shot | text | -to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models |
| DreamArtist: Controllable One-Shot | text | -to-Image Generation via Positive-Negative Adapter |
| DreamAvatar: | text | -and-Shape Guided 3D Human Avatar Generation via Diffusion Models |
| DreamBlend: Advancing Personalized Fine-Tuning of | text | -to-Image Diffusion Models |
| DreamBooth3D: Subject-Driven | text | -to-3D Generation |
| DreamBooth: Fine Tuning | text | -to-Image Diffusion Models for Subject-Driven Generation |
| DreamControl: Control-Based | text | -to-3D Generation with 3D Self-Prior |
| Dreamdissector: Learning Disentangled | text | -to-3d Generation from 2d Diffusion Priors |
| Dreamdrone: | text | -to-image Diffusion Models Are Zero-shot Perpetual View Generators |
| DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent | text | -to-Image Personalization |
| Dreammesh: Jointly Manipulating and | text | uring Triangle Meshes for Text-to-3d Generation |
| DreamPropeller: Supercharge | text | -to-3D Generation with Parallel Sampling |
| Dreamreward: | text | -to-3d Generation with Human Preference |
| Dreamscene360: Unconstrained | text | -to-3d Scene Generation with Panoramic Gaussian Splatting |
| Dreamscene: 3d Gaussian-based | text | -to-3d Scene Generation via Formation Pattern Sampling |
| DreamStone: Image as a Stepping Stone for | text | -Guided 3D Shape Generation |
| Dream | text | : High Fidelity Scene Text Synthesis |
| Dreamview: Injecting View-specific | text | Guidance Into Text-to-3d Generation |
| DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable | text | -to-Image Diffusion Generation |
| DSTA: Reinforcing Vision-Language Understanding for Scene- | text | VQA With Dual-Stream Training Approach |
| DS | text | V2: A comprehensive video text spotting dataset for dense and small text |
| DTLLM-VLT: Diverse | text | Generation for Visual Language Tracking Based on LLM |
| DU-Net: A Dual U-Net for semantic | text | -guided style transfer |
| Dual Adversarial Inference for | text | -to-Image Synthesis |
| Dual Alignment Unsupervised Domain Adaptation for Video- | text | Retrieval |
| dual branch graphic | text | detection network based on progressive Domain adaptation, A |
| Dual Encoding for Video Retrieval by | text | |
| Dual Relation Network for Scene | text | Recognition |
| Dual Stream Relation Learning Network for Image- | text | Retrieval |
| Dual-branch scale disentanglement for | text | -video retrieval |
| Dual-Level Representation Enhancement on Characteristic and Con | text | for Image-Text Retrieval |
| Dual-path CNN with Max Gated block for | text | -based person re-identification |
| Dual-Path Rare Content Enhancement Network for Image and | text | Matching |
| DUET: Detection Utilizing Enhancement for | text | in Scanned or Captured Documents |
| DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations Without | text | Alignment |
| DVHMM: variable length | text | recognition error model |
| Dynamic Attention Analysis for Backdoor Detection in | text | -to-Image Diffusion Models |
| Dynamic Contrastive Distillation for Image- | text | Retrieval |
| Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End | text | Spotting |
| Dynamic Prompt Optimizing for | text | -to-Image Generation |
| Dynamic receptive field adaptation for scene | text | recognition |
| Dynamic recognition in the omni-writer frame: Application to hand-printed | text | recognition |
| Dynamic semantic prototype perception for | text | -video retrieval |
| Dynamic sparse and weight allocation-based | text | -driven person retrieval |
| Dynamic | text | Line Segmentation for Real-Time Recognition of Chinese Handwritten Sentences |
| Dynamic | text | s From UAV Perspective Natural Images |
| Dynamic Unilateral Dual Learning for | text | to Image Synthesis |
| Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking for Image- | text | Retrieval |
| Dynamic Word Based | text | Compression |
| DynTypo: Example-Based Dynamic | text | Effects Transfer |
| Dysen-VDM: Empowering Dynamics-Aware | text | -to-Video Diffusion with LLMs |
| E.T. the Exceptional Trajectories: | text | -to-Camera-Trajectory Generation with Character Awareness |
| E2VTS: Energy-Efficient Video | text | Spotting from Unmanned Aerial Vehicles |
| E4C: Enhance Editability for | text | -Based Image Editing by Harnessing Efficient CLIP Guidance |
| EA-VTR: Event-aware Video- | text | Retrieval |
| Eaformer: Scene | text | Segmentation with Edge-aware Transformers |
| Early feature stream integration versus decision level combination in a multiple classifier system for | text | line recognition |
| Earthquake Information Extraction and Comparison from Different Sources Based on Web | text | |
| EAST: An Efficient and Accurate Scene | text | Detector |
| ECLIPSE: A Resource-Efficient | text | -to-Image Prior for Image Generations |
| EDA: Explicit | text | -Decoupling and Dense Alignment for 3D Visual Grounding |
| Edge Approximation | text | Detector |
| Edge Based Binarization for Video | text | Images |
| Edge guided and Fourier attention-based Dual Interaction Network for scene | text | erasing |
| Edge-Based Features for Localization of Artificial Urdu | text | in Video Images |
| Edge-based method for | text | detection from complex document images |
| Edge-based | text | localization and character segmentation algorithms for automatic slab information recognition |
| EdgeRelight360: | text | -Conditioned 360-Degree HDR Image Generation for Real-Time On-Device Video Portrait Relighting |
| Edit Probability for Scene | text | Recognition |
| Editing Implicit Assumptions in | text | -to-Image Diffusion Models |
| Educational video understanding: Mapping handwritten | text | to textbook chapters |
| EESSO: Exploiting Extreme and Smooth Signals via Omni-frequency learning for | text | -based Person Retrieval |
| Effect of Improved Path Evaluation for On-line Handwritten Japanese | text | Recognition |
| Effective 3D | text | Recurrent Voting Generator for Metaverse, An |
| Effective and efficient video | text | extraction using key text points |
| Effective feature descriptor-based new framework for off-line | text | -independent writer identification |
| effective method for | text | line segmentation in historical document images, An |
| effective sentence-extraction technique using con | text | ual information and statistical approaches for text summarization, An |
| Effective shrinkage of large multi-class linear SVM models for | text | categorization |
| Effective | text | localization in natural scene images with MSER, geometry-based grouping and AdaBoost |
| Effective Uyghur Language | text | Detection in Complex Background Images for Traffic Prompt Identification |
| Effective video | text | detection using line features |
| Effectively localize | text | in natural scene images |
| effectiveness of T5, GPT-2, and BERT on | text | -to-image generation task, The |
| Efficiency investigation of manifold matching for | text | document classification |
| Efficient Algorithm for Segmenting Warped | text | -Lines in Document Images, An |
| Efficient and Accurate Arbitrary-Shaped | text | Detection With Pixel Aggregation Network |
| Efficient and flexible | text | extraction from document pages |
| Efficient Automatic | text | Location Method and Content-Based Indexing and Structuring of Video Database |
| Efficient Character Skew Rectification in Scene | text | Images |
| Efficient Exploration of Image Classifier Failures with Bayesian Optimization and | text | -to-Image Models |
| Efficient Exploration of | text | Regions in Natural Scene Images Using Adaptive Image Sampling |
| Efficient graph-based dictionary search and its application to | text | -image searching |
| Efficient Image- | text | Retrieval via Keyword-Guided Pre-Screening |
| Efficient indexing for Query By String | text | retrieval |
| Efficient Industrial System for Vehicle Tyre (Tire) Detection and | text | Recognition Using Deep Learning, An |
| Efficient label-free pruning and retraining for | text | -VQA Transformers |
| Efficient Light Balancing Techniques for | text | Images in Video Presentation Systems |
| Efficient Method for Offline | text | Independent Writer Identification, An |
| Efficient Method for | text | Detection in Video Based on Stroke Width Similarity, An |
| Efficient Multimodal Aggregation Network for Video- | text | Retrieval, An |
| Efficient Scene | text | localization and recognition with local character refinement |
| Efficient side information encoding for | text | hardcopy documents |
| Efficient System for Hazy Scene | text | Detection using a Deep CNN and Patch-NMS, An |
| Efficient | text | analyser with prosody generator-driven approach for Mandarin text-to-speech |
| Efficient | text | Capture Method for Moving Robots Using DCT Feature and Text Tracking, An |
| Efficient | text | Classification Using Tree-structured Multi-linear Principal Component Analysis |
| Efficient | text | independent speaker recognition with wavelet feature selection based multilayered neural network using supervised learning algorithm |
| Efficient | text | localization in born-digital images by local contrast-based segmentation |
| Efficient | text | Segmentation Technique Based on Naive Bayes Classifier, An |
| Efficient | text | -based Person Search via Single-stage Identity-guided Attribute Parsing and Alignment |
| Efficient | text | -Guided 3D-Aware Generation With Score Distillation on 3D Distribution |
| Efficient | text | -to-Image Generation: An Adaptive Step Schedule Controller for Diffusion Models |
| Efficient Token-Guided Image- | text | Retrieval With Consistent Multimodal Contrastive Training |
| Efficient Transcript Mapping to Ease the Creation of Document Image Segmentation Ground Truth with | text | -Image Alignment |
| Efficient video | text | detection using edge features |
| Efficient video | text | recognition using multiple frame integration |
| Efficient Visual Search of Videos Cast as | text | Retrieval |
| EGO-LM: An efficient, generic, and out-of-the-box language model for handwritten | text | recognition |
| Ego | text | VQA: Towards Egocentric Scene-Text Aware Video Question Answering |
| EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free | text | -to-Video Generation |
| Eigenspace method for | text | retrieval in historical document images |
| ELITE: Encoding Visual Concepts into | text | ual Embeddings for Customized Text-to-Image Generation |
| Elucidating Optimal Reward-Diversity Tradeoffs in | text | -to-Image Diffusion Models |
| Embedded Application for Degraded | text | Recognition, An |
| Embedded Bernoulli Mixture HMMs for Continuous Handwritten | text | Recognition |
| embedded method: Improve the relevance of | text | and face image with enhanced face attributes, An |
| Embil: An English-Manipuri Bi-lingual Benchmark for Scene | text | Detection and Language Identification |
| Emergent Visual-semantic Hierarchies in Image- | text | Representations |
| EmoGen: Emotional Image Content Generation with | text | -to-Image Diffusion Models |
| EmoLabel: Semi-Automatic Methodology for Emotion Annotation of Social Media | text | |
| EmoSphere++: Emotion-Controllable Zero-Shot | text | -to-Speech Via Emotion-Adaptive Spherical Vector |
| Emotion Correlation Mining Through Deep Learning Models on Natural Language | text | |
| Emotion Recognition in | text | for 3-D Facial Expression Rendering |
| EmotionAlBERTo: Emotion Recognition of Italian Social Media | text | s Through BERT |
| Empathy Detection From | text | , Audiovisual, Audio or Physiological Signals: A Systematic Review of Task Formulations and Machine Learning Methods |
| Empirical Study and Analysis of | text | -to-image Generation Using Large Language Model-powered Textual Representation, An |
| Empirical Study of Scaling Law for Scene | text | Recognition, An |
| EMU: Effective Multi-Hot Encoding Net for Lightweight Scene | text | Recognition With a Large Character Set |
| Encapsulated Composition of | text | -to-Image and Text-to-Video Models for High-Quality Video Synthesis |
| Encoding Video Narration as | text | |
| End-to-End Approach for Handwriting Recognition: From Handwritten | text | Lines to Complete Pages, An |
| End-to-End Handwritten Paragraph | text | Recognition Using a Vertical Attention Network |
| End-to-end interactive joint model: Clause-phrase multi-task learning for suicidal ideation cause extraction (SICE) in Chinese Weibo | text | |
| end-to-end model for multi-view scene | text | recognition, An |
| End-to-end OCR | text | Re-organization Sequence Learning for Rich-text Detail Image Comprehension, An |
| End-to-End page-Level assessment of handwritten | text | recognition |
| End-to-End Pre-Training With Hierarchical Matching and Momentum Contrast for | text | -Video Retrieval |
| End-to-end scene | text | recognition |
| End-to-end scene | text | recognition using tree-structured models |
| End-to-end | text | recognition with convolutional neural networks |
| End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene | text | Recognition, An |
| End-to-End Video | text | Detection with Online Tracking |
| End-to-End Video | text | Spotting with Transformer |
| Enforcing similarity constraints with integer programming for better scene | text | recognition |
| Enhanced Active Contour Method for Locating | text | |
| Enhanced Feature Extraction Framework for Cross-Modal Image- | text | Retrieval, An |
| Enhanced Generative Structure Prior for Chinese | text | Image Super-Resolution |
| Enhanced Motion- | text | Alignment for Image-to-Video Transfer Learning |
| Enhanced Network Embedding with | text | Information |
| Enhanced Probabilistic Neural Network Approach Applied to | text | Classification, An |
| Enhanced Semantic Similarity Learning Framework for Image- | text | Matching |
| Enhanced | text | Extraction from Arabic Degraded Document Images Using EM Algorithm |
| Enhancement and feature extraction for images of incised and ink | text | s |
| Enhancement of camera captured | text | images with specular reflection |
| Enhancement of | text | images using a context based nonlinear interpolative vector quantization method |
| Enhancing 3D Fidelity of | text | -to-3D using Cross-View Correspondences |
| Enhancing Diffusion Models with | text | -encoder Reinforcement Learning |
| Enhancing energy minimization framework for scene | text | recognition with top-down cues |
| Enhancing fine-detail image synthesis from | text | descriptions by text aggregation and connection fusion module |
| Enhancing Handwritten | text | Recognition with N-gram sequence decomposition and Multitask Learning |
| Enhancing knowledge distillation for semantic segmentation through | text | -assisted modular plugins |
| Enhancing Micro Gesture Recognition for Emotion Understanding via Con | text | -Aware Visual-Text Contrastive Learning |
| Enhancing Scene | text | Detection via Fused Semantic Segmentation Network with Attention |
| Enhancing Scene | text | Detectors with Realistic Text Image Synthesis Using Diffusion Models |
| Enhancing Semantic Fidelity in | text | -to-image Synthesis: Attention Regulation in Diffusion Models |
| Enhancing Tampered | text | Detection Through Frequency Feature Fusion and Decomposition |
| Enhancing | text | -Based Person Retrieval by Combining Fused Representation and Reciprocal Learning With Adaptive Loss Refinement |
| Enhancing | text | -like edges in digital images |
| Enhancing | text | -Video Retrieval Performance With Low-Salient but Discriminative Objects |
| Enhancing the Video Editing Capabilities of | text | -to-Video Generators Using DDPM Inversion |
| Enhancing Visual Grounding in Vision-Language Pre-Training With Position-Guided | text | Prompts |
| Enriching Video Captions With Con | text | ual Text |
| Ensemble Methods to Improve the Performance of an English Handwritten | text | Line Recognizer |
| Episodic Learning Network for | text | Detection on Human Bodies in Sports Images, An |
| ER-Chat: A | text | -to-Text Open-Domain Dialogue Framework for Emotion Regulation |
| EraseNet: End-to-End | text | Removal in the Wild |
| Erasing Scene | text | with Weak Supervision |
| ERNIE-ViLG 2.0: Improving | text | -to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts |
| ERP-Aware | text | -To-360 Panorama Diffusion Model |
| ESA: External Space Attention Aggregation for Image- | text | Retrieval |
| Escaping Plato's Cave: Towards the Alignment of 3D and | text | Latent Spaces |
| ESIR: End-To-End Scene | text | Recognition via Iterative Image Rectification |
| Estate: Expert-Guided State | text | Enhancement for Zero-Shot Industrial Anomaly Detection |
| ES | text | Spotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer |
| Estimating the Orientation and Recovery of | text | Planes in a Single Image |
| Estimating the readability of handwritten | text | : A Support Vector Regression based approach |
| Estimating the Semantics via Sector Embedding for Image- | text | Retrieval |
| Estimating urban flooding depth by integrating multimodal image- | text | data: A segment-level direct preference optimization-based multimodal large language model |
| Estimation of Skew Angle in | text | -Image Analysis by SLIDE: Subspace-Based Line Detection |
| Evaluating a Hidden Markov Model of Syntax in a | text | Recognition System |
| Evaluating and Improving Compositional | text | -to-Visual Generation |
| Evaluating Data Attribution for | text | -to-Image Models |
| Evaluating OCR and Non-OCR | text | Representations for Learning Document Classifiers |
| Evaluating | text | -to-Image Matching using Binary Image Selection (BISON) |
| Evaluating | text | -to-Video Alignment: A Hierarchical Benchmark for Video Generation Models |
| Evaluating | text | -to-visual Generation with Image-to-text Generation |
| Evaluation of HMM-Based Techniques for the Recognition of Screen Rendered | text | , An |
| Evaluation of Model-Based Retrieval Effectiveness with OCR | text | |
| Evaluation of neural network language models in handwritten Chinese | text | recognition |
| Evaluation of the Concatenative Turkish | text | -to-Speech System |
| Evaluation of the Optimal Topic Classification for Social Media Data Combined with | text | Semantics: A Case Study of Public Opinion Analysis Related to COVID-19 with Microblogs |
| Event-Guided Procedure Planning from Instructional Videos with | text | Supervision |
| evidence-based model of saliency feature extraction for scene | text | analysis, An |
| Evolution Maps for Connected Components in | text | Documents |
| example-based prior model for | text | image super-resolution, An |
| Expanding Large Pre-trained Unimodal Models with Multimodal Information Injection for Image- | text | Multimodal Classification |
| Experimental Evaluation of OCR | text | Representations for Learning Document Classifiers, An |
| Experimental Investigation of | text | -Based CAPTCHA Attacks and Their Robustness, An |
| Experimental Study of Pruning Techniques in Handwritten | text | Recognition Systems, An |
| Experimental System for Office Document Handling and | text | Recognition, An |
| Experiments in | text | Recognition with Binary N-Gram and Viterbi Algorithms |
| Experiments in | text | Recognition with the Modified Viterbi Algorithm |
| Experiments in the Recognition of Handprinted | text | : Part I Character Recognition |
| Explain2Attack: | text | Adversarial Attacks via Cross-Domain Interpretability |
| Explaining Semantic | text | Similarity in Knowledge Graphs |
| Explicitly-Decoupled | text | Transfer With Minimized Background Reconstruction for Scene Text Editing |
| Exploiting Color Information for Better Scene | text | Recognition |
| Exploiting colour information for better scene | text | detection and recognition |
| Exploiting Unlabeled Videos for Video- | text | Retrieval via Pseudo-Supervised Learning |
| Exploring AIGC Video Quality: A Focus on Visual Harmony, Video- | text | Consistency and Domain Distribution Gap |
| Exploring Effective Interactive | text | -Based Video Search in vitrivr |
| Exploring Fine-Grained Visual- | text | Feature Alignment With Prompt Tuning for Domain-Adaptive Object Detection |
| Exploring Global and Local Linguistic Representations for | text | -to-Image Synthesis |
| Exploring Phrase Grounding without Training: Con | text | ualisation and Extension to Text-Based Image Retrieval |
| Exploring Phrase-level Grounding with | text | -to-image Diffusion Model |
| Exploring Pre-trained | text | -to-video Diffusion Models for Referring Video Object Segmentation |
| Exploring Sparse MoE in GANs for | text | -conditioned Image Synthesis |
| Exploring Sparse Spatial Relation in Graph Inference for | text | -Based VQA |
| Exploring | text | representation impact on K-means based Arabic text documents clustering |
| Exploring | text | -to-Motion Generation with Human Preference |
| Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene | text | Detection |
| Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for | text | -to-Image Synthesis |
| Exploring The Potential of Vision-Language Models for Pure-Image and | text | -Guided-Image Saliency Prediction |
| Exploring the Spatial Distribution Characteristics of Emotions of Weibo Users in Wuhan Waterfront Based on Gender Differences Using Social Media | text | s |
| Exploring the Spatiotemporal Patterns of Residents' Daily Activities Using | text | -Based Social Media Data: A Case Study of Beijing, China |
| Exposing fake images generated by | text | -to-image diffusion models |
| Expressive Image Generation and Editing with Rich | text | |
| Expressive | text | -to-Image Generation with Rich Text |
| Expressive visual | text | -to-speech as an assistive technology for individuals with autism spectrum conditions |
| Expressive Visual | text | -to-Speech Using Active Appearance Models |
| Extended Bi-gram Features in | text | Categorization |
| Extended character defect model for recognition of | text | from maps |
| Extending TrOCR for | text | Localization-Free OCR of Full-Page Scanned Receipt Images |
| External Word Segmentation of Off-Line Handwritten | text | Lines |
| Extracting Curved | text | Lines Using Local Linearity of Text Line |
| Extracting Spatio-Temporal Information from Chinese Archaeological Site | text | |
| Extracting | text | From Greyscale Images |
| Extracting | text | from WWW Images |
| Extraction and Recognition of Bangla | text | s from Natural Scene Images Using CNN |
| Extraction of Handwritten | text | from Carbon Copy Medical Form Images |
| Extraction of line-word-character segments directly from run-length compressed printed | text | -documents |
| Extraction of Lines of | text | s in Unconstrained Handwritten Documents |
| Extraction of Nom | text | Regions from Stele Images Using Area Voronoi Diagram |
| Extraction of Pluvial Flood Relevant Volunteered Geographic Information (VGI) by Deep Learning from User Generated | text | s and Photos |
| Extraction of Projection Profile, Run-Histogram and Entropy Features Straight from Run-Length Compressed | text | -Documents |
| Extraction of special effects caption | text | events from digital video |
| Extraction of Spelling Variations from Language Structure for Noisy | text | Correction |
| Extraction of | text | boxes from Engineering Drawings |
| Extraction of | text | Lines and Text Blocks on Document Images Based on Statistical Modeling |
| Extraction of | text | Words in Document Images Based on a Statistical Characterization |
| Extraction Of Thematically Relevant | text | From Images |
| extractive | text | summarization technique for Bengali document(s) using K-means clustering algorithm, An |
| Extractive | text | Summarization Using Topological Features |
| Extrapolate azimuth angles: | text | and edge guided ISAR image generation based on foundation model |
| Extremely Low-Light Image Enhancement with Scene | text | Restoration |
| Eyes Closed, Safety on: Protecting Multimodal LLMs via Image-to- | text | Transformation |
| FA-GAN: Feature-Aware GAN for | text | to Image Synthesis |
| Face typing: Vision-based perceptual interface for hands-free | text | entry with a scrollable virtual keyboard |
| FaceCLIP: Facial Image-to-Video Translation via a Brief | text | Description |
| FaceCLIPNeRF: | text | -driven 3D Face Manipulation using Deformable Neural Radiance Fields |
| Faces a la Carte: | text | -to-Face Generation via Attribute Disentanglement |
| Faces that Speak: Jointly Synthesising Talking Face and Speech from | text | |
| Facial Action Unit Recognition Enhanced by | text | Descriptions of FACS |
| Facsimile device with skew correction and | text | line direction detection |
| Factorizing | text | -to-video Generation by Explicit Image Conditioning |
| Factors in Emotion Recognition With Deep Learning Models Using Speech and | text | on Multiple Corpora |
| FakeInversion: Learning to Detect Images from Unseen | text | -to-Image Models by Inverting Stable Diffusion |
| Fantasia3D: Disentangling Geometry and Appearance for High-quality | text | -to-3D Content Creation |
| FARNet: Fragmented affinity reasoning network of | text | instances for arbitrary shape text detection |
| Fashion Image Retrieval with | text | Feedback by Additive Attention Compositional Learning |
| Fast and accurate scene | text | understanding with image binarization and off-the-shelf OCR |
| Fast and Accurate | text | Detection in Natural Scene Images with User-Intention |
| Fast and effective | text | detection |
| fast and efficient method for extracting | text | paragraphs and graphics from unconstrained documents, A |
| Fast and Efficient | text | Steganalysis Method, A |
| Fast and Flexible Statistical Method for | text | Extraction in Document Pages, A |
| Fast and memory efficient | text | image compression with JBIG2 |
| Fast and robust | text | detection in images and video frames |
| fast and robust | text | spotter, A |
| Fast and simple | text | replacement algorithm for text-based augmented reality |
| Fast Appearance-Based Full- | text | Search Method for Historical Newspaper Images, A |
| Fast Approximate Modelling of the Next Combination Result for Stopping the | text | Recognition in a Video |
| Fast Coding-Mode Selection and CU-Depth Prediction Algorithm Based on | text | -Block Recognition for Screen Content Coding |
| fast hierarchical method for multi-script and arbitrary oriented scene | text | extraction, A |
| Fast Lexicon-Based Scene | text | Recognition with Sparse Belief Propagation |
| fast multiresolution | text | line and non text-line structures extraction and discrimination scheme for document image analysis, A |
| Fast online incremental approach of unseen place classification using disjoint- | text | attribute prediction |
| Fast perspective recovery of | text | in natural scenes |
| Fast scene | text | localization by learning-based filtering and verification |
| Fast Selection of Small and Precise Candidate Sets from Dictionaries for | text | Correction Tasks |
| Fast Supervised Topic Models for Short | text | Emotion Detection |
| Fast | text | categorization using concise semantic analysis |
| Fast | text | line detection by finding linear connected components on Canny edge image |
| Fast | text | line extraction in document images |
| Fast | text | /graphics resolution improvement using wavelet based denoising and chain-code table lookup |
| Fast Uyghur | text | detection in videos based on learning of baseline feature |
| Fast Uyghur | text | Detector for Complex Background Images, A |
| Fast(er) Reconstruction of Shredded | text | Documents via Self-Supervised Deep Asymmetric Metric Learning |
| Fast, Accurate, and Lightweight Memory-Enhanced Embedding Learning Framework for Image- | text | Retrieval |
| FAST: Facilitated and Accurate Scene | text | Proposals through FCN Guided Pruning |
| FastCLIPstyler: Optimisation-free | text | -based Image Style Transfer Using Style Representations |
| FastEdit: fast | text | -guided single-image editing via semantic-aware diffusion fine-tuning |
| FASTER: A Font-Agnostic Scene | text | Editing and Rendering Framework |
| FAS | text | : Efficient Unconstrained Scene Text Detector |
| FastFaceCLIP: A lightweight | text | -driven high-quality face image manipulation |
| FastVideoEdit: Leveraging Consistency Models for Efficient | text | -to-Video Editing |
| FateZero: Fusing Attentions for Zero-shot | text | -based Video Editing |
| FC-Render: Adaptive Font- and Color-Aware | text | Diffusion Model |
| FDS: Frequency-Aware Denoising Score for | text | -Guided Latent Diffusion Image Editing |
| Feature Embedding Based | text | Instance Grouping for Largely Spaced and Occluded Text Detection |
| Feature extracted from wavelet decomposition using biorthogonal Riesz basis for | text | -independent speaker recognition |
| Feature extracted from wavelet eigenfunction estimation for | text | -independent speaker recognition |
| Feature First: Advancing Image- | text | Retrieval Through Improved Visual Features |
| Feature Fusion Network for Scene | text | Detection |
| Feature Representations for Scene | text | Character Recognition: A Comparative Study |
| Feature selection for event extraction in biomedical | text | |
| Feature selection to recognize | text | from palm leaf manuscripts |
| Feature selection using hybrid poor and rich optimization algorithm for | text | classification |
| Feature subset selection using naive Bayes for | text | classification |
| Feature Weight Optimization and Pruning in Historical | text | Recognition |
| FedSH: Towards Privacy-Preserving | text | -Based Person Re-Identification |
| FeedEdit: | text | -Based Image Editing with Dynamic Feedback Regulation |
| FERGI: Automatic Scoring of User Preferences for | text | -to-Image Generation from Spontaneous Facial Expression Reaction |
| FETNet: Feature erasing and transferring network for scene | text | removal |
| Few Could Be Better Than All: Feature Sampling and Grouping for Scene | text | Detection |
| Few shots are all you need: A progressive learning approach for low resource handwritten | text | recognition |
| Few-shot Hierarchical | text | Classification with Bidirectional Path Constraint by label weighting |
| Few-Shot | text | Style Transfer via Deep Feature Similarity |
| Fg-T2M++: LLMs-Augmented Fine-Grained | text | Driven Human Motion Generation |
| Fg-T2M: Fine-Grained | text | -Driven Human Motion Generation via Diffusion Model |
| FHT: An Unconstraint Farsi Handwritten | text | Database |
| Find More Accurate | text | Boundary for Scene Text Detection |
| Find | text | in Documents |
| Find | text | in Video Scenes |
| Finding Hidden Semantics of | text | Tables |
| Finding structure in noisy | text | : Topic classification and unsupervised clustering |
| Finding | text | In Images |
| Finding | text | in Natural Scenes by Figure-Ground Segmentation |
| Finding | text | Regions using Localised Statistical Measures |
| Fine-Grained Erasure in | text | -To-Image Diffusion-Based Foundation Models |
| Fine-Grained Image- | text | Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation |
| Fine-grained Image- | text | Matching by Cross-modal Hard Aligning Network |
| Fine-grained Image- | text | Retrieval via Complementary Feature Learning |
| Fine-Grained Image- | text | Retrieval via Discriminative Latent Space Learning |
| Fine-grained semantic oriented embedding set alignment for | text | -based person search |
| Fine-Grained Video- | text | Retrieval With Hierarchical Graph Reasoning |
| Fine-Grained Visual | text | Prompting |
| Fine-Granularity Alignment for | text | -Based Person Retrieval Via Semantics-Centric Visual Division |
| Fine-Tuning | text | -To-Image Diffusion Models for Class-Wise Spurious Feature Generation |
| FineControlNet: Fine-level | text | Control for Image Generation with Spatially Aligned Text Control Injection |
| FineLIP: Extending CLIP's Reach via Fine-Grained Alignment with Longer | text | Inputs |
| Finematch: Aspect-based Fine-grained Image and | text | Mismatch Detection and Correction |
| Fisher Linear Discriminant Analysis for | text | -image combination in multimedia information retrieval |
| FlashEval: Towards Fast and Accurate Evaluation of | text | -to-Image Diffusion Generative Models |
| Flexible | text | Recovery from Degraded Typewritten Historical Documents |
| Flick Typing: A New VR | text | Input System Based on Space Gestures |
| FlipSketch: Flipping Static Drawings to | text | -Guided Sketch Animations |
| Focal | text | : an Accurate Text Detection with Focal Loss |
| Focal Visual- | text | Attention for Memex Question Answering |
| Focal Visual- | text | Attention for Visual Question Answering |
| Focus Entirety and Perceive Environment for Arbitrary-Shaped | text | Detection |
| Focus on Scene | text | Using Deep Reinforcement Learning |
| Focus-N-Fix: Region-Aware Fine-Tuning for | text | -to-Image Generation |
| FocusCLIP: Focusing on Anomaly Regions by Visual- | text | Discrepancies |
| Focusing Attention: Towards Accurate | text | Recognition in Natural Images |
| Font Recognition and Con | text | ual Processing for More Accurate Text Recognition |
| Font Watermarking Network for | text | Images |
| Fontender: Interactive Japanese | text | Design with Dynamic Font Fusion Method for Comics |
| Food3D: | text | -Driven Customizable 3D Food Generation With Gaussian Splatting |
| Foreground and background separated image style transfer with a single | text | condition |
| Foreground and | text | -lines Aware Document Image Rectification |
| Foreground | text | Extraction in Color Document Images for Enhanced Readability |
| Foreground | text | segmentation in complex color document images using Gabor filters |
| Forged | text | detection in video, scene, and document images |
| Forget-Me-Not: Learning to Forget in | text | -to-Image Diffusion Models |
| Formal Distance vs. Association Strength in | text | Processing |
| Formalization of On-Line Handwritten Japanese | text | Recognition Free from Line Direction Constraint, A |
| FOTS: Fast Oriented | text | Spotting with a Unified Network |
| Fourier Contour Embedding for Arbitrary-Shaped | text | Detection |
| Fractals Based Multi-Oriented | text | Detection System for Recognition in Mobile Video Images |
| Fractional poisson enhancement model for | text | detection and recognition in video frames |
| Framework for Detecting and Selecting | text | Line Candidates of Correct Orientation, A |
| Framework for Performance Evaluation of Face, | text | , and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol |
| Free-editor: Zero-shot | text | -driven 3d Scene Editing |
| Free- | text | keystroke dynamics authentication for Arabic language |
| FREE: A Fast and Robust End-to-End Video | text | Spotter |
| FreeControl: Training-Free Spatial Control of Any | text | -to-Image Diffusion Model with Any Condition |
| Freemotion: A Unified Framework for Number-Free | text | -to-Motion Synthesis |
| Frequency-selective countnet: Enhancing | text | -guided object counting with frequency features |
| Frequent Itemsets Methods for | text | Clustering |
| Fringe Map Based | text | Line Segmentation of Printed Telugu Document Images |
| From External to Internal: Structuring Image for | text | -to-Image Attributes Manipulation |
| From senses to | text | s: An all-in-one graph-based approach for measuring semantic similarity |
| From | text | Detection to Text Segmentation: A Unified Evaluation Scheme |
| From | text | to Speech: A Multimodal Cross-Domain Approach for Deception Detection |
| From | text | to Video: Exploiting Mid-Level Semantics for Large-Scale Video Classification |
| From Two to One: A New Scene | text | Recognizer with Visual Language Modeling Network |
| From Words to Structured Visuals: A Benchmark and Framework for | text | -to-Diagram Generation and Editing |
| FT2TF: First-Person Statement | text | -to-Talking Face Generation |
| Full- | text | Access to Historical Newspapers |
| Full- | text | Search System for Images of Hand-Written Cursive Documents, A |
| Fully convolutional network with dilated convolutions for handwritten | text | line segmentation |
| Fully convolutional recurrent network for handwritten Chinese | text | recognition |
| Fully Shareable Scene | text | Recognition Modeling for Horizontal and Vertical Writing |
| Fundamental Visual Concept Learning From Correlated Images and | text | |
| Furniture-geek: Understanding fine-grained furniture attributes from freely associated | text | and tags |
| Further explorations in | text | alignment with handwritten documents |
| Further reduced form of wavelet feature for | text | independent speaker recognition |
| Fused | text | Segmentation Networks for Multi-oriented Scene Text Detection |
| Fusion Encoder with Multi-Task Guidance for Cross-Modal | text | -Image Retrieval in Remote Sensing, A |
| Fusion of Speech, Faces and | text | for Person Identification in TV Broadcast |
| Fusion Strategy for the Single Shot | text | Detector, A |
| fuzzy find matching tool for image | text | analysis, A |
| Fuzzy Inference-Based Models for Extractive | text | Summarization |
| Fuzzy Semantics for Arbitrary-Shaped Scene | text | Detection |
| Fuzzy | text | /non-text classification of document images based on morphological operator, wavelet transform, and strong feature vector |
| GA-DAN: Geometry-Aware Domain Adaptation Network for Scene | text | Detection and Recognition |
| Gabor filter based block energy analysis for | text | extraction from digital document images |
| GADNet: Improving image- | text | matching via graph-based aggregation and disentanglement |
| GALIP: Generative Adversarial CLIPs for | text | -to-Image Synthesis |
| Gamma correction acceleration for real-time | text | extraction from complex colored images |
| GAN-TSTEGA: | text | Steganography Based on Generative Adversarial Networks |
| GANFusion: Feed-Forward | text | -to-3D with Diffusion in GAN Space |
| Garmentaligner: | text | -to-garment Generation via Retrieval-augmented Multi-level Corrections |
| Gated Cross Word-visual Attention-driven Generative Adversarial Networks for | text | -to-image Synthesis |
| Gatha: Relational Loss for enhancing | text | -based style transfer |
| Gaussctrl: Multi-view Consistent | text | -driven 3d Gaussian Splatting Editing |
| Gaussian Constrained Attention Network for Scene | text | Recognition |
| Gaussian mixture modeling and learning of neighboring characters for multilingual | text | extraction in images |
| Gaussian Mixture Modeling of Neighbor Characters for Multilingual | text | Extraction in Images |
| GaussianDreamer: Fast Generation from | text | to 3D Gaussians by Bridging 2D and 3D Diffusion Models |
| GaussianEditor: Editing 3D Gaussians Delicately with | text | Instructions |
| Gaussians-to-Life: | text | -Driven Animation of 3D Gaussian Splatting Scenes |
| GCNs-Based Con | text | -Aware Short Text Similarity Model |
| Gender Bias in | text | -to-Video Generation Models: A Case Study of Sora |
| General and domain-specific techniques for detecting and recognizing superimposed | text | in video |
| general approach for multi-oriented | text | line extraction of handwritten documents, A |
| Generalized Interpolative Vector Quantization Method for Jointly Optimal Quantization, Interpolation, and Binarization of | text | Images, A |
| Generalizing Edit Distance to Incorporate Domain Information: Handwritten | text | Recognition as a Case-Study |
| Generalizing to Unseen Domains via | text | -guided Augmentation: A Training-free Approach |
| Generatect: | text | -conditional Generation of 3d Chest CT Volumes |
| Generating Diverse and Natural 3D Human Motions from | text | |
| Generating Holistic 3D Scene Abstractions for | text | -Based Image Retrieval |
| Generating Human Interaction Motions in Scenes with | text | Control |
| Generating Human Motion in 3D Scenes from | text | Descriptions |
| Generation of Viewed Image Captions From Human Brain Activity Via Unsupervised | text | Latent Space |
| Generative Adversarial Approach for Zero-Shot Learning from Noisy | text | s, A |
| Generative Adversarial Network for | text | -to-Face Synthesis and Manipulation with Pretrained BERT Model |
| Generative Adversarial Networks Based on Dynamic Word-Level Update for | text | -to-Image Synthesis |
| Generative and Discriminative Fuzzy Restricted Boltzmann Machine Learning for | text | and Image Classification |
| Generative Image Steganography Based on | text | -to-Image Multimodal Generative Model |
| Generative Negative | text | Replay for Continual Vision-Language Pretraining |
| Generative Photography: Scene-Consistent Camera Control for Realistic | text | -to-Image Synthesis |
| Generative | text | Convolutional Neural Network for Hierarchical Document Representation Learning |
| generic method for determining the up/down orientation of | text | in Roman and non-Roman scripts, A |
| generic method for determining up/down orientation of | text | in Roman and non-Roman scripts, A |
| GeoAnnotator: A Collaborative Semi-Automatic Platform for Constructing Geo-Annotated | text | Corpora |
| Geometry Normalization Networks for Accurate Scene | text | Detection |
| Geometry-Aware Scene | text | Detection with Instance Transformation Network |
| Geospatial Semantics Analysis of the Qinghai-Tibetan Plateau Based on Microblog Short | text | s |
| Geotagging | text | Content With Language Models and Feature Mining |
| Getting it Right: Improving Spatial Consistency in | text | -to-image Models |
| GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and- | text | Contexts |
| GLASS: Global to Local Attention for Scene- | text | Spotting |
| GLIGEN: Open-Set Grounded | text | -to-Image Generation |
| Global-aware Fragment Representation Aggregation Network for image- | text | retrieval |
| Global-local prompts guided image- | text | embedding, alignment and aggregation for multi-label zero-shot learning |
| Global-Shared | text | Representation Based Multi-Stage Fusion Transformer Network for Multi-Modal Dense Video Captioning |
| Glyph-BYT5: A Customized | text | Encoder for Accurate Visual Text Rendering |
| GlyphMastero: A Glyph Encoder for High-Fidelity Scene | text | Editing |
| goal-oriented verification-based approach for target | text | line extraction from a document image captured by a pen scanner, A |
| Good Seed Makes a Good Crop: Discovering Secret Seeds in | text | -to-Image Diffusion Models |
| GPT-4V(ision) is a Human-Aligned Evaluator for | text | -to-3D Generation |
| GPT-Based | text | -to-SQL for Spatial Databases |
| GPT4Motion: Scripting Physical Motions in | text | -to-Video Generation via Blender-Oriented GPT Planning |
| GradBias: Unveiling Word Influence on Bias in | text | -to-Image Generative Models |
| Gradient Difference Based Technique for Video | text | Detection, A |
| Gradient Vector Flow and Grouping-Based Method for Arbitrarily Oriented Scene | text | Detection in Video Images |
| Gradient-based approach to offline | text | -independent Persian writer identification |
| GraDual: Graph-based Dual-modal Representation for Image- | text | Matching |
| Granularity-Aware Single-Point Scene | text | Spotting With Sequential Recurrence Self-Attention |
| Graph based method for Arabic | text | summarization |
| Graph Clustering-Based Ensemble Method for Handwritten | text | Line Segmentation |
| Graph Structured Network for Image- | text | Matching |
| Graph-based Method to Remove Interferential Curve From | text | Image, A |
| Graph-Based Segmentation and Feature-extraction Framework for Arabic | text | Recognition, A |
| Graph-Based | text | Segmentation Using a Selected Channel Image |
| Graph-empowered | text | -to-SQL generation on Electronic Medical Records |
| Graphical Figure Classification Using Data Fusion for Integrating | text | and Image Features |
| Graphics and Scene | text | Classification in Video |
| Graphological Analysis of Handwritten | text | Documents for Human Resources Recruitment |
| Grid Diffusion Models for | text | -to-Video Generation |
| Grit: A Generative Region-to- | text | Transformer for Object Understanding |
| Grounded Image | text | Matching with Mismatched Relation Reasoning |
| Grounded | text | -to-Image Synthesis with Attention Refocusing |
| Grounding Visual Representations with | text | s for Domain Generalization |
| Grouping | text | lines in freeform handwritten notes |
| Grouping Using Factor Graphs: An Approach for Finding | text | with a Camera Phone |
| GroupViT: Semantic Segmentation Emerges from | text | Supervision |
| GSAM+Cutie: | text | -Promptable Tool Mask Annotation for Endoscopic Video |
| Guided | text | Spotting for Assistive Blind Navigation in Unfamiliar Indoor Environments |
| Guiding Prototype Networks with label semantics for few-shot | text | classification |
| Gvgen: | text | -to-3d Generation with Volumetric Representation |
| HACG: Leveraging Hierarchical Alignment and Caption Generation for | text | -Video Retrieval |
| HairCLIP: Design Your Hair by | text | and Reference Image |
| Hallucination Elimination and | text | Annotation Framework for Large Vision-Language Models in Traffic Scenarios |
| HAM: Hidden Anchor Mechanism for Scene | text | Detection |
| Hand-Gesture Based | text | Input for Wearable Computers |
| Hand-written | text | recognition based on a new formulation |
| HanDiffuser: | text | -to-Image Generation with Realistic Hand Appearances |
| Handwriting Recognition: Tablet PC | text | Input |
| handwritten ancient | text | detector based on improved feature pyramid network, A |
| Handwritten and Machine Printed | text | Separation in Document Images Using the Bag of Visual Words Paradigm |
| Handwritten and Printed | text | Segmentation: A Signature Case Study |
| Handwritten and Printed | text | Separation: Linearity and Regularity Assessment |
| Handwritten and Typewritten | text | Identification and Recognition Using Hidden Markov Models |
| Handwritten Arabic | text | recognition using Deep Belief Networks |
| Handwritten Arabic | text | recognition using multi-stage sub-core-shape HMMs |
| Handwritten Chinese | text | line segmentation by clustering with distance metric learning |
| Handwritten Chinese | text | Recognition by Integrating Multiple Contexts |
| Handwritten Chinese/Japanese | text | Recognition Using Semi-Markov Conditional Random Fields |
| Handwritten document image segmentation into | text | lines and words |
| Handwritten Signature and | text | based User Verification using Smartwatch |
| Handwritten | text | Generation from Visual Archetypes |
| Handwritten | text | Generation via Disentangled Representations |
| Handwritten | text | Line Identification in Indian Scripts |
| Handwritten | text | Line Segmentation by Shredding Text into its Lines |
| Handwritten | text | Localization in Skewed Documents |
| Handwritten | text | Recognition for Marriage Register Books |
| Handwritten | text | recognition through writer adaptation |
| Handwritten | text | Retrieval Using Two-Stage Pattern Matching with Handwritten Query |
| Handwritten | text | segmentation using average longest path algorithm |
| Handwritten | text | Segmentation Using Elastic Shape Analysis |
| Handwritten | text | Separation from Annotated Machine Printed Documents Using Markov Random Fields |
| Handwritten | text | s for Personality Identification Using Convolutional Neural Networks |
| Harivo: Harnessing | text | -to-image Models for Video Generation |
| Harnessing | text | Insights With Visual Alignment for Medical Image Segmentation |
| Harnessing | text | -to-image Diffusion Models for Category-agnostic Pose Estimation |
| Harnessing the Power of MLLMs for Transferable | text | -to-Image Person ReID |
| Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing | text | Spotting Performance |
| Harnessing the Power of | text | -image Contrastive Models for Automatic Detection of Online Misinformation |
| Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity | text | -to-Image Synthesis |
| HD-Fusion: Detailed | text | -to-3D Generation Leveraging Multiple Noise Estimation |
| Head-Mounted Device for Recognizing | text | in Natural Scenes, A |
| HeadEvolver: | text | to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation |
| Headstudio: | text | to Animatable Head Avatars with 3d Gaussian Splatting |
| Heterogeneous Graph to Abstract Syntax Tree Framework for | text | -to-SQL, A |
| HexaGen3D: StableDiffusion is One Step Away from Fast and Diverse | text | -to-3D Generation |
| HFENet: Hybrid Feature Enhancement Network for Detecting | text | s in Scenes and Traffic Panels |
| HGAN: Hierarchical Graph Alignment Network for Image- | text | Retrieval |
| HGR-Net: Hierarchical Graph Reasoning Network for Arbitrary Shape Scene | text | Detection |
| Hi-SAM: Marrying Segment Anything Model for Hierarchical | text | Segmentation |
| Hidden Bawls, Whispers, and Yelps: Can | text | Convey the Sound of Speech, Beyond Words? |
| Hidden Markov Model for Language Syntax in | text | Recognition, A |
| Hidden Markov Model-Based Ensemble Methods for Offline Handwritten | text | Line Recognition |
| Hierarchical Adaptive Filtering Network for | text | Image Specular Highlight Removal |
| Hierarchical Feature Aggregation Based on Transformer for Image- | text | Matching |
| Hierarchical online NMF for detecting and tracking topic hierarchies in a | text | stream |
| Hierarchical Shape Primitive Features for Online | text | -independent Writer Identification |
| Hierarchical Spatio-temporal Decoupling for | text | -to-Video Generation |
| Hierarchical | text | Spotter for Joint Text Spotting and Layout Analysis |
| Hierarchically-Fused Generative Adversarial Network for | text | to Realistic Image Synthesis |
| HierCode: A lightweight hierarchical codebook for zero-shot Chinese | text | recognition |
| HierLabelNet: A Two-Stage LLMs Framework with Data Augmentation and Label Selection for Geographic | text | Classification |
| high-capacity | text | watermarking method based on geometric micro-distortion, A |
| High-Dimensional Access Method for Approximated Similarity Search in | text | Mining, A |
| Highly Transparent and Secure Scheme for Concealing | text | Within Audio |
| Histogram-Based Two-Stage Adaptive Character Segmentation for Transcription of Inter-Point Hindi Braille to | text | , A |
| Historical Handwritten | text | Images Word Spotting Through Sliding Window HOG Features |
| HiT: Hierarchical Transformer with Momentum Contrast for Video- | text | Retrieval |
| HMM-Based Approach for | text | Region Detection in Coded Video Bitstreams |
| HMM-Based Multi Oriented | text | Recognition in Natural Scene Image |
| HMM-Based Recognizer with Segmentation-free Strategy for Unconstrained Chinese Handwritten | text | |
| HOI-Diff: | text | -Driven Synthesis of 3D Human-Object Interactions using Diffusion Models |
| HOIAnimator: Generating | text | -Prompt Human-Object Animations Using Novel Perceptive Diffusion Models |
| Holistic Features are Almost Sufficient for | text | -to-Video Retrieval |
| Holistic Vertical Regional Proposal Network for Scene | text | Detection |
| HOVER: Hyperbolic Video- | text | Retrieval |
| How Good Is Good Enough? Establishing Quality Thresholds for the Automatic | text | Analysis of Retro-Digitized Comics |
| How is Visual Attention Influenced by | text | Guidance? Database and Model |
| How Much Handwritten | text | Is Needed for Text-Independent Writer Verification and Identification |
| How to Make Cross Encoder a Good Teacher for Efficient Image- | text | Retrieval? |
| HowTo100M: Learning a | text | -Video Embedding by Watching Hundred Million Narrated Video Clips |
| HRS-Bench: Holistic, Reliable and Scalable Benchmark for | text | -to-Image Models |
| HTD: A Fast Human-centered | text | -locating Method for Auxiliary Reading |
| HTR-VT: Handwritten | text | recognition with vision transformer |
| Human Motion Aware | text | -to-Video Generation with Explicit Camera Control |
| Human Preference Score: Better Aligning | text | -to-image Models with Human Preference |
| Human-centered Interactive Learning via MLLMs for | text | -to-Image Person Re-identification |
| HumanGaussian: | text | -Driven 3D Human Generation with Gaussian Splatting |
| Hybrid Algorithm for Con | text | ual Text Recognition, A |
| Hybrid approach for Farsi/Arabic | text | detection and localisation in video frames |
| Hybrid Approach to Detect and Localize | text | s in Natural Scene Images, A |
| Hybrid Approach to Detect | text | s in Natural Scenes by Integration of a Connected-Component Method and a Sliding-Window Method, A |
| Hybrid approach to efficient | text | extraction in complex color images |
| Hybrid Chinese/English | text | detection in images and video frames |
| Hybrid Con | text | ual Text Recognition with String Matching |
| Hybrid Deep Architecture for Robust Recognition of | text | Lines of Degraded Printed Documents, A |
| hybrid method based on estimation of distribution algorithms to train convolutional neural networks for | text | categorization, A |
| Hybrid Network For End-To-End | text | -Independent Speaker Identification |
| Hybrid R-BILSTM-C Neural Network Based | text | Steganalysis, A |
| Hybrid word/Part-of-Arabic-Word Language Models for Arabic | text | document recognition |
| HybridEditDif: | text | and Exemplar Guided Image Editing with Diffusion Models |
| HYPE: Hyperbolic Entailment Filtering for Underspecified Images and | text | s |
| Hyper-3DG: | text | -to-3D Gaussian Generation via Hypergraph |
| HyperDreamBooth: HyperNetworks for Fast Personalization of | text | -to-Image Models |
| HyperStyle3D: | text | -Guided 3D Portrait Stylization via Hypernetworks |
| Hypothesis Preservation Approach to Scene | text | Recognition with Weighted Finite-State Transducer |
| hypothesize-and-verify framework for | text | recognition using deep recurrent neural networks, A |
| Hy | text | : A Scene-Text Extraction Method for Video Retrieval |
| I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for | text | -Guided Multi-Mask Inpainting |
| I2T2I: Learning | text | to image synthesis with textual data augmentation |
| I2T: Image Parsing to | text | Description |
| I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-Shaped Scene | text | Detection |
| IAM-OnDB: An on-line English sentence database acquired from handwritten | text | on a whiteboard |
| IBM Rich Transcription 2007 Speech-to- | text | Systems for Lecture Meetings, The |
| IBN-STR: A Robust | text | Recognizer for Irregular Text in Natural Scenes |
| ICA Based Approach for Complex Color Scene | text | Binarization, An |
| ICDAR 2005 | text | locating competition results |
| ICDAR 2011 Robust Reading Competition - Challenge 1: Reading | text | in Born-Digital Images (Web and Email) |
| ICDAR 2011 Robust Reading Competition Challenge 2: Reading | text | in Scene Images |
| ICDAR 2011: Arabic Recognition Competition: Multi-font Multi-size Digitally Represented | text | |
| ICDAR 2015 competition HTRtS: Handwritten | text | Recognition on the tranScriptorium dataset |
| ICDAR 2015 competition on | text | line detection in historical documents |
| ICDAR 2015 contest on MultiSpectral | text | Extraction (MS-TEx 2015) |
| ICDAR2013 Competition on Multi-font and Multi-size Digitally Represented Arabic | text | |
| ICDAR2015 competition on | text | Image Super-Resolution |
| ICPR 2020 Competition on | text | Block Segmentation on a NewsEye Dataset |
| ICPR 2020 Competition on | text | Block Segmentation on a Newseye Dataset |
| ICPR2016 contest on Arabic | text | detection and Recognition in video frames - AcTiVComp |
| ICPR2020 Competition on | text | Detection and Recognition in Arabic News Video Frames |
| ICT-QA: Question Answering Over Multi-Modal Con | text | s Including Image, Chart, and Text Modalities |
| IDAdapter: Learning Mixed Features for Tuning-Free Personalization of | text | -to-Image Models |
| IDBNet: Improved differentiable binarisation network for natural scene | text | detection |
| IDEA: Inverted | text | with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification |
| Identification of personality traits from handwritten | text | documents using multi-label classification models |
| Identifying Handwritten | text | in Mixed Documents |
| Identifying SCADA Systems and Their Vulnerabilities on the Internet of Things: A | text | -Mining Approach |
| Identifying visual attributes for object recognition from | text | and taxonomy |
| Identity-Preserving | text | -To-Video Generation by Frequency Decomposition |
| iEdit: Localised | text | -guided Image Editing with Weak Supervision |
| Illegible | text | to Readable Text: An Image-to-Image Transformation using Conditional Sliced Wasserstein Adversarial Networks |
| Illusion of Unlearning: The Unstable Nature of Machine Unlearning in | text | -to-Image Diffusion Models, The |
| iLogBook: Enabling | text | -Searchable Event Query Using Sparse Vehicle-Mounted GPS Data |
| Im2 | text | and Text2Im: Associating Images and Texts for Cross-Modal Retrieval |
| Image and | text | Coupling for Creating Electronic Books from Manuscripts |
| Image and | text | fusion for UPMC Food-101 using BERT and CNNs |
| Image Binarization for End-to-End | text | Understanding in Natural Images |
| Image clustering using generated | text | centroids |
| Image Dataset of | text | Patches in Everyday Scenes, An |
| Image Generation Method of Bird | text | Based on Improved StackGAN |
| Image is Worth Multiple Words: Multi-Attribute Inversion for Constrained | text | -To-Image Synthesis, An |
| Image Over | text | : Transforming Formula Recognition Evaluation with Character Detection Matching |
| Image Overlay | text | Detection Based on JPEG Truncation Error Analysis |
| Image Retrieval for Visual Localization via Scene | text | Detection and Logo Filtering |
| Image Search With | text | Feedback by Visiolinguistic Attention Learning |
| Image Segmentation Using | text | and Image Prompts |
| Image | text | Detection Using a Bandlet-Based Edge Detector and Stroke Width Transform |
| Image-based Document Vectors for | text | Retrieval |
| Image- | text | Co-Decomposition for Text-Supervised Semantic Segmentation |
| Image- | text | Embedding Learning via Visual and Textual Semantic Reasoning |
| Image- | text | feature learning for unsupervised visible-infrared person re-identification |
| Image- | text | Matching, Image Text Retrieval, Image-Text Retrieval |
| Image- | text | Multimodal Emotion Classification via Multi-View Attentional Network |
| Image- | text | Pre-Training for Logo Recognition |
| Image- | text | Retrieval With Cross-Modal Semantic Importance Consistency |
| Image- | text | -Image Knowledge Transfer for Lifelong Person Re-Identification With Hybrid Clothing States |
| Image-to-Character-to-Word Transformers for Accurate Scene | text | Recognition |
| Image-to- | text | Conversion and Aspect-Oriented Filtration for Multimodal Aspect-Based Sentiment Analysis |
| Image/ | text | filtering system and method |
| Imaged Document | text | Retrieval Without OCR |
| Imagen Editor and EditBench: Advancing and Evaluating | text | -Guided Image Inpainting |
| Imagic: | text | -Based Real Image Editing with Diffusion Models |
| IMMA: Immunizing | text | -to-image Models Against Malicious Adaptation |
| Impact of Character Models Choice on Arabic | text | Recognition Performance |
| Impact of OCR Accuracy and Feature Transformation on Automatic | text | Classification, The |
| Impact of OCR Errors on Automated Classification of OCR Japanese | text | s with Parts-of-Speech Analysis, An |
| Impact of online handwriting recognition performance on | text | categorization |
| Impact of Pre-Processing on Recognition of Cursive Video | text | |
| Imperceptible Backdoor Attacks on | text | -Guided 3D Scene Grounding |
| Implementation of Advanced Encryption Standard for encryption and decryption of images and | text | on a GPU |
| Implementation of Three | text | to Speech Systems for Kurdish Language |
| Implicit Bias Injection Attacks against | text | -to-Image Diffusion Models |
| Implicit Feature Alignment: Learn to Convert | text | Recognizer to Text Spotter |
| Improved Component Tree Based Approach to User-Intention Guided | text | Extraction from Natural Scene Images, An |
| Improved Document Skew Detection Based on | text | Line Connected-component Clustering |
| Improved Gini-Index Algorithm to Correct Feature-Selection Bias in | text | Classification |
| Improved Legibility of | text | for Multiprojector Tiled Displays |
| Improved localization accuracy by LocNet for Faster R-CNN based | text | detection in natural scene images |
| Improved Method Based on Weighted Grid Micro-structure Feature for | text | -Independent Writer Recognition, An |
| Improved SAR Ship Classification Method Using | text | -to-Image Generation-Based Data Augmentation and Squeeze and Excitation, An |
| Improved Scene | text | Extraction Method Using Conditional Random Field and Optical Character Recognition, An |
| Improved shot boundary detection method based on | text | edges |
| Improved | text | -detection methods for a camera-based text reading system for blind persons |
| Improved Zero-Shot Classification by Adapting VLMs with | text | Descriptions |
| Improvement of video | text | recognition by character selection |
| Improving accuracy of arbitrary-shaped | text | detection using ResNet-152 backbone-based pixel aggregation network |
| Improving Cross-Modal Constraints: | text | Attribute Person Search With Graph Attention Networks |
| Improving Cross-Modal Image- | text | Retrieval With Teacher-Student Learning |
| Improving Description-Based Person Re-Identification by Multi-Granularity Image- | text | Alignments |
| Improving distinctiveness in video captioning with | text | -video similarity |
| Improving End-to-End | text | Image Translation From the Auxiliary Text Translation Task |
| Improving Faithfulness of | text | -to-Image Diffusion Models through Inference Intervention |
| Improving Fine-Grained Understanding for Retrieval in Human Motion and | text | |
| Improving Full- | text | Precision on Short Queries Using Simple Constraints |
| Improving Handwritten Chinese | text | Recognition by Confidence Transformation |
| Improving handwritten Chinese | text | recognition using neural network language models and convolutional neural network shape models |
| Improving Image Recognition by Retrieving from Web-Scale Image- | text | Data |
| Improving image similarity measures for image browsing and retrieval through latent space learning between images and long | text | s |
| Improving Image- | text | Matching by Integrating Word Sense Disambiguation |
| Improving Image- | text | Matching With Bidirectional Consistency of Cross-Modal Alignment |
| Improving Multi-class | text | Classification with Naive Bayes |
| Improving Multiclass | text | Classification with the Support Vector Machine |
| Improving OCR | text | Categorization Accuracy with Electronic Abstracts |
| Improving Offline Handwritten | text | Recognition with Hybrid HMM/ANN Models |
| Improving Open-Vocabulary Scene | text | Recognition |
| Improving patch-based scene | text | script identification with ensembles of conjoined networks |
| Improving Persian | text | Classification Using Persian Thesaurus |
| Improving Scene | text | Detection by Scale-Adaptive Segmentation and Weighted CRF Verification |
| Improving | text | Classifier Performance based on AUC |
| Improving | text | -Based Person Search by Spatial Matching and Adaptive Threshold |
| Improving | text | -guided Object Inpainting with Semantic Pre-inpainting |
| Improving | text | -image Matching with Adversarial Learning and Circle Loss for Multi-modal Steganography |
| Improving Vision-and-language Navigation with Image- | text | Pairs from the Web |
| IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image- | text | Retrieval |
| In-air Handwritten Chinese | text | Recognition with Attention Convolutional Recurrent Network |
| In-air handwritten Chinese | text | recognition with temporal convolutional recurrent network |
| In-Style: Bridging | text | and Uncurated Videos with Style Transfer for Text-Video Retrieval |
| Incorporating Language Syntax in Visual | text | Recognition with a Statistical-Model |
| Incorporating Self-attention Mechanism and Multi-task Learning into Scene | text | Detection |
| Incremental Approach to | text | Representation, Categorization, and Retrieval, An |
| Incremental Detection of | text | on Road Signs |
| Incremental | text | -to-Speech Synthesis Using Pseudo Lookahead With Large Pretrained Language Model |
| Indexing On-line Handwritten | text | s Using Word Confusion Networks |
| Indexing | text | Events in Digital Video Databases |
| InducT-GCN: Inductive Graph Convolutional Networks for | text | Classification |
| Industrial Scene | text | Detection With Refined Feature-Attentive Network |
| Inferential Rules for Identifying Answers in TOEFL | text | s |
| Inferring Semantic Layout for Hierarchical | text | -to-Image Synthesis |
| Infinite Liouville mixture models with application to | text | and texture categorization |
| Inflation with Diffusion: Efficient Temporal Adaptation for | text | -to-Video Super-Resolution |
| Influence of | text | line segmentation in Handwritten Text Recognition |
| Information Detection for the Process of Typhoon Events in Microblog | text | : A Spatio-Temporal Perspective |
| Information Extraction and Classification from Free | text | Using a Neural Approach |
| Information extraction from scanned invoice images using | text | analysis and layout features |
| Information fusion for | text | classification: an experimental comparison |
| Information Theoretic | text | Classification Using the Ziv-Merhav Method |
| InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot | text | -based Video Editing |
| Initialized and Guided EM-clustering of Sparse Binary Data with Application to | text | Based Documents |
| Initno: Boosting | text | -to-Image Diffusion Models via Initial Noise Optimization |
| Injecting | text | Clues for Improving Anomalous Event Detection From Weakly Labeled Videos |
| InNeRF360: | text | -Guided 3D-Consistent Object Inpainting on 360° Neural Radiance Fields |
| inpainting system for automatic image structure- | text | ure restoration with text removal, An |
| Inspecting the Geographical Representativeness of Images from | text | -to-Image Models |
| Instance-wise distribution control of | text | -to-image diffusion models |
| InstanceCap: Improving | text | -to-Video Generation via Instance-aware Structured Caption |
| Instant3D: Instant | text | -to-3D Generation |
| InstantBooth: Personalized | text | -to-Image Generation without Test-Time Finetuning |
| Instruction-Augmented Multimodal Alignment for Image- | text | and Element Matching |
| Instruction-Guided Scene | text | Recognition |
| Instructive3D: Editing Large Reconstruction Models with | text | Instructions |
| Instrumental Assessment of Prosodic Quality for | text | -to-Speech Signals |
| Integrated Algorithm for | text | Recognition: Comparison with a Cascaded Algorithm, An |
| Integrated | text | and Line-Art Extraction from a Topographic Map |
| Integrating Geometric Con | text | for Text Alignment of Handwritten Chinese Documents |
| Integrating Knowledge Sources in Devanagari | text | Recognition System |
| Integrating Language Guidance Into Image- | text | Matching for Correcting False Negatives |
| Integrating Language Model in Handwritten Chinese | text | Recognition |
| Integrating multiple character proposals for robust scene | text | extraction |
| Integrating Visual, Audio and | text | Analysis for News Video |
| Integrating word level knowledge in | text | recognition |
| Integration of Linguistic and Geospatial Features Using Global Con | text | Embedding for Automated Text Geocoding, The |
| Intelligent Typography: Artistic | text | Style Transfer for Complex Texture and Structure |
| IntelliSearch: Intelligent Search for Images and | text | on the Web |
| Inter-Intra Modal Representation Augmentation With DCT-Transformer Adversarial Network for Image- | text | Matching |
| InteractDiffusion: Interaction Control in | text | -to-Image Diffusion Models |
| Interactive Enhancement of Handwritten | text | through Multi-resolution Gaussian |
| Interactive Image Manipulation with Complex | text | Instructions |
| Interactive Off-Line Handwritten | text | Transcription Using On-Line Handwritten Text as Feedback |
| Interactive System to Extract Structured | text | from a Geometrical Representation, An |
| Interactive | text | books; Embedding Image Processing Operator Demonstrations in Text |
| Interfusion: | text | -driven Generation of 3d Human-object Interaction |
| Interleaved | text | /image Deep Mining on a large-scale radiology database |
| Interpretation of The Function of The Obelisk of Augustus in Rome From Antique | text | s to Present Time Virtual Reconstruction |
| Interword distance changes represented by sine waves for watermarking | text | images |
| Intra-modal consistency for image- | text | retrieval through soft-label distillation |
| Intra-Modal Constraint Loss for Image- | text | Retrieval |
| Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in | text | -to-Image Generative Models |
| Inverse-Like Antagonistic Scene | text | Spotting via Reading-Order Estimation and Dynamic Sampling |
| Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of | text | -to-Video Diffusion Models |
| Investigation on LLMs' Visual Understanding Ability Using SVG for Image- | text | Bridging, An |
| Investigations in Psychological Stress Detection from Social Media | text | using Deep Architectures |
| IOS-Net: An inside-to-outside supervision network for scale robust | text | detection in the wild |
| IPAD: Iterative, Parallel, and Diffusion-Based Network for Scene | text | Recognition |
| Irregular | text | block recognition via decoupling visual, linguistic, and positional information |
| Is An Image Worth Five Sentences? A New Look into Semantics for Image- | text | Matching |
| Is Arabic | text | categorization a solved task? |
| ISL RT-07 Speech-to- | text | System, The |
| ISTD-DLA: Industrial Scene | text | Detection Method Based on Dynamic Local-Aware Aggregation Network |
| It's All About The Scale: Efficient | text | Detection Using Adaptive Scaling |
| ITACLIP: Boosting Training-Free Semantic Segmentation with Image, | text | , and Architectural Enhancements |
| IterVM: Iterative Vision Modeling Module for Scene | text | Recognition |
| ITI-Gen: Inclusive | text | -to-Image Generation |
| JECL: Joint Embedding and Cluster Learning for Image- | text | Pairs |
| JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized | text | -to-Image Generation |
| Joint architecture and knowledge distillation in CNN for Chinese | text | recognition |
| Joint embeddings with multimodal cues for video- | text | retrieval |
| Joint Handwritten | text | Recognition and Word Classification for Tabular Information Extraction |
| Joint Image- | text | News Topic Detection and Tracking by Multimodal Topic And-Or Graph |
| Joint Inference of Objects and Scenes With Efficient Learning of | text | -Object-Scene Relations |
| Joint Intra & Inter-Grained Reasoning: A New Look Into Semantic Consistency of Image- | text | Retrieval |
| Joint representation learning for | text | and 3D point cloud |
| Joint stroke classification and | text | line grouping in online handwritten documents with edge pooling attention networks |
| Joint Token and Feature Alignment Framework for | text | -Based Person Search |
| Joint Video and | text | Parsing for Understanding Events and Answering Queries |
| Joint Visual Semantic Reasoning: Multi-Stage Decoder for | text | Recognition |
| Jointdreamer: Ensuring Geometry Consistency and | text | Congruence in Text-to-3d Generation via Joint Score Distillation |
| JPEG2000 Compatible Watermarking of | text | in Images |
| Kanji recognition in scene images without detection of | text | fields: robust against variation of viewpoint, contrast, and background texture |
| KDProR: A Knowledge-decoupling Probabilistic Framework for Video- | text | Retrieval |
| Kernel Adaptive Convolution for Scene | text | Detection via Distance Map Prediction |
| kernel trick for sequences applied to | text | -independent speaker verification systems, A |
| Kernel-Based Mixture Mapping for Image and | text | Association |
| Keystroke Biometric Recognition Studies on Long- | text | Input under Ideal and Application-Oriented Conditions |
| Keyword spotting in handwritten documents based on a generic | text | line HMM and a SVM verification |
| Keyword Spotting in Online Handwritten Documents Containing | text | and Non-text Using BLSTM Neural Networks |
| KHATT: An open Arabic offline handwritten | text | database |
| KHATT: Arabic Offline Handwritten | text | Database |
| Khmerst: A Low-resource Khmer Scene | text | Detection and Recognition Benchmark |
| Knowing Where to Focus: Attention-Guided Alignment for | text | -based Person Search |
| Knowledge Mining with Scene | text | for Fine-Grained Recognition |
| Knowledge-Driven Generative Adversarial Network for | text | -to-Image Synthesis |
| KOHTD: Kazakh offline handwritten | text | dataset |
| KT-GAN: Knowledge-Transfer Generative Adversarial Network for | text | -to-Image Synthesis |
| K | text | : Arbitrary shape text detection using modified K-Means |
| L-Verse: Bidirectional Generation Between Image and | text | |
| Label embedding for | text | recognition |
| Label Embedding: A Frugal Baseline for | text | Recognition |
| Label Incorporated Graph Neural Networks for | text | Classification |
| Label or Message: A Large-Scale Experimental Survey of | text | s and Objects Co-Occurrence |
| LAM Dataset: A Novel Benchmark for Line-Level Handwritten | text | Recognition, The |
| Language Adaptive Methodology for Handwritten | text | Line Segmentation |
| Language Identification for Printed | text | Independent of Segmentation |
| Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene | text | Detection and Spotting |
| language model using variable length tokens for open-vocabulary Hangul | text | recognition, A |
| Language-Aware Soft Prompting: | text | -to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models |
| Language-Independent | text | Lines Extraction Using Seam Carving |
| Language-Independent | text | -Line Extraction Algorithm for Handwritten Documents |
| Laplacian Approach to Multi-Oriented | text | Detection in Video, A |
| Laplacian Method for Video | text | Detection, A |
| Large scalability in document image matching using | text | retrieval |
| Large Scale Scene | text | Verification with Guided Attention |
| Large-Lexicon Attribute-Consistent | text | Recognition in Natural Images |
| Large-Scale | text | -to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator |
| LarTap: A Luminance-Aware Framework With | text | -Correlation Priors for Multi-Exposure Image Fusion |
| LASP: | text | -to-Text Optimization for Language-Aware Soft Prompting of Vision and Language Models |
| Latent Guard: A Safety Framework for | text | -to-image Generation |
| Latenteditor: | text | Driven Local Editing of 3d Scenes |
| LaTeRF: Label and | text | Driven Object Radiance Fields |
| LaTr: Layout-Aware Transformer for Scene- | text | VQA |
| Latte3d: Large-scale Amortized | text | -to-enhanced3d Synthesis |
| LATTECLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic | text | s |
| Layerdiff: Exploring | text | -guided Multi-layered Composable Image Synthesis via Layer-collaborative Diffusion Model |
| layered method for determining manga | text | bubble reading order, A |
| Layout and language: exploring | text | block discovery in tables using linguistic resources |
| Layout-Agnostic Scene | text | Image Synthesis with Diffusion Models |
| Layout-Bridging | text | -to-Image Synthesis |
| LayoutFormer: Hierarchical | text | Detection Towards Scene Text Understanding |
| LCM-Lookahead for Encoder-based | text | -to-image Personalization |
| LD-ZNet: A Latent Diffusion Approach for | text | -Based Image Segmentation |
| Learn to Augment: Joint Data Augmentation and Network Optimization for | text | Recognition |
| Learned Image Compression with | text | Quality Enhancement |
| Learning a Limited | text | Space for Cross-Media Retrieval |
| Learning Aligned Image- | text | Representations Using Graph Attentive Relational Network |
| Learning analytics system for assessing students' performance quality and | text | mining in online communication |
| Learning and Integrating Multi-Level Matching Features for Image- | text | Retrieval |
| Learning Audio-guided Video Representation with Gated Attention for Video- | text | Retrieval |
| Learning bottom-up | text | attention maps for text detection using stroke width transform |
| Learning by Imagination: A Joint Framework for | text | -Based Image Manipulation and Change Captioning |
| Learning CLIP Guided Visual- | text | Fusion Transformer for Video-based Pedestrian Attribute Recognition |
| Learning Coarse-to-Fine Graph Neural Networks for Video- | text | Retrieval |
| Learning confidence transformation for handwritten Chinese | text | recognition |
| Learning Continuous 3D Words for | text | -to-Image Generation |
| Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using | text | and Sketch |
| Learning Deep Structure-Preserving Image- | text | Embeddings |
| Learning Disentangled Identifiers for Action-Customized | text | -to-Image Generation |
| Learning Dual Semantic Relations With Graph Attention for Image- | text | Matching |
| Learning From Short | text | Streams With Topic Drifts |
| Learning From | text | : A Multimodal Face Inpainting Network for Irregular Holes |
| Learning from Video and | text | via Large-Scale Discriminative Clustering |
| Learning Generative Structure Prior for Blind | text | Image Super-resolution |
| Learning Linguistic Association Towards Efficient | text | -Video Retrieval |
| Learning Markov Clustering Networks for Scene | text | Detection |
| Learning Multi-Dimensional Human Preference for | text | -to-Image Generation |
| Learning multi-view embedding in joint space for bidirectional image- | text | retrieval |
| Learning Relationship-Enhanced Semantic Graph for Fine-Grained Image- | text | Matching |
| Learning Semantic Polymorphic Mapping for | text | -Based Person Retrieval |
| Learning Semantic Relationship among Instances for Image- | text | Matching |
| Learning Semantic | text | Features for Web Text-Aided Image Classification |
| Learning Shape-Aware Embedding for Scene | text | Detection |
| Learning Shape-Color Diffusion Priors for | text | -Guided 3D Object Generation |
| Learning Spatial-Semantic Con | text | with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition |
| Learning Spatially-Variable Filters for Super-Resolution of | text | |
| Learning | text | -Line Segmentation Using Codebooks and Graph Partitioning |
| Learning | text | -to-Video Retrieval from Image Captioning |
| Learning the Lexicon from raw | text | s for open-vocabulary Korean word recognition |
| Learning to Detect Scene | text | Using a Higher-Order MRF with Belief Propagation |
| Learning to detect, localize and recognize many | text | objects in document images from few examples |
| Learning to Embed Semantic Similarity for Joint Image- | text | Retrieval |
| Learning to Generate Semantic Layouts for Higher | text | -Image Correspondence in Text-to-Image Synthesis |
| Learning to Generate | text | -Grounded Mask for Open-World Semantic Segmentation from Only Image-Text Pairs |
| Learning to Group | text | Lines and Regions in Freeform Handwritten Notes |
| Learning to Localize Actions in Instructional Videos with Llm-based Multi-pathway | text | -video Alignment |
| Learning to Read L'Infinito: Handwritten | text | Recognition with Synthetic Training Data |
| Learning to Sample Effective and Diverse Prompts for | text | -to-Image Generation |
| Learning to Sort Handwritten | text | Lines in Reading Order through Estimated Binary Order Relations |
| Learning to summarize web image and | text | mutually |
| Learning to Super-Resolve Blurry Face and | text | Images |
| Learning transferable features in meta-learning for few-shot | text | classification |
| Learning Two-Branch Neural Networks for Image- | text | Matching Tasks |
| Learning Visual Compound Models from Parallel Image- | text | Datasets |
| Learning Visual Generative Priors without | text | |
| Lecture Video Enhancement and Editing by Integrating Posture, Gesture, and | text | |
| LEDITS++: Limitless Image Editing Using | text | -to-Image Models |
| LeftRefill: Filling Right Canvas based on Left Reference through Generalized | text | -to-Image Diffusion Model |
| Legit: | text | Legibility for User-Generated Media |
| Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in | text | -to-image Diffusion Models |
| Length Normalization in Degraded | text | Collections |
| Length-sensitive Language-bound Recognition Network for Multilingual | text | Recognition, A |
| Leveraging Multimodal Large Language Models for Joint Discrete and Continuous Evaluation in | text | -to-Image Alignment |
| Leveraging Smart Devices for Scene | text | Preserved Image Stylization: A Deep Gaming Approach |
| Leveraging Style and Content features for | text | Conditioned Image Retrieval |
| Leveraging surrounding con | text | for scene text detection |
| Leveraging | text | Localization for Scene Text Removal via Text-aware Masked Image Modeling |
| Leveraging the Mixed- | text | Segmentation Problem to Design Secure Handwritten CAPTCHAs |
| Lexicon based feature extraction for emotion | text | classification |
| Lexicon Generation for Emotion Detection from | text | |
| Lexicon-based offline recognition of Amharic words in unconstrained handwritten | text | |
| LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image- | text | Sparse Retrieval |
| Lifelong Learning for | text | Steganalysis Based on Chronological Task Sequence |
| Light Weight | text | Extraction Technique for Hand-Held Device, A |
| light-weight | text | image processing method for handheld embedded cameras, A |
| Lightweight Attentional Feature Fusion: A New Baseline for | text | -to-Video Retrieval |
| Lightweight dynamic conditional GAN with pyramid attention for | text | -to-image synthesis |
| Lightweight Multi-Grained Image- | text | Retrieval Paradigm via Cascaded Representation Learning and Parameter-Free Feature Aggregation, A |
| Lightweight | text | -Driven Image Editing With Disentangled Content and Attributes |
| LIMITR: Leveraging Local Information for Medical Image- | text | Representation |
| Line Segmentation for Grayscale | text | Images of Khmer Palm Leaf Manuscripts |
| Line-Direction-Free and Character-Orientation-Free On-Line Handwritten Japanese | text | Recognition System, A |
| Linecounter: Learning Handwritten | text | Line Segmentation By Counting |
| LinGen: Towards High-Resolution Minute-Length | text | -to-Video Generation with Linear Computational Complexity |
| Linguistic Hallucination for | text | -Based Video Retrieval |
| Linguistic integration information in the aabatas Arabic | text | analysis system |
| Linguistic Steganalysis via | text | Dual Attention Fusing Statistical and Multi-Layer Semantic Features |
| Linguistics-aware Masked Image Modeling for Self-supervised Scene | text | Recognition |
| Link the Head to the Beak: Zero Shot Learning from Noisy | text | Description at Part Precision |
| Linking Image and | text | with 2-Way Nets |
| Linking | text | and visual concepts semantically for cross modal multimedia search |
| LISTER: Neighbor Decoding for Length-Insensitive Scene | text | Recognition |
| LiT: Zero-Shot Transfer with Locked-Image | text | Tuning |
| Livephoto: Real Image Animation with | text | -guided Motion Control |
| Local Action-guided Motion Diffusion Model for | text | -to-motion Generation |
| Local Binary Pattern-Based Features for | text | Identification of Web Images |
| Local Gradient Difference Features for Classification of 2D-3D Natural Scene | text | Images |
| Local Skew Angle Estimation from Background Space in | text | Regions |
| Local variance image-based for scene | text | binarization under illumination effects |
| Local-enhanced representation for | text | -based person search |
| Local-Global Video- | text | Interactions for Temporal Grounding |
| Localization and Manipulation of Immoral Visual Cues for Safe | text | -to-Image Generation |
| Localization, extraction and recognition of | text | in Telugu document images |
| localization/verification scheme for finding | text | in images and video frames based on contrast independent features and machine learning methods, A |
| Localize, Group, and Select: Boosting | text | -VQA by Scene Text Modeling |
| Localized Concept Erasure for | text | -to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation |
| Localizing and segmenting | text | in images and videos |
| Localizing blurry and low-resolution | text | in natural images |
| Localizing Object-level Shape Variations with | text | -to-Image Diffusion Models |
| Localizing scene | text | s by fuzzy inference systems and low rank matrix recovery model |
| Localizing | text | in Scene Images by Boundary Clustering, Stroke Segmentation, and String Fragment Classification |
| LOCAT: Localization-Driven | text | Watermarking via Large Language Models |
| Locating | text | in Color Documents |
| Locating | text | in Complex Color Images |
| Locating | text | in Images Based on the Smooth Gray-Level Detection |
| Locating | text | in images using matched wavelets |
| Locating Uniform-colored | text | in Video Frames |
| LocVTP: Video- | text | Pre-training for Temporal Localization |
| LODENet: A Holistic Approach to Offline Handwritten Chinese and Japanese | text | Line Recognition |
| LoGoPrompt: Synthetic | text | Images Can Be Good Visual Prompts for Vision-Language Models |
| Long-CLIP: Unlocking the Long- | text | Capability of CLIP |
| Long-FAS: Cross-domain face anti-spoofing with long | text | guidance |
| Longest Common Subsequence Algorithm Suitable for Similar | text | Strings, A |
| Look More Than Once: An Accurate Detector for | text | of Arbitrary Shapes |
| Looking at Words and Points with Attention: A Benchmark for | text | -to-Shape Coherence |
| Looking from a Higher-level Perspective: Attention and Recognition Enhanced Multi-scale Scene | text | Segmentation |
| LoSh: Long-Short | text | Joint Prediction Network for Referring Video Object Segmentation |
| lossy/lossless compression method for printed typeset bi-level | text | images based on improved pattern matching, A |
| Lost in Translation: Latent Concept Misalignment in | text | -to-image Diffusion Models |
| Lost Your Style? Navigating with Semantic-Level Approach for | text | -to-Outfit Retrieval |
| LoTeR: Localized | text | prompt refinement for zero-shot referring image segmentation |
| Low Complexity Sign Detection and | text | Localization Method for Mobile Applications, A |
| LucidDreamer: Towards High-Fidelity | text | -to-3D Generation via Interval Score Matching |
| LuoJiaHOG: A hierarchy oriented geo-aware image caption dataset for remote sensing image- | text | retrieval |
| L_0-Regularized Intensity and Gradient Prior for Deblurring | text | Images and Beyond |
| M-Adaptor: | text | -Driven Whole-Body Human Motion Generation |
| M2d2m: Multi-Motion Generation from | text | with Discrete Diffusion Models |
| M3TTS: Multi-modal | text | -to-speech of multi-scale style control for dubbing |
| MA-CRNN: a multi-scale attention CRNN for Chinese | text | line recognition in natural scenes |
| MAAN: Memory-Augmented Auto-Regressive Network for | text | -Driven 3D Indoor Scene Generation |
| MAC: Masked Contrastive Pre-Training for Efficient Video- | text | Retrieval |
| Machine Learning Approach to Hypothesis Decoding in Scene | text | Recognition, A |
| Machine printed | text | and handwriting identification in noisy document images |
| Machine reading of camera-held low quality | text | images: An ICA-based image enhancement approach for improving OCR accuracy |
| Machine reading of handwritten | text | information in field technician's maps |
| Machine recognition and correction of printed Arabic | text | |
| Machine Recognition of Multi Font Printed Arabic | text | s |
| Machine Recognition of Optically Captured Machine Printed Arabic | text | |
| Machine Recognition of Printed Kannada | text | |
| Machine-printed and hand-written | text | lines identification |
| MADA: Multi-Window Attention and Dual-Alignment for Image- | text | Retrieval |
| MAGAE: Multi-Level Alignment Over Aggregation Semantic Graph With Attribute Enhancement for | text | -Based Vehicle Retrieval |
| Magic3D: High-Resolution | text | -to-3D Content Creation |
| MAGIC: Multi-granularity domain adaptation for | text | recognition |
| MagicFusion: Boosting | text | -to-Image Generation Performance by Fusing Diffusion Models |
| Major Components of a Complete | text | Reading System |
| Make It Count: | text | -to-Image Generation with an Accurate Number of Objects |
| Make It Move: Controllable Image-to-Video Generation with | text | Descriptions |
| Make-A-Scene: Scene-Based | text | -to-Image Generation with Human Priors |
| Make-An-Animation: Large-Scale | text | -conditional 3D Human Motion Generation |
| Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from | text | |
| Making the Most of | text | Semantics to Improve Biomedical Vision-Language Processing |
| Making the V in | text | -VQA Matter |
| Mandarin | text | -to-Speech Front-End With Lightweight Distilled Convolution Network |
| Manga | text | Detection with Manga-specific Data Augmentation and Its Applications on Emotion Analysis |
| ManiCLIP: Multi-attribute Face Manipulation from | text | |
| ManiGAN: | text | -Guided Image Manipulation |
| ManiTrans: Entity-Level | text | -Guided Image Manipulation via Token-wise Semantic Alignment and Generation |
| MANTA: A Large-Scale Multi-View and Visual- | text | Anomaly Detection Dataset for Tiny Objects |
| Many Hands Make Light Work: Transferring Knowledge from Auxiliary Tasks for Video- | text | Retrieval |
| Marking | text | Documents |
| Marking | text | features of document images to deter illicit dissemination |
| Markov Model Order Optimization for | text | Recognition |
| Markov Random Field Based | text | Identification from Annotated Machine Printed Documents |
| MarkovGen: Structured Prediction for Efficient | text | -to-Image Generation |
| Markovian Engine for | text | Recognition: Cursive Arabic Text, Statistical Features and Interconnected HMMs, A |
| MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity | text | -to-3D Content Creation |
| Mask R-CNN With Pyramid Attention Network for Scene | text | Detection |
| Mask | text | Spotter v3: Segmentation Proposal Network for Robust Scene Text Spotting |
| Mask | text | Spotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes |
| MaskDiffusion: Boosting | text | -to-Image Consistency with Conditional Mask |
| Masked and Permuted Implicit Con | text | Learning for Scene Text Recognition |
| Masked | text | Pre-Training for Scene Text Detection |
| MASTER: Multi-aspect non-local network for scene | text | recognition |
| Masterweaver: Taming Editability and Face Identity for Personalized | text | -to-image Generation |
| Mathematical properties of the native integral ratio handwriting and | text | extraction technique |
| Matryoshka Learning With Metric Transfer for Image- | text | Matching |
| Maxfusion: Plug&play Multi-modal Generation in | text | -to-Image Diffusion Models |
| Maximum Likelihood Discriminant Feature for | text | -Independent Speaker Verification |
| Maximum Margin Approach to Learning | text | Classifiers: Methods, Theory and Algorithms, The |
| Maximum Spanning Trees For | text | Segmentation |
| maximum-likelihood approach to segmentation-based recognition of unconstrained handwriting | text | , A |
| MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex | text | -to-Image Generation |
| mDRA: A Multimodal Depression Risk Assessment Model Using Audio and | text | |
| MEAN: Multi-Element Attention Network for Scene | text | Recognition |
| Medblip: Bootstrapping Language-image Pretraining from 3D Medical Images and | text | s |
| Medical-Image Retrieval Based on Knowledge-Assisted | text | and Image Indexing |
| MedSyn: | text | -Guided Anatomy-Aware Synthesis of High-Fidelity 3-D CT Images |
| Memorize, Associate and Match: Embedding Enhancement via Fine-Grained Alignment for Image- | text | Retrieval |
| Memory-Efficient Models for Scene | text | Recognition via Neural Architecture Search |
| MER-CAPF: Audio- | text | emotion recognition through cross-attention mechanism and multi-granularity pooling strategy |
| MESA: | text | -Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data |
| Met-MLTS: Leveraging Smartphones for End-to-End Spotting of Multilingual Oriented Scene | text | s and Traffic Signs in Adverse Meteorological Conditions |
| MetaCloak: Preventing Unauthorized Subject-Driven | text | -to-Image Diffusion-Based Synthesis via Meta-Learning |
| MetaHTR: Towards Writer-Adaptive Handwritten | text | Recognition |
| MetaWriter: Personalized Handwritten | text | Recognition Using Meta-Learned Prompt Tuning |
| Method and apparatus for detecting running | text | in an image |
| Method and apparatus for the resolution enhancement of gray-scale images that include | text | and line art |
| Method and system for recognizing a boundary between characters in handwritten | text | |
| Method for automatic recognition of white blocks as well as | text | , graphics and/or gray image areas on a printed master |
| method for detecting | text | of arbitrary shapes in natural scenes that improves text spotting, A |
| method for discovering knowledge in | text | s, A |
| Method for Extracting | text | from Stone Inscriptions Using Character Spotting, A |
| Method for identification and compression of facsimile symbols in | text | processing systems |
| Method for identifying word bounding boxes in | text | |
| Method for Semantic Relatedness Based Query Focused | text | Summarization, A |
| Method for | text | Localization and Recognition in Real-World Images, A |
| Method for Transformer Oil Leakage Detection | text | Generation Using Combined Large and Small Models |
| Method for unconstrained | text | detection in natural scene image |
| method for variable quantization in JPEG for improved | text | quality in compound documents, A |
| method of N-grams in large-scale clustering of DNA | text | s, The |
| Method of separating | text | and graphs in digital image data |
| Methods for | text | segmentation from scene images |
| Metric Learning for | text | Documents |
| Mevg: Multi-event Video Generation with | text | -to-video Models |
| MF-GAN: Multi-conditional Fusion Generative Adversarial Network for | text | -to-Image Synthesis |
| MFECLIP: CLIP With Mapping-Fusion Embedding for | text | -Guided Image Editing |
| MicroCinema: A Divide-and-Conquer Approach for | text | -to-Video Generation |
| MIGC: Multi-Instance Generation Controller for | text | -to-Image Synthesis |
| MILES: Visual BERT Pre-training with Injected Language Semantics for Video- | text | Retrieval |
| Mimir: Improving Video Diffusion Models for Precise | text | Understanding |
| Minimal Interaction Touchless | text | Input with Head Movements and Stereo Vision |
| Minimum Error Rate Training for PHMM-Based | text | Recognition |
| Minimum Risk Training for Handwritten Chinese/Japanese | text | Recognition Using Semi-Markov Conditional Random Fields |
| Minimum-risk training for semi-Markov conditional random fields with application to handwritten Chinese/Japanese | text | recognition |
| Mining conversational | text | for procedures with applications in contact centers |
| Mining False Positive Examples for | text | -Based Person Re-Identification |
| Mining the displacement of max-pooling for | text | recognition |
| Minority-Focused | text | -to-Image Generation via Prompt Optimization |
| MirrorGAN: Learning | text | -To-Image Generation by Redescription |
| (Mis?-)Using DRT for Generation of Natural Language | text | from Image Sequences |
| MISL: Multi-grained image- | text | semantic learning for text-guided image inpainting |
| Mismatch Quest: Visual and | text | ual Feedback for Image-Text Misalignment |
| Mita: An Information Extraction Approach to the Analysis of Free-Form | text | in Life-Insurance Applications |
| Mixdq: Memory-efficient Few-step | text | -to-image Diffusion Models with Metric-decoupled Mixed Precision Quantization |
| Mixed-Supervised Scene | text | Detection With Expectation-Maximization Algorithm |
| Mobile visual search on printed documents using | text | and low bit-rate features |
| MobileCLIP: Fast Image- | text | Models through Multi-Modal Reinforced Training |
| Mobilediffusion: Instant | text | -to-image Generation on Mobile Devices |
| Modality Disentangled Discriminator for | text | -to-Image Synthesis |
| Model and Data Integrated Transfer Learning for Unstructured Map | text | Detection |
| Model Based | text | Line Segmentation Method for Off-line Handwritten Documents, A |
| model for detecting and merging vertically spanned table cells in plain | text | documents, A |
| Model of On-line Handwritten Japanese | text | Recognition Free from Line Direction and Writing Format Constraints, A |
| model-based approach to offline | text | -independent Arabic writer identification and verification, A |
| Model-Based System Specification With Tesperanto: Readable | text | From Formal Graphics |
| Modeling Motion with Multi-Modal Features for | text | -Based Video Segmentation |
| Modeling of image, video and | text | fusion quality data packet system for aerospace complex products based on business intelligence |
| Modeling Stroke Mask for End-to-End | text | Erasing |
| Modeling Thousands of Human Annotators for Generalizable | text | -to-Image Person Re-identification |
| Modern vs Diplomatic Transcripts for Historical Handwritten | text | Recognition |
| Moment-Based Image Normalization for Handwritten | text | Recognition |
| Monkey: Image Resolution and | text | Label are Important Things for Large Multi-Modal Models |
| Mono-font Cursive Arabic | text | Recognition Using Speech Recognition System |
| Morality Classification in Natural Language | text | |
| MORAN: A Multi-Object Rectified Attention Network for scene | text | recognition |
| More Grounded Image Captioning by Distilling Image- | text | Matching Model |
| More Than Just Attention: Improving Cross-Modal Attentions with Contrastive Constraints for Image- | text | Matching |
| More than Words: In-the-Wild Visually-Driven Prosody for | text | -to-Speech |
| Morpheus: | text | -Driven 3D Gaussian Splat Shape and Color Stylization |
| MorphNeRF: | text | -Guided 3D-Aware Editing via Morphing Generative Neural Radiance Fields |
| Morphological Approach for | text | -Line Segmentation in Handwritten Documents, A |
| Morphological Approach to | text | String Extraction from Regular Periodic Overlapping Text-Background Images, A |
| Morphological | text | Extraction from Images |
| Morphology-based hierarchical representation with application to | text | segmentation in natural images |
| Morphology-based | text | line extraction |
| Morph | text | : Deep Morphology Regularized Accurate Arbitrary-Shape Scene Text Detection |
| Mosaicing-by-recognition for video-based | text | recognition |
| Mosaicing-by-recognition: a technique for video-based | text | recognition |
| MOST: A Multi-Oriented Scene | text | Detector with Localization Refinement |
| MotiF: Making | text | Count in Image Animation with Motion Focal Loss |
| MotionDiffuse: | text | -Driven Human Motion Generation With Diffusion Model |
| Motiondirector: Motion Customization of | text | -to-video Diffusion Models |
| Moto: Enhancing Embedding with Multiple Joint Factors for Chinese | text | Classification |
| Movie fill in the blank by joint learning from video and | text | with adaptive temporal attention |
| Movie/Script: Alignment and Parsing of Video and | text | Transcription |
| MPEG-7 Video | text | Description Scheme for Superimposed Text in Images and Video |
| MRF based | text | binarization in complex images using stroke feature |
| MRF Model for Binarization of Natural Scene | text | , An |
| MRN: Multiplexed Routing Network for Incremental Multilingual | text | Recognition |
| MRP-GAN: Multi-resolution parallel generative adversarial networks for | text | -to-image synthesis |
| MSCap: Multi-Style Image Captioning With Unpaired Stylized | text | |
| MSDLF-K: A Multimodal Feature Learning Approach for Sentiment Analysis in Korean Incorporating | text | and Speech |
| MSER-Based Real-Time | text | Detection and Tracking |
| MSR-Video to | text | dataset with clean annotations, The |
| MSSA: A Multi-Scale Semantic-Aware Method for Remote Sensing Image- | text | Retrieval |
| MTA-CLIP: Language-guided Semantic Segmentation with Mask- | text | Alignment |
| MTADiffusion: Mask | text | Alignment Diffusion Model for Object Inpainting |
| MTGT: Multiscale | text | Feature-Guided Transformer in medical image segmentation |
| MTRNet++: One-stage mask-based scene | text | eraser |
| MUGEN: A Playground for Video-Audio- | text | Multimodal Understanding and GENeration |
| MULAN: A Multi Layer Annotated Dataset for Controllable | text | -to-Image Generation |
| MulModSeg: Enhancing Unpaired Multi-Modal Medical Image Segmentation with Modality-Conditioned | text | Embedding and Alternating Training |
| Multi language | text | detection using fast stroke width transform |
| Multi scale mirror connection based encoder decoder network for | text | localization |
| Multi-branch Network with Ensemble Learning for | text | Removal in the Wild |
| Multi-Concept Customization of | text | -to-Image Diffusion |
| Multi-dimensional long short-term memory networks for artificial Arabic | text | recognition in news video |
| Multi-Dimensional Quality Assessment for | text | -to-3D Assets: Dataset and Model |
| Multi-event Video- | text | Retrieval |
| Multi-fractal Modeling for On-line | text | -Independent Writer Identification |
| Multi-Grained Vision-and-Language Model for Medical Image and | text | Alignment |
| Multi-Granularity Aggregation Transformer for Joint Video-Audio- | text | Representation Learning |
| Multi-Granularity Matching Transformer for | text | -Based Person Search |
| Multi-granularity Prediction for Scene | text | Recognition |
| Multi-Granularity Prediction with Learnable Fusion for Scene | text | Recognition |
| Multi-Group Proportional Representation for | text | -to-Image Models |
| Multi-head Self-relation Network for Scene | text | Recognition, A |
| Multi-Label Generalized Zero Shot Chest X-Ray Classification by Combining Image- | text | Information With Feature Disentanglement |
| Multi-label | text | Classification Approach for Sentence Level News Emotion Analysis |
| Multi-layer feature fusion based image style transfer with arbitrary | text | condition |
| Multi-Layer Probabilistic Association Reasoning Network for Image- | text | Retrieval |
| Multi-lingual scene | text | detection and language identification |
| Multi-lingual | text | recognition from video frames |
| Multi-Modal Architecture With Spatio-Temporal- | text | Adaptation for Video-Based Traffic Accident Anticipation, A |
| Multi-modal Con | text | ual Graph Neural Network for Text Visual Question Answering |
| Multi-Modal Fusion Network for Rumor Detection with | text | s and Images |
| Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene | text | |
| Multi-modal In-Con | text | Learning Makes an Ego-evolving Scene Text Recognizer |
| Multi-Modal Reasoning Graph for Scene- | text | Based Fine-Grained Image Classification and Retrieval |
| Multi-Modal Reference Learning for Fine-Grained | text | -to-Image Retrieval |
| Multi-Modal Representation Learning with | text | -Driven Soft Masks |
| Multi-modal | text | Recognition Networks: Interactive Enhancements Between Visual and Semantic Features |
| Multi-Modal Topic Model for Image Annotation Using | text | Analysis, A |
| Multi-Orientation Scene | text | Detection with Adaptive Clustering |
| Multi-orientation scene | text | detection with multi-information fusion |
| Multi-Oriented and Multi-Lingual Scene | text | Detection With Direct Regression |
| Multi-oriented Bangla and Devnagari | text | recognition |
| Multi-oriented English | text | Line Identification |
| Multi-oriented Scene | text | Detection via Corner Localization and Region Segmentation |
| Multi-oriented | text | detection from natural scene images based on a CNN and pruning non-adjacent graph edges |
| Multi-oriented | text | Detection with Fully Convolutional Networks |
| Multi-Oriented | text | Extraction in Stylistic Documents |
| Multi-oriented touching | text | character segmentation in graphical documents using dynamic programming |
| Multi-phase recognition of multifont photoscript Arabic | text | |
| multi-plane approach for | text | segmentation of complex document images, A |
| Multi-polarity | text | segmentation using graph theory |
| Multi-resolution form of SVD for | text | -independent speaker recognition |
| Multi-Resolution Pathology-Language Pre-training Model with | text | -Guided Visual Representation |
| Multi-Scale Feature Fusion Based on Piecewise Polynomial Activation Function for Image- | text | Matching |
| Multi-scale sequential network for semantic | text | segmentation and localization |
| Multi-scale | text | Line Segmentation Method in Freestyle Handwritten Documents, A |
| Multi-scale video | text | detection based on corner and stroke width verification |
| multi-scenario | text | generation method based on meta reinforcement learning, A |
| Multi-schema prompting powered token-feature woven attention network for short | text | classification |
| Multi-script and Multi-oriented | text | Localization from Scene Images |
| Multi-script iterative steerable directional filtering for handwritten | text | line extraction |
| Multi-script | text | Extraction from Natural Scenes |
| Multi-script | text | versus non-text classification of regions in scene images |
| Multi-Script-Oriented | text | Detection and Recognition in Video/Scene/Born Digital Images |
| Multi-sensor | text | classification experiments: A comparison |
| Multi-Sentence Auxiliary Adversarial Networks for Fine-Grained | text | -to-Image Synthesis |
| Multi-Sentence Complementarily Generation for | text | -to-Image Synthesis |
| Multi-Speaker | text | -to-Speech Training With Speaker Anonymized Data |
| Multi-Spectral Fusion Based Approach for Arbitrarily Oriented Scene | text | Detection in Video Images |
| Multi-stage HMM based Arabic | text | recognition with rescoring |
| Multi-strategy tracking based | text | detection in scene videos |
| Multi-Style Shape Matching GAN for | text | Images |
| Multi- | text | Guidance Is Important: Multi-Modality Image Fusion via Large Generative Vision-Language Model |
| Multi-Track Timeline Control for | text | -Driven 3D Human Motion Generation |
| Multi-View User Preference Modeling for Personalized | text | -to-Image Generation |
| Multi-View Visual Semantic Embedding for Cross-Modal Image- | text | Retrieval |
| Multi3DRefer: Grounding | text | Description to Multiple 3D Objects |
| Multifractal Characterization of | text | s for Pattern Recognition: On the Complexity of Morphological Structures in Modern and Ancient Languages |
| Multigap: Multi-pooled inception network with | text | augmentation for aesthetic prediction of photographs |
| Multilabel | text | Classification With Incomplete Labels: A Safe Generative Model With Label Manifold Regularization and Confidence Constraint |
| Multilateral Semantic Relations Modeling for Image | text | Retrieval |
| Multilevel Semantic Interaction Alignment for Video- | text | Cross-Modal Retrieval |
| Multilevel | text | -Line Segmentation Framework for Handwritten Historical Documents, A |
| Multilingual Artificial | text | Detection Using a Cascade of Transforms |
| Multilingual | text | -to-Image Person Retrieval via Bidirectional Relation Reasoning and Aligning |
| Multimodal alignment of event and | text | streams in spiking neural networks for human action recognition |
| Multimodal grid features and cell pointers for scene | text | visual question answering |
| Multimodal interactive transcription of | text | images |
| Multimodal Meme Classification Identifying Offensive Content in Image and | text | |
| Multimodal Neurons in Pretrained | text | -Only Transformers |
| Multimodal Processing and Interaction: Audio, Video, | text | |
| Multimodal Sentiment Analysis With Image- | text | Interaction Network |
| Multimodal Topic Modeling by Exploring Characteristics of Short | text | Social Media |
| Multimodal-LLM Agent For | text | -Driven Multi-Attribute Face Editing |
| Multioriented and Curved | text | Lines Extraction From Indian Documents |
| Multioriented Video Scene | text | Detection Through Bayesian Classification and Boundary Growing |
| multiple agent architecture for handwritten | text | recognition, A |
| Multiple attention encoded cascade R-CNN for scene | text | detection |
| Multiple Classifier Approach for the Recognition of Screen-Rendered | text | , A |
| Multiple Document Datasets Pre-training Improves | text | Line Detection With Deep Neural Networks |
| Multiple Geometry Transform Estimation from Single Camera-Captured | text | Image |
| Multiple Handwritten | text | Line Recognition Systems Derived from Specific Integration of a Language Model |
| Multiple Learned Dictionaries Based Clustered Sparse Coding for the Super-Resolution of Single | text | Image |
| Multiple Positives Enhanced NCE Loss for Image- | text | Retrieval, A |
| Multitwine: Multi-Object Compositing with | text | and Layout Control |
| Multivariate Feedback-Based Image- | text | Joint Learning for Sketch-Less Facial Image Retrieval |
| Multiview | text | Imagination Network Based on Latent Alignment for Image-Text Matching, A |
| MuLTReNets: Multilingual | text | recognition networks for simultaneous script identification and handwriting recognition |
| MUST-VQA: Multilingual Scene- | text | VQA |
| Mutually Guided Dual-Task Network for Scene | text | Detection |
| Mutually | text | ual and Visual Refinement Network for Image-Text Matching, A |
| MV-Adapter: Multimodal Video Transfer Learning for Video | text | Retrieval |
| MVCM: Enhancing Multi-View and Cross-Modality Alignment for Medical Visual Question Answering and Medical Image- | text | Retrieval |
| MVPortrait: | text | -Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation |
| N-Gram-Based | text | Categorization |
| Name your style: | text | -guided artistic style transfer |
| Narrating the Video: Boosting | text | -Video Retrieval via Comprehensive Utilization of Frame-Level Captions |
| Natural Language Watermarking Using Semantic Substitution for Chinese | text | |
| Natural scene | text | detection based on SWT, MSER and candidate classification |
| Natural Scene | text | Detection with Multi-channel Connected Component Segmentation |
| Natural scene | text | detection with multi-layer segmentation and higher order conditional random field based analysis |
| NaturalSpeech: End-to-End | text | -to-Speech Synthesis With Human-Level Quality |
| Navigating | text | -to-image Generative Bias Across Indic Languages |
| NCAP: Scene | text | Image Super-Resolution with Non-CAtegorical Prior |
| Negative-Aware Attention Framework for Image- | text | Matching |
| Negative-Prompt Inversion: Fast Image Inversion for Editing with | text | -Guided Diffusion Models |
| NEOCR: A Configurable Dataset for Natural Image | text | Recognition |
| neural model for | text | localization, transcription and named entity recognition in full pages, A |
| Neural network-based prediction of the stopping moment for | text | recognition in a video stream |
| Neural Network-based | text | Location for News Video Indexing |
| Neural network-based | text | location in color images |
| Neural Sign Actors: A diffusion model for 3D sign language production from | text | |
| Neuro or Symbolic? Fine-Tuned Transformer With Unsupervised LDA Topic Clustering for | text | Sentiment Analysis |
| Neuro-Symbolic Evaluation of | text | -to-Video Models using Formal Verification |
| Neuron-Based Spiking Transmission and Reasoning Network for Robust Image- | text | Retrieval |
| New Approach Based on | text | ure and Geometric Features for Text Detection |
| New Approach for Overlay | text | Detection and Extraction From Complex Video Scene, A |
| new approach for | text | -independent speaker recognition, A |
| new approach for video | text | detection, A |
| New Approach towards | text | Filtering, A |
| New Arabic Printed | text | Image Database and Evaluation Protocols, A |
| New Binarization Approach Based on | text | Block Extraction |
| New Block Partitioned | text | Feature for Text Verification, A |
| new deep CNN for 3D | text | localization in the wild through shadow removal, A |
| New Deep Wavefront Based Model for | text | Localization in 3D Video, A |
| new edge-based | text | verification approach for video, A |
| New Fourier-Statistical Features in RGB Space for Video | text | Detection |
| New Fuzzy Hierarchical Classification Based on SVM for | text | Categorization, A |
| New Gradient Based Character Segmentation Method for Video | text | Recognition, A |
| new hybrid method to detect | text | in natural scene, A |
| new instrumented approach for translating American Sign Language into sound and | text | , A |
| New Language-Independent Deep CNN for Scene | text | Detection and Style Transfer in Social Media Images, A |
| New Method for Arabic | text | Detection in Natural Scene Image Based on the Color Homogeneity, A |
| New Method for Arabic | text | Detection in Natural Scene Images, A |
| new method for detection and prediction of occluded | text | in natural scene images, A |
| New Method for Handwritten Scene | text | Detection in Video, A |
| new method for multi-oriented graphics-scene-3D | text | classification in video, A |
| New Method for | text | Verification Based on Random Forests, A |
| New Method for | text | -Line Segmentation for Warped Documents, A |
| New Method for Word Segmentation from Arbitrarily-Oriented Video | text | Lines, A |
| New Method for Writer Identification and Verification Based on Farsi/Arabic Handwritten | text | s, A |
| new multi-modal approach to bib number/ | text | detection and recognition in Marathon images, A |
| New Nearest Neighbor Rule for | text | Categorization, A |
| new robust algorithm for video | text | extraction, A |
| new scheme for unconstrained handwritten | text | -line segmentation, A |
| new segmentation technique for omnifont Farsi | text | , A |
| new segmentation technique of Arabic | text | , A |
| New Smoothing Method for Lexicon-Based Handwritten | text | Keyword Spotting, A |
| New Strategy for Reducing Errors in Scene | text | Detection, A |
| new structural technique for recognizing printed Arabic | text | , A |
| New Symmetry Based on Proximity of Wavelet-Moments for | text | Frame Classification in Video, A |
| New Technique for Multi-Oriented Scene | text | Line Detection and Tracking in Video, A |
| New | text | Extraction Method Incorporating Local Information, A |
| New | text | -Line Alignment Approach Based on Piece-Wise Painting Algorithm for Handwritten Documents, A |
| New Type of Feature: Loose N-Gram Feature in | text | Categorization, A |
| new unified method for detecting | text | from marathon runners and sports players in video (PR-D-19-01078R2), A |
| New Video Images | text | Localization Approach Based on a Fast Hough Transform, A |
| New Wavelet and Color Features for | text | Detection in Video |
| new wavelet-Laplacian method for arbitrarily-oriented character segmentation in video | text | lines, A |
| Newmove: Customizing | text | -to-video Models with Novel Motions |
| News2meme: An Automatic Content Generator from News Based on Word Subspaces from | text | and Image |
| NIVeL: Neural Implicit Vector Layers for | text | -to-Vector Generation |
| Noise Diffusion for Enhancing Semantic Faithfulness in | text | -to-Image Synthesis |
| Noise-aware Learning from Web-crawled Image- | text | Data for Image Captioning |
| NoiseCollage: A Layout-Aware | text | -to-Image Diffusion Model Based on Noise Cropping and Merging |
| Noisy | text | Categorization |
| Noisy-Aware Unsupervised Domain Adaptation for Scene | text | Recognition |
| Noisy-Correspondence Learning for | text | -to-Image Person Re-Identification |
| Non-Local | text | Image Reconstruction |
| Non-negative Sparse Semantic Coding for | text | categorization |
| non-stationary density model to separate overlapped | text | s in degraded documents, A |
| Non-Uniform Slant Correction for Handwritten | text | Line Recognition |
| Not Just | text | : Uncovering Vision Modality Typographic Threats in Image Generation Models |
| Not Only | text | : Exploring Compositionality of Visual Representations in Vision-Language Models |
| Novel Algorithm for | text | Detection and Localization in Natural Scene Images, A |
| novel automated depression detection technique using | text | transcript, A |
| novel binarization approach for | text | in images, A |
| Novel Data Independent Approach for Conversion of Hand Punched Kannada Braille Script to | text | and Speech, A |
| Novel Data Representation for | text | Extraction from Multispectral Historical Document Images |
| novel domain independent scene | text | localizer, A |
| Novel Edge Features for | text | Frame Classification in Video |
| Novel Fuzzy Logic-Based | text | Classification Method for Tracking Rare Events on Twitter, A |
| Novel Illumination-Balance Technique for Improving the Quality of Degraded | text | -Photo Images, A |
| Novel Integrated Framework for Learning both | text | Detection and Recognition, A |
| Novel Method for Embedded | text | Segmentation Based on Stroke and Color, A |
| novel method for straightening curved | text | -lines in stylistic documents, A |
| novel method of | text | line segmentation for historical document image of the uchen Tibetan, A |
| Novel Multi-oriented Chinese | text | Extraction Approach from Videos, A |
| novel mutual nearest neighbor based symmetry for | text | frame classification in video, A |
| novel scene | text | detection algorithm based on convolutional neural network, A |
| Novel Sub-character HMM Models for Arabic | text | Recognition |
| Novel System for Robust | text | Location and Recognition of Book Covers, A |
| Novel | text | Detection System Based on Character and Link Energies, A |
| novel | text | structure feature extractor for Chinese scene text detection and recognition, A |
| Novel | text | -Independent Speaker Verification System Using Ant Colony Optimization Algorithm, A |
| novel triangulation procedure for thinning hand-written | text | , A |
| novel two-stage algorithm for baseline estimation and correction in Farsi and Arabic handwritten | text | line, A |
| Novel Visual Representation on | text | Using Diverse Conditional GAN for Visual Recognition, A |
| Novice and Expert Performance of KeyScretch: A Gesture-Based | text | Entry Method for Touch-Screens |
| NTIRE 2025 challenge on | text | to Image Generation Model Quality Assessment |
| Null- | text | Inversion for Editing Real Images using Guided Diffusion Models |
| OASIS: Object-guided Attention for | text | -conditional Diffusion Synthesis of Human Interaction Sequences |
| Object proposals for | text | extraction in the wild |
| Object Reading: | text | Recognition for Object Recognition |
| Object-aware Query Perturbation for Cross-modal Image- | text | Retrieval |
| Object-conditioned Energy-based Attention Map Alignment in | text | -to-image Diffusion Models |
| Object-Driven | text | -To-Image Synthesis via Adversarial Training |
| Object-level semantic alignment for enhancing fidelity in | text | -to-image generation with diffusion models |
| Objective Distortion Measure for Binary | text | Image Based on Edge Line Segment Similarity |
| Objective Function Design for MCE-Based Combination of On-line and Off-line Character Recognizers for On-line Handwritten Japanese | text | Recognition |
| Occluded | text | Detection and Recognition in the Wild |
| Occlusion-Aware | text | -Image-Point Cloud Pretraining for Open-World 3D Object Recognition |
| OCR and Voting Shell Fulfilling Specific | text | Analysis Requirements |
| OCR of Printed Telugu | text | with High Recognition Accuracies |
| OCR Pipeline and Semantic | text | Analysis for Comics, An |
| OCR-VQGAN: Taming | text | -within-Image Generation |
| OCRSpell: An Interactive Spelling Correction System for OCR Errors in | text | |
| ODM: A | text | -Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting |
| Off-line Chinese Writer Retrieval System Based on | text | -sensitive Writer Identification, An |
| Offline arabic handwritten | text | recognition: A Survey |
| Offline handwritten Arabic cursive | text | recognition using Hidden Markov Models and re-ranking |
| Offline Recognition of Large Vocabulary Cursive Handwritten | text | |
| Offline recognition of omnifont Arabic | text | using the HMM ToolKit (HTK) |
| Offline Recognition of Unconstrained Handwritten | text | s Using HMMs and Statistical Language Models |
| Offline | text | -independent writer identification using codebook and efficient code extraction methods |
| Old fashion | text | -based image retrieval using FCA |
| Omnifont recognition of | text | using topological recognition techniques |
| OMNIPARSER: A Unified Framework for | text | Spotting, Key Information Extraction and Table Recognition |
| On appearance-based feature extraction methods for writer-independent handwritten | text | recognition |
| On Calibration of Scene- | text | Recognition Models |
| On Combining Multiple Segmentations in Scene | text | Recognition |
| On Manipulating Scene | text | in the Wild with Diffusion Models |
| On optimal stopping strategies for | text | recognition in a video stream as an application of a monotone sequential decision model |
| On partitioning a dictionary for visual | text | recognition |
| On Recognizing | text | s of Arbitrary Shapes with 2D Self-Attention |
| On the Behavior of Contrastive Regularization in Improving Chinese | text | Recognizer |
| On the Detection of Images Generated from | text | |
| On the discriminability of keystroke feature vectors used in fixed | text | keystroke authentication |
| On the Evaluation of Handwritten | text | Line Detection Algorithms |
| On the General Value of Evidence, and Bilingual Scene- | text | Visual Question Answering |
| On the Generalization of Handwritten | text | Recognition Models |
| On the influence of vocabulary size and language models in unconstrained handwritten | text | recognition |
| On the Modification of Binarization Algorithms to Retain Grayscale Information for Handwritten | text | Recognition |
| On the Processing of Fuzzy Patterns for | text | Independent Phonetic Speech Segmentation |
| On the Scalability of Diffusion-based | text | -to-Image Generation |
| On the Segmentation of | text | in Videos |
| On the use of Bernoulli mixture models for | text | classification |
| On the use of duration-corrected N-best hypotheses for | text | recognition in gray-scale document images |
| On Vocabulary Reliance in Scene | text | Recognition |
| On-Device | text | Image Super Resolution |
| On-Line Handwritten Japanese | text | Recognition Free from Constrains on Line Direction and Character Orientation |
| On-line Handwritten Japanese | text | Recognition System Free from Line Direction and Character Orientation Constraints, An |
| On-Line Handwritten | text | Line Detection Using Dynamic Programming |
| On-line Handwritten | text | Search Method Based on Directional Feature Matching, An |
| On-line recognition of handwritten Renqun shorthand for fast mobile Chinese | text | entry |
| On-line Writing-box-free Recognition of Handwritten Japanese | text | Considering Character Size Variations |
| ONE-DM: One-shot Diffusion Mimicker for Handwritten | text | Generation |
| One-shot Compositional Data Generation for Low Resource Handwritten | text | Recognition |
| One-Shot Doc Snippet Detection: Powering Search in Document Beyond | text | |
| One-Step Diffusion for Real-World Image Super-Resolution via Degradation Removal and | text | Prompts |
| One-Way Ticket: Time-Independent Unified Encoder for Distilling | text | -to-Image Diffusion Models |
| Online Biterm Topic Model based short | text | stream classification using short text expansion and concept drifting detection |
| Online | text | -Independent Writer Identification Based on Stroke's Probability Distribution Function |
| Online | text | -independent Writer Identification Based on Temporal Sequence and Shape Codes |
| Ontology-Based | text | Mining Method to Develop D-Matrix From Unstructured Text, An |
| Opaque Document Imaging: Building Images of Inaccessible | text | s |
| Open set classification of untranscribed handwritten | text | image documents |
| Open-Set | text | Recognition via Character-Context Decoupling |
| Open-Vocabulary 3D Semantic Segmentation with | text | -to-Image Diffusion Models |
| Open-Vocabulary Panoptic Segmentation with | text | -to-Image Diffusion Models |
| Open-vocabulary recognition of machine-printed Arabic | text | using hidden Markov models |
| Open-Vocabulary | text | -Driven Human Image Generation |
| OpenBias: Open-Set Bias Detection in | text | -to-Image Generative Models |
| OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image- | text | Generation |
| Opinion mining from noisy | text | data |
| OPMP: An Omnidirectional Pyramid Mask Proposal Network for Arbitrary-Shape Scene | text | Detection |
| Optical character correction of large-curvature annular sector | text | in polar coordinate system |
| Optical flow based dynamic curved video | text | detection |
| Optical modelling and language modelling trade-off for Handwritten | text | Recognition |
| Optimal Boxes: Boosting End-to-End Scene | text | Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning |
| Optimal Classification Model for | text | Detection and Recognition in Video Frames |
| Optimal | text | /Background Color Combination of LED Information Boards for Visibility Improvement Based on Psychological Measurements, An |
| Optimal word order for non-causal | text | generation with Large Language Models: The Spanish case |
| Optimizing the class information divergence for transductive classification of | text | s using propagation in bipartite graphs |
| Optimizing the integration of a statistical language model in HMM based offline handwritten | text | recognition |
| Orientation and Scale Invariant | text | Region Extraction in WWW Images |
| Orientation Robust | text | Line Detection in Natural Images |
| OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page | text | Recognition by learning to unfold |
| Oscillating Feature Subset Search Algorithm for | text | Categorization |
| OST: Refining | text | Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition |
| OTE: Exploring Accurate Scene | text | Recognition Using One Token |
| Out of vocabulary word detection and recovery in Arabic handwritten | text | recognition |
| Outline Generation Transformer for Bilingual Scene | text | Recognition |
| Overview of | text | -Based Person Search: Recent Advances and Future Directions, An |
| P-CLIP: Progressive Discrepancy Learning for One-Shot | text | -to-Image Person Re-Identification |
| PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese | text | Recognition |
| Paint-it: | text | -to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering |
| Pair-Copula Based Scheme for | text | Extraction from Digital Images, A |
| PairAug: What Can Augmented Image- | text | Pairs Do for Radiology? |
| Pairwise optimized Rocchio algorithm for | text | categorization |
| PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped | text | |
| Pangu-draw: Advancing Resource-efficient | text | -to-image Synthesis with Time-decoupled Training and Reusable Coop-diffusion |
| PanoDreamer: Consistent | text | to 360-Degree Scene Generation |
| Paragraph | text | segmentation into lines with Recurrent Neural Networks |
| Parameter efficient finetuning of | text | -to-image models with trainable self-attention layer |
| Parametric Spectral-Based Method for Verification of | text | in Videos, A |
| Parco: Part-coordinating | text | -to-motion Synthesis |
| Parrot Captions Teach CLIP to Spot | text | |
| Parrot: Pareto-optimal Multi-reward Reinforcement Learning Framework for | text | -to-image Generation |
| Part-based method on handwritten | text | s |
| Partial Scene | text | Retrieval |
| Parts2Words: Learning Joint Embedding of Point Clouds and | text | s by Bidirectional Matching Between Parts and Words |
| PathLDM: | text | conditioned Latent Diffusion Model for Histopathology |
| Pay attention to what you read: Non-recurrent handwritten | text | -Line recognition |
| Pea-diffusion: Parameter-efficient Adapter with Knowledge Distillation in Non-english | text | -to-image Generation |
| Pen Acoustic Emissions for | text | and Gesture Recognition |
| Perceptive Vision for Headline Localisation in Bangla Handwritten | text | Recognition |
| Performance Analysis of | text | Halftone Modulation |
| Performance Evaluation of | text | Detection and Tracking in Video |
| Person Identification Using | text | and Image Data |
| Person Search by | text | Attribute Query As Zero-Shot Learning |
| PersonaBooth: Personalized | text | -to-Motion Generation |
| Personalised video summarisation using video- | text | multi-modal fusion |
| Personalized Residuals for Concept-Driven | text | -to-Image Generation |
| Personalized | text | snippet extraction using statistical language models |
| Perspective Scene | text | Recognition with Feature Compression and Ranking |
| PETR: Rethinking the Capability of Transformer-Based Language Model in Scene | text | Recognition |
| PFAN++: Bi-Directional Image- | text | Retrieval With Position Focused Attention Network |
| Phenology description is all you need! mapping unknown crop types with remote sensing time-series and LLM generated | text | alignment |
| Photographic | text | -to-Image Synthesis with a Hierarchically-Nested Adversarial Network |
| PhotoOCR: Reading | text | in Uncontrolled Conditions |
| PhyS-EdiT: Physics-aware Semantic Image Editing with | text | Description |
| PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded | text | -to-Video Generation |
| PI3D: Efficient | text | -to-3D Generation with Pseudo-Image Diffusion |
| PIA: Your Personalized Image Animator via Plug-and-Play Modules in | text | -to-Image Models |
| Picture and | text | Query and Archiving System, A |
| Picture is Worth More Than 77 | text | Tokens: Evaluating CLIP-Style Models on Dense Captions, A |
| PIDRo: Parallel Isomeric Attention with Dynamic Routing for | text | -Video Retrieval |
| Piece-wise linearity based method for | text | frame classification in video |
| Pitch Based Segmentation and Recognition of Dot-Matrix | text | |
| Pitman Shorthand inspired model for plain | text | compression |
| Pixart-sigma: Weak-to-strong Training of Diffusion Transformer for 4k | text | -to-image Generation |
| Pixel-Based Evaluation Method for | text | Detection in Color Images, A |
| Pix | text | GAN: structure aware text image synthesis for license plate recognition |
| Plan, Posture and Go: Towards Open-vocabulary | text | -to-motion Generation |
| Platypus: A Generalized Specialist Model for Reading | text | in Various Forms |
| Plda-based system for | text | -prompted password speaker verification |
| Plot: | text | -based Person Search with Part Slot Attention for Corresponding Part Discovery |
| Plug-and-Play Diffusion Features for | text | -Driven Image-to-Image Translation |
| Plug-and-Play Interpretable Responsible | text | -to-Image Generation via Dual-Space Multi-facet Concept Control |
| Plug-and-Play Regulators for Image- | text | Matching |
| Plugnet: Degradation Aware Scene | text | Recognition Supervised by a Pluggable Super-resolution Unit |
| PMMN: Pre-Trained Multi-Modal Network for Scene | text | Recognition |
| PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain Gap Using Pose-Preserved | text | -to-Image Diffusion |
| PointCloud- | text | Matching: Benchmark Dataset and Baseline |
| PolygloNet: Multilingual Approach for Scene | text | Recognition Without Language Constraints |
| Polygon-based technique for the automatic classification of | text | and graphics components from digitized paper-based forms |
| Polygon-Free: Unconstrained Scene | text | Detection with Box Annotations |
| pooling based scene | text | proposal technique for scene text reading in the wild, A |
| Portable and fast | text | detection |
| Portmanteauing Features for Scene | text | Recognition |
| Position-Guided | text | Prompt for Vision-Language Pre-Training |
| Post-training Quantization with Progressive Calibration and Activation Relaxing for | text | -to-image Diffusion Models |
| PosterMaker: Towards High-Quality Product Poster Generation with Accurate | text | Rendering |
| Powerful and Flexible: Personalized | text | -to-image Generation via Reinforcement Learning |
| PQPP: A Joint Benchmark for | text | -to-Image Prompt and Query Performance Prediction |
| PR-CLIP: Cross-Modal Positional Reconstruction for Remote Sensing Image- | text | Retrieval |
| Pre-Training a Graph Recurrent Network for | text | Understanding |
| PreciseCam: Precise Camera Control for | text | -to-Image Generation |
| Precisecontrol: Enhancing | text | -to-image Diffusion Models with Fine-grained Attribute Control |
| Predicated Diffusion: Predicate Logic-Based Attention Guidance for | text | -to-Image Diffusion Models |
| Predict, Prevent, and Evaluate: Disentangled | text | -Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model |
| Predicting audio-visual salient events based on visual, audio and | text | modalities for movie summarization |
| Predicting Emotional Responses to Long Informal | text | |
| Predicting Motivations of Actions by Leveraging | text | |
| Predicting Visual Features From | text | for Image and Video Caption Retrieval |
| PRESENT: Zero-Shot | text | -to-Prosody Control |
| Preserve or Modify? Con | text | -Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing |
| Preserving privacy without compromising accuracy: Machine unlearning for handwritten | text | recognition |
| PreSTU: Pre-Training for Scene- | text | Understanding |
| Primitive Representation Learning for Scene | text | Recognition |
| Printed | text | Discrimination |
| Printed | text | Featuring Using the Visual Criteria of Legibility and Complexity |
| Printed | text | segmentation using distance transform |
| Prior knowledge guided | text | to image generation |
| Prior Preserved | text | -to-Image Personalization Without Image Regularization |
| Probabilistic Hierarchical Clustering Method for Organising Collections of | text | Documents, A |
| Probabilistic Kernels for Improved | text | -to-Speech Alignment in Long Audio Tracks |
| probabilistic model derived term weighting scheme for | text | classification, A |
| Processing of Binary Images of Handwritten | text | Documents |
| Processing of Off-Line Handwritten | text | : Polygonal-Approximation and Enforcement of Temporal Information |
| Progressive Contour Regression for Arbitrary-Shape Scene | text | Detection |
| Progressive Feature Mining and External Knowledge-Assisted | text | -Pedestrian Image Retrieval |
| Progressive Human Motion Generation Based on | text | and Few Motion Frames |
| Progressive Rendering Distillation: Adapting Stable Diffusion for Instant | text | -to-Mesh Generation without 3D Data |
| Progressive scene | text | erasing with self-supervision |
| Progressive Spatio-Temporal Prototype Matching for | text | -Video Retrieval |
| Progressive | text | -Semantic-Aware Generative Adversarial Network for Image Fusion |
| Progressive | text | -to-Face Synthesis with Generative Adversarial Network |
| Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward | text | -to-3D Scene Generation |
| Prompt Augmentation for Self-supervised | text | -guided Image Manipulation |
| Prompt Switch: Efficient CLIP Adaptation for | text | -Video Retrieval |
| Prompt Tuning Inversion for | text | -Driven Image Editing Using Diffusion Models |
| Prompt-Free Diffusion: Taking | text | Out of Text-to-Image Diffusion Models |
| Prompt2Perturb (P2P): | text | -Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images |
| PromptAD: Zero-shot Anomaly Detection using | text | Prompts |
| Prompting Hard or Hardly Prompting: Prompt Inversion for | text | -to-Image Diffusion Models |
| Proposal for a | text | -Indicated Writer Verification Method, A |
| Proposal of the hybrid spectral gradient method to extract character- | text | regions from general scene images |
| Protip: Probabilistic Robustness Verification on | text | -to-image Diffusion Models Against Stochastic Perturbation |
| Prototype-guided | text | -based person search on rich Chinese descriptions |
| Psg-adapter: Controllable Planning Scene Graph for Improving | text | -to-image Diffusion |
| Pull Pole Points to | text | Contour by Magnetism: A Real-Time Scene Text Detector |
| Pure Transformer with Integrated Experts for Scene | text | Recognition |
| Push the limit of scene | text | recognition using character and text length guided text super-resolution |
| Pushing the Performance Limit of Scene | text | Recognizer without Human Annotation |
| PYRAD-DCNN: A Fully Convolutional Neural Network to Replace BLSTM in Offline | text | Recognition Systems |
| Pyrboxes: An efficient multi-scale scene | text | detector with feature pyramids |
| Q-Eval-100K: Evaluating Visual Quality and Alignment Level for | text | -to-Vision Content |
| quad tree based method for blurred and non-blurred video | text | frames classification through quality metrics, A |
| Quadrilateral Scene | text | Detector with Two-Stage Network Architecture, A |
| Quality Assessment for | text | -to-Image Generation: A Survey |
| Quality inspection of printed | text | s |
| Quality-related English | text | classification based on recurrent neural network |
| QWERTY- and 8pen- Based Touchless | text | Input with Hand Movement |
| R-Net: A Relationship Network for Efficient and Accurate Scene | text | Detection |
| R.A.C.E.: Robust Adversarial Concept Erasure for Secure | text | -to-image Diffusion Model |
| R2CNN: Rotational Region CNN for Arbitrarily-Oriented Scene | text | Detection |
| Rail Transit Line-Sign | text | Detection With Patch-Based Region Proposal Network |
| Random Subspace Method in | text | Categorization |
| Ranni: Taming | text | -to-Image Diffusion for Accurate Instruction Following |
| Rapid Evaluation of the Handwriting Performance for Gesture Based | text | Input |
| Re-ranking and TOPSIS-based ensemble feature selection with multi-stage aggregation for | text | categorization |
| Re-ranking image- | text | matching by adaptive metric fusion |
| Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene | text | Recognition |
| Read | text | from Signs in General Scenes |
| Reading Arbitrary-shaped Scene | text | from Images Through Spline Regression and Rectification |
| Reading Newspaper | text | |
| Reading | text | in the Wild from Compressed Images |
| Reading | text | in the Wild with Convolutional Neural Networks |
| Reading-Life Log: Technologies to Recognize | text | s That We Read, The |
| Reading-Strategy Inspired Visual Representation Learning for | text | -to-Video Retrieval |
| ReADS: A Rectified Attentional Double Supervised Network for Scene | text | Recognition |
| Real time image enhancement and segmentation for sign/ | text | detection |
| Real time image enhancement for both | text | and color photo images |
| Real-Time Lexicon-Free Scene | text | Localization and Recognition |
| Real-time Lexicon-free Scene | text | Retrieval |
| Real-Time Scene | text | Detection Based on Stroke Model |
| Real-Time Scene | text | Detection With Differentiable Binarization and Adaptive Scale Fusion |
| Real-time scene | text | localization and recognition |
| Real-Time Scene | text | to Speech System, A |
| Real-Time | text | Detection With Similar Mask in Traffic, Industrial, and Natural Scenes |
| Real-Time | text | Steganalysis Based on Multi-Stage Transfer Learning |
| Real-time | text | tracking in natural scenes |
| real-time | text | -independent speaker identification system, A |
| Real-Time Visual Analytics for | text | Streams |
| RealCustom: Narrowing Real | text | Word for Real-Time Open-Domain Text-to-Image Customization |
| RealDTT: Towards A Comprehensive Real-World Dataset for Tampered | text | Detection |
| RealmDreamer: | text | -Driven 3D Scene Generation with Inpainting and Depth Diffusion |
| Realtime multi-scale scene | text | detection with scale-based region proposal network |
| Reasoning elicitation and multi-granularity contrastive learning for | text | -rich image understanding in large vision-language models |
| Receler: Reliable Concept Erasing of | text | -to-image Diffusion Models via Lightweight Erasers |
| Recipe for Scaling up | text | -to-Video Generation with Text-free Videos, A |
| Recipe2Video: Synthesizing Personalized Videos from Recipe | text | s |
| ReCo: Region-Controlled | text | -to-Image Generation |
| Recognising | text | in Real Scenes |
| Recognition based | text | localization from natural scene images |
| Recognition of Apparent Personality Traits from | text | and Handwritten Images |
| Recognition of Arabic Machine-Printed Cursive | text | |
| Recognition of Bangla | text | from scene images through perspective correction |
| Recognition of cursive video | text | using a deep learning framework |
| Recognition of Hand-Written Archive | text | Documents |
| Recognition of Handwritten Chinese | text | by Segmentation: A Segment-Annotation-Free Approach |
| Recognition of Indian multi-oriented and curved | text | |
| Recognition of Multi-oriented, Multi-sized, and Curved | text | |
| Recognition of Noise Polyfont Printed | text | Using Combined HMMS, The |
| Recognition of Pornographic Web Pages by Classifying | text | s and Images |
| Recognition of printed arabic | text | based on global features and decision tree learning techniques |
| Recognition of Printed Arabic | text | Using Neural Networks |
| Recognition of printed Devanagari | text | using BLSTM Neural Network |
| Recognition of Printed | text | under Realistic Conditions |
| Recognition of Screen-Rendered | text | |
| Recognition of Video | text | through Temporal Integration |
| Recognition-Based Segmentation of Nom Characters from Body | text | Regions of Stele Images Using Area Voronoi Diagram |
| Recognition-Synergistic Scene | text | Editing |
| Recognize | text | in General Scenes |
| Recognizing Chinese | text | s with 3D Convolutional Neural Network |
| Recognizing irregular entities in biomedical | text | via deep neural networks |
| Recognizing Multiple | text | Sequences from an Image by Pure End-to-End Learning |
| Recognizing perspective scene | text | with context feature |
| Recognizing semantic correlation in image- | text | Weibo via feature space mapping |
| Recognizing | text | Elements for SVG Comic Compression and Its Novel Applications |
| Recognizing | text | in historical maps using maps from multiple time periods |
| Recognizing | text | in raster maps |
| Recognizing | text | with a CNN |
| Recognizing | text | with Perspective Distortion in Natural Scenes |
| Recognizing | text | -Based Traffic Guide Panels with Cascaded Localization Network |
| Recognizing | text | -Based Traffic Signs |
| Recon: Training-free Acceleration for | text | -to-image Synthesis with Retrieval of Concept Prompt Trajectories |
| Reconsidering Tourism Destination Images by Exploring Similarities between Travelogue | text | s and Photographs |
| Rectification and recognition of | text | in 3-D scenes |
| Rectifying Perspective Views of | text | in 3D Scenes Using Vanishing Points |
| Recurrent Affine Transformation for | text | -to-Image Synthesis |
| Recurrent Global Convolutional Network for Scene | text | Detection |
| Recurrent Highway Networks with Attention Mechanism for Scene | text | Recognition |
| Redefining the DCT-based feature for scene | text | detection: Analysis and comparison of spatial frequency-based features |
| Redif Extraction in Handwritten Ottoman Literary | text | s |
| Reduced annotation based on deep active learning for arabic | text | detection in natural scene images |
| Reference-Aware Adaptive Network for Image- | text | Matching |
| Referring Image Segmentation Using | text | Supervision |
| Refine, Control and Distill: A | text | -to-Image Framework for Faithful Image Generation |
| Refining | text | -to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation |
| Region Reinforcement Network With Topic Constraint for Image- | text | Matching |
| Region-Aware Arbitrary-Shaped | text | Detection With Progressive Fusion |
| Region-Based Discriminative Feature Pooling for Scene | text | Recognition |
| Regularizing Visual Semantic Embedding With Contrastive Learning for Image- | text | Matching |
| Reinforcement Shrink-Mask for | text | Detection |
| Rejection Strategies for Offline Handwritten | text | Line Recognition |
| ReLa | text | : Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks |
| Relation Graph Reasoning for Image- | text | Matching |
| Relation Mining and Visualization Framework for Automated | text | Summarization, A |
| Relation-Guided Network for Image- | text | Retrieval |
| Reliable and Efficient Concept Erasure of | text | -to-image Diffusion Models |
| Reliable Phrase Feature Mining for Hierarchical Video- | text | Retrieval |
| Remote Sensing Cross-Modal | text | -Image Retrieval Based on Attention Correction and Filtering |
| Remote Sensing Image Augmentation Based on | text | Description for Waterside Change Detection |
| Remote Sensing Image Generation via Object | text | Decoupling |
| Removing Distributional Discrepancies in Captions Improves Image- | text | Alignment |
| RenAIssance: A Survey Into AI | text | -to-Image Generation in the Era of Large Model |
| Report from the AND 2009 working group on noisy | text | datasets |
| Representation and Recognition of | text | Using Hidden Markov Models, The |
| Representation learning for very short | text | s using weighted word embedding aggregation |
| Representation transfer and data cleaning in multi-views for | text | simplification |
| Residual Dual Scale Scene | text | Spotting by Fusing Bottom-Up and Top-Down Processing |
| ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video- | text | Data Streams |
| ReStGAN: A step towards visually guided shopper experience via | text | -to-image synthesis |
| Retaining Knowledge and Enhancing Long- | text | Representations in CLIP through Dual-Teacher Distillation |
| Rethinking Diffusion for | text | -Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression |
| Rethinking Noisy Video- | text | Retrieval via Relation-aware Alignment |
| Rethinking | text | Segmentation: A Novel Dataset and A Text-Specific Refinement Approach |
| Rethinking Training for De-biasing | text | -to-Image Generation: Unlocking the Potential of Stable Diffusion |
| Rethinking Video- | text | Understanding: Retrieval from Counterfactually Augmented Data |
| Retrieval Methods for English- | text | with Misrecognized OCR Characters |
| Retrieval Strategies for Noisy | text | |
| Revealing Directions for | text | -Guided 3D Face Editing |
| Review of Cross-Modal Image- | text | Retrieval in Remote Sensing, A |
| Review of Segmentation and Con | text | ual Analysis Techniques for Text Recognition, A |
| Revisiting Scene | text | Recognition: A Data Perspective |
| RIATIG: Reliable and Imperceptible Adversarial | text | -to-Image Generation with Natural Prompts |
| Rich Human Feedback for | text | -to-Image Generation |
| RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in | text | -to-3D |
| Rickrolling the Artist: Injecting Backdoors into | text | Encoders for Text-to-Image Synthesis |
| RiFeGAN2: Rich Feature Generation for | text | -to-Image Synthesis From Constrained Prior Knowledge |
| RiFeGAN: Rich Feature Generation for | text | -to-Image Synthesis From Prior Knowledge |
| RLita: A Region-Level Image- | text | Alignment Method for Remote Sensing Foundation Model |
| RLST: A Reinforcement Learning Approach to Scene | text | Detection Refinement |
| RMGNet: The Progressive Relationship-Mining Graph Neural Network for | text | -to-Image Person Re-Identification |
| robust algorithm for | text | detection in color images, A |
| Robust Algorithm for | text | String Separation from Mixed Text/Graphics Images, A |
| Robust and Accurate | text | Stroke Segmentation |
| robust and multiscale document image segmentation for block line/ | text | line structures extraction, A |
| Robust and Non-Negative Collective Matrix Factorization for | text | -to-Image Transfer Learning |
| Robust and parallel Uyghur | text | localization in complex background images |
| Robust and Secure Data Hiding for PDF | text | Document |
| robust approach for recognition of | text | embedded in natural scenes, A |
| robust approach for | text | detection from natural scene images, A |
| Robust Approach to Extraction of | text | s from Camera Captured Images, A |
| robust approach to | text | line grouping in online handwritten Japanese documents, A |
| Robust Binarization for Video | text | Recognition |
| Robust Color-Independent | text | Detection Method from Complex Videos, A |
| Robust detection of stylized | text | events in digital video |
| Robust Disaster Assessment from Aerial Imagery Using | text | -to-Image Synthetic Data |
| Robust Extraction of | text | from Camera Images |
| Robust Extraction of | text | in Video |
| Robust Hashing With Bilinear Drift for Image- | text | Retrieval |
| robust hybrid approach for | text | line segmentation in historical documents, A |
| Robust Lexicon-Free Confidence Prediction for | text | Recognition |
| Robust Local Scoring Function for | text | -Independent Speaker Verification |
| Robust Model for On-Line Handwritten Japanese | text | Recognition, A |
| Robust outdoor | text | detection using text intensity and shape features |
| Robust Scene | text | Detection for Multi-script Languages Using Deep Learning |
| Robust Scene | text | Detection for Partially Annotated Training Data |
| Robust scene | text | detection using integrated feature discrimination |
| Robust Scene | text | Detection with Convolution Neural Network Induced MSER Trees |
| Robust Scene | text | Detection with Deep Feature Pyramid Network and CNN based NMS Model |
| Robust Scene | text | Recognition with Automatic Rectification |
| Robust scene | text | understanding with OCR token and word alignment for Text-VQA and text-caption |
| Robust seed-based stroke width transform for | text | detection in natural images |
| Robust Segmentation Technique for Line, Word and Character Extraction from Kannada | text | in Low Resolution Display Board Images, A |
| Robust skew detection in mixed | text | /graphics documents |
| Robust Split-and-Merge | text | Segmentation Approach for Images, A |
| Robust stereo correspondence for documents by matching connected components of | text | -lines with dynamic programming |
| Robust stereo matching for document images using parameter selection of | text | -line extraction |
| robust system for | text | extraction in video, A |
| Robust System For Thresholding And Skew Detection In Mixed | text | /graphics Documents, A |
| robust technique for | text | extraction in mixed-type binary documents, A |
| Robust | text | detection from binarized document images |
| Robust | text | detection in natural images with edge-enhanced Maximally Stable Extremal Regions |
| Robust | text | Detection in Natural Scene Images |
| Robust | text | Detection in Natural Scene Images by Generalized Color-Enhanced Contrasting Extremal Region and Neural Networks |
| Robust | text | Detection with Vertically-Regressed Proposal Network |
| Robust | text | Image Recognition via Adversarial Sequence-to-Sequence Domain Adaptation |
| Robust | text | Line Segmentation for Historical Manuscript Images Using Color and Texture |
| Robust | text | Segmentation in Low Quality Images via Adaptive Stroke Width Estimation and Stroke Based Superpixel Grouping |
| Robust | text | segmentation using graph cut |
| Robust | text | watermarking based on average skeleton mass of characters against cross-media attacks |
| Robust Two Level Classification Algorithm for | text | Localization in Documents, A |
| Robust video | text | segmentation and recognition with multiple hypotheses |
| Robust Video- | text | Retrieval Via Noisy Pair Calibration |
| Robust Wavelet Transform Based Technique for Video | text | Detection, A |
| Robustly Recognizing Irregular Scene | text | by Rectifying Principle Irregularities |
| Robustscanner: Dynamically Enhancing Positional Clues for Robust | text | Recognition |
| Rolling bilateral filter-based | text | image deblurring |
| Rotation and script independent | text | detection from video frames using sub pixel mapping |
| Rotation-Sensitive Regression for Oriented Scene | text | Detection |
| Rough-fuzzy based scene categorization for | text | detection and recognition in video |
| RSCA: Real-time Segmentation-based Con | text | -Aware Scene Text Detection |
| RSD-GAN: Regularized Sobolev Defense GAN Against Speech-to- | text | Adversarial Attacks |
| RUArt: A Novel | text | -Centered Solution for Text-Based Visual Question Answering |
| Rule Based Con | text | ual Post-Processing for Devanagari Text Recognition |
| RVMamba: Selective | text | -Vision Mamba for Referring Video Object Segmentation |
| SAC: Semantic Attention Composition for | text | -Conditioned Image Retrieval |
| SAFE: Scale Aware Feature Encoder for Scene | text | Recognition |
| Safeguard | text | -to-image Diffusion Models with Human Feedback Inversion |
| SaHAN: Scale-Aware Hierarchical Attention Network for Scene | text | Recognition |
| SALAD: Skeleton-aware Latent Diffusion for | text | -driven Motion Generation and Editing |
| Salient Guided | text | Detection in E-Commerce Images |
| Salient Object-Aware Background Generation using | text | -Guided Diffusion Models |
| SAM: Self Attention Mechanism for Scene | text | Recognition Based on Swin Transformer |
| Sample-aware Data Augmentor for Scene | text | Recognition |
| SAMWISE: Infusing Wisdom in SAM2 for | text | -Driven Video Segmentation |
| SARAT-a system for the recognition of Arabic printed | text | |
| SAST: Semantic-Aware stylized | text | -to-Image generation |
| SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker | text | -to-Speech Systems |
| Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and | text | Data |
| Scale and Orientation Invariant | text | Segmentation for Born-Digital Compound Images |
| Scale robust deep oriented- | text | detection network |
| Scale Up Composed Image Retrieval Learning via Modification | text | Generation |
| Scale-aware Polar Representation for Arbitrarily-shaped | text | Detection |
| Scale-Invariant Multi-Oriented | text | Detection in Wild Scene Image |
| Scale-Residual Learning Network for Scene | text | Detection |
| Scaledreamer: Scalable | text | -to-3d Synthesis with Asynchronous Score Distillation |
| Scaling Down | text | Encoders of Text-to-Image Diffusion Models |
| Scaling up GANs for | text | -to-Image Synthesis |
| SCATTER: Selective Con | text | Attentional Scene Text Recognizer |
| ScenarioDiff: | text | -to-video Generation with Dynamic Transformations of Scene Conditions |
| Scene Graph Driven | text | -Prompt Generation for Image Inpainting |
| Scene Retrieval for Video Summarization Based on | text | -to-Image GAN |
| Scene | text | Character Recognition Using Spatiality Embedded Dictionary |
| Scene | text | Deblurring Using Text-Specific Multiscale Dictionaries |
| Scene | text | detection and recognition with advances in deep learning: A survey |
| Scene | text | Detection and Recognition: The Deep Learning Era |
| Scene | text | Detection and Segmentation Based on Cascaded Convolution Neural Networks |
| Scene | text | Detection and Tracking for a Camera-Equipped Wearable Reading Assistant for the Blind |
| Scene | text | detection based on component-level fusion and region-level verification |
| Scene | text | detection based on multi-scale SWT and edge filtering |
| Scene | text | Detection Based on Robust Stroke Width Transform and Deep Belief Network |
| Scene | text | detection based on skeleton-cut detector |
| Scene | text | Detection in Foggy Weather Utilizing Knowledge Distillation of Diffusion Models |
| Scene | text | detection method based on the hierarchical model |
| Scene | text | detection suitable for parallelizing on multi-core |
| Scene | text | detection using adaptive color reduction, adjacent character model and hybrid verification strategy |
| Scene | text | detection using graph model built upon maximally stable extremal regions |
| Scene | text | detection using sequential nontext filtering |
| Scene | text | detection using sparse stroke information and MLP |
| Scene | text | Detection Using Superpixel-Based Stroke Feature Transform and Deep Learning Based Region Classification |
| Scene | text | Detection via Connected Component Clustering and Nontext Filtering |
| Scene | text | Detection via Deep Semantic Feature Fusion and Attention-based Refinement |
| Scene | text | Detection via Integrated Discrimination of Component Appearance and Consensus |
| Scene | text | detection via stroke width |
| Scene | text | Detection with Adaptive Line Clustering |
| Scene | text | detection with extremal region based cascaded filtering |
| Scene | text | Detection with Recurrent Instance Segmentation |
| Scene | text | detection with robust character candidate extraction method |
| Scene | text | Detection with Selected Anchors |
| Scene | text | detection with superpixels and hierarchical model |
| Scene | text | Extraction and Translation for Handheld Devices |
| Scene | text | extraction based on edges and support vector regression |
| Scene | text | Extraction by Superpixel CRFs Combining Multiple Character Features |
| Scene | text | Extraction in Complex Images |
| Scene | text | extraction in natural scene images using hierarchical feature combining and verification |
| Scene | text | Extraction Using Focus of Mobile Camera |
| Scene | text | Extraction with Edge Constraint and Text Collinearity |
| Scene | text | extraction with local symmetry transform |
| Scene | text | Identification by Leveraging Mid-level Patches and Context Information |
| Scene | text | Image Super-resolution based on Text-conditional Diffusion Models |
| Scene | text | Image Super-Resolution in the Wild |
| Scene | text | Image Super-Resolution Via Semantic Distillation and Text Perceptual Loss |
| Scene | text | Localization and Recognition with Oriented Stroke Detection |
| Scene | text | Localization Using Gradient Local Correlation |
| Scene | text | Recognition and Retrieval for Large Lexicons |
| Scene | text | recognition by learning co-occurrence of strokes based on spatiality embedded dictionary |
| Scene | text | Recognition in Mobile Applications by Character Descriptor and Structure Configuration |
| Scene | text | Recognition Models Explainability Using Local Features |
| Scene | text | recognition using a Hough forest implicit shape model and semi-Markov conditional random fields |
| Scene | text | Recognition Using Co-occurrence of Histogram of Oriented Gradients |
| Scene | text | Recognition using Higher Order Language Priors |
| Scene | text | Recognition Using Part-Based Tree-Structured Character Detection |
| Scene | text | Recognition Using Progressive Rectification Network And Spelling Error Correction Language Model |
| Scene | text | recognition using residual convolutional recurrent neural network |
| Scene | text | Recognition Using Similarity and a Lexicon with Sparse Belief Propagation |
| Scene | text | recognition using sparse coding based features |
| Scene | text | Recognition Using Structure-Guided Character Detection and Linguistic Knowledge |
| Scene | text | Recognition with a Hough Forest Implicit Shape Model |
| Scene | text | recognition with CNN classifier and WFST-based word labeling |
| Scene | text | recognition with deeper convolutional neural networks |
| Scene | text | Recognition with Permuted Autoregressive Sequence Models |
| Scene | text | Recognition with Self-supervised Contrastive Predictive Coding |
| Scene | text | Recognition: No Country for Old Men? |
| Scene | text | rectification using glyph and character alignment properties |
| Scene | text | Removal, Text Erasing |
| Scene | text | Retrieval via Joint Text Detection and Similarity Learning |
| Scene | text | Script Identification with Convolutional Recurrent Neural Networks |
| Scene | text | Segmentation Based on Local Image Phase Information and MSER Method |
| Scene | text | Segmentation by Paired Data Synthesis |
| Scene | text | Segmentation via Inverse Rendering |
| Scene | text | Segmentation with Multi-level Maximally Stable Extremal Regions |
| Scene | text | Telescope: Text-Focused Scene Image Super-Resolution |
| Scene | text | Visual Question Answering |
| Scene | text | , Assistance for Visually Impaired |
| Scene- | text | Oriented Referring Expression Comprehension |
| Scene- | text | Synthesis Engine Achieved Through Learning From Decomposed Real-World Data, A |
| Scene- | text | -Detection Method Robust Against Orientation and Discontiguous Components of Characters |
| SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and | text | |
| Scientometric Full- | text | Analysis of Papers Published in Remote Sensing between 2009 and 2021 |
| SciOL and MuLMS-Img: Introducing A Large-Scale Multimodal Scientific Dataset and Models for Image- | text | Tasks in the Scientific Domain |
| SCOB: Universal | text | Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap |
| SCoRD: Subject-Conditional Relation Detection with | text | -Augmented Data |
| ScrabbleGAN: Semi-Supervised Varying Length Handwritten | text | Generation |
| Screen-rendered | text | images recognition using a deep residual network based segmentation-free method |
| Scribble-Guided Diffusion for Training-Free | text | -to-Image Generation |
| Script and nature differentiation for Arabic and Latin | text | images |
| Script-Free | text | Line Segmentation Using Interline Space Model for Printed Document Images |
| Script-Independent | text | Line Segmentation in Freestyle Handwritten Documents |
| Script-independent, HMM-based | text | Line Finding for OCR |
| Sculpt3D: Multi-View Consistent | text | -to-3D Generation with Sparse 3D Prior |
| SCUT-COUCH | text | line_NU: An Unconstrained Online Handwritten Chinese Text Lines Dataset |
| SCUT-HCCDoc: A new benchmark dataset of handwritten Chinese | text | in unconstrained camera-captured documents |
| SD-Prompt: Learnable and Adaptive Prompts for Enhancing Subject-Driven | text | -to-Image Synthesis |
| Search method and apparatus for locating digitally stored content, such as visual images, music and sounds, | text | , or software, in storage devices on a computer network |
| Searching a High Performance Feature Extractor for | text | Recognition Network |
| Searching OCR'ed | text | : An LDA Based Approach |
| Searching through a Speech Memory for | text | -Independent Speaker Verification |
| See Finer, See More: Implicit Modality Alignment for | text | -based Person Retrieval |
| See-Through- | text | Grouping for Referring Image Segmentation |
| SEED: Semantics Enhanced Encoder-Decoder Framework for Scene | text | Recognition |
| Seek Common Ground While Reserving Differences: Semi-Supervised Image- | text | Sentiment Recognition |
| SeeTek: Very Large-Scale Open-set Logo Recognition with | text | -Aware Metric Learning |
| SegINR: Segment-Wise Implicit Neural Representation for Sequence Alignment in Neural | text | -to-Speech |
| SegLink++: Detecting Dense and Arbitrary-shaped Scene | text | by Instance-aware Component Grouping |
| Segmentation and Classification of Mixed | text | /Graphics/Image Documents |
| Segmentation and Recognition of Continuous Handwriting Chinese | text | |
| Segmentation and Recognition of Dimensioning | text | from Engineering Drawings |
| Segmentation and Word Spotting Methods for Printed and Handwritten Arabic | text | s: A Comparative Study |
| Segmentation Method of Single- and Multiple-Touching Characters in Offline Handwritten Japanese | text | Recognition, A |
| Segmentation of Bangla unconstrained handwritten | text | |
| Segmentation of On-Line Freely Written Japanese | text | Using SVM for Improving Text Recognition |
| Segmentation of on-line handwritten Japanese | text | of arbitrary line direction by a neural network for improving text recognition |
| Segmentation of On-Line Handwritten Japanese | text | Using SVM for Improving Text Recognition |
| Segmentation of stick | text | based on sub connected area analysis |
| Segmentation of | text | and graphics |
| Segmentation of | text | and Graphics from Document Images |
| Segmentation of | text | From Color Map Images |
| Segmentation of | text | , picture and lines of a document image |
| Segmentation of | text | /image documents using texture approaches |
| Segmentation of Uniform Colored | text | from Color Graphics Background |
| Segmentation of Very Low Resolution Screen-Rendered | text | |
| Segmentation-Aware | text | -Guided Image Manipulation |
| Segmentation-Free Approach to | text | Recognition with Application to Arabic Text, A |
| Segmentation-Free Guidance for | text | -to-Image Diffusion Models |
| Segmentation-free handwritten Chinese | text | recognition with LSTM-RNN |
| Segmented handwritten | text | recognition with recurrent neural network classifiers |
| Segmenting a page of a document into areas which are | text | and areas which are halftone |
| Segmenting Messy | text | : Detecting Boundaries in Text Derived from Historical Newspaper Images |
| Segmenting | text | Images with Massively Parallel Machines |
| Selectively Hard Negative Mining for Alleviating Gradient Vanishing in Image- | text | Matching |
| Selectively Informative Description can Reduce Undesired Embedding Entanglements in | text | -to-Image Personalization |
| Self-Adaptive Image- | text | Fusion for Medical Image Classification |
| Self-attention based | text | Knowledge Mining for Text Detection |
| Self-Cross Diffusion Guidance for | text | -to-Image Synthesis of Similar Subjects |
| Self-Discovering Interpretable Diffusion Latent Directions for Responsible | text | -to-Image Generation |
| Self-learning structure for | text | localization |
| Self-Organized | text | Detection with Minimal Post-processing via Border Learning |
| Self-paced Learning to Improve | text | Row Detection in Historical Documents with Missing Labels |
| Self-supervised adaptation for on-line script | text | recognition |
| Self-supervised Character-to-Character Distillation for | text | Recognition |
| Self-supervised deep reconstruction of mixed strip-shredded | text | documents |
| Self-Supervised Discovery of Cross-Lingual Shared Knowledge for Continual | text | Recognition |
| Self-Supervised Implicit Glyph Attention for | text | Recognition |
| Self-Supervised Learning for | text | Recognition: A Critical Survey |
| Self-Supervised Learning of Visual Features through Embedding Images into | text | Topic Spaces |
| Self-supervised writer adaptation using perceptive concepts: application to on-line | text | recognition |
| Self-Training for Domain Adaptive Scene | text | Detection |
| Self-training for Handwritten | text | Line Recognition |
| SEM-CS: Semantic Clipstyler for | text | -Based Image Style Transfer |
| SEMACOL: Semantic-enhanced multi-scale approach for | text | -guided grayscale image colorization |
| Semantic and Morphological Information Guided Chinese | text | Classification |
| Semantic Controllable Long | text | Steganography Framework Based on LLM Prompt Engineering and Knowledge Graph, A |
| Semantic Correlation Mining between Images and | text | s with Global Semantics and Local Mapping |
| Semantic Distance Adversarial Learning for | text | -to-Image Synthesis |
| Semantic Indexing of Multimedia Content Using Visual, Audio, and | text | Cues |
| Semantic Integration of Information Through Relation Mining: Application to Bio-medical | text | Processing |
| Semantic keyword extraction via adaptive | text | binarization of unstructured unsourced video |
| Semantic Object Accuracy for Generative | text | -to-Image Synthesis |
| Semantic Oriented | text | Clustering Based on RDF |
| Semantic Proximity Based System of Arabic | text | Indexation, A |
| Semantic Role Aware Correlation Transformer for | text | To Video Retrieval |
| Semantic role-based representations in | text | classification |
| Semantic Similarity Distance: Towards better | text | -image consistency metric in text-to-image generation |
| Semantic | text | Summarization of Long Videos |
| Semantic-Aware Video | text | Detection |
| Semantic-Compensated and Attention-Guided Network for Scene | text | Detection |
| Semantic-Preserving Metric Learning for Video- | text | Retrieval |
| Semantic-Spatial Attention for Refined Object Placement in | text | -to-Image Synthesis |
| Semantically Consistent Hierarchical | text | to Fashion Image Synthesis with an Enhanced-Attentional Generative Adversarial Network |
| Semantically consistent | text | to fashion image synthesis with an enhanced attentional generative adversarial network |
| Semantically Invariant | text | -to-Image Generation |
| Semantics Disentangling for | text | -To-Image Generation |
| Semantics-Enhanced Adversarial Nets for | text | -to-Image Synthesis |
| Semi-automatic news video annotation framework for Arabic | text | |
| Semi-Incremental Recognition Method for On-Line Handwritten Japanese | text | , A |
| Semi-Incremental Recognition of On-Line Handwritten Japanese | text | |
| Semi-supervised learning for | text | -line detection |
| Semi-supervised network embedding with | text | information |
| Semi-Supervised Pixel-Level Scene | text | Segmentation by Mutually Guided Network |
| Semi-Supervised Scene | text | Recognition |
| Semi-Supervised | text | Classification With Universum Learning |
| Semi-Supervised | text | Detection With Accurate Pseudo-Labels |
| Semi-Supervised | text | -Based Person Search |
| Semiautomatic Ground Truth Generation for | text | Detection and Recognition in Video Images |
| SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end | text | Spotting |
| SemStyle: Learning to Generate Stylised Image Captions Using Unaligned | text | |
| Sense discovery via co-clustering on images and | text | |
| Sentence level | text | classification in the Kannada language: A classifier's perspective |
| Sentiment analysis based on | text | information enhancement and multimodal feature fusion |
| Sentiment Similarity-oriented Attention Model with Multi-task Learning for | text | -based Emotion Recognition, A |
| Separate Images and Graphics from | text | |
| Separate, Locate, and Align: Determine Con | text | Relation of Scene Text From Multiple Perspectives in TextVQA |
| Separating Content from Style Using Adversarial Learning for Recognizing | text | in the Wild |
| Separating handwritten material from machine printed | text | using hidden Markov models |
| Separating Handwritten | text | from Non-Textual Interference |
| Separating Lines of | text | in Free-Form Handwritten Historical Documents |
| Separating | text | and background in degraded document images: A comparison of global thresholding techniques for multi-stage thresholding |
| Separation of overlapping | text | from graphics |
| Separation of touching and overlapping words in adjacent lines of handwritten | text | |
| Seq-UPS: Sequential Uncertainty-aware Pseudo-label Selection for Semi-Supervised | text | Recognition |
| Seq2seq-based Model with Global Semantic Con | text | for Scene Text Recognition, A |
| Sequence as a Whole: A Unified Framework for Video Action Localization With Long-Range | text | Query |
| Sequence to Sequence -- Video to | text | |
| Sequence-to-Sequence Contrastive Learning for | text | Recognition |
| Sequence-To-Sequence Domain Adaptation Network for Robust | text | Image Recognition |
| Sequential alignment attention model for scene | text | recognition |
| Sequential Deformation for Accurate Scene | text | Detection |
| Sequential Monte Carlo video | text | segmentation |
| Sequential | text | s Driven Cohesive Motions Synthesis with Natural Transitions |
| Sequential Transformer for End-to-End Video | text | Detection |
| Sequential visual and semantic consistency for semi-supervised | text | recognition |
| set of benchmarks for Handwritten | text | Recognition on historical documents, A |
| SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene | text | Recognition |
| SGDM: An Adaptive Style-Guided Diffusion Model for Personalized | text | to Image Generation |
| Shape My Moves: | text | -Driven Shape-Aware Synthesis of Human Motions |
| Shape Robust | text | Detection With Progressive Scale Expansion Network |
| Shape-Aware | text | -Driven Layered Video Editing |
| Shape-DNA: Effective Character Restoration and Enhancement for Arabic | text | Documents |
| Shape-Matching GAN++: Scale Controllable Dynamic Artistic | text | Style Transfer |
| ShapeScaffolder: Structure-Aware 3D Shape Generation from | text | |
| ShapeWords: Guiding | text | -to-Image Synthesis with 3D Shape-Aware Prompts |
| Shatter and Gather: Learning Referring Image Segmentation with | text | Supervision |
| SHE-Net: Syntax-Hierarchy-Enhanced | text | -Video Retrieval |
| Sherpa3D: Boosting High-Fidelity | text | -to-3D Generation via Coarse 3D Prior |
| Shifted Diffusion for | text | -to-image Generation |
| ShotAdapter: | text | -to-Multi-Shot Video Generation with Diffusion Models |
| Show-1: Marrying Pixel and Latent Diffusion Models for | text | -to-Video Generation |
| Shuffle and Divide: Contrastive Learning for Long | text | |
| SiamCLIM: | text | -Based Pedestrian Search Via Multi-Modal Siamese Contrastive Learning |
| Sigma-Lognormal Model for Handwritten | text | CAPTCHA Generation, A |
| Sign Detection Based | text | Localization in Mobile Device Captured Scene Images |
| Signing Avatars: Multimodal Challenges for | text | -to-sign Generation |
| Silent Branding Attack: Trigger-free Data Poisoning Attack on | text | -to-Image Diffusion Models |
| SILMM: Self-Improving Large Multimodal Models for Compositional | text | -to-Image Generation |
| SimAC: A Simple Anti-Customization Method for Protecting Face Privacy Against | text | -to-Image Synthesis of Diffusion Models |
| SimAN: Exploring Self-Supervised Representation Learning of Scene | text | via Similarity-Aware Normalization |
| Similarity Search on Semantic Trajectories Using | text | Processing |
| Similarity Shuffled Criss-Cross Transformer With Angle Loss for Image- | text | Matching |
| SimMotionEdit: | text | -Based Human Motion Editing with Motion Similarity Prediction |
| Simple and Effective Multi-word Query Spotting in Handwritten | text | Images |
| Simple and Robust Correlation Filtering Method for | text | -Based Person Search, A |
| Simple Framework for | text | -Supervised Semantic Segmentation, A |
| Simulated Annealing Clustering of Chinese Words for Con | text | ual Text Recognition |
| Simulated annealing-based | text | clustering |
| SINE: SINgle Image Editing with | text | -to-Image Diffusion Models |
| Single Shot Scene | text | Retrieval |
| Single Shot | text | Detector with Regional Attention |
| Single-frame | text | super-resolution: a bayesian approach |
| Single-Line | text | Detection in Multi-Line Text with Narrow Spacing for Line-Based Character Recognition |
| Six-CD: Benchmarking Concept Removals for | text | -to-image Diffusion Models |
| SKED: Sketch-guided | text | -based 3D Editing |
| skeleton based descriptor for detecting | text | in real scene images, A |
| Skeleton Filter: A Self-Symmetric Filter for Skeletonization in Noisy | text | Images |
| Sketch and | text | Guided Diffusion Model for Colored Point Cloud Generation |
| Sketch is Worth a Thousand Words: Image Retrieval with | text | and Sketch, A |
| SketchBird: Learning to Generate Bird Sketches from | text | |
| Skew Angle Detection and Correction in | text | Images Using RGB Gradient |
| Skew correction and line extraction in binarized printed | text | images |
| Skew Detection and | text | Line-Position Determination in Digitized Documents |
| Skew detection for complex document images using robust borderlines in both | text | and non-text regions |
| Skew detection of | text | in a noisy digitized image |
| Skewed | text | correction based on the improved Hough transform |
| Skews in the Phenomenon Space Hinder Generalization in | text | -to-image Generation |
| SleeperMark: Towards Robust Watermark against Fine-Tuning | text | -to-image Diffusion Models |
| Sliding Line Point Regression for Shape Robust Scene | text | Detection |
| SLOAN: Scale-Adaptive Orientation Attention Network for Scene | text | Recognition |
| SMAN: Stacked Multimodal Attention Network for Cross-Modal Image- | text | Retrieval |
| SmartBrush: | text | and Shape Guided Object Inpainting with Diffusion Model |
| Smile: Sequence-to-Sequence Domain Adaptation with Minimizing Latent Entropy for | text | Image Recognition |
| SNAC: Speaker-Normalized Affine Coupling Layer in Flow-Based Architecture for Zero-Shot Multi-Speaker | text | -to-Speech |
| Snap Video: Scaled Spatiotemporal Transformers for | text | -to-Video Synthesis |
| SnapGen: Taming High-Resolution | text | -To-Image Models for Mobile Devices with Efficient Architectures and Training |
| Snooper | text | : A multiresolution system for text detection in complex visual scenes |
| Snooper | text | : A text detection system for automatic indexing of urban scenes |
| Snoopertrack: | text | detection and tracking for outdoor videos |
| SNP-S3: Shared Network Pre-Training and Significant Semantic Strengthening for Various Video- | text | Tasks |
| So Many Heads, So Many Wits: Multimodal Graph Reasoning for | text | -Based Visual Question Answering |
| Social Image- | text | Sentiment Classification With Cross-Modal Consistency and Knowledge Distillation |
| Sounding Video Generator: A Unified Framework for | text | -Guided Sounding Video Generation |
| Source-Free Image- | text | Matching via Uncertainty-Aware Learning |
| Space-Time Diffusion Features for Zero-Shot | text | -Driven Motion Transfer |
| sparse version of the ridge logistic regression for large-scale | text | categorization, A |
| Sparsectrl: Adding Sparse Controls to | text | -to-video Diffusion Models |
| Spatial and Color Spaces Combination for Natural Scene | text | Extraction |
| Spatial and Spectral Based Segmentation of | text | in Multispectral Images of Ancient Documents |
| Spatial con | text | -based Self-Supervised Learning for Handwritten Text Recognition |
| Spatial Transport Optimization by Repositioning Attention Map for Training-Free | text | -to-Image Synthesis |
| Spatially Prioritized and Persistent | text | Detection and Decoding |
| Spatio-Temporal Relevance Classification from Geographic | text | s Using Deep Learning |
| Spatiotemporal Typhoon Damage Assessment: A Multi-Task Learning Method for Location Extraction and Damage Identification from Social Media | text | s |
| SPCL: Semantic Polymorphism and Commonality Learning for | text | -Based Person Retrieval |
| Special issue on camera-based | text | and document recognition |
| Special issue on deep learning for video | text | analysis |
| Special Issue on Noisy | text | Analytics |
| Special Issue on Noisy | text | Analytics, II |
| Special Issue on Noisy | text | Analytics, III |
| Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of | text | -to-Image Diffusion Models to Learn Any Unseen Style |
| Specific Category Region Proposal Network for | text | Detection in Natural Scene |
| Specific Diverse | text | -to-Image Synthesis via Exemplar Guidance |
| Spectral approach to find number of clusters of short- | text | documents |
| Spectral Fluctuation Method: A | text | ure-Based Method to Extract Text Regions in General Scene Images |
| SpectralCLIP: Preventing Artifacts in | text | -Guided Style Transfer from a Spectral Perspective |
| SpeechPalette: A Comprehensive Speech Editing Method for | text | -Based Speech Editing, One-Shot TTS and Attributes Editing |
| Speedupnet: A Plug-and-play Adapter Network for Accelerating | text | -to-image Diffusion Models |
| SPEye: A Calibration-Free Gaze-Driven | text | Entry Technique Based on Smooth Pursuit |
| Spherical Linear Interpolation and | text | -Anchoring for Zero-Shot Composed Image Retrieval |
| Split-net: Dual transformer encoder with splitting scene | text | image for script identification |
| Spontaneous Handwriting | text | Recognition and Classification Using Finite-State Models |
| Spotlight | text | Detector: Spotlight on Candidate Regions Like a Camera |
| Spotting Phrases in Lines of Imaged | text | |
| SPS-SQL: Enhancing | text | -to-SQL generation on small-scale LLMs with pre-synthesized queries |
| SPTS v2: Single-Point Scene | text | Spotting |
| ST-LDM: A Universal Framework for | text | -grounded Object Generation in Real Images |
| Stable Preference: Redefining Training Paradigm of Human Preference Model for | text | -to-image Synthesis |
| Stable | text | line detection |
| StableID: Multimodal learning for stable identity in personalized | text | -to-Face generation |
| StableVideo: | text | -driven Consistency-aware Diffusion Video Editing |
| Stacked Cross Attention for Image- | text | Matching |
| StackGAN: | text | to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks |
| StacMR: Scene- | text | Aware Cross-Modal Retrieval |
| STAN: A sequential transformation attention-based network for scene | text | recognition |
| STAR-Net: A SpaTial Attention Residue Network for Scene | text | Recognition |
| STARS: Semantics-Aware | text | -guided Aerial Image Refinement and Synthesis |
| StarVector: Generating Scalable Vector Graphics Code from Images and | text | |
| State Estimation in a Document Image and Its Application in | text | Block Identification and Text Line Extraction |
| State-of-the-Art in Action: Unconstrained | text | Detection |
| Static | text | region detection in video sequences using color and orientation consistencies |
| Statistical Approach for Phrase Location and Recognition within a | text | Line: An Application to Street Name Recognition, A |
| Statistical modeling for the detection, localization and extraction of | text | from heterogeneous textual images using combined feature scheme |
| Statistical | text | Line Analysis in Handwritten Documents |
| Steerable Directional Local Profile Technique for Extraction of Handwritten Arabic | text | Lines, A |
| STEFANN: Scene | text | Editor Using Font Adaptive Neural Network |
| Steganalysis for | text | , Documents |
| STEP - Towards Structured Scene- | text | Spotting |
| STEPS: Sequential Probability Tensor Estimation for | text | -to-Image Hard Prompt Search |
| STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from | text | -to-Image Diffusion Models |
| Stochastic | text | Models for Music Categorization |
| Stop Word Location and Identification for Adaptive | text | Recognition |
| Store classification using | text | -Exemplar-Similarity and Hypotheses-Weighted-CNN |
| Story Segmentation in News Videos Using Visual and | text | Cues |
| Story Visualization by Online | text | Augmentation with Context Memory |
| StoryDALL-E: Adapting Pretrained | text | -to-Image Transformers for Story Continuation |
| STPNet: Scale-Aware | text | Prompt Network for Medical Image Segmentation |
| Straight-Line Approximation and 1D Representation of Off-Line Handwritten | text | |
| Straightening warped | text | lines using polynomial regression |
| Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene | text | Recognition |
| Stratified Multi-Task Learning for Robust Spotting of Scene | text | s |
| StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from | text | |
| StreamMel: Real-Time Zero-Shot | text | -to-Speech Via Interleaved Continuous Autoregressive Modeling |
| Street View | text | Recognition With Deep Learning for Urban Scene Understanding in Intelligent Transportation Systems |
| Stretching deep architectures for | text | recognition |
| String Matching, | text | Matching |
| String-level learning of confidence transformation for Chinese handwritten | text | recognition |
| STRIVE: Scene | text | Replacement In Videos |
| stroke filter and its application to | text | localization, A |
| Stroke Filter for | text | Localization in Video Images |
| Stroke Segmentation and Recognition from Bangla Online Handwritten | text | |
| Stroke Verification with Gray-level Image for Hangul Video | text | Recognition |
| Stroke-Based Scene | text | Erasing Using Synthetic Data for Training |
| Strokelets: A Learned Multi-Scale Mid-Level Representation for Scene | text | Recognition |
| Strokelets: A Learned Multi-scale Representation for Scene | text | Recognition |
| Structural feature-based event clustering for short | text | streams |
| Structure-Aware Generative Adversarial Network for | text | -to-Image Generation |
| Structured Human Assessment of | text | -to-Image Generative Models |
| Structuring low-quality videotaped lectures for cross-reference browsing by video | text | analysis |
| Study on Automatic Chinese | text | Classification, A |
| Style Transformer With Common Knowledge Optimization for Image- | text | Retrieval, The |
| Style-A-Video: Agile Diffusion for Arbitrary | text | -Based Video Style Transfer |
| Style-Editor: | text | -driven object-centric style editing |
| Style-Preserving Diffusion for Scene | text | Editing |
| StyleCLIP: | text | -Driven Manipulation of StyleGAN Imagery |
| StyleMC: Multi-Channel Based Fast | text | -Guided Image Generation and Manipulation |
| StyleStudio: | text | -Driven Style Transfer with Selective Control of Style Elements |
| StyleT2I: Toward Compositional and High-Fidelity | text | -to-Image Synthesis |
| Stylized | text | -to-Fashion Image Generation |
| Sub-structure Learning Based Handwritten Chinese | text | Recognition |
| subtractive clustering scheme for | text | -independent online writer identification, A |
| Super-resolution Enhancement of | text | Image Sequences |
| Super-Resolved Binarization of | text | Based on the FAIR Algorithm |
| Superresolution-based Enhancement of | text | in Digital Video |
| supervised algorithm with a new differentiated-weighting scheme for identifying the author of a handwritten | text | , A |
| Supervised and Traditional Term Weighting Methods for Automatic | text | Categorization |
| Supervised Domain Adaptation from Scene | text | Recognition for Licence Plate Recognition |
| Supervised semantic relation mining from linguistically noisy | text | documents |
| support vector approach for cross-modal search of images and | text | s, A |
| Support vector machine-based approach for | text | description from the video |
| Support vector machine-based | text | detection in digital video |
| Suppression of non- | text | components in handwritten document images |
| Surgical | text | -to-image generation |
| Surprisingly Straightforward Scene | text | Removal Method with Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis, The |
| Survey of | text | Watermarking in the Era of Large Language Models, A |
| survey on camera-captured scene | text | detection and extraction: towards Gurmukhi script, A |
| survey on methods, datasets and implementations for scene | text | spotting, A |
| survey on | text | generation using generative adversarial networks, A |
| SVGDreamer++: Advancing Editability and Diversity in | text | -Guided SVG Generation |
| SVGDreamer: | text | Guided SVG Generation with Diffusion Model |
| SViTT: Temporal Learning of Sparse Video- | text | Transformers |
| Swap Attention in Spatiotemporal Diffusions for | text | -to-Video Generation |
| Swap | text | : Image Based Texts Transfer in Scenes |
| SwiftBrush: One-Step | text | -to-Image Diffusion Model with Variational Score Distillation |
| SwiftEdit: Lightning Fast | text | -Guided Image Editing via One-Step Diffusion |
| Swin | text | Spotter v2: Towards Better Synergy for Scene Text Spotting |
| Swin | text | Spotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition |
| SWT voting-based color reduction method for detecting | text | in natural scene images |
| Symbolic Subtraction from Fixed Formatted Graphics and | text | from Filled in Forms |
| Symbolization of Regional Elements Based on Local-Chronicle | text | Mining and Image-Feature Extraction, The |
| Symmetric-key block cipher for image and | text | cryptography |
| Symmetry-based object proposal for | text | detection |
| Symmetry-based | text | line detection in natural scenes |
| Symmetry-Constrained Rectification Network for Scene | text | Recognition |
| Syn3DTxt: Embedding 3D Cues for Scene | text | Generation |
| Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to- | text | Translation |
| Synthesizing Talking Faces from | text | and Audio: An Autoencoder and Sequence-to-Sequence Convolutional Neural Network |
| Synthetic Data for | text | Localisation in Natural Images |
| Synthetic-to-real Unsupervised Domain Adaptation for Scene | text | Detection in the Wild |
| Synthetically Supervised Feature Learning for Scene | text | Recognition |
| System and method for automatically distinguishing between graphic information and | text | information of image data |
| System for Bangla Online Handwritten | text | , A |
| System for Handwritten and Machine-Printed | text | Separation in Bangla Document Images, A |
| system for the off-line recognition of handwritten | text | , A |
| T-HOG: An effective gradient-based descriptor for single line | text | regions |
| T-REX2: Towards Generic Object Detection via | text | -visual Prompt Synergy |
| T-Skeleton: Accurate scene | text | detection via instance-aware skeleton embedding |
| t-SS3: A | text | classifier with dynamic n-grams for early risk detection over text streams |
| t-Test feature selection approach based on term frequency for | text | categorization |
| T-VSL: | text | -Guided Visual Sound Source Localization in Mixtures |
| T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional | text | -to-Image Generation |
| T2ishield: Defending Against Backdoors on | text | -to-image Diffusion Models |
| T2TD: | text | -3D Generation Model Based on Prior Knowledge Guidance |
| T2V-CompBench: A Comprehensive Benchmark for Compositional | text | -to-video Generation |
| T2V2T: | text | -to-Video-to-Text Fusion for Text-to-Video Retrieval |
| T2VBench: Benchmarking Temporal Dynamics for | text | -to-Video Generation |
| T2VLAD: Global-Local Sequence Alignment for | text | -Video Retrieval |
| TA2V: | text | -Audio Guided Video Generation |
| TAB: | text | -Align Anomaly Backbone Model for Industrial Inspection Tasks |
| Tablet identification using support vector machine based | text | recognition and error correction by enhanced n-grams algorithm |
| TACMT: | text | -aware cross-modal transformer for visual grounding on high-resolution SAR images |
| TACo: Token-aware Cascade Contrastive Learning for Video- | text | Alignment |
| TACT: | text | attention based CNN-Transformer network for polyp segmentation |
| TADA! | text | to Animatable Digital Avatars |
| Tag2Pix: Line Art Colorization Using | text | Tag With SECat and Changing Loss |
| Tag: | text | Prompt Augmentation for Zero-shot Out-of-distribution Detection |
| Tagging Webcast | text | in Baseball Videos by Video Segmentation and Text Alignment |
| Tailored Visions: Enhancing | text | -to-Image Generation with Personalized Prompt Rewriting |
| Tailoring | text | for automatic layouting of newspaper pages |
| TalkCLIP: Talking Head Generation with | text | -Guided Expressive Speaking Styles |
| TAM-TR: | text | -guided attention multi-modal transformer for object detection in UAV images |
| Taming Mode Collapse in Score Distillation for | text | -to-3D Generation |
| Taming Stable Diffusion for | text | to 360° Panorama Image Generation |
| TAP: | text | -Aware Pre-training for Text-VQA and Text-Caption |
| TAPS3D: | text | -Guided 3D Textured Shape Generation from Pseudo Supervision |
| Target-level Sentiment Analysis Based on Image and | text | Fusion |
| TASDF-Stega: High Capacity Secure | text | -Audio Joint Steganography Using Diffusion Latent Space |
| Task Grouping for Multilingual | text | Recognition |
| TC4D: Trajectory-Conditioned | text | -to-4D Generation |
| TCATD: | text | Contour Attention for Scene Text Detection |
| TCFF-Adapter: | text | -Driven Adaption of CLIP for Few-Shot Image Classification |
| TCP: | text | -Guided Cascade Network for Pedestrian Crossing Intention Prediction |
| TE141K: Artistic | text | Benchmark for Text Effect Transfer |
| Teach | text | : CrossModal Generalized Distillation for Text-Video Retrieval |
| Teach | text | : CrossModal text-video retrieval through generalized distillation |
| TECA: | text | -Guided Generation and Editing of Compositional 3D Avatars |
| TeCH: | text | -Guided Reconstruction of Lifelike Clothed Humans |
| Technique for Segmentation of Gurmukhi | text | , A |
| Tecm-clip: | text | -based Controllable Multi-attribute Face Image Manipulation |
| TediGAN: | text | -Guided Diverse Face Image Generation and Manipulation |
| TEDRA: | text | -Based Editing of Dynamic and Photoreal Actors |
| Tela: | text | to Layer-wise 3d Clothed Human Generation |
| Tell Me What Happened: Unifying | text | -guided Video Completion via Multimodal Masked Video Generation |
| Tell Your Story: | text | -Driven Face Video Synthesis with High Diversity via Adversarial Learning |
| Tem-adapter: Adapting Image- | text | Pretraining for Video Question Answer |
| TeMO: Towards | text | -Driven 3D Stylization for Multi-Object Meshes |
| Template Based Segmentation of Touching Components in Handwritten | text | Lines |
| Temporal Multimodal Graph Transformer With Global-Local Alignment for Video- | text | Retrieval |
| Temporal prompt guided visual- | text | -object alignment for zero-shot video captioning |
| Temporal video segmentation with natural language using | text | -video cross attention and Bayesian order-priors |
| TEMSA: | text | enhanced modal representation learning for multimodal sentiment analysis |
| Tensor representation learning based image patch analysis for | text | identification and recognition |
| Tensor Voting Based | text | Localization in Natural Scene Images |
| Term relevance dependency model for | text | classification |
| TETFN: A | text | enhanced transformer fusion network for multimodal sentiment analysis |
| TeViR: | text | -to-Video Reward With Diffusion Models for Efficient Reinforcement Learning |
| TEXDC: | text | -driven Disease-aware 4d Cardiac Cine MRI Images Generation |
| TexFusion: Synthesizing 3D | text | ures with Text-Guided Image Diffusion Models |
| Texgen: | text | -guided 3d Texture Generation with Multi-view Sampling and Resampling |
| | text | alignment in early printed books combining deep learning and dynamic programming |
| | text | alignment with handwritten documents |
| | text | analysis using local energy |
| | text | and Documents in the Deep Learning Era |
| | text | and Image Guided 3D Avatar Generation and Manipulation |
| | text | and Image Sharpening of Scanned Images in the JPEG Domain |
| | text | and Layout Information Extraction from Document Files of Various Formats Based on the Analysis of Page Description Language |
| | text | and Non-Text Latent Feature Disentanglement for Screen Content Image Compression |
| | text | and non-text segmentation based on connected component features |
| | text | and non-text separation in offline document images: a survey |
| | text | and picture segmentation by the distribution analysis of wavelet coefficients |
| | text | and User Generic Model for Writer Verification Using Combined Pen Pressure Information From Ink Intensity and Indented Writing on Paper |
| | text | Area Detection in Digital Documents Images Using Textural Features |
| | text | area localization under complex-background using wavelet decomposition |
| | text | Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution, A |
| | text | Augmented Correlation Transformer For Few-shot Classification & Segmentation |
| | text | baseline detection, a single page trained system |
| | text | Baseline Recognition Using a Recurrent Convolutional Neural Network |
| | text | binarization in color documents |
| | text | Block Segmentation in Comic Speech Bubbles |
| | text | box proposals for handwritten word spotting from documents |
| | text | Categorization Approach for Music Style Recognition, A |
| | text | Categorization: A Symbolic Approach |
| | text | Change Detection in Multilingual Documents Using Image Comparison |
| | text | Classification and Document Layout Analysis of Paper Fragments |
| | text | classification with the support of pruned dependency patterns |
| | text | Co-Detection in Multi-View Scene |
| | text | Compression-Aided Transformer Encoding |
| | text | data extraction from microfilm images of punched cards |
| | text | degradations and OCR training |
| | text | Detection and Character Recognition in Scene Images with Unsupervised Feature Learning |
| | text | Detection and Localization in Complex Scene Images using Constrained AdaBoost Algorithm |
| | text | Detection and Recognition in Imagery: A Survey |
| | text | detection and recognition in images and video frames |
| | text | detection and recognition in natural scene with edge analysis |
| | text | Detection and Recognition in Real World Images |
| | text | detection and recognition in urban scenes |
| | text | Detection and Recognition on Traffic Panels From Street-Level Imagery Using Visual Appearance |
| | text | detection and restoration in natural scene images |
| | text | Detection and Translation from Natural Scenes |
| | text | detection based on convolutional neural networks with spatial pyramid pooling |
| | text | Detection for Video Analysis |
| | text | detection from natural scene images using topographic maps and sparse representations |
| | text | Detection from Natural Scene Images: Towards a System for Visually Impaired Persons |
| | text | detection from scene images using sparse representation |
| | text | detection in color scene images based on unsupervised clustering of multi-channel wavelet features |
| | text | detection in continuous tone image segments |
| | text | Detection in Digital Images Captured with Low Resolution Under Nonuniform Illumination Conditions |
| | text | detection in images based on unsupervised classification of edge-based features |
| | text | detection in images based on unsupervised classification of high-frequency wavelet coefficients |
| | text | detection in images using sparse representation with discriminative dictionaries |
| | text | detection in manga by combining connected-component-based and region-based classifications |
| | text | Detection in Natural Images Using Bio-inspired Models |
| | text | Detection in Natural Images Using Localized Stroke Width Transform |
| | text | Detection in Natural Scene Images by Stroke Gabor Words |
| | text | detection in natural scene images with user-intention |
| | text | detection in natural scene with edge analysis |
| | text | detection in natural scenes using Gradient Vector Flow-Guided symmetry |
| | text | detection in nature scene images using two-stage nontext filtering |
| | text | detection in scene images based on exhaustive segmentation |
| | text | detection in stores using a repetition prior |
| | text | Detection of Two Major Indian Scripts in Natural Scene Images |
| | text | detection on camera acquired document images using supervised classification of connected components in wavelet domain |
| | text | Detection System for Natural Scenes with Convolutional Feature Learning and Cascaded Classification, A |
| | text | Detection Using Edge Gradient and Graph Spectrum |
| | text | Detection, Find Text in General Scenes, Scene Text |
| | text | detection, localization, and tracking in compressed video |
| | text | Detection, Scene Text, Curved Text, Arbitrary Orientation |
| | text | Detection, Tracking and Recognition in Video: A Comprehensive Survey |
| | text | Detector Based on the Specific Text Prompt, A |
| text | discrimination method and related apparatus |
| text | Driven Face-Video Synthesis Using GMM and Spatial Correlation |
| text | Driven Temporal Segmentation of Cricket Videos |
| text | effects transfer via distribution-aware texture synthesis |
| text | Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps |
| text | Encryption: Hybrid cryptographic method using Vigenere and Hill Ciphers |
| text | Enhancement by PDE's Based Methods |
| text | Enhancement for Laser Copiers |
| text | Enhancement Network for Cross-Domain Scene Text Detection |
| text | enhancement with asymmetric filter for video OCR |
| text | Extraction and Document Image Segmentation Using Matched Wavelets and MRF Model |
| text | extraction from color documents-clustering approaches in three and four dimensions |
| text | Extraction from Colored Book and Journal Covers |
| text | extraction from degraded document images |
| text | extraction from gray scale document images using edge information |
| text | extraction from gray scale historical document images using adaptive local connectivity map |
| text | Extraction from Grey Scale Page Images by Simple Edge Detectors |
| text | extraction from images captured via mobile and digital devices |
| text | extraction from name cards with complex design |
| text | extraction from scene images by character appearance and structure modeling |
| text | Extraction from Street Level Images |
| text | Extraction from Video Using Conditional Random Fields |
| text | extraction from web images based on a split-and-merge segmentation method using colour perception |
| text | extraction in complex color documents |
| text | Extraction in Digital News Video Using Morphology |
| text | Extraction in MPEG Compressed Video for Content-based Indexing |
| text | extraction in real scene images on planar planes |
| text | Extraction Using Component Analysis and Neuro-fuzzy Classification on Complex Backgrounds |
| text | Extraction Using Pyramid |
| text | Flow: A Unified Text Detection System in Natural Scene Images |
| text | From Corners: A Novel Approach to Detect Text and Caption in Videos |
| text | generation and multi-modal knowledge transfer for few-shot object detection |
| text | Generation, Text Synthesis, Text Placement on Maps |
| text | Geolocation Prediction via Self-Supervised Learning |
| text | Grouping Adapter: Adapting Pre-Trained Text Detector for Layout Analysis |
| text | Growing on Leaf |
| text | Guided Person Image Synthesis |
| text | Identification for Document Image Analysis Using a Neural Network |
| text | Identification in Complex Background Using SVM |
| text | identification in noisy document images using Markov random field |
| text | Image Classifier Using Image-Wise Annotation |
| text | Image Clean-Up and Thresholding: A Comparative Study |
| text | Image Compression Using Soft Pattern Matching |
| text | Image Deblurring Using Kernel Sparsity Prior |
| text | Image Deblurring Using Text-Specific Properties |
| text | Image Deblurring via Intensity Extremums Prior |
| text | in Everything |
| text | in Scenes, Stroke Based, Contour Based |
| text | in the dark: Extremely low-light text image enhancement |
| text | independent speaker gender recognition using lip movement |
| text | Independent Writer Identification for Bengali Script |
| text | independent writer identification of Arabic manuscripts and the effects of writers increase |
| text | independent writer recognition using redundant writing patterns with contour-based orientation and curvature features |
| text | information extraction in images and video: a survey |
| text | Input System Using Online Overlapped Handwriting Recognition for Mobile Devices |
| text | Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval |
| text | is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation |
| text | line and word segmentation of handwritten documents |
| text | line bounding system |
| text | Line Characterization by Connected Component Transformations |
| text | Line Detection for Heterogeneous Documents |
| text | Line Detection in Corrupted and Damaged Historical Manuscripts |
| text | Line Detection in Document Images: Towards a Support System for the Blind |
| text | line detection in handwritten documents |
| text | Line Detection in Unconstrained Handwritten Documents Using a Block-Based Hough Transform Approach |
| text | Line Detection Method for Mathematical Formula Recognition, A |
| text | line extraction for historical document images |
| text | line extraction from multi-skewed handwritten documents |
| text | line extraction in document images |
| text | Line Extraction in Documents |
| text | line extraction in graphical documents using background and foreground information |
| text | Line Extraction in Handwritten Document with Kalman Filter Applied on Low Resolution Image |
| text | Line Extraction Method Using Domain-Based Active Contour Model |
| text | line extraction strategy for palm leaf manuscripts |
| text | Line Extraction Using Adaptive Partial Projection for Palm Leaf Manuscripts from Thailand |
| text | Line Extraction Using DMLP Classifiers for Historical Manuscripts |
| text | Line Extraction Using Fully Convolutional Network and Energy Minimization |
| text | line segmentation and word recognition in a system for general writer independent handwriting recognition |
| text | Line Segmentation Based on Morphology and Histogram Projection |
| text | Line Segmentation for Unconstrained Handwritten Document Images Using Neighborhood Connected Component Analysis |
| text | line segmentation in Chinese handwritten text images |
| text | Line Segmentation in Handwritten Documents Using Mumford-Shah Model |
| text | Line Segmentation in Images of Handwritten Historical Documents |
| text | Line Segmentation of Historical Arabic Documents |
| text | line segmentation of historical documents: a survey |
| text | line segmentation using a fully convolutional network in handwritten document images |
| text | Lines and Snippets Extraction for 19th Century Handwriting Documents Layout Analysis |
| text | Localization and Extraction from Complex Color Images |
| text | Localization and Extraction from Complex Gray Images |
| text | Localization and Recognition in Complex Scenes Using Local Features |
| text | Localization Based on Fast Feature Pyramids and Multi-Resolution Maximally Stable Extremal Regions |
| text | Localization in Born-Digital Images of Advertisements |
| text | Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors |
| text | Localization in Natural Scene Images Based on Conditional Random Field |
| text | Localization in Real-World Images Using Efficiently Pruned Exhaustive Search |
| text | Localization in Web Images Using Probabilistic Candidate Selection Model |
| text | localization using image cues and text line information |
| text | localization, enhancement and binarization in multimedia documents |
| text | locating from natural scene images using image intensities |
| text | location in complex images |
| text | Mining in Remotely Sensed Phenology Studies: A Review on Research Development, Main Topics, and Emerging Issues |
| text | Mining the Contributors to Rail Accidents |
| text | Motion Translator: A Bi-directional Model for Enhanced 3d Human Motion Generation from Open-vocabulary Descriptions |
| text | OCR by Solving a Cryptogram |
| text | only Analysis, Natural Language |
| text | optimization with latent inversion for non-rigid image editing |
| text | Page Recognition Using Grey-Level Features and Hidden Markov-Models |
| text | Parsing Using Spatial Information for Recognizing Addresses in Mail Pieces |
| text | Particles Multi-band Fusion for Robust Text Detection |
| text | Position-Aware Pixel Aggregation Network With Adaptive Gaussian Threshold: Detecting Text in the Wild |
| text | Prior Guided Scene Text Image Super-Resolution |
| text | Prompt Region Decomposition for Effective Facial Expression Recognition |
| text | Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection |
| text | Query based Traffic Video Event Retrieval with Global-Local Fusion Embedding |
| text | Query to Web Image to Video: A Comprehensive Ad-hoc Video Search |
| text | reading algorithm for natural images, A |
| text | Recognition - Real World Data and Where to Find Them |
| text | Recognition and Retrieval System for e-Business Image Management, A |
| text | recognition from grey level images using hidden Markov models |
| text | Recognition in Images Based on Transformer with Hierarchical Attention |
| text | recognition in multimedia documents: A study of two neural-based OCRs using and avoiding character segmentation |
| text | Recognition in Real Scenarios with a Few Labeled Samples |
| text | Recognition in the Wild: A Survey |
| text | recognition of low-resolution document images |
| text | Recognition System for Japanese Documents |
| text | recognition using deep BLSTM networks |
| text | Recognition: From Pixels to Meaning |
| text | Region Conditional Generative Adversarial Network for Text Concealment in the Wild |
| text | region extraction and text segmentation on camera-captured document style images |
| text | Region Extraction from Quality Degraded Document Images |
| text | Region Extraction From Scene Images Using AGF and MSER |
| text | region extraction in a document image based on the Delaunay tessellation |
| text | retrieval from early printed books |
| text | scanner with text detection technology on image sequences |
| text | search for medieval manuscript images |
| text | segmentation and recognition in complex background based on markov random field |
| text | Segmentation by Clustering Cohesion |
| text | Segmentation for MRC Document Compression |
| text | Segmentation from Complex Background Using Sparse Representations |
| text | segmentation in color images using tensor voting |
| text | Segmentation in Colour Posters from the Spanish Civil War Era |
| text | segmentation in natural scenes using Toggle-Mapping |
| text | Segmentation in Unconstrained Hand-Drawings in Whiteboard Photos |
| text | Segmentation of Consumer Magazines in PDF Format |
| text | Segmentation Using Gabor Filters for Automatic Document Processing |
| text | segmentation using superpixel clustering |
| text | selection by structured light marking for hand-held cameras |
| text | Separation from Mixed Documents Using a Tree-Structured Classifier |
| text | Similarity Measurement Method Based on BiLSTM-SECapsNet Model |
| text | Spotting Transformers |
| text | String Detection From Natural Scenes by Structure-Based Partition and Grouping |
| text | String Extraction from Images of Color-Printed Documents |
| text | Synopsis Generation for Egocentric Videos |
| text | to 3D Synthesis, Text to 3D Generation |
| text | to Image for Multi-Label Image Recognition With Joint Prompt-Adapter Learning |
| text | to Image Generation with Semantic-Spatial Aware GAN |
| text | To Image Synthesis With Erudite Generative Adversarial Networks |
| text | to image synthesis with multi-granularity feature aware enhancement Generative Adversarial Networks |
| text | to Image, Image Based Rendering |
| text | to photo-realistic image synthesis via chained deep recurrent generative adversarial network |
| text | to Video Synthesis, Text to Motion |
| text | to visual synthesis with appearance models |
| text | Verification in an Automated System for the Extraction of Bibliographic Data |
| text | vs. Non-Text Regions |
| text | watermarking algorithm based on word classification and inter-word space statistics, A |
| text | with Knowledge Graph Augmented Transformer for Video Captioning |
| text | zone classification using unsupervised feature learning |
| text | - and speech-based phonotactic models for spoken language identification of Basque and Spanish |
| text | -anchored Score Composition: Tackling Condition Misalignment in Text-to-image Diffusion Models |
| text | -Attentional Convolutional Neural Network for Scene Text Detection |
| text | -augmented Multi-Modality contrastive learning for unsupervised visible-infrared person re-identification |
| text | -aware balloon extraction from manga |
| text | -aware image dehazing using stroke width transform |
| text | -Based Audio Retrieval by Learning From Similarities Between Audio Captions |
| text | -Based Fine-Grained Emotion Prediction |
| text | -based Geometric Normalization for Robust Watermarking of Digital Maps |
| text | -based image retrieval using progressive multi-instance learning |
| text | -Based Localization of Moments in a Video Corpus |
| text | -based Person Search via Attribute-aided Matching |
| text | -Based Person Search via Cross-Modal Alignment Learning |
| text | -based person search via fine-grained cross-modal semantic alignment |
| text | -Based Temporal Localization of Novel Events |
| text | -based visual context modulation neural model for multimodal machine translation, A |
| text | -Centric multimodal sentiment analysis with asymmetric fine-tuning |
| text | -Conditional Attribute Alignment Across Latent Spaces for 3D Controllable Face Image Synthesis |
| text | -Conditioned Generative Model of 3D Strand-Based Human Hairstyles |
| text | -conditioned Resampler For Long Form Video Understanding |
| text | -Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models |
| text | -Controlled Motion Mamba: Text-Instructed Temporal Grounding of Human Motion |
| text | -Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation |
| text | -Driven Automatic Frame Generation Using MPEG-4 Synthetic/Natural Hybrid Coding for 2-D Head-and-Shoulder Scene |
| text | -Driven Fashion Image Editing with Compositional Concept Learning and Counterfactual Abduction |
| text | -Driven Generative Domain Adaptation with Spectral Consistency Regularization |
| text | -Driven Image Editing via Learnable Regions |
| text | -Driven Medical Image Segmentation With LLM Semantic Bridge and LLM Prompt Bridge |
| text | -driven Stylization of Video Objects |
| text | -Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos |
| text | -Driven Video Acceleration: A Weakly-Supervised Reinforcement Learning Method |
| text | -Edge-Box: An Object Proposal Approach for Scene Texts Localization |
| text | -Enhanced Data-Free Approach for Federated Class-Incremental Learning |
| text | -Enhanced Scene Image Super-Resolution via Stroke Mask and Orthogonal Attention |
| text | -Enriched Air Traffic Flow Modeling and Prediction Using Transformers |
| text | -free diffusion inpainting using reference images for enhanced visual fidelity |
| text | -graphics separation to detect logo and stamp from color document images: A spectral approach |
| text | -Guided 3D Face Synthesis: From Generation to Editing |
| text | -guided camouflaged object detection |
| text | -Guided Coarse-to-Fine Fusion Network for robust remote sensing visual question answering |
| text | -guided distillation learning to diversify video embeddings for text-video retrieval |
| text | -Guided Explorable Image Super-Resolution |
| text | -Guided Eyeglasses Manipulation With Spatial Constraints |
| text | -Guided Face Recognition using Multi-Granularity Cross-Modal Contrastive Learning |
| text | -Guided Facial Image Manipulation for Wild Images via Manipulation Direction-Based Loss |
| text | -guided Fourier Augmentation for long-tailed recognition |
| text | -Guided Generation and Refinement Model for Image Captioning, A |
| text | -Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks |
| text | -Guided Human Image Manipulation via Image-Text Shared Space |
| text | -Guided Multi-Class Multi-Object Tracking for Fine-Grained Maritime Rescue |
| text | -Guided Neural Network Training for Image Recognition in Natural Scenes and Medicine |
| text | -Guided Object Detector for Multi-modal Video Question Answering |
| text | -Guided Patch Scoring and Local Distortion Guidance for Image Quality Assessment |
| text | -Guided Prototype Generation for Occluded Person Re-Identification |
| text | -Guided Reconstruction Network for Sentiment Analysis With Uncertain Missing Modalities |
| text | -Guided Semantic Alignment Network With Spatial-Frequency Interaction for Infrared-Visible Image Fusion Under Extreme Illumination |
| text | -guided Sparse Voxel Pruning for Efficient 3D Visual Grounding |
| text | -Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation |
| text | -Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation |
| text | -guided Video Masked Autoencoder |
| text | -guided visual representation learning for medical image retrieval systems |
| text | -guided weakly supervised framework for dynamic facial expression recognition |
| text | -IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion |
| text | -Image Alignment for Diffusion-Based Perception |
| text | -image separation in Devanagari documents |
| text | -image super-resolution through anchored neighborhood regression with multiple class-specific dictionaries |
| text | -Independent Online Writer Identification Using Hidden Markov Models |
| text | -independent Persian writer identification based on feature relation graph (FRG), A |
| text | -independent speaker identification using Radon and discrete cosine transforms based features from speech spectrogram |
| text | -independent speaker recognition using graph matching |
| text | -independent speaker verification with ant colony optimization feature selection and support vector machine |
| text | -independent voice conversion using deep neural network based phonetic level features |
| text | -Independent Writer Identification and Verification on Offline Arabic Handwriting |
| text | -Independent Writer Identification and Verification Using Textural and Allographic Features |
| text | -Independent Writer Identification Based on Fusion of Dynamic and Static Features |
| text | -Independent Writer Identification on Online Arabic Handwriting |
| text | -independent writer identification using convolutional neural network |
| text | -independent writer identification using SIFT descriptor and contour-directional feature |
| text | -independent writer recognition using multi-script handwritten texts |
| text | -indicated writer verification using hidden Markov models |
| text | -Injected Discriminative Model for Remote Sensing Visual Grounding |
| text | -instance graph: Exploring the relational semantics for text-based visual question answering |
| text | -Line Detection in Camera-Captured Document Images Using the State Estimation of Connected Components |
| text | -line examination for document forgery detection |
| text | -line Extraction and Character Recognition of Document Headlines with Graphical Designs Using Complementary Similarity Measure |
| text | -Line Extraction and Character Recognition of Japanese Newspaper Headlines With Graphical Designs |
| text | -Line Extraction in Handwritten Chinese Documents Based on an Energy Minimization Framework |
| text | -Line Extraction Using a Convolution of Isotropic Gaussian Filter with a Set of Line Filters |
| text | -mining based journal splitting |
| text | -only weakly supervised learning framework for text spotting via text-to-polygon generator, A |
| text | -Pose Estimation in 3D Using Edge-Direction Distributions |
| text | -RGNNs: Relational Modeling for Heterogeneous Text Graphs |
| text | -Scene Retrieval for Driving Scenes in Transportation Cyber-Physical Systems |
| text | -to-3D Generation with Bidirectional Diffusion Using Both 2D and 3D Priors |
| text | -to-3D using Gaussian Splatting |
| text | -to-Floorplan Synthesis via Graph-Conditioned Diffusion Processes |
| text | -to-Image Diffusion Models are Great Sketch-Photo Matchmakers |
| text | -to-image Editing by Image Information Removal |
| text | -to-Image Generation Grounded by Fine-Grained User Attention |
| text | -to-Image Generation via Semi-Supervised Training |
| text | -to-Image Models for Counterfactual Explanations: A Black-Box Approach |
| text | -to-Image Person Re-Identification Based on Multimodal Graph Convolutional Network |
| text | -to-Image Synthesis based on Object-Guided Joint-Decoding Transformer |
| text | -to-Image Synthesis for Domain Generalization in Face Anti-Spoofing |
| text | -to-image synthesis with self-supervised bi-stage generative adversarial network |
| text | -to-image synthesis with self-supervised learning |
| text | -to-Image Vehicle Re-Identification: Multi-Scale Multi-View Cross-Modal Alignment Network and a Unified Benchmark |
| text | -to-image via mask anchor points |
| text | -to-Speech With Lip Synchronization Based on Speech-Assisted Text-to-Video Alignment and Masked Unit Prediction |
| text | -to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression |
| text | -to-Traffic Generative Adversarial Network for Traffic Situation Generation |
| text | -to-video: a semantic search engine for internet videos |
| text | -Tracking Wearable Camera System for the Blind |
| text | -tracking wearable camera system for visually-impaired people |
| text | -Video Completion Using Structure Repair and Texture Propagation |
| text | -Video Knowledge Guided Prompting for Weakly Supervised Temporal Action Localization |
| text | -video retrieval re-ranking via multi-grained cross attention and frozen image encoders |
| text | -Video Retrieval With Global-Local Semantic Consistent Learning |
| text | -Visual Prompting for Efficient 2D Temporal Video Grounding |
| text | /continuous tone image decision processor |
| text | /graphic labelling of ancient printed documents |
| text | /graphic separation using a sparse representation with multi-learned dictionaries |
| text | /Graphics Segmentation in Architectural Floor Plans |
| text | /Graphics Separation Revisited |
| text | /image separation method |
| text | /Non-Text Image Classification in the Wild with Convolutional Neural Networks |
| text | /Non-text Ink Stroke Classification in Japanese Handwriting Based on Markov Random Fields |
| text | /shape classifier for mobile applications with handwriting input |
| text | 2Avatar: Articulated 3D Avatar Creation With Text Instructions |
| text | 2Concept: Concept Activation Vectors Directly from Text |
| text | 2HOI: Text-Guided 3D Motion Generation for Hand-Object Interaction |
| text | 2LiDAR: Text-guided Lidar Point Cloud Generation via Equirectangular Transformer |
| text | 2LIVE: Text-Driven Layered Image and Video Editing |
| text | 2Mesh: Text-Driven Neural Stylization for Meshes |
| text | 2Performer: Text-Driven Human Video Generation |
| text | 2place: Affordance-aware Text Guided Human Placement |
| text | 2Pos: Text-to-Point-Cloud Cross-Modal Localization |
| text | 2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation |
| text | 2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models |
| text | 2Scene: Text-driven Indoor Scene Stylization with Part-Aware Details |
| text | 2Sketch: Learning Face Sketch from Facial Attribute Text |
| text | 2Tex: Text-driven Texture Synthesis via Diffusion Models |
| text | 2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators |
| text | 2Video: An End-to-end Learning Framework for Expressing Text With Videos |
| text | AdaIN: Paying Attention to Shortcut Learning in Text Recognizers |
| text | Adapter: Self-Supervised Domain Adaptation for Cross-Domain Text Recognition |
| text | Aug: Test Time Text Augmentation for Multimodal Person Re-Identification |
| text | Boxes++: A Single-Shot Oriented Scene Text Detector |
| text | Catcher: a method to detect curved and challenging text in natural scenes |
| text | ContourNet: A Flexible and Effective Framework for Improving Scene Text Detection Architecture With a Multi-Task Cascade |
| text | Craftor: Your Text Encoder can be Image Quality Controller |
| text | DCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask |
| text | Diff: Enhancing scene text image super-resolution with mask-guided residual diffusion models |
| text | diffuser-2: Unleashing the Power of Language Models for Text Rendering |
| text | Dragon: An End-to-End Framework for Arbitrary Shaped Text Spotting |
| text | Face: Text-to-Style Mapping Based Face Generation and Manipulation |
| text | Field: Learning a Deep Direction Field for Irregular Scene Text Detection |
| text | Finder: An Automatic System to Detect and Recognize Text in Images |
| text | invision: Text and Prompt Complexity Driven Visual Text Generation Benchmark |
| text | ManiA: Enriching Visual Feature by Text-driven Manifold Augmentation |
| text | Mesh: Generation of Realistic 3D Meshes From Text Prompts |
| text | Mountain: Accurate scene text detection via instance segmentation |
| text | NeRF: A Novel Scene-Text Image Synthesis Method Based on Neural Radiance Fields |
| text | Net: Irregular Text Reading from Images with an End-to-End Trainable Network |
| text | OCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text |
| text | Place: Visual Place Recognition and Topological Localization Through Reading Scene Texts |
| text | Proposals: A text-specific selective search algorithm for word spotting in the wild |
| text | ron: Weakly Supervised Multilingual Text Detection through Data Programming |
| text | RS: Deep Bidirectional Triplet Network for Matching Text to Remote Sensing Images |
| text | s as Images in Prompt Tuning for Multi-Label Image Recognition |
| text | s as points: Scene text detection with point supervision |
| text | SLAM: Visual SLAM With Semantic Planar Text Features |
| text | Snake: A Flexible Representation for Detecting Text of Arbitrary Shapes |
| text | SRNet: Scene Text Super-Resolution Based on Contour Prior and Atrous Convolution |
| text | StyleBrush: Transfer of Text Aesthetics From a Single Example |
| text | ual Alchemy: CoFormer for Scene Text Understanding |
| text | ual Concept Expansion with Commonsense Knowledge to Improve Dual-Stream Image-Text Matching |
| text | ual Visual Semantic Dataset for Text Spotting |
| text | ual-visual Logic Challenge: Understanding and Reasoning in Text-to-image Generation |
| text | ure-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm |
| TFRGAN: Leveraging | text | Information for Blind Face Restoration with Extreme Degradation |
| TF²: Few-Shot | text | -Free Training-Free Defect Image Generation for Industrial Anomaly Inspection |
| TG-TSGNet: A | text | -Guided Arbitrary-Resolution Terrain Scene Generation Network |
| There and Back Again: 3D Sign Language Generation from | text | Using Back-Translation |
| Thinking Fast and Slow: Efficient | text | -to-Visual Retrieval with Transformers |
| Three decision levels strategy for Arabic and Latin | text | s differentiation in printed and handwritten natures |
| Three-Dimensional Lip Motion Network for | text | -Independent Speaker Recognition |
| Thresholding video images for | text | detection |
| TI2V-Zero: Zero-Shot Image Conditioning for | text | -to-Video Diffusion Models |
| TIAM - A Metric for Evaluating Alignment in | text | -to-Image Generation |
| Tibet: Identifying and Evaluating Biases in | text | -to-image Generative Models |
| Ticker: An Adaptive Single-Switch | text | Entry Method for Visually Impaired Users |
| TIED: A Cycle Consistent Encoder-Decoder Model for | text | -to-Image Retrieval |
| TieNet: | text | -Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-Rays |
| TIFA: Accurate and Interpretable | text | -to-Image Faithfulness Evaluation with Question Answering |
| Tightness-Aware Evaluation Protocol for Scene | text | Detection |
| TIMTQE: Benchmarking Machine Translation Quality Estimation for | text | Images |
| TIPS: | text | -Induced Pose Synthesis |
| TISE: Bag of Metrics for | text | -to-Image Synthesis Evaluation |
| TJCMNet: An Efficient Vision- | text | Joint Identity Clues Mining Network for Visible-Infrared Person Re-Identification |
| Tk- | text | : Multi-shaped Scene Text Detection via Instance Segmentation |
| TKDN: Scene | text | Detection via Keypoints Detection |
| TLDR: | text | Based Last-Layer Retraining for Debiasing Image Classifiers |
| TlTScore: Towards Long-Tail Effects in | text | -to-Visual Evaluation with Generative Foundation Models |
| TLWSR: Weakly supervised real-world scene | text | image super-resolution using text label |
| TM2D: Bimodality Driven 3D Dance Generation via Music- | text | Integration |
| TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and | text | s |
| TMR: | text | -to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis |
| To Speak or to | text | : Effects of Display Type and I/O Style on Mobile Virtual Humans Nurse Training |
| To | text | or not to text- drivers' interpretation of traffic situations as the basis for their decision to (not) engage in text messaging |
| Token-Mixer: Bind Image and | text | in One Embedding Space for Medical Image Reporting |
| TokenBinder: | text | -Video Retrieval with One-to-Many Alignment Paradigm |
| TokenCompose: | text | -to-Image Diffusion with Token-Level Supervision |
| Tokenfocus-VQA: Enhancing | text | -to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs |
| Tool for Ground-Truthing | text | Lines and Characters in Off-Line Handwritten Chinese Documents, A |
| Top-down and bottom-up cues for scene | text | recognition |
| Topic Language Model Adaption for Recognition of Homologous Offline Handwritten Chinese | text | Image |
| Total- | text | : toward orientation robustness in scene text detection |
| Toward Automation in | text | -Based Video Retrieval with LLM Assistance |
| Toward Integrated Scene | text | Reading |
| Toward Open-World | text | -Driven Face Generation and Manipulation via StyleGAN3 |
| Toward real | text | manipulation detection: New dataset and new solution |
| Toward | text | -independent Cross-lingual Speaker Recognition Using English-Mandarin-Taiwanese Dataset |
| Toward Understanding WordArt: Corner-Guided Transformer for Scene | text | Recognition |
| Toward Verifiable and Reproducible Human Evaluation for | text | -to-Image Generation |
| Towards Accurate Scene | text | Recognition With Semantic Reasoning Networks |
| Towards Accurate | text | -based Image Captioning with Content Diversity Exploration |
| Towards an Extensible and | text | -Oriented Analytical Semantic Trajectory Framework |
| Towards Automated Transcription of Label | text | from Pinned Insect Collections |
| Towards Cycle-Consistent Models for | text | and Image Retrieval |
| Towards Effective Usage of Human-Centric Priors in Diffusion Models for | text | -based Human Image Generation |
| Towards End-to-End | text | Spotting in Natural Scenes |
| Towards End-to-End | text | Spotting with Convolutional Recurrent Neural Networks |
| Towards End-to-End Unified Scene | text | Detection and Layout Analysis |
| Towards Fast and Accurate Image- | text | Retrieval With Self-Supervised Fine-Grained Alignment |
| Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image- | text | Pre-Training |
| Towards Generic | text | -Line Extraction |
| Towards High-Fidelity | text | -Guided 3D Face Generation and Manipulation Using only Images |
| Towards Implicit | text | -Guided 3D Shape Generation |
| Towards Improved | text | -Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text |
| Towards Interactive Facial Image Inpainting by | text | or Exemplar Image |
| Towards Language-Free Training for | text | -to-Image Generation |
| Towards Modelling an Attention-Based | text | Localization Process |
| Towards Open Domain | text | -driven Synthesis of Multi-person Motions |
| Towards open-set | text | recognition via label-to-prototype learning |
| Towards robust and efficient | text | sign reading from a mobile phone |
| Towards Robust Curve | text | Detection With Conditional Spatial Expansion |
| Towards Robust Tampered | text | Detection in Document Image: New Dataset and New Solution |
| Towards Robust | text | -Guided Image Compression Under Modality Missing |
| Towards Scalable Human-aligned Benchmark for | text | -guided Image Editing |
| Towards Specific Domain Prompt Learning via Improved | text | Label Optimization |
| Towards | text | -guided 3D Scene Composition |
| Towards the Unseen: Iterative | text | Recognition by Distilling from Errors |
| Towards Unconstrained End-to-End | text | Spotting |
| Towards Understanding and Quantifying Uncertainty for | text | -to-Image Generation |
| Towards Understanding Cross and Self-Attention in Stable Diffusion for | text | -Guided Image Editing |
| Towards Unified Scene | text | Spotting Based on Sequence Generation |
| Towards Weakly Supervised | text | -to-Audio Grounding |
| Towards Weakly-Supervised | text | Spotting using a Multi-Task Transformer |
| Towards Zero-Shot Multi-Speaker Multi-Accent | text | -to-Speech Synthesis |
| TP-LReID: Lifelong person re-identification using | text | prompts |
| TP2O: Creative | text | Pair-to-object Generation Using Balance Swap-Sampling |
| TPA-Seg: Multi-Class Nucleus Segmentation Using | text | Prompts and Cross-Attention |
| TPA3D: Triplane Attention for Fast | text | -to-3d Generation |
| TPD-STR: | text | Polygon Detection with Split Transformers |
| TPEech: Target Speaker Extraction and Noise Suppression With Historical Dialogue | text | Cues |
| TPWGAN: Wavelet-aware | text | prior guided super-resolution for scene text images |
| Trace Controlled | text | to Image Generation |
| Tracking Based Multi-Orientation Scene | text | Detection: A Unified Framework With Dynamic Programming |
| Traffic Video Event Retrieval via | text | Query using Vehicle Appearance and Motion Attributes |
| Training on severely degraded | text | -line images |
| Training-Free Color-Style Disentanglement for Constrained | text | -to-Image Synthesis |
| Training-Free Location-Aware | text | -to-Image Synthesis |
| Training-free subject-enhanced attention guidance for compositional | text | -to-image generation |
| TrAME: Trajectory-Anchored Multi-View Editing for | text | -Guided 3D Gaussian Manipulation |
| Transcript Mapping for Handwritten | text | Lines Using Conditional Random Fields |
| Transferable Adversarial Attacks for Deep Scene | text | Detection |
| Transferring Image-CLIP to Video- | text | Retrieval via Temporal Relations |
| Transferring Knowledge From | text | to Video: Zero-Shot Anticipation for Procedural Actions |
| Transform invariant | text | extraction |
| Transformation of arc-form- | text | to linear-form-text suitable for OCR |
| Transformer models for enhancing AttnGAN based | text | to image generation |
| Transformer Reasoning Network for Image- | text | Matching and Retrieval |
| Transformer-based | text | Detection in the Wild |
| Transparent | text | Detection and Background Recovery |
| TransPixeler: Advancing | text | -to-Video Generation with Transparency |
| Trans | text | Net: Transducing Text for Recognizing Unseen Visual Relationships |
| Tree structure for word extraction from handwritten | text | lines |
| TriCoLo: Trimodal Contrastive Loss for | text | to Shape Retrieval |
| TriMatch: Triple Matching for | text | -to-Image Person Re-Identification |
| Trinity Detector: | text | -Assisted and Attention Mechanisms Based Spectral Fusion for Diffusion Generation Image Detection |
| TRIS: A multimodal and multitask framework for unifying | text | -image retrieval and referring image segmentation |
| TRTST: Arbitrary High-Quality | text | -Guided Style Transfer With Transformers |
| True color distributions of scene | text | and background |
| TS-RNN: | text | Steganalysis Based on Recurrent Neural Networks |
| TS2-Net: Token Shift and Selection Transformer for | text | -Video Retrieval |
| TSA-SCC: | text | Semantic-Aware Screen Content Coding With Ultra Low Bitrate |
| TSINIT: A Two-Stage Inpainting Network for Incomplete | text | |
| TTD: | text | -tag Self-distillation Enhancing Image-text Alignment in CLIP to Alleviate Single Tag Bias |
| TTDNet: An End-to-End Traffic | text | Detection Framework for Open Driving Environments |
| TTS: Hilbert Transform-Based Generative Adversarial Network for Tattoo and Scene | text | Spotting |
| Tune-A-Video: One-Shot Tuning of Image Diffusion Models for | text | -to-Video Generation |
| Tuning-Free Image Customization with Image and | text | Guidance |
| Turbo3D: Ultra-fast | text | -to-3D Generation |
| Turboedit: Instant | text | -based Image Editing |
| TurboFill: Adapting Few-step | text | -to-image Model for Fast Image Inpainting |
| Turning a CLIP Model into a Scene | text | Detector |
| Turning a CLIP Model Into a Scene | text | Spotter |
| TV Commercial Detection Based on Shot Change and | text | Extraction |
| TV Program Classification Based on Face and | text | Processing |
| TV program segmentation using | text | -visual analysis |
| TVI-MFAN: A | text | -Visual Interaction Multilevel Feature Alignment Network for Visual Grounding in Remote Sensing |
| TVMTrailer: A | text | -Video-Music AIGC Framework for Film Trailer Generation |
| TWD: A New Deep E2E Model for | text | Watermark/Caption and Scene Text Detection in Video |
| Twitter Stream Analysis, Tweets, | text | s, SMS, Internet |
| Two approaches for | text | segmentation in web images |
| Two combination stages of clustered One-Class Classifiers for writer identification from | text | fragments |
| Two Stage SVM and kNN | text | Documents Classifier |
| Two-Level Rectification Attention Network for Scene | text | Recognition, A |
| Two-Pass Clustering Technique for Orientation-Invariant and Language-Independent | text | Localization |
| Two-stage hybrid binarization around fringe map based | text | line segmentation for document images |
| two-stage method for | text | line detection in historical documents, A |
| Two-stage Multimodality Fusion for High-performance | text | -based Visual Question Answering |
| Two-stage partial image- | text | clustering (TPIT-C) |
| two-stage scheme for | text | detection in video images, A |
| Two-Stage Seamless | text | Erasing on Real-World Scene Images |
| Txt2Img-MHN: Remote Sensing Image Generation From | text | Using Modern Hopfield Networks |
| Type-2 Fuzzy GMMs for Robust | text | -Independent Speaker Verification in Noisy Environments |
| Type-R: Automatically Retouching Typos for | text | -to-Image Generation |
| Typing in Mid Air: Assessing One- and Two-Handed | text | Input Methods of the Microsoft HoloLens 2 |
| Typographical Features for Scene | text | Recognition |
| Typography With Decor: Intelligent | text | Style Transfer |
| UATST: Towards unpaired arbitrary | text | -guided style transfer with cross-space modulation |
| UATVR: Uncertainty-Adaptive | text | -Video Retrieval |
| Udiff | text | : A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models |
| UFineBench: Towards | text | -based Person Retrieval with Ultra-Fine Granularity |
| UFOGen: You Forward Once Large Scale | text | -to-Image Generation via Diffusion GANs |
| UHaT: Urdu handwritten | text | dataset |
| Ump: Unified Modality-Aware Prompt Tuning for | text | -Video Retrieval |
| Unambiguous Scene | text | Segmentation With Referring Expression Comprehension |
| Unambiguous | text | Localization and Retrieval for Cluttered Scenes |
| Unambiguous | text | Localization, Retrieval, and Recognition for Cluttered Scenes |
| Unconstrained end-to-end | text | reading with feature rectification |
| Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from | text | to Image via CLIP Inversion |
| Uncorrelated Geo- | text | Inhibition Method Based on Voronoi K-Order and Spatial Correlations in Web Maps |
| Uncovering the Disentanglement Capability in | text | -to-Image Diffusion Models |
| Uncurated Image- | text | Datasets: Shedding Light on Demographic Bias |
| Understand Layout and Translate | text | : Unified Feature-Conductive End-to-End Document Image Translation |
| Understanding and Mitigating Toxicity in Image- | text | Pretraining Datasets: A Case Study on LLaVA |
| Understanding Handwritten | text | in a Structured Environment |
| Understanding Plane Geometry Problems by Integrating Relations Extracted from | text | and Diagram |
| Understanding Video Scenes through | text | : Insights from Text-based Video Question Answering |
| UniCanvas: Affordance-Aware Unified Real Image Editing via Customized | text | -to-Image Generation |
| Unidream: Unifying Diffusion Priors for Relightable | text | -to-3d Generation |
| Unified Adaptive Relevance Distinguishable Attention Network for Image- | text | Matching |
| Unified Approach for | text | -and Image-Guided 4D Scene Generation, A |
| Unified Coarse-to-Fine Alignment for Video- | text | Retrieval |
| Unified Contrastive Learning in Image- | text | -Label Space |
| Unified Framework for Multioriented | text | Detection and Recognition, A |
| Unified Framework for Tracking Based | text | Detection and Recognition from Web Videos, A |
| unified framework of data augmentation using large language models for | text | -based cross-modal retrieval, A |
| Unified learning for image- | text | alignment via multi-scale feature fusion |
| unified method for augmented incremental recognition of online handwritten Japanese and English | text | , A |
| Unified Performance Evaluation for OCR Zoning: Calculating Page Segmentation's Score, That Includes | text | Zones, Tables and Non-text Objects |
| Unified Pre-training with Pseudo | text | s for Text-To-Image Person Re-identification |
| Unified Prompt Attack Against | text | -to-Image Generation Models |
| Unified | text | Extraction Method for Instructional Videos, A |
| Unifying Vision, | text | , and Layout for Universal Document Processing |
| UniMultNet: Action recognition method based on multi-scale feature fusion and video- | text | constraint guidance |
| Uniprocessor: A | text | -induced Unified Low-level Image Processor |
| unique approach in | text | independent speaker recognition using MFCC feature sets and probabilistic neural network, A |
| Unit Selection Using Linguistic, Prosodic and Spectral Distance for Developing | text | -to-Speech System in Hindi |
| UniTAB: Unifying | text | and Box Outputs for Grounded Vision-Language Modeling |
| Uniter: Universal Image- | text | Representation Learning |
| UniTMGE: Uniform | text | -Motion Generation and Editing Model via Diffusion |
| Unleashing | text | -to-Image Diffusion Models for Visual Perception |
| Unleashing | text | -to-image Diffusion Prior for Zero-shot Image Captioning |
| Unlocking | text | ual and Visual Wisdom: Open-vocabulary 3d Object Detection Enhanced by Comprehensive Guidance from Text and Image |
| Unpaired Image- | text | Matching via Multimodal Aligned Conceptual Knowledge |
| Unsupervised Alignment of News Video and | text | Using Visual Patterns and Textual Concepts |
| Unsupervised Approach for Video | text | Localization, An |
| Unsupervised Block Covering Analysis for | text | -Line Segmentation of Arabic Ancient Handwritten Document Images |
| Unsupervised categorization of heterogeneous | text | images based on fractals |
| Unsupervised clustering of | text | entities in heterogeneous grey level documents |
| Unsupervised Co-Generation of Foreground-Background Segmentation from | text | -to-Image Synthesis |
| Unsupervised Compositional Concepts Discovery with | text | -to-Image Generative Models |
| Unsupervised Cross-Modal Hashing Method Robust to Noisy Training Image- | text | Correspondences in Remote Sensing, An |
| Unsupervised deep learning for | text | line segmentation |
| Unsupervised Domain Adaptation via Class Aggregation for | text | Recognition |
| Unsupervised Domain Adaptation with Imbalanced Character Distribution for Scene | text | Recognition |
| Unsupervised Image and | text | Fusion for Travel Information Enhancement |
| Unsupervised language model adaptation for handwritten Chinese | text | recognition |
| Unsupervised Prompt Tuning for | text | -Driven Object Detection |
| Unsupervised refinement of color and stroke features for | text | binarization |
| Unsupervised Segmentation of | text | Fragments in Real Scenes |
| Unsupervised Speech | text | Localization in Comic Images |
| Unsupervised | text | Segmentation Using Color and Wavelet Features |
| Unsupervised | text | -to-image synthesis |
| Unsupervised writer adaptation applied to handwritten | text | recognition |
| Unveiling and Mitigating Memorization in | text | -to-image Diffusion Models Through Cross Attention |
| UP-Person: Unified Parameter-Efficient Transfer Learning for | text | -Based Person Retrieval |
| Urdu handwritten | text | recognition: a survey |
| Usage-Oriented Performance Evaluation for | text | Localization Algorithms |
| Use of a Dictionary in Conjunction with a Handwritten | text | s Recognizer |
| Use of an Evolutive Base of Models in a System for Reading Printed | text | s |
| Use of Captions and Other Collateral | text | in Understanding Photographs |
| Use of Collateral | text | in Image Interpretation |
| Use of Collateral | text | in Understanding Photos in Documents |
| Use of Global Con | text | in Text Recognition, The |
| Use of the Hough transform to separate merged | text | /graphics in forms |
| USER: Unified Semantic Enhancement With Momentum Contrast for Image- | text | Retrieval |
| Using a boosted tree classifier for | text | segmentation in hand-annotated documents |
| Using a Probabilistic Syllable Model to Improve Scene | text | Recognition |
| Using Adaptive Run Length Smoothing Algorithm for Accurate | text | Localization in Images |
| Using an Exact Performance of Hough Transform for Image | text | Segmentation |
| Using Biographical | text | s as Linked Data for Prosopographical Research and Applications |
| Using double attention for | text | tattoo localisation |
| Using Hidden Markov Models as a Tool for Handwritten | text | Line Segmentation |
| Using histogram representation and Earth Mover's Distance as an evaluation tool for | text | detection |
| Using irregular pyramid for | text | segmentation and binarization of gray scale images |
| Using Kernel Density Classifier with Topic Model and Cost Sensitive Learning for Automatic | text | Categorization |
| Using Large | text | To Image Models with Structured Prompts for Skin Disease Identification: A Case Study |
| Using Mouse Feedback in Computer Assisted Transcription of Handwritten | text | Images |
| Using Multimodal Contrastive Knowledge Distillation for Video- | text | Retrieval |
| Using Multiple Frame Integration for the | text | Recognition of Video |
| Using Object Information for Spotting | text | |
| Using pyramid of histogram of oriented gradients on natural scene | text | recognition |
| Using Readers' Highlighting on Monochromatic Documents for Automatic | text | Transcription and Summarization |
| Using Scale-Space Anisotropic Smoothing for | text | Line Extraction in Historical Documents |
| Using Shape and Layout Information to Find Signatures, | text | , and Graphics |
| Using | text | to Teach Image Retrieval |
| Using Typical Testors for Feature Selection in | text | Categorization |
| Using web search engines to improve | text | recognition |
| Using Webcast | text | for Semantic Event Detection in Broadcast Sports Video |
| UT-GAN: A Novel Unpaired | text | ual-Attention Generative Adversarial Network for Low-Light Text Image Enhancement |
| Utilization of | text | ure, contrast and color homogeneity for detecting and recognizing text from video frames |
| Uyghur Language | text | Detection in Complex Background Images Using Enhanced MSERs |
| Uyghur | text | Localization with Fast Component Detection |
| VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for | text | -to-Image Generative Models |
| Variable-Length Speaker Conditioning in Flow-Based | text | -to-Speech |
| Variance Based Image Binarization Scheme and Its Application in | text | Segmentation, A |
| Variational Bayes Method for Handwritten | text | Line Segmentation, A |
| Variational Distribution Learning for Unsupervised | text | -to-Image Generation |
| Variational DNN embeddings for | text | -independent speaker verification |
| VATr++: Choose Your Words Wisely for Handwritten | text | Generation |
| VCD- | text | ure: Variance Alignment Based 3D-2D Co-Denoising for Text-Guided Texturing |
| Vector Field Decomposition-Based Flow Matching for Zero-Shot Cross-Lingual | text | -to-Speech |
| Vector Quantized Diffusion Model for | text | -to-Image Synthesis |
| VectorFusion: | text | -to-SVG by Abstracting Pixel-Based Diffusion Models |
| VerbDiff: | text | -Only Diffusion Models with Enhanced Interaction Awareness |
| Verisimilar Image Synthesis for Accurate Detection and Recognition of | text | s in Scenes |
| Versatile Diffusion: | text | , Images and Variations All in One Diffusion Model |
| Vertical bar detection for gauging | text | similarity of document images |
| Vesselness for | text | detection in historical document images |
| VGSG: Vision-Guided Semantic-Group Network for | text | -Based Person Search |
| VicTR: Video-conditioned | text | Representations for Activity Recognition |
| Video Analysis -- Captions, | text | , Video Text |
| Video and | text | Matching with Conditioned Embeddings |
| Video and | text | semantic center alignment for text-video cross-modal retrieval |
| Video captioning with | text | -based dynamic attention and step-by-step learning |
| Video Diffusion, Video Synthesis, | text | to Video |
| Video Frame-wise Explanation Driven Contrastive Learning for Procedural | text | Generation |
| Video Generation from | text | Employing Latent Path Construction for Temporal Modeling |
| Video google: A | text | retrieval approach to object matching in videos |
| Video Question Answering Using Clip-Guided Visual- | text | Attention |
| Video Question Answering with Iterative Video- | text | Co-tokenization |
| Video Scene | text | Frames Categorization for Text Detection and Recognition |
| Video Script Identification Based on | text | Lines |
| Video search in concept subspace: a | text | -like paradigm |
| Video Search with CLIP and Interactive | text | Query Reformulation |
| Video | text | detection and recognition: Dataset and benchmark |
| Video | text | Detection System Based on Automated Training, A |
| Video | text | Detection With Robust Feature Representation |
| Video | text | Extraction Using the Fusion of Color Gradient and Log-Gabor Filter |
| video | text | location method based on background classification, A |
| Video | text | recognition using feature compensation as category-dependent feature extraction |
| Video | text | recognition using sequential Monte Carlo and error voting methods |
| Video | text | Tracking with a Spatio-Temporal Complementary Model |
| Video, | text | , and Speech-Driven Realistic 3-D Virtual Head for Human-Machine Interface, A |
| Video-ColBERT: Con | text | ualized Late Interaction for Text-to-Video Retrieval |
| Video- | text | as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning |
| Video- | text | Compliance: Activity Verification Based on Natural Language Instructions |
| Video- | text | Representation Learning via Differentiable Weak Temporal Alignment |
| VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video- | text | Models |
| VideoDirector: Precise Video Editing via | text | -to-Video Models |
| VideoDreamer: Customized Multi-Subject | text | -to-Video Generation With Disen-Mix Finetuning on Language-Video Foundation Models |
| VideoMage: Multi-Subject and Motion Customization of | text | -to-Video Diffusion Models |
| ViewDiff: 3D-Consistent Image Generation with | text | -to-Image Models |
| ViLEM: Visual-Language Error Modeling for Image- | text | Retrieval |
| ViLNM: Visual-Language Noise Modeling for | text | -to-Image Person Retrieval |
| VimTS: A Unified Video and Image | text | Spotter for Enhancing the Cross-Domain Generalization |
| VinTAGe: Joint Video and | text | Conditioning for Holistic Audio Generation |
| VIRES: Video Instance Repainting via Sketch and | text | Guided Generation |
| Vision and | text | Transformer for Predicting Answerability on Visual Question Answering |
| Vision-Aware | text | Features in Referring Image Segmentation: From Object Understanding to Context Understanding |
| Vision-Language Matching for | text | -to-Image Synthesis via Generative Adversarial Networks |
| Vision-Language Pre-Training for Boosting Scene | text | Detectors |
| Vision-Language Relational Transformer for Video-to- | text | Generation |
| ViSTA: Vision and Scene | text | Aggregation for Cross-Modal Retrieval |
| Visual and | text | prompts guided interpretable network for universal low-dose CT MAR |
| Visual Attention Based Approach to | text | Extraction, A |
| Visual enhancement of incised | text | |
| Visual Re-ranking with Natural Language Understanding for | text | Spotting |
| Visual Semantic Reasoning for Image- | text | Matching |
| Visual Semantics: Extracting Visual Information from | text | Accompanying Pictures |
| Visual speaker authentication with random prompt | text | s by a dual-task CNN framework |
| Visual | text | Correction |
| Visual | text | Generation in the Wild |
| Visual | text | Recognition Through Contextual Processing |
| Visual Word Embedding for | text | Classification |
| Visual-Aware | text | as Query for Referring Video Object Segmentation |
| Visual-relation Conscious Image Generation from Structured- | text | |
| Visual- | text | ual Capsule Routing for Text-Based Video Segmentation |
| Visualizing Unstructured | text | Sequences Using Iterative Visual Clustering |
| Visually-Enabled Active Deep Learning for (Geo) | text | and Image Classification: A Review |
| VisualRAG: Knowledge-Guided Retrieval Augmentation for Image- | text | Matching |
| Vita-CLIP: Video and | text | adaptive CLIP via Multimodal Prompting |
| ViTA: An Efficient Video-to- | text | Algorithm using VLM for RAG-based Video Analysis System |
| Viterbi algorithm as an aid in | text | recognition, The |
| Vividdreamer: Invariant Score Distillation for Hyper-realistic | text | -to-3d Generation |
| VMC: Video Motion Customization Using Temporal Attention Adaption for | text | -to-Video Diffusion Models |
| VODiff: Controlling Object Visibility Order in | text | -to-Image Generation |
| VOLTER: Visual Collaboration and Dual-Stream Fusion for Scene | text | Recognition |
| VolTex: Food Volume Estimation Using | text | -Guided Segmentation and Neural Surface Reconstruction |
| VoP: | text | -Video Co-Operative Prompt Tuning for Cross-Modal Retrieval |
| Vox-E: | text | -guided Voxel Editing of 3D Objects |
| VP3D: Unleashing 2D Visual Prompt for | text | -to-3D Generation |
| VSR++: Improving Visual Semantic Reasoning for Fine-Grained Image- | text | Matching |
| VSRNet: End-to-end video segment retrieval with | text | query |
| VTC: Improving Video- | text | Retrieval with User Comments |
| VTD-FCENet: A Real-Time HD Video | text | Detection with Scale-Aware Fourier Contour Embedding |
| VTPL: Visual and | text | prompt learning for visual-language models |
| VTQA: Visual | text | Question Answering via Entity Alignment and Cross-Media Reasoning |
| VX2 | text | : End-to-End Learning of Video-Based Text Generation From Multimodal Inputs |
| W-A net: Leveraging Atrous and Deformable Convolutions for Efficient | text | Detection |
| Wacnet: Word Segmentation Guided Characters Aggregation Net for Scene | text | Spotting With Arbitrary Shapes |
| Was: Dataset and Methods for Artistic | text | Segmentation |
| Watch Your Steps: Local Image and Scene Editing by | text | Instructions |
| Watch Your Strokes: Improving Handwritten | text | Recognition with Deformable Convolutions |
| Watermark Removal Attack Against | text | -to-Image Generative Model Watermarking |
| Watermarking JBIG2 | text | Region for Image Authentication |
| Watermarking | text | document images using edge direction histograms |
| WaterVG: Waterway Visual Grounding Based on | text | -Guided Vision and mmWave Radar |
| Wave: Warping Ddim Inversion Features for Zero-shot | text | -to-video Editing |
| Wavelet feature domain adaptive noise reduction using learning algorithm for | text | -independent speaker recognition |
| Wavelet feature selection based neural networks with application to the | text | independent speaker identification |
| Wavelet-gradient-fusion for video | text | binarization |
| Weak supervision for generating pixel-level annotations in scene | text | segmentation |
| Weakly Supervised Attention Rectification for Scene | text | Recognition |
| Weakly Supervised Salient Object Detection with | text | Supervision |
| Weakly Supervised | text | -based Person Re-Identification |
| Weakly Supervised Video Moment Retrieval From | text | Queries |
| Weakly Supervised Video Representation Learning with Unaligned | text | for Sequential Videos |
| Weakly-Supervised 3D Spatial Reasoning for | text | -Based Visual Question Answering |
| Weakly-Supervised Alignment of Video with | text | |
| Weakly-Supervised | text | -driven Contrastive Learning for Facial Behavior Understanding |
| WEB Image Classification Based on the Fusion of Image and | text | Classifiers |
| Webly Supervised Image- | text | Embedding with Noisy Tag Refinement |
| WECROMCL: Weakly Supervised Cross-modality Contrastive Learning for Transcription-only Supervised | text | Spotting |
| Weighted Graph Embedding Feature with Bi-Directional Long Short-Term Memory Classifier for Multi-Document | text | Summarization |
| Well-calibrated confidence measures for multi-label | text | classification with a large number of labels |
| WeStcoin: Weakly-Supervised Con | text | ualized Text Classification with Imbalance and Noisy Labels |
| We | text | : Scene Text Detection under Weak Supervision |
| WETM: A word embedding-based topic model with modified collapsed Gibbs sampling for short | text | |
| What Are You Talking About? | text | -to-Image Coreference |
| What does scene | text | tell us? |
| What If We Only Use Real Datasets for Scene | text | Recognition? Toward Scene Text Recognition With Fewer Labels |
| What is a good evaluation protocol for | text | localization systems? Concerns, arguments, comparisons and solutions |
| What is the Real Need for Scene | text | Removal? Exploring the Background Integrity and Erasure Exhaustivity Properties |
| What Is Wrong With Scene | text | Recognition Model Comparisons? Dataset and Model Analysis |
| What Machines See Is Not What They Get: Fooling Scene | text | Recognition Models With Adversarial Text Images |
| When IC meets | text | : Towards a rich annotated integrated circuit text dataset |
| Where you edit is what you get: | text | -guided image editing with region-based attention |
| Which super-resolution algorithm is proper for Farsi | text | image sequences |
| Who's Waldo? Linking People Across | text | and Images |
| Whole is Greater than Sum of Parts: Recognizing Scene | text | Words |
| Wikipedia-based semantic tensor space model for | text | analytics, A |
| Word Extraction from On-Line Handwritten | text | Lines |
| Word Image Matching as a Technique for Degraded | text | Recognition |
| Word segmentation in handwritten Korean | text | lines based on gap clustering techniques |
| Word segmentation of printed | text | lines based on gap clustering and special symbol detection |
| Word separation of unconstrained handwritten | text | lines in PCR forms |
| Word Shape Analysis in a Knowledge-Based System for Reading | text | |
| Word spotting and recognition via a joint deep embedding of image and | text | |
| Wordfence: | text | detection in natural images with border awareness |
| Wordrobe: | text | -guided Generation of Textured 3D Garments |
| Words Matter: Scene | text | for Image Classification and Retrieval |
| Words or Vision: Do Vision-Language Models Have Blind Faith in | text | ? |
| WordSup: Exploiting Word Annotations for Character Based | text | Detection |
| WOUAF: Weight Modulation for User Attribution and Fingerprinting in | text | -to-Image Diffusion Models |
| Write a Classifier: Predicting Visual Classifiers from Unstructured | text | |
| Writer identification using | text | line based features |
| Writer-aware CNN for parsimonious HMM-based offline handwritten Chinese | text | recognition |
| Writing speed normalization for on-line handwritten | text | recognition |
| X-Edit: Detecting and Localizing Edits in Images Altered by | text | -Guided Diffusion Models |
| X-Mesh: Towards Fast and Accurate | text | -driven 3D Stylization via Dynamic Textual Guidance |
| X-Pool: Cross-Modal Language-Video Attention for | text | -Video Retrieval |
| You'll Never Walk Alone: A Sketch and | text | Duet for Fine-Grained Image Retrieval |
| Your Student is Better than Expected: Adaptive Teacher-Student Collaboration for | text | -Conditional Diffusion Models |
| Zero-Painter: Training-Free Layout Control for | text | -to-Image Synthesis |
| Zero-Shot Composed Image Retrieval Considering Query-Target Relationship Leveraging Masked Image- | text | Pairs |
| Zero-Shot Contrastive Loss for | text | -Guided Diffusion Image Style Transfer |
| Zero-shot skeleton-based action recognition with dual visual- | text | alignment |
| Zero-shot spatial layout conditioning for | text | -to-image diffusion models |
| Zero-Shot Styled | text | Image Generation, but Make It Autoregressive |
| Zero-Shot Temporal Action Detection by Learning Multimodal Prompts and | text | -Enhanced Actionness |
| Zero-Shot | text | Classification with Semantically Extended Graph Convolutional Network |
| Zero-Shot | text | -Driven Dynamic Neural Radiance Fields Stylization |
| Zero-Shot | text | -Guided Object Generation with Dream Fields |
| Zero-Shot | text | -to-Parameter Translation for Game Character Auto-Creation |
| Zero-Shot Video Moment Retrieval With Angular Reconstructive | text | Embeddings |
| ZeroCap: Zero-Shot Image-to- | text | Generation for Visual-Semantic Arithmetic |
| Zone identification in the printed Gujarati | text | |
4233 for text