Index for text

_text_
2D and 3D Video Scene text Classification
2LSPE: 2D Learnable Sinusoidal Positional Encoding using Transformer for Scene text Recognition
360PanT: Training-Free text-Driven 360-Degree Panorama-to-Panorama Translation
3D Highlighter: Localizing Regions on 3D Shapes via text Descriptions
3D Human Motion Generation from the text Via Gesture Action Classification and the Autoregressive Model
3D-Aware text-Driven Talking Avatar Generation
3D-SceneDreamer: text-Driven 3D-Consistent Scene Generation
3D-VisTA: Pre-trained Transformer for 3D Vision and text Alignment
4D-fy: text-to-4D Generation Using Hybrid Score Distillation Sampling
A-STAR: Test-time Attention Segregation and Retention for text-to-image Synthesis
ABCNet v2: Adaptive Bezier-Curve Network for Real-Time End-to-End text Spotting
ABCNet: Real-Time Scene text Spotting With Adaptive Bezier-Curve Network
ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene text Spotting
Ablating Concepts in text-to-Image Diffusion Models
Accurate Arbitrary-Shaped Scene text Detection via Iterative Polynomial Parameter Regression
Accurate Detection for Scene texts with a Cascaded CNN Networks
Accurate Scene text Detection Through Border Semantics Awareness and Bootstrapping
Accurate Scene text Detection Via Scale-Aware Data Augmentation and Shape Similarity Constraint
Accurate Scene text Recognition Based on Recurrent Neural Network
Accurate Scene text Recognition with Efficient Model Scaling and Cloze Self-Distillation
Accurate Segmentation-Based Scene text Detector with Context Attention and Repulsive Text Border, An
Accurate text localization in images based on SVM output scores
Accurate Threshold Insensitive Kernel Detector for Arbitrary Shaped text, An
Accurate video text detection through classification of low and high contrast images
Accurate, data-efficient, unconstrained text recognition with convolutional neural networks
ACE: Anti-Editing Concept Erasure in text-to-Image Models
Acquire and then Adapt: Squeezing out text-to-Image Model for Image Restoration
ActBERT: Learning Global-Local Video-text Representations
Active Collection of Land Cover Sample Data from Geo-Tagged Web texts
Active Contours Network to Straighten Distorted text Lines
active learning approach to frequent itemset-based text clustering, An
Active Learning With Complementary Sampling for Instructing Class-Biased Multi-Label text Emotion Classification
Activity Recognition Applications from Contextual Video-Text Fusion
ActivityCLIP: Enhancing group activity recognition by mining complementary information from text to supplement image modality
Actor and Action Modular Network for text-Based Video Segmentation
AdaBoost for text Detection in Natural Scene
Adapting Style and Content for Attended text Sequence Recognition
Adapting text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration
Adaptive Algorithm for text Detection from Natural Scenes, An
Adaptive Boundary Proposal Network for Arbitrary Shape text Detection
Adaptive Correlation Filtering Method for text-Based Person Search, An
Adaptive Fuzzy text Segmentation in Images with Complex Backgrounds Using Color and Texture
Adaptive fuzzy wavelet algorithm for text-independent speaker recognition
Adaptive Geoparsing Method for Toponym Recognition and Resolution in Unstructured text
Adaptive Latent Graph Representation Learning for Image-text Matching
Adaptive method for multi colored text binarization
Adaptive multi-text union for stable text-to-image synthesis learning
Adaptive Offline Quintuplet Loss for Image-text Matching
Adaptive Region Growing Color Segmentation for text Using Irregular Pyramid
Adaptive Scene text Detection Based on Transferring Adaboost
Adaptive scene-text binarisation on images captured by smartphones
Adaptive Script-Independent Block-Based text Line Extraction, An
Adaptive Script-Independent text Line Extraction
Adaptive text Recognition Through Visual Matching
Adding Conditional Control to text-to-Image Diffusion Models
Addressing Information Inequality for text-Based Person Search via Pedestrian-Centric Visual Denoising and Bias-Aware Alignments
ADNet: Rethinking the Shrunk Polygon-Based Approach in Scene text Detection
Advance One-Shot Multispectral Instance Detection With text's Supervision
Advancing Zero-Shot Digital Human Quality Assessment Through text-Prompted Evaluation
Adversarial and Isotropic Gradient Augmentation for Image Retrieval With text Feedback
Adversarial Attribute-text Embedding for Person Search With Natural Language Query
Adversarial learning based attentional scene text recognizer
Adversarial Representation Learning for text-to-Image Matching
Adversarial Robustification via text-to-image Diffusion Models
Adversarial Synthesis of Human Pose from text
Adversarial text to Continuous Image Generation
Adversarial Training Lattice LSTM for Named Entity Recognition of Rail Fault texts
ADVMIX: Data Augmentation for Accurate Scene text Spotting
Ae textspotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting
AEA-FIRM: Adaptive Elastic Alignment with Fine-Grained Representation Mining for text-Based Aerial Pedestrian Retrieval
Aesthetic text Logo Synthesis via Content-aware Layout Inferring
Affective Image Editing: Shaping Emotional Factors via text Descriptions
Affective Image Filter: Reflecting Emotions from text to Images
Agent-Based Control Prompt Tuning for Video-text Retrieval
Aggregating Image and text Quantized Correlated Components
Aggregating Local and Global text Features for Linguistic Steganalysis
Aggregating Local Context for Accurate Scene Text Detection
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of text-to-Video Generation with LMM
AITTI: Learning Adaptive Inclusive Token for text-to-Image Generation
Aletheia: An Advanced Document Layout and text Ground-Truthing System for Production Environments
Algorithm for Colour-Based Natural Scene text Segmentation, An
Algorithm for Matching OCR-Generated text Strings, An
Algorithm for Reducing text Line Candidates of Incorrect Orientation, An
Algorithm for text page up/down orientation determination
Algorithms for compressing compound document images with large text/background overlap
ALIF: A dataset for Arabic embedded text recognition in TV broadcast
Align and Retrieve: Composition and Decomposition Learning in Image Retrieval With text Feedback
Align Your Gaussians: text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Aligning text and Document Illustrations: Towards Visually Explainable Digital Humanities
Aligning text-to-Image Diffusion Models With Constrained Reinforcement Learning
AlignIT: Enhancing Prompt Alignment in Customization of text-to-Image Models
Alignment and Generation Adapter for Efficient Video-text Understanding
Alignment of Curved text Strings for Enhanced OCR Readability
Alignment of free layout color texts for character recognition
Alignment of Paragraphs in Bilingual texts Using Bilingual Dictionaries and Dynamic Programming
All You Need Is a Second Look: Towards Arbitrary-Shaped text Detection
ALR-GAN: Adaptive Layout Refinement for text-to-Image Synthesis
alternative framework for univariate filter based feature selection for text categorization, An
ALTID: Arabic/Latin text Images Database for recognition research
AMITA: Attribute-Guided Masked Image-text Alignment for Multi-Label Image Representation
AMO Sampler: Enhancing text Rendering with Overshooting
Analysis of Features and Metrics for Alignment in text-Dependent Voice Conversion
Analysis of the Novel Transformer Module Combination for Scene text Recognition
Analytical evaluation of term weighting schemes for text categorization
anchor-free region proposal network for Faster R-CNN-based text detection approaches, An
Ancient document analysis based on text line extraction
AniClipart: Clipart Animation with text-to-Video Priors
Animatabledreamer: text-guided Non-rigid 3d Model Generation and Reconstruction with Canonical Score Distillation
AniMo: Species-Aware Model for text-Driven Animal Motion Generation
Annotated Databases for the Recognition of Screen-Rendered text
ANNP: a neural network parser for real world texts
Anonymizing Temporal Phrases in Natural Language text to be Posted on Social Networking Services
Anti-DreamBooth: Protecting users from personalized text-to-image synthesis
Anycontrol: Create Your Artwork with Versatile Control on text-to-image Generation
AnyFace++: A Unified Framework for Free-Style text-to-Face Synthesis and Manipulation
AnyFace: Free-style text-to-Face Synthesis and Manipulation
AON: Towards Arbitrarily-Oriented text Recognition
Application of autoregressive models to the study of the temporal structure of a handwritten text
Application of Cluster Detection to text and Picture Processing, An
Application of Novel Chaotic Neural Networks to text Classification Based on PCA
Application of Planar Motion Segmentation for Scene text Extraction
Apply Hierarchical-Chain-of-Generation to Complex Attributes text-to-3D Generation
Applying GIS and text Mining Methods to Twitter Data to Explore the Spatiotemporal Patterns of Topics of Interest in Kuwait
Applying the conjugate gradient method for text document categorization
approach for detecting and cleaning of struck-out handwritten text, An
approach for handwritten Chinese text recognition unifying character segmentation and recognition, An
Approach for Recognizing text Labels in Raster Maps, An
approach to extracting the target text line from a document image captured by a pen scanner, An
approach to get overall emotion from comment text towards a certain image uploaded to social network using Latent Semantic Analysis, An
Approximate String Match for Garbled text with Various Accuracies, An
Arabic character recognition system: A statistical approach for recognizing cursive typewritten text
Arabic hand-written text-line extraction
Arabic handwritten texts clusterization based on Feature Relation Graph (FRG)
Arabic ligatures: Analysis and application in text recognition
Arabic text detection in videos using neural and boosting-based approaches: Application to video indexing
Arbitrarily oriented text detection using geodesic distances between corners and skeletons
Arbitrarily Shaped Scene text Detection With a Mask Tightness Text Detector
Arbitrarily shaped scene text detection with dynamic convolution
Arbitrarily-Oriented text Detection in Low Light Natural Scene Images
Arbitrary Shape Scene text Detection With Adaptive Text Region Representation
Arbitrary Shape text Detection using Transformers
Arbitrary Shape text Detection via Boundary Transformer
Arbitrary Shape text Detection via Segmentation with Probability Maps
Arbitrary Style Guidance for Enhanced Diffusion-Based text-to-Image Generation
Arbitrary-Oriented Scene text Detection via Rotation Proposals
Arbitrary-Shape Scene text Detection via Visual-Relational Rectification and Contour Approximation
architecture for handwritten text recognition systems, An
Are 2D-LSTM really dead for offline text recognition?
Are All Combinations Equal? Combining textual and Visual Features with Multiple Space Learning for Text-based Video Retrieval
Are Digraphs Good for Free-text Keystroke Dynamics?
Are They Different? Affect, Feeling, Emotion, Sentiment, and Opinion Detection in text
ARES: text-Driven Automatic Realistic Simulator for Autonomous Traffic
ARRPNGAN: text-to-image GAN with attention regularization and region proposal networks
ArtAdapter: text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation
ArtGlyphDiffuser: text-driven artistic glyph generation via Style-to-CLIP Projection and Multi-Level Controlled diffusion
ARTIST: Improving the Generation of text-Rich Images with Disentangled Diffusion Models and Large Language Models
Artistic Style Transfer via Fine-Grained text Guidance and Contrastive Semantics Similarity
ART•V: Auto-Regressive text-to-Video Generation with Diffusion Models
ASAYAR: A Dataset for Arabic-Latin Scene text Localization in Highway Traffic Panels
Assessing Affective Dimensions of Play in Psychodynamic Child Psychotherapy via text Analysis
Assessing Image and text Generation with Topological Analysis and Fuzzy Logic
Assessing similarity in handwritten texts
Assessing User Experience of text Readability with Eye Tracking in Virtual Reality
Assistive text Reading from Complex Background for Blind Persons
Associating text and graphics for scientific chart understanding
ASTER: An Attentional Scene text Recognizer with Flexible Rectification
ASTS: A Unified Framework for Arbitrary Shape text Spotting
Asymmetric Cross-Scale Alignment for text-Based Person Search
ATA: Adaptive Transformation Agent for text-Guided Subject-Position Variable Background Inpainting
ATM: Attentional text Matting
AToM: Aligning text-to-Motion Model at Event-Level with GPT-4Vision Reward
ATT3D: Amortized text-to-3D Object Synthesis
Attend, Correct and Focus: A Bidirectional Correct Attention Network for Image-text Matching
Attention Calibration for Disentangled text-to-Image Personalization
Attention Guidance by Cross-Domain Supervision Signals for Scene text Recognition
attention-based row-column encoder-decoder model for text recognition in Japanese historical documents, An
Attention-Bridged Modal Interaction for text-to-Image Generation
Attentionhand: text-driven Controllable Hand Image Generation for 3d Hand Reconstruction in the Wild
AttnGAN: Fine-Grained text to Image Generation with Attentional Generative Adversarial Networks
Attribute-Centric Compositional text-to-Image Generation
Attribute-Centric Cross-Modal Alignment for Weakly Supervised text-Based Person Re-ID
AttriDiffuser: Adversarially enhanced diffusion model for text-to-facial attribute image synthesis
AttT2M: text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
Audio Visual Segmentation through text Embeddings
Audio-Enhanced text-to-Video Retrieval using Text-Conditioned Feature Alignment
Authenticating Binary text Documents Using a Localising OMAC Watermark Robust to Printing and Scanning
Automated cartographic text placement
Automated Detection of Adverse Drug Events from Older Patients' Electronic Medical Records Using text Mining
automatic algorithm for text skew estimation in document images using recursive morphological transforms, An
Automatic annotation of unique locations from video and text
Automatic Chinese text Classification Using Character-Based and Word-Based Approach
Automatic Concept Discovery from Parallel text and Visual Corpora
Automatic Detection and Localization of Natural Scene text in Video
Automatic detection and recognition of Korean text in outdoor signboard images
Automatic diacritization of Arabic text using recurrent neural networks
Automatic discrimination of text and non-text natural images
Automatic document classification using text and images
Automatic dottization of Arabic text (Rasms) using deep recurrent neural networks
Automatic extraction of correlation-entropy features for text document analysis directly in run-length compressed domain
Automatic Feature Extraction and text Recognition From Scanned Topographic Maps
Automatic identification and skew estimation of text lines in real scene images
Automatic Identification of text in Digital Video Key Frames
Automatic image-text alignment for large-scale web image indexing and retrieval
Automatic Inpainting Scheme for Video text Detection and Removal
Automatic Labeling for Scene text Database
Automatic news video segmentation and categorization based on closed-captioned text
Automatic performance evaluation for video text detection
Automatic Performance Evaluation Protocol for Video text Detection Algorithms, An
Automatic recognition of printed arabic text using neural network classifier
Automatic Recognition of Printed Farsi texts
Automatic Segmentation of Printed Persian (Farsi) text
Automatic segmentation of the IAM off-line database for handwritten English text
Automatic separation of machine-printed and hand-written text lines
Automatic text area segmentation in natural images
Automatic text Detection and Recognition
Automatic text detection and removal in video sequences
Automatic text Detection and Tracking in Digital Video
Automatic text detection for mobile augmented reality translation
Automatic text Extraction from Arabic Newspapers
Automatic text Extraction from Video for Content-Based Annotation and Retrieval
Automatic text Extraction in Digital Video Based on Motion Analysis
Automatic text Location in Images and Video Frames
Automatic text location in natural scene images
Automatic text location using cluster-based template matching
Automatic text processing
Automatic text segmentation from complex background
Automatic tracing and extraction of text-line and word segments directly in JPEG compressed document images
Automatic writer identification from text line images
Autonomous Document Cleaning: A Generative Approach to Reconstruct Strongly Corrupted Scanned texts
Autonomous text Capturing Robot Using Improved DCT Feature and Text Tracking
AutoSplice: A text-prompt Manipulated Image Dataset for Media Forensics
Autostr: Efficient Backbone Search for Scene text Recognition
Auxiliary captioning: Bridging image-text matching and image captioning
AvatarCraft: Transforming text into Neural Human Avatars with Parameterized Shape and Pose Control
AvatarStudio: High-Fidelity and Animatable 3D Avatar Creation from text
Awesome Typography: Statistics-Based text Effects Transfer
Background-Insensitive Scene text Recognition with Text Semantic Segmentation
Bag of Embedded Words learning for text retrieval
Bag of features approach for offline text-independent Chinese writer identification
Balancing Optimization Strategies and Practical Goals: An Efficient Scene text Detector
BAMG: text-based Person Re-identification via Bottlenecks Attention and Masked Graph Modeling
Baseline detection of multi-lingual unconstrained handwritten text lines
BATINeT: Background-Aware text to Image Synthesis and Manipulation Network
Bayesian Similarity Model Estimation for Approximate Recognized text Search
Bayesian Super-Resolution of text in Video with a Text-Specific Bimodal Prior
Bayesian-based method of unconstrained handwritten offline Chinese text line recognition, A
BDNet: A BERT-based dual-path network for text-to-image cross-modal person re-identification
Be Yourself: Bounded Attention for Multi-subject text-to-image Generation
Beatrix: A Self-Learning System for Off-Line Recognition of Handwritten texts
Being Comes from Not-Being: Open-Vocabulary text-to-Motion Generation with Wordless Training
Belief Mining in Persian texts Based on Deep Learning and Users' Opinions
Benchmark for Chinese-English Scene text Image Super-resolution, A
Benchmark for Controllable text-Image-to-Video Generation, A
Benchmarking Robustness to text-Guided Corruptions
better fitness measure of a text-document for a given set of keywords, A
Beyond Coarse-grained Matching in Video-text Retrieval
Beyond One and Two Tower: Cross-Modal Consensus Learning for Image-text Retrieval
Beyond text QA: Multimedia Answer Generation by Harvesting Web Information
Beyond text: Frozen Large Language Models in Visual Signal Comprehension
Beyond verbs: Understanding actions in videos with text
Beyond visual semantics: Exploring the role of scene text in image understanding
Bi-Attention enhanced representation learning for image-text matching
Bi-Directional Image-text Retrieval With Position Attention and Similarity Filtering
Bi-Directional Spatial-Semantic Attention Networks for Image-text Matching
Bi-directional Training for Composed Image Retrieval via text Prompt Learning
Bi-modal Handwritten text Corpus: Baseline Results, A
Bi-modal Handwritten text Recognition (BiHTR) ICPR 2010 Contest Report
Bi-tonal image non-text matter removal with run length and connected component analysis
Bi-VLGM: Bi-Level Class-Severity-Aware Vision-Language Graph Matching for text Guided Medical Image Segmentation
Bidirectional extraction and recognition of scene text with layout consistency
Bilevel Feature Extraction-Based text Mining for Fault Diagnosis of Railway Systems
Bilingual text Classification
Bilingual, Open World Video text Dataset and Real-Time Video Text Spotting With Contrastive Learning, A
BiLMa: Bidirectional Local-Matching for text-based Person Re-identification
Bimodal beta mixture distribution for enhanced OOD inner-differentiation in multi-class text classification
Binarization and cleanup of handwritten text from carbon copy medical form images
Binarization of low quality text using a Markov random field model
Binarization-Free Clustering Approach to Segment Curved text Lines in Historical Manuscripts, A
Binary text image compression using overlapping rectangular partitioning
Binary text image file preprocessing to account for printer dot gain
Biometric Recognition Based on Free-text Keystroke Dynamics
Biometric recognition using online uppercase handwritten text
Bipartite Graph Coarsening for text Classification Using Graph Neural Networks
BiSeR-LMA: A Bidirectional Semantic Reasoning and Large Model Enhancement Approach for text-Video Cross-Modal Retrieval
BizGen: Advancing Article-level Visual text Rendering for Infographics Generation
Blended Diffusion for text-driven Editing of Natural Images
Blending-NeRF: text-Driven Localized Editing in Neural Radiance Fields
Blind Deblurring of text Images Using a Text-Specific Hybrid Dictionary
Blind deblurring text images via Beltrami regularization
blind deconvolution model for scene text detection and recognition in video, A
Blind Source Separation Techniques for Detecting Hidden texts and Textures in Document Images
Blind text images deblurring based on a generative adversarial network
BlobGEN-Vid: Compositional text-to-Video Generation with Blob Video Representations
Block Segmentation and text Extraction in Mixed Text/Image Documents
BLSTM-based handwritten text recognition using Web resources
BLTRCNN-Based 3-D Articulatory Movement Prediction: Learning Articulatory Synchronicity From Both text and Audio Inputs
Boosting SpLSA for text Classification
Boosting text-To-Image Person Re-Identification With Generative Hard Negative
Boosting Weakly-Supervised Temporal Action Localization with text Information
Boosting-based transductive learning for text detection
Bootstrapping text Recognition from Stop Words
Bordernet: An Efficient Border-attention text Detector
BOTH2Hands: Inferring 3D Hands from Both text Prompts and Body Dynamics
Bottom-Up Scene text Detection with Markov Clustering Networks
Boundary textSpotter: Toward Arbitrary-Shaped Scene Text Spotting
Boundary-Aware Arbitrary-Shaped Scene text Detector With Learnable Embedding Network
Box It to Bind It: Unified Layout Control and Attribute Binding in text-to-Image Diffusion Models
BoxDiff: text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
Breaking text-Based CAPTCHA with Sparse Convolutional Neural Networks
Breaking text-based CAPTCHAs with variable word and character orientation
Breaking The Limits of text-conditioned 3D Motion Synthesis with Elaborative Descriptions
BreakingNews: Article Annotation by Image and text Processing
Breathing Life Into Sketches Using text-to-Video Priors
Bridge-GAN: Interpretable Representation Learning for text-to-Image Synthesis
Bridging Different Language Models and Generative Vision Models for text-to-image Generation
Bridging Synthetic and Real Worlds for Pre-training Scene text Detectors
Bridging the Gap Between Audio and text Using Parallel-Attention for User-Defined Keyword Spotting
Bridging the Gap Between End-to-End and Two-Step text Spotting
Bridging Video and text: A Two-Step Polishing Transformer for Video Captioning
Bridging Video-text Retrieval with Multiple Choice Questions
BRsyn-Caps: Chinese text Classification Using Capsule Network Based on Bert and Dependency Syntax
BTS: A Bi-lingual Benchmark for text Segmentation in the Wild
Building compact recognizer with recognition rate maintained for on-line handwritten Japanese text recognition
Building text features for object image classification
BURSTS: A bottom-up approach for robust spotting of texts in scenes
ByTheWay: Boost Your text-to-Video Generation Model to Higher Quality in a Training-free Way
C-CLIP: Contrastive Image-text Encoders to Close the Descriptive-Commentative Gap
C-Net: A Compression-Based Lightweight Network for Machine-Generated text Detection
C4Synth: Cross-Caption Cycle-Consistent text-to-Image Synthesis
Cache-aided cross-modal correlation correction for unsupervised cross-domain text-based person search
CAETFN: Context Adaptively Enhanced Text-Guided Fusion Network for Multimodal Sentiment Analysis
CAMEL: CAusal Motion Enhancement Tailored for Lifting text-Driven Video Editing
Camera based degraded text recognition using grayscale feature
Camera text Recognition based on Perspective Invariants
Camera-based analysis of text and documents: a survey
CAMP: Cross-Modal Adaptive Message Passing for text-Image Retrieval
CamType: assistive text entry using gaze with an off-the-shelf webcam
Can Generative Adversarial Networks Teach Themselves text Segmentation?
Can text-to-Video Generation help Video-Language Alignment?
Canny text Detector: Fast and Robust Scene Text Localization Algorithm
Cap4Video: What Can Auxiliary Captions Do for text-Video Retrieval?
Capacity of text Marking Channel
CapsFusion: Rethinking Image-text Data at Scale
Caption text extraction for indexing purposes using a hierarchical region-based image model
Caption text recognition in video frames by MAP matching
cascade detector for text detection in natural scene images, A
Cascaded Segmentation-Detection Networks for text-Based Traffic Sign Detection
Caseg: CLIP-Based Action Segmentation with Learnable text Prompt
CAT-TPT: Class-Agnostic text-based Test-time Prompt Tuning for Vision-Language Models
CatVersion: Concatenating Embeddings for Diffusion-Based text-to-Image Personalization
Causality-Driven Explainable Multimodal Fusion With Visual-text Parallel Computing for Cloth-Changing Pedestrian Re-Identification
CBNet: A Plug-and-Play Network for Segmentation-Based Scene text Detection
CCDPlus: Towards Accurate Character to Character Distillation for text Recognition
CDistNet: Perceiving Multi-domain Character Distance for Robust text Recognition
CDtext: Scene text detector based on context-aware deformable transformer
CE-text: A context-Aware and embedded text detector in natural scene images
CelebV-text: A Large-Scale Facial Text-Video Dataset
Center textSpotter: A Novel Text Spotter for Autonomous Unmanned Vehicles
CFOR: Character-First Open-Set text Recognition via Context-Free Learning
CGNN: Caption-assisted graph neural network for image-text retrieval
Challenges in Content-Based Image Indexing of Cultural Heritage Collections: Support vector machine active learning with applications to text classification
Character Energy and Link Energy-Based text Extraction in Scene Images
Character extraction in web image for text recognition
Character feature Alignment-based scene text spotter
Character Grounding and Re-identification in Story of Videos and text Descriptions
Character Position-Aware Compression Framework for Screen text Image, A
Character Region Attention for text Spotting
Character Region Awareness for text Detection
Character Segmentation of Handwritten Bangla text by Vertex Characterization of Isothetic Covers
Character Segmenting Techniques for Handwritten text: A Survey
Character-Aware Sampling and Rectification for Scene text Recognition
Character-Level Interaction in Computer-Assisted Transcription of text Images
Character-Level Interaction in Multimodal Computer-Assisted Transcription of text Images
Character-like region verification for extracting text in scene images
Character-Position-Free On-Line Handwritten Japanese text Recognition by Two Segmentation Methods
Character-Stroke Detection for text-Localization and Extraction
Characterization and classification of semantic image-text relations
Characterness: An Indicator of text in the Wild
Chat-edit-3d: Interactive 3d Scene Editing via text Prompts
ChatGen: Automatic text-to-Image Generation From FreeStyle Chatting
ChatTraffic: text-to-Traffic Generation via Diffusion Model
Check, Locate, Rectify: A Training-Free Layout Calibration System for text-to-Image Generation
Chinese Street View text: Large-Scale Chinese Text Reading With Partially Supervised Learning
Chinese text distinction and font identification by recognizing most frequently used characters
Chinese text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning
Chinese/Kanji text and Data Processing
Choose What You Need: Disentangled Representation Learning for Scene text Recognition, Removal and Editing
Circle text Expansion as Low-Rank Textures
CiteTracker: Correlating Image and text for Visual Tracking
CKD: Cross-Task Knowledge Distillation for text-to-Image Synthesis
Class dependent feature scaling method using naive Bayes classifier for text datamining
Class-agnostic Object Counting with text-to-image Diffusion Model
Class-Aware Mask-guided feature refinement for scene text recognition
Class-Balanced text to Image Synthesis With Attentive Generative Adversarial Network
Class-dependent projection based method for text categorization
Classification Architecture Based on Connected Components for text Detection in Unconstrained Environments, A
Classification of Machine Printed and Handwritten texts Using Character Block Layout Variance
Classification of Noisy Free-text Prostate Cancer Pathology Reports Using Natural Language Processing
Classification of text Documents
Classification of text documents based on score level fusion approach
Classification with reject option in text categorisation systems
Classifying networked text data with positive and unlabeled examples
CLEval: Character-Level Evaluation for text Detection and Recognition Tasks
CLIP is Almost All You Need: Towards Parameter-Efficient Scene text Retrieval without OCR
CLIP is Also an Efficient Segmenter: A text-Driven Approach for Weakly Supervised Semantic Segmentation
CLIP-Actor: text-Driven Recommendation and Stylization for Animating Human Meshes
CLIP-Driven Fine-Grained text-Image Person Re-Identification
CLIP-Event: Connecting text and Images with Event Structures
CLIP-Forge: Towards Zero-Shot text-to-Shape Generation
CLIP-GAN: Stacking CLIPs and GAN for Efficient and Controllable text-to-Image Synthesis
CLIP-NeRF: text-and-Image Driven Manipulation of Neural Radiance Fields
CLIP2GAN: Toward Bridging text With the Latent Space of GANs
CLIP2Protect: Protecting Facial Privacy Using text-Guided Makeup via Adversarial Latent Search
Clip2Sam: Enhanced End-to-End text-to-Image Segmentation and Image Diffusion System
CLIPAG: Towards Generator-Free text-to-Image Generation
CLIPDraw++: text-to-Sketch Synthesis with Simple Primitives
CLIPstyler: Image Style Transfer with a Single text Condition
CLIPTER: Looking at the Bigger Picture in Scene text Recognition
CLIPtone: Unsupervised Learning for text-Based Image Tone Adjustment
Cloud of Line Distribution and Random Forest Based text Detection from Natural/Video Scene Images
Clustering-Based Approach to the Separation of text Strings from Mixed Text/Graphics Documents, A
CM-Net: Concentric Mask Based Arbitrary-Shaped text Detection
CMA-CLIP: Cross-Modality Attention Clip for text-Image Classification
CMFG: Cross-model Fine-grained Feature Interaction for text-video Retrieval
CMMLoc: Advancing text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework
CMPD: Using Cross Memory Network With Pair Discrimination for Image-text Retrieval
CMT-CO: Contrastive Learning with Character Movement Task for Handwritten text Recognition
CNN for text Detection, Convolutional Neural Network
CNN-based text image super-resolution tailored for OCR
CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-text Dataset
coarse-to-fine scene text detection method based on Skeleton-cut detector and Binary-Tree-Search based rectification, A
Code-Mixing and Code-Switching on Social Media text: A Brief Survey
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-text Retrieval
Coding with ASCII: compact, yet text-based 3D content
Cognition Transferring and Decoupling for text-Supervised Egocentric Semantic Segmentation
Cognitive Themes Emerging from Air Photo Interpretation texts Published to 1960
Cogview3: Finer and Faster text-to-image Generation via Relay Diffusion
Collaborative Vision-text Representation Optimizing for Open-vocabulary Segmentation
Color Based Image Segmentation and its Application to text Segmentation, A
Color segmentation for text extraction
Color structure recovering in strong specular text regions
Color text extraction from camera-based images: The impact of the choice of the clustering distance
Color text extraction with selective metric-based clustering
Color text image binarization based on binary texture analysis
Coloring with Words: Guiding Image Colorization Through text-Based Palette Generation
Colour text segmentation in web images based on human perception
colour text/graphics separation based on a graph representation, A
Combination of global and local contexts for text/non-text classification in heterogeneous online handwritten documents
combined Convolutional Neural Network and Dynamic Programming approach for text line normalization, A
Combined orientation and skew detection using geometric text-line modeling
Combining Deep and Ad-hoc Solutions to Localize text Lines in Ancient Arabic Document Images
Combining diverse on-line and off-line systems for handwritten text line recognition
Combining diverse systems for handwritten text line recognition
Combining HMM classifiers in a handwritten text recognition system
Combining Statistical Measures to Find Image text Regions
Combining Structure and Parameter Adaptation of HMMs for Printed text Recognition
Combining text and image information in content-based retrieval
Combining text and prosodic analysis for prominent word detection
COME: Clip-OCR and Master ObjEct for text image captioning
Comic text Detection Using Neural Network Approach
COMIM-GAN: Improved text-to-Image Generation via Condition Optimization and Mutual Information Maximization
CoMM: A Coherent Interleaved Image-text Dataset for Multimodal Understanding and Generation
Commercial Quality text: What Does it Take?
Commonsense-Guided Semantic and Relational Consistencies for Image-text Retrieval
comparative study of features for handwritten Bangla text recognition, A
Comparative Study of HMM and BLSTM Segmentation-Free Approaches for the Recognition of Handwritten text-Lines
Comparative Study to Evaluate a text-Independent Speaker Identification Engine for Arabic Speakers Using a CHMM-Based Approach, A
Comparing Data-driven and Phonetic N-gram Systems for text-Independent Speaker Verification
Comparison of Approaches for Automated text Extraction from Scholarly Figures, A
Comparison of clustering methods: A case study of text-independent speaker modeling
Comparison of some thresholding algorithms for text/background segmentation in difficult document images
Comparison of text String Similarity Algorithms for POI Name Harmonisation, A
comparison study on multiple binary-class SVM methods for unilabel text categorization, A
Compass Control: Multi Object Orientation Control for text-to-Image Generation
Compensating for the Incomplete With the Complete: An Efficient Scene text Detector
COMPGS: Unleashing 2D Compositionality for Compositional text-to-3D via Dynamically Optimizing 3D Gaussians
Complementarity-Aware Space Learning for Video-text Retrieval
complete OCR for printed Hindi text in Devanagari script, A
Complete Pyramidal Geometrical Scheme for text Based Image Description and Retrieval, A
Complying with Privacy Legislation: From Legal text to Implementation of Privacy-Aware Location-Based Services
component-tree based method for user-intention guided text extraction, A
Components Regulated Generation of Handwritten Chinese text-lines in Arbitrary Length
Composing Object Relations and Attributes for Image-text Matching
Composing text and Image for Image Retrieval - an Empirical Odyssey
Composite Script Identification and Orientation Detection for Indian text Images
Compositional coding capsule network with k-means routing for text classification
Compositional Image-text Matching and Retrieval by Grounding Entities
Compositional Learning of Image-text Query for Image Retrieval
Compositional Mixture Representations for Vision and text
comprehensive method for multilingual video text detection, localization, and extraction, A
comprehensive neural-based approach for text recognition in videos using natural language processing, A
Comprehensive regional guidance for attention map semantics in text-to-image diffusion models
comprehensive scheme for tattoo text detection, A
Comprehensive Study of Decoder-Only LLMs for text-to-Image Generation, A
comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition, A
Comprehensive Survey of Transformers in text Recognition: Techniques, Challenges, and Future Directions, A
Computational Topology in text Mining
Computer Assisted Transcription for Ancient text Images
Computer Assisted Transcription of Handwritten text Images
Computer Assisted Transcription of text Images: Results on the GERMANA Corpus and Analysis of Improvements Needed for Practical Use
Computer Interpretation of English text and Picture Patterns
Con-text: Text Detection for Fine-Grained Object Classification
Concept decompositions for short text clustering by identifying word communities
Concept Weaver: Enabling Multi-Concept Fusion in text-to-Image Models
ConceptCraft: One-Shot Personalized text-to-Image Generation via Object-Background Disentanglement
ConceptGuard: Continual Personalized text-to-Image Generation with Forgetting and Confusion Mitigation
Concepts-Locations-Emotions: Semantic Analysis and Visualization of Climate Change texts
Conceptual 12M: Pushing Web-Scale Image-text Pre-Training To Recognize Long-Tail Visual Concepts
Conditional Feature Learning Based Transformer for text-Based Person Search
Conditional Image-text Embedding Networks
conditional random field approach for face identification in broadcast news using overlaid text, A
Conditional random field for text segmentation from images with complex background
Conditional text Image Generation with Diffusion Models
Confidence Measures for Error Correction in Interactive Transcription Handwritten text
Configurable text Stamp Identification Tool with Application of Fuzzy Logic
CONFORM: Contrast is All You Need For High-Fidelity text-to-Image Diffusion Models
Connected and Degraded text Recognition Using Hidden Markov Model
Connected Component Level Discrimination of Handwritten and Machine-Printed text Using Eigenfaces
Connecting Consistency Distillation to Score Distillation for text-to-3d Generation
Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
Connecting NeRFs, Images, and text
Consensus-aware Visual-semantic Embedding for Image-text Matching
Consistent Partition and Labelling of text Blocks
Consistent3D: Towards Consistent High-Fidelity text-to-3D Generation with Deterministic Sampling Prior
Constructing the Discriminative Kernels Using GMM for text-Independent Speaker Identification
Content and Style Aware Generation of text-Line Images for Handwriting Recognition
Content Based Image and Video Retrieval Using Embedded text
Content-based image retrieval with pachinko allocation model and a combination of colour, texture and text features
Content-Based Query of Image Databases, Inspirations from text Retrieval: Inverted Files, Frequency-based Weights and Relevance Feedback
Content-based query of image databases: Inspirations from text retrieval
Context Driven Text Segmentation and Recognition
Context Perception Parallel Decoder for Scene Text Recognition
Context Supplied by Text or Language
Context-Aware Attention Network for Image-Text Retrieval
Context-Aware Hierarchical Transformer for Fine-Grained Video-Text Retrieval
Context-aware relation enhancement and similarity reasoning for image-text retrieval
Context-Aware Text-Based Binary Image Stylization and Synthesis
Context-based text detection in natural scenes
Context-CIR: Learning from Concepts in Text for Composed Image Retrieval
Context2Rec: Leveraging comment text semantics and sequential features for enhanced recommendation systems
Contextual Text Block Detection Towards Scene Text Understanding
Contextual text/non-text stroke classification in online handwritten notes with conditional random fields
Continual Learning for Cross-Modal Image-text Retrieval Based on Domain-Selective Attention
Continuous approach to segmentation of handwritten text
Contour Restoration of text Components for Recognition in Video/Scene Images
contour-based approach to 3D text labeling on triangulated surfaces, A
Contour-Based Robust Algorithm for text Detection in Color Images, A
ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene text Detection
Contra: (con)text (tra)nsformer for Cross-modal Video Retrieval
Contrastive author-aware text clustering
Contrastive Denoising Score for text-Guided Latent Diffusion Image Editing
Contrastive Transformer Learning With Proximity Data Generation for text-Based Person Search
Contribution of recurrent connectionist language models in improving LSTM-based Arabic text recognition in videos
Contribution to the Discrimination of the Medieval Manuscript texts: Application in the Palaeography
Control4D: Efficient 4D Portrait Editing With text
Controllable Artistic text Style Transfer via Shape-Matching GAN
Controllable Multi-Lingual Multi-Speaker Multi-Style text-to-Speech Synthesis With Multivariate Information Minimization, A
Controllable text-to-3D Generation via Surface-Aligned Gaussian Splatting
Controllable text-to-Image Synthesis for Multi-Modality MR Images
Controllable Video Generation With text-Based Instructions
Controlling Human Shape and Pose in text-to-Image Diffusion Models via Domain Adaptation
Controlnet-xs: Rethinking the Control of text-to-image Diffusion Models as Feedback-control Systems
Convolutional Neural Network Based text Steganalysis
Convolutional Neural Network-Based Chinese text Detection Algorithm via Text Structure Modeling, A
Convolutional Neural Networks for Direct text Deblurring
Convolutional Recurrent Neural Network for the Handwritten text Recognition of Historical Greek Manuscripts, A
Convolutional recurrent neural networks with hidden Markov model bootstrap for scene text recognition
COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated texts
CookGAN: Causality Based text-to-Image Synthesis
Cooperative Game Modeling With Weighted Token-Level Alignment for Audio-text Retrieval
COPT: Unsupervised Domain Adaptive Segmentation Using Domain-agnostic text Embeddings
Copyright protection for the electronic distribution of text documents
Corpus-based HIT-MW database for offline recognition of general-purpose Chinese handwritten text
Correcting document image warping based on regression of curved text lines
Correlated Topic Modeling for Short texts in Spherical Embedding Spaces
CoSER: Towards Consistent Dense Multiview text-To-Image Generator for 3D Creation
CosmicMan: A text-to-Image Foundation Model for Humans
CoSMo: Content-Style Modulation for Image Retrieval with text Feedback
Cost-Effective Adversarial Attacks against Scene text Recognition
Could scene context be beneficial for scene text detection?
Countering Personalized text-to-Image Generation with Influence Watermarks
Counting Guidance for High Fidelity text-to-Image Synthesis
Coupled Snakelets for Curled text-Line Segmentation from Warped Document Images
Coverless Image Steganography Based on Semantic-Controlled text-to-Image Generation
Cpgan: Content-parsing Generative Adversarial Networks for text-to-image Synthesis
Cps-STS: Bridging the Gap Between Content and Position for Coarse-Point-Supervised Scene text Spotter
Create Your World: Lifelong text-to-Image Diffusion
Creating generic text summaries
Creation and Analysis of a Corpus of text Rich Indian TV Videos
CRF Based Scheme for Overlapping Multi-colored text Graphics Separation, A
Crime Prediction and Monitoring in Porto, Portugal, Using Machine Learning, Spatial and text Analytics
CrisisHateMM: Multimodal Analysis of Directed and Undirected Hate Speech in text-Embedded Images from Russia-Ukraine Conflict
Cross Initialization for Face Personalization of text-to-Image Models
Cross-Dataset Study for text-based 3D Human Motion Retrieval, A
Cross-Domain detection of AI-Generated text: Integrating linguistic richness and lexical pair dispersion via deep learning
Cross-Domain Multi-Modal Few-Shot Object Detection via Rich text
Cross-Lingual text Image Recognition via Multi-Hierarchy Cross-Modal Mimic
Cross-Lingual text Image Recognition via Multi-Task Sequence to Sequence Learning
Cross-Modal Adaptive Dual Association for text-to-Image Person Retrieval
Cross-Modal and Hierarchical Modeling of Video and text
Cross-Modal Contrastive Learning for text-to-Image Generation
Cross-modal domain adaptation for text-based regularization of image semantics in image retrieval systems
Cross-Modal Dynamic Networks for Video Moment Retrieval With text Query
Cross-Modal Feature Fusion-Based Knowledge Transfer for text-Based Person Search
Cross-modal feature learning and alignment network for text-image person re-identification
Cross-Modal Implicit Relation Reasoning and Aligning for text-to-Image Person Retrieval
Cross-modal independent matching network for image-text retrieval
Cross-modal knowledge learning with scene text for fine-grained image classification
Cross-Modal Person Search: A Coarse-to-Fine Framework using Bi-Directional text-Image Matching
Cross-Modal Progressive Perspective Matching Network for Remote Sensing Image-text Retrieval
Cross-modal Scene Graph Matching for Relationship-aware Image-text Retrieval
Cross-Modal Semantic Matching Generative Adversarial Networks for text-to-Image Synthesis
Cross-Modal text Steganography Against Synonym Substitution-Based Text Attack
Cross-Modal Uncertainty Modeling With Diffusion-Based Refinement for text-Based Person Retrieval
Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and text Guidance
Crossing the lines: making optimal use of context in line-based Handwritten Text Recognition
Crossmodal Translation Based Meta Weight Adaption for Robust Image-text Sentiment Analysis
Crypto-stego System for Securing text and Image Data
CSA: Cross-scale alignment with adaptive semantic aggregation and filter for image-text retrieval
CT-GAN: A conditional Generative Adversarial Network of transformer architecture for text-to-image
CT-Net: Arbitrary-Shaped text Detection via Contour Transformer
CTIGEN-CDM: Controlled text-to-Image Generation Using Cropped Diffusion Models
Ctrl-Room: Controllable text-to-3D Room Meshes Generation with Layout Constraints
Curriculum learning for printed text line recognition of ligature-based scripts
Cursive Script, Historical Documents, text Line Segmentation, Script Line, Segmentation, Text Line Extraction
Cursive stroke sequencing for handwritten text documents recognition
Curved scene text detection via transverse and longitudinal sequence connection
Customization Assistant for text-to-image Generation
Customize-a-video: One-shot Motion Customization of text-to-video Diffusion Models
Customizing 360-Degree Panoramas through text-to-Image Diffusion Models
CustomListener: text-Guided Responsive Interaction for User-Friendly Listening Head Generation
CycleMatch: A cycle-consistent embedding network for image-text matching
DAC-GAN: Dual Auxiliary Consistency Generative Adversarial Network for text-to-Image Generation
DAE-GAN: Dynamic Aspect-aware GAN for text-to-Image Synthesis
DALL-EVAL: Probing the Reasoning Skills and Social Biases of text-to-Image Generation Models
DART: Disease-aware Image-text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation
Data Augmentation for Scene text Recognition
data base for Arabic handwritten text recognition research, A
Data Embedding in text for a Copier System
Data-Hiding Capacity Improvement for text Watermarking Using Space Coding Method
Database for Arabic Handwritten text Image Recognition and Writer Identification, A
Database for Arabic Printed text Recognition Research
Database for Handwritten text Recognition Research, A
Database for Offline Arabic Handwritten text Recognition, A
dataset for Arabic text detection, tracking and recognition in news videos- AcTiV, A
Dataset to Support Sexist Content Detection in Arabic text, A
Datenerf: Depth-aware text-based Editing of Nerfs
DATID-3D: Diversity-Preserved Domain Adaptation Using text-to-Image Diffusion for 3D Generative Model
DCDM: Diffusion-conditioned-diffusion Model for Scene text Image Super-resolution
De-Diffusion Makes text a Strong Cross-Modal Interface
De-rendering Stylized texts
Debiased Video-text Retrieval via Soft Positive Sample Calibration
Debiasing Framework For Attribute Binding In Diffusion-Based text-To-Image Generation, A
Deblurring text Images via L0-Regularized Intensity and Gradient Prior
Deep Automated text Scoring Model Based on Memory Network
Deep Belief Networks Based Toponym Recognition for Chinese text
Deep BLSTM neural networks for unconstrained continuous handwritten text recognition
Deep Boosting Learning: A Brand-New Cooperative Approach for Image-text Matching
Deep Convolutional Deblurring and Detection Neural Network for Localizing text in Videos, A
Deep correlation for matching images and text
Deep Cross-Modal Projection Learning for Image-text Matching
Deep Direct Regression for Multi-oriented Scene text Detection
Deep feature extraction with tri-channel textual feature map for text classification
Deep Features for text Spotting
Deep Geometric Moments Promote Shape Consistency in text-to-3D Generation
Deep image compression using scene text quality assessment
Deep learning and recurrent connectionist-based approaches for Arabic text recognition in videos
deep learning approach to handwritten text recognition in the presence of struck-out text, A
Deep Learning for Image-to-text Generation: A Technical Overview
Deep Learning in the Domain of Multi-Document text Summarization
Deep Matching Prior Network: Toward Tighter Multi-oriented text Detection
Deep Multi-Scale Context Aware Feature Aggregation for Curved Scene Text Detection
Deep Neural Network Based 3D Articulatory Movement Prediction Using Both text and Audio Inputs
Deep neural network based hidden Markov model for offline handwritten Chinese text recognition
Deep Neural Network with Attention Model for Scene text Recognition
Deep Relational Reasoning Graph Network for Arbitrary Shape text Detection
Deep Reward Supervisions for Tuning text-to-image Diffusion Models
Deep textSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework
DeepErase: Weakly Supervised Ink Artifact Removal in Document text Images
DeepEraser: Deep Iterative Context Mining for Generic Text Eraser
DeepSolo: Let Transformer Decoder with Explicit Points Solo for text Spotting
DeepWriterID: An End-to-End Online text-Independent Writer Identification System
Deformable scene text detection using harmonic features and modified pixel aggregation network
Deformation Robust text Spotting with Geometric Prior
Deformation-Invariant Networks for Handwritten text Recognition
Degraded Gray-Scale text Recognition Using Pseudo-2D Hidden Markov-Models and N-Best Hypotheses
Delaunay triangulation based text detection from multi-view images of natural scene
DeltaEdit: Exploring text-free Training for Text-Driven Image Manipulation
Dense Chained Attention Network for Scene text Recognition
Dense prediction for text line segmentation in handwritten document images
Dense text-to-Image Generation with Attention Modulation
density-based approach for text extraction in images, A
Dependability Feature Learning Based on Sample Generation for Unsupervised text-to-Image Person Re-Identification
Dependence Models for Searching text in Document Images
Deriving a Priori Co-occurrence Probability Estimates for Object Recognition from Social Networks and text Processing
Deriving Symbol Dependent Edit Weights for text Correction: The Use of Error Dictionaries
Design and Evaluation of Features That Best Define text in Complex Scene Images
Design and Preliminary Evaluation of a Finger-Mounted Camera and Feedback System to Enable Reading of Printed text for the Blind, The
DesignDiffusion: High-Quality text-to-Design Image Generation with Diffusion Models
Detect Arbitrary-Shaped text via Adaptive Thresholding and Localization Quality Estimation
Detect Visual Spoofing in Unicode-Based text
Detect-and-Guide: Self-regulation of Diffusion Models for Safe text-to-Image Generation via Guideline Token Optimization
Detected text-Based Image Retrieval Approach for Textual Images
Detecting and reading text in natural scenes
Detecting Arbitrarily Oriented text Labels in Early Maps
Detecting dense text in natural images
Detecting Misspelled Words in Turkish text Using Syllable n-gram Frequencies
Detecting moving text in video using temporal information
Detecting natural scenes text via auto image partition, two-stage grouping and two-layer classification
Detecting Oriented text in Natural Images by Linking Segments
Detecting Origin Attribution for text-to-Image Diffusion Models
Detecting Signs of Depression Using Social Media texts Through an Ensemble of Ensemble Classifiers
Detecting Tampered Scene text in the Wild
Detecting text Areas and Decorative Elements in Ancient Manuscripts
Detecting text in Natural Image with Connectionist Text Proposal Network
Detecting text in Natural Scenes Based on a Reduction of Photometric Effects: Problem of Color Invariance
Detecting text in Natural Scenes Based on a Reduction of Photometric Effects: Problem of Text Detection
Detecting text in natural scenes with stroke width transform
Detecting text in Scene and Traffic Guide Panels With Attention Anchor Mechanism
Detecting text in the Wild with Deep Character Embedding Network
Detecting text Lines in Handwritten Documents
Detecting texts of arbitrary orientations in natural images
Detecting Traffic Information From Social Media texts With Deep Learning Approaches
Detecting Video texts Using Spatial-Temporal Wavelet Transform
Detection and Interpretation of text Information in Noisy Video Sequences
Detection and Location of Multicharacter Sequences in Lines of Imaged text
Detection and rectification of arbitrary shaped scene texts by using text keypoints and links
Detection and Segmentation of Antialiased text in Screen Images
Detection Approaches for Table Semantics in text
Detection of Curved text in Video: Quad Tree Based Method
Detection of curved text path based on the fuzzy curve-tracing (FCT) algorithm
Detection of Data Hiding in Binary text Images
Detection of text marks on moving vehicles
Detection of text on road signs from video
Detection of text Region and Segmentation from Natural Scene Images
Detection of text regions from digital engineering drawings
Determining Number of Clusters Using Firefly Algorithm with Cluster Merging for text Clustering
Deterministic Turing Machine for Context Sensitive Translation of Braille Codes to Urdu Text, A
Devanagari and Bangla text Extraction from Natural Scene Images
Devanagari text Recognition: A Transcription Based Formulation
Development and Evaluation of text Localization Techniques Based on Structural Texture Features and Neural Classifiers
Development of a Robust and Compact On-Line Handwritten Japanese text Recognizer for Hand-Held Devices
Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for text-To-Video Generation, The
DF-GAN: A Simple and Effective Baseline for text-to-Image Synthesis
diabetic retinopathy classification method based on image-text contrastive learning, A
Dial: Dense Image-text Alignment for Weakly Supervised Semantic Segmentation
DiCTI: Diffusion-based Clothing Designer via text-guided Input
Dictionary design for text image compression with JBIG2
Dictionary-guided Scene text Recognition
Diff-tracker: text-to-image Diffusion Models are Unsupervised Trackers
DiffAgent: Fast and Accurate text-to-Image API Selection with Large Language Model
DiffBoost: Enhancing Medical Image Segmentation via text-Guided Diffusion Model
Different Approaches to Bilingual text Classification Based on Grammatical Inference Techniques
Differentiable Duration Refinement Using Internal Division for Non-Autoregressive text-to-Speech
Differential-Processing Extraction Approach to text and Image Segmentation, A
Differentiation of alphabets in handwritten texts
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between text and Vision for Zero-Shot Image Captioning
Diffusion for Description or text to Image Generation
Diffusion for Layout Control in text to Image Generation
Diffusion in the Dark: A Diffusion Model for Low-Light text Recognition
Diffusion Models in 3D Synthesis, text to 3D Models
Diffusion Soup: Model Merging for text-to-image Diffusion Models
Diffusion-based Blind text Image Super-Resolution
Diffusion-Enhanced Test-Time Adaptation with text and Image Augmentation
Diffusion-SDF: text-to-Shape via Voxelized Diffusion
DiffusionCLIP: text-Guided Diffusion Models for Robust Image Manipulation
DiffusionGAN3D: Boosting text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors
Diffusionpen: Towards Controlling the Style of Handwritten text Generation
DiffusionRet: Generative text-Video Retrieval with Diffusion Model
DiffusionSTR: Diffusion Model for Scene text Recognition
Digital image analysis to enhance underwritten text in the Archimedes palimpsest
Digital Ink Recognition Server for Handwritten Japanese text, A
DINOv2 Meets text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Diphone Spanish text-to-speech synthesizer
Direct Regression Scene text Detector With Position-Sensitive Segmentation, A
Direct text to Speech Translation System Using Acoustic Units
Direct Unsupervised text Line Extraction from Colored Historical Manuscript Images Using DCT
DIRECT-3D: Learning Direct text-to-3D Generation on Massive Noisy 3D Data
Direct2.5: Diverse text-to-3D Generation via Multi-view 2.5D Diffusion
Discovering Low-Rank Shared Concept Space for Adapting text Mining Models
Discovering meaningful multimedia patterns with audio-visual concepts and associated text
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-text Retrieval
Discrete Joint Semantic Alignment Hashing for Cross-Modal Image-text Search
Discrete-continuous Action Space Policy Gradient-based Attention for Image-text Matching
Discrimination of machine-printed from handwritten text using simple structural characteristics
Discriminative Class Tokens for text-to-Image Diffusion Models
Discriminative Kernel-Based Approach to Rank Images from text Queries, A
Discriminative Model for On-line Handwritten Japanese text Retrieval, A
Discriminative Probing and Tuning for text-to-Image Generation
discriminative semi-Markov model for robust scene text recognition, A
Disease-Specific Extraction of text from Cardiac Echo Videos for Decision Support
DisenDreamer: Subject-Driven text-to-Image Generation With Sample-Aware Disentangled Tuning
Disentangled Clothed Avatar Generation from text Descriptions
Disentangled Contour Learning for Quadrilateral text Detection
Disentangling Inter- and Intra-Video Relations for Multi-Event Video-text Retrieval and Grounding
Disentangling Subject-Irrelevant Elements in Personalized text-to-Image Diffusion via Filtered Self-Distillation
Dissecting Deep Metric Learning Losses for Image-text Retrieval
Distilling Knowledge of Bidirectional Language Model for Scene text Recognition
Distinction between handwritten and machine-printed text based on the bag of visual words model
Distinguishing between Handwritten and Machine Printed text in Bank Cheque Images
Distinguishing mathematics notation from English text using computational geometry
Distinguishing text/Non-Text Natural Images with Multi-Dimensional Recurrent Neural Networks
Distributional semantics of objects in visual scenes in comparison to text
Diverse text-to-3d Synthesis with Augmented Text Embedding
Diversified text-to-image generation via deep mutual information estimation
DiZNet: An end-to-end text detection and recognition algorithm with detail in text zone
DM-GAN: Dynamic Memory Generative Adversarial Networks for text-To-Image Synthesis
DM-PCL: text-Driven Dual-Modal Prototype Consistency Learning for Weakly-Supervised Few-Shot Part Segmentation
DMF-GAN: Deep Multimodal Fusion Generative Adversarial Networks for text-to-Image Synthesis
Do text-free Diffusion Models Learn Discriminative Visual Representations?
DOC: text Recognition via Dual Adaptation and Clustering
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for text-rich Document Understanding
Document Analysis System Based on text Line Matching of Multiple OCR Outputs, A
Document dewarping via text-line based optimization
Document filtering for fast approximate string matching of erroneous text
Document Image De-warping Based on Detection of Distorted text Lines
Document Image Dewarping using Robust Estimation of Curled text Lines
Document image ground truth generation from electronic text
Document Mining Based on Semantic Understanding of text
Document Rectification Approach Dealing with Both Perspective Distortion and Warping Based on text Flow Curve Fitting, A
Document segmentation and classification into musical scores and text
Document skew detection/control system for printed document images containing a mixture of pure text lines and non-text portions
Does text attract attention on e-commerce images: A novel saliency prediction dataset and method
Domain adaptive multigranularity proposal network for text detection under extreme traffic scenes
Domain Generalization in CLIP via Learning with Diverse text Prompts
Domain-Complementary Prior With Fine-Grained Feedback for Scene text Image Super-Resolution
Don't Forget Me: Accurate Background Recovery for text Removal via Modeling Local-Global Context
Dot text Detection Based on FAST Points
Double supervision for scene text detection and recognition based on BMINet
Doubly Abductive Counterfactual Inference for text-Based Image Editing
Downtown Osaka Scene text Dataset
Dragtext: Rethinking Text Embedding in Point-Based Image Editing
Dream-in-Style: text-to-3D Generation Using Stylized Score Distillation
Dream3D: Zero-Shot text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
DreamArtist: Controllable One-Shot text-to-Image Generation via Positive-Negative Adapter
DreamAvatar: text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
DreamBlend: Advancing Personalized Fine-Tuning of text-to-Image Diffusion Models
DreamBooth3D: Subject-Driven text-to-3D Generation
DreamBooth: Fine Tuning text-to-Image Diffusion Models for Subject-Driven Generation
DreamControl: Control-Based text-to-3D Generation with 3D Self-Prior
Dreamdissector: Learning Disentangled text-to-3d Generation from 2d Diffusion Priors
Dreamdrone: text-to-image Diffusion Models Are Zero-shot Perpetual View Generators
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent text-to-Image Personalization
Dreammesh: Jointly Manipulating and texturing Triangle Meshes for Text-to-3d Generation
DreamPropeller: Supercharge text-to-3D Generation with Parallel Sampling
Dreamreward: text-to-3d Generation with Human Preference
Dreamscene360: Unconstrained text-to-3d Scene Generation with Panoramic Gaussian Splatting
Dreamscene: 3d Gaussian-based text-to-3d Scene Generation via Formation Pattern Sampling
DreamStone: Image as a Stepping Stone for text-Guided 3D Shape Generation
Dreamtext: High Fidelity Scene Text Synthesis
Dreamview: Injecting View-specific text Guidance Into Text-to-3d Generation
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable text-to-Image Diffusion Generation
DSTA: Reinforcing Vision-Language Understanding for Scene-text VQA With Dual-Stream Training Approach
DStext V2: A comprehensive video text spotting dataset for dense and small text
DTLLM-VLT: Diverse text Generation for Visual Language Tracking Based on LLM
DU-Net: A Dual U-Net for semantic text-guided style transfer
Dual Adversarial Inference for text-to-Image Synthesis
Dual Alignment Unsupervised Domain Adaptation for Video-text Retrieval
dual branch graphic text detection network based on progressive Domain adaptation, A
Dual Encoding for Video Retrieval by text
Dual Relation Network for Scene text Recognition
Dual Stream Relation Learning Network for Image-text Retrieval
Dual-branch scale disentanglement for text-video retrieval
Dual-Level Representation Enhancement on Characteristic and Context for Image-Text Retrieval
Dual-path CNN with Max Gated block for text-based person re-identification
Dual-Path Rare Content Enhancement Network for Image and text Matching
DUET: Detection Utilizing Enhancement for text in Scanned or Captured Documents
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations Without text Alignment
DVHMM: variable length text recognition error model
Dynamic Attention Analysis for Backdoor Detection in text-to-Image Diffusion Models
Dynamic Contrastive Distillation for Image-text Retrieval
Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End text Spotting
Dynamic Prompt Optimizing for text-to-Image Generation
Dynamic receptive field adaptation for scene text recognition
Dynamic recognition in the omni-writer frame: Application to hand-printed text recognition
Dynamic semantic prototype perception for text-video retrieval
Dynamic sparse and weight allocation-based text-driven person retrieval
Dynamic text Line Segmentation for Real-Time Recognition of Chinese Handwritten Sentences
Dynamic texts From UAV Perspective Natural Images
Dynamic Unilateral Dual Learning for text to Image Synthesis
Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking for Image-text Retrieval
Dynamic Word Based text Compression
DynTypo: Example-Based Dynamic text Effects Transfer
Dysen-VDM: Empowering Dynamics-Aware text-to-Video Diffusion with LLMs
E.T. the Exceptional Trajectories: text-to-Camera-Trajectory Generation with Character Awareness
E2VTS: Energy-Efficient Video text Spotting from Unmanned Aerial Vehicles
E4C: Enhance Editability for text-Based Image Editing by Harnessing Efficient CLIP Guidance
EA-VTR: Event-aware Video-text Retrieval
Eaformer: Scene text Segmentation with Edge-aware Transformers
Early feature stream integration versus decision level combination in a multiple classifier system for text line recognition
Earthquake Information Extraction and Comparison from Different Sources Based on Web text
EAST: An Efficient and Accurate Scene text Detector
ECLIPSE: A Resource-Efficient text-to-Image Prior for Image Generations
EDA: Explicit text-Decoupling and Dense Alignment for 3D Visual Grounding
Edge Approximation text Detector
Edge Based Binarization for Video text Images
Edge guided and Fourier attention-based Dual Interaction Network for scene text erasing
Edge-Based Features for Localization of Artificial Urdu text in Video Images
Edge-based method for text detection from complex document images
Edge-based text localization and character segmentation algorithms for automatic slab information recognition
EdgeRelight360: text-Conditioned 360-Degree HDR Image Generation for Real-Time On-Device Video Portrait Relighting
Edit Probability for Scene text Recognition
Editing Implicit Assumptions in text-to-Image Diffusion Models
Educational video understanding: Mapping handwritten text to textbook chapters
EESSO: Exploiting Extreme and Smooth Signals via Omni-frequency learning for text-based Person Retrieval
Effect of Improved Path Evaluation for On-line Handwritten Japanese text Recognition
Effective 3D text Recurrent Voting Generator for Metaverse, An
Effective and efficient video text extraction using key text points
Effective feature descriptor-based new framework for off-line text-independent writer identification
effective method for text line segmentation in historical document images, An
effective sentence-extraction technique using contextual information and statistical approaches for text summarization, An
Effective shrinkage of large multi-class linear SVM models for text categorization
Effective text localization in natural scene images with MSER, geometry-based grouping and AdaBoost
Effective Uyghur Language text Detection in Complex Background Images for Traffic Prompt Identification
Effective video text detection using line features
Effectively localize text in natural scene images
effectiveness of T5, GPT-2, and BERT on text-to-image generation task, The
Efficiency investigation of manifold matching for text document classification
Efficient Algorithm for Segmenting Warped text-Lines in Document Images, An
Efficient and Accurate Arbitrary-Shaped text Detection With Pixel Aggregation Network
Efficient and flexible text extraction from document pages
Efficient Automatic text Location Method and Content-Based Indexing and Structuring of Video Database
Efficient Character Skew Rectification in Scene text Images
Efficient Exploration of Image Classifier Failures with Bayesian Optimization and text-to-Image Models
Efficient Exploration of text Regions in Natural Scene Images Using Adaptive Image Sampling
Efficient graph-based dictionary search and its application to text-image searching
Efficient Image-text Retrieval via Keyword-Guided Pre-Screening
Efficient indexing for Query By String text retrieval
Efficient Industrial System for Vehicle Tyre (Tire) Detection and text Recognition Using Deep Learning, An
Efficient label-free pruning and retraining for text-VQA Transformers
Efficient Light Balancing Techniques for text Images in Video Presentation Systems
Efficient Method for Offline text Independent Writer Identification, An
Efficient Method for text Detection in Video Based on Stroke Width Similarity, An
Efficient Multimodal Aggregation Network for Video-text Retrieval, An
Efficient Scene text localization and recognition with local character refinement
Efficient side information encoding for text hardcopy documents
Efficient System for Hazy Scene text Detection using a Deep CNN and Patch-NMS, An
Efficient text analyser with prosody generator-driven approach for Mandarin text-to-speech
Efficient text Capture Method for Moving Robots Using DCT Feature and Text Tracking, An
Efficient text Classification Using Tree-structured Multi-linear Principal Component Analysis
Efficient text independent speaker recognition with wavelet feature selection based multilayered neural network using supervised learning algorithm
Efficient text localization in born-digital images by local contrast-based segmentation
Efficient text Segmentation Technique Based on Naive Bayes Classifier, An
Efficient text-based Person Search via Single-stage Identity-guided Attribute Parsing and Alignment
Efficient text-Guided 3D-Aware Generation With Score Distillation on 3D Distribution
Efficient text-to-Image Generation: An Adaptive Step Schedule Controller for Diffusion Models
Efficient Token-Guided Image-text Retrieval With Consistent Multimodal Contrastive Training
Efficient Transcript Mapping to Ease the Creation of Document Image Segmentation Ground Truth with text-Image Alignment
Efficient video text detection using edge features
Efficient video text recognition using multiple frame integration
Efficient Visual Search of Videos Cast as text Retrieval
EGO-LM: An efficient, generic, and out-of-the-box language model for handwritten text recognition
EgotextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free text-to-Video Generation
Eigenspace method for text retrieval in historical document images
ELITE: Encoding Visual Concepts into textual Embeddings for Customized Text-to-Image Generation
Elucidating Optimal Reward-Diversity Tradeoffs in text-to-Image Diffusion Models
Embedded Application for Degraded text Recognition, An
Embedded Bernoulli Mixture HMMs for Continuous Handwritten text Recognition
embedded method: Improve the relevance of text and face image with enhanced face attributes, An
Embil: An English-manipuri Bi-lingual Benchmark for Scene text Detection and Language Identification
Emergent Visual-semantic Hierarchies in Image-text Representations
EmoGen: Emotional Image Content Generation with text-to-Image Diffusion Models
EmoLabel: Semi-Automatic Methodology for Emotion Annotation of Social Media text
EmoSphere++: Emotion-Controllable Zero-Shot text-to-Speech Via Emotion-Adaptive Spherical Vector
Emotion Correlation Mining Through Deep Learning Models on Natural Language text
Emotion Recognition in text for 3-D Facial Expression Rendering
EmotionAlBERTo: Emotion Recognition of Italian Social Media texts Through BERT
Empathy Detection From text, Audiovisual, Audio or Physiological Signals: A Systematic Review of Task Formulations and Machine Learning Methods
Empirical Study and Analysis of text-to-image Generation Using Large Language Model-powered Textual Representation, An
Empirical Study of Scaling Law for Scene text Recognition, An
EMU: Effective Multi-Hot Encoding Net for Lightweight Scene text Recognition With a Large Character Set
Encapsulated Composition of text-to-Image and Text-to-Video Models for High-Quality Video Synthesis
Encoding Video Narration as text
End-to-End Approach for Handwriting Recognition: From Handwritten text Lines to Complete Pages, An
End-to-End Handwritten Paragraph text Recognition Using a Vertical Attention Network
End-to-end interactive joint model: Clause-phrase multi-task learning for suicidal ideation cause extraction (SICE) in Chinese Weibo text
end-to-end model for multi-view scene text recognition, An
End-to-end OCR text Re-organization Sequence Learning for Rich-text Detail Image Comprehension, An
End-to-End page-Level assessment of handwritten text recognition
End-to-End Pre-Training With Hierarchical Matching and Momentum Contrast for text-Video Retrieval
End-to-end scene text recognition
End-to-end scene text recognition using tree-structured models
End-to-end text recognition with convolutional neural networks
End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene text Recognition, An
End-to-End Video text Detection with Online Tracking
End-to-End Video text Spotting with Transformer
Enforcing similarity constraints with integer programming for better scene text recognition
Enhanced Active Contour Method for Locating text
Enhanced Feature Extraction Framework for Cross-Modal Image-text Retrieval, An
Enhanced Generative Structure Prior for Chinese text Image Super-Resolution
Enhanced Motion-text Alignment for Image-to-Video Transfer Learning
Enhanced Network Embedding with text Information
Enhanced Probabilistic Neural Network Approach Applied to text Classification, An
Enhanced Semantic Similarity Learning Framework for Image-text Matching
Enhanced text Extraction from Arabic Degraded Document Images Using EM Algorithm
Enhancement and feature extraction for images of incised and ink texts
Enhancement of camera captured text images with specular reflection
Enhancement of text images using a context based nonlinear interpolative vector quantization method
Enhancing 3D Fidelity of text-to-3D using Cross-View Correspondences
Enhancing Diffusion Models with text-encoder Reinforcement Learning
Enhancing energy minimization framework for scene text recognition with top-down cues
Enhancing fine-detail image synthesis from text descriptions by text aggregation and connection fusion module
Enhancing Handwritten text Recognition with N-gram sequence decomposition and Multitask Learning
Enhancing knowledge distillation for semantic segmentation through text-assisted modular plugins
Enhancing Micro Gesture Recognition for Emotion Understanding via Context-Aware Visual-Text Contrastive Learning
Enhancing Scene text Detection via Fused Semantic Segmentation Network with Attention
Enhancing Scene text Detectors with Realistic Text Image Synthesis Using Diffusion Models
Enhancing Semantic Fidelity in text-to-image Synthesis: Attention Regulation in Diffusion Models
Enhancing Tampered text Detection Through Frequency Feature Fusion and Decomposition
Enhancing text-Based Person Retrieval by Combining Fused Representation and Reciprocal Learning With Adaptive Loss Refinement
Enhancing text-like edges in digital images
Enhancing text-Video Retrieval Performance With Low-Salient but Discriminative Objects
Enhancing the Video Editing Capabilities of text-to-Video Generators Using DDPM Inversion
Enhancing Visual Grounding in Vision-Language Pre-Training With Position-Guided text Prompts
Enriching Video Captions With Contextual Text
Ensemble Methods to Improve the Performance of an English Handwritten text Line Recognizer
Episodic Learning Network for text Detection on Human Bodies in Sports Images, An
ER-Chat: A text-to-Text Open-Domain Dialogue Framework for Emotion Regulation
EraseNet: End-to-End text Removal in the Wild
Erasing Scene text with Weak Supervision
ERNIE-ViLG 2.0: Improving text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts
Erp-Aware text-To-360 Panorama Diffusion Model
ESA: External Space Attention Aggregation for Image-text Retrieval
Escaping Plato's Cave: Towards the Alignment of 3D and text Latent Spaces
ESIR: End-To-End Scene text Recognition via Iterative Image Rectification
Estate: Expert-Guided State text Enhancement for Zero-Shot Industrial Anomaly Detection
EStextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
Estimating the Orientation and Recovery of text Planes in a Single Image
Estimating the readability of handwritten text: A Support Vector Regression based approach
Estimating the Semantics via Sector Embedding for Image-text Retrieval
Estimating urban flooding depth by integrating multimodal image-text data: A segment-level direct preference optimization-based multimodal large language model
Estimation of Skew Angle in text-Image Analysis by SLIDE: Subspace-Based Line Detection
Evaluating a Hidden Markov Model of Syntax in a text Recognition System
Evaluating and Improving Compositional text-to-Visual Generation
Evaluating Data Attribution for text-to-Image Models
Evaluating OCR and Non-OCR text Representations for Learning Document Classifiers
Evaluating text-to-Image Matching using Binary Image Selection (BISON)
Evaluating text-to-Video Alignment: A Hierarchical Benchmark for Video Generation Models
Evaluating text-to-visual Generation with Image-to-text Generation
Evaluation of HMM-Based Techniques for the Recognition of Screen Rendered text, An
Evaluation of Model-Based Retrieval Effectiveness with OCR text
Evaluation of neural network language models in handwritten Chinese text recognition
Evaluation of the Concatenative Turkish text-to-Speech System
Evaluation of the Optimal Topic Classification for Social Media Data Combined with text Semantics: A Case Study of Public Opinion Analysis Related to COVID-19 with Microblogs
Event-Guided Procedure Planning from Instructional Videos with text Supervision
evidence-based model of saliency feature extraction for scene text analysis, An
Evolution Maps for Connected Components in text Documents
example-based prior model for text image super-resolution, An
Expanding Large Pre-trained Unimodal Models with Multimodal Information Injection for Image-text Multimodal Classification
Experimental Evaluation of OCR text Representations for Learning Document Classifiers, An
Experimental Investigation of text-Based CAPTCHA Attacks and Their Robustness, An
Experimental Study of Pruning Techniques in Handwritten text Recognition Systems, An
Experimental System for Office Document Handling and text Recognition, An
Experiments in text Recognition with Binary N-Gram and Viterbi Algorithms
Experiments in text Recognition with the Modified Viterbi Algorithm
Experiments in the Recognition of Handprinted text: Part I Character Recognition
Explain2Attack: text Adversarial Attacks via Cross-Domain Interpretability
Explaining Semantic text Similarity in Knowledge Graphs
Explicitly-Decoupled text Transfer With Minimized Background Reconstruction for Scene Text Editing
Exploiting Color Information for Better Scene text Recognition
Exploiting colour information for better scene text detection and recognition
Exploiting Unlabeled Videos for Video-text Retrieval via Pseudo-Supervised Learning
Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-text Consistency and Domain Distribution Gap
Exploring Effective Interactive text-Based Video Search in vitrivr
Exploring Fine-Grained Visual-text Feature Alignment With Prompt Tuning for Domain-Adaptive Object Detection
Exploring Global and Local Linguistic Representations for text-to-Image Synthesis
Exploring Phrase Grounding without Training: Contextualisation and Extension to Text-Based Image Retrieval
Exploring Phrase-level Grounding with text-to-image Diffusion Model
Exploring Pre-trained text-to-video Diffusion Models for Referring Video Object Segmentation
Exploring Sparse MoE in GANs for text-conditioned Image Synthesis
Exploring Sparse Spatial Relation in Graph Inference for text-Based VQA
Exploring text representation impact on K-means based arabic text documents clustering
Exploring text-to-Motion Generation with Human Preference
Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene text Detection
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for text-to-Image Synthesis
Exploring The Potential of Vision-Language Models for Pure-Image and text-Guided-Image Saliency Prediction
Exploring the Spatial Distribution Characteristics of Emotions of Weibo Users in Wuhan Waterfront Based on Gender Differences Using Social Media texts
Exploring the Spatiotemporal Patterns of Residents' Daily Activities Using text-Based Social Media Data: A Case Study of Beijing, China
Exposing fake images generated by text-to-image diffusion models
Expressive Image Generation and Editing with Rich text
Expressive text-to-Image Generation with Rich Text
Expressive visual text-to-speech as an assistive technology for individuals with autism spectrum conditions
Expressive Visual text-to-Speech Using Active Appearance Models
Extended Bi-gram Features in text Categorization
Extended character defect model for recognition of text from maps
Extending TrOCR for text Localization-Free OCR of Full-Page Scanned Receipt Images
External Word Segmentation of Off-Line Handwritten text Lines
Extracting Curved text Lines Using Local Linearity of Text Line
Extracting Spatio-Temporal Information from Chinese Archaeological Site text
Extracting text From Greyscale Images
Extracting text from WWW Images
Extraction and Recognition of Bangla texts from Natural Scene Images Using CNN
Extraction of Handwritten text from Carbon Copy Medical Form Images
Extraction of line-word-character segments directly from run-length compressed printed text-documents
Extraction of Lines of texts in Unconstrained Handwritten Documents
Extraction of Nom text Regions from Stele Images Using Area Voronoi Diagram
Extraction of Pluvial Flood Relevant Volunteered Geographic Information (VGI) by Deep Learning from User Generated texts and Photos
Extraction of Projection Profile, Run-Histogram and Entropy Features Straight from Run-Length Compressed text-Documents
Extraction of special effects caption text events from digital video
Extraction of Spelling Variations from Language Structure for Noisy text Correction
Extraction of text boxes from Engineering Drawings
Extraction of text Lines and Text Blocks on Document Images Based on Statistical Modeling
Extraction of text Words in Document Images Based on a Statistical Characterization
Extraction Of Thematically Relevant text From Images
extractive text summarization technique for Bengali document(s) using K-means clustering algorithm, An
Extractive text Summarization Using Topological Features
Extrapolate azimuth angles: text and edge guided ISAR image generation based on foundation model
Extremely Low-Light Image Enhancement with Scene text Restoration
Eyes Closed, Safety on: Protecting Multimodal LLMs via Image-to-text Transformation
FA-GAN: Feature-Aware GAN for text to Image Synthesis
Face typing: Vision-based perceptual interface for hands-free text entry with a scrollable virtual keyboard
FaceCLIP: Facial Image-to-Video Translation via a Brief text Description
FaceCLIPNeRF: text-driven 3D Face Manipulation using Deformable Neural Radiance Fields
Faces a la Carte: text-to-Face Generation via Attribute Disentanglement
Faces that Speak: Jointly Synthesising Talking Face and Speech from text
Facial Action Unit Recognition Enhanced by text Descriptions of FACS
Facsimile device with skew correction and text line direction detection
Factorizing text-to-video Generation by Explicit Image Conditioning
Factors in Emotion Recognition With Deep Learning Models Using Speech and text on Multiple Corpora
FakeInversion: Learning to Detect Images from Unseen text-to-Image Models by Inverting Stable Diffusion
Fantasia3D: Disentangling Geometry and Appearance for High-quality text-to-3D Content Creation
FARNet: Fragmented affinity reasoning network of text instances for arbitrary shape text detection
Fashion Image Retrieval with text Feedback by Additive Attention Compositional Learning
Fast and accurate scene text understanding with image binarization and off-the-shelf OCR
Fast and Accurate text Detection in Natural Scene Images with User-Intention
Fast and effective text detection
fast and efficient method for extracting text paragraphs and graphics from unconstrained documents, A
Fast and Efficient text Steganalysis Method, A
Fast and Flexible Statistical Method for text Extraction in Document Pages, A
Fast and memory efficient text image compression with JBIG2
Fast and robust text detection in images and video frames
fast and robust text spotter, A
Fast and simple text replacement algorithm for text-based augmented reality
Fast Appearance-Based Full-text Search Method for Historical Newspaper Images, A
Fast Approximate Modelling of the Next Combination Result for Stopping the text Recognition in a Video
Fast Coding-Mode Selection and CU-Depth Prediction Algorithm Based on text-Block Recognition for Screen Content Coding
fast hierarchical method for multi-script and arbitrary oriented scene text extraction, A
Fast Lexicon-Based Scene text Recognition with Sparse Belief Propagation
fast multiresolution text line and non text-line structures extraction and discrimination scheme for document image analysis, A
Fast online incremental approach of unseen place classification using disjoint-text attribute prediction
Fast perspective recovery of text in natural scenes
Fast scene text localization by learning-based filtering and verification
Fast Selection of Small and Precise Candidate Sets from Dictionaries for text Correction Tasks
Fast Supervised Topic Models for Short text Emotion Detection
Fast text categorization using concise semantic analysis
Fast text line detection by finding linear connected components on Canny edge image
Fast text line extraction in document images
Fast text/graphics resolution improvement using wavelet based denoising and chain-code table lookup
Fast Uyghur text detection in videos based on learning of baseline feature
Fast Uyghur text Detector for Complex Background Images, A
Fast(er) Reconstruction of Shredded text Documents via Self-Supervised Deep Asymmetric Metric Learning
Fast, Accurate, and Lightweight Memory-Enhanced Embedding Learning Framework for Image-text Retrieval
FAST: Facilitated and Accurate Scene text Proposals through FCN Guided Pruning
FastCLIPstyler: Optimisation-free text-based Image Style Transfer Using Style Representations
FastEdit: fast text-guided single-image editing via semantic-aware diffusion fine-tuning
FASTER: A Font-Agnostic Scene text Editing and Rendering Framework
FAStext: Efficient Unconstrained Scene Text Detector
FastFaceCLIP: A lightweight text-driven high-quality face image manipulation
FastVideoEdit: Leveraging Consistency Models for Efficient text-to-Video Editing
FateZero: Fusing Attentions for Zero-shot text-based Video Editing
FC-Render: Adaptive Font- and Color-Aware text Diffusion Model
FDS: Frequency-Aware Denoising Score for text-Guided Latent Diffusion Image Editing
Feature Embedding Based text Instance Grouping for Largely Spaced and Occluded Text Detection
Feature extracted from wavelet decomposition using biorthogonal Riesz basis for text-independent speaker recognition
Feature extracted from wavelet eigenfunction estimation for text-independent speaker recognition
Feature First: Advancing Image-text Retrieval Through Improved Visual Features
Feature Fusion Network for Scene text Detection
Feature Representations for Scene text Character Recognition: A Comparative Study
Feature selection for event extraction in biomedical text
Feature selection to recognize text from palm leaf manuscripts
Feature selection using hybrid poor and rich optimization algorithm for text classification
Feature subset selection using naive Bayes for text classification
Feature Weight Optimization and Pruning in Historical text Recognition
FedSH: Towards Privacy-Preserving text-Based Person Re-Identification
FeedEdit: text-Based Image Editing with Dynamic Feedback Regulation
FERGI: Automatic Scoring of User Preferences for text-to-Image Generation from Spontaneous Facial Expression Reaction
FETNet: Feature erasing and transferring network for scene text removal
Few Could Be Better Than All: Feature Sampling and Grouping for Scene text Detection
Few shots are all you need: A progressive learning approach for low resource handwritten text recognition
Few-shot Hierarchical text Classification with Bidirectional Path Constraint by label weighting
Few-Shot text Style Transfer via Deep Feature Similarity
Fg-T2M++: LLMs-Augmented Fine-Grained text Driven Human Motion Generation
Fg-T2M: Fine-Grained text-Driven Human Motion Generation via Diffusion Model
FHT: An Unconstraint Farsi Handwritten text Database
Find More Accurate text Boundary for Scene Text Detection
Find text in Documents
Find text in Video Scenes
Finding Hidden Semantics of text Tables
Finding structure in noisy text: Topic classification and unsupervised clustering
Finding text In Images
Finding text in Natural Scenes by Figure-Ground Segmentation
Finding text Regions using Localised Statistical Measures
Fine-Grained Erasure in text-To-Image Diffusion-Based Foundation Models
Fine-Grained Image-text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation
Fine-grained Image-text Matching by Cross-modal Hard Aligning Network
Fine-grained Image-text Retrieval via Complementary Feature Learning
Fine-Grained Image-text Retrieval via Discriminative Latent Space Learning
Fine-grained semantic oriented embedding set alignment for text-based person search
Fine-Grained Video-text Retrieval With Hierarchical Graph Reasoning
Fine-Grained Visual text Prompting
Fine-Granularity Alignment for text-Based Person Retrieval Via Semantics-Centric Visual Division
Fine-Tuning text-To-Image Diffusion Models for Class-Wise Spurious Feature Generation
FineControlNet: Fine-level text Control for Image Generation with Spatially Aligned Text Control Injection
FineLIP: Extending CLIP's Reach via Fine-Grained Alignment with Longer text Inputs
Finematch: Aspect-based Fine-grained Image and text Mismatch Detection and Correction
Fisher Linear Discriminant Analysis for text-image combination in multimedia information retrieval
FlashEval: Towards Fast and Accurate Evaluation of text-to-Image Diffusion Generative Models
Flexible text Recovery from Degraded Typewritten Historical Documents
Flick Typing: A New VR text Input System Based on Space Gestures
FlipSketch: Flipping Static Drawings to text-Guided Sketch Animations
Focal text: an Accurate Text Detection with Focal Loss
Focal Visual-text Attention for Memex Question Answering
Focal Visual-text Attention for Visual Question Answering
Focus Entirety and Perceive Environment for Arbitrary-Shaped text Detection
Focus on Scene text Using Deep Reinforcement Learning
Focus-N-Fix: Region-Aware Fine-Tuning for text-to-Image Generation
FocusCLIP: Focusing on Anomaly Regions by Visual-text Discrepancies
Focusing Attention: Towards Accurate text Recognition in Natural Images
Font Recognition and Contextual Processing for More Accurate Text Recognition
Font Watermarking Network for text Images
Fontender: Interactive Japanese text Design with Dynamic Font Fusion Method for Comics
Food3D: text-Driven Customizable 3D Food Generation With Gaussian Splatting
Foreground and background separated image style transfer with a single text condition
Foreground and text-lines Aware Document Image Rectification
Foreground text Extraction in Color Document Images for Enhanced Readability
Foreground text segmentation in complex color document images using Gabor filters
Forged text detection in video, scene, and document images
Forget-Me-Not: Learning to Forget in text-to-Image Diffusion Models
Formal Distance vs. Association Strength in text Processing
Formalization of On-Line Handwritten Japanese text Recognition Free from Line Direction Constraint, A
FOTS: Fast Oriented text Spotting with a Unified Network
Fourier Contour Embedding for Arbitrary-Shaped text Detection
Fractals Based Multi-Oriented text Detection System for Recognition in Mobile Video Images
Fractional poisson enhancement model for text detection and recognition in video frames
Framework for Detecting and Selecting text Line Candidates of Correct Orientation, A
Framework for Performance Evaluation of Face, text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol
Free-editor: Zero-shot text-driven 3d Scene Editing
Free-text keystroke dynamics authentication for Arabic language
FREE: A Fast and Robust End-to-End Video text Spotter
FreeControl: Training-Free Spatial Control of Any text-to-Image Diffusion Model with Any Condition
Freemotion: A Unified Framework for Number-Free text-to-Motion Synthesis
Frequency-selective countnet: Enhancing text-guided object counting with frequency features
Frequent Itemsets Methods for text Clustering
Fringe Map Based text Line Segmentation of Printed Telugu Document Images
From External to Internal: Structuring Image for text-to-Image Attributes Manipulation
From senses to texts: An all-in-one graph-based approach for measuring semantic similarity
From text Detection to Text Segmentation: A Unified Evaluation Scheme
From text to Speech: A Multimodal Cross-Domain Approach for Deception Detection
From text to Video: Exploiting Mid-Level Semantics for Large-Scale Video Classification
From Two to One: A New Scene text Recognizer with Visual Language Modeling Network
From Words to Structured Visuals: A Benchmark and Framework for text-to-Diagram Generation and Editing
FT2TF: First-Person Statement text-to-Talking Face Generation
Full-text Access to Historical Newspapers
Full-text Search System for Images of Hand-Written Cursive Documents, A
Fully convolutional network with dilated convolutions for handwritten text line segmentation
Fully convolutional recurrent network for handwritten Chinese text recognition
Fully Shareable Scene text Recognition Modeling for Horizontal and Vertical Writing
Fundamental Visual Concept Learning From Correlated Images and text
Furniture-geek: Understanding fine-grained furniture attributes from freely associated text and tags
Further explorations in text alignment with handwritten documents
Further reduced form of wavelet feature for text independent speaker recognition
Fused text Segmentation Networks for Multi-oriented Scene Text Detection
Fusion Encoder with Multi-Task Guidance for Cross-Modal text-Image Retrieval in Remote Sensing, A
Fusion of Speech, Faces and text for Person Identification in TV Broadcast
Fusion Strategy for the Single Shot text Detector, A
fuzzy find matching tool for image text analysis, A
Fuzzy Inference-Based Models for Extractive text Summarization
Fuzzy Semantics for Arbitrary-Shaped Scene text Detection
Fuzzy text/non-text classification of document images based on morphological operator, wavelet transform, and strong feature vector
GA-DAN: Geometry-Aware Domain Adaptation Network for Scene text Detection and Recognition
Gabor filter based block energy analysis for text extraction from digital document images
GADNet: Improving image-text matching via graph-based aggregation and disentanglement
GALIP: Generative Adversarial CLIPs for text-to-Image Synthesis
Gamma correction acceleration for real-time text extraction from complex colored images
GAN-TSTEGA: text Steganography Based on Generative Adversarial Networks
GANFusion: Feed-Forward text-to-3D with Diffusion in GAN Space
Garmentaligner: text-to-garment Generation via Retrieval-augmented Multi-level Corrections
Gated Cross Word-visual Attention-driven Generative Adversarial Networks for text-to-image Synthesis
Gatha: Relational Loss for enhancing text-based style transfer
Gaussctrl: Multi-view Consistent text-driven 3d Gaussian Splatting Editing
Gaussian Constrained Attention Network for Scene text Recognition
Gaussian mixture modeling and learning of neighboring characters for multilingual text extraction in images
Gaussian Mixture Modeling of Neighbor Characters for Multilingual text Extraction in Images
GaussianDreamer: Fast Generation from text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
GaussianEditor: Editing 3D Gaussians Delicately with text Instructions
Gaussians-to-Life: text-Driven Animation of 3D Gaussian Splatting Scenes
GCNs-Based Context-Aware Short Text Similarity Model
Gender Bias in text-to-Video Generation Models: A Case Study of Sora
General and domain-specific techniques for detecting and recognizing superimposed text in video
general approach for multi-oriented text line extraction of handwritten documents, A
Generalized Interpolative Vector Quantization Method for Jointly Optimal Quantization, Interpolation, and Binarization of text Images, A
Generalizing Edit Distance to Incorporate Domain Information: Handwritten text Recognition as a Case-Study
Generalizing to Unseen Domains via text-guided Augmentation: A Training-free Approach
Generatect: text-conditional Generation of 3d Chest CT Volumes
Generating Diverse and Natural 3D Human Motions from text
Generating Holistic 3D Scene Abstractions for text-Based Image Retrieval
Generating Human Interaction Motions in Scenes with text Control
Generating Human Motion in 3D Scenes from text Descriptions
Generation of Viewed Image Captions From Human Brain Activity Via Unsupervised text Latent Space
Generative Adversarial Approach for Zero-Shot Learning from Noisy texts, A
Generative Adversarial Network for text-to-Face Synthesis and Manipulation with Pretrained BERT Model
Generative Adversarial Networks Based on Dynamic Word-Level Update for text-to-Image Synthesis
Generative and Discriminative Fuzzy Restricted Boltzmann Machine Learning for text and Image Classification
Generative Image Steganography Based on text-to-Image Multimodal Generative Model
Generative Negative text Replay for Continual Vision-Language Pretraining
Generative Photography: Scene-Consistent Camera Control for Realistic text-to-Image Synthesis
Generative text Convolutional Neural Network for Hierarchical Document Representation Learning
generic method for determining the up/down orientation of text in Roman and non-Roman scripts, A
generic method for determining up/down orientation of text in Roman and non-Roman scripts, A
GeoAnnotator: A Collaborative Semi-Automatic Platform for Constructing Geo-Annotated text Corpora
Geometry Normalization Networks for Accurate Scene text Detection
Geometry-Aware Scene text Detection with Instance Transformation Network
Geospatial Semantics Analysis of the Qinghai-Tibetan Plateau Based on Microblog Short texts
Geotagging text Content With Language Models and Feature Mining
Getting it Right: Improving Spatial Consistency in text-to-image Models
GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-text Contexts
GLASS: Global to Local Attention for Scene-text Spotting
GLIGEN: Open-Set Grounded text-to-Image Generation
Global-aware Fragment Representation Aggregation Network for image-text retrieval
Global-local prompts guided image-text embedding, alignment and aggregation for multi-label zero-shot learning
Global-Shared text Representation Based Multi-Stage Fusion Transformer Network for Multi-Modal Dense Video Captioning
Glyph-BYT5: A Customized text Encoder for Accurate Visual Text Rendering
GlyphMastero: A Glyph Encoder for High-Fidelity Scene text Editing
goal-oriented verification-based approach for target text line extraction from a document image captured by a pen scanner, A
Good Seed Makes a Good Crop: Discovering Secret Seeds in text-to-Image Diffusion Models
GPT-4V(ision) is a Human-Aligned Evaluator for text-to-3D Generation
GPT-Based text-to-SQL for Spatial Databases
GPT4Motion: Scripting Physical Motions in text-to-Video Generation via Blender-Oriented GPT Planning
GradBias: Unveiling Word Influence on Bias in text-to-Image Generative Models
Gradient Difference Based Technique for Video text Detection, A
Gradient Vector Flow and Grouping-Based Method for Arbitrarily Oriented Scene text Detection in Video Images
Gradient-based approach to offline text-independent Persian writer identification
GraDual: Graph-based Dual-modal Representation for Image-text Matching
Granularity-Aware Single-Point Scene text Spotting With Sequential Recurrence Self-Attention
Graph based method for Arabic text summarization
Graph Clustering-Based Ensemble Method for Handwritten text Line Segmentation
Graph Structured Network for Image-text Matching
Graph-based Method to Remove Interferential Curve From text Image, A
Graph-Based Segmentation and Feature-extraction Framework for Arabic text Recognition, A
Graph-Based text Segmentation Using a Selected Channel Image
Graph-empowered text-to-SQL generation on Electronic Medical Records
Graphical Figure Classification Using Data Fusion for Integrating text and Image Features
Graphics and Scene text Classification in Video
Graphological Analysis of Handwritten text Documents for Human Resources Recruitment
Grid Diffusion Models for text-to-Video Generation
Grit: A Generative Region-to-text Transformer for Object Understanding
Grounded Image text Matching with Mismatched Relation Reasoning
Grounded text-to-Image Synthesis with Attention Refocusing
Grounding Visual Representations with texts for Domain Generalization
Grouping text lines in freeform handwritten notes
Grouping Using Factor Graphs: An Approach for Finding text with a Camera Phone
GroupViT: Semantic Segmentation Emerges from text Supervision
GSAM+Cutie: text-Promptable Tool Mask Annotation for Endoscopic Video
Guided text Spotting for Assistive Blind Navigation in Unfamiliar Indoor Environments
Guiding Prototype Networks with label semantics for few-shot text classification
Gvgen: text-to-3d Generation with Volumetric Representation
HACG: Leveraging Hierarchical Alignment and Caption Generation for text-Video Retrieval
HairCLIP: Design Your Hair by text and Reference Image
Hallucination Elimination and text Annotation Framework for Large Vision-Language Models in Traffic Scenarios
HAM: Hidden Anchor Mechanism for Scene text Detection
Hand-Gesture Based text Input for Wearable Computers
Hand-written text recognition based on a new formulation
HanDiffuser: text-to-Image Generation with Realistic Hand Appearances
Handwriting Recognition: Tablet PC text Input
handwritten ancient text detector based on improved feature pyramid network, A
Handwritten and Machine Printed text Separation in Document Images Using the Bag of Visual Words Paradigm
Handwritten and Printed text Segmentation: A Signature Case Study
Handwritten and Printed text Separation: Linearity and Regularity Assessment
Handwritten and Typewritten text Identification and Recognition Using Hidden Markov Models
Handwritten Arabic text recognition using Deep Belief Networks
Handwritten Arabic text recognition using multi-stage sub-core-shape HMMs
Handwritten Chinese text line segmentation by clustering with distance metric learning
Handwritten Chinese text Recognition by Integrating Multiple Contexts
Handwritten Chinese/Japanese text Recognition Using Semi-Markov Conditional Random Fields
Handwritten document image segmentation into text lines and words
Handwritten Signature and text based User Verification using Smartwatch
Handwritten text Generation from Visual Archetypes
Handwritten text Generation via Disentangled Representations
Handwritten text Line Identification in Indian Scripts
Handwritten text Line Segmentation by Shredding Text into its Lines
Handwritten text Localization in Skewed Documents
Handwritten text Recognition for Marriage Register Books
Handwritten text recognition through writer adaptation
Handwritten text Retrieval Using Two-Stage Pattern Matching with Handwritten Query
Handwritten text segmentation using average longest path algorithm
Handwritten text Segmentation Using Elastic Shape Analysis
Handwritten text Separation from Annotated Machine Printed Documents Using Markov Random Fields
Handwritten texts for Personality Identification Using Convolutional Neural Networks
Harivo: Harnessing text-to-image Models for Video Generation
Harnessing text Insights With Visual Alignment for Medical Image Segmentation
Harnessing text-to-image Diffusion Models for Category-agnostic Pose Estimation
Harnessing the Power of MLLMs for Transferable text-to-Image Person ReID
Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing text Spotting Performance
Harnessing the Power of text-image Contrastive Models for Automatic Detection of Online Misinformation
Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity text-to-Image Synthesis
HD-Fusion: Detailed text-to-3D Generation Leveraging Multiple Noise Estimation
Head-Mounted Device for Recognizing text in Natural Scenes, A
HeadEvolver: text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation
Headstudio: text to Animatable Head Avatars with 3d Gaussian Splatting
Heterogeneous Graph to Abstract Syntax Tree Framework for text-to-SQL, A
HexaGen3D: StableDiffusion is One Step Away from Fast and Diverse text-to-3D Generation
HFENet: Hybrid Feature Enhancement Network for Detecting texts in Scenes and Traffic Panels
HGAN: Hierarchical Graph Alignment Network for Image-text Retrieval
HGR-Net: Hierarchical Graph Reasoning Network for Arbitrary Shape Scene text Detection
Hi-SAM: Marrying Segment Anything Model for Hierarchical text Segmentation
Hidden Bawls, Whispers, and Yelps: Can text Convey the Sound of Speech, Beyond Words?
Hidden Markov Model for Language Syntax in text Recognition, A
Hidden Markov Model-Based Ensemble Methods for Offline Handwritten text Line Recognition
Hierarchical Adaptive Filtering Network for text Image Specular Highlight Removal
Hierarchical Feature Aggregation Based on Transformer for Image-text Matching
Hierarchical online NMF for detecting and tracking topic hierarchies in a text stream
Hierarchical Shape Primitive Features for Online text-independent Writer Identification
Hierarchical Spatio-temporal Decoupling for text-to-Video Generation
Hierarchical text Spotter for Joint Text Spotting and Layout Analysis
Hierarchically-Fused Generative Adversarial Network for text to Realistic Image Synthesis
HierCode: A lightweight hierarchical codebook for zero-shot Chinese text recognition
HierLabelNet: A Two-Stage LLMs Framework with Data Augmentation and Label Selection for Geographic text Classification
high-capacity text watermarking method based on geometric micro-distortion, A
High-Dimensional Access Method for Approximated Similarity Search in text Mining, A
Highly Transparent and Secure Scheme for Concealing text Within Audio
Histogram-Based Two-Stage Adaptive Character Segmentation for Transcription of Inter-Point Hindi Braille to text, A
Historical Handwritten text Images Word Spotting Through Sliding Window HOG Features
HiT: Hierarchical Transformer with Momentum Contrast for Video-text Retrieval
HMM-Based Approach for text Region Detection in Coded Video Bitstreams
HMM-Based Multi Oriented text Recognition in Natural Scene Image
HMM-Based Recognizer with Segmentation-free Strategy for Unconstrained Chinese Handwritten text
HOI-Diff: text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models
HOIAnimator: Generating text-Prompt Human-Object Animations Using Novel Perceptive Diffusion Models
Holistic Features are Almost Sufficient for text-to-Video Retrieval
Holistic Vertical Regional Proposal Network for Scene text Detection
HOVER: Hyperbolic Video-text Retrieval
How Good Is Good Enough? Establishing Quality Thresholds for the Automatic text Analysis of Retro-Digitized Comics
How is Visual Attention Influenced by text Guidance? Database and Model
How Much Handwritten text Is Needed for Text-Independent Writer Verification and Identification
How to Make Cross Encoder a Good Teacher for Efficient Image-text Retrieval?
HowTo100M: Learning a text-Video Embedding by Watching Hundred Million Narrated Video Clips
HRS-Bench: Holistic, Reliable and Scalable Benchmark for text-to-Image Models
HTD: A Fast Human-centered text-locating Method for Auxiliary Reading
HTR-VT: Handwritten text recognition with vision transformer
Human Motion Aware text-to-Video Generation with Explicit Camera Control
Human Preference Score: Better Aligning text-to-image Models with Human Preference
Human-centered Interactive Learning via MLLMs for text-to-Image Person Re-identification
HumanGaussian: text-Driven 3D Human Generation with Gaussian Splatting
Hybrid Algorithm for Contextual Text Recognition, A
Hybrid approach for Farsi/Arabic text detection and localisation in video frames
Hybrid Approach to Detect and Localize texts in Natural Scene Images, A
Hybrid Approach to Detect texts in Natural Scenes by Integration of a Connected-Component Method and a Sliding-Window Method, A
Hybrid approach to efficient text extraction in complex color images
Hybrid Chinese/English text detection in images and video frames
Hybrid Contextual Text Recognition with String Matching
Hybrid Deep Architecture for Robust Recognition of text Lines of Degraded Printed Documents, A
hybrid method based on estimation of distribution algorithms to train convolutional neural networks for text categorization, A
Hybrid Network For End-To-End text-Independent Speaker Identification
Hybrid R-BILSTM-C Neural Network Based text Steganalysis, A
Hybrid word/Part-of-Arabic-Word Language Models for arabic text document recognition
HybridEditDif: text and Exemplar Guided Image Editing with Diffusion Models
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and texts
Hyper-3DG: text-to-3D Gaussian Generation via Hypergraph
HyperDreamBooth: HyperNetworks for Fast Personalization of text-to-Image Models
HyperStyle3D: text-Guided 3D Portrait Stylization via Hypernetworks
Hypothesis Preservation Approach to Scene text Recognition with Weighted Finite-State Transducer
hypothesize-and-verify framework for text recognition using deep recurrent neural networks, A
Hytext: A Scene-Text Extraction Method for Video Retrieval
I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for text-Guided Multi-Mask Inpainting
I2T2I: Learning text to image synthesis with textual data augmentation
I2T: Image Parsing to text Description
I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-Shaped Scene text Detection
IAM-OnDB: An on-line English sentence database acquired from handwritten text on a whiteboard
IBM Rich Transcription 2007 Speech-to-text Systems for Lecture Meetings, The
IBN-STR: A Robust text Recognizer for Irregular Text in Natural Scenes
ICA Based Approach for Complex Color Scene text Binarization, An
ICDAR 2005 text locating competition results
ICDAR 2011 Robust Reading Competition - Challenge 1: Reading text in Born-Digital Images (Web and Email)
ICDAR 2011 Robust Reading Competition Challenge 2: Reading text in Scene Images
ICDAR 2011: Arabic Recognition Competition: Multi-font Multi-size Digitally Represented text
ICDAR 2015 competition HTRtS: Handwritten text Recognition on the tranScriptorium dataset
ICDAR 2015 competition on text line detection in historical documents
ICDAR 2015 contest on MultiSpectral text Extraction (MS-TEx 2015)
ICDAR2013 Competition on Multi-font and Multi-size Digitally Represented Arabic text
ICDAR2015 competition on text Image Super-Resolution
ICPR 2020 Competition on text Block Segmentation on a NewsEye Dataset
ICPR 2020 Competition on text Block Segmentation on a Newseye Dataset
ICPR2016 contest on Arabic text detection and Recognition in video frames - AcTiVComp
ICPR2020 Competition on text Detection and Recognition in Arabic News Video Frames
ICT-QA: Question Answering Over Multi-Modal Contexts Including Image, Chart, and Text Modalities
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of text-to-Image Models
IDBNet: Improved differentiable binarisation network for natural scene text detection
IDEA: Inverted text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
Identification of personality traits from handwritten text documents using multi-label classification models
Identifying Handwritten text in Mixed Documents
Identifying SCADA Systems and Their Vulnerabilities on the Internet of Things: A text-Mining Approach
Identifying visual attributes for object recognition from text and taxonomy
Identity-Preserving text-To-Video Generation by Frequency Decomposition
iEdit: Localised text-guided Image Editing with Weak Supervision
Illegible text to Readable Text: An Image-to-Image Transformation using Conditional Sliced Wasserstein Adversarial Networks
Illusion of Unlearning: The Unstable Nature of Machine Unlearning in text-to-Image Diffusion Models, The
iLogBook: Enabling text-Searchable Event Query Using Sparse Vehicle-Mounted GPS Data
Im2text and Text2Im: Associating Images and Texts for Cross-Modal Retrieval
Image and text Coupling for Creating Electronic Books from Manuscripts
Image and text fusion for UPMC Food-101 using BERT and CNNs
Image Binarization for End-to-End text Understanding in Natural Images
Image clustering using generated text centroids
Image Dataset of text Patches in Everyday Scenes, An
Image Generation Method of Bird text Based on Improved StackGAN
Image is Worth Multiple Words: Multi-Attribute Inversion for Constrained text-To-Image Synthesis, An
Image Over text: Transforming Formula Recognition Evaluation with Character Detection Matching
Image Overlay text Detection Based on JPEG Truncation Error Analysis
Image Retrieval for Visual Localization via Scene text Detection and Logo Filtering
Image Search With text Feedback by Visiolinguistic Attention Learning
Image Segmentation Using text and Image Prompts
Image text Detection Using a Bandlet-Based Edge Detector and Stroke Width Transform
Image-based Document Vectors for text Retrieval
Image-text Co-Decomposition for Text-Supervised Semantic Segmentation
Image-text Embedding Learning via Visual and Textual Semantic Reasoning
Image-text feature learning for unsupervised visible-infrared person re-identification
Image-text Matching, Image Text Retrieval, Image-Text Retrieval
Image-text Multimodal Emotion Classification via Multi-View Attentional Network
Image-text Pre-Training for Logo Recognition
Image-text Retrieval With Cross-Modal Semantic Importance Consistency
Image-text-Image Knowledge Transfer for Lifelong Person Re-Identification With Hybrid Clothing States
Image-to-Character-to-Word Transformers for Accurate Scene text Recognition
Image-to-text Conversion and Aspect-Oriented Filtration for Multimodal Aspect-Based Sentiment Analysis
Image/text filtering system and method
Imaged Document text Retrieval Without OCR
Imagen Editor and EditBench: Advancing and Evaluating text-Guided Image Inpainting
Imagic: text-Based Real Image Editing with Diffusion Models
IMMA: Immunizing text-to-image Models Against Malicious Adaptation
Impact of Character Models Choice on Arabic text Recognition Performance
Impact of OCR Accuracy and Feature Transformation on Automatic text Classification, The
Impact of OCR Errors on Automated Classification of OCR Japanese texts with Parts-of-Speech Analysis, An
Impact of online handwriting recognition performance on text categorization
Impact of Pre-Processing on Recognition of Cursive Video text
Imperceptible Backdoor Attacks on text-Guided 3D Scene Grounding
Implementation of Advanced Encryption Standard for encryption and decryption of images and text on a GPU
Implementation of Three text to Speech Systems for Kurdish Language
Implicit Bias Injection Attacks against text-to-Image Diffusion Models
Implicit Feature Alignment: Learn to Convert text Recognizer to Text Spotter
Improved Component Tree Based Approach to User-Intention Guided text Extraction from Natural Scene Images, An
Improved Document Skew Detection Based on text Line Connected-component Clustering
Improved Gini-Index Algorithm to Correct Feature-Selection Bias in text Classification
Improved Legibility of text for Multiprojector Tiled Displays
Improved localization accuracy by LocNet for Faster R-CNN based text detection in natural scene images
Improved Method Based on Weighted Grid Micro-structure Feature for text-Independent Writer Recognition, An
Improved SAR Ship Classification Method Using text-to-Image Generation-Based Data Augmentation and Squeeze and Excitation, An
Improved Scene text Extraction Method Using Conditional Random Field and Optical Character Recognition, An
Improved shot boundary detection method based on text edges
Improved text-detection methods for a camera-based text reading system for blind persons
Improved Zero-Shot Classification by Adapting VLMs with text Descriptions
Improvement of video text recognition by character selection
Improving accuracy of arbitrary-shaped text detection using ResNet-152 backbone-based pixel aggregation network
Improving Cross-Modal Constraints: text Attribute Person Search With Graph Attention Networks
Improving Cross-Modal Image-text Retrieval With Teacher-Student Learning
Improving Description-Based Person Re-Identification by Multi-Granularity Image-text Alignments
Improving distinctiveness in video captioning with text-video similarity
Improving End-to-End text Image Translation From the Auxiliary Text Translation Task
Improving Faithfulness of text-to-Image Diffusion Models through Inference Intervention
Improving Fine-Grained Understanding for Retrieval in Human Motion and text
Improving Full-text Precision on Short Queries Using Simple Constraints
Improving Handwritten Chinese text Recognition by Confidence Transformation
Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models
Improving Image Recognition by Retrieving from Web-Scale Image-text Data
Improving image similarity measures for image browsing and retrieval through latent space learning between images and long texts
Improving Image-text Matching by Integrating Word Sense Disambiguation
Improving Image-text Matching With Bidirectional Consistency of Cross-Modal Alignment
Improving Multi-class text Classification with Naive Bayes
Improving Multiclass text Classification with the Support Vector Machine
Improving OCR text Categorization Accuracy with Electronic Abstracts
Improving Offline Handwritten text Recognition with Hybrid HMM/ANN Models
Improving Open-Vocabulary Scene text Recognition
Improving patch-based scene text script identification with ensembles of conjoined networks
Improving Persian text Classification Using Persian Thesaurus
Improving Scene text Detection by Scale-Adaptive Segmentation and Weighted CRF Verification
Improving text Classifier Performance based on AUC
Improving text-Based Person Search by Spatial Matching and Adaptive Threshold
Improving text-guided Object Inpainting with Semantic Pre-inpainting
Improving text-image Matching with Adversarial Learning and Circle Loss for Multi-modal Steganography
Improving Vision-and-language Navigation with Image-text Pairs from the Web
IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-text Retrieval
In-air Handwritten Chinese text Recognition with Attention Convolutional Recurrent Network
In-air handwritten Chinese text recognition with temporal convolutional recurrent network
In-Style: Bridging text and Uncurated Videos with Style Transfer for Text-Video Retrieval
Incorporating Language Syntax in Visual text Recognition with a Statistical-Model
Incorporating Self-attention Mechanism and Multi-task Learning into Scene text Detection
Incremental Approach to text Representation, Categorization, and Retrieval, An
Incremental Detection of text on Road Signs
Incremental text-to-Speech Synthesis Using Pseudo Lookahead With Large Pretrained Language Model
Indexing On-line Handwritten texts Using Word Confusion Networks
Indexing text Events in Digital Video Databases
InducT-GCN: Inductive Graph Convolutional Networks for text Classification
Industrial Scene text Detection With Refined Feature-Attentive Network
Inferential Rules for Identifying Answers in TOEFL texts
Inferring Semantic Layout for Hierarchical text-to-Image Synthesis
Infinite Liouville mixture models with application to text and texture categorization
Inflation with Diffusion: Efficient Temporal Adaptation for text-to-Video Super-Resolution
Influence of text line segmentation in Handwritten Text Recognition
Information Detection for the Process of Typhoon Events in Microblog text: A Spatio-Temporal Perspective
Information Extraction and Classification from Free text Using a Neural Approach
Information extraction from scanned invoice images using text analysis and layout features
Information fusion for text classification: an experimental comparison
Information Theoretic text Classification Using the Ziv-Merhav Method
InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot text-based Video Editing
Initialized and Guided EM-clustering of Sparse Binary Data with Application to text Based Documents
Initno: Boosting text-to-Image Diffusion Models via Initial Noise Optimization
Injecting text Clues for Improving Anomalous Event Detection From Weakly Labeled Videos
InNeRF360: text-Guided 3D-Consistent Object Inpainting on 360° Neural Radiance Fields
inpainting system for automatic image structure-texture restoration with text removal, An
Inspecting the Geographical Representativeness of Images from text-to-Image Models
Instance-wise distribution control of text-to-image diffusion models
InstanceCap: Improving text-to-Video Generation via Instance-aware Structured Caption
Instant3D: Instant text-to-3D Generation
InstantBooth: Personalized text-to-Image Generation without Test-Time Finetuning
Instruction-Augmented Multimodal Alignment for Image-text and Element Matching
Instruction-Guided Scene text Recognition
Instructive3D: Editing Large Reconstruction Models with text Instructions
Instrumental Assessment of Prosodic Quality for text-to-Speech Signals
Integrated Algorithm for text Recognition: Comparison with a Cascaded Algorithm, An
Integrated text and Line-Art Extraction from a Topographic Map
Integrating Geometric Context for Text Alignment of Handwritten Chinese Documents
Integrating Knowledge Sources in Devanagari text Recognition System
Integrating Language Guidance Into Image-text Matching for Correcting False Negatives
Integrating Language Model in Handwritten Chinese text Recognition
Integrating multiple character proposals for robust scene text extraction
Integrating Visual, Audio and text Analysis for News Video
Integrating word level knowledge in text recognition
Integration of Linguistic and Geospatial Features Using Global Context Embedding for Automated Text Geocoding, The
Intelligent Typography: Artistic text Style Transfer for Complex Texture and Structure
IntelliSearch: Intelligent Search for Images and text on the Web
Inter-Intra Modal Representation Augmentation With DCT-Transformer Adversarial Network for Image-text Matching
InteractDiffusion: Interaction Control in text-to-Image Diffusion Models
Interactive Enhancement of Handwritten text through Multi-resolution Gaussian
Interactive Image Manipulation with Complex text Instructions
Interactive Off-Line Handwritten text Transcription Using On-Line Handwritten Text as Feedback
Interactive System to Extract Structured text from a Geometrical Representation, An
Interactive textbooks; Embedding Image Processing Operator Demonstrations in Text
Interfusion: text-driven Generation of 3d Human-object Interaction
Interleaved text/image Deep Mining on a large-scale radiology database
Interpretation of The Function of The Obelisk of Augustus in Rome From Antique texts to Present Time Virtual Reconstruction
Interword distance changes represented by sine waves for watermarking text images
Intra-modal consistency for image-text retrieval through soft-label distillation
Intra-Modal Constraint Loss for Image-text Retrieval
Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in text-to-Image Generative Models
Inverse-Like Antagonistic Scene text Spotting via Reading-Order Estimation and Dynamic Sampling
Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of text-to-Video Diffusion Models
Investigation on LLMs' Visual Understanding Ability Using SVG for Image-text Bridging, An
Investigations in Psychological Stress Detection from Social Media text using Deep Architectures
IOS-Net: An inside-to-outside supervision network for scale robust text detection in the wild
IPAD: Iterative, Parallel, and Diffusion-Based Network for Scene text Recognition
Irregular text block recognition via decoupling visual, linguistic, and positional information
Is An Image Worth Five Sentences? A New Look into Semantics for Image-text Matching
Is Arabic text categorization a solved task?
ISL RT-07 Speech-to-text System, The
ISTD-DLA: Industrial Scene text Detection Method Based on Dynamic Local-Aware Aggregation Network
It's All About The Scale: Efficient text Detection Using Adaptive Scaling
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, text, and Architectural Enhancements
IterVM: Iterative Vision Modeling Module for Scene text Recognition
ITI-Gen: Inclusive text-to-Image Generation
JECL: Joint Embedding and Cluster Learning for Image-text Pairs
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized text-to-Image Generation
Joint architecture and knowledge distillation in CNN for Chinese text recognition
Joint embeddings with multimodal cues for video-text retrieval
Joint Handwritten text Recognition and Word Classification for Tabular Information Extraction
Joint Image-text News Topic Detection and Tracking by Multimodal Topic And-Or Graph
Joint Inference of Objects and Scenes With Efficient Learning of text-Object-Scene Relations
Joint Intra & Inter-Grained Reasoning: A New Look Into Semantic Consistency of Image-text Retrieval
Joint representation learning for text and 3D point cloud
Joint stroke classification and text line grouping in online handwritten documents with edge pooling attention networks
Joint Token and Feature Alignment Framework for text-Based Person Search
Joint Video and text Parsing for Understanding Events and Answering Queries
Joint Visual Semantic Reasoning: Multi-Stage Decoder for text Recognition
Jointdreamer: Ensuring Geometry Consistency and text Congruence in Text-to-3d Generation via Joint Score Distillation
JPEG2000 Compatible Watermarking of text in Images
Kanji recognition in scene images without detection of text fields: robust against variation of viewpoint, contrast, and background texture
KDProR: A Knowledge-decoupling Probabilistic Framework for Video-text Retrieval
Kernel Adaptive Convolution for Scene text Detection via Distance Map Prediction
kernel trick for sequences applied to text-independent speaker verification systems, A
Kernel-Based Mixture Mapping for Image and text Association
Keystroke Biometric Recognition Studies on Long-text Input under Ideal and Application-Oriented Conditions
Keyword spotting in handwritten documents based on a generic text line HMM and a SVM verification
Keyword Spotting in Online Handwritten Documents Containing text and Non-text Using BLSTM Neural Networks
KHATT: An open Arabic offline handwritten text database
KHATT: Arabic Offline Handwritten text Database
Khmerst: A Low-resource Khmer Scene text Detection and Recognition Benchmark
Knowing Where to Focus: Attention-Guided Alignment for text-based Person Search
Knowledge Mining with Scene text for Fine-Grained Recognition
Knowledge-Driven Generative Adversarial Network for text-to-Image Synthesis
KOHTD: Kazakh offline handwritten text dataset
KT-GAN: Knowledge-Transfer Generative Adversarial Network for text-to-Image Synthesis
Ktext: Arbitrary shape text detection using modified K-Means
L-Verse: Bidirectional Generation Between Image and text
Label embedding for text recognition
Label Embedding: A Frugal Baseline for text Recognition
Label Incorporated Graph Neural Networks for text Classification
Label or Message: A Large-Scale Experimental Survey of texts and Objects Co-Occurrence
LAM Dataset: A Novel Benchmark for Line-Level Handwritten text Recognition, The
Language Adaptive Methodology for Handwritten text Line Segmentation
Language Identification for Printed text Independent of Segmentation
Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene text Detection and Spotting
language model using variable length tokens for open-vocabulary Hangul text recognition, A
Language-Aware Soft Prompting: text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models
Language-Independent text Lines Extraction Using Seam Carving
Language-Independent text-Line Extraction Algorithm for Handwritten Documents
Laplacian Approach to Multi-Oriented text Detection in Video, A
Laplacian Method for Video text Detection, A
Large scalability in document image matching using text retrieval
Large Scale Scene text Verification with Guided Attention
Large-Lexicon Attribute-Consistent text Recognition in Natural Images
Large-Scale text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
LarTap: A Luminance-Aware Framework With text-Correlation Priors for Multi-Exposure Image Fusion
LASP: text-to-Text Optimization for Language-Aware Soft Prompting of Vision and Language Models
Latent Guard: A Safety Framework for text-to-image Generation
Latenteditor: text Driven Local Editing of 3d Scenes
LaTeRF: Label and text Driven Object Radiance Fields
LaTr: Layout-Aware Transformer for Scene-text VQA
Latte3d: Large-scale Amortized text-to-enhanced3d Synthesis
LATTECLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic texts
Layerdiff: Exploring text-guided Multi-layered Composable Image Synthesis via Layer-collaborative Diffusion Model
layered method for determining manga text bubble reading order, A
Layout and language: exploring text block discovery in tables using linguistic resources
Layout-Agnostic Scene text Image Synthesis with Diffusion Models
Layout-Bridging text-to-Image Synthesis
LayoutFormer: Hierarchical text Detection Towards Scene Text Understanding
LCM-Lookahead for Encoder-based text-to-image Personalization
LD-ZNet: A Latent Diffusion Approach for text-Based Image Segmentation
Learn to Augment: Joint Data Augmentation and Network Optimization for text Recognition
Learned Image Compression with text Quality Enhancement
Learning a Limited text Space for Cross-Media Retrieval
Learning Aligned Image-text Representations Using Graph Attentive Relational Network
Learning analytics system for assessing students' performance quality and text mining in online communication
Learning and Integrating Multi-Level Matching Features for Image-text Retrieval
Learning Audio-guided Video Representation with Gated Attention for Video-text Retrieval
Learning bottom-up text attention maps for text detection using stroke width transform
Learning by Imagination: A Joint Framework for text-Based Image Manipulation and Change Captioning
Learning CLIP Guided Visual-text Fusion Transformer for Video-based Pedestrian Attribute Recognition
Learning Coarse-to-Fine Graph Neural Networks for Video-text Retrieval
Learning confidence transformation for handwritten Chinese text recognition
Learning Continuous 3D Words for text-to-Image Generation
Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using text and Sketch
Learning Deep Structure-Preserving Image-text Embeddings
Learning Disentangled Identifiers for Action-Customized text-to-Image Generation
Learning Dual Semantic Relations With Graph Attention for Image-text Matching
Learning From Short text Streams With Topic Drifts
Learning From text: A Multimodal Face Inpainting Network for Irregular Holes
Learning from Video and text via Large-Scale Discriminative Clustering
Learning Generative Structure Prior for Blind text Image Super-resolution
Learning Linguistic Association Towards Efficient text-Video Retrieval
Learning Markov Clustering Networks for Scene text Detection
Learning Multi-Dimensional Human Preference for text-to-Image Generation
Learning multi-view embedding in joint space for bidirectional image-text retrieval
Learning Relationship-Enhanced Semantic Graph for Fine-Grained Image-text Matching
Learning Semantic Polymorphic Mapping for text-Based Person Retrieval
Learning Semantic Relationship among Instances for Image-text Matching
Learning Semantic text Features for Web Text-Aided Image Classification
Learning Shape-Aware Embedding for Scene text Detection
Learning Shape-Color Diffusion Priors for text-Guided 3D Object Generation
Learning Spatial-Semantic Context with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition
Learning Spatially-Variable Filters for Super-Resolution of text
Learning text-Line Segmentation Using Codebooks and Graph Partitioning
Learning text-to-Video Retrieval from Image Captioning
Learning the Lexicon from raw texts for open-vocabulary Korean word recognition
Learning to Detect Scene text Using a Higher-Order MRF with Belief Propagation
Learning to detect, localize and recognize many text objects in document images from few examples
Learning to Embed Semantic Similarity for Joint Image-text Retrieval
Learning to Generate Semantic Layouts for Higher text-Image Correspondence in Text-to-Image Synthesis
Learning to Generate text-Grounded Mask for Open-World Semantic Segmentation from Only Image-Text Pairs
Learning to Group text Lines and Regions in Freeform Handwritten Notes
Learning to Localize Actions in Instructional Videos with Llm-based Multi-pathway text-video Alignment
Learning to Read L'Infinito: Handwritten text Recognition with Synthetic Training Data
Learning to Sample Effective and Diverse Prompts for text-to-Image Generation
Learning to Sort Handwritten text Lines in Reading Order through Estimated Binary Order Relations
Learning to summarize web image and text mutually
Learning to Super-Resolve Blurry Face and text Images
Learning transferable features in meta-learning for few-shot text classification
Learning Two-Branch Neural Networks for Image-text Matching Tasks
Learning Visual Compound Models from Parallel Image-text Datasets
Learning Visual Generative Priors without text
Lecture Video Enhancement and Editing by Integrating Posture, Gesture, and text
LEDITS++: Limitless Image Editing Using text-to-Image Models
LeftRefill: Filling Right Canvas based on Left Reference through Generalized text-to-Image Diffusion Model
Legit: text Legibility for User-Generated Media
Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in text-to-image Diffusion Models
Length Normalization in Degraded text Collections
Length-sensitive Language-bound Recognition Network for Multilingual text Recognition, A
Leveraging Multimodal Large Language Models for Joint Discrete and Continuous Evaluation in text-to-Image Alignment
Leveraging Smart Devices for Scene text Preserved Image Stylization: A Deep Gaming Approach
Leveraging Style and Content features for text Conditioned Image Retrieval
Leveraging surrounding context for scene text detection
Leveraging text Localization for Scene Text Removal via Text-aware Masked Image Modeling
Leveraging the Mixed-text Segmentation Problem to Design Secure Handwritten CAPTCHAs
Lexicon based feature extraction for emotion text classification
Lexicon Generation for Emotion Detection from text
Lexicon-based offline recognition of Amharic words in unconstrained handwritten text
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-text Sparse Retrieval
Lifelong Learning for text Steganalysis Based on Chronological Task Sequence
Light Weight text Extraction Technique for Hand-Held Device, A
light-weight text image processing method for handheld embedded cameras, A
Lightweight Attentional Feature Fusion: A New Baseline for text-to-Video Retrieval
Lightweight dynamic conditional GAN with pyramid attention for text-to-image synthesis
Lightweight Multi-Grained Image-text Retrieval Paradigm via Cascaded Representation Learning and Parameter-Free Feature Aggregation, A
Lightweight text-Driven Image Editing With Disentangled Content and Attributes
LIMITR: Leveraging Local Information for Medical Image-text Representation
Line Segmentation for Grayscale text Images of Khmer Palm Leaf Manuscripts
Line-Direction-Free and Character-Orientation-Free On-Line Handwritten Japanese text Recognition System, A
Linecounter: Learning Handwritten text Line Segmentation By Counting
LinGen: Towards High-Resolution Minute-Length text-to-Video Generation with Linear Computational Complexity
Linguistic Hallucination for text-Based Video Retrieval
Linguistic integration information in the AABATAS Arabic text analysis system
Linguistic Steganalysis via text Dual Attention Fusing Statistical and Multi-Layer Semantic Features
Linguistics-aware Masked Image Modeling for Self-supervised Scene text Recognition
Link the Head to the Beak: Zero Shot Learning from Noisy text Description at Part Precision
Linking Image and text with 2-Way Nets
Linking text and visual concepts semantically for cross modal multimedia search
LISTER: Neighbor Decoding for Length-Insensitive Scene text Recognition
LiT: Zero-Shot Transfer with Locked-Image text Tuning
Livephoto: Real Image Animation with text-guided Motion Control
Local Action-guided Motion Diffusion Model for text-to-motion Generation
Local Binary Pattern-Based Features for text Identification of Web Images
Local Gradient Difference Features for Classification of 2D-3D Natural Scene text Images
Local Skew Angle Estimation from Background Space in text Regions
Local variance image-based for scene text binarization under illumination effects
Local-enhanced representation for text-based person search
Local-Global Video-text Interactions for Temporal Grounding
Localization and Manipulation of Immoral Visual Cues for Safe text-to-Image Generation
Localization, extraction and recognition of text in Telugu document images
localization/verification scheme for finding text in images and video frames based on contrast independent features and machine learning methods, A
Localize, Group, and Select: Boosting text-VQA by Scene Text Modeling
Localized Concept Erasure for text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation
Localizing and segmenting text in images and videos
Localizing blurry and low-resolution text in natural images
Localizing Object-level Shape Variations with text-to-Image Diffusion Models
Localizing scene texts by fuzzy inference systems and low rank matrix recovery model
Localizing text in Scene Images by Boundary Clustering, Stroke Segmentation, and String Fragment Classification
LOCAT: Localization-Driven text Watermarking via Large Language Models
Locating text in Color Documents
Locating text in Complex Color Images
Locating text in Images Based on the Smooth Gray-Level Detection
Locating text in images using matched wavelets
Locating Uniform-colored text in Video Frames
LocVTP: Video-text Pre-training for Temporal Localization
LODENet: A Holistic Approach to Offline Handwritten Chinese and Japanese text Line Recognition
LoGoPrompt: Synthetic text Images Can Be Good Visual Prompts for Vision-Language Models
Long-CLIP: Unlocking the Long-text Capability of CLIP
Long-FAS: Cross-domain face anti-spoofing with long text guidance
Longest Common Subsequence Algorithm Suitable for Similar text Strings, A
Look More Than Once: An Accurate Detector for text of Arbitrary Shapes
Looking at Words and Points with Attention: A Benchmark for text-to-Shape Coherence
Looking from a Higher-level Perspective: Attention and Recognition Enhanced Multi-scale Scene text Segmentation
LoSh: Long-Short text Joint Prediction Network for Referring Video Object Segmentation
lossy/lossless compression method for printed typeset bi-level text images based on improved pattern matching, A
Lost in Translation: Latent Concept Misalignment in text-to-image Diffusion Models
Lost Your Style? Navigating with Semantic-Level Approach for text-to-Outfit Retrieval
LoTeR: Localized text prompt refinement for zero-shot referring image segmentation
Low Complexity Sign Detection and text Localization Method for Mobile Applications, A
LucidDreamer: Towards High-Fidelity text-to-3D Generation via Interval Score Matching
LuoJiaHOG: A hierarchy oriented geo-aware image caption dataset for remote sensing image-text retrieval
L_0-Regularized Intensity and Gradient Prior for Deblurring text Images and Beyond
M-Adaptor: text-Driven Whole-Body Human Motion Generation
M2d2m: Multi-Motion Generation from text with Discrete Diffusion Models
M3TTS: Multi-modal text-to-speech of multi-scale style control for dubbing
MA-CRNN: a multi-scale attention CRNN for Chinese text line recognition in natural scenes
MAAN: Memory-Augmented Auto-Regressive Network for text-Driven 3D Indoor Scene Generation
MAC: Masked Contrastive Pre-Training for Efficient Video-text Retrieval
Machine Learning Approach to Hypothesis Decoding in Scene text Recognition, A
Machine printed text and handwriting identification in noisy document images
Machine reading of camera-held low quality text images: An ICA-based image enhancement approach for improving OCR accuracy
Machine reading of handwritten text information in field technician's maps
Machine recognition and correction of printed Arabic text
Machine Recognition of Multi Font Printed Arabic texts
Machine Recognition of Optically Captured Machine Printed Arabic text
Machine Recognition of Printed Kannada text
Machine-printed and hand-written text lines identification
MADA: Multi-Window Attention and Dual-Alignment for Image-text Retrieval
MAGAE: Multi-Level Alignment Over Aggregation Semantic Graph With Attribute Enhancement for text-Based Vehicle Retrieval
Magic3D: High-Resolution text-to-3D Content Creation
MAGIC: Multi-granularity domain adaptation for text recognition
MagicFusion: Boosting text-to-Image Generation Performance by Fusing Diffusion Models
Major Components of a Complete text Reading System
Make It Count: text-to-Image Generation with an Accurate Number of Objects
Make It Move: Controllable Image-to-Video Generation with text Descriptions
Make-A-Scene: Scene-Based text-to-Image Generation with Human Priors
Make-An-Animation: Large-Scale text-conditional 3D Human Motion Generation
Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from text
Making the Most of text Semantics to Improve Biomedical Vision-Language Processing
Making the V in text-VQA Matter
Mandarin text-to-Speech Front-End With Lightweight Distilled Convolution Network
Manga text Detection with Manga-specific Data Augmentation and Its Applications on Emotion Analysis
ManiCLIP: Multi-attribute Face Manipulation from text
ManiGAN: text-Guided Image Manipulation
ManiTrans: Entity-Level text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation
MANTA: A Large-Scale Multi-View and Visual-text Anomaly Detection Dataset for Tiny Objects
Many Hands Make Light Work: Transferring Knowledge from Auxiliary Tasks for Video-text Retrieval
Marking text Documents
Marking text features of document images to deter illicit dissemination
Markov Model Order Optimization for text Recognition
Markov Random Field Based text Identification from Annotated Machine Printed Documents
MarkovGen: Structured Prediction for Efficient text-to-Image Generation
Markovian Engine for text Recognition: Cursive Arabic Text, Statistical Features and Interconnected HMMs, A
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity text-to-3D Content Creation
Mask R-CNN With Pyramid Attention Network for Scene text Detection
Mask textspotter v3: Segmentation Proposal Network for Robust Scene Text Spotting
Mask textSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes
MaskDiffusion: Boosting text-to-Image Consistency with Conditional Mask
Masked and Permuted Implicit Context Learning for Scene Text Recognition
Masked text Pre-Training for Scene Text Detection
MASTER: Multi-aspect non-local network for scene text recognition
Masterweaver: Taming Editability and Face Identity for Personalized text-to-image Generation
Mathematical properties of the native integral ratio handwriting and text extraction technique
Matryoshka Learning With Metric Transfer for Image-text Matching
Maxfusion: Plug&play Multi-modal Generation in text-to-Image Diffusion Models
Maximum Likelihood Discriminant Feature for text-Independent Speaker Verification
Maximum Margin Approach to Learning text Classifiers: Methods, Theory and Algorithms, The
Maximum Spanning Trees For text Segmentation
maximum-likelihood approach to segmentation-based recognition of unconstrained handwriting text, A
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex text-to-Image Generation
mDRA: A Multimodal Depression Risk Assessment Model Using Audio and text
MEAN: Multi-Element Attention Network for Scene text Recognition
Medblip: Bootstrapping Language-image Pretraining from 3d Medical Images and texts
Medical-Image Retrieval Based on Knowledge-Assisted text and Image Indexing
MedSyn: text-Guided Anatomy-Aware Synthesis of High-Fidelity 3-D CT Images
Memorize, Associate and Match: Embedding Enhancement via Fine-Grained Alignment for Image-text Retrieval
Memory-Efficient Models for Scene text Recognition via Neural Architecture Search
MER-CAPF: Audio-text emotion recognition through cross-attention mechanism and multi-granularity pooling strategy
MESA: text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data
Met-MLTS: Leveraging Smartphones for End-to-End Spotting of Multilingual Oriented Scene texts and Traffic Signs in Adverse Meteorological Conditions
MetaCloak: Preventing Unauthorized Subject-Driven text-to-Image Diffusion-Based Synthesis via Meta-Learning
MetaHTR: Towards Writer-Adaptive Handwritten text Recognition
MetaWriter: Personalized Handwritten text Recognition Using Meta-Learned Prompt Tuning
Method and apparatus for detecting running text in an image
Method and apparatus for the resolution enhancement of gray-scale images that include text and line art
Method and system for recognizing a boundary between characters in handwritten text
Method for automatic recognition of white blocks as well as text, graphics and/or gray image areas on a printed master
method for detecting text of arbitrary shapes in natural scenes that improves text spotting, A
method for discovering knowledge in texts, A
Method for Extracting text from Stone Inscriptions Using Character Spotting, A
Method for identification and compression of facsimile symbols in text processing systems
Method for identifying word bounding boxes in text
Method for Semantic Relatedness Based Query Focused text Summarization, A
Method for text Localization and Recognition in Real-World Images, A
Method for Transformer Oil Leakage Detection text Generation Using Combined Large and Small Models
Method for unconstrained text detection in natural scene image
method for variable quantization in JPEG for improved text quality in compound documents, A
method of N-grams in large-scale clustering of DNA texts, The
Method of separating text and graphs in digital image data
Methods for text segmentation from scene images
Metric Learning for text Documents
Mevg: Multi-event Video Generation with text-to-video Models
MF-GAN: Multi-conditional Fusion Generative Adversarial Network for text-to-Image Synthesis
MFECLIP: CLIP With Mapping-Fusion Embedding for text-Guided Image Editing
MicroCinema: A Divide-and-Conquer Approach for text-to-Video Generation
MIGC: Multi-Instance Generation Controller for text-to-Image Synthesis
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
Mimir: Improving Video Diffusion Models for Precise text Understanding
Minimal Interaction Touchless text Input with Head Movements and Stereo Vision
Minimum Error Rate Training for PHMM-Based text Recognition
Minimum Risk Training for Handwritten Chinese/Japanese text Recognition Using Semi-Markov Conditional Random Fields
Minimum-risk training for semi-Markov conditional random fields with application to handwritten Chinese/Japanese text recognition
Mining conversational text for procedures with applications in contact centers
Mining False Positive Examples for text-Based Person Re-Identification
Mining the displacement of max-pooling for text recognition
Minority-Focused text-to-Image Generation via Prompt Optimization
MirrorGAN: Learning text-To-Image Generation by Redescription
(Mis?-)Using DRT for Generation of Natural Language text from Image Sequences
MISL: Multi-grained image-text semantic learning for text-guided image inpainting
Mismatch Quest: Visual and textual Feedback for Image-Text Misalignment
Mita: An Information Extraction Approach to the Analysis of Free-Form text in Life-Insurance Applications
Mixdq: Memory-efficient Few-step text-to-image Diffusion Models with Metric-decoupled Mixed Precision Quantization
Mixed-Supervised Scene text Detection With Expectation-Maximization Algorithm
Mobile visual search on printed documents using text and low bit-rate features
MobileCLIP: Fast Image-text Models through Multi-Modal Reinforced Training
Mobilediffusion: Instant text-to-image Generation on Mobile Devices
Modality Disentangled Discriminator for text-to-Image Synthesis
Model and Data Integrated Transfer Learning for Unstructured Map text Detection
Model Based text Line Segmentation Method for Off-line Handwritten Documents, A
model for detecting and merging vertically spanned table cells in plain text documents, A
Model of On-line Handwritten Japanese text Recognition Free from Line Direction and Writing Format Constraints, A
model-based approach to offline text-independent Arabic writer identification and verification, A
Model-Based System Specification With Tesperanto: Readable text From Formal Graphics
Modeling Motion with Multi-Modal Features for text-Based Video Segmentation
Modeling of image, video and text fusion quality data packet system for aerospace complex products based on business intelligence
Modeling Stroke Mask for End-to-End text Erasing
Modeling Thousands of Human Annotators for Generalizable text-to-Image Person Re-identification
Modern vs Diplomatic Transcripts for Historical Handwritten text Recognition
Moment-Based Image Normalization for Handwritten text Recognition
Monkey: Image Resolution and text Label are Important Things for Large Multi-Modal Models
Mono-font Cursive Arabic text Recognition Using Speech Recognition System
Morality Classification in Natural Language text
MORAN: A Multi-Object Rectified Attention Network for scene text recognition
More Grounded Image Captioning by Distilling Image-text Matching Model
More Than Just Attention: Improving Cross-Modal Attentions with Contrastive Constraints for Image-text Matching
More than Words: In-the-Wild Visually-Driven Prosody for text-to-Speech
Morpheus: text-Driven 3D Gaussian Splat Shape and Color Stylization
MorphNeRF: text-Guided 3D-Aware Editing via Morphing Generative Neural Radiance Fields
Morphological Approach for text-Line Segmentation in Handwritten Documents, A
Morphological Approach to text String Extraction from Regular Periodic Overlapping Text-Background Images, A
Morphological text Extraction from Images
Morphology-based hierarchical representation with application to text segmentation in natural images
Morphology-based text line extraction
Morphtext: Deep Morphology Regularized Accurate Arbitrary-Shape Scene Text Detection
Mosaicing-by-recognition for video-based text recognition
Mosaicing-by-recognition: a technique for video-based text recognition
MOST: A Multi-Oriented Scene text Detector with Localization Refinement
MotiF: Making text Count in Image Animation with Motion Focal Loss
MotionDiffuse: text-Driven Human Motion Generation With Diffusion Model
Motiondirector: Motion Customization of text-to-video Diffusion Models
Moto: Enhancing Embedding with Multiple Joint Factors for Chinese text Classification
Movie fill in the blank by joint learning from video and text with adaptive temporal attention
Movie/Script: Alignment and Parsing of Video and text Transcription
MPEG-7 Videotext Description Scheme for Superimposed Text in Images and Video
MRF based text binarization in complex images using stroke feature
MRF Model for Binarization of Natural Scene text, An
MRN: Multiplexed Routing Network for Incremental Multilingual text Recognition
MRP-GAN: Multi-resolution parallel generative adversarial networks for text-to-image synthesis
MSCap: Multi-Style Image Captioning With Unpaired Stylized text
MSDLF-K: A Multimodal Feature Learning Approach for Sentiment Analysis in Korean Incorporating text and Speech
MSER-Based Real-Time text Detection and Tracking
MSR-Video to text dataset with clean annotations, The
MSSA: A Multi-Scale Semantic-Aware Method for Remote Sensing Image-text Retrieval
MTA-CLIP: Language-guided Semantic Segmentation with Mask-text Alignment
MTADiffusion: Mask text Alignment Diffusion Model for Object Inpainting
MTGT: Multiscale text Feature-Guided Transformer in medical image segmentation
MTRNet++: One-stage mask-based scene text eraser
MUGEN: A Playground for Video-Audio-text Multimodal Understanding and GENeration
MULAN: A Multi Layer Annotated Dataset for Controllable text-to-Image Generation
MulModSeg: Enhancing Unpaired Multi-Modal Medical Image Segmentation with Modality-Conditioned text Embedding and Alternating Training
Multi language text detection using fast stroke width transform
Multi scale mirror connection based encoder decoder network for text localization
Multi-branch Network with Ensemble Learning for text Removal in the Wild
Multi-Concept Customization of text-to-Image Diffusion
Multi-dimensional long short-term memory networks for artificial Arabic text recognition in news video
Multi-Dimensional Quality Assessment for text-to-3D Assets: Dataset and Model
Multi-event Video-text Retrieval
Multi-fractal Modeling for On-line text-Independent Writer Identification
Multi-Grained Vision-and-Language Model for Medical Image and text Alignment
Multi-Granularity Aggregation Transformer for Joint Video-Audio-text Representation Learning
Multi-Granularity Matching Transformer for text-Based Person Search
Multi-granularity Prediction for Scene text Recognition
Multi-Granularity Prediction with Learnable Fusion for Scene text Recognition
Multi-Group Proportional Representation for text-to-Image Models
Multi-head Self-relation Network for Scene text Recognition, A
Multi-Label Generalized Zero Shot Chest X-Ray Classification by Combining Image-text Information With Feature Disentanglement
Multi-label text Classification Approach for Sentence Level News Emotion Analysis
Multi-layer feature fusion based image style transfer with arbitrary text condition
Multi-Layer Probabilistic Association Reasoning Network for Image-text Retrieval
Multi-lingual scene text detection and language identification
Multi-lingual text recognition from video frames
Multi-Modal Architecture With Spatio-Temporal-text Adaptation for Video-Based Traffic Accident Anticipation, A
Multi-modal Contextual Graph Neural Network for Text Visual Question Answering
Multi-Modal Fusion Network for Rumor Detection with texts and Images
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene text
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
Multi-Modal Reasoning Graph for Scene-text Based Fine-Grained Image Classification and Retrieval
Multi-Modal Reference Learning for Fine-Grained text-to-Image Retrieval
Multi-Modal Representation Learning with text-Driven Soft Masks
Multi-modal text Recognition Networks: Interactive Enhancements Between Visual and Semantic Features
Multi-Modal Topic Model for Image Annotation Using text Analysis, A
Multi-Orientation Scene text Detection with Adaptive Clustering
Multi-orientation scene text detection with multi-information fusion
Multi-Oriented and Multi-Lingual Scene text Detection With Direct Regression
Multi-oriented Bangla and Devnagari text recognition
Multi-oriented English text Line Identification
Multi-oriented Scene text Detection via Corner Localization and Region Segmentation
Multi-oriented text detection from natural scene images based on a CNN and pruning non-adjacent graph edges
Multi-oriented text Detection with Fully Convolutional Networks
Multi-Oriented text Extraction in Stylistic Documents
Multi-oriented touching text character segmentation in graphical documents using dynamic programming
Multi-phase recognition of multifont photoscript Arabic text
multi-plane approach for text segmentation of complex document images, A
Multi-polarity text segmentation using graph theory
Multi-resolution form of SVD for text-independent speaker recognition
Multi-Resolution Pathology-Language Pre-training Model with text-Guided Visual Representation
Multi-Scale Feature Fusion Based on Piecewise Polynomial Activation Function for Image-text Matching
Multi-scale sequential network for semantic text segmentation and localization
Multi-scale text Line Segmentation Method in Freestyle Handwritten Documents, A
Multi-scale video text detection based on corner and stroke width verification
multi-scenario text generation method based on meta reinforcement learning, A
Multi-schema prompting powered token-feature woven attention network for short text classification
Multi-script and Multi-oriented text Localization from Scene Images
Multi-script iterative steerable directional filtering for handwritten text line extraction
Multi-script text Extraction from Natural Scenes
Multi-script text versus non-text classification of regions in scene images
Multi-Script-Oriented text Detection and Recognition in Video/Scene/Born Digital Images
Multi-sensor text classification experiments: A comparison
Multi-Sentence Auxiliary Adversarial Networks for Fine-Grained text-to-Image Synthesis
Multi-Sentence Complementarily Generation for text-to-Image Synthesis
Multi-Speaker text-to-Speech Training With Speaker Anonymized Data
Multi-Spectral Fusion Based Approach for Arbitrarily Oriented Scene text Detection in Video Images
Multi-stage HMM based Arabic text recognition with rescoring
Multi-strategy tracking based text detection in scene videos
Multi-Style Shape Matching GAN for text Images
Multi-text Guidance Is Important: Multi-Modality Image Fusion via Large Generative Vision-Language Model
Multi-Track Timeline Control for text-Driven 3D Human Motion Generation
Multi-View User Preference Modeling for Personalized text-to-Image Generation
Multi-View Visual Semantic Embedding for Cross-Modal Image-text Retrieval
Multi3DRefer: Grounding text Description to Multiple 3D Objects
Multifractal Characterization of texts for Pattern Recognition: On the Complexity of Morphological Structures in Modern and Ancient Languages
Multigap: Multi-pooled inception network with text augmentation for aesthetic prediction of photographs
Multilabel text Classification With Incomplete Labels: A Safe Generative Model With Label Manifold Regularization and Confidence Constraint
Multilateral Semantic Relations Modeling for Image text Retrieval
Multilevel Semantic Interaction Alignment for Video-text Cross-Modal Retrieval
Multilevel text-Line Segmentation Framework for Handwritten Historical Documents, A
Multilingual Artificial text Detection Using a Cascade of Transforms
Multilingual text-to-Image Person Retrieval via Bidirectional Relation Reasoning and Aligning
Multimodal alignment of event and text streams in spiking neural networks for human action recognition
Multimodal grid features and cell pointers for scene text visual question answering
Multimodal interactive transcription of text images
Multimodal Meme Classification Identifying Offensive Content in Image and text
Multimodal Neurons in Pretrained text-Only Transformers
Multimodal Processing and Interaction: Audio, Video, text
Multimodal Sentiment Analysis With Image-text Interaction Network
Multimodal Topic Modeling by Exploring Characteristics of Short text Social Media
Multimodal-LLM Agent For text-Driven Multi-Attribute Face Editing
Multioriented and Curved text Lines Extraction From Indian Documents
Multioriented Video Scene text Detection Through Bayesian Classification and Boundary Growing
multiple agent architecture for handwritten text recognition, A
Multiple attention encoded cascade R-CNN for scene text detection
Multiple Classifier Approach for the Recognition of Screen-Rendered text, A
Multiple Document Datasets Pre-training Improves text Line Detection With Deep Neural Networks
Multiple Geometry Transform Estimation from Single Camera-Captured text Image
Multiple Handwritten text Line Recognition Systems Derived from Specific Integration of a Language Model
Multiple Learned Dictionaries Based Clustered Sparse Coding for the Super-Resolution of Single text Image
Multiple Positives Enhanced NCE Loss for Image-text Retrieval, A
Multitwine: Multi-Object Compositing with text and Layout Control
Multivariate Feedback-Based Image-text Joint Learning for Sketch-Less Facial Image Retrieval
Multiview text Imagination Network Based on Latent Alignment for Image-Text Matching, A
MuLTReNets: Multilingual text recognition networks for simultaneous script identification and handwriting recognition
MUST-VQA: Multilingual Scene-text VQA
Mutually Guided Dual-Task Network for Scene text Detection
Mutually textual and Visual Refinement Network for Image-Text Matching, A
MV-Adapter: Multimodal Video Transfer Learning for Video text Retrieval
MVCM: Enhancing Multi-View and Cross-Modality Alignment for Medical Visual Question Answering and Medical Image-text Retrieval
MVPortrait: text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation
N-Gram-Based text Categorization
Name your style: text-guided artistic style transfer
Narrating the Video: Boosting text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Natural Language Watermarking Using Semantic Substitution for Chinese text
Natural scene text detection based on SWT, MSER and candidate classification
Natural Scene text Detection with Multi-channel Connected Component Segmentation
Natural scene text detection with multi-layer segmentation and higher order conditional random field based analysis
NaturalSpeech: End-to-End text-to-Speech Synthesis With Human-Level Quality
Navigating text-to-image Generative Bias Across Indic Languages
NCAP: Scene text Image Super-Resolution with Non-CAtegorical Prior
Negative-Aware Attention Framework for Image-text Matching
Negative-Prompt Inversion: Fast Image Inversion for Editing with text-Guided Diffusion Models
NEOCR: A Configurable Dataset for Natural Image text Recognition
neural model for text localization, transcription and named entity recognition in full pages, A
Neural network-based prediction of the stopping moment for text recognition in a video stream
Neural Network-based text Location for News Video Indexing
Neural network-based text location in color images
Neural Sign Actors: A diffusion model for 3D sign language production from text
Neuro or Symbolic? Fine-Tuned Transformer With Unsupervised LDA Topic Clustering for text Sentiment Analysis
Neuro-Symbolic Evaluation of text-to-Video Models using Formal Verification
Neuron-Based Spiking Transmission and Reasoning Network for Robust Image-text Retrieval
New Approach Based on texture and Geometric Features for Text Detection
New Approach for Overlay text Detection and Extraction From Complex Video Scene, A
new approach for text-independent speaker recognition, A
new approach for video text detection, A
New Approach towards text Filtering, A
New Arabic Printed text Image Database and Evaluation Protocols, A
New Binarization Approach Based on text Block Extraction
New Block Partitioned text Feature for Text Verification, A
new deep CNN for 3D text localization in the wild through shadow removal, A
New Deep Wavefront Based Model for text Localization in 3D Video, A
new edge-based text verification approach for video, A
New Fourier-Statistical Features in RGB Space for Video text Detection
New Fuzzy Hierarchical Classification Based on SVM for text Categorization, A
New Gradient Based Character Segmentation Method for Video text Recognition, A
new hybrid method to detect text in natural scene, A
new instrumented approach for translating American Sign Language into sound and text, A
New Language-Independent Deep CNN for Scene text Detection and Style Transfer in Social Media Images, A
New Method for Arabic text Detection in Natural Scene Image Based on the Color Homogeneity, A
New Method for Arabic text Detection in Natural Scene Images, A
new method for detection and prediction of occluded text in natural scene images, A
New Method for Handwritten Scene text Detection in Video, A
new method for multi-oriented graphics-scene-3D text classification in video, A
New Method for text Verification Based on Random Forests, A
New Method for text-Line Segmentation for Warped Documents, A
New Method for Word Segmentation from Arbitrarily-Oriented Video text Lines, A
New Method for Writer Identification and Verification Based on Farsi/Arabic Handwritten texts, A
new multi-modal approach to bib number/text detection and recognition in Marathon images, A
New Nearest Neighbor Rule for text Categorization, A
new robust algorithm for video text extraction, A
new scheme for unconstrained handwritten text-line segmentation, A
new segmentation technique for omnifont Farsi text, A
new segmentation technique of Arabic text, A
New Smoothing Method for Lexicon-Based Handwritten text Keyword Spotting, A
New Strategy for Reducing Errors in Scene text Detection, A
new structural technique for recognizing printed Arabic text, A
New Symmetry Based on Proximity of Wavelet-Moments for text Frame Classification in Video, A
New Technique for Multi-Oriented Scene text Line Detection and Tracking in Video, A
New text Extraction Method Incorporating Local Information, A
New text-Line Alignment Approach Based on Piece-Wise Painting Algorithm for Handwritten Documents, A
New Type of Feature: Loose N-Gram Feature in text Categorization, A
new unified method for detecting text from marathon runners and sports players in video (PR-D-19-01078R2), A
New Video Images text Localization Approach Based on a Fast Hough Transform, A
New Wavelet and Color Features for text Detection in Video
new wavelet-Laplacian method for arbitrarily-oriented character segmentation in video text lines, A
Newmove: Customizing text-to-video Models with Novel Motions
News2meme: An Automatic Content Generator from News Based on Word Subspaces from text and Image
NIVeL: Neural Implicit Vector Layers for text-to-Vector Generation
Noise Diffusion for Enhancing Semantic Faithfulness in text-to-Image Synthesis
Noise-aware Learning from Web-crawled Image-text Data for Image Captioning
NoiseCollage: A Layout-Aware text-to-Image Diffusion Model Based on Noise Cropping and Merging
Noisy text Categorization
Noisy-Aware Unsupervised Domain Adaptation for Scene text Recognition
Noisy-Correspondence Learning for text-to-Image Person Re-Identification
Non-Local text Image Reconstruction
Non-negative Sparse Semantic Coding for text categorization
non-stationary density model to separate overlapped texts in degraded documents, A
Non-Uniform Slant Correction for Handwritten text Line Recognition
Not Just text: Uncovering Vision Modality Typographic Threats in Image Generation Models
Not Only text: Exploring Compositionality of Visual Representations in Vision-Language Models
Novel Algorithm for text Detection and Localization in Natural Scene Images, A
novel automated depression detection technique using text transcript, A
novel binarization approach for text in images, A
Novel Data Independent Approach for Conversion of Hand Punched Kannada Braille Script to text and Speech, A
Novel Data Representation for text Extraction from Multispectral Historical Document Images
novel domain independent scene text localizer, A
Novel Edge Features for text Frame Classification in Video
Novel Fuzzy Logic-Based text Classification Method for Tracking Rare Events on Twitter, A
Novel Illumination-Balance Technique for Improving the Quality of Degraded text-Photo Images, A
Novel Integrated Framework for Learning both text Detection and Recognition, A
Novel Method for Embedded text Segmentation Based on Stroke and Color, A
novel method for straightening curved text-lines in stylistic documents, A
novel method of text line segmentation for historical document image of the uchen Tibetan, A
Novel Multi-oriented Chinese text Extraction Approach from Videos, A
novel mutual nearest neighbor based symmetry for text frame classification in video, A
novel scene text detection algorithm based on convolutional neural network, A
Novel Sub-character HMM Models for Arabic text Recognition
Novel System for Robust text Location and Recognition of Book Covers, A
Novel text Detection System Based on Character and Link Energies, A
novel text structure feature extractor for Chinese scene text detection and recognition, A
Novel text-Independent Speaker Verification System Using Ant Colony Optimization Algorithm, A
novel triangulation procedure for thinning hand-written text, A
novel two-stage algorithm for baseline estimation and correction in Farsi and Arabic handwritten text line, A
Novel Visual Representation on text Using Diverse Conditional GAN for Visual Recognition, A
Novice and Expert Performance of KeyScretch: A Gesture-Based text Entry Method for Touch-Screens
NTIRE 2025 challenge on text to Image Generation Model Quality Assessment
Null-text Inversion for Editing Real Images using Guided Diffusion Models
OASIS: Object-guided Attention for text-conditional Diffusion Synthesis of Human Interaction Sequences
Object proposals for text extraction in the wild
Object Reading: text Recognition for Object Recognition
Object-aware Query Perturbation for Cross-modal Image-text Retrieval
Object-conditioned Energy-based Attention Map Alignment in text-to-image Diffusion Models
Object-Driven text-To-Image Synthesis via Adversarial Training
Object-level semantic alignment for enhancing fidelity in text-to-image generation with diffusion models
Objective Distortion Measure for Binary text Image Based on Edge Line Segment Similarity
Objective Function Design for MCE-Based Combination of On-line and Off-line Character Recognizers for On-line Handwritten Japanese text Recognition
Occluded text Detection and Recognition in the Wild
Occlusion-Aware text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition
OCR and Voting Shell Fulfilling Specific text Analysis Requirements
OCR of Printed Telugu text with High Recognition Accuracies
OCR Pipeline and Semantic text Analysis for Comics, An
OCR-VQGAN: Taming text-within-Image Generation
OCRSpell: An Interactive Spelling Correction System for OCR Errors in text
ODM: A text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
Off-line Chinese Writer Retrieval System Based on text-sensitive Writer Identification, An
Offline arabic handwritten text recognition: A Survey
Offline handwritten Arabic cursive text recognition using Hidden Markov Models and re-ranking
Offline Recognition of Large Vocabulary Cursive Handwritten text
Offline recognition of omnifont Arabic text using the HMM ToolKit (HTK)
Offline Recognition of Unconstrained Handwritten texts Using HMMs and Statistical Language Models
Offline text-independent writer identification using codebook and efficient code extraction methods
Old fashion text-based image retrieval using FCA
Omnifont recognition of text using topological recognition techniques
OMNIPARSER: A Unified Framework for text Spotting, Key Information Extraction and Table Recognition
On appearance-based feature extraction methods for writer-independent handwritten text recognition
On Calibration of Scene-text Recognition Models
On Combining Multiple Segmentations in Scene text Recognition
On Manipulating Scene text in the Wild with Diffusion Models
On optimal stopping strategies for text recognition in a video stream as an application of a monotone sequential decision model
On partitioning a dictionary for visual text recognition
On Recognizing texts of Arbitrary Shapes with 2D Self-Attention
On the Behavior of Contrastive Regularization in Improving Chinese text Recognizer
On the Detection of Images Generated from text
On the discriminability of keystroke feature vectors used in fixed text keystroke authentication
On the Evaluation of Handwritten text Line Detection Algorithms
On the General Value of Evidence, and Bilingual Scene-text Visual Question Answering
On the Generalization of Handwritten text Recognition Models
On the influence of vocabulary size and language models in unconstrained handwritten text recognition
On the Modification of Binarization Algorithms to Retain Grayscale Information for Handwritten text Recognition
On the Processing of Fuzzy Patterns for text Independent Phonetic Speech Segmentation
On the Scalability of Diffusion-based text-to-Image Generation
On the Segmentation of text in Videos
On the use of Bernoulli mixture models for text classification
On the use of duration-corrected N-best hypotheses for text recognition in gray-scale document images
On Vocabulary Reliance in Scene text Recognition
On-Device text Image Super Resolution
On-Line Handwritten Japanese text Recognition Free from Constrains on Line Direction and Character Orientation
On-line Handwritten Japanese text Recognition System Free from Line Direction and Character Orientation Constraints, An
On-Line Handwritten text Line Detection Using Dynamic Programming
On-line Handwritten text Search Method Based on Directional Feature Matching, An
On-line recognition of handwritten Renqun shorthand for fast mobile Chinese text entry
On-line Writing-box-free Recognition of Handwritten Japanese text Considering Character Size Variations
ONE-DM: One-shot Diffusion Mimicker for Handwritten text Generation
One-shot Compositional Data Generation for Low Resource Handwritten text Recognition
One-Shot Doc Snippet Detection: Powering Search in Document Beyond text
One-Step Diffusion for Real-World Image Super-Resolution via Degradation Removal and text Prompts
One-Way Ticket: Time-Independent Unified Encoder for Distilling text-to-Image Diffusion Models
Online Biterm Topic Model based short text stream classification using short text expansion and concept drifting detection
Online text-Independent Writer Identification Based on Stroke's Probability Distribution Function
Online text-independent Writer Identification Based on Temporal Sequence and Shape Codes
Ontology-Based text Mining Method to Develop D-Matrix From Unstructured Text, An
Opaque Document Imaging: Building Images of Inaccessible texts
Open set classification of untranscribed handwritten text image documents
Open-Set text Recognition via Character-Context Decoupling
Open-Vocabulary 3D Semantic Segmentation with text-to-Image Diffusion Models
Open-Vocabulary Panoptic Segmentation with text-to-Image Diffusion Models
Open-vocabulary recognition of machine-printed Arabic text using hidden Markov models
Open-Vocabulary text-Driven Human Image Generation
OpenBias: Open-Set Bias Detection in text-to-Image Generative Models
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-text Generation
Opinion mining from noisy text data
OPMP: An Omnidirectional Pyramid Mask Proposal Network for Arbitrary-Shape Scene text Detection
Optical character correction of large-curvature annular sector text in polar coordinate system
Optical flow based dynamic curved video text detection
Optical modelling and language modelling trade-off for Handwritten text Recognition
Optimal Boxes: Boosting End-to-End Scene text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning
Optimal Classification Model for text Detection and Recognition in Video Frames
Optimal text/Background Color Combination of LED Information Boards for Visibility Improvement Based on Psychological Measurements, An
Optimal word order for non-causal text generation with Large Language Models: The Spanish case
Optimizing the class information divergence for transductive classification of texts using propagation in bipartite graphs
Optimizing the integration of a statistical language model in HMM based offline handwritten text recognition
Orientation and Scale Invariant text Region Extraction in WWW Images
Orientation Robust text Line Detection in Natural Images
OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page text Recognition by learning to unfold
Oscillating Feature Subset Search Algorithm for text Categorization
OST: Refining text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
OTE: Exploring Accurate Scene text Recognition Using One Token
Out of vocabulary word detection and recovery in Arabic handwritten text recognition
Outline Generation Transformer for Bilingual Scene text Recognition
Overview of text-Based Person Search: Recent Advances and Future Directions, An
P-CLIP: Progressive Discrepancy Learning for One-Shot text-to-Image Person Re-Identification
PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese text Recognition
Paint-it: text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
Pair-Copula Based Scheme for text Extraction from Digital Images, A
PairAug: What Can Augmented Image-text Pairs Do for Radiology?
Pairwise optimized Rocchio algorithm for text categorization
PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped text
Pangu-draw: Advancing Resource-efficient text-to-image Synthesis with Time-decoupled Training and Reusable Coop-diffusion
PanoDreamer: Consistent text to 360-Degree Scene Generation
Paragraph text segmentation into lines with Recurrent Neural Networks
Parameter efficient finetuning of text-to-image models with trainable self-attention layer
Parametric Spectral-Based Method for Verification of text in Videos, A
Parco: Part-coordinating text-to-motion Synthesis
Parrot Captions Teach CLIP to Spot text
Parrot: Pareto-optimal Multi-reward Reinforcement Learning Framework for text-to-image Generation
Part-based method on handwritten texts
Partial Scene text Retrieval
Parts2Words: Learning Joint Embedding of Point Clouds and texts by Bidirectional Matching Between Parts and Words
PathLDM: text conditioned Latent Diffusion Model for Histopathology
Pay attention to what you read: Non-recurrent handwritten text-Line recognition
Pea-diffusion: Parameter-efficient Adapter with Knowledge Distillation in Non-english text-to-image Generation
Pen Acoustic Emissions for text and Gesture Recognition
Perceptive Vision for Headline Localisation in Bangla Handwritten text Recognition
Performance Analysis of text Halftone Modulation
Performance Evaluation of text Detection and Tracking in Video
Person Identification Using text and Image Data
Person Search by text Attribute Query As Zero-Shot Learning
PersonaBooth: Personalized text-to-Motion Generation
Personalised video summarisation using video-text multi-modal fusion
Personalized Residuals for Concept-Driven text-to-Image Generation
Personalized text snippet extraction using statistical language models
Perspective Scene text Recognition with Feature Compression and Ranking
PETR: Rethinking the Capability of Transformer-Based Language Model in Scene text Recognition
PFAN++: Bi-Directional Image-text Retrieval With Position Focused Attention Network
Phenology description is all you need! mapping unknown crop types with remote sensing time-series and LLM generated text alignment
Photographic text-to-Image Synthesis with a Hierarchically-Nested Adversarial Network
PhotoOCR: Reading text in Uncontrolled Conditions
PhyS-EdiT: Physics-aware Semantic Image Editing with text Description
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded text-to-Video Generation
PI3D: Efficient text-to-3D Generation with Pseudo-Image Diffusion
PIA: Your Personalized Image Animator via Plug-and-Play Modules in text-to-Image Models
Picture and text Query and Archiving System, A
Picture is Worth More Than 77 text Tokens: Evaluating CLIP-Style Models on Dense Captions, A
PIDRo: Parallel Isomeric Attention with Dynamic Routing for text-Video Retrieval
Piece-wise linearity based method for text frame classification in video
Pitch Based Segmentation and Recognition of Dot-Matrix text
Pitman Shorthand inspired model for plain text compression
Pixart-sigma: Weak-to-strong Training of Diffusion Transformer for 4k text-to-image Generation
Pixel-Based Evaluation Method for text Detection in Color Images, A
PixtextGAN: structure aware text image synthesis for license plate recognition
Plan, Posture and Go: Towards Open-vocabulary text-to-motion Generation
Platypus: A Generalized Specialist Model for Reading text in Various Forms
Plda-based system for text-prompted password speaker verification
Plot: text-based Person Search with Part Slot Attention for Corresponding Part Discovery
Plug-and-Play Diffusion Features for text-Driven Image-to-Image Translation
Plug-and-Play Interpretable Responsible text-to-Image Generation via Dual-Space Multi-facet Concept Control
Plug-and-Play Regulators for Image-text Matching
Plugnet: Degradation Aware Scene text Recognition Supervised by a Pluggable Super-resolution Unit
PMMN: Pre-Trained Multi-Modal Network for Scene text Recognition
PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain Gap Using Pose-Preserved text-to-Image Diffusion
PointCloud-text Matching: Benchmark Dataset and Baseline
PolygloNet: Multilingual Approach for Scene text Recognition Without Language Constraints
Polygon-based technique for the automatic classification of text and graphics components from digitized paper-based forms
Polygon-Free: Unconstrained Scene text Detection with Box Annotations
pooling based scene text proposal technique for scene text reading in the wild, A
Portable and fast text detection
Portmanteauing Features for Scene text Recognition
Position-Guided text Prompt for Vision-Language Pre-Training
Post-training Quantization with Progressive Calibration and Activation Relaxing for text-to-image Diffusion Models
PosterMaker: Towards High-Quality Product Poster Generation with Accurate text Rendering
Powerful and Flexible: Personalized text-to-image Generation via Reinforcement Learning
PQPP: A Joint Benchmark for text-to-Image Prompt and Query Performance Prediction
PR-CLIP: Cross-Modal Positional Reconstruction for Remote Sensing Image-text Retrieval
Pre-Training a Graph Recurrent Network for text Understanding
PreciseCam: Precise Camera Control for text-to-Image Generation
Precisecontrol: Enhancing text-to-image Diffusion Models with Fine-grained Attribute Control
Predicated Diffusion: Predicate Logic-Based Attention Guidance for text-to-Image Diffusion Models
Predict, Prevent, and Evaluate: Disentangled text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model
Predicting audio-visual salient events based on visual, audio and text modalities for movie summarization
Predicting Emotional Responses to Long Informal text
Predicting Motivations of Actions by Leveraging text
Predicting Visual Features From text for Image and Video Caption Retrieval
PRESENT: Zero-Shot text-to-Prosody Control
Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in text-Guided Image Editing
Preserving privacy without compromising accuracy: Machine unlearning for handwritten text recognition
PreSTU: Pre-Training for Scene-text Understanding
Primitive Representation Learning for Scene text Recognition
Printed text Discrimination
Printed text Featuring Using the Visual Criteria of Legibility and Complexity
Printed text segmentation using distance transform
Prior knowledge guided text to image generation
Prior Preserved text-to-Image Personalization Without Image Regularization
Probabilistic Hierarchical Clustering Method for Organising Collections of text Documents, A
Probabilistic Kernels for Improved text-to-Speech Alignment in Long Audio Tracks
probabilistic model derived term weighting scheme for text classification, A
Processing of Binary Images of Handwritten text Documents
Processing of Off-Line Handwritten text: Polygonal-Approximation and Enforcement of Temporal Information
Progressive Contour Regression for Arbitrary-Shape Scene text Detection
Progressive Feature Mining and External Knowledge-Assisted text-Pedestrian Image Retrieval
Progressive Human Motion Generation Based on text and Few Motion Frames
Progressive Rendering Distillation: Adapting Stable Diffusion for Instant text-to-Mesh Generation without 3D Data
Progressive scene text erasing with self-supervision
Progressive Spatio-Temporal Prototype Matching for text-Video Retrieval
Progressive text-Semantic-Aware Generative Adversarial Network for Image Fusion
Progressive text-to-Face Synthesis with Generative Adversarial Network
Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward text-to-3D Scene Generation
Prompt Augmentation for Self-supervised text-guided Image Manipulation
Prompt Switch: Efficient CLIP Adaptation for text-Video Retrieval
Prompt Tuning Inversion for text-Driven Image Editing Using Diffusion Models
Prompt-Free Diffusion: Taking text Out of Text-to-Image Diffusion Models
Prompt2Perturb (P2P): text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images
PromptAD: Zero-shot Anomaly Detection using text Prompts
Prompting Hard or Hardly Prompting: Prompt Inversion for text-to-Image Diffusion Models
Proposal for a text-Indicated Writer Verification Method, A
Proposal of the hybrid spectral gradient method to extract character-text regions from general scene images
Protip: Probabilistic Robustness Verification on text-to-image Diffusion Models Against Stochastic Perturbation
Prototype-guided text-based person search on rich Chinese descriptions
Psg-adapter: Controllable Planning Scene Graph for Improving text-to-image Diffusion
Pull Pole Points to text Contour by Magnetism: A Real-Time Scene Text Detector
Pure Transformer with Integrated Experts for Scene text Recognition
Push the limit of scene text recognition using character and text length guided text super-resolution
Pushing the Performance Limit of Scene text Recognizer without Human Annotation
PYRAD-DCNN: A Fully Convolutional Neural Network to Replace BLSTM in Offline text Recognition Systems
Pyrboxes: An efficient multi-scale scene text detector with feature pyramids
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for text-to-Vision Content
quad tree based method for blurred and non-blurred video text frames classification through quality metrics, A
Quadrilateral Scene text Detector with Two-Stage Network Architecture, A
Quality Assessment for text-to-Image Generation: A Survey
Quality inspection of printed texts
Quality-related English text classification based on recurrent neural network
QWERTY- and 8pen- Based Touchless text Input with Hand Movement
R-Net: A Relationship Network for Efficient and Accurate Scene text Detection
R.A.C.E.: Robust Adversarial Concept Erasure for Secure text-to-image Diffusion Model
R2CNN: Rotational Region CNN for Arbitrarily-Oriented Scene text Detection
Rail Transit Line-Sign text Detection With Patch-Based Region Proposal Network
Random Subspace Method in text Categorization
Ranni: Taming text-to-Image Diffusion for Accurate Instruction Following
Rapid Evaluation of the Handwriting Performance for Gesture Based text Input
Re-ranking and TOPSIS-based ensemble feature selection with multi-stage aggregation for text categorization
Re-ranking image-text matching by adaptive metric fusion
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene text Recognition
Read text from Signs in General Scenes
Reading Arbitrary-shaped Scene text from Images Through Spline Regression and Rectification
Reading Newspaper text
Reading text in the Wild from Compressed Images
Reading text in the Wild with Convolutional Neural Networks
Reading-Life Log: Technologies to Recognize texts That We Read, The
Reading-Strategy Inspired Visual Representation Learning for text-to-Video Retrieval
ReADS: A Rectified Attentional Double Supervised Network for Scene text Recognition
Real time image enhancement and segmentation for sign/text detection
Real time image enhancement for both text and color photo images
Real-Time Lexicon-Free Scene text Localization and Recognition
Real-time Lexicon-free Scene text Retrieval
Real-Time Scene text Detection Based on Stroke Model
Real-Time Scene text Detection With Differentiable Binarization and Adaptive Scale Fusion
Real-time scene text localization and recognition
Real-Time Scene text to Speech System, A
Real-Time text Detection With Similar Mask in Traffic, Industrial, and Natural Scenes
Real-Time text Steganalysis Based on Multi-Stage Transfer Learning
Real-time text tracking in natural scenes
real-time text-independent speaker identification system, A
Real-Time Visual Analytics for text Streams
RealCustom: Narrowing Real text Word for Real-Time Open-Domain Text-to-Image Customization
RealDTT: Towards A Comprehensive Real-World Dataset for Tampered text Detection
RealmDreamer: text-Driven 3D Scene Generation with Inpainting and Depth Diffusion
Realtime multi-scale scene text detection with scale-based region proposal network
Reasoning elicitation and multi-granularity contrastive learning for text-rich image understanding in large vision-language models
Receler: Reliable Concept Erasing of text-to-image Diffusion Models via Lightweight Erasers
Recipe for Scaling up text-to-Video Generation with Text-free Videos, A
Recipe2Video: Synthesizing Personalized Videos from Recipe texts
ReCo: Region-Controlled text-to-Image Generation
Recognising text in Real Scenes
Recognition based text localization from natural scene images
Recognition of Apparent Personality Traits from text and Handwritten Images
Recognition of Arabic Machine-Printed Cursive text
Recognition of Bangla text from scene images through perspective correction
Recognition of cursive video text using a deep learning framework
Recognition of Hand-Written Archive text Documents
Recognition of Handwritten Chinese text by Segmentation: A Segment-Annotation-Free Approach
Recognition of Indian multi-oriented and curved text
Recognition of Multi-oriented, Multi-sized, and Curved text
Recognition of Noise Polyfont Printed text Using Combined HMMS, The
Recognition of Pornographic Web Pages by Classifying texts and Images
Recognition of printed arabic text based on global features and decision tree learning techniques
Recognition of Printed Arabic text Using Neural Networks
Recognition of printed Devanagari text using BLSTM Neural Network
Recognition of Printed text under Realistic Conditions
Recognition of Screen-Rendered text
Recognition of Video text through Temporal Integration
Recognition-Based Segmentation of Nom Characters from Body text Regions of Stele Images Using Area Voronoi Diagram
Recognition-Synergistic Scene text Editing
Recognize text in General Scenes
Recognizing Chinese texts with 3D Convolutional Neural Network
Recognizing irregular entities in biomedical text via deep neural networks
Recognizing Multiple text Sequences from an Image by Pure End-to-End Learning
Recognizing perspective scene text with context feature
Recognizing semantic correlation in image-text Weibo via feature space mapping
Recognizing text Elements for SVG Comic Compression and Its Novel Applications
Recognizing text in historical maps using maps from multiple time periods
Recognizing text in raster maps
Recognizing text with a CNN
Recognizing text with Perspective Distortion in Natural Scenes
Recognizing text-Based Traffic Guide Panels with Cascaded Localization Network
Recognizing text-Based Traffic Signs
Recon: Training-free Acceleration for text-to-image Synthesis with Retrieval of Concept Prompt Trajectories
Reconsidering Tourism Destination Images by Exploring Similarities between Travelogue texts and Photographs
Rectification and recognition of text in 3-D scenes
Rectifying Perspective Views of text in 3D Scenes Using Vanishing Points
Recurrent Affine Transformation for text-to-Image Synthesis
Recurrent Global Convolutional Network for Scene text Detection
Recurrent Highway Networks with Attention Mechanism for Scene text Recognition
Redefining the DCT-based feature for scene text detection: Analysis and comparison of spatial frequency-based features
Redif Extraction in Handwritten Ottoman Literary texts
Reduced annotation based on deep active learning for arabic text detection in natural scene images
Reference-Aware Adaptive Network for Image-text Matching
Referring Image Segmentation Using text Supervision
Refine, Control and Distill: A text-to-Image Framework for Faithful Image Generation
Refining text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation
Region Reinforcement Network With Topic Constraint for Image-text Matching
Region-Aware Arbitrary-Shaped text Detection With Progressive Fusion
Region-Based Discriminative Feature Pooling for Scene text Recognition
Regularizing Visual Semantic Embedding With Contrastive Learning for Image-text Matching
Reinforcement Shrink-Mask for text Detection
Rejection Strategies for Offline Handwritten text Line Recognition
ReLatext: Exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks
Relation Graph Reasoning for Image-text Matching
Relation Mining and Visualization Framework for Automated text Summarization, A
Relation-Guided Network for Image-text Retrieval
Reliable and Efficient Concept Erasure of text-to-image Diffusion Models
Reliable Phrase Feature Mining for Hierarchical Video-text Retrieval
Remote Sensing Cross-Modal text-Image Retrieval Based on Attention Correction and Filtering
Remote Sensing Image Augmentation Based on text Description for Waterside Change Detection
Remote Sensing Image Generation via Object text Decoupling
Removing Distributional Discrepancies in Captions Improves Image-text Alignment
RenAIssance: A Survey Into AI text-to-Image Generation in the Era of Large Model
Report from the AND 2009 working group on noisy text datasets
Representation and Recognition of text Using Hidden Markov Models, The
Representation learning for very short texts using weighted word embedding aggregation
Representation transfer and data cleaning in multi-views for text simplification
Residual Dual Scale Scene text Spotting by Fusing Bottom-Up and Top-Down Processing
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-text Data Streams
ReStGAN: A step towards visually guided shopper experience via text-to-image synthesis
Retaining Knowledge and Enhancing Long-text Representations in CLIP through Dual-Teacher Distillation
Rethinking Diffusion for text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression
Rethinking Noisy Video-text Retrieval via Relation-aware Alignment
Rethinking text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach
Rethinking Training for De-biasing text-to-Image Generation: Unlocking the Potential of Stable Diffusion
Rethinking Video-text Understanding: Retrieval from Counterfactually Augmented Data
Retrieval Methods for English-text with Misrecognized OCR Characters
Retrieval Strategies for Noisy text
Revealing Directions for text-Guided 3D Face Editing
Review of Cross-Modal Image-text Retrieval in Remote Sensing, A
Review of Segmentation and Contextual Analysis Techniques for text Recognition, A
Revisiting Scene text Recognition: A Data Perspective
RIATIG: Reliable and Imperceptible Adversarial text-to-Image Generation with Natural Prompts
Rich Human Feedback for text-to-Image Generation
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in text-to-3D
Rickrolling the Artist: Injecting Backdoors into text Encoders for Text-to-Image Synthesis
RiFeGAN2: Rich Feature Generation for text-to-Image Synthesis From Constrained Prior Knowledge
RiFeGAN: Rich Feature Generation for text-to-Image Synthesis From Prior Knowledge
RLita: A Region-Level Image-text Alignment Method for Remote Sensing Foundation Model
RLST: A Reinforcement Learning Approach to Scene text Detection Refinement
RMGNet: The Progressive Relationship-Mining Graph Neural Network for text-to-Image Person Re-Identification
robust algorithm for text detection in color images, A
Robust Algorithm for text String Separation from Mixed Text/Graphics Images, A
Robust and Accurate text Stroke Segmentation
robust and multiscale document image segmentation for block line/text line structures extraction, A
Robust and Non-Negative Collective Matrix Factorization for text-to-Image Transfer Learning
Robust and parallel Uyghur text localization in complex background images
Robust and Secure Data Hiding for PDF text Document
robust approach for recognition of text embedded in natural scenes, A
robust approach for text detection from natural scene images, A
Robust Approach to Extraction of texts from Camera Captured Images, A
robust approach to text line grouping in online handwritten Japanese documents, A
Robust Binarization for Video text Recognition
Robust Color-Independent text Detection Method from Complex Videos, A
Robust detection of stylized text events in digital video
Robust Disaster Assessment from Aerial Imagery Using text-to-Image Synthetic Data
Robust Extraction of text from Camera Images
Robust Extraction of text in Video
Robust Hashing With Bilinear Drift for Image-text Retrieval
robust hybrid approach for text line segmentation in historical documents, A
Robust Lexicon-Free Confidence Prediction for text Recognition
Robust Local Scoring Function for text-Independent Speaker Verification
Robust Model for On-Line Handwritten Japanese text Recognition, A
Robust outdoor text detection using text intensity and shape features
Robust Scene text Detection for Multi-script Languages Using Deep Learning
Robust Scene text Detection for Partially Annotated Training Data
Robust scene text detection using integrated feature discrimination
Robust Scene text Detection with Convolution Neural Network Induced MSER Trees
Robust Scene text Detection with Deep Feature Pyramid Network and CNN based NMS Model
Robust Scene text Recognition with Automatic Rectification
Robust scene text understanding with OCR token and word alignment for Text-VQA and text-caption
Robust seed-based stroke width transform for text detection in natural images
Robust Segmentation Technique for Line, Word and Character Extraction from Kannada text in Low Resolution Display Board Images, A
Robust skew detection in mixed text/graphics documents
Robust Split-and-Merge text Segmentation Approach for Images, A
Robust stereo correspondence for documents by matching connected components of text-lines with dynamic programming
Robust stereo matching for document images using parameter selection of text-line extraction
robust system for text extraction in video, A
Robust System For Thresholding And Skew Detection In Mixed text/graphics Documents, A
robust technique for text extraction in mixed-type binary documents, A
Robust text detection from binarized document images
Robust text detection in natural images with edge-enhanced Maximally Stable Extremal Regions
Robust text Detection in Natural Scene Images
Robust text Detection in Natural Scene Images by Generalized Color-Enhanced Contrasting Extremal Region and Neural Networks
Robust text Detection with Vertically-Regressed Proposal Network
Robust text Image Recognition via Adversarial Sequence-to-Sequence Domain Adaptation
Robust text Line Segmentation for Historical Manuscript Images Using Color and Texture
Robust text Segmentation in Low Quality Images via Adaptive Stroke Width Estimation and Stroke Based Superpixel Grouping
Robust text segmentation using graph cut
Robust text watermarking based on average skeleton mass of characters against cross-media attacks
Robust Two Level Classification Algorithm for text Localization in Documents, A
Robust video text segmentation and recognition with multiple hypotheses
Robust Video-text Retrieval Via Noisy Pair Calibration
Robust Wavelet Transform Based Technique for Video text Detection, A
Robustly Recognizing Irregular Scene text by Rectifying Principle Irregularities
Robustscanner: Dynamically Enhancing Positional Clues for Robust text Recognition
Rolling bilateral filter-based text image deblurring
Rotation and script independent text detection from video frames using sub pixel mapping
Rotation-Sensitive Regression for Oriented Scene text Detection
Rough-fuzzy based scene categorization for text detection and recognition in video
RSCA: Real-time Segmentation-based Context-Aware Scene text Detection
RSD-GAN: Regularized Sobolev Defense GAN Against Speech-to-text Adversarial Attacks
RUArt: A Novel text-Centered Solution for Text-Based Visual Question Answering
Rule Based Contextual Post-Processing for Devanagari text Recognition
RVMamba: Selective text-Vision Mamba for Referring Video Object Segmentation
SAC: Semantic Attention Composition for text-Conditioned Image Retrieval
SAFE: Scale Aware Feature Encoder for Scene text Recognition
Safeguard text-to-image Diffusion Models with Human Feedback Inversion
SaHAN: Scale-Aware Hierarchical Attention Network for Scene text Recognition
SALAD: Skeleton-aware Latent Diffusion for text-driven Motion Generation and Editing
Salient Guided text Detection in E-Commerce Images
Salient Object-Aware Background Generation using text-Guided Diffusion Models
SAM: Self Attention Mechanism for Scene text Recognition Based on Swin Transformer
Sample-aware Data Augmentor for Scene text Recognition
SAMWISE: Infusing Wisdom in SAM2 for text-Driven Video Segmentation
SARAT-a system for the recognition of Arabic printed text
SAST: Semantic-Aware stylized text-to-Image generation
SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker text-to-Speech Systems
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and text Data
Scale and Orientation Invariant text Segmentation for Born-Digital Compound Images
Scale robust deep oriented-text detection network
Scale Up Composed Image Retrieval Learning via Modification text Generation
Scale-aware Polar Representation for Arbitrarily-shaped text Detection
Scale-Invariant Multi-Oriented text Detection in Wild Scene Image
Scale-Residual Learning Network for Scene text Detection
Scaledreamer: Scalable text-to-3d Synthesis with Asynchronous Score Distillation
Scaling Down text Encoders of Text-to-Image Diffusion Models
Scaling up GANs for text-to-Image Synthesis
SCATTER: Selective Context Attentional Scene text Recognizer
ScenarioDiff: text-to-video Generation with Dynamic Transformations of Scene Conditions
Scene Graph Driven text-Prompt Generation for Image Inpainting
Scene Retrieval for Video Summarization Based on text-to-Image GAN
Scene text Character Recognition Using Spatiality Embedded Dictionary
Scene text Deblurring Using Text-Specific Multiscale Dictionaries
Scene text detection and recognition with advances in deep learning: A survey
Scene text Detection and Recognition: The Deep Learning Era
Scene text Detection and Segmentation Based on Cascaded Convolution Neural Networks
Scene text Detection and Tracking for a Camera-Equipped Wearable Reading Assistant for the Blind
Scene text detection based on component-level fusion and region-level verification
Scene text detection based on multi-scale SWT and edge filtering
Scene text Detection Based on Robust Stroke Width Transform and Deep Belief Network
Scene text detection based on skeleton-cut detector
Scene text Detection in Foggy Weather Utilizing Knowledge Distillation of Diffusion Models
Scene text detection method based on the hierarchical model
Scene text detection suitable for parallelizing on multi-core
Scene text detection using adaptive color reduction, adjacent character model and hybrid verification strategy
Scene text detection using graph model built upon maximally stable extremal regions
Scene text detection using sequential nontext filtering
Scene text detection using sparse stroke information and MLP
Scene text Detection Using Superpixel-Based Stroke Feature Transform and Deep Learning Based Region Classification
Scene text Detection via Connected Component Clustering and Nontext Filtering
Scene text Detection via Deep Semantic Feature Fusion and Attention-based Refinement
Scene text Detection via Integrated Discrimination of Component Appearance and Consensus
Scene text detection via stroke width
Scene text Detection with Adaptive Line Clustering
Scene text detection with extremal region based cascaded filtering
Scene text Detection with Recurrent Instance Segmentation
Scene text detection with robust character candidate extraction method
Scene text Detection with Selected Anchors
Scene text detection with superpixels and hierarchical model
Scene text Extraction and Translation for Handheld Devices
Scene text extraction based on edges and support vector regression
Scene text Extraction by Superpixel CRFs Combining Multiple Character Features
Scene text Extraction in Complex Images
Scene text extraction in natural scene images using hierarchical feature combining and verification
Scene text Extraction Using Focus of Mobile Camera
Scene text Extraction with Edge Constraint and Text Collinearity
Scene text extraction with local symmetry transform
Scene text Identification by Leveraging Mid-level Patches and Context Information
Scene text Image Super-resolution based on Text-conditional Diffusion Models
Scene text Image Super-Resolution in the Wild
Scene text Image Super-Resolution Via Semantic Distillation and Text Perceptual Loss
Scene text Localization and Recognition with Oriented Stroke Detection
Scene text Localization Using Gradient Local Correlation
Scene text Recognition and Retrieval for Large Lexicons
Scene text recognition by learning co-occurrence of strokes based on spatiality embedded dictionary
Scene text Recognition in Mobile Applications by Character Descriptor and Structure Configuration
Scene text Recognition Models Explainability Using Local Features
Scene text recognition using a Hough forest implicit shape model and semi-Markov conditional random fields
Scene text Recognition Using Co-occurrence of Histogram of Oriented Gradients
Scene text Recognition using Higher Order Language Priors
Scene text Recognition Using Part-Based Tree-Structured Character Detection
Scene text Recognition Using Progressive Rectification Network And Spelling Error Correction Language Model
Scene text recognition using residual convolutional recurrent neural network
Scene text Recognition Using Similarity and a Lexicon with Sparse Belief Propagation
Scene text recognition using sparse coding based features
Scene text Recognition Using Structure-Guided Character Detection and Linguistic Knowledge
Scene text Recognition with a Hough Forest Implicit Shape Model
Scene text recognition with CNN classifier and WFST-based word labeling
Scene text recognition with deeper convolutional neural networks
Scene text Recognition with Permuted Autoregressive Sequence Models
Scene text Recognition with Self-supervised Contrastive Predictive Coding
Scene text Recognition: No Country for Old Men?
Scene text rectification using glyph and character alignment properties
Scene text Removal, Text Erasing
Scene text Retrieval via Joint Text Detection and Similarity Learning
Scene text Script Identification with Convolutional Recurrent Neural Networks
Scene text Segmentation Based on Local Image Phase Information and MSER Method
Scene text Segmentation by Paired Data Synthesis
Scene text Segmentation via Inverse Rendering
Scene text Segmentation with Multi-level Maximally Stable Extremal Regions
Scene text Telescope: Text-Focused Scene Image Super-Resolution
Scene text Visual Question Answering
Scene text, Assistance for Visually Impaired
Scene-text Oriented Referring Expression Comprehension
Scene-text Synthesis Engine Achieved Through Learning From Decomposed Real-World Data, A
Scene-text-Detection Method Robust Against Orientation and Discontiguous Components of Characters
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and text
Scientometric Full-text Analysis of Papers Published in Remote Sensing between 2009 and 2021
SciOL and MuLMS-Img: Introducing A Large-Scale Multimodal Scientific Dataset and Models for Image-text Tasks in the Scientific Domain
SCOB: Universal text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
SCoRD: Subject-Conditional Relation Detection with text-Augmented Data
ScrabbleGAN: Semi-Supervised Varying Length Handwritten text Generation
Screen-rendered text images recognition using a deep residual network based segmentation-free method
Scribble-Guided Diffusion for Training-Free text-to-Image Generation
Script and nature differentiation for Arabic and Latin text images
Script-Free text Line Segmentation Using Interline Space Model for Printed Document Images
Script-Independent text Line Segmentation in Freestyle Handwritten Documents
Script-independent, HMM-based text Line Finding for OCR
Sculpt3D: Multi-View Consistent text-to-3D Generation with Sparse 3D Prior
SCUT-COUCH textline_NU: An Unconstrained Online Handwritten Chinese Text Lines Dataset
SCUT-HCCDoc: A new benchmark dataset of handwritten Chinese text in unconstrained camera-captured documents
SD-Prompt: Learnable and Adaptive Prompts for Enhancing Subject-Driven text-to-Image Synthesis
Search method and apparatus for locating digitally stored content, such as visual images, music and sounds, text, or software, in storage devices on a computer network
Searching a High Performance Feature Extractor for text Recognition Network
Searching OCR'ed text: An LDA Based Approach
Searching through a Speech Memory for text-Independent Speaker Verification
See Finer, See More: Implicit Modality Alignment for text-based Person Retrieval
See-Through-text Grouping for Referring Image Segmentation
SEED: Semantics Enhanced Encoder-Decoder Framework for Scene text Recognition
Seek Common Ground While Reserving Differences: Semi-Supervised Image-text Sentiment Recognition
SeeTek: Very Large-Scale Open-set Logo Recognition with text-Aware Metric Learning
SegINR: Segment-Wise Implicit Neural Representation for Sequence Alignment in Neural text-to-Speech
SegLink++: Detecting Dense and Arbitrary-shaped Scene text by Instance-aware Component Grouping
Segmentation and Classification of Mixed text/Graphics/Image Documents
Segmentation and Recognition of Continuous Handwriting Chinese text
Segmentation and Recognition of Dimensioning text from Engineering Drawings
Segmentation and Word Spotting Methods for Printed and Handwritten Arabic texts: A Comparative Study
Segmentation Method of Single- and Multiple-Touching Characters in Offline Handwritten Japanese text Recognition, A
Segmentation of Bangla unconstrained handwritten text
Segmentation of On-Line Freely Written Japanese text Using SVM for Improving Text Recognition
Segmentation of on-line handwritten Japanese text of arbitrary line direction by a neural network for improving text recognition
Segmentation of On-Line Handwritten Japanese text Using SVM for Improving Text Recognition
Segmentation of stick text based on sub connected area analysis
Segmentation of text and graphics
Segmentation of text and Graphics from Document Images
Segmentation of text From Color Map Images
Segmentation of text, picture and lines of a document image
Segmentation of text/image documents using texture approaches
Segmentation of Uniform Colored text from Color Graphics Background
Segmentation of Very Low Resolution Screen-Rendered text
Segmentation-Aware text-Guided Image Manipulation
Segmentation-Free Approach to text Recognition with Application to Arabic Text, A
Segmentation-Free Guidance for text-to-Image Diffusion Models
Segmentation-free handwritten Chinese text recognition with LSTM-RNN
Segmented handwritten text recognition with recurrent neural network classifiers
Segmenting a page of a document into areas which are text and areas which are halftone
Segmenting Messy text: Detecting Boundaries in Text Derived from Historical Newspaper Images
Segmenting text Images with Massively Parallel Machines
Selectively Hard Negative Mining for Alleviating Gradient Vanishing in Image-text Matching
Selectively Informative Description can Reduce Undesired Embedding Entanglements in text-to-Image Personalization
Self-Adaptive Image-text Fusion for Medical Image Classification
Self-attention based text Knowledge Mining for Text Detection
Self-Cross Diffusion Guidance for text-to-Image Synthesis of Similar Subjects
Self-Discovering Interpretable Diffusion Latent Directions for Responsible text-to-Image Generation
Self-learning structure for text localization
Self-Organized text Detection with Minimal Post-processing via Border Learning
Self-paced Learning to Improve text Row Detection in Historical Documents with Missing Labels
Self-supervised adaptation for on-line script text recognition
Self-supervised Character-to-Character Distillation for text Recognition
Self-supervised deep reconstruction of mixed strip-shredded text documents
Self-Supervised Discovery of Cross-Lingual Shared Knowledge for Continual text Recognition
Self-Supervised Implicit Glyph Attention for text Recognition
Self-Supervised Learning for text Recognition: A Critical Survey
Self-Supervised Learning of Visual Features through Embedding Images into text Topic Spaces
Self-supervised writer adaptation using perceptive concepts: application to on-line text recognition
Self-Training for Domain Adaptive Scene text Detection
Self-training for Handwritten text Line Recognition
SEM-CS: Semantic Clipstyler for text-Based Image Style Transfer
SEMACOL: Semantic-enhanced multi-scale approach for text-guided grayscale image colorization
Semantic and Morphological Information Guided Chinese text Classification
Semantic Controllable Long text Steganography Framework Based on LLM Prompt Engineering and Knowledge Graph, A
Semantic Correlation Mining between Images and texts with Global Semantics and Local Mapping
Semantic Distance Adversarial Learning for text-to-Image Synthesis
Semantic Indexing of Multimedia Content Using Visual, Audio, and text Cues
Semantic Integration of Information Through Relation Mining: Application to Bio-medical text Processing
Semantic keyword extraction via adaptive text binarization of unstructured unsourced video
Semantic Object Accuracy for Generative text-to-Image Synthesis
Semantic Oriented text Clustering Based on RDF
Semantic Proximity Based System of Arabic text Indexation, A
Semantic Role Aware Correlation Transformer for text To Video Retrieval
Semantic role-based representations in text classification
Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation
Semantic text Summarization of Long Videos
Semantic-Aware Video text Detection
Semantic-Compensated and Attention-Guided Network for Scene text Detection
Semantic-Preserving Metric Learning for Video-text Retrieval
Semantic-Spatial Attention for Refined Object Placement in text-to-Image Synthesis
Semantically Consistent Hierarchical text to Fashion Image Synthesis with an Enhanced-Attentional Generative Adversarial Network
Semantically consistent text to fashion image synthesis with an enhanced attentional generative adversarial network
Semantically Invariant text-to-Image Generation
Semantics Disentangling for text-To-Image Generation
Semantics-Enhanced Adversarial Nets for text-to-Image Synthesis
Semi-automatic news video annotation framework for Arabic text
Semi-Incremental Recognition Method for On-Line Handwritten Japanese text, A
Semi-Incremental Recognition of On-Line Handwritten Japanese text
Semi-supervised learning for text-line detection
Semi-supervised network embedding with text information
Semi-Supervised Pixel-Level Scene text Segmentation by Mutually Guided Network
Semi-Supervised Scene text Recognition
Semi-Supervised text Classification With Universum Learning
Semi-Supervised text Detection With Accurate Pseudo-Labels
Semi-Supervised text-Based Person Search
Semiautomatic Ground Truth Generation for text Detection and Recognition in Video Images
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end text Spotting
SemStyle: Learning to Generate Stylised Image Captions Using Unaligned text
Sense discovery via co-clustering on images and text
Sentence level text classification in the Kannada language: A classifier's perspective
Sentiment analysis based on text information enhancement and multimodal feature fusion
Sentiment Similarity-oriented Attention Model with Multi-task Learning for text-based Emotion Recognition, A
Separate Images and Graphics from text
Separate, Locate, and Align: Determine Context Relation of Scene Text From Multiple Perspectives in TextVQA
Separating Content from Style Using Adversarial Learning for Recognizing text in the Wild
Separating handwritten material from machine printed text using hidden Markov models
Separating Handwritten text from Non-Textual Interference
Separating Lines of text in Free-Form Handwritten Historical Documents
Separating text and background in degraded document images: A comparison of global thresholding techniques for multi-stage thresholding
Separation of overlapping text from graphics
Separation of touching and overlapping words in adjacent lines of handwritten text
Seq-UPS: Sequential Uncertainty-aware Pseudo-label Selection for Semi-Supervised text Recognition
Seq2seq-based Model with Global Semantic Context for Scene Text Recognition, A
Sequence as a Whole: A Unified Framework for Video Action Localization With Long-Range text Query
Sequence to Sequence -- Video to text
Sequence-to-Sequence Contrastive Learning for text Recognition
Sequence-To-Sequence Domain Adaptation Network for Robust text Image Recognition
Sequential alignment attention model for scene text recognition
Sequential Deformation for Accurate Scene text Detection
Sequential Monte Carlo video text segmentation
Sequential texts Driven Cohesive Motions Synthesis with Natural Transitions
Sequential Transformer for End-to-End Video text Detection
Sequential visual and semantic consistency for semi-supervised text recognition
set of benchmarks for Handwritten text Recognition on historical documents, A
SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene text Recognition
SGDM: An Adaptive Style-Guided Diffusion Model for Personalized text to Image Generation
Shape My Moves: text-Driven Shape-Aware Synthesis of Human Motions
Shape Robust text Detection With Progressive Scale Expansion Network
Shape-Aware text-Driven Layered Video Editing
Shape-DNA: Effective Character Restoration and Enhancement for Arabic text Documents
Shape-Matching GAN++: Scale Controllable Dynamic Artistic text Style Transfer
ShapeScaffolder: Structure-Aware 3D Shape Generation from text
ShapeWords: Guiding text-to-Image Synthesis with 3D Shape-Aware Prompts
Shatter and Gather: Learning Referring Image Segmentation with text Supervision
SHE-Net: Syntax-Hierarchy-Enhanced text-Video Retrieval
Sherpa3D: Boosting High-Fidelity text-to-3D Generation via Coarse 3D Prior
Shifted Diffusion for text-to-image Generation
ShotAdapter: text-to-Multi-Shot Video Generation with Diffusion Models
Show-1: Marrying Pixel and Latent Diffusion Models for text-to-Video Generation
Shuffle and Divide: Contrastive Learning for Long text
SiamCLIM: text-Based Pedestrian Search Via Multi-Modal Siamese Contrastive Learning
Sigma-Lognormal Model for Handwritten text CAPTCHA Generation, A
Sign Detection Based text Localization in Mobile Device Captured Scene Images
Signing Avatars: Multimodal Challenges for text-to-sign Generation
Silent Branding Attack: Trigger-free Data Poisoning Attack on text-to-Image Diffusion Models
SILMM: Self-Improving Large Multimodal Models for Compositional text-to-Image Generation
SimAC: A Simple Anti-Customization Method for Protecting Face Privacy Against text-to-Image Synthesis of Diffusion Models
SimAN: Exploring Self-Supervised Representation Learning of Scene text via Similarity-Aware Normalization
Similarity Search on Semantic Trajectories Using text Processing
Similarity Shuffled Criss-Cross Transformer With Angle Loss for Image-text Matching
SimMotionEdit: text-Based Human Motion Editing with Motion Similarity Prediction
Simple and Effective Multi-word Query Spotting in Handwritten text Images
Simple and Robust Correlation Filtering Method for text-Based Person Search, A
Simple Framework for text-Supervised Semantic Segmentation, A
Simulated Annealing Clustering of Chinese Words for Contextual Text Recognition
Simulated annealing-based text clustering
SINE: SINgle Image Editing with text-to-Image Diffusion Models
Single Shot Scene text Retrieval
Single Shot text Detector with Regional Attention
Single-frame text super-resolution: a bayesian approach
Single-Line text Detection in Multi-Line Text with Narrow Spacing for Line-Based Character Recognition
Six-CD: Benchmarking Concept Removals for text-to-image Diffusion Models
SKED: Sketch-guided text-based 3D Editing
skeleton based descriptor for detecting text in real scene images, A
Skeleton Filter: A Self-Symmetric Filter for Skeletonization in Noisy text Images
Sketch and text Guided Diffusion Model for Colored Point Cloud Generation
Sketch is Worth a Thousand Words: Image Retrieval with text and Sketch, A
SketchBird: Learning to Generate Bird Sketches from text
Skew Angle Detection and Correction in text Images Using RGB Gradient
Skew correction and line extraction in binarized printed text images
Skew Detection and text Line-Position Determination in Digitized Documents
Skew detection for complex document images using robust borderlines in both text and non-text regions
Skew detection of text in a noisy digitized image
Skewed text correction based on the improved Hough transform
Skews in the Phenomenon Space Hinder Generalization in text-to-image Generation
SleeperMark: Towards Robust Watermark against Fine-Tuning text-to-image Diffusion Models
Sliding Line Point Regression for Shape Robust Scene text Detection
SLOAN: Scale-Adaptive Orientation Attention Network for Scene text Recognition
SMAN: Stacked Multimodal Attention Network for Cross-Modal Image-text Retrieval
SmartBrush: text and Shape Guided Object Inpainting with Diffusion Model
Smile: Sequence-to-Sequence Domain Adaptation with Minimizing Latent Entropy for text Image Recognition
SNAC: Speaker-Normalized Affine Coupling Layer in Flow-Based Architecture for Zero-Shot Multi-Speaker text-to-Speech
Snap Video: Scaled Spatiotemporal Transformers for text-to-Video Synthesis
SnapGen: Taming High-Resolution text-To-Image Models for Mobile Devices with Efficient Architectures and Training
Snoopertext: A multiresolution system for text detection in complex visual scenes
Snoopertext: A text detection system for automatic indexing of urban scenes
Snoopertrack: text detection and tracking for outdoor videos
SNP-S3: Shared Network Pre-Training and Significant Semantic Strengthening for Various Video-text Tasks
So Many Heads, So Many Wits: Multimodal Graph Reasoning for text-Based Visual Question Answering
Social Image-text Sentiment Classification With Cross-Modal Consistency and Knowledge Distillation
Sounding Video Generator: A Unified Framework for text-Guided Sounding Video Generation
Source-Free Image-text Matching via Uncertainty-Aware Learning
Space-Time Diffusion Features for Zero-Shot text-Driven Motion Transfer
sparse version of the ridge logistic regression for large-scale text categorization, A
Sparsectrl: Adding Sparse Controls to text-to-video Diffusion Models
Spatial and Color Spaces Combination for Natural Scene text Extraction
Spatial and Spectral Based Segmentation of text in Multispectral Images of Ancient Documents
Spatial context-based Self-Supervised Learning for Handwritten Text Recognition
Spatial Transport Optimization by Repositioning Attention Map for Training-Free text-to-Image Synthesis
Spatially Prioritized and Persistent text Detection and Decoding
Spatio-Temporal Relevance Classification from Geographic texts Using Deep Learning
Spatiotemporal Typhoon Damage Assessment: A Multi-Task Learning Method for Location Extraction and Damage Identification from Social Media texts
SPCL: Semantic Polymorphism and Commonality Learning for text-Based Person Retrieval
Special issue on camera-based text and document recognition
Special issue on deep learning for video text analysis
Special Issue on Noisy text Analytics
Special Issue on Noisy text Analytics, II
Special Issue on Noisy text Analytics, III
Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of text-to-Image Diffusion Models to Learn Any Unseen Style
Specific Category Region Proposal Network for text Detection in Natural Scene
Specific Diverse text-to-Image Synthesis via Exemplar Guidance
Spectral approach to find number of clusters of short-text documents
Spectral Fluctuation Method: A texture-Based Method to Extract Text Regions in General Scene Images
SpectralCLIP: Preventing Artifacts in text-Guided Style Transfer from a Spectral Perspective
SpeechPalette: A Comprehensive Speech Editing Method for text-Based Speech Editing, One-Shot TTS and Attributes Editing
Speedupnet: A Plug-and-play Adapter Network for Accelerating text-to-image Diffusion Models
SPEye: A Calibration-Free Gaze-Driven text Entry Technique Based on Smooth Pursuit
Spherical Linear Interpolation and text-Anchoring for Zero-Shot Composed Image Retrieval
Split-net: Dual transformer encoder with splitting scene text image for script identification
Spontaneous Handwriting text Recognition and Classification Using Finite-State Models
Spotlight text Detector: Spotlight on Candidate Regions Like a Camera
Spotting Phrases in Lines of Imaged text
SPS-SQL: Enhancing text-to-SQL generation on small-scale LLMs with pre-synthesized queries
SPTS v2: Single-Point Scene text Spotting
ST-LDM: A Universal Framework for text-grounded Object Generation in Real Images
Stable Preference: Redefining Training Paradigm of Human Preference Model for text-to-image Synthesis
Stable text line detection
StableID: Multimodal learning for stable identity in personalized text-to-Face generation
StableVideo: text-driven Consistency-aware Diffusion Video Editing
Stacked Cross Attention for Image-text Matching
StackGAN: text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks
StacMR: Scene-text Aware Cross-Modal Retrieval
STAN: A sequential transformation attention-based network for scene text recognition
STAR-Net: A SpaTial Attention Residue Network for Scene text Recognition
STARS: Semantics-Aware text-guided Aerial Image Refinement and Synthesis
StarVector: Generating Scalable Vector Graphics Code from Images and text
State Estimation in a Document Image and Its Application in text Block Identification and Text Line Extraction
State-of-the-Art in Action: Unconstrained text Detection
Static text region detection in video sequences using color and orientation consistencies
Statistical Approach for Phrase Location and Recognition within a text Line: An Application to Street Name Recognition, A
Statistical modeling for the detection, localization and extraction of text from heterogeneous textual images using combined feature scheme
Statistical text Line Analysis in Handwritten Documents
Steerable Directional Local Profile Technique for Extraction of Handwritten Arabic text Lines, A
STEFANN: Scene text Editor Using Font Adaptive Neural Network
Steganalysis for text, Documents
STEP - Towards Structured Scene-text Spotting
STEPS: Sequential Probability Tensor Estimation for text-to-Image Hard Prompt Search
STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from text-to-Image Diffusion Models
Stochastic text Models for Music Categorization
Stop Word Location and Identification for Adaptive text Recognition
Store classification using text-Exemplar-Similarity and Hypotheses-Weighted-CNN
Story Segmentation in News Videos Using Visual and text Cues
Story Visualization by Online text Augmentation with Context Memory
StoryDALL-E: Adapting Pretrained text-to-Image Transformers for Story Continuation
STPNet: Scale-Aware text Prompt Network for Medical Image Segmentation
Straight-Line Approximation and 1D Representation of Off-Line Handwritten text
Straightening warped text lines using polynomial regression
Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene text Recognition
Stratified Multi-Task Learning for Robust Spotting of Scene texts
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from text
StreamMel: Real-Time Zero-Shot text-to-Speech Via Interleaved Continuous Autoregressive Modeling
Street View text Recognition With Deep Learning for Urban Scene Understanding in Intelligent Transportation Systems
Stretching deep architectures for text recognition
String Matching, text Matching
String-level learning of confidence transformation for Chinese handwritten text recognition
STRIVE: Scene text Replacement In Videos
stroke filter and its application to text localization, A
Stroke Filter for text Localization in Video Images
Stroke Segmentation and Recognition from Bangla Online Handwritten text
Stroke Verification with Gray-level Image for Hangul Video text Recognition
Stroke-Based Scene text Erasing Using Synthetic Data for Training
Strokelets: A Learned Multi-Scale Mid-Level Representation for Scene text Recognition
Strokelets: A Learned Multi-scale Representation for Scene text Recognition
Structural feature-based event clustering for short text streams
Structure-Aware Generative Adversarial Network for text-to-Image Generation
Structured Human Assessment of text-to-Image Generative Models
Structuring low-quality videotaped lectures for cross-reference browsing by video text analysis
Study on Automatic Chinese text Classification, A
Style Transformer With Common Knowledge Optimization for Image-text Retrieval, The
Style-A-Video: Agile Diffusion for Arbitrary text-Based Video Style Transfer
Style-Editor: text-driven object-centric style editing
Style-Preserving Diffusion for Scene text Editing
StyleCLIP: text-Driven Manipulation of StyleGAN Imagery
StyleMC: Multi-Channel Based Fast text-Guided Image Generation and Manipulation
StyleStudio: text-Driven Style Transfer with Selective Control of Style Elements
StyleT2I: Toward Compositional and High-Fidelity text-to-Image Synthesis
Stylized text-to-Fashion Image Generation
Sub-structure Learning Based Handwritten Chinese text Recognition
subtractive clustering scheme for text-independent online writer identification, A
Super-resolution Enhancement of text Image Sequences
Super-Resolved Binarization of text Based on the FAIR Algorithm
Superresolution-based Enhancement of text in Digital Video
supervised algorithm with a new differentiated-weighting scheme for identifying the author of a handwritten text, A
Supervised and Traditional Term Weighting Methods for Automatic text Categorization
Supervised Domain Adaptation from Scene text Recognition for Licence Plate Recognition
Supervised semantic relation mining from linguistically noisy text documents
support vector approach for cross-modal search of images and texts, A
Support vector machine-based approach for text description from the video
Support vector machine-based text detection in digital video
Suppression of non-text components in handwritten document images
Surgical text-to-image generation
Surprisingly Straightforward Scene text Removal Method with Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis, The
Survey of text Watermarking in the Era of Large Language Models, A
survey on camera-captured scene text detection and extraction: towards Gurmukhi script, A
survey on methods, datasets and implementations for scene text spotting, A
survey on text generation using generative adversarial networks, A
SVGDreamer++: Advancing Editability and Diversity in text-Guided SVG Generation
SVGDreamer: text Guided SVG Generation with Diffusion Model
SViTT: Temporal Learning of Sparse Video-text Transformers
Swap Attention in Spatiotemporal Diffusions for text-to-Video Generation
Swaptext: Image Based Texts Transfer in Scenes
SwiftBrush: One-Step text-to-Image Diffusion Model with Variational Score Distillation
SwiftEdit: Lightning Fast text-Guided Image Editing via One-Step Diffusion
SwintextSpotter v2: Towards Better Synergy for Scene Text Spotting
SwintextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
SWT voting-based color reduction method for detecting text in natural scene images
Symbolic Subtraction from Fixed Formatted Graphics and text from Filled in Forms
Symbolization of Regional Elements Based on Local-Chronicle text Mining and Image-Feature Extraction, The
Symmetric-key block cipher for image and text cryptography
Symmetry-based object proposal for text detection
Symmetry-based text line detection in natural scenes
Symmetry-Constrained Rectification Network for Scene text Recognition
Syn3DTxt: Embedding 3D Cues for Scene text Generation
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-text Translation
Synthesizing Talking Faces from text and Audio: An Autoencoder and Sequence-to-Sequence Convolutional Neural Network
Synthetic Data for text Localisation in Natural Images
Synthetic-to-real Unsupervised Domain Adaptation for Scene text Detection in the Wild
Synthetically Supervised Feature Learning for Scene text Recognition
System and method for automatically distinguishing between graphic information and text information of image data
System for Bangla Online Handwritten text, A
System for Handwritten and Machine-Printed text Separation in Bangla Document Images, A
system for the off-line recognition of handwritten text, A
T-HOG: An effective gradient-based descriptor for single line text regions
T-REX2: Towards Generic Object Detection via text-visual Prompt Synergy
T-Skeleton: Accurate scene text detection via instance-aware skeleton embedding
t-SS3: A text classifier with dynamic n-grams for early risk detection over text streams
t-Test feature selection approach based on term frequency for text categorization
T-VSL: text-Guided Visual Sound Source Localization in Mixtures
T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional text-to-Image Generation
T2ishield: Defending Against Backdoors on text-to-image Diffusion Models
T2TD: text-3D Generation Model Based on Prior Knowledge Guidance
T2V-CompBench: A Comprehensive Benchmark for Compositional text-to-video Generation
T2V2T: text-to-Video-to-Text Fusion for Text-to-Video Retrieval
T2VBench: Benchmarking Temporal Dynamics for text-to-Video Generation
T2VLAD: Global-Local Sequence Alignment for text-Video Retrieval
TA2V: text-Audio Guided Video Generation
TAB: text-Align Anomaly Backbone Model for Industrial Inspection Tasks
Tablet identification using support vector machine based text recognition and error correction by enhanced n-grams algorithm
TACMT: text-aware cross-modal transformer for visual grounding on high-resolution SAR images
TACo: Token-aware Cascade Contrastive Learning for Video-text Alignment
TACT: text attention based CNN-Transformer network for polyp segmentation
TADA! text to Animatable Digital Avatars
Tag2Pix: Line Art Colorization Using text Tag With SECat and Changing Loss
Tag: text Prompt Augmentation for Zero-shot Out-of-distribution Detection
Tagging Webcast text in Baseball Videos by Video Segmentation and Text Alignment
Tailored Visions: Enhancing text-to-Image Generation with Personalized Prompt Rewriting
Tailoring text for automatic layouting of newspaper pages
TalkCLIP: Talking Head Generation with text-Guided Expressive Speaking Styles
TAM-TR: text-guided attention multi-modal transformer for object detection in UAV images
Taming Mode Collapse in Score Distillation for text-to-3D Generation
Taming Stable Diffusion for text to 360° Panorama Image Generation
TAP: text-Aware Pre-training for Text-VQA and Text-Caption
TAPS3D: text-Guided 3D Textured Shape Generation from Pseudo Supervision
Target-level Sentiment Analysis Based on Image and text Fusion
TASDF-Stega: High Capacity Secure text-Audio Joint Steganography Using Diffusion Latent Space
Task Grouping for Multilingual text Recognition
TC4D: Trajectory-Conditioned text-to-4D Generation
TCATD: text Contour Attention for Scene Text Detection
TCFF-Adapter: text-Driven Adaption of CLIP for Few-Shot Image Classification
TCP: text-Guided Cascade Network for Pedestrian Crossing Intention Prediction
TE141K: Artistic text Benchmark for Text Effect Transfer
Teachtext: CrossModal Generalized Distillation for Text-Video Retrieval
Teachtext: CrossModal text-video retrieval through generalized distillation
TECA: text-Guided Generation and Editing of Compositional 3D Avatars
TeCH: text-Guided Reconstruction of Lifelike Clothed Humans
Technique for Segmentation of Gurmukhi text, A
Tecm-clip: text-based Controllable Multi-attribute Face Image Manipulation
TediGAN: text-Guided Diverse Face Image Generation and Manipulation
TEDRA: text-Based Editing of Dynamic and Photoreal Actors
Tela: text to Layer-wise 3d Clothed Human Generation
Tell Me What Happened: Unifying text-guided Video Completion via Multimodal Masked Video Generation
Tell Your Story: text-Driven Face Video Synthesis with High Diversity via Adversarial Learning
Tem-adapter: Adapting Image-text Pretraining for Video Question Answer
TeMO: Towards text-Driven 3D Stylization for Multi-Object Meshes
Template Based Segmentation of Touching Components in Handwritten text Lines
Temporal Multimodal Graph Transformer With Global-Local Alignment for Video-text Retrieval
Temporal prompt guided visual-text-object alignment for zero-shot video captioning
Temporal video segmentation with natural language using text-video cross attention and Bayesian order-priors
TEMSA: text enhanced modal representation learning for multimodal sentiment analysis
Tensor representation learning based image patch analysis for text identification and recognition
Tensor Voting Based text Localization in Natural Scene Images
Term relevance dependency model for text classification
TETFN: A text enhanced transformer fusion network for multimodal sentiment analysis
TeViR: text-to-Video Reward With Diffusion Models for Efficient Reinforcement Learning
TEXDC: text-driven Disease-aware 4d Cardiac Cine MRI Images Generation
TexFusion: Synthesizing 3D textures with Text-Guided Image Diffusion Models
Texgen: text-guided 3d Texture Generation with Multi-view Sampling and Resampling
text alignment in early printed books combining deep learning and dynamic programming
text alignment with handwritten documents
text analysis using local energy
text and Documents in the Deep Learning Era
text and Image Guided 3D Avatar Generation and Manipulation
text and Image Sharpening of Scanned Images in the JPEG Domain
text and Layout Information Extraction from Document Files of Various Formats Based on the Analysis of Page Description Language
text and Non-Text Latent Feature Disentanglement for Screen Content Image Compression
text and non-text segmentation based on connected component features
text and non-text separation in offline document images: a survey
text and picture segmentation by the distribution analysis of wavelet coefficients
text and User Generic Model for Writer Verification Using Combined Pen Pressure Information From Ink Intensity and Indented Writing on Paper
text Area Detection in Digital Documents Images Using Textural Features
text area localization under complex-background using wavelet decomposition
text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution, A
text Augmented Correlation Transformer For Few-shot Classification & Segmentation
text baseline detection, a single page trained system
text Baseline Recognition Using a Recurrent Convolutional Neural Network
text binarization in color documents
text Block Segmentation in Comic Speech Bubbles
text box proposals for handwritten word spotting from documents
text Categorization Approach for Music Style Recognition, A
text Categorization: A Symbolic Approach
text Change Detection in Multilingual Documents Using Image Comparison
text Classification and Document Layout Analysis of Paper Fragments
text classification with the support of pruned dependency patterns
text Co-Detection in Multi-View Scene
text Compression-Aided Transformer Encoding
text data extraction from microfilm images of punched cards
text degradations and OCR training
text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning
text Detection and Localization in Complex Scene Images using Constrained AdaBoost Algorithm
text Detection and Recognition in Imagery: A Survey
text detection and recognition in images and video frames
text detection and recognition in natural scene with edge analysis
text Detection and Recognition in Real World Images
text detection and recognition in urban scenes
text Detection and Recognition on Traffic Panels From Street-Level Imagery Using Visual Appearance
text detection and restoration in natural scene images
text Detection and Translation from Natural Scenes
text detection based on convolutional neural networks with spatial pyramid pooling
text Detection for Video Analysis
text detection from natural scene images using topographic maps and sparse representations
text Detection from Natural Scene Images: Towards a System for Visually Impaired Persons
text detection from scene images using sparse representation
text detection in color scene images based on unsupervised clustering of multi-channel wavelet features
text detection in continuous tone image segments
text Detection in Digital Images Captured with Low Resolution Under Nonuniform Illumination Conditions
text detection in images based on unsupervised classification of edge-based features
text detection in images based on unsupervised classification of high-frequency wavelet coefficients
text detection in images using sparse representation with discriminative dictionaries
text detection in manga by combining connected-component-based and region-based classifications
text Detection in Natural Images Using Bio-inspired Models
text Detection in Natural Images Using Localized Stroke Width Transform
text Detection in Natural Scene Images by Stroke Gabor Words
text detection in natural scene images with user-intention
text detection in natural scene with edge analysis
text detection in natural scenes using Gradient Vector Flow-Guided symmetry
text detection in nature scene images using two-stage nontext filtering
text detection in scene images based on exhaustive segmentation
text detection in stores using a repetition prior
text Detection of Two Major Indian Scripts in Natural Scene Images
text detection on camera acquired document images using supervised classification of connected components in wavelet domain
text Detection System for Natural Scenes with Convolutional Feature Learning and Cascaded Classification, A
text Detection Using Edge Gradient and Graph Spectrum
text Detection, Find Text in General Scenes, Scene Text
text detection, localization, and tracking in compressed video
text Detection, Scene Text, Curved Text, Arbitrary Orientation
text Detection, Tracking and Recognition in Video: A Comprehensive Survey
text Detector Based on the Specific Text Prompt, A
text discrimination method and related apparatus
text Driven Face-Video Synthesis Using GMM and Spatial Correlation
text Driven Temporal Segmentation of Cricket Videos
text effects transfer via distribution-aware texture synthesis
text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps
text Encryption: Hybrid cryptographic method using Vigenere and Hill Ciphers
text Enhancement by PDE's Based Methods
text Enhancement for Laser Copiers
text Enhancement Network for Cross-Domain Scene Text Detection
text enhancement with asymmetric filter for video OCR
text Extraction and Document Image Segmentation Using Matched Wavelets and MRF Model
text extraction from color documents-clustering approaches in three and four dimensions
text Extraction from Colored Book and Journal Covers
text extraction from degraded document images
text extraction from gray scale document images using edge information
text extraction from gray scale historical document images using adaptive local connectivity map
text Extraction from Grey Scale Page Images by Simple Edge Detectors
text extraction from images captured via mobile and digital devices
text extraction from name cards with complex design
text extraction from scene images by character appearance and structure modeling
text Extraction from Street Level Images
text Extraction from Video Using Conditional Random Fields
text extraction from web images based on a split-and-merge segmentation method using colour perception
text extraction in complex color documents
text Extraction in Digital News Video Using Morphology
text Extraction in MPEG Compressed Video for Content-based Indexing
text extraction in real scene images on planar planes
text Extraction Using Component Analysis and Neuro-fuzzy Classification on Complex Backgrounds
text Extraction Using Pyramid
text Flow: A Unified Text Detection System in Natural Scene Images
text From Corners: A Novel Approach to Detect Text and Caption in Videos
text generation and multi-modal knowledge transfer for few-shot object detection
text Generation, Text Synthesis, Text Placement on Maps
text Geolocation Prediction via Self-Supervised Learning
text Grouping Adapter: Adapting Pre-Trained Text Detector for Layout Analysis
text Growing on Leaf
text Guided Person Image Synthesis
text Identification for Document Image Analysis Using a Neural Network
text Identification in Complex Background Using SVM
text identification in noisy document images using Markov random field
text Image Classifier Using Image-Wise Annotation
text Image Clean-Up and Thresholding: A Comparative Study
text Image Compression Using Soft Pattern Matching
text Image Deblurring Using Kernel Sparsity Prior
text Image Deblurring Using Text-Specific Properties
text Image Deblurring via Intensity Extremums Prior
text in Everything
text in Scenes, Stroke Based, Contour Based
text in the dark: Extremely low-light text image enhancement
text independent speaker gender recognition using lip movement
text Independent Writer Identification for Bengali Script
text independent writer identification of Arabic manuscripts and the effects of writers increase
text independent writer recognition using redundant writing patterns with contour-based orientation and curvature features
text information extraction in images and video: a survey
text Input System Using Online Overlapped Handwriting Recognition for Mobile Devices
text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation
text line and word segmentation of handwritten documents
text line bounding system
text Line Characterization by Connected Component Transformations
text Line Detection for Heterogeneous Documents
text Line Detection in Corrupted and Damaged Historical Manuscripts
text Line Detection in Document Images: Towards a Support System for the Blind
text line detection in handwritten documents
text Line Detection in Unconstrained Handwritten Documents Using a Block-Based Hough Transform Approach
text Line Detection Method for Mathematical Formula Recognition, A
text line extraction for historical document images
text line extraction from multi-skewed handwritten documents
text line extraction in document images
text Line Extraction in Documents
text line extraction in graphical documents using background and foreground information
text Line Extraction in Handwritten Document with Kalman Filter Applied on Low Resolution Image
text Line Extraction Method Using Domain-Based Active Contour Model
text line extraction strategy for palm leaf manuscripts
text Line Extraction Using Adaptive Partial Projection for Palm Leaf Manuscripts from Thailand
text Line Extraction Using DMLP Classifiers for Historical Manuscripts
text Line Extraction Using Fully Convolutional Network and Energy Minimization
text line segmentation and word recognition in a system for general writer independent handwriting recognition
text Line Segmentation Based on Morphology and Histogram Projection
text Line Segmentation for Unconstrained Handwritten Document Images Using Neighborhood Connected Component Analysis
text line segmentation in Chinese handwritten text images
text Line Segmentation in Handwritten Documents Using Mumford-Shah Model
text Line Segmentation in Images of Handwritten Historical Documents
text Line Segmentation of Historical Arabic Documents
text line segmentation of historical documents: a survey
text line segmentation using a fully convolutional network in handwritten document images
text Lines and Snippets Extraction for 19th Century Handwriting Documents Layout Analysis
text Localization and Extraction from Complex Color Images
text Localization and Extraction from Complex Gray Images
text Localization and Recognition in Complex Scenes Using Local Features
text Localization Based on Fast Feature Pyramids and Multi-Resolution Maximally Stable Extremal Regions
text Localization in Born-Digital Images of Advertisements
text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors
text Localization in Natural Scene Images Based on Conditional Random Field
text Localization in Real-World Images Using Efficiently Pruned Exhaustive Search
text Localization in Web Images Using Probabilistic Candidate Selection Model
text localization using image cues and text line information
text localization, enhancement and binarization in multimedia documents
text locating from natural scene images using image intensities
text location in complex images
text Mining in Remotely Sensed Phenology Studies: A Review on Research Development, Main Topics, and Emerging Issues
text Mining the Contributors to Rail Accidents
text Motion Translator: A Bi-directional Model for Enhanced 3d Human Motion Generation from Open-vocabulary Descriptions
text OCR by Solving a Cryptogram
text only Analysis, Natural Language
text optimization with latent inversion for non-rigid image editing
text Page Recognition Using Grey-Level Features and Hidden Markov-Models
text Parsing Using Spatial Information for Recognizing Addresses in Mail Pieces
text Particles Multi-band Fusion for Robust Text Detection
text Position-Aware Pixel Aggregation Network With Adaptive Gaussian Threshold: Detecting Text in the Wild
text Prior Guided Scene Text Image Super-Resolution
text Prompt Region Decomposition for Effective Facial Expression Recognition
text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection
text Query based Traffic Video Event Retrieval with Global-Local Fusion Embedding
text Query to Web Image to Video: A Comprehensive Ad-hoc Video Search
text reading algorithm for natural images, A
text Recognition - Real World Data and Where to Find Them
text Recognition and Retrieval System for e-Business Image Management, A
text recognition from grey level images using hidden Markov models
text Recognition in Images Based on Transformer with Hierarchical Attention
text recognition in multimedia documents: A study of two neural-based OCRs using and avoiding character segmentation
text Recognition in Real Scenarios with a Few Labeled Samples
text Recognition in the Wild: A Survey
text recognition of low-resolution document images
text Recognition System for Japanese Documents
text recognition using deep BLSTM networks
text Recognition: From Pixels to Meaning
text Region Conditional Generative Adversarial Network for Text Concealment in the Wild
text region extraction and text segmentation on camera-captured document style images
text Region Extraction from Quality Degraded Document Images
text Region Extraction From Scene Images Using AGF and MSER
text region extraction in a document image based on the Delaunay tessellation
text retrieval from early printed books
text scanner with text detection technology on image sequences
text search for medieval manuscript images
text segmentation and recognition in complex background based on markov random field
text Segmentation by Clustering Cohesion
text Segmentation for MRC Document Compression
text Segmentation from Complex Background Using Sparse Representations
text segmentation in color images using tensor voting
text Segmentation in Colour Posters from the Spanish Civil War Era
text segmentation in natural scenes using Toggle-Mapping
text Segmentation in Unconstrained Hand-Drawings in Whiteboard Photos
text Segmentation of Consumer Magazines in PDF Format
text Segmentation Using Gabor Filters for Automatic Document Processing
text segmentation using superpixel clustering
text selection by structured light marking for hand-held cameras
text Separation from Mixed Documents Using a Tree-Structured Classifier
text Similarity Measurement Method Based on BiLSTM-SECapsNet Model
text Spotting Transformers
text String Detection From Natural Scenes by Structure-Based Partition and Grouping
text String Extraction from Images of Color-Printed Documents
text Synopsis Generation for Egocentric Videos
text to 3D Synthesis, Text to 3D Generation
text to Image for Multi-Label Image Recognition With Joint Prompt-Adapter Learning
text to Image Generation with Semantic-Spatial Aware GAN
text To Image Synthesis With Erudite Generative Adversarial Networks
text to image synthesis with multi-granularity feature aware enhancement Generative Adversarial Networks
text to Image, Image Based Rendering
text to photo-realistic image synthesis via chained deep recurrent generative adversarial network
text to Video Synthesis, Text to Motion
text to visual synthesis with appearance models
text Verification in an Automated System for the Extraction of Bibliographic Data
text vs. Non-Text Regions
text watermarking algorithm based on word classification and inter-word space statistics, A
text with Knowledge Graph Augmented Transformer for Video Captioning
text zone classification using unsupervised feature learning
text- and speech-based phonotactic models for spoken language identification of Basque and Spanish
text-anchored Score Composition: Tackling Condition Misalignment in Text-to-image Diffusion Models
text-Attentional Convolutional Neural Network for Scene Text Detection
text-augmented Multi-Modality contrastive learning for unsupervised visible-infrared person re-identification
text-aware balloon extraction from manga
text-aware image dehazing using stroke width transform
text-Based Audio Retrieval by Learning From Similarities Between Audio Captions
text-Based Fine-Grained Emotion Prediction
text-based Geometric Normalization for Robust Watermarking of Digital Maps
text-based image retrieval using progressive multi-instance learning
text-Based Localization of Moments in a Video Corpus
text-based Person Search via Attribute-aided Matching
text-Based Person Search via Cross-Modal Alignment Learning
text-based person search via fine-grained cross-modal semantic alignment
text-Based Temporal Localization of Novel Events
text-based visual context modulation neural model for multimodal machine translation, A
text-Centric multimodal sentiment analysis with asymmetric fine-tuning
text-Conditional Attribute Alignment Across Latent Spaces for 3D Controllable Face Image Synthesis
text-Conditioned Generative Model of 3D Strand-Based Human Hairstyles
text-conditioned Resampler For Long Form Video Understanding
text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models
text-Controlled Motion Mamba: Text-Instructed Temporal Grounding of Human Motion
text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation
text-Driven Automatic Frame Generation Using MPEG-4 Synthetic/Natural Hybrid Coding for 2-D Head-and-Shoulder Scene
text-Driven Fashion Image Editing with Compositional Concept Learning and Counterfactual Abduction
text-Driven Generative Domain Adaptation with Spectral Consistency Regularization
text-Driven Image Editing via Learnable Regions
text-Driven Medical Image Segmentation With LLM Semantic Bridge and LLM Prompt Bridge
text-driven Stylization of Video Objects
text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos
text-Driven Video Acceleration: A Weakly-Supervised Reinforcement Learning Method
text-Edge-Box: An Object Proposal Approach for Scene Texts Localization
text-Enhanced Data-Free Approach for Federated Class-Incremental Learning
text-Enhanced Scene Image Super-Resolution via Stroke Mask and Orthogonal Attention
text-Enriched Air Traffic Flow Modeling and Prediction Using Transformers
text-free diffusion inpainting using reference images for enhanced visual fidelity
text-graphics separation to detect logo and stamp from color document images: A spectral approach
text-Guided 3D Face Synthesis: From Generation to Editing
text-guided camouflaged object detection
text-Guided Coarse-to-Fine Fusion Network for robust remote sensing visual question answering
text-guided distillation learning to diversify video embeddings for text-video retrieval
text-Guided Explorable Image Super-Resolution
text-Guided Eyeglasses Manipulation With Spatial Constraints
text-Guided Face Recognition using Multi-Granularity Cross-Modal Contrastive Learning
text-Guided Facial Image Manipulation for Wild Images via Manipulation Direction-Based Loss
text-guided Fourier Augmentation for long-tailed recognition
text-Guided Generation and Refinement Model for Image Captioning, A
text-Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks
text-Guided Human Image Manipulation via Image-Text Shared Space
text-Guided Multi-Class Multi-Object Tracking for Fine-Grained Maritime Rescue
text-Guided Neural Network Training for Image Recognition in Natural Scenes and Medicine
text-Guided Object Detector for Multi-modal Video Question Answering
text-Guided Patch Scoring and Local Distortion Guidance for Image Quality Assessment
text-Guided Prototype Generation for Occluded Person Re-Identification
text-Guided Reconstruction Network for Sentiment Analysis With Uncertain Missing Modalities
text-Guided Semantic Alignment Network With Spatial-Frequency Interaction for Infrared-Visible Image Fusion Under Extreme Illumination
text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation
text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
text-guided Video Masked Autoencoder
text-guided visual representation learning for medical image retrieval systems
text-guided weakly supervised framework for dynamic facial expression recognition
text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion
text-Image Alignment for Diffusion-Based Perception
text-image separation in Devanagari documents
text-image super-resolution through anchored neighborhood regression with multiple class-specific dictionaries
text-Independent Online Writer Identification Using Hidden Markov Models
text-independent Persian writer identification based on feature relation graph (FRG), A
text-independent speaker identification using Radon and discrete cosine transforms based features from speech spectrogram
text-independent speaker recognition using graph matching
text-independent speaker verification with ant colony optimization feature selection and support vector machine
text-independent voice conversion using deep neural network based phonetic level features
text-Independent Writer Identification and Verification on Offline Arabic Handwriting
text-Independent Writer Identification and Verification Using Textural and Allographic Features
text-Independent Writer Identification Based on Fusion of Dynamic and Static Features
text-Independent Writer Identification on Online Arabic Handwriting
text-independent writer identification using convolutional neural network
text-independent writer identification using SIFT descriptor and contour-directional feature
text-independent writer recognition using multi-script handwritten texts
text-indicated writer verification using hidden Markov models
text-Injected Discriminative Model for Remote Sensing Visual Grounding
text-instance graph: Exploring the relational semantics for text-based visual question answering
text-Line Detection in Camera-Captured Document Images Using the State Estimation of Connected Components
text-line examination for document forgery detection
text-line Extraction and Character Recognition of Document Headlines with Graphical Designs Using Complementary Similarity Measure
text-Line Extraction and Character Recognition of Japanese Newspaper Headlines With Graphical Designs
text-Line Extraction in Handwritten Chinese Documents Based on an Energy Minimization Framework
text-Line Extraction Using a Convolution of Isotropic Gaussian Filter with a Set of Line Filters
text-mining based journal splitting
text-only weakly supervised learning framework for text spotting via text-to-polygon generator, A
text-Pose Estimation in 3D Using Edge-Direction Distributions
text-RGNNs: Relational Modeling for Heterogeneous Text Graphs
text-Scene Retrieval for Driving Scenes in Transportation Cyber-Physical Systems
text-to-3D Generation with Bidirectional Diffusion Using Both 2D and 3D Priors
text-to-3D using Gaussian Splatting
text-to-Floorplan Synthesis via Graph-Conditioned Diffusion Processes
text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers
text-to-image Editing by Image Information Removal
text-to-Image Generation Grounded by Fine-Grained User Attention
text-to-Image Generation via Semi-Supervised Training
text-to-Image Models for Counterfactual Explanations: A Black-Box Approach
text-to-Image Person Re-Identification Based on Multimodal Graph Convolutional Network
text-to-Image Synthesis based on Object-Guided Joint-Decoding Transformer
text-to-Image Synthesis for Domain Generalization in Face Anti-Spoofing
text-to-image synthesis with self-supervised bi-stage generative adversarial network
text-to-image synthesis with self-supervised learning
text-to-Image Vehicle Re-Identification: Multi-Scale Multi-View Cross-Modal Alignment Network and a Unified Benchmark
text-to-image via mask anchor points
text-to-Speech With Lip Synchronization Based on Speech-Assisted Text-to-Video Alignment and Masked Unit Prediction
text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
text-to-Traffic Generative Adversarial Network for Traffic Situation Generation
text-to-video: a semantic search engine for internet videos
text-Tracking Wearable Camera System for the Blind
text-tracking wearable camera system for visually-impaired people
text-Video Completion Using Structure Repair and Texture Propagation
text-Video Knowledge Guided Prompting for Weakly Supervised Temporal Action Localization
text-video retrieval re-ranking via multi-grained cross attention and frozen image encoders
text-Video Retrieval With Global-Local Semantic Consistent Learning
text-Visual Prompting for Efficient 2D Temporal Video Grounding
text/continuous tone image decision processor
text/graphic labelling of ancient printed documents
text/graphic separation using a sparse representation with multi-learned dictionaries
text/Graphics Segmentation in Architectural Floor Plans
text/Graphics Separation Revisited
text/image separation method
text/Non-Text Image Classification in the Wild with Convolutional Neural Networks
text/Non-text Ink Stroke Classification in Japanese Handwriting Based on Markov Random Fields
text/shape classifier for mobile applications with handwriting input
text2Avatar: Articulated 3D Avatar Creation With Text Instructions
text2Concept: Concept Activation Vectors Directly from Text
text2HOI: Text-Guided 3D Motion Generation for Hand-Object Interaction
text2LiDAR: Text-guided Lidar Point Cloud Generation via Equirectangular Transformer
text2LIVE: Text-Driven Layered Image and Video Editing
text2Mesh: Text-Driven Neural Stylization for Meshes
text2Performer: Text-Driven Human Video Generation
text2place: Affordance-aware Text Guided Human Placement
text2Pos: Text-to-Point-Cloud Cross-Modal Localization
text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
text2Scene: Text-driven Indoor Scene Stylization with Part-Aware Details
text2Sketch: Learning Face Sketch from Facial Attribute Text
text2Tex: Text-driven Texture Synthesis via Diffusion Models
text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
text2Video: An End-to-end Learning Framework for Expressing Text With Videos
textAdaIN: Paying Attention to Shortcut Learning in Text Recognizers
textAdapter: Self-Supervised Domain Adaptation for Cross-Domain Text Recognition
textAug: Test Time Text Augmentation for Multimodal Person Re-Identification
textBoxes++: A Single-Shot Oriented Scene Text Detector
textCatcher: a method to detect curved and challenging text in natural scenes
textContourNet: A Flexible and Effective Framework for Improving Scene Text Detection Architecture With a Multi-Task Cascade
textCraftor: Your Text Encoder can be Image Quality Controller
textDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask
textDiff: Enhancing scene text image super-resolution with mask-guided residual diffusion models
textdiffuser-2: Unleashing the Power of Language Models for Text Rendering
textDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting
textFace: Text-to-Style Mapping Based Face Generation and Manipulation
textField: Learning a Deep Direction Field for Irregular Scene Text Detection
textFinder: An Automatic System to Detect and Recognize Text in Images
textinvision: Text and Prompt Complexity Driven Visual Text Generation Benchmark
textManiA: Enriching Visual Feature by Text-driven Manifold Augmentation
textMesh: Generation of Realistic 3D Meshes From Text Prompts
textMountain: Accurate scene text detection via instance segmentation
textNeRF: A Novel Scene-Text Image Synthesis Method Based on Neural Radiance Fields
textNet: Irregular Text Reading from Images with an End-to-End Trainable Network
textOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text
textPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts
textProposals: A text-specific selective search algorithm for word spotting in the wild
textron: Weakly Supervised Multilingual Text Detection through Data Programming
textRS: Deep Bidirectional Triplet Network for Matching Text to Remote Sensing Images
texts as Images in Prompt Tuning for Multi-Label Image Recognition
texts as points: Scene text detection with point supervision
textSLAM: Visual SLAM With Semantic Planar Text Features
textSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes
textSRNet: Scene Text Super-Resolution Based on Contour Prior and Atrous Convolution
textStyleBrush: Transfer of Text Aesthetics From a Single Example
textual Alchemy: CoFormer for Scene Text Understanding
textual Concept Expansion with Commonsense Knowledge to Improve Dual-Stream Image-Text Matching
textual Visual Semantic Dataset for Text Spotting
textual-visual Logic Challenge: Understanding and Reasoning in Text-to-image Generation
texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm
TFRGAN: Leveraging text Information for Blind Face Restoration with Extreme Degradation
TF²: Few-Shot text-Free Training-Free Defect Image Generation for Industrial Anomaly Inspection
TG-TSGNet: A text-Guided Arbitrary-Resolution Terrain Scene Generation Network
There and Back Again: 3D Sign Language Generation from text Using Back-Translation
Thinking Fast and Slow: Efficient text-to-Visual Retrieval with Transformers
Three decision levels strategy for Arabic and Latin texts differentiation in printed and handwritten natures
Three-Dimensional Lip Motion Network for text-Independent Speaker Recognition
Thresholding video images for text detection
TI2V-Zero: Zero-Shot Image Conditioning for text-to-Video Diffusion Models
TIAM - A Metric for Evaluating Alignment in text-to-Image Generation
Tibet: Identifying and Evaluating Biases in text-to-image Generative Models
Ticker: An Adaptive Single-Switch text Entry Method for Visually Impaired Users
TIED: A Cycle Consistent Encoder-Decoder Model for text-to-Image Retrieval
TieNet: text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-Rays
TIFA: Accurate and Interpretable text-to-Image Faithfulness Evaluation with Question Answering
Tightness-Aware Evaluation Protocol for Scene text Detection
TIMTQE: Benchmarking Machine Translation Quality Estimation for text Images
TIPS: text-Induced Pose Synthesis
TISE: Bag of Metrics for text-to-Image Synthesis Evaluation
TJCMNet: An Efficient Vision-text Joint Identity Clues Mining Network for Visible-Infrared Person Re-Identification
Tk-text: Multi-shaped Scene Text Detection via Instance Segmentation
Tk-text: Multi-shaped Scene Text Detection via Instance Segmentation
TKDN: Scene text Detection via Keypoints Detection
TLDR: text Based Last-Layer Retraining for Debiasing Image Classifiers
TlTScore: Towards Long-Tail Effects in text-to-Visual Evaluation with Generative Foundation Models
TLWSR: Weakly supervised real-world scene text image super-resolution using text label
TLWSR: Weakly supervised real-world scene text image super-resolution using text label
TM2D: Bimodality Driven 3D Dance Generation via Music-text Integration
TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and texts
TMR: text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis
To Speak or to text: Effects of Display Type and I/O Style on Mobile Virtual Humans Nurse Training
To text or not to text- drivers' interpretation of traffic situations as the basis for their decision to (not) engage in text messaging
To text or not to text- drivers' interpretation of traffic situations as the basis for their decision to (not) engage in text messaging
To text or not to text- drivers' interpretation of traffic situations as the basis for their decision to (not) engage in text messaging
Token-Mixer: Bind Image and text in One Embedding Space for Medical Image Reporting
TokenBinder: text-Video Retrieval with One-to-Many Alignment Paradigm
TokenCompose: text-to-Image Diffusion with Token-Level Supervision
Tokenfocus-VQA: Enhancing text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs
Tool for Ground-Truthing text Lines and Characters in Off-Line Handwritten Chinese Documents, A
Top-down and bottom-up cues for scene text recognition
Topic Language Model Adaption for Recognition of Homologous Offline Handwritten Chinese text Image
Total-text: toward orientation robustness in scene text detection
Total-text: toward orientation robustness in scene text detection
Toward Automation in text-Based Video Retrieval with LLM Assistance
Toward Integrated Scene text Reading
Toward Open-World text-Driven Face Generation and Manipulation via StyleGAN3
Toward real text manipulation detection: New dataset and new solution
Toward text-independent Cross-lingual Speaker Recognition Using English-Mandarin-Taiwanese Dataset
Toward Understanding WordArt: Corner-Guided Transformer for Scene text Recognition
Toward Verifiable and Reproducible Human Evaluation for text-to-Image Generation
Towards Accurate Scene text Recognition With Semantic Reasoning Networks
Towards Accurate text-based Image Captioning with Content Diversity Exploration
Towards an Extensible and text-Oriented Analytical Semantic Trajectory Framework
Towards Automated Transcription of Label text from Pinned Insect Collections
Towards Cycle-Consistent Models for text and Image Retrieval
Towards Effective Usage of Human-Centric Priors in Diffusion Models for text-based Human Image Generation
Towards End-to-End text Spotting in Natural Scenes
Towards End-to-End text Spotting with Convolutional Recurrent Neural Networks
Towards End-to-End Unified Scene text Detection and Layout Analysis
Towards Fast and Accurate Image-text Retrieval With Self-Supervised Fine-Grained Alignment
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-text Pre-Training
Towards Generic text-Line Extraction
Towards High-Fidelity text-Guided 3D Face Generation and Manipulation Using only Images
Towards Implicit text-Guided 3D Shape Generation
Towards Improved text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
Towards Improved text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
Towards Improved text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
Towards Interactive Facial Image Inpainting by text or Exemplar Image
Towards Language-Free Training for text-to-Image Generation
Towards Modelling an Attention-Based text Localization Process
Towards Open Domain text-driven Synthesis of Multi-person Motions
Towards open-set text recognition via label-to-prototype learning
Towards robust and efficient text sign reading from a mobile phone
Towards Robust Curve text Detection With Conditional Spatial Expansion
Towards Robust Tampered text Detection in Document Image: New Dataset and New Solution
Towards Robust text-Guided Image Compression Under Modality Missing
Towards Scalable Human-aligned Benchmark for text-guided Image Editing
Towards Specific Domain Prompt Learning via Improved text Label Optimization
Towards text-guided 3D Scene Composition
Towards the Unseen: Iterative text Recognition by Distilling from Errors
Towards Unconstrained End-to-End text Spotting
Towards Understanding and Quantifying Uncertainty for text-to-Image Generation
Towards Understanding Cross and Self-Attention in Stable Diffusion for text-Guided Image Editing
Towards Unified Scene text Spotting Based on Sequence Generation
Towards Weakly Supervised text-to-Audio Grounding
Towards Weakly-Supervised text Spotting using a Multi-Task Transformer
Towards Zero-Shot Multi-Speaker Multi-Accent text-to-Speech Synthesis
TP-LReID: Lifelong person re-identification using text prompts
TP2O: Creative text Pair-to-object Generation Using Balance Swap-Sampling
TPA-Seg: Multi-Class Nucleus Segmentation Using text Prompts and Cross-Attention
TPA3D: Triplane Attention for Fast text-to-3d Generation
TPD-STR: text Polygon Detection with Split Transformers
TPEech: Target Speaker Extraction and Noise Suppression With Historical Dialogue text Cues
TPWGAN: Wavelet-aware text prior guided super-resolution for scene text images
TPWGAN: Wavelet-aware text prior guided super-resolution for scene text images
Trace Controlled text to Image Generation
Tracking Based Multi-Orientation Scene text Detection: A Unified Framework With Dynamic Programming
Traffic Video Event Retrieval via text Query using Vehicle Appearance and Motion Attributes
Training on severely degraded text-line images
Training-Free Color-Style Disentanglement for Constrained text-to-Image Synthesis
Training-Free Location-Aware text-to-Image Synthesis
Training-free subject-enhanced attention guidance for compositional text-to-image generation
TrAME: Trajectory-Anchored Multi-View Editing for text-Guided 3D Gaussian Manipulation
Transcript Mapping for Handwritten text Lines Using Conditional Random Fields
Transferable Adversarial Attacks for Deep Scene text Detection
Transferring Image-CLIP to Video-text Retrieval via Temporal Relations
Transferring Knowledge From text to Video: Zero-Shot Anticipation for Procedural Actions
Transform invariant text extraction
Transformation of arc-form-text to linear-form-text suitable for OCR
Transformation of arc-form-text to linear-form-text suitable for OCR
Transformer models for enhancing AttnGAN based text to image generation
Transformer Reasoning Network for Image-text Matching and Retrieval
Transformer-based text Detection in the Wild
Transparent text Detection and Background Recovery
TransPixeler: Advancing text-to-Video Generation with Transparency
TranstextNet: Transducing Text for Recognizing Unseen Visual Relationships
Tree structure for word extraction from handwritten text lines
TriCoLo: Trimodal Contrastive Loss for text to Shape Retrieval
TriMatch: Triple Matching for text-to-Image Person Re-Identification
Trinity Detector: text-Assisted and Attention Mechanisms Based Spectral Fusion for Diffusion Generation Image Detection
TRIS: A multimodal and multitask framework for unifying text-image retrieval and referring image segmentation
TRTST: Arbitrary High-Quality text-Guided Style Transfer With Transformers
True color distributions of scene text and background
TS-RNN: text Steganalysis Based on Recurrent Neural Networks
TS2-Net: Token Shift and Selection Transformer for text-Video Retrieval
TSA-SCC: text Semantic-Aware Screen Content Coding With Ultra Low Bitrate
TSINIT: A Two-Stage Inpainting Network for Incomplete text
TTD: text-tag Self-distillation Enhancing Image-text Alignment in CLIP to Alleviate Single Tag Bias
TTD: text-tag Self-distillation Enhancing Image-text Alignment in CLIP to Alleviate Single Tag Bias
TTDNet: An End-to-End Traffic text Detection Framework for Open Driving Environments
TTS: Hilbert Transform-Based Generative Adversarial Network for Tattoo and Scene text Spotting
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for text-to-Video Generation
Tuning-Free Image Customization with Image and text Guidance
Turbo3D: Ultra-fast text-to-3D Generation
Turboedit: Instant text-based Image Editing
TurboFill: Adapting Few-step text-to-image Model for Fast Image Inpainting
Turning a CLIP Model into a Scene text Detector
Turning a CLIP Model Into a Scene text Spotter
TV Commercial Detection Based on Shot Change and text Extraction
TV Program Classification Based on Face and text Processing
TV program segmentation using text-visual analysis
TVI-MFAN: A text-Visual Interaction Multilevel Feature Alignment Network for Visual Grounding in Remote Sensing
TVMTrailer: A text-Video-Music AIGC Framework for Film Trailer Generation
TWD: A New Deep E2E Model for text Watermark/Caption and Scene Text Detection in Video
TWD: A New Deep E2E Model for text Watermark/Caption and Scene Text Detection in Video
Twitter Stream Analysis, Tweets, texts, SMS, Internet
Two approaches for text segmentation in web images
Two combination stages of clustered One-Class Classifiers for writer identification from text fragments
Two Stage SVM and kNN text Documents Classifier
Two-Level Rectification Attention Network for Scene text Recognition, A
Two-Pass Clustering Technique for Orientation-Invariant and Language-Independent text Localization
Two-stage hybrid binarization around fringe map based text line segmentation for document images
two-stage method for text line detection in historical documents, A
Two-stage Multimodality Fusion for High-performance text-based Visual Question Answering
Two-stage partial image-text clustering (TPIT-C)
two-stage scheme for text detection in video images, A
Two-Stage Seamless text Erasing on Real-World Scene Images
Txt2Img-MHN: Remote Sensing Image Generation From text Using Modern Hopfield Networks
Type-2 Fuzzy GMMs for Robust text-Independent Speaker Verification in Noisy Environments
Type-R: Automatically Retouching Typos for text-to-Image Generation
Typing in Mid Air: Assessing One- and Two-Handed text Input Methods of the Microsoft HoloLens 2
Typographical Features for Scene text Recognition
Typography With Decor: Intelligent text Style Transfer
UATST: Towards unpaired arbitrary text-guided style transfer with cross-space modulation
UATVR: Uncertainty-Adaptive text-Video Retrieval
Udifftext: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
UFineBench: Towards text-based Person Retrieval with Ultra-Fine Granularity
UFOGen: You Forward Once Large Scale text-to-Image Generation via Diffusion GANs
UHaT: Urdu handwritten text dataset
Ump: Unified Modality-Aware Prompt Tuning for text-Video Retrieval
Unambiguous Scene text Segmentation With Referring Expression Comprehension
Unambiguous text Localization and Retrieval for Cluttered Scenes
Unambiguous text Localization, Retrieval, and Recognition for Cluttered Scenes
Unconstrained end-to-end text reading with feature rectification
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from text to Image via CLIP Inversion
Uncorrelated Geo-text Inhibition Method Based on Voronoi K-Order and Spatial Correlations in Web Maps
Uncovering the Disentanglement Capability in text-to-Image Diffusion Models
Uncurated Image-text Datasets: Shedding Light on Demographic Bias
Understand Layout and Translate text: Unified Feature-Conductive End-to-End Document Image Translation
Understanding and Mitigating Toxicity in Image-text Pretraining Datasets: A Case Study on LLaVA
Understanding Handwritten text in a Structured Environment
Understanding Plane Geometry Problems by Integrating Relations Extracted from text and Diagram
Understanding Video Scenes through text: Insights from Text-based Video Question Answering
Understanding Video Scenes through text: Insights from Text-based Video Question Answering
UniCanvas: Affordance-Aware Unified Real Image Editing via Customized text-to-Image Generation
Unidream: Unifying Diffusion Priors for Relightable text-to-3d Generation
Unified Adaptive Relevance Distinguishable Attention Network for Image-text Matching
Unified Approach for text-and Image-Guided 4D Scene Generation, A
Unified Coarse-to-Fine Alignment for Video-text Retrieval
Unified Contrastive Learning in Image-text-Label Space
Unified Framework for Multioriented text Detection and Recognition, A
Unified Framework for Tracking Based text Detection and Recognition from Web Videos, A
unified framework of data augmentation using large language models for text-based cross-modal retrieval, A
Unified learning for image-text alignment via multi-scale feature fusion
unified method for augmented incremental recognition of online handwritten Japanese and English text, A
Unified Performance Evaluation for OCR Zoning: Calculating Page Segmentation's Score, That Includes text Zones, Tables and Non-text Objects
Unified Performance Evaluation for OCR Zoning: Calculating Page Segmentation's Score, That Includes text Zones, Tables and Non-text Objects
Unified Pre-training with Pseudo texts for Text-To-Image Person Re-identification
Unified Pre-training with Pseudo texts for Text-To-Image Person Re-identification
Unified Prompt Attack Against text-to-Image Generation Models
Unified text Extraction Method for Instructional Videos, A
Unifying Vision, text, and Layout for Universal Document Processing
UniMultNet: Action recognition method based on multi-scale feature fusion and video-text constraint guidance
Uniprocessor: A text-induced Unified Low-level Image Processor
unique approach in text independent speaker recognition using MFCC feature sets and probabilistic neural network, A
Unit Selection Using Linguistic, Prosodic and Spectral Distance for Developing text-to-Speech System in Hindi
UniTAB: Unifying text and Box Outputs for Grounded Vision-Language Modeling
Uniter: Universal Image-text Representation Learning
UniTMGE: Uniform text-Motion Generation and Editing Model via Diffusion
Unleashing text-to-Image Diffusion Models for Visual Perception
Unleashing text-to-image Diffusion Prior for Zero-shot Image Captioning
Unlocking textual and Visual Wisdom: Open-vocabulary 3d Object Detection Enhanced by Comprehensive Guidance from Text and Image
Unpaired Image-text Matching via Multimodal Aligned Conceptual Knowledge
Unsupervised Alignment of News Video and text Using Visual Patterns and Textual Concepts
Unsupervised Approach for Video text Localization, An
Unsupervised Block Covering Analysis for text-Line Segmentation of Arabic Ancient Handwritten Document Images
Unsupervised categorization of heterogeneous text images based on fractals
Unsupervised clustering of text entities in heterogeneous grey level documents
Unsupervised Co-Generation of Foreground-Background Segmentation from text-to-Image Synthesis
Unsupervised Compositional Concepts Discovery with text-to-Image Generative Models
Unsupervised Cross-Modal Hashing Method Robust to Noisy Training Image-text Correspondences in Remote Sensing, An
Unsupervised deep learning for text line segmentation
Unsupervised Domain Adaptation via Class Aggregation for text Recognition
Unsupervised Domain Adaptation with Imbalanced Character Distribution for Scene text Recognition
Unsupervised Image and text Fusion for Travel Information Enhancement
Unsupervised language model adaptation for handwritten Chinese text recognition
Unsupervised Prompt Tuning for text-Driven Object Detection
Unsupervised refinement of color and stroke features for text binarization
Unsupervised Segmentation of text Fragments in Real Scenes
Unsupervised Speech text Localization in Comic Images
Unsupervised text Segmentation Using Color and Wavelet Features
Unsupervised text-to-image synthesis
Unsupervised writer adaptation applied to handwritten text recognition
Unveiling and Mitigating Memorization in text-to-image Diffusion Models Through Cross Attention
UP-Person: Unified Parameter-Efficient Transfer Learning for text-Based Person Retrieval
Urdu handwritten text recognition: a survey
Usage-Oriented Performance Evaluation for text Localization Algorithms
Use of a Dictionary in Conjunction with a Handwritten texts Recognizer
Use of an Evolutive Base of Models in a System for Reading Printed texts
Use of Captions and Other Collateral text in Understanding Photographs
Use of Collateral text in Image Interpretation
Use of Collateral text in Understanding Photos in Documents
Use of Global Context in Text Recognition, The
Use of the Hough transform to separate merged text/graphics in forms
USER: Unified Semantic Enhancement With Momentum Contrast for Image-text Retrieval
Using a boosted tree classifier for text segmentation in hand-annotated documents
Using a Probabilistic Syllable Model to Improve Scene text Recognition
Using Adaptive Run Length Smoothing Algorithm for Accurate text Localization in Images
Using an Exact Performance of Hough Transform for Image text Segmentation
Using Biographical texts as Linked Data for Prosopographical Research and Applications
Using double attention for text tattoo localisation
Using Hidden Markov Models as a Tool for Handwritten text Line Segmentation
Using histogram representation and Earth Mover's Distance as an evaluation tool for text detection
Using irregular pyramid for text segmentation and binarization of gray scale images
Using Kernel Density Classifier with Topic Model and Cost Sensitive Learning for Automatic text Categorization
Using Large text To Image Models with Structured Prompts for Skin Disease Identification: A Case Study
Using Mouse Feedback in Computer Assisted Transcription of Handwritten text Images
Using Multimodal Contrastive Knowledge Distillation for Video-text Retrieval
Using Multiple Frame Integration for the text Recognition of Video
Using Object Information for Spotting text
Using pyramid of histogram of oriented gradients on natural scene text recognition
Using Readers' Highlighting on Monochromatic Documents for Automatic text Transcription and Summarization
Using Scale-Space Anisotropic Smoothing for text Line Extraction in Historical Documents
Using Shape and Layout Information to Find Signatures, text, and Graphics
Using text to Teach Image Retrieval
Using Typical Testors for Feature Selection in text Categorization
Using web search engines to improve text recognition
Using Webcast text for Semantic Event Detection in Broadcast Sports Video
UT-GAN: A Novel Unpaired textual-Attention Generative Adversarial Network for Low-Light Text Image Enhancement
Utilization of texture, contrast and color homogeneity for detecting and recognizing text from video frames
Uyghur Language text Detection in Complex Background Images Using Enhanced MSERs
Uyghur text Localization with Fast Component Detection
VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for text-to-Image Generative Models
Variable-Length Speaker Conditioning in Flow-Based text-to-Speech
Variance Based Image Binarization Scheme and Its Application in text Segmentation, A
Variational Bayes Method for Handwritten text Line Segmentation, A
Variational Distribution Learning for Unsupervised text-to-Image Generation
Variational DNN embeddings for text-independent speaker verification
VATr++: Choose Your Words Wisely for Handwritten text Generation
VCD-texture: Variance Alignment Based 3D-2D Co-Denoising for Text-Guided Texturing
Vector Field Decomposition-Based Flow Matching for Zero-Shot Cross-Lingual text-to-Speech
Vector Quantized Diffusion Model for text-to-Image Synthesis
VectorFusion: text-to-SVG by Abstracting Pixel-Based Diffusion Models
VerbDiff: text-Only Diffusion Models with Enhanced Interaction Awareness
Verisimilar Image Synthesis for Accurate Detection and Recognition of texts in Scenes
Versatile Diffusion: text, Images and Variations All in One Diffusion Model
Vertical bar detection for gauging text similarity of document images
Vesselness for text detection in historical document images
VGSG: Vision-Guided Semantic-Group Network for text-Based Person Search
VicTR: Video-conditioned text Representations for Activity Recognition
Video Analysis -- Captions, text, Video Text
Video Analysis -- Captions, text, Video Text
Video and text Matching with Conditioned Embeddings
Video and text semantic center alignment for text-video cross-modal retrieval
Video and text semantic center alignment for text-video cross-modal retrieval
Video captioning with text-based dynamic attention and step-by-step learning
Video Diffusion, Video Synthesis, text to Video
Video Frame-wise Explanation Driven Contrastive Learning for Procedural text Generation
Video Generation from text Employing Latent Path Construction for Temporal Modeling
Video google: A text retrieval approach to object matching in videos
Video Question Answering Using Clip-Guided Visual-text Attention
Video Question Answering with Iterative Video-text Co-tokenization
Video Scene text Frames Categorization for Text Detection and Recognition
Video Scene text Frames Categorization for Text Detection and Recognition
Video Script Identification Based on text Lines
Video search in concept subspace: a text-like paradigm
Video Search with CLIP and Interactive text Query Reformulation
Video text detection and recognition: Dataset and benchmark
Video text Detection System Based on Automated Training, A
Video text Detection With Robust Feature Representation
Video text Extraction Using the Fusion of Color Gradient and Log-Gabor Filter
video text location method based on background classification, A
Video text recognition using feature compensation as category-dependent feature extraction
Video text recognition using sequential Monte Carlo and error voting methods
Video text Tracking with a Spatio-Temporal Complementary Model
Video, text, and Speech-Driven Realistic 3-D Virtual Head for Human-Machine Interface, A
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Video-text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Video-text Compliance: Activity Verification Based on Natural Language Instructions
Video-text Representation Learning via Differentiable Weak Temporal Alignment
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-text Models
VideoDirector: Precise Video Editing via text-to-Video Models
VideoDreamer: Customized Multi-Subject text-to-Video Generation With Disen-Mix Finetuning on Language-Video Foundation Models
VideoMage: Multi-Subject and Motion Customization of text-to-Video Diffusion Models
ViewDiff: 3D-Consistent Image Generation with text-to-Image Models
ViLEM: Visual-Language Error Modeling for Image-text Retrieval
ViLNM: Visual-Language Noise Modeling for text-to-Image Person Retrieval
VimTS: A Unified Video and Image text Spotter for Enhancing the Cross-Domain Generalization
VinTAGe: Joint Video and text Conditioning for Holistic Audio Generation
VIRES: Video Instance Repainting via Sketch and text Guided Generation
Vision and text Transformer for Predicting Answerability on Visual Question Answering
Vision-Aware text Features in Referring Image Segmentation: From Object Understanding to Context Understanding
Vision-Language Matching for text-to-Image Synthesis via Generative Adversarial Networks
Vision-Language Pre-Training for Boosting Scene text Detectors
Vision-Language Relational Transformer for Video-to-text Generation
ViSTA: Vision and Scene text Aggregation for Cross-Modal Retrieval
Visual and text prompts guided interpretable network for universal low-dose CT MAR
Visual Attention Based Approach to text Extraction, A
Visual enhancement of incised text
Visual Re-ranking with Natural Language Understanding for text Spotting
Visual Semantic Reasoning for Image-text Matching
Visual Semantics: Extracting Visual Information from text Accompanying Pictures
Visual speaker authentication with random prompt texts by a dual-task CNN framework
Visual text Correction
Visual text Generation in the Wild
Visual text Recognition Through Contextual Processing
Visual Word Embedding for text Classification
Visual-Aware text as Query for Referring Video Object Segmentation
Visual-relation Conscious Image Generation from Structured-text
Visual-textual Capsule Routing for Text-Based Video Segmentation
Visualizing Unstructured text Sequences Using Iterative Visual Clustering
Visually-Enabled Active Deep Learning for (Geo) text and Image Classification: A Review
VisualRAG: Knowledge-Guided Retrieval Augmentation for Image-text Matching
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
ViTA: An Efficient Video-to-text Algorithm using VLM for RAG-based Video Analysis System
Viterbi algorithm as an aid in text recognition, The
Vividdreamer: Invariant Score Distillation for Hyper-realistic text-to-3d Generation
VMC: Video Motion Customization Using Temporal Attention Adaption for text-to-Video Diffusion Models
VODiff: Controlling Object Visibility Order in text-to-Image Generation
VOLTER: Visual Collaboration and Dual-Stream Fusion for Scene text Recognition
VolTex: Food Volume Estimation Using text-Guided Segmentation and Neural Surface Reconstruction
VoP: text-Video Co-Operative Prompt Tuning for Cross-Modal Retrieval
Vox-E: text-guided Voxel Editing of 3D Objects
VP3D: Unleashing 2D Visual Prompt for text-to-3D Generation
VSR++: Improving Visual Semantic Reasoning for Fine-Grained Image-text Matching
VSRNet: End-to-end video segment retrieval with text query
VTC: Improving Video-text Retrieval with User Comments
VTD-FCENet: A Real-Time HD Video text Detection with Scale-Aware Fourier Contour Embedding
VTPL: Visual and text prompt learning for visual-language models
VTQA: Visual text Question Answering via Entity Alignment and Cross-Media Reasoning
VX2text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
W-A net: Leveraging Atrous and Deformable Convolutions for Efficient text Detection
Wacnet: Word Segmentation Guided Characters Aggregation Net for Scene text Spotting With Arbitrary Shapes
Was: Dataset and Methods for Artistic text Segmentation
Watch Your Steps: Local Image and Scene Editing by text Instructions
Watch Your Strokes: Improving Handwritten text Recognition with Deformable Convolutions
Watermark Removal Attack Against text-to-Image Generative Model Watermarking
Watermarking JBIG2 text Region for Image Authentication
Watermarking text document images using edge direction histograms
WaterVG: Waterway Visual Grounding Based on text-Guided Vision and mmWave Radar
Wave: Warping Ddim Inversion Features for Zero-shot text-to-video Editing
Wavelet feature domain adaptive noise reduction using learning algorithm for text-independent speaker recognition
Wavelet feature selection based neural networks with application to the text independent speaker identification
Wavelet-gradient-fusion for video text binarization
Weak supervision for generating pixel-level annotations in scene text segmentation
Weakly Supervised Attention Rectification for Scene text Recognition
Weakly Supervised Salient Object Detection with text Supervision
Weakly Supervised text-based Person Re-Identification
Weakly Supervised Video Moment Retrieval From text Queries
Weakly Supervised Video Representation Learning with Unaligned text for Sequential Videos
Weakly-Supervised 3D Spatial Reasoning for text-Based Visual Question Answering
Weakly-Supervised Alignment of Video with text
Weakly-Supervised text-driven Contrastive Learning for Facial Behavior Understanding
WEB Image Classification Based on the Fusion of Image and text Classifiers
Webly Supervised Image-text Embedding with Noisy Tag Refinement
WECROMCL: Weakly Supervised Cross-modality Contrastive Learning for Transcription-only Supervised text Spotting
Weighted Graph Embedding Feature with Bi-Directional Long Short-Term Memory Classifier for Multi-Document text Summarization
Well-calibrated confidence measures for multi-label text classification with a large number of labels
WeStcoin: Weakly-Supervised Contextualized Text Classification with Imbalance and Noisy Labels
Wetext: Scene Text Detection under Weak Supervision
WETM: A word embedding-based topic model with modified collapsed Gibbs sampling for short text
What Are You Talking About? text-to-Image Coreference
What does scene text tell us?
What If We Only Use Real Datasets for Scene text Recognition? Toward Scene Text Recognition With Fewer Labels
What If We Only Use Real Datasets for Scene text Recognition? Toward Scene Text Recognition With Fewer Labels
What is a good evaluation protocol for text localization systems? Concerns, arguments, comparisons and solutions
What is the Real Need for Scene text Removal? Exploring the Background Integrity and Erasure Exhaustivity Properties
What Is Wrong With Scene text Recognition Model Comparisons? Dataset and Model Analysis
What Machines See Is Not What They Get: Fooling Scene text Recognition Models With Adversarial Text Images
What Machines See Is Not What They Get: Fooling Scene text Recognition Models With Adversarial Text Images
When IC meets text: Towards a rich annotated integrated circuit text dataset
When IC meets text: Towards a rich annotated integrated circuit text dataset
Where you edit is what you get: text-guided image editing with region-based attention
Which super-resolution algorithm is proper for Farsi text image sequences
Who's Waldo? Linking People Across text and Images
Whole is Greater than Sum of Parts: Recognizing Scene text Words
Wikipedia-based semantic tensor space model for text analytics, A
Word Extraction from On-Line Handwritten text Lines
Word Image Matching as a Technique for Degraded text Recognition
Word segmentation in handwritten Korean text lines based on gap clustering techniques
Word segmentation of printed text lines based on gap clustering and special symbol detection
Word separation of unconstrained handwritten text lines in PCR forms
Word Shape Analysis in a Knowledge-Based System for Reading text
Word spotting and recognition via a joint deep embedding of image and text
Wordfence: text detection in natural images with border awareness
Wordrobe: text-guided Generation of Textured 3d Garments
Words Matter: Scene text for Image Classification and Retrieval
Words or Vision: Do Vision-Language Models Have Blind Faith in text?
WordSup: Exploiting Word Annotations for Character Based text Detection
WOUAF: Weight Modulation for User Attribution and Fingerprinting in text-to-Image Diffusion Models
Write a Classifier: Predicting Visual Classifiers from Unstructured text
Writer identification using text line based features
Writer-aware CNN for parsimonious HMM-based offline handwritten Chinese text recognition
Writing speed normalization for on-line handwritten text recognition
X-Edit: Detecting and Localizing Edits in Images Altered by text-Guided Diffusion Models
X-Mesh: Towards Fast and Accurate text-driven 3D Stylization via Dynamic Textual Guidance
X-Pool: Cross-Modal Language-Video Attention for text-Video Retrieval
You'll Never Walk Alone: A Sketch and text Duet for Fine-Grained Image Retrieval
Your Student is Better than Expected: Adaptive Teacher-Student Collaboration for text-Conditional Diffusion Models
Zero-Painter: Training-Free Layout Control for text-to-Image Synthesis
Zero-Shot Composed Image Retrieval Considering Query-Target Relationship Leveraging Masked Image-text Pairs
Zero-Shot Contrastive Loss for text-Guided Diffusion Image Style Transfer
Zero-shot skeleton-based action recognition with dual visual-text alignment
Zero-shot spatial layout conditioning for text-to-image diffusion models
Zero-Shot Styled text Image Generation, but Make It Autoregressive
Zero-Shot Temporal Action Detection by Learning Multimodal Prompts and text-Enhanced Actionness
Zero-Shot text Classification with Semantically Extended Graph Convolutional Network
Zero-Shot text-Driven Dynamic Neural Radiance Fields Stylization
Zero-Shot text-Guided Object Generation with Dream Fields
Zero-Shot text-to-Parameter Translation for Game Character Auto-Creation
Zero-Shot Video Moment Retrieval With Angular Reconstructive text Embeddings
ZeroCap: Zero-Shot Image-to-text Generation for Visual-Semantic Arithmetic
Zone identification in the printed Gujarati text
4233 for text

Last update: 26-Feb-26 11:52:11
Use price@usc.edu for comments.