_ | language | _ |
12-in-1: Multi-Task Vision and | language | Representation Learning |
2D Appearance Based Techniques for Tracking the Signer Configuration in Sign | language | Video Recordings |
2D Oxide Picture | language | s and Their Properties |
3D immersive karaoke for the learning of foreign | language | pronunciation |
3D Modelling of Interior Spaces: Learning the | language | of Indoor Architecture |
3D visual pronunciation of Mandarine Chinese for | language | learning |
3d-array Token Petri Nets Generating Tetrahedral Picture | language | s |
3DRefTransformer: Fine-Grained Object Identification in Real-World Scenes Using Natural | language | |
CREPE: Can Vision- | language | Foundation Models Reason Compositionally?
Abductive natural | language | inference by interactive model with structural loss |
ABINet++: Autonomous, Bidirectional and Iterative | language | Modeling for Scene Text Spotting |
Abstract families of matrices and picture | language | s |
Abstract Visual Reasoning Enabled by | language | |
Accelerating Vision- | language | Pretraining with Free Language Modeling |
Action Recognition in Still Images Using Word Embeddings from Natural | language | Descriptions |
Active Class Selection for Dataset Acquisition in Sign | language | Recognition |
Active Learning Approach for Statistical Spoken | language | Understanding, An |
Active Perception for Visual- | language | Navigation |
Active scene recognition with vision and | language | |
Active Visual Information Gathering for Vision- | language | Navigation |
Activity detection in conversational sign | language | video for mobile telecommunication |
ADAPT: Vision- | language | Navigation with Modality-Aligned Action Prompts |
Adapting Grounded Visual Question Answering Models to Low Resource | language | s |
Adaptive anti-spam filtering for agglutinative | language | s: A special case for Turkish |
Adaptive Cross-Modal Prototypes for Cross-Domain Visual- | language | Retrieval |
Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and- | language | Inference |
Adaptive Spatio-Temporal Graph Enhanced Vision- | language | Representation for Video QA |
Adaptive Training for Robust Spoken | language | Understanding |
Adaptive Zone-aware Hierarchical Planner for Vision- | language | Navigation |
Advances in phonetics-based sub-unit modeling for transcription alignment and sign | language | recognition |
Advancing High-Resolution Video- | language | Representation with Large-Scale Video Transcriptions |
Adversarial Attribute-Text Embedding for Person Search With Natural | language | Query |
Adversarial Reinforced Instruction Attacker for Robust Vision- | language | Navigation |
AerialVLN: Vision-and- | language | Navigation for UAVs |
Affine-invariant modeling of shape-appearance images applied on sign | language | handshape classification |
AI Explainability. A Bridge Between Machine Vision and Natural | language | Processing |
Airbert: In-Domain Pretraining for Vision-and- | language | Navigation |
Algorithm partition and parallel recognition of general context-free | language | s using fixed-size VLSI architecture |
Align and Prompt: Video-and- | language | Pre-training with Entity Prompts |
Aligned Image-Word Representations Improve Inductive Transfer Across Vision- | language | Tasks |
Aligning Source Visual and Target | language | Domains for Unpaired Video Captioning |
Aligning Subtitles in Sign | language | Videos |
Aligning vision- | language | for graph inference in visual dialog |
alignment based similarity measure for hand detection in cluttered sign | language | video, An |
ALIP: Adaptive | language | -Image Pre-training with Synthetic Caption |
All in One: Exploring Unified Video- | language | Pre-Training |
All You Can Embed: Natural | language | based Vehicle Retrieval with Spatio-Temporal Transformers |
American Sign | language | alphabet recognition using Microsoft Kinect |
American Sign | language | Lexicon Video Dataset, The |
American Sign | language | Phrase Verification in an Educational Game for Deaf Children |
American Sign | language | , ASL Recognition |
American Sign | language | : The Phonological Base |
Analysis and Description of Blinking in French Sign | language | for Automatic Generation |
Analysis and Predictive Modeling of Body | language | Behavior in Dyadic Interactions From Multimodal Interlocutor Cues |
Analysis of efficient lip reading method for various | language | s |
Analysis of Expressiveness of Portuguese Sign | language | Speakers |
Analysis of Sign | language | Gestures Using Size Functions and Principal Component Analysis |
Analysis of the Possibilities to Adapt the Foreign | language | Speech Recognition Engines for the Lithuanian Spoken Commands Recognition |
Analytical Method and Research of Uyghur | language | Chunks Based on Digital Forensics |
Anonymizing Temporal Phrases in Natural | language | Text to be Posted on Social Networking Services |
Anonysign: Novel Human Appearance Synthesis for Sign | language | Video Anonymisation |
Appearance-Based Recognition of Words in American Sign | language | |
application of semantic classification trees to natural | language | understanding, The |
Application of Stochastic | language | s to Fingerprint Pattern Recognition, An |
Applying (3+2+1)D Residual Neural Network with Frame Selection for Hong Kong Sign | language | Recognition |
Applying tactile | language | s for 3D navigation |
Applying the T5 | language | model and duration units normalization to address temporal common sense understanding on the MCTACO dataset |
Approach Based on Phonemes to Large Vocabulary Chinese Sign | language | Recognition, An |
approach for stemming in symbolically compressed Indian | language | imaged documents, An |
Approach to Estimate Perplexity Values for | language | Models Based on Phrase Classes, An |
Arabic Diacritics Restoration Using Maximum Entropy | language | Models |
Arabic Handwritten Word Spotting Using | language | Models |
Arabic script web page | language | identifications using decision tree neural networks |
ArabSign: A Multi-modality Dataset and Benchmark for Continuous Arabic Sign | language | Recognition |
Architecture Independent Programming | language | for Low-Level Vision, An |
ARNOLD: A Benchmark for | language | -Grounded Task Learning With Continuous States in Realistic 3D Scenes |
ArtEmis: Affective | language | for Visual Art |
Assessment of Image Manipulation Using Natural | language | Description: Quantification of Manipulation Direction |
Assistive Malaysian Sign | language | Application for D/HH Learning Using Visual Phonics |
Asymmetric 3D face model for Speech | language | Pathologist applications |
Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural | language | Query |
Attention Based Natural | language | Grounding by Navigating Virtual Environment |
Attention-Based 3D-CNNs for Large-Vocabulary Sign | language | Recognition |
Attention-Based Natural | language | Person Retrieval |
Attentive Sequence to Sequence Translator for Localizing Video Clips by Natural | language | , An |
Audio Retrieval With Natural | language | Queries: A Benchmark Study |
Audiovisual Generalised Zero-shot Learning with Cross-modal Attention and | language | |
Augmenting Vision | language | Pretraining by Learning Codebook with Visual Semantics |
Australian sign | language | recognition |
Autism Blogs: Expressed Emotion, | language | Styles and Concerns in Personal and Community Settings |
Automated Aircraft Trajectory Prediction Based on Formal Intent-Related | language | Processing |
Automated extraction of signs from continuous sign | language | sentences using Iterated Conditional Modes |
Automated Python | language | -Based Tool for Creating Absence Samples in Groundwater Potential Mapping, An |
Automatic and Efficient Human Pose Estimation for Sign | language | Videos |
Automatic and Efficient Long Term Arm and Hand Tracking for Continuous Sign | language | TV Broadcasts |
Automatic continuous speech recogniser for Dravidian | language | s using the auto associative neural network |
Automatic Dense Annotation of Large-Vocabulary Sign | language | Videos |
Automatic detection of relevant head gestures in american sign | language | communication |
Automatic generation of Analog Hardware Description | language | (AHDL) code from cell culture images |
Automatic Generation of German Sign | language | Glosses from German Words |
Automatic generation of HMM topology for sign | language | recognition |
Automatic hand trajectory segmentation and phoneme transcription for sign | language | |
Automatic Pronunciation Transliteration for Chinese-English Mixed | language | Keyword Spotting |
Automatic Quality Control of Transportation Reports Using Statistical | language | Processing |
Automatic Recognition of Colloquial Australian Sign | language | |
Automatic recognition of fingerspelled words in British Sign | language | |
Automatic sign | language | analysis: A survey and the future beyond lexical meaning |
Automatic sign | language | identification |
Automatic skin segmentation and tracking in sign | language | recognition |
Automatic Summary Creation by Applying Natural | language | Processing on Unstructured Medical Records |
Bayesian Prompt Learning for Image- | language | Model Generalization |
Behavioral Analysis of Vision-and- | language | Navigation Agents |
Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and | language | |
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and- | language | Models |
Benchmark and Baseline for | language | -driven Image Editing, A |
BERTHop: An Effective Vision-and- | language | Model for Chest X-ray Disease Diagnosis |
Beyond the Nav-Graph: Vision-and- | language | Navigation in Continuous Environments |
Bi-channel sensor fusion for automatic sign | language | recognition |
Biases of Pre-Trained | language | Models: An Empirical Study on Prompt-Based Sentiment Analysis and Emotion Detection, The |
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision- | language | Models |
Bidirectional | language | Model for Handwriting Recognition |
Bird's-Eye-View Scene Graph for Vision- | language | Navigation |
Bitstream syntax description | language | for 3D MPEG-4 view-dependent texture streaming |
Black Box Few-Shot Adaptation for Vision- | language | models |
Black-Box Attacks on Image Activity Prediction and its Natural | language | Explanations |
Blind Image Quality Assessment via Vision- | language | Correspondence: A Multitask Learning Perspective |
Block-based histogram of optical flow for isolated sign | language | recognition |
BMRN: Boundary Matching and Refinement Network for Temporal Moment Localization with Natural | language | |
Body | language | Based Individual Identification in Video Using Gait and Actions |
Body Posture Estimation in Sign | language | Videos |
Boosted subunits: a framework for recognising sign | language | from videos |
Borrowing Knowledge From Pre-trained | language | Model: A New Data-efficient Visual Learning Paradigm |
Bottom Up Top Down Detection Transformers for | language | Grounding in Images and Point Clouds |
Brazilian Sign | language | Recognition Using Kinect |
Breaking Common Sense: WHOOPS! A Vision-and- | language | Benchmark of Synthetic and Compositional Images |
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and- | language | Navigation |
Bridging Vision and | language | Encoders: Parameter-Efficient Tuning for Referring Image Segmentation |
BSL-1K: Scaling Up Co-articulated Sign | language | Recognition Using Mouthing Cues |
BUS: Efficient and Effective Vision- | language | Pre-training with Bottom-Up Patch Summarization |
C2SLR: Consistency-enhanced Continuous Sign | language | Recognition |
C2ST: Cross-modal Contextualized Sequence Transduction for Continuous Sign | language | Recognition |
cache-based natural | language | model for speech recognition, A |
Can a Machine Generate Humanlike | language | Descriptions for a Remote Sensing Image? |
Can | language | Models Learn to Listen? |
Can RNNs reliably separate script and | language | at word and line level? |
Candidate fusion: Integrating | language | modelling into a sequence-to-sequence handwritten word recognition architecture |
CANZSL: Cycle-Consistent Adversarial Networks for Zero-Shot Learning from Natural | language | |
Captured Motion Data Processing for Real Time Synthesis of Sign | language | |
catchment feature model for multimodal | language | analysis, The |
Categorizing Concepts with Basic Level for Vision-to- | language | |
Category-Based | language | Models for Handwriting Recognition of Marriage License Books |
Causal Attention for Vision- | language | Tasks |
CDVA/VCM: | language | for Intelligent and Autonomous Vehicles |
Cepstral Domain Teager Energy for Identifying Perceptually Similar | language | s |
CGiS, a new | language | for Data-parallel GPU Programming |
ChaLearn LAP Large Scale Signer Independent Isolated Sign | language | Recognition Challenge: Design, Results and Future Research |
ChaLearn Looking at People: Sign | language | Recognition in the Wild and Large Scale Signer Independent Isolated SLR Challenge |
ChangeCLIP: Remote sensing change detection with multimodal vision- | language | representation learning |
character recognizer for Turkish | language | , A |
Characteristics of Bottom-Up Parsable edNLC Graph | language | s for Syntactic Pattern Recognition |
Characterization and Distinction Between Closely Related South Slavic | language | s on the Example of Serbian and Croatian |
ChestXRayBERT: A Pretrained | language | Model for Chest Radiology Report Summarization |
Chinese sign | language | recognition based on trajectory and hand shape features |
Chinese sign | language | recognition system based on SOFM/SRN/HMM, A |
Choosing Basic-Level Concept Names Using Visual and | language | Context |
Chronologicon Hibernicum: A Probabilistic Chronological Framework for Dating Early Irish | language | Developments and Literature |
CiCo: Domain-Aware Sign | language | Retrieval via Cross-Lingual Contrastive Learning |
CIGLI: Conditional Image Generation from | language | & Image |
CiT: Curation in Training for Effective Vision- | language | Data |
City Geography Markup | language | : CityGML |
CLAMP: Prompt-based Contrastive Learning for Connecting | language | and Animal Pose |
Classification of brain activities during | language | and music perception |
Classification of extreme facial events in sign | language | videos |
Classification of Noisy Free-Text Prostate Cancer Pathology Reports Using Natural | language | Processing |
Classifying Ambiguities in a Visual Spatial | language | |
CLEF Cross | language | Image Retrieval Track (ImageCLEF) 2004, The |
CLEVR: A Diagnostic Dataset for Compositional | language | and Elementary Visual Reasoning |
CLID: A Chunk-Level Intent Detection Framework for Multiple Intent Spoken | language | Understanding |
CLIMS: Cross | language | Image Matching for Weakly Supervised Semantic Segmentation |
CLIP goes 3D: Leveraging Prompt Tuning for | language | Grounded 3D Recognition |
CLIP, Contrastive | language | -Image Pre-Training |
CLIP-Adapter: Better Vision- | language | Models with Feature Adapters |
CLIP-FG: Selecting Discriminative Image Patches by Contrastive | language | -Image Pre-Training for Fine-Grained Image Classification |
CLIP-Guided Vision- | language | Pre-training for Question Answering in 3D Scenes |
CLIP-S4: | language | -Guided Self-Supervised Semantic Segmentation |
CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Natural | language | |
CLIP2: Contrastive | language | -Image-Point Pretraining from Real-World Point Cloud Data |
ClipCrop: Conditioned Cropping Driven by Vision- | language | Model |
CLIPPING: Distilling CLIP-Based Models with a Student Base for Video- | language | Retrieval |
CLIPPO: Image-and- | language | Understanding from Pixels Only |
Closing the Loop Between Vision and | language | |
Clover: Towards A Unified Video- | language | Alignment and Fusion Model |
CLUE: Contrastive | language | -guided learning for referring video object segmentation |
Co-Guiding for Multi-Intent Spoken | language | Understanding |
Collaborative Multilingual Continuous Sign | language | Recognition: A Unified Framework |
Collaborative Spatial-Temporal Modeling for | language | -Queried Video Actor Segmentation |
Collaborative Static and Dynamic Vision- | language | Streams for Spatio-Temporal Video Grounding |
Collage of Iso-Picture | language | s and P Systems |
Color Universal | language | and Dictionary of Names |
Color-Based Hands Tracking System for Sign | language | Recognition |
Combination of Tangent Distance and an Image Distortion Model for Appearance-Based Sign | language | Recognition |
Combinational sign | language | recognition |
Combining Cepstral and Prosodic Features in | language | Identification |
Combining Two Synchronisation Methods in a Linguistic Model to Describe Sign | language | |
Compact and Efficient Multitask Learning in Vision, | language | and Speech |
Comparative Study of Several Phonotactic-Based Approaches to Spanish-Basque | language | Identification |
comparative study on the sign- | language | communication systems between Korea and Japan through 2D and 3D character models on the Internet, A |
Comparing On-Line Recognition of Japanese and Western Script in Preparation for Recognizing Multi- | language | Documents |
Comparing World City Networks by | language | : A Complex-Network Approach |
comparison between end-to-end approaches and feature extraction based approaches for Sign | language | recognition, A |
Comparison Between Etymon- and Word-Based Chinese Sign | language | Recognition Systems, A |
Comparison of HMM and DTW for Isolated Word Recognition System of Punjabi | language | |
Complete System for the Specification and the Generation of Sign | language | Gestures, A |
complexity of Bayesian networks specified by propositional and relational | language | s, The |
Comprehensive Attribute Prediction Learning for Person Search by | language | |
Comprehensive Facial Expression Synthesis Using Human-Interpretable | language | |
comprehensive neural-based approach for text recognition in videos using natural | language | processing, A |
Comprehensive Study on Deep Learning-Based Methods for Sign | language | Recognition, A |
Comprehensive Visual Features and Pseudo Labeling for Robust Natural | language | -based Vehicle Retrieval |
Computational Intelligibility Model for Assessment and Compression of American Sign | language | Video, A |
Computational linguistics processing in indigenous | language | |
Computer Graphic Modeling of American Sign | language | |
Computer Vision and Natural | language | Processing: Recent Approaches in Multimedia and Robotics |
Computer vision based approach for Indian Sign | language | character recognition |
Computer-Assisted Audiovisual | language | Learning |
comRAT-C: A computational compound Remote Associates Test solver based on | language | data and its comparison to human performance |
Conceptual representations between video signals and natural | language | descriptions |
Conditional Prompt Learning for Vision- | language | Models |
Conditional Random Fields in Speech, Audio, and | language | Processing |
Conditional Sentence Generation and Cross-Modal Reranking for Sign | language | Translation |
Confusion Network Based Recurrent Neural Network | language | Modeling for Chinese OCR Error Detection |
Connecting | language | and Vision for Natural Language-Based Vehicle Retrieval |
Connecting Vision and | language | with Localized Narratives |
Connecting Vision and | language | with Video Localized Narratives |
Construction of Computational Lexicon for Malay | language | |
Content4All Open Research Sign | language | Translation Datasets |
Context Matters: Self-Attention for Sign | language | Recognition |
Context Supplied by Text or | language | |
Context-aware Alignment and Mutual Masking for 3D- | language | Pre-training |
Contextually Customized Video Summaries Via Natural | language | |
Continuous 3D Multi-Channel Sign | language | Production via Progressive Transformers and Mixture Density Networks |
Continuous Chinese Sign | language | Recognition System, A |
Continuous recognition of motion based gestures in sign | language | |
Continuous sign | language | recognition based on hierarchical memory sequence network |
Continuous sign | language | recognition using level building based on fast hidden Markov model |
Continuous Sign | language | Recognition via Reinforcement Learning |
Continuous Sign | language | Recognition with Correlation Network |
Continuous Sign | language | Recognition with Iterative Spatiotemporal Fine-tuning |
Continuous sign | language | recognition: Towards large vocabulary statistical recognition systems handling multiple signers |
Contrastive Learning for Natural | language | -Based Vehicle Retrieval |
Contrastive Vision- | language | Pre-training with Limited Resources |
Contribution of recurrent connectionist | language | models in improving LSTM-based Arabic text recognition in videos |
Conversational Agent Module for French Sign | language | Using Kinect Sensor |
COOKIE: Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision- | language | Representation |
CopyCat: An American Sign | language | game for deaf children |
CoSign: Exploring Co-occurrence Signals in Skeleton-based Continuous Sign | language | Recognition |
COTS: Collaborative Two-Stream Vision- | language | Pre-Training Model for Cross-Modal Retrieval |
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision- | language | Navigation |
Counterfactual Vision and | language | Learning |
Counterfactual Vision-and- | language | Navigation via Adversarial Path Sampler |
Counterfactual VQA: A Cause-Effect Look at | language | Bias |
Coupled HMM-based multi-sensor data fusion for sign | language | recognition |
Coupled-dynamic learning for vision and | language | : Exploring Interaction between different tasks |
CoWs on Pasture: Baselines and Benchmarks for | language | -Driven Zero-Shot Object Navigation |
Creating word-level | language | models for handwriting recognition |
Creating word-level | language | models for large-vocabulary handwriting recognition |
Cross Transferring Activity Recognition to Word Level Sign | language | Detection |
Cross-Attention BERT-Based Framework for Continuous Sign | language | Recognition, A |
Cross-Aware Early Fusion With Stage-Divided Vision and | language | Transformer Encoders for Referring Image Segmentation |
Cross-Dataset Study on the Brazilian Sign | language | Translation, A |
Cross- | language | framework for word recognition and spotting of Indic scripts |
Cross- | language | Sensitive Words Distribution Map: A Novel Recognition-Based Document Understanding Method for Uighur and Tibetan |
Cross-lingual few-shot sign | language | recognition |
Cross-Lingual Vocal Emotion Recognition in Five Native | language | s of Assam Using Eigenvalue Decomposition |
Cross-Modal Knowledge Adaptation for | language | -Based Person Search |
Cross-modal Map Learning for Vision and | language | Navigation |
Cross-modal Target Retrieval for Tracking by Natural | language | |
CrowdCLIP: Unsupervised Crowd Counting via Vision- | language | Model |
CSLDS: Chinese sign | language | dialog system |
CSLT-AK: Convolutional-embedded transformer with an action tokenizer and keypoint emphasizer for sign | language | translation |
CTP: Towards Vision- | language | Continual Pretraining via Compatible Momentum Contrast and Topology Preservation |
Curriculum Learning for Data-Efficient Vision- | language | Alignment |
Cursive Script, Word Level Recognition, Word Spotting, | language | Model |
Curve Matching from the View of Manifold for Sign | language | Recognition |
cViL: Cross-Lingual Training of Vision- | language | Models using Knowledge Distillation |
CVML: An XML-based computer vision markup | language | |
CVT-SLR: Contrastive Visual-Textual Transformation for Sign | language | Recognition with Variational Alignment |
Cyberbullying Detection in Code-Mixed | language | s: Dataset and Techniques |
Data augmentation and | language | model adaptation using singular value decomposition |
data model and query | language | for spatio-temporal decision support, A |
Data Structures for Image Processing in a C | language | and Unix Environment |
Data-Efficient | language | -Supervised Zero-Shot Learning with Self-Distillation |
database for handwriting recognition research in Sinhala | language | , A |
Dataset for Interactive Vision- | language | Navigation with Unknown Command Feasibility, A |
Dauphin: A Signal Processing | language | - Statistical Signal Processing Made Easy |
DCA-based unimodal feature-level fusion of orthogonal moments for Indian sign | language | dataset |
Deaf-and-mute sign | language | generation system |
Dealing with Space in Natural | language | Processing |
DeAR: Debiasing Vision- | language | Models with Additive Residuals |
Decoding the | language | of Human Movement |
Deep learning based | language | and orientation recognition in document analysis |
Deep Learning of Mouth Shapes for Sign | language | |
Deep motion templates and extreme learning machine for sign | language | recognition |
Deep Natural | language | Inference Predictor Without Language-specific Training Data, A |
Deep Neural Framework for Continuous Sign | language | Recognition by Iterative Training, A |
Deep Neural Network Approaches to Speaker and | language | Recognition |
Deep Relational Reasoning for the Prediction of | language | Impairment and Postoperative Seizure Outcome Using Preoperative DWI Connectome Data of Children With Focal Epilepsy |
Deep Sign: Enabling Robust Statistical Continuous Sign | language | Recognition via Hybrid CNN-HMMs |
Deep Sign: Hybrid CNN-HMM for Continuous Sign | language | Recognition |
Definition and recovery of kinematic features for recognition of American sign | language | movements |
Dense Recognition of Spoken | language | s |
DenseCLIP: | language | -Guided Dense Prediction with Context-Aware Prompting |
Depression Detection Using Deep Learning and Natural | language | Processing Techniques: A Comparative Study |
Describing Textures Using Natural | language | |
Descriptive and Prescriptive | language | s for Mobility Tasks: Are They Different? |
Design Information Extraction and Visual Representation based on Artificial Intelligence Natural | language | Processing Techniques |
Design of a Query | language | for Accessing Spatial Analysis in the Web Environment |
DeSIRe: Deep Signer-Invariant Representations for Sign | language | Recognition |
Detecting Coarticulation in Sign | language | using Conditional Random Fields |
Detecting Hand-Head Occlusions in Sign | language | Video |
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on | language | Embedding |
Detection of Dynamic Structures of Speech Fundamental Frequency in Tonal | language | s |
Detection of Dyslexic Children Using Machine Learning and Multimodal Hindi | language | Eye-Gaze-Assisted Learning System |
Detection of toxicity in social media based on Natural | language | Processing methods |
Determination of the Script and | language | Content of Document Images |
Devanagari OCR using a recognition driven segmentation framework and stochastic | language | models |
development of hierarchical visual | language | s, The |
Devising interactive access techniques for Indian | language | document images |
Discriminative Bimodal Networks for Visual Localization and Detection with Natural | language | Queries |
Discriminative Exemplar Coding for Sign | language | Recognition With Kinect |
Distilling Cross-Temporal Contexts for Continuous Sign | language | Recognition |
Distilling Knowledge of Bidirectional | language | Model for Scene Text Recognition |
Distilling Large Vision- | language | Model with Out-of-Distribution Generalizability |
Distilling Vision- | language | Pre-Training to Collaborate with Weakly-Supervised Temporal Action Localization |
Distorted Pattern Analysis with the Help of Node Label Controlled Graph | language | s |
Distribution-Aware Prompt Tuning for Vision- | language | Models |
Document | language | Classification: Hierarchical Model with Deep Learning Approach |
domain specific | language | for spatial simulation scenarios, A |
Domain-Specific | language | for Land Administration System Transactions |
DORi: Discovering Object Relationships for Moment Localization of a Natural | language | Query in a Video |
Dravidian | language | Identification System, A |
Dreamwalker: Mental Planning for Continuous Vision- | language | Navigation |
Drinking From a Firehose: Continual Learning With Web-Scale Natural | language | |
Dual Modality Prompt Tuning for Vision- | language | Pre-Trained Model |
Dual Sticky Hierarchical Dirichlet Process Hidden Markov Model and Its Application to Natural | language | Description of Motions |
DUN: Dual-path Temporal Matching Network for Natural | language | -based Vehicle Retrieval |
Dynamic Cross-Feature Fusion for American Sign | language | Translation |
Dynamic Gesture Recognition System for the Korean Sign | language | (KSL), A |
Dynamic hand gesture recognition of sign | language | using geometric features learning |
Dynamic Inference with Grounding Based Vision and | language | Models |
Dynamic Multimodal Instance Segmentation Guided by Natural | language | Queries |
Dynamic Sign | language | Recognition Based on Improved R(2+1)D Algorithm |
Dynamic Skin Detection in Color Images for Sign | language | Recognition |
Dynamic-static unsupervised sequentiality, statistical subunits and lexicon for sign | language | recognition |
Dynamically programmed automata for quasi context sensitive | language | s as a tool for inference support in pattern recognition-based real-time control expert systems |
e-ViL: A Dataset and Benchmark for Natural | language | Explanations in Vision-Language Tasks |
ECO: Ensembling Context Optimization for Vision- | language | Models |
Edit Distance of Regular | language | s |
Editorial: Spatio-Temporal Data Models and | language | s |
Effect of Spatial and Temporal Occlusion on Word Level Sign | language | Recognition, The |
Effect of Various Visual Speech Units on | language | Identification Using Visual Speech Recognition |
Effective End-to-End Vision | language | Pretraining With Semantic Visual Loss |
Effective Uyghur | language | Text Detection in Complex Background Images for Traffic Prompt Identification |
Efficient approximations to model-based joint tracking and recognition of continuous sign | language | |
Efficient Brazilian Sign | language | Recognition: A Study on Mobile Devices |
Efficient Parallel Audio Generation Using Group Masked | language | Modeling |
Efficient sign | language | video representation |
Egocentric Biochemical Video-and- | language | Dataset |
EgoTV: Egocentric Task Verification from Natural | language | Task Descriptions |
EgoVLPv2: Egocentric Video- | language | Pre-training with Fusion in the Backbone |
Embil: An English-manipuri Bi-lingual Benchmark for Scene Text Detection and | language | Identification |
Embodied | language | Grounding With 3D Visual Feature Representations |
Emotion Correlation Mining Through Deep Learning Models on Natural | language | Text |
Emotion recognition in never-seen | language | s using a novel ensemble method with emotion profiles |
Emotion recognition using MLP and GMM for Oriya | language | |
Emotion Recognition, Body Gestures, Body | language | |
Emotional Expression in Virtual Agents Through Body | language | |
emotional expression of punctuation in the network | language | , The |
Empirical Study of End-to-End Video- | language | Transformers with Masked Visual Modeling, An |
Empirical Study of | language | CNN for Image Captioning, An |
Empirical Study of Training End-to-End Vision-and- | language | Transformers, An |
End-to-end subtitle detection and recognition for videos in East Asian | language | s via CNN ensemble |
Enhanced continuous sign | language | recognition using PCA and neural network features |
Enhanced Level Building Algorithm for the Movement Epenthesis Problem in Sign | language | Recognition |
Enhanced Sign | language | Recognition Using Weighted Intrinsic-Mode Entropy and Signer's Level of Deafness |
Enhancing a Sign | language | Translation System with Vision-Based Features |
Enhancing Video Summarization via Vision- | language | Embedding |
Enhancing Visual Grounding in Vision- | language | Pre-Training With Position-Guided Text Prompts |
Envedit: Environment Editing for Vision-and- | language | Navigation |
Environment-Agnostic Multitask Learning for Natural | language | Grounded Navigation |
eP-ALM: Efficient Perceptual Augmentation of | language | Models |
Episodic Transformer for Vision-and- | language | Navigation |
Equivariant Similarity for Vision- | language | Foundation Models |
Error Detection in Highly Inflectional | language | s |
Error-correcting tree | language | inference |
European | language | Determination from Image |
Evaluating the Immediate Applicability of Pose Estimation for Sign | language | Recognition |
Evaluation of Automatically Generated Video Captions Using Vision and | language | Models |
Evaluation of features for automated transcription of dual-handed sign | language | alphabets |
Evaluation of neural network | language | models in handwritten Chinese text recognition |
Evaluation of threshold model HMMs and Conditional Random Fields for recognition of spatiotemporal gestures in sign | language | |
Ex. FRAF: An extensible | language | including graphical operations |
EXIF as | language | : Learning Cross-Modal Associations between Images and Camera Metadata |
Expanding | language | -Image Pretrained Models for General Video Recognition |
Expanding Training Set for Chinese Sign | language | Recognition |
Experience with and Requirements for a Gesture Description | language | for Synthetic Animation |
Experiments on Recognising the Handshape in Blobs Extracted from Sign | language | Videos |
Experiments with an On-Line Picture | language | |
Explaining Face Presentation Attack Detection Using Natural | language | |
Explaining Vision and | language | through Graphs of Events in Space and Time |
Exploiting Recurrent Neural Networks and Leap Motion Controller for the Recognition of Sign | language | and Semaphoric Hand Gestures |
Exploiting the Logits: Joint Sign | language | Recognition and Spell-Correction |
Exploiting Unlabeled Data with Vision and | language | Models for Object Detection |
Exploration of Interactive Foreign | language | Teaching Mode Based on Artificial Intelligence |
Exploring | language | Hierarchy for Video Grounding |
Exploring Temporal Concurrency for Video- | language | Representation Learning |
Exploring the Effect of Primitives for Compositional Generalization in Vision-and- | language | |
Exploring Vision- | language | Models for Imbalanced Learning |
Expressive | language | and Interface for Image Querying, An |
Extracting a Domain Theory from Natural | language | to Construct a Knowledge Base for Visual Recognition |
Extraction of 3D hand shape and posture from image sequences for sign | language | recognition |
Extraction of Hand Features for Recognition of Sign | language | Words |
Extraction of Spelling Variations from | language | Structure for Noisy Text Correction |
eye as the window of the | language | ability: Estimation of English skills by analyzing eye movement while reading documents, The |
F-SCP: An automatic prompt generation method for specific classes based on visual | language | pre-training models |
Face Modeling and Animation | language | for MPEG-4 XMT Framework |
Facial expressions in American sign | language | : Tracking and recognition |
FAME-ViL: Multi-Tasking Vision- | language | Model for Heterogeneous Fashion Tasks |
Fashion IQ: A New Dataset Towards Retrieving Images by Natural | language | Feedback |
FashionSAP: Symbols and Attributes Prompt for Fine-Grained Fashion Vision- | language | Pre-Training |
FashionViL: Fashion-Focused Vision-and- | language | Representation Learning |
FashionVLP: Vision | language | Transformer for Fashion Retrieval with Feedback |
Fast multi- | language | LSTM-based online handwriting recognition |
Fast sign | language | recognition benefited from low rank approximation |
FASTSUBS: An Efficient and Exact Procedure for Finding the Most Likely Lexical Substitutes Based on an N-Gram | language | Model |
Feature Selection Based on Mutual Information for | language | Recognition |
FedVLN: Privacy-Preserving Federated Vision-and- | language | Navigation |
Few-shot Font Style Transfer between Different | language | s |
Film | language | |
Filter Flow Visual Querying | language | and Interface for Spatial Databases, A |
Filtering, Distillation, and Hard Negatives for Vision- | language | Pre-Training |
Financial Document Processing Based on Staff Line and Description | language | |
Find and Focus: Retrieve and Localize Video Events with Natural | language | Queries |
Finding a Structure: Evaluating Different Modelling | language | s Regarding Their Suitability of Designing Agent-Based Models |
Finding locations of Flickr resources using | language | models and similarity search |
FindIt: Generalized Localization with Natural | language | Queries |
Fine-Grained Image Classification via Combining Vision and | language | |
FineHand: Learning Hand Shapes for American Sign | language | Recognition |
Fingerspelling Detection in American Sign | language | |
Finite fuzzy automata, regular fuzzy | language | s, and pattern recognition |
FLAG3D: A 3D Fitness Activity Dataset with | language | Instruction |
FLAVA: A Foundational | language | And Vision Alignment Model |
FLIP: Cross-domain Face Anti-spoofing with | language | Guidance |
fMRI Study of Chinese Sign | language | in Functional Cortex of Prelingual Deaf Signers, An |
Focus and Align: Learning Tube Tokens for Video- | language | Pre-Training |
Fooling Vision and | language | Models Despite Localization and Attention Mechanism |
Formal Intent-Based Trajectory Description | language | s |
Formal | language | Modeling and Simulations of Incident Management |
Formal specification of image processing primitives in a functional | language | |
Formal System for Texture | language | s, A |
FPGA 2D-convolution unit based on the CAPH | language | , An |
fractal-based image processing | language | : Formal Modeling, A |
Framework for Motion Recognition with Applications to American Sign | language | and Gait Recognition, A |
Framework for Recognizing the Simultaneous Aspects of American Sign | language | , A |
Framework for Sign | language | Sentence Recognition by Commonsense Context, A |
Framework for the Recognition of Nonmanual Markers in Segmented Sequences of American Sign | language | , A |
Free-text keystroke dynamics authentication for Arabic | language | |
French Sign | language | Processing: Verb Agreement |
French Sign | language | : Proposition of a Structural Explanation by Iconicity |
From Captions to Explanations: A Multimodal Transformer-based Architecture for Natural | language | Explanation Generation |
From Chinese Rooms to Irish Rooms: New Words on Visions for | language | |
From GIS to BIM and back again: A Spatial Query | language | for 3D building models and 3D city models |
From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large | language | Models |
From Scarcity to Understanding: Transfer Learning for the Extremely Low Resource Irish Sign | language | |
From Two to One: A New Scene Text Recognizer with Visual | language | Modeling Network |
Fully Convolutional Networks for Continuous Sign | language | Recognition |
Fusing Pre-Trained | language | Models with Multimodal Prompts through Reinforcement Learning |
Fusing Two Directions in Cross-Domain Adaption for Real Life Person Search by | language | |
Fusion-Attention Network for person search with free-form natural | language | |
Fusion-based Spatiotemporal Convolutions with Constant Temporal Snapshots for Sign | language | Recognition |
Fuzzy Analysis of Classifier Handshapes from 3D Sign | language | Data |
Fuzzy handwriting description | language | : FOHDEL |
Fuzzy Positioning Modeling of Natural | language | Location Description |
G3raphGround: Graph-Based | language | Grounding |
Gaussian Mixture Representation of Gesture Kinematics for On-Line Sign | language | Video Annotation, A |
Gaussian Process Dynamical Models for hand gesture interpretation in Sign | language | |
Gaussian Segmentation and Tokenization for Low Cost | language | Identification |
Generalized Decoding for Pixel, Image, and | language | |
Generating coherent natural | language | annotations for video streams |
Generating Construction Safety Observations via Clip-based Image- | language | Embedding |
Generating Multi-sentence Natural | language | Descriptions of Indoor Scenes |
Generating Natural | language | Description of Human Behavior from Video Images |
Generating natural | language | tags for video information management |
Generating Verbal Descriptions of Colored Objects: Towards Grounding | language | in Perception |
Generative Negative Text Replay for Continual Vision- | language | Pretraining |
Geodesic active regions for segmentation and tracking of human gestures in sign | language | videos |
Geographic Information Retrieval Method for Geography Mark-Up | language | Data |
Geographic Named Entity Recognition by Employing Natural | language | Processing and an Improved BERT Model |
Geometric Features for Improving Continuous Appearance-based Sign | language | Recognition |
Geometric transformations in a lazy functional | language | |
Geotagging Text Content With | language | Models and Feature Mining |
GeoVLN: Learning Geometry-Enhanced Visual Representation with Slot Attention for Vision-and- | language | Navigation |
Gesture and Sign | language | Recognition with Temporal Residual Networks |
GIVL: Improving Geographical Inclusivity of Vision- | language | Models with Pre-Training Methods |
Globetrotter: Connecting | language | s by Connecting Images |
Gloss Attention for Gloss-free Sign | language | Translation |
Gloss-free Sign | language | Translation: Improving from Visual-Language Pretraining |
Glove-Based Continuous Arabic Sign | language | Recognition in User-Dependent Mode |
Going Beyond Nouns With Vision & | language | Models Using Synthetic Data |
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision- | language | Models |
Grammar Based Analysis, | language | Issues, Natural Language |
Grammar/Prosody Modelling in Greek Sign | language | : Towards the Definition of Built-In Sign Synthesis Rules |
Graph image | language | techniques supporting radiological, hand image interpretations |
Graph-Based Multimodal Sequential Embedding for Sign | language | Translation |
Graphic | language | s |
Graphical object recognition using statistical | language | models |
Greedy-layer pruning: Speeding up transformer models for natural | language | processing |
GridMM: Grid Memory Map for Vision-and- | language | Navigation |
GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and | language | Pre-training |
Grounded Entity-Landmark Adaptive Pre-training for Vision-and- | language | Navigation |
Grounded | language | -Image Pre-training |
Grounding | language | in Perception |
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive | language | -Image Pre-training |
Guest Editorial Introduction to the Special Section on Video and | language | |
Guest Editorial: Image and | language | Understanding |
Guest Editorial: | language | in Vision |
Guest Editorial: Special Issue on Affective Speech and | language | Synthesis, Generation, and Conversion |
Ham2Pose: Animating Sign | language | Notation into Pose Sequences |
Hand detection in American Sign | language | depth data using domain-driven random forest regression |
Hand Pose Guided 3D Pooling for Word-level Sign | language | Recognition |
Hand shape estimation under complex backgrounds for sign | language | recognition |
Hand Tracking Using Optical-Flow Embedded Particle Filter in Sign | language | Scenes |
Handling Movement Epenthesis and Hand Segmentation Ambiguities in Continuous Sign | language | Recognition Using Nested Dynamic Programming |
Hands in Focus: Sign | language | Recognition Via Top-Down Attention |
Handshapes and Movements: Multiple-Channel American Sign | language | Recognition |
HandTalker II: a Chinese sign | language | recognition and synthesis system |
Handwriting Recognition Algorithm in Different | language | s: Survey |
Handwritten Character Extraction Algorithm for Multi- | language | Document Image, A |
Head Pose Estimation for Sign | language | Video |
Hexagonal Pattern | language | s |
Hidden Markov Model for | language | Syntax in Text Recognition, A |
Hierarchical & multimodal video captioning: Discovering and transferring multimodal knowledge for vision to | language | |
Hierarchical Recurrent Deep Fusion Using Adaptive Clip Summarization for Sign | language | Translation |
Hierarchical Structures and Complexities of Parallel Isometric | language | s |
Hierarchical Vision- | language | Alignment for Video Captioning |
Hierarchical Visual-textual Graph for Temporal Activity Localization via | language | |
HierVL: Learning Hierarchical Video- | language | Embeddings |
High Level | language | for Parallel Image Processing, A |
high performance centroid-based classification approach for | language | identification, A |
High Performance Chinese/English Mixed OCR with Character Level | language | Identification |
High-fidelity 3D Face Generation from Natural | language | Descriptions |
HiTeA: Hierarchical Temporal-Aware Video- | language | Pre-training |
HiVLP: Hierarchical Interactive Video- | language | Pre-Training |
HMM-Based Continuous Sign | language | Recognition Using Stochastic Grammars |
HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision- | language | Models |
HOP+: History-Enhanced and Order-Aware Pre-Training for Vision-and- | language | Navigation |
HOP: History-and-Order Aware Pretraining for Vision-and- | language | Navigation |
How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision- | language | Models? |
How important is motion in sign | language | translation? |
How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign | language | |
Human Body | language | Analysis: A Preliminary Study Based on Kinect Skeleton Tracking |
Human Part-wise 3D Motion Context Learning for Sign | language | Recognition |
Hybrid Grammar | language | Model for Handwritten Historical Documents Recognition |
Hybrid | language | Model for Handwritten Chinese Sentence Recognition, A |
Hybrid word/Part-of-Arabic-Word | language | Models for Arabic text document recognition |
I can't believe there's no images!: Learning Visual Tasks Using Only | language | Supervision |
I2MVFormer: Large | language | Model Generated Multi-View Document Supervision for Zero-Shot Image Classification |
iCLIP: Bridging Image Classification and Contrastive | language | -Image Pre-training for Visual Recognition |
Identification of Latin-Based | language | s through Character Stroke Categorization |
IFSeg: Image-free Semantic Segmentation via Vision- | language | Model |
Image as a Foreign | language | : BEIT Pretraining for Vision and Vision-Language Tasks |
Image character recognition using deep convolutional neural network learned from different | language | s |
Image | language | s in intelligent radiological palm diagnostics |
Image Retrieval on Real-life Images with Pre-trained Vision-and- | language | Models |
Image Statistics of American Sign | language | : Comparison with Faces and Natural Scenes |
Image Texture Analysis Method for Minority | language | Identification, An |
Image-Based and Sensor-Based Approaches to Arabic Sign | language | Recognition |
Image-Based Keyword Recognition in Oriental | language | Document Images |
Image-based word recognition in oriental | language | document images |
Image-Sensitive | language | Modeling for Automatic Speech Recognition |
Implementation of Three Text to Speech Systems for Kurdish | language | |
Improved Fusion of Visual and | language | Representations by Dense Symmetric Co-attention for Visual Question Answering |
Improved Speaker and Navigator for Vision-and- | language | Navigation |
Improved Visual Fine-tuning with Natural | language | Supervision |
Improving Commonsense in Vision- | language | Models via Knowledge Graph Riddles |
Improving Continuous Sign | language | Recognition with Cross-Lingual Signs |
Improving Deep Visual Representation for Person Re-identification by Global and Local Image- | language | Association |
Improving handwritten Chinese text recognition using neural network | language | models and convolutional neural network shape models |
Improving HMM-Based Keyword Spotting with Character | language | Models |
Improving Inconspicuous Attributes Modeling for Person Search by | language | |
Improving | language | -supervised object detection with linguistic structure analysis |
Improving Mandarin End-to-End Speech Recognition With Word N-Gram | language | Model |
Improving Medical Vision- | language | Contrastive Pretraining With Semantics-Aware Triage |
Improving Sign | language | Translation with Monolingual Data by Sign Back-Translation |
Improving the performance of Kalman filter for hand tracking in Persian sign | language | video |
Improving the Quality of Video-to- | language | Models by Optimizing Annotation of the Training Material |
Improving Vision-and- | language | Navigation by Generating Future-View Image Semantics |
Improving Vision-and- | language | Navigation with Image-text Pairs from the Web |
Incorporating facial features into a multi-channel gesture recognition system for the interpretation of Irish Sign | language | sequences |
Incorporating | language | Syntax in Visual Text Recognition with a Statistical-Model |
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead With Large Pretrained | language | Model |
Indian Sign | language | Generation System |
Indian Sign | language | Recognition Using Kinect Sensor |
Indirect Identification of Perinatal Psychosocial Risks From Natural | language | |
InDiReCT: | language | -Guided Zero-Shot Deep Metric Learning for Images |
Inference of even linear grammars and its application to picture description | language | s |
Inference of finite-state transducers from regular | language | s |
Inference of K-Testable | language | s in the Strict Sense and Application to Syntactic Pattern Recognition |
Inference of regular | language | s using state merging algorithms with search |
Inference of Reversible Tree | language | s |
Inferring Visual Persuasion via Body | language | , Setting, and Deep Features |
Influence of Handshape Information on Automatic Sign | language | Recognition |
Influence of | language | models and candidate set size on contextual postprocessing for Chinese script recognition |
Ink Markup | language | : InkML |
Insight: A Data Flow | language | for Programming Vision Algorithms in a Reconfigurable Computational Network |
instant semantics acquisition system of live soccer video with application to live event alert and on-the-fly | language | selection, An |
Integrated Visual | language | and Software Development Environment, An |
Integrating | language | Guidance Into Image-Text Matching for Correcting False Negatives |
Integrating | language | Guidance into Vision-based Deep Metric Learning |
Integrating | language | Model in Handwritten Chinese Text Recognition |
Integrating natural | language | processing with image document analysis: what we learned from two real-world applications |
Integrating Natural- | language | Understanding with Document Structure-Analysis |
Integrating Phonological Knowledge in ASR Systems for Spanish | language | |
Integrating Vision and | language | for First-Impression Personality Analysis |
Integrating Vision and | language | : Semantic Description of Traffic Events from Image Sequences |
Integration of Gesture and Verbal | language | : A Formal Semantics Approach |
Integration of Natural | language | and Vision Processing |
Integration of Natural- | language | and Vision Processing: Computational Models and Systems |
Integration of Natural- | language | and Vision Processing: Grounding Representations |
Integration of Natural- | language | and Vision Processing: Intelligent Multimedia |
Integration of Natural- | language | and Vision Processing: More Computational Models and Systems |
Integration of Natural- | language | and Vision Processing: Theory |
Intellectual property management and protection for MPEG multimedia content: A structured | language | for interoperable IPMP systems |
Interaction-Integrated Network for Natural | language | Moment Localization |
Interactive High-Level | language | System for Picture Processing, An |
Interactive Image Retrieval by Natural | language | |
Interactive Machine Translation Framework for Modernizing the | language | of Historical Documents, An |
INTERLIS | language | for Modelling Legal 3D Spaces and Physical 3D Objects by Including Formalized Implementable Constraints and Meaningful Code Lists |
Interpretation of Spatial | language | in a Map Navigation Task |
Interpreter for a | language | for Describing Assemblies, An |
Interpreting the Fuzzy Semantics of Natural- | language | Spatial Relation Terms with the Fuzzy Random Forest Algorithm |
Intriguing Aspects of Oriental | language | s |
Introducing | language | Guidance in Prompt-based Continual Learning |
Introduction and Analysis of an Event-Based Sign | language | Dataset |
Invariants Extraction Method Applied in an Omni- | language | Old Document Navigating System |
Investigation and modeling of the structure of texting | language | |
Investigation Into The Common Semantics Of | language | And Vision, An |
Irish Sign | language | Recognition Using Principal Component Analysis and Convolutional Neural Networks |
Is BERT Blind? Exploring the Effect of Vision-and- | language | Pretraining on Visual Language Understanding |
Is ChatGPT a Good Geospatial Data Analyst? Exploring the Integration of Natural | language | into Structured Query Language within a Spatial Database |
Is context all you need? Scaling Neural Sign | language | Translation to Large Domains of Discourse |
Is Multimodal Vision Supervision Beneficial to | language | ? |
ISD-QA: Iterative Distillation of Commonsense Knowledge from General | language | Models for Unsupervised Question Answering |
ISOcat Data Categories for Signed | language | Resources |
Isolated Sign | language | Recognition based on Tree Structure Skeleton Images |
Isolated Sign | language | Recognition with Multi-Scale Spatial-Temporal Graph Convolutional Networks |
Isolated spoken word recognition using packed-MFCC on padded-voice signal for unscripted | language | s |
Iterative Alignment Network for Continuous Sign | language | Recognition |
Iterative Reference Driven Metric Learning for Signer Independent Isolated Sign | language | Recognition |
Iterative Vision-and- | language | Navigation |
Japanese | language | model based on bigrams and its application to on-line character recognition |
Joint Object and State Recognition Using | language | Knowledge |
Joint Visual Grounding and Tracking with Natural | language | Specification |
Jointly Modeling Embedding and Translation to Bridge Video and | language | |
Journal of Visual | language | s and Computing |
K-gram Extensions of Terminal Distinguishable | language | s |
Kaleido-BERT: Vision- | language | Pre-training on Fashion Domain |
KERM: Knowledge Enhanced Reasoning for Vision-and- | language | Navigation |
Kinematic Gesture Representation Based on Shape Difference VLAD for Sign | language | Recognition, A |
Knowledge Representation for the Generation of Quantified Natural | language | Descriptions of Vehicle Traffic in Image Sequences |
Knowledge-Augmented Visual Question Answering With Natural | language | Explanation |
Knowledge-Aware Prompt Tuning for Generalizable Vision- | language | Models |
Knowledge-Based Computer Vision: Integrated Programming | language | and Data Management System Design |
KoreALBERT: Pretraining a Lite BERT Model for Korean | language | Understanding |
Korean Sign | language | Dataset for Action Recognition, The |
KSL-Guide: A Large-scale Korean Sign | language | Dataset Including Interrogative Sentences for Guiding the Deaf and Hard-of-Hearing |
L-CoDer: | language | -Based Colorization with Color-Object Decoupling Transformer |
L-CoIns: | language | -based Colorization With Instance Awareness |
Label2Label: A | language | Modeling Framework for Multi-attribute Learning |
LANA: A | language | -Capable Navigator for Instruction Following and Generation |
| language | Adaptive Methodology for Handwritten Text Line Segmentation |
| language | Adaptive Weight Generation for Multi-Task Visual Grounding |
| language | and Script Identification Based on Steerable Pyramid Features |
| language | and vision based person re-identification for surveillance systems using deep learning with LIP layers |
| language | and Vision: A Single Perceptual Mechanism |
| language | and Visual Relations Encoding for Visual Question Answering |
| language | as Queries for Referring Video Object Segmentation |
| language | Features Matter: Effective Language Representations for Vision-Language Tasks |
| language | for construction of belief networks, A |
| language | for Human Action, A |
| language | for pattern recognition, A |
| language | Guided Local Infiltration for Interactive Image Retrieval |
| language | Identification as Improvement for Lip-Based Biometric Visual Systems |
| language | Identification Based on Phone Decoding for Basque and Spanish |
| language | identification for handwritten document images using a shape codebook |
| language | Identification for Interactive Handwriting Transcription of Multilingual Documents |
| language | Identification for Printed Text Independent of Segmentation |
| language | identification from handwritten documents |
| language | Identification in Degraded and Distorted Document Images |
| language | identification in web documents using discrete HMMs |
| language | identification of character images using machine learning techniques |
| language | Identification of On-Line Documents Using Word Shapes |
| language | Identification Using Spectrogram Texture |
| language | Identification: Examining the Issues |
| language | in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification |
| language | Independent Lip Reading |
| language | Independent Searching Tools for Cultural Heritage on the Querylab Platform |
| language | Independent Skew Estimation Technique Based on Gaussian Mixture Models: A Case Study on South Indian Scripts |
| language | independent unsupervised learning of short message service dialect |
| language | Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting |
| language | Model Integration for the Recognition of Handwritten Medieval Documents |
| language | model using variable length tokens for open-vocabulary Hangul text recognition, A |
| language | Model-Based on Semantically Clustered Words in a Chinese Character-Recognition System, A |
| language | modeling for bag-of-visual words image categorization |
| language | Modeling on Location-Based Social Networks |
| language | Modelization and Categorization for Voice-Activated QA |
| language | Models are Causal Knowledge Extractors for Zero-shot Video Question Answering |
| language | Models for Handwritten Short Message Services |
| language | of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities, The |
| language | of Encoded Line Patterns, The |
| language | Person Search with Pair-based Weighting Loss |
| language | Proficiency Classification During Computer-Based Test with EEG Pattern Recognition Methods |
| language | Recognition, Multi-Language Documents |
| language | Reinforced Superposition Multimodal Fusion for Sentiment Analysis |
| language | Translation, Grammar Based Analysis |
| language | , Vision and Metaphor |
| language | -Agnostic Visual-Semantic Embeddings |
| language | -Attention Modular-Network for Relational Referring Expression Comprehension in Videos |
| language | -Augmented Pixel Embedding for Generalized Zero-Shot Learning |
| language | -Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models |
| language | -Aware Spatial-Temporal Collaboration for Referring Video Segmentation |
| language | -aware weak supervision for salient object detection |
| language | -Based Image Editing with Recurrent Attentive Models |
| language | -Based Image Manipulation Built on Language-Guided Ranking |
| language | -based querying of image collections on the basis of an extensible ontology |
| language | -Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation |
| language | -Conditioned Graph Networks for Relational Reasoning |
| language | -Driven Artistic Style Transfer |
| language | -Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model |
| language | -enhanced RNR-Map: Querying Renderable Neural Radiance Field maps with natural language |
| language | -Free Layout Analysis |
| language | -free Training for Zero-shot Video Grounding |
| language | -Grounded Indoor 3D Semantic Segmentation in the Wild |
| language | -Guided Audio-Visual Source Separation via Trimodal Consistency |
| language | -Guided Face Animation by Recurrent StyleGAN-Based Generator |
| language | -Guided Global Image Editing via Cross-Modal Cyclic Mechanism |
| language | -guided graph parsing attention network for human-object interaction recognition |
| language | -Guided Multi-Granularity Context Aggregation for Temporal Sentence Grounding |
| language | -guided Multi-Modal Fusion for Video Action Recognition |
| language | -Guided Music Recommendation for Video via Prompt Analogies |
| language | -Guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning |
| language | -Independent OCR Using a Continuous Speech Recognition System |
| language | -Independent Text Lines Extraction Using Seam Carving |
| language | -Independent Text-Line Extraction Algorithm for Handwritten Documents |
| language | 2Pose: Natural Language Grounded Pose Forecasting |
| language | s and Architectures for Image Processing |
| language | s for constrained binary segmentation based on maximum a posteriori probability labeling |
LANIT: | language | -Driven Image-to-Image Translation for Unlabeled Data |
LapsCore: | language | -guided Person Search via Color Reasoning |
Large Lexicon Detection of Sign | language | |
Large Vocabulary Sign | language | Recognition Based on Fuzzy Decision Trees |
Large-scale Learning of Sign | language | by Watching TV (Using Co-occurrences) |
Large-Vocabulary Continuous Sign | language | Recognition Based on Transition-Movement Models |
LASP: Text-to-Text Optimization for | language | -Aware Soft Prompting of Vision and Language Models |
Latent support vector machine for sign | language | recognition with Kinect |
LAVENDER: Unifying Video- | language | Understanding as Masked Language Modeling |
LAVT: | language | -Aware Vision Transformer for Referring Image Segmentation |
Layout and | language | : exploring text block discovery in tables using linguistic resources |
Layout and | language | : Preliminary Investigations in Recognizing the Structure of Tables |
LC-MSM: | language | -Conditioned Masked Segmentation Model for unsupervised domain adaptation |
Learning and the | language | of thought |
Learning Bidimensional Context Dependent Models Using a Context Sensitive | language | |
Learning by Hallucinating: Vision- | language | Pre-training with Weak Supervision |
Learning by Planning: | language | -Guided Global Image Editing |
Learning Cross-Modal Representations for | language | -Based Image Manipulation |
Learning Disentanglement with Decoupled Labels for Vision- | language | Navigation |
Learning Domain Invariant Prompt for Vision- | language | Models |
learning environment for sign | language | , A |
Learning from Unlabeled 3D Environments for Vision-and- | language | Navigation |
Learning from What is Already Out There: Few-shot Sign | language | Recognition with Online Dictionaries |
Learning Joint Visual Semantic Matching Embeddings for | language | -guided Retrieval |
Learning | language | to symbol and language to vision mapping for visual grounding |
Learning Local | language | s and Their Application to DNA Sequence Analysis |
Learning Models for Object Recognition from Natural | language | Descriptions |
Learning of context-sensitive | language | s described by augmented regular expressions |
Learning of Patterns and Picture | language | s |
Learning Open-Vocabulary Semantic Segmentation Models From Natural | language | Supervision |
Learning sign | language | by watching TV (using weakly aligned subtitles) |
Learning signs from subtitles: A weakly supervised approach to sign | language | recognition |
Learning the Semantics in Image Retrieval: A Natural | language | Processing Approach |
Learning to combine the modalities of | language | and video for temporal moment localization |
Learning to Compose and Reason with | language | Tree Structures for Visual Grounding |
Learning to Exploit Temporal Structure for Biomedical Vision- | language | Processing |
Learning to Follow and Generate Instructions for | language | -Capable Navigation |
Learning to Generate | language | -Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space |
Learning to Generate Scene Graph from Natural | language | Supervision |
Learning to Name Classes for Vision and | language | Models |
Learning to Prompt CLIP for Monocular Depth Estimation: Exploring the Limits of Human | language | |
Learning to Prompt for Open-Vocabulary Object Detection with Vision- | language | Model |
Learning to Prompt for Vision- | language | Models |
Learning to Scale Multilingual Representations for Vision- | language | Tasks |
Learning to Segment Actions from Visual and | language | Instructions via Differentiable Weak Sequence Alignment |
Learning Trajectory-Word Alignments for Video- | language | Tasks |
Learning Trans-Dimensional Random Fields with Applications to | language | Modeling |
Learning Transferable Human-Object Interaction Detector with Natural | language | Supervision |
Learning Video Representations from Large | language | Models |
Learning Vision-and- | language | Navigation from YouTube Videos |
Learning Visual Representation from Modality-Shared Contrastive | language | -Image Pre-training |
Learning Visual Representations via | language | -Guided Sampling |
Length-sensitive | language | -bound Recognition Network for Multilingual Text Recognition, A |
LERF: | language | Embedded Radiance Fields |
Less is More: CLIPBERT for Video-and- | language | Learning via Sparse Sampling |
Let the robot tell: Describe car image with natural | language | via LSTM |
Leveraging per Image-Token Consistency for Vision- | language | Pre-Training |
Leveraging Pretrained Image Classifiers for | language | -Based Segmentation |
Leveraging Symbolic Knowledge Bases for Commonsense Natural | language | Inference Using Pattern Theory |
Leveraging Visual Prompts To Guide | language | Modeling for Referring Video Object Segmentation |
LexLIP: Lexicon-Bottlenecked | language | -Image Pre-Training for Large-Scale Image-Text Sparse Retrieval |
Limits on the Application of Frequency-Based | language | Models to OCR |
Linear Spaces of Meanings: Compositional Structures in Vision- | language | Models |
Linguistic Feature Vector for the Visual Interpretation of Sign | language | , A |
Linguistically-aware attention for reducing the semantic gap in vision- | language | tasks |
Lip contour extraction for | language | learning in VEC3D |
Lip Reading for Low-resource | language | s by Learning and Combining General Speech Knowledge and Language-specific Knowledge |
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large | language | Models |
LM-VC: Zero-Shot Voice Conversion via Speech Generation Based on | language | Models |
Local picture | language | s |
Local-Global Context Aware Transformer for | language | -Guided Video Segmentation |
Locality-Aware Transformer for Video-Based Sign | language | Translation |
Localized Latent Updates for Fine-Tuning Vision- | language | Models |
Localizing Characteristic Points on a Vertebra Contour by Using Shape | language | |
Localizing Moments in Video with Natural | language | |
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision- | language | Models |
Long Term Arm and Hand Tracking for Continuous Sign | language | TV Broadcasts |
Long-short term memory neural networks | language | modeling for handwriting recognition |
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and- | language | Navigation |
Loss Re-Scaling VQA: Revisiting the | language | Prior Problem From a Class-Imbalance View |
Low Rank | language | Models for Small Training Sets |
LViT: | language | Meets Vision Transformer in Medical Image Segmentation |
LWDOS: | language | for Writing Descriptors of Outline Shapes |
M3L: | language | -based Video Editing via Multi-Modal Multi-Level Transformers |
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi- | language | , Multi-Annotation Category Dataset for Modern Document Layout Analysis |
MABAN: Multi-Agent Boundary-Aware Network for Natural | language | Moment Retrieval |
MAC: Mining Activity Concepts for | language | -Based Temporal Localization |
Machine Recognition of Indian | language | Characters Using a Tree Structure Based on Primitives |
machine vision extension to the Ruby programming | language | using OpenCV and FFI, A |
MAD: A Scalable Dataset for | language | Grounding in Videos from Movie Audio Descriptions |
Madura: A | language | for Learning Vision Programs from Examples |
MAGNet: Multi-Region Attention-Assisted Grounding of Natural | language | Queries at Phrase Level |
MAGVLT: Masked Generative Vision-and- | language | Transformer |
Making the Most of Text Semantics to Improve Biomedical Vision- | language | Processing |
MAN: Moment Alignment Network for Natural | language | Moment Retrieval via Iterative Graph Adjustment |
Mandarin dictation machine based upon a hierarchical recognition approach and Chinese natural | language | analysis, A |
Mandarin | language | Learning System for Nasal Voice User |
Manifold Interpolation for an Efficient Hand Shape Recognition in the Irish Sign | language | |
MAP: Multimodal Uncertainty-Aware Vision- | language | Pre-training Model |
MapScript: A Map Algebra Programming | language | Incorporating Neighborhood Analysis |
Marathi | language | Speech Synthesizer Using Concatenative Synthesis Strategy (Spoken in Maharashtra, India) |
Markov Logic: A Unifying | language | for Structural and Statistical Pattern Recognition |
MaskCLIP: Masked Self-Distillation Advances Contrastive | language | -Image Pretraining |
Masked Autoencoding Does Not Help Natural | language | Supervision at Scale |
Masked Batch Normalization to Improve Tracking-Based Sign | language | Recognition Using Graph Convolutional Networks |
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with | language | Knowledge |
MDX2SPARQL: Semantic query mapping of OLAP query | language | to SPARQL |
Medical-Based Pictogram: Comprehension of Visual | language | with Semiotic Theory |
MedKLIP: Medical Knowledge Enhanced | language | -Image Pre-Training for X-ray Diagnosis |
MemBridge: Video- | language | Pre-Training With Memory-Augmented Inter-Modality Bridge |
Memory and Expectations in Learning, | language | , and Visual Understanding |
Merging The Autoview Image Processing | language | With Prolog |
MERLOT RESERVE: Neural Script Knowledge through Vision and | language | and Sound |
Meta-Explore: Exploratory Hierarchical Vision-and- | language | Navigation Using Scene Object Spectrum Grounding |
Meta-Personalizing Vision- | language | Models to Find Named Instances in Video |
Method and apparatus for automatic | language | determination of Asian language documents |
Method for Analyzing Spatial Relationships Between Words in Sign | language | Recognition, A |
Method for Recognizing a Sequence of Sign | language | Words Represented in a Japanese Sign Language Sentence, A |
Method for recognizing multi- | language | printed documents using strokes and non-strokes of characters |
MILES: Visual BERT Pre-training with Injected | language | Semantics for Video-Text Retrieval |
Minimal Training, Large Lexicon, Unconstrained Sign | language | Recognition |
Mining the Urdu | language | -Based Web Content for Opinion Extraction |
(Mis?-)Using DRT for Generation of Natural | language | Text from Image Sequences |
Misalign, Contrast then Distill: Rethinking Misalignments in | language | -Image Pretraining |
Mixed SIGNals: Sign | language | Production via a Mixture of Motion Primitives |
MLSLib: A Lip Sync Library for Multi Agents and | language | s |
MLSLT: Towards Multilingual Sign | language | Translation |
Modalities Combination for Italian Sign | language | Extraction and Recognition |
Modality Combination Techniques for Continuous Sign | language | Recognition |
Modeling Musical Style with | language | Models for Composer Recognition |
Modeling visual interactive systems through dynamic visual | language | s |
Modelling and recognition of the linguistic components in American Sign | language | |
Modelling and segmenting subunits for sign | language | recognition based on hand motion analysis |
Modulating Bottom-Up and Top-Down Visual Processing via | language | -Conditional Filters |
Moment-based Adversarial Training for Embodied | language | Comprehension |
Mongo2SPARQL: Automatic and semantic query conversion of MongoDB query | language | to SPARQL |
Morality Classification in Natural | language | Text |
Moroccan sign | language | recognition based on machine learning |
Most and Least Retrievable Images in Visual- | language | Query Systems |
Motion Capture System for Sign | language | Synthesis: Overview and Related Issues, A |
Motion | language | of Stereo Image Sequence |
MotionLM: Multi-Agent Motion Forecasting as | language | Modeling |
Moving GeoPQL: a pictorial | language | towards spatio-temporal queries |
MPCCT: Multimodal vision- | language | learning paradigm with context-based compact Transformer |
MPEG video markup | language | and its applications to robust video transmission |
MPEG-4 Systems and Description | language | s: A Way Ahead in Audio Visual Information Representation, The |
MSR-VTT: A Large Video Description Dataset for Bridging Video and | language | |
Multi | language | text detection using fast stroke width transform |
multi-class classification strategy for Fisher scores: Application to signer independent sign | language | recognition, A |
Multi-granularity Retrieval System for Natural | language | -based Vehicle Retrieval, A |
Multi- | language | Online Handwriting Recognition |
Multi-Level Query Interaction for Temporal | language | Grounding |
Multi-lingual Phoneme Recognition and | language | Identification Using Phonotactic Information |
Multi-lingual scene text detection and | language | identification |
Multi-Modal Interaction Graph Convolutional Network for Temporal | language | Localization in Videos |
Multi-modal Sign | language | Spotting by Multi/one-shot Learning |
Multi-modality American Sign | language | recognition |
Multi-modality-based Arabic sign | language | recognition |
Multi-Scale 2D Temporal Adjacency Networks for Moment Localization With Natural | language | |
Multi-scale local-temporal similarity fusion for continuous sign | language | recognition |
Multi-stage Aggregated Transformer Network for Temporal | language | Localization in Videos |
Multi-task learning for natural | language | processing in the 2020s: Where are we going? |
Multi-Task Learning of Hierarchical Vision- | language | Representation |
Multi-Task Paired Masking With Alignment Modeling for Medical Vision- | language | Pre-Training |
Multi-view motion modelled deep attention networks (M2DA-Net) for video based sign | language | recognition |
Multifractal Characterization of Texts for Pattern Recognition: On the Complexity of Morphological Structures in Modern and Ancient | language | s |
Multimodal attention networks for low-level vision-and- | language | navigation |
Multimodal Embeddings From | language | Models for Emotion Recognition in the Wild |
Multimodal Error Correction with Natural | language | and Pointing Gestures |
Multimodal Features Alignment for Vision- | language | Object Tracking |
Multimodal Learning for Sign | language | Recognition |
Multimodal representation: Kneser-Ney smoothing/skip-gram based neural | language | model |
Multimodal Transformer with Variable-Length Memory for Vision-and- | language | Navigation |
Multiple Handwritten Text Line Recognition Systems Derived from Specific Integration of a | language | Model |
Multiple Hypothesis Tracking with Sign | language | Hand Motion Constraints |
Multiview | language | Bias Reduction for Visual Question Answering |
Mutual Support of Data Modalities in the Task of Sign | language | Recognition |
Myhill-Nerode Theorem for Finite State Matrix Automata and Finite Matrix | language | s, A |
n-Grams and their implication to natural | language | understanding |
Name-It: Naming and Detecting Faces in Video by the Integration of Image and Natural | language | Processing |
Narrow-Band Video Communication System for the Transmission of Sign | language | over Ordinary Telephone Lines, A |
Natural Interface for Sign | language | Mathematics, A |
Natural | language | Description of Human Activities from Video Images Based on Concept Hierarchy of Actions |
Natural | language | Description of Time-Varying Scenes |
Natural | language | grammar induction with a generative constituent-context model |
Natural | language | letter based visual cryptography scheme |
Natural | language | Object Retrieval |
Natural | language | processing approach for appraisal of passenger satisfaction and service quality of public transportation |
Natural | language | Processing of Patents and Technical Documentation |
Natural | language | understanding by a robot: A pattern recognition problem |
Natural | language | Understanding by Combining Statistical Methods and Extended Context-Free Grammars |
Natural | language | Video Localization: A Revisit in Span-Based Question Answering Framework |
Natural | language | Video Moment Localization Through Query-Controlled Temporal Convolution |
Natural | language | Watermarking Using Semantic Substitution for Chinese Text |
Natural | language | -Assisted Sign Language Recognition |
Natural | language | -Based Vehicle Retrieval with Explicit Cross-Modal Representation Learning |
Neighbourhood Watch: Referring Expression Comprehension via | language | -Guided Graph Attention Networks |
NeRDi: Single-View NeRF Synthesis with | language | -Guided Diffusion as General Image Priors |
Neural Logic Vision | language | Explainer |
Neural network | language | models for off-line handwriting recognition |
Neural Networks Compression for | language | Modeling |
Neural Sign | language | Synthesis: Words Are Our Glosses |
Neural Sign | language | Translation |
Neural Sign | language | Translation by Learning Tokenization |
New approach of smoothing to extend | language | model in Lucene |
New Computer | language | for Electron Image Processing, A |
New Dataset for End-to-End Sign | language | Translation: The Greek Elementary School Dataset, A |
New Hierarchy of Two-Dimensional Array | language | s, A |
New implementation of OGC Web Processing Service in Python programming | language | . PyWPS-4 and issues we are facing with processing of large raster data using OGC WPS |
new instrumented approach for translating American Sign | language | into sound and text, A |
New | language | -Independent Deep CNN for Scene Text Detection and Style Transfer in Social Media Images, A |
New Path: Scaling Vision-and- | language | Navigation with Synthetic Instructions and Imitation Learning, A |
NLE-DM: Natural- | language | Explanations for Decision Making of Autonomous Driving Based on Semantic Scene Understanding |
NLX-GPT: A Model for Natural | language | Explanations in Vision and Vision-Language Tasks |
Non-Contrastive Learning Meets | language | -Image Pre-Training |
Non-Fluent Synthetic Target- | language | Data Improve Neural Machine Translation |
non-linear model of shape and motion for tracking finger spelt American sign | language | , A |
Not All Swear Words Are Used Equal: Attention over Word n-grams for Abusive | language | Identification |
novel approach to American Sign | language | (ASL) phrase verification using reversed signing, A |
novel approach to automatically extracting basic units from Chinese sign | language | , A |
Novel Attention-based Aggregation Function to Combine Vision and | language | , A |
Novel boosting framework for subunit-based sign | language | recognition |
Novel Chinese Sign | language | Recognition Method Based on Keyframe-Centered Clips, A |
novel natural | language | steganographic framework based on image description neural network, A |
Novel Sign | language | Recognition Framework Using Hierarchical Grassmann Covariance Matrix, A |
NÜWA-LIP: | language | -guided Image Inpainting with Defect-free VQGAN |
Object Captioning and Retrieval with Natural | language | |
Object Oriented | language | for Image and Vision Execution (OLIVE), An |
Object Referring in Videos with | language | and Human Gaze |
Object Referring in Visual Scene with Spoken | language | |
Object-and-action Aware Model for Visual | language | Navigation |
Object-aware Video- | language | Pre-training for Retrieval |
object-oriented descriptive | language | to facilitate advanced handwritten form processing, An |
object-oriented form description | language | and approach to handwritten form processing, An |
Objects tracking in video: A object-oriented approach using Unified Modeling | language | |
OCR error correction of an Inflectional Indian | language | using morphological parsing |
OCR Error Detection and Correction of an Inflectional Indian | language | Script |
OCR for bilingual documents using | language | modeling |
OCR in Bangla: an Indo-Bangladeshi | language | |
OCR System to Read Two Indian | language | Scripts: Bangla and Devnagari (Hindi), An |
Offline Arabic Handwriting Identification Using | language | Diacritics |
Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical | language | Models |
OMG: Observe Multiple Granularities for Natural | language | -Based Vehicle Retrieval |
OmniLabel: A Challenging Benchmark for | language | -Based Object Detection |
On a Relationship Between Graph L-Systems and Picture | language | s |
On Guiding Visual Attention with | language | Specification |
On intelligent avatar communication using Korean, Chinese and Japanese sign- | language | s: an overview |
On optimal order in modeling sequence of letters in words of common | language | as a Markov chain |
On Recognizable Infinite Array | language | s |
On Some Classes of 2D | language | s and Their Relations |
On the application of formal | language | and automata theory to pattern recognition |
On the influence of vocabulary size and | language | models in unconstrained handwritten text recognition |
On the | language | of Standard Discrete Planes and Surfaces |
On the linear computational complexity of the parser for quasi-context sensitive | language | s |
On the parsing of deterministic graph | language | s for syntactic pattern recognition |
On the Projection of PLLRs for Unbounded Feature Distributions in Spoken | language | Recognition |
On the recognition of the alphabet of the sign | language | through size functions |
On The Simultaneous Interpretation of Real World Image Sequences and Their Natural | language | Description: The System Soccer |
On the use of graph parsing for recognition of isolated hand postures of Polish Sign | language | |
On Using Classical Poetry Structure for Indian | language | Post-Processing |
One Step at a Time: Long-Horizon Vision-and- | language | Navigation with Milestones |
One-Stream Vision- | language | Memory Network for Object Tracking |
online reversed French Sign | language | dictionary based on a learning approach for signs classification, An |
Ontological Query | language | for Content Based Image Retrieval |
Open-Category Human-Object Interaction Pre-training via | language | Modeling Framework |
Open-Set Fine-Grained Retrieval via Prompting Vision- | language | Evaluator |
Open-Vocabulary One-Stage Detection with Hierarchical Visual- | language | Knowledge Distillation |
Open-World Semantic Segmentation via Contrasting and Clustering Vision- | language | Embedding |
OpenFashionCLIP: Vision-and- | language | Contrastive Learning with Open-source Fashion Data |
OpenVL: Abstracting Vision Tasks Using a Segment-Based | language | Model |
Optical character recognition errors and their effects on natural | language | processing |
Optical modelling and | language | modelling trade-off for Handwritten Text Recognition |
Optimisation of both classifier and fusion based feature set for static American sign | language | recognition |
Optimizing PLLR Features for Spoken | language | Recognition |
Optimizing the integration of a statistical | language | model in HMM based offline handwritten text recognition |
OSCAR: Object-Semantics Aligned Pre-Training for Vision- | language | Tasks |
overview of the MPEG-7 Description Definition | language | (DDL) proposals, An |
overview of the MPEG-7 description definition | language | (DDL), An |
P System Model for Contextual Array | language | s, A |
Painter: Teaching Auto-regressive | language | Models to Draw Sketches |
Parallel parsing of tree | language | s for syntactic pattern recognition |
Parallel and Sequential Specification of a Context Sensitive | language | for Straight Lines on Grids |
Parallel Contextual Hexagonal Array Grammars and | language | s |
Parallel Hidden Markov Models for American Sign | language | Recognition |
parallel image processing | language | based on computational models, A |
Parallel Temporal Encoder For Sign | language | Translation |
Parametric Processes for the Implementation of HBIM: Visual Programming | language | for the Digitisation of the Index of Masonry Quality |
Parametric Representation of the Speaker's Lips for Multimodal Sign | language | And Speech Recognition |
Pars-OFF: A Benchmark for Offensive | language | Detection on Farsi Social Media |
Parsing and Translation of (Attributed) Expansive Graph | language | s for Scene Analysis |
Parsing with Probabilistic Strictly Locally Testable Tree | language | s |
PartGlot: Learning Shape Part Segmentation from | language | Reference Games |
Partial Commutation on Array | language | s |
PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image- | language | Models |
pattern description | language | : PADEL, A |
PERCEIVER-VL: Efficient Vision-and- | language | Modeling with Iterative Latent Attention |
Perceptual Grouping in Contrastive Vision- | language | Models |
perceptually optimised video coding system for sign | language | communication at low bit rates, A |
Performance of a SCFG-Based | language | Model with Training Data Sets of Increasing Size |
Persian | language | Model based on BiLSTM Model on COVID-19 Corpus |
person independent system for recognition of hand postures used in sign | language | , A |
Person Re-Identification with Vision and | language | |
Person Search with Natural | language | Description |
Person-Independent 3D Sign | language | Recognition |
Personalized pose estimation for body | language | understanding |
Personalized text snippet extraction using statistical | language | models |
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with | language | and Vision |
PETR: Rethinking the Capability of Transformer-Based | language | Model in Scene Text Recognition |
Phone-Segments Based | language | Identification for Spanish, Basque and English |
Phoneme analysis for multiple | language | s with fuzzy-based speaker identification |
Phonetic Unification of Multiple Accents for Spanish and Arabic | language | s |
Phrase Localization and Visual Relationship Detection with Comprehensive Image- | language | Cues |
PhraseCut: | language | -Based Image Segmentation in the Wild |
PICQUERY: A High Level Query | language | for Pictorial Database Management |
Picture | language | for Skeletal Polyhedra |
Picture | language | Machines |
Picture | language | s |
Picture | language | s: Formal Models for Picture Recognition |
Picture Query | language | s for Pictorial Database Systems |
Picture: A probabilistic programming | language | for scene perception |
PiGLET: Pixel-Level Grounding of | language | Expressions With Transformers |
PIN: A Novel Parallel Interactive Network for Spoken | language | Understanding |
PiSLTRc: Position-Informed Sign | language | Transformer With Content-Aware Convolution |
PLA: | language | -Driven Open-Vocabulary 3D Scene Understanding |
Place versus Space: From Points, Lines and Polygons in GIS to Place-Based Representations Reflecting | language | and Culture |
PLANG: A Picture | language | Schema for a Class of Pictures |
Poly: A two dimensional | language | for a class of polygons |
PolygloNet: Multilingual Approach for Scene Text Recognition Without | language | Constraints |
Polynomial Time Algorithm for Inferring Subclasses of Parallel Internal Column Contextual Array | language | s |
Porting Multilingual Subjectivity Resources across | language | s |
Pose-based Body | language | Recognition for Emotion and Psychiatric Symptom Interpretation |
Pose-based Sign | language | Recognition using GCN and BERT |
PoseFix: Correcting 3D Human Poses with Natural | language | |
PoseScript: 3D Human Poses from Natural | language | |
Position Models and | language | Modeling |
Position-Guided Text Prompt for Vision- | language | Pre-Training |
Postprocessing Statistical | language | Models for a Handwritten Chinese Character Recognizer |
Practical Cross-modal Manifold Alignment for Robotic Grounded | language | Learning |
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision- | language | Model |
Prefix Conditioning Unifies | language | and Label Supervision |
Pretrained | language | Models as Visual Planners for Human Assistance |
Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision- | language | Models |
Prior-Aware Cross Modality Augmentation Learning for Continuous Sign | language | Recognition |
Priority Map for Vision-and- | language | Navigation with Trajectory Plans and Feature-Location Cues, A |
Probabilistic | language | Model for Hand Drawings, A |
Probabilistic logic with minimum perplexity: Application to | language | modeling |
ProbVLM: Probabilistic Adapter for Frozen Vision- | language | Models |
Product Aspects Identification Method by Using Translation-Based | language | Model, A |
Progress in Image Processing | language | s |
Progressive | language | -Customized Visual Feature Learning for One-Stage Visual Grounding |
Progressive Transformers for End-to-end Sign | language | Production |
Prompt-RSVQA: Prompting visual context to a | language | model for Remote Sensing Visual Question Answering |
Prompting large | language | model with context and pre-answer for knowledge-based VQA |
Prompting Large | language | Models with Answer Heuristics for Knowledge-Based Visual Question Answering |
Prompting Visual- | language | Models for Efficient Video Understanding |
Pronunciation Clustering and Modeling of Variability for Appearance-Based Sign | language | Recognition |
Proposal of Real World Video Stream Description | language | (VSDL-RW) and Its Application |
Proposal-free Temporal Moment Localization of a Natural- | language | Query in Video using Guided Attention |
ProtTrans: Toward Understanding the | language | of Life Through Self-Supervised Learning |
ProVLA: Compositional Image Search with Progressive Vision- | language | Alignment and Multimodal Fusion |
Pseudo-Q: Generating Pseudo | language | Queries for Visual Grounding |
Pseudodepth-SLR: Generating Depth Data for Sign | language | Recognition |
Q: How to Specialize Large Vision- | language | Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! |
QISampling: An Effective Sampling Strategy for Event-Based Sign | language | Recognition |
Quadtree Grammars for Picture | language | s |
Qualitative and Quantitative Characterisation of Style in Sign | language | Gestures, A |
Query Techniques, Query | language | s |
RA-CLIP: Retrieval Augmented Contrastive | language | -Image Pre-Training |
Rapid signer adaptation for continuous sign | language | recognition using a combined approach of eigenvoices, MLLR, and MAP |
Rapid Signer Adaptation for Isolated Sign | language | Recognition |
Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for | language | -guided HOI detection |
Re-sampling for Chinese Sign | language | Recognition |
Re-scoring using image- | language | similarity for few-shot object detection |
Read and Attend: Temporal Localisation in Sign | language | Videos |
Read Like Humans: Autonomous, Bidirectional and Iterative | language | Modeling for Scene Text Recognition |
Read-only Prompt Optimization for Vision- | language | Few-shot Learning |
Real time Hand Gesture Recognition using different algorithms based on American Sign | language | |
Real Time Large Vocabulary Continuous Sign | language | Recognition Based on OP/Viterbi Algorithm |
Real time sign | language | recognition using depth sensor |
Real-Time American Sign | language | from Video Using Hidden Markov Models |
Real-Time American Sign | language | Recognition from Video Using Hidden Markov Models |
Real-Time American Sign | language | Recognition Using Desk and Wearable Computer Based Video |
Real-Time Continuous Gesture Recognition System for Sign | language | , A |
Real-time recognition of sign | language | gestures and air-writing using leap motion |
Real-Time Retrieval for Images of Documents in Various | language | s Using a Web Camera |
Real-time sign | language | letter and word recognition from depth data |
Real-time sign | language | recognition and speech conversion using VGG16 |
Real-Time Sign | language | Recognition Using a Consumer Depth Camera |
Real-time Visual Object Tracking with Natural | language | Description |
Recent Advances of Deep Learning for Sign | language | Recognition |
Recent developments of the syntactic pattern recognition model based on quasi-context sensitive | language | s |
Recognition and learning of a class of context-sensitive | language | s described by augmented regular expressions |
Recognition Approach to Gesture | language | Understanding, A |
recognition graph: | language | independent adaptable on-line cursive script recognition, The |
Recognition of Arabic Sign | language | Alphabet Using Polynomial Classifiers |
Recognition of gestures in Arabic sign | language | using neuro-fuzzy systems |
Recognition of Japanese Sign | language | from Image Sequence Using Color Combination |
Recognition of Local Features for Camera-based Sign | language | Recognition System |
Recognition of Sign | language | Motion Images |
Recognition of strong and weak connection models in continuous sign | language | |
Recognition of user-dependent and independent static hand gestures: Application to sign | language | |
Recognition Strategy | language | , The |
Recognition System for Home-Service-Related Sign | language | Using Entropy-Based K-Means Algorithm and ABC-Based HMM |
Recognition with raw canonical phonetic movement and handshape subunits on videos of continuous Sign | language | |
Recognizability of iso-picture | language | s by Wang systems |
Recognizable Picture | language | s |
Recognizable units in Pashto | language | for OCR |
Recognizing American Sign | language | Gestures from Within Continuous Videos |
Recognizing American Sign | language | Nonmanual Signal Grammar Errors in Continuous Videos |
Recognizing Continuous Grammatical Marker Facial Gestures in Sign | language | Video |
Recognizing Sign | language | from Brain Imaging |
Recognizing Spatiotemporal Gestures and Movement Epenthesis in Sign | language | |
Recovering the linguistic components of the manual signs in American Sign | language | |
Recurrent Convolutional Neural Networks for Continuous Sign | language | Recognition by Staged Optimization |
Reducing | language | Biases in Visual Question Answering with Visually-grounded Question Encoder |
Redundancy removal for isolated gesture in Indian sign | language | and recognition using multi-class support vector machine |
Refined Knowledge Transfer for | language | -Based Person Search |
RegionCLIP: Region-based | language | -Image Pretraining |
Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision- | language | Models |
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision- | language | Navigation |
Reinforced Structured State-Evolution for Vision- | language | Navigation |
Relation Enhanced Vision | language | Pre-Training |
Relational Temporal Graph Reasoning for Dual-Task Dialogue | language | Understanding |
Relevant Features for Video-Based Continuous Sign | language | Recognition |
Remarks on some aspects of | language | structure and their relevance to pattern analysis |
remote sensing computer-assisted learning tool developed using the unified modeling | language | , A |
Reproducible Scaling Laws for Contrastive | language | -Image Learning |
Requirements for a Gesture Specification | language | : A Comparison of Two Representation Formalisms |
RES-StS: Referring Expression Speaker via Self-Training With Scorer for Goal-Oriented Vision- | language | Navigation |
research of a basic | language | of expert systems for pattern recognition-design and realization of LOG-BASIC programming language, The |
Research on Extension of SPARQL Ontology Query | language | Considering the Computation of Indoor Spatial Relations |
Research on the Metaphorical Features of Computer | language | in English from the Perspective of Cognition |
Research on using sign | language | in outdoor advertising |
Resolving vision and | language | ambiguities together: Joint segmentation & prepositional attachment resolution in captioned scenes |
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large | language | Models |
Retrospect to Multi-prompt Learning across Vision and | language | , A |
Reveal: Retrieval-Augmented Visual- | language | Pre-Training with Multi-Source Multimodal Knowledge Memory |
Review of Graphic | language | s (1972). |
Review of Machine Learning-Based Recognition of Sign | language | , A |
Review on sign | language | recognition methods for supporting communication between deaf and non-deaf persons |
Revisiting Image- | language | Networks for Open-Ended Phrase Detection |
Rewriting P Systems Generating Iso-picture | language | s |
RILS: Masked Visual Reconstruction in | language | Semantic Space |
RLIPv2: Fast Scaling of Relational | language | -Image Pre-training |
RMLVQA: A Margin Loss Approach For Visual Question Answering with | language | Biases |
Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision- | language | Navigation, The |
Robust Face Detection and Japanese Sign | language | Hand Posture Recognition for Human-Computer Interaction in an Intelligent Room |
Robust Image Processing | language | in the Context of Image Algebra, A |
Robust Person-Independent Visual Sign | language | Recognition |
Robust Real-time And Rotation-invariant American Sign | language | Alphabet Recognition Using Range Camera |
Robust Scene Text Detection for Multi-script | language | s Using Deep Learning |
Robust sign | language | recognition by combining manual and non-manual features based on conditional random field and support vector machine |
Robust Sign | language | Recognition with Hierarchical Conditional Random Fields |
Role of Iconic Gestures in Production and Comprehension of | language | : Evidence from Brain and Behavior, The |
Role of Synthetically Generated Samples on Speech Recognition in a Resource-Scarce | language | |
Role of the Input in Natural | language | Video Description, The |
S3C: Semi-Supervised VQA Natural | language | Explanation via Self-Critical Learning |
S3DRGF: Spatial 3-D Relational Geometric Features for 3-D Sign | language | Representation and Recognition |
Sassy: A | language | and Optimizing Compiler for Image Processing on Reconfigurable Computing Systems |
SBNet: Segmentation-based Network for Natural | language | -based Vehicle Search |
Scalable frame resolution for efficient continuous sign | language | recognition |
Scale-Invariant Visual | language | Modeling for Object Categorization |
Scaling Data Generation in Vision-and- | language | Navigation |
Scaling | language | -Image Pre-Training via Masking |
Scaling Up Sign Spotting Through Sign | language | Dictionaries |
Scaling Up Vision- | language | Pretraining for Image Captioning |
ScanRefer: 3D Object Localization in RGB-D Scans Using Natural | language | |
Scene Text Recognition using Higher Order | language | Priors |
Script and | language | Determination from Document Images |
Script and | language | Identification for Handwritten Document Images |
Script and | language | Identification from Document Images |
Script and | language | Identification in Noisy and Degraded Document Images |
sEditor: A Prototype for a Sign | language | Interfacing System |
Seeing Out of tHe bOx: End-to-End Pre-training for Vision- | language | Representation Learning |
Seeing What You Miss: Vision- | language | Pre-training with Semantic Completion Learning |
Segment-Based Classes for | language | Modeling Within the Field of CSR |
Segmentation from Natural | language | Expressions |
Segmentation of the Face and Hands in Sign | language | Video Sequences Using Color and Motion Cues |
Segmentation-robust representations, matching, and modeling for sign | language | |
Selecting ghosts and queues from a car trackers output using a spatio-temporal query | language | |
Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross- | language | Speech Emotion Recognition |
Self-Mutual Distillation Learning for Continuous Sign | language | Recognition |
Self-Supervised Learning for Semi-Supervised Temporal | language | Grounding |
Self-Training Vision | language | BERTs With a Unified Conditional Model |
semantic and | language | -based representation of an environmental scene, A |
Semantic Boundary Detection With Reinforcement Learning for Continuous Sign | language | Recognition |
Semantic Similarity/Relatedness for Cross | language | Plagiarism Detection |
Semantically-aware Spatio-temporal Reasoning Agent for Vision-and- | language | Navigation in Continuous Environments |
Semantics constrained dictionary learning for signer-independent sign | language | recognition |
SemAug: Semantically Meaningful Image Augmentations for Object Detection Through | language | Grounding |
Semi-lexical | language | s: a formal basis for using domain knowledge to resolve ambiguities in deep-learning based computer vision |
Sentence level text classification in the Kannada | language | : A classifier's perspective |
Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with | language | Guidance, A |
Seq2seq Vs Sketch Filling Structure for Natural | language | to SQL Translation |
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision- | language | Pre-training Models |
SG-Net: Syntax Guided Transformer for | language | Representation |
SgVA-CLIP: Semantic-Guided Visual Adapting of Vision- | language | Models for Few-Shot Image Classification |
Shape geodesics for robust sign | language | recognition |
Shapeglot: Learning | language | for Shape Differentiation |
ShapeTalk: A | language | Dataset and Framework for 3D Shape Edits and Deformations |
Shifted-Delta MLP Features for Spoken | language | Recognition |
Shopping behavior recognition using a | language | modeling analogy |
Shortcomings in Vision and | language | |
Shuffle on Trajectories over Finite Array | language | s |
Siamese Natural | language | Tracker: Tracking by Natural Language Descriptions with Siamese Trackers |
Sigmoid Loss for | language | Image Pre-Training |
Sign | language | analysis and recognition: A preliminary investigation |
Sign | language | by Cellphone |
Sign | language | detection using 3D visual cues |
Sign | language | Fingerspelling Classification from Depth and Color Images Using a Deep Belief Network |
Sign | language | Generation, Sign Language Synthesis |
Sign | language | Gesture Recognition Using HMM |
Sign | language | Production: A Review |
Sign | language | Recognition Based on 3D Convolutional Neural Networks |
Sign | language | recognition based on adaptive HMMs with data augmentation |
Sign | language | recognition based on global-local attention |
Sign | language | Recognition Based on Hand and Body Skeletal Data |
Sign | language | Recognition Based on R(2+1)D With Spatial-Temporal-Channel Attention |
Sign | language | Recognition Based on Trajectory Modeling with HMMs |
Sign | language | Recognition by Combining Statistical DTW and Independent Classification |
Sign | language | Recognition for Assisting the Deaf in Hospitals |
Sign | language | Recognition in Virtual Reality |
Sign | language | recognition using 3-D Hopfield neural network |
Sign | language | recognition using a combination of new vision based features |
Sign | language | Recognition Using Convolutional Neural Networks |
Sign | language | Recognition Using Hilbert Curve Features |
Sign | language | Recognition Using Model-Based Tracking and a 3D Hopfield Neural-Network |
Sign | language | Recognition using Sequential Pattern Trees |
Sign | language | recognition with long short-term memory |
Sign | language | Recognition: an Application of the Theory of Size Functions |
Sign | language | spotting based on semi-Markov Conditional Random Field |
Sign | language | Spotting with a Threshold Model Based on Conditional Random Fields |
Sign | language | Transformers: Joint End-to-End Sign Language Recognition and Translation |
Sign | language | Translation from Instructional Videos |
Sign | language | Translation with Hierarchical Spatio-Temporal Graph Neural Network |
Sign | language | Translation with Iterative Prototype |
Sign | language | Understanding |
Sign | language | Video Retrieval with Free-Form Textual Queries |
Sign | language | , Fingerspelling |
Sign | language | , General, Other Languages, Chinese, Arabic |
Sign Pose-based Transformer for Word-level Sign | language | Recognition |
Sign Segmentation Using Dynamics and Hand Configuration for Semi-automatic Annotation of Sign | language | Corpora |
Sign, Attend and Tell: Spatial Attention for Sign | language | Recognition |
SignBERT+: Hand-Model-Aware Self-Supervised Pre-Training for Sign | language | Understanding |
SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign | language | Recognition |
significance of facial features for automatic sign | language | recognition, The |
Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign | language | Production |
SignNet II: A Transformer-Based Two-Way Sign | language | Translation Model |
SignPose: Sign | language | Animation Through 3D Pose Lifting |
SignTutor: An Interactive System for Sign | language | Tutoring |
SILFA: Sign | language | Facial Action Database for the Development of Assistive Technologies for the Deaf |
Sim-2-Sim Transfer for Vision-and- | language | Navigation in Continuous Environments |
Similarity Assessment Model for Chinese Sign | language | Videos |
similarity between probabilistic tree | language | s: Application to XML document families, A |
Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision- | language | Model, A |
Simple But Powerful, a | language | -Supervised Method for Image Emotion Classification |
Simple Multi-Modality Transfer Learning Baseline for Sign | language | Translation, A |
Simulation of process of forming the | language | for description and analysis of the forms of images |
Simultaneously Training and Compressing Vision-and- | language | Pre-Training Model |
SINC: Self-Supervised In-Context Learning for Vision- | language | Tasks |
Single-Stream Multi-level Alignment for Vision- | language | Pretraining |
Skeleton Aware Multi-modal Sign | language | Recognition |
Sketch Grammars: a formalism for describing and recognizing diagrammatic sketch | language | s |
SLAN: Self-Locator Aided Network for Vision- | language | Understanding |
SLIP: Self-supervision Meets | language | -Image Pre-training |
Slovo: Russian Sign | language | Dataset |
SMAUG: Sparse Masked Autoencoder for Efficient Video- | language | Pre-training |
Smoothed Disparity Maps for Continuous American Sign | language | Recognition |
Smoothing and compression with stochastic k-testable tree | language | s |
Soft Expert Reward Learning for Vision-and- | language | Navigation |
Soft memberships for spectral clustering, with application to permeable | language | distinction |
Some Issues in Sign | language | Processing |
Some Notes on Finite-State Picture | language | s |
Some results on picture | language | s |
Some Thoughts on Picture | language | s |
Spatial | language | for human-robot dialogs |
Spatial-Temporal Enhanced Network for Continuous Sign | language | Recognition |
Spatial-Temporal Multi-Cue Network for Sign | language | Recognition and Translation |
Spatialised Semantic Relations in French Sign | language | : Toward a Computational Modelling |
Spatio-Temporal Feature-Extraction Techniques for Isolated Gesture Recognition in Arabic Sign | language | |
Spatio-Temporal Person Retrieval via Natural | language | Queries |
Speaker Dependent ASRs for Huastec and Western-Huastec Nahuatl | language | s |
Speaking Louder than Words with Pictures Across | language | s |
Speaking the Same | language | : Matching Machine to Human Captions by Adversarial Training |
Speech Content Retrieval Model Based on Integrated Neural Network for Natural | language | Description, A |
Spoken | language | clustering in the i-vectors space |
Spoken | language | Identification for Indian Languages Using Split and Merge EM Algorithm |
Spoken | language | identification in unseen channel conditions using modified within-sample similarity loss |
Spoken | language | Recognition: From Fundamentals to Practice |
SRN/HMM system for signer-independent continuous sign | language | recognition, A |
SST-VLM: Sparse Sampling-twice Inspired Video- | language | Model |
Stability of three-way concepts and its application to natural | language | generation |
StableNet: Distinguishing the hard samples to overcome | language | priors in visual question answering |
Starting Point Selection and Multiple-Standard Matching for Video Object Segmentation With | language | Annotation |
Statistical | language | Models for On-Line Handwriting Recognition |
Statistical | language | models for on-line handwritten sentence recognition |
Statistical Machine Translation as a | language | Model for Handwriting Recognition |
Statistical machine translation of subtitles for highly inflected | language | pair |
Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign | language | Recognition |
Stochastic | language | models for style-directed layout analysis of document images |
Stochastic | language | s for Picture Analysis |
Stochastic models for semantic parsing, multi-faceted topic discovery, and causal event inference: Perspectives from natural | language | processing |
Stochastic Syntax-Directed Translation Schemata for Correction of Errors in Context-Free | language | s |
Structure-Encoding Auxiliary Tasks for Improved Visual Representation in Vision-and- | language | Navigation |
Structured Multi-Level Interaction Network for Video Moment Localization via | language | Query |
Structured Scene Memory for Vision- | language | Navigation |
study on effects of implicit and explicit | language | model information for DBLSTM-CTC based handwriting recognition, A |
StylerDALLE: | language | -Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model |
Subspace models for document script and | language | identification |
SubUNets: End-to-End Hand Shape and Continuous Sign | language | Recognition |
Summarization of JBIG2 Compressed Indian | language | Textual Images |
survey on mouth modeling and analysis for Sign | language | recognition, A |
SuS-X: Training-Free Name-Only Transfer of Vision- | language | Models |
Symbol Knowledge Extraction from a Simple Graphical | language | |
Symmetric Network with Spatial Relationship Modeling for Natural | language | -based Vehicle Retrieval |
Syntactic Analysis of Context Free Plex | language | s for Pattern Recognition |
Syntactic Pattern Recognition: Stochastic | language | s |
Synthesis of incidental detail as composable components in a functional | language | |
Synthetic data generation technique in Signer-independent sign | language | recognition |
system for teaching sign | language | using live gesture feedback, A |
Systolic Pyramid Automata, Cellular Automata and Array | language | s |
T-recognition of T- | language | s, a new approach to describe and program the parallel pattern recognition capabilities of d-dimensional tessellation structures |
Tactical Rewind: Self-Correction via Backtracking in Vision-And- | language | Navigation |
Taguchi-TOPSIS based HOG parameter selection for complex background sign | language | recognition |
Take the Scenic Route: Improving Generalization in Vision-and- | language | Navigation |
Taking a HINT: Leveraging Explanations to Make Vision and | language | Models More Grounded |
Talk2Nav: Long-Range Vision-and- | language | Navigation with Dual Attention and Spatial Memory |
Talking with signs: A simple method to detect nouns and numbers in a non-annotated signs | language | corpus |
TALL: Temporal Activity Localization via | language | Query |
TANA: The amalgam neural architecture for sarcasm detection in Indian indigenous | language | combining LSTM and SVM with word-emoji embeddings |
Task Residual for Tuning Vision- | language | Models |
Task-Oriented Multi-Modal Mutual Learning for Vision- | language | Models |
Teaching Structured Vision and | language | Concepts to Vision and Language Models |
Technical Perspective: Visualization Search: From Sketching to Natural | language | |
Techniques for | language | identification for hybrid Arabic-English document images |
TechWare: Speaker and Spoken | language | Recognition Resources |
Telescopic Vector Composition and Polar Accumulated Motion Residuals for Feature Extraction in Arabic Sign | language | Recognition |
Temporal Accumulative Features for Sign | language | Recognition |
Temporal Action Detection Using a Statistical | language | Model |
Temporal Lift Pooling for Continuous Sign | language | Recognition |
Temporal Moment Localization via Natural | language | by Utilizing Video Question Answers as a Special Variant and Bypassing NLP for Corpora |
Temporally | language | Grounding With Multi-Modal Multi-Prompt Tuning |
Test of Time: Instilling Video- | language | Models with a Sense of Time |
Test Sample Selection for Handwriting Recognition Through | language | Modeling |
Text and Layout Information Extraction from Document Files of Various Formats Based on the Analysis of Page Description | language | |
Text- and speech-based phonotactic models for spoken | language | identification of Basque and Spanish |
Text2Shape: Generating Shapes from Natural | language | by Learning Joint Embeddings |
Text2Sign: Towards Sign | language | Production Using Neural Machine Translation and Generative Adversarial Networks |
Thai sign | language | translation using Scale Invariant Feature Transform and Hidden Markov Models |
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and- | language | Navigation |
This Is My Unicorn, Fluffy: Personalizing Frozen Vision- | language | Representations |
Three-Dimensional Shape and Motion Reconstruction for the Analysis of American Sign | language | |
Three-Dimensional Sign | language | Recognition With Angular Velocity Maps and Connived Feature ResNet |
Thumb Modelling for the Generation of Sign | language | |
Tibetan | language | Model That Considers the Relationship Between Suffixes and Functional Words, A |
tinySLAM: A SLAM algorithm in less than 200 lines C- | language | program |
Too Large; Data Reduction for Vision- | language | Pre-Training |
Toolbox of Image Processing Using the Python | language | |
Topic | language | Model Adaption for Recognition of Homologous Offline Handwritten Chinese Text Image |
Topological Planning with Transformers for Vision-and- | language | Navigation |
Topology and | language | of Relationships in the Visual Genome Dataset, The |
TOUCHDOWN: Natural | language | Navigation and Spatial Reasoning in Visual Street Environments |
Toward a Motor Theory of Sign | language | Perception |
Toward Generation of 3-Dimensional Models of Objects Using 2-Dimensional Figures and Explanations in | language | |
Toward Modeling Sign | language | Coarticulation |
Toward Unified Token Learning for Vision- | language | Tracking |
Towards a one-way American Sign | language | translator |
Towards a visual Sign | language | dataset for home care services |
Towards Accurate Visual and Natural | language | -Based Vehicle Retrieval Systems |
Towards An American Sign | language | Interface |
Towards an Automatic Annotation of French Sign | language | Videos: Detection of Lexical Signs |
Towards an Exhaustive Evaluation of Vision- | language | Foundation Models |
Towards Automatic Body | language | Annotation |
Towards Bridged Vision and | language | : Learning Cross-Modal Knowledge Representation for Relation Extraction |
Towards coherent natural | language | description of video streams |
Towards Design of a Natural Picture Description | language | |
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video- | language | Retrieval |
Towards Fraudulent URL Classification with Large | language | Model based on Deep Learning |
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision- | language | Architecture |
Towards | language | -Free Training for Text-to-Image Generation |
Towards | language | -Guided Visual Recognition via Dynamic Convolutions |
Towards Learning a Generic Agent for Vision-and- | language | Navigation via Pre-Training |
Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and- | language | Tasks |
Towards More Flexible and Accurate Object Tracking with Natural | language | : Algorithms and Benchmark |
Towards recognition of facial expressions in sign | language | : Tracking facial features under occlusion |
Towards sign | language | recognition based on body parts relations |
Towards subject independent continuous sign | language | recognition: A segment and merge approach |
Towards surveillance video search by natural | language | query |
Towards Unifying Medical Vision-and- | language | Pre-training via Soft Prompts |
Towards Vision- | language | Mechanistic Interpretability: A Causal Tracing Tool for BLIP |
Towards Zero-Shot Sign | language | Recognition |
Towers of Babel: Combining Images, | language | , and 3D Geometry for Learning Multimodal Vision |
Tracked-Vehicle Retrieval by Natural | language | Descriptions With Domain Adaptive Knowledge |
Tracked-Vehicle Retrieval by Natural | language | Descriptions with Multi-Contextual Adaptive Knowledge |
Tracking by Natural | language | Specification |
Tracking by Natural | language | Specification with Long Short-term Context Decoupling |
Tracking continuous emotional trends of participants during affective dyadic interactions using body | language | and speech information |
Tracking facial features under occlusions and recognizing facial expressions in sign | language | |
Tracking Using Dynamic Programming for Appearance-Based Sign | language | Recognition |
Training CNNs for 3-D Sign | language | Recognition With Color Texture Coded Joint Angular Displacement Maps |
Transfer Learning For Videos: From Action Recognition To Sign | language | Recognition |
Transfer learning helps to improve the accuracy to classify patients with different speech disorders in different | language | s |
Transfer Learning in Sign | language | |
Transferable Representation Learning in Vision-and- | language | Navigation |
Transferring Cross-Domain Knowledge for Video Sign | language | Recognition |
Transferring Vision- | language | Models for Visual Recognition: A Classifier Perspective |
Transform-Retrieve-Generate: Natural | language | -Centric Outside-Knowledge Visual Question Answering |
Transformer vision- | language | tracking via proxy token guided cross-modal fusion |
Transformer-based | language | models for mental health issues: A survey |
Transformer-Based | language | -Person Search With Multiple Region Slicing |
Transition movement models for large vocabulary continuous sign | language | recognition |
Translating Video Content to Natural | language | Descriptions |
Translating video into | language | by enhancing visual and language representations |
TransVG++: End-to-End Visual Grounding With | language | Conditioned Vision Transformer |
Two-Pass Clustering Technique for Orientation-Invariant and | language | -Independent Text Localization |
UC2: Universal Cross-lingual Cross-modal Vision-and- | language | Pre-training |
ULIP: Learning a Unified Representation of | language | , Images, and Point Clouds for 3D Understanding |
Understanding and Mitigating Overfitting in Prompt Tuning for Vision- | language | Models |
Understanding | language | Through Vision |
Understanding Motion in Sign | language | : A New Structured Translation Dataset |
Uni-NLX: Unifying Textual Explanations for Vision and Vision- | language | Tasks |
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision- | language | Tasks |
Unified Framework for | language | Guided Image Completion, An |
Unified Multi-modal Structure for Retrieving Tracked Vehicles through Natural | language | Descriptions, A |
Unified System for Segmentation and Tracking of Face and Hands in Sign | language | Recognition, A |
Unified Transformer with Isomorphic Branches for Natural | language | Tracking |
Unified Visual Relationship Detection with Vision and | language | Models |
Unified Visual-Semantic Embeddings: Bridging Vision and | language | With Structured Meaning Representations |
UniTAB: Unifying Text and Box Outputs for Grounded Vision- | language | Modeling |
Universal Multimodal Representation for | language | Understanding |
UniVTG: Towards Unified Video- | language | Temporal Grounding |
Unpaired Image Captioning by | language | Pivoting |
Unranked tree | language | s |
Unraveling a Decade: A Comprehensive Survey on Isolated Sign | language | Recognition |
Unreasonable Effectiveness of Large | language | -Vision Models for Source-free Video Domain Adaptation, The |
Unsupervised 3D Perception with 2D Vision- | language | Distillation for Autonomous Driving |
Unsupervised classification of extreme facial events using active appearance models tracking for sign | language | videos |
Unsupervised | language | Learning for Discovered Visual Concepts |
Unsupervised | language | model adaptation for handwritten Chinese text recognition |
Unsupervised Newspaper Segmentation Using | language | Context |
Unsupervised Vision-and- | language | Pretraining via Retrieval-based Multi-Granular Alignment |
Unsupervised Vision- | language | Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships |
Unsupervised Vision- | language | Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships |
Use of a priori descriptions in a high-level | language | and management of the uncertainty in a scene recognition system |
User oriented | language | model for face detection |
User-independent system for sign | language | finger spelling recognition |
Using ChatGPT and other AI-assisted tools to improve manuscripts readability and | language | |
Using | language | to Drive the Perceptual Grouping of Local Image Features |
Using | language | to Learn Structured Appearance Models for Image Annotation |
Using multiple sequence alignment and statistical | language | model to integrate multiple Chinese address recognition outputs |
Using Prolog to implement a compiler for a parallel image processing | language | |
Using Signing Space as a Representation for Sign | language | Processing |
Using String | language | s to Describe Picture Languages |
Using String | language | s to Describe Picture Languages |
Using word embeddings to generate data-driven human agent decision-making from natural | language | |
Utilizing Invariant Descriptors for Finger Spelling American Sign | language | Using SVM |
Utterance Generation With Variational Auto-Encoder for Slot Filling in Spoken | language | Understanding |
Uyghur | language | Text Detection in Complex Background Images Using Enhanced MSERs |
V&L Net Workshop on | language | for Vision |
V2A - Vision to Action: Learning Robotic Arm Actions Based on Vision and | language | |
V2S: Voice to Sign | language | Translation System for Malaysian Deaf People |
Variant Design in Immersive Virtual Reality: A Markup | language | for Scalable CSG Parts |
Variant of Pure Two-Dimensional Context-Free Grammars Generating Picture | language | s, A |
Variational Bayesian Sequence-to-sequence Networks for Memory-Efficient Sign | language | Translation |
VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and- | language | Research |
Verbs in Action: Improving verb understanding in video- | language | models |
Verification Method for Viewpoint Invariant Sign | language | Recognition, A |
Vid2Seq: Large-Scale Pretraining of a Visual | language | Model for Dense Video Captioning |
video coding system for sign | language | communication at low bit rates, A |
Video Event Understanding Using Natural | language | Descriptions |
Video Object Grounding Using Semantic Roles in | language | Description |
Video Object Segmentation with | language | Referring Expressions |
Video Question Answering Using | language | -Guided Deep Compressed-Domain Video Feature |
Video scene classification based on natural | language | description |
Video Skimming and Characterization through the Combination of Image and | language | Understanding Techniques |
Video-and- | language | (VidL) models and their cognitive relevance |
Video-based Continuous Sign | language | Recognition Using Statistical Methods |
Video-Text Compliance: Activity Verification Based on Natural | language | Instructions |
VideoBERT: A Joint Model for Video and | language | Representation Learning |
Viewpoint Invariant Sign | language | Recognition |
VILA: Learning Image Aesthetics from User Comments with Vision- | language | Pretraining |
ViLEM: Visual- | language | Error Modeling for Image-Text Retrieval |
ViLLA: Fine-Grained Vision- | language | Representation Learning from Real-World Data |
ViLTA: Enhancing Vision- | language | Pre-training through Textual Augmentation |
VindLU: A Recipe for Effective Video-and- | language | Pretraining |
VinVL: Revisiting Visual Representations in Vision- | language | Models |
Violin: A Large-Scale Dataset for Video-and- | language | Inference |
Vision + | language | Applications: A Survey |
Vision and Action in the | language | -Ready Brain: From Mirror Neurons to SemRep |
Vision and | language | Integration Meets Multimedia Fusion |
Vision of Vision and | language | Comprises Action: An Example From Road Traffic, A |
Vision-and- | language | Algorithmic Reasoning Workshop |
Vision-and- | language | Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments |
Vision-based approach for American Sign | language | recognition using Edge Orientation Histogram |
Vision-Based Navigation With | language | -Based Assistance via Imitation Learning With Indirect Intervention |
Vision-Based Taiwanese Sign | language | Recognition, A |
Vision- | language | integration using constrained local semantic features |
Vision- | language | Matching for Text-to-Image Synthesis via Generative Adversarial Networks |
Vision- | language | Models Performing Zero-Shot Tasks Exhibit Disparities Between Gender Groups |
Vision- | language | Models, Language-Vision Models, VQA |
Vision- | language | Models, Language-Vision Models, VQA |
Vision- | language | Navigation |
Vision- | language | Navigation Policy Learning and Adaptation |
Vision- | language | Navigation with Random Environmental Mixup |
Vision- | language | Navigation With Self-Supervised Auxiliary Reasoning Tasks |
Vision- | language | Pre-Training for Boosting Scene Text Detectors |
Vision- | language | Pre-Training with Triple Contrastive Learning |
Vision- | language | Pre-Training: Basics, Recent Advances, and Future Trends |
Vision- | language | Transformer and Query Generation for Referring Segmentation |
Vision-to- | language | Tasks Based on Attributes and Attention Mechanism |
Visual Alignment Constraint for Continuous Sign | language | Recognition |
Visual Genome: Connecting | language | and Vision Using Crowdsourced Dense Image Annotations |
Visual | language | Framework for Plant Modeling Using L-System |
Visual | language | Identification from Facial Landmarks |
Visual | language | Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images |
Visual Re-ranking with Natural | language | Understanding for Text Spotting |
Visual Recognition of American Sign | language | Using Hidden Markov Models |
Visual Relationship Detection with | language | Priors |
Visual Sign | language | Recognition |
Visual Sign | language | Recognition Based on HMMs and Auto-regressive HMMs |
Visual Translator: Linking Perceptions And Natural- | language | Descriptions |
Visual- | language | Prompt Tuning with Knowledge-Guided Context Optimization |
VisualGPT: Data-efficient Adaptation of Pretrained | language | Models for Image Captioning |
Visually-Prompted | language | Model for Fine-Grained Scene Graph Generation in an Open World |
ViTAA: Visual-textual Attributes Alignment in Person Search by Natural | language | |
VL-ADAPTER: Parameter-Efficient Transfer Learning for Vision-and- | language | Tasks |
VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision- | language | Transformers |
VL-Match: Enhancing Vision- | language | Pretraining with Token-Level and Instance-Level Matching |
VL-PET: Vision-and- | language | Parameter-Efficient Tuning via Granularity Control |
VLANet: Video- | language | Alignment Network for Weakly-supervised Video Moment Retrieval |
VLCAP: Vision- | language | with Contrastive Learning for Coherent Video Paragraph Captioning |
VLCDoC: Vision- | language | contrastive pre-training model for cross-Modal document classification |
VLG-Net: Video- | language | Graph Matching Network for Video Grounding |
VLGrammar: Grounded Grammar Induction of Vision and | language | |
VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and- | language | Navigation |
VLN_BERT: A Recurrent Vision-and- | language | BERT for Navigation |
VLPD: Context-Aware Pedestrian Detection via Vision- | language | Semantic Self-Supervision |
VLSlice: Interactive Vision-and- | language | Slice Discovery |
VLT: Vision- | language | Transformer and Query Generation for Referring Segmentation |
Voice Interaction for Augmented Reality Navigation Interfaces with Natural | language | Understanding |
Von Mises-Fisher Models in the Total Variability Subspace for | language | Recognition |
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural | language | Guidance |
Weakly Supervised Grounding for VQA in Vision- | language | Transformers |
Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign | language | Videos |
Weakly Supervised Metric Learning towards Signer Adaptation for Sign | language | Recognition |
Weakly supervised moment localization with natural | language | based on semantic reconstruction |
Weakly Supervised Temporal Adjacent Network for | language | Grounding |
Weakly Supervised Training of a Sign | language | Recognition System Using Multiple Instance Learning Density Matrices |
Wearable Computing Based American Sign | language | Recognizer, A |
Web Document Parsing: A New Approach to Modeling Layout- | language | Relations |
Web page summarization for handheld devices: a natural | language | approach |
WEDGE: A multi-weather autonomous driving dataset built from generative vision- | language | models |
Weighted Finite-State Transducer (WFST)-Based | language | Model for Online Indic Script Handwriting Recognition, A |
Weighted loss functions to make risk-based | language | identification fused decisions |
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal | language | Models |
What Value Do Explicit High Level Concepts Have in Vision to | language | Problems? |
Why Is Prompt Tuning for Vision- | language | Models Robust to Noisy Labels? |
Wide coverage natural | language | processing using kernel methods and neural networks for structured data |
Winoground: Probing Vision and | language | Models for Visio-Linguistic Compositionality |
Word Level Recognition, | language | Models |
Word Segments in Category-Based | language | Models for Automatic Speech Recognition |
Word-level Deep Sign | language | Recognition from Video: A New Large-scale Dataset and Methods Comparison |
Work on the Integration of | language | and Vision at the University of Torino |
Workshop and Challenges for New Frontiers in Visual | language | Reasoning: Compositionality, Prompts and Causality |
Written | language | Recognition Based on Texture Analysis |
X-DETR: A Versatile Architecture for Instance-wise Vision- | language | Tasks |
X-Pool: Cross-Modal | language | -Video Attention for Text-Video Retrieval |
X2-VLM: All-in-One Pre-Trained Model for Vision- | language | Tasks |
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision | language | Model |
YouRefIt: Embodied Reference Understanding with | language | and Gesture |
Zero-Shot Grounding of Objects From Natural | language | Queries |
Zero-Shot Human-Object Interaction (HOI) Classification by Bridging Generative and Contrastive Image- | language | Models |
Zero-shot Natural | language | Video Localization |
Zero-Shot Temporal Action Detection via Vision- | language | Prompting |
1663 for language