_ | modal | _ |
3-D motion estimation by integrating visual cues in 2-D multi- | modal | opti-acoustic stereo sequences |
3D convolutional neural networks by | modal | fusion |
3D Hand Pose Estimation with Disentangled Cross- | modal | Latent Space |
3D Medical Multi- | modal | Segmentation Network Guided by Multi-source Correlation Constraint |
3D+2D Face Localization Using Boosting in Multi- | modal | Feature Space |
3View deep canonical correlation analysis for cross- | modal | retrieval |
4D-Net for Learned Multi- | modal | Alignment |
AAFormer: A Multi- | modal | Transformer Network for Aerial Agricultural Images |
Accurate M-hausdorff distance similarity combining distance orientation for matching multi- | modal | sensor images |
Achieving High Multi- | modal | Registration Performance Using Simplified Hough-Transform with Improved Symmetric-SIFT |
ACMM: Aligned Cross- | modal | Memory for Few-Shot Image and Sentence Matching |
Actor-agnostic Multi-label Action Recognition with Multi- | modal | Query |
AdaMML: Adaptive Multi- | modal | Learning for Efficient Video Recognition |
Adapt Everywhere: Unsupervised Adaptation of Point-Clouds and Entropy Minimization for Multi- | modal | Cardiac Image Segmentation |
Adaptive automatic object recognition in single and multi- | modal | sensor data |
Adaptive Context-Aware Multi- | modal | Network for Depth Completion |
Adaptive Cross- | modal | Prototypes for Cross-Domain Visual-Language Retrieval |
Adaptive Cross- | modal | Transferable Adversarial Attacks From Images to Videos |
Adaptive decomposition method for multi- | modal | medical image fusion |
Adaptive habituation detection to build human computer interactive systems using a real-time cross- | modal | computation |
Adaptive Label-Aware Graph Convolutional Networks for Cross- | modal | Retrieval |
Adaptive Latent Diffusion Model for 3D Medical Image to Image Translation: Multi- | modal | Magnetic Resonance Imaging Study |
Adaptive Marginalized Semantic Hashing for Unpaired Cross- | modal | Retrieval |
Adaptive multi- | modal | stereo people tracking without background modelling |
Adaptive Semi-Supervised Feature Selection for Cross- | modal | Retrieval |
Adversarial Attack on Deep Cross- | modal | Hamming Retrieval |
Adversarial Graph Convolutional Network for Cross- | modal | Retrieval |
Adversarial Learning-Based Semantic Correlation Representation for Cross- | modal | Retrieval |
Adversarial Unsupervised Domain Adaptation for 3D Semantic Segmentation with Multi- | modal | Learning |
Adversarial-Metric Learning for Audio-Visual Cross- | modal | Matching |
AF: An Association-Based Fusion Method for Multi- | modal | Classification |
Aggregation-Based Graph Convolutional Hashing for Unsupervised Cross- | modal | Retrieval |
AGRFNet: Two-stage cross- | modal | and multi-level attention gated recurrent fusion network for RGB-D saliency detection |
AIDE: A Vision-Driven Multi-View, Multi- | modal | , Multi-Tasking Dataset for Assistive Driving Perception |
Air Pollution Prediction with Multi- | modal | Data and Deep Neural Networks |
Algorithms of Multi- | modal | Route Planning Based on the Concept of Switch Point |
Aligning Salient Objects to Queries: A Multi- | modal | and Multi-object Image Retrieval Framework |
Alternating Co-Quantization for Cross- | modal | Hashing |
AMC: Attention Guided Multi- | modal | Correlation Learning for Image Search |
AMFuse: Add-Multiply-Based Cross- | modal | Fusion Network for Multi-Spectral Semantic Segmentation |
AMM-FuseNet: Attention-Based Multi- | modal | Image Fusion Network for Land Cover Mapping |
aMM: Towards adaptive ranking of multi- | modal | documents |
AnANet: Association and Alignment Network for Modeling Implicit Relevance in Cross- | modal | Correlation Classification |
Animation from Blur: Multi- | modal | Blur Decomposition with Motion Guidance |
Answer-checking in Context: A Multi- | modal | Fully Attention Network for Visual Question Answering |
Anticipative Feature Fusion Transformer for Multi- | modal | Action Anticipation |
Appearance Label Balanced Triplet Loss for Multi- | modal | Aerial View Object Classification |
Application of Multi- | modal | Features for Terrain Classification on a Mobile System |
Application of Multi- | modal | Fusion Attention Mechanism in Semantic Segmentation |
Applying Segment-Level Attention on Bi- | modal | Transformer Encoder for Audio-Visual Emotion Recognition |
Applying stochastic second-order entropy images to multi- | modal | image registration |
ARL Multi- | modal | Sensor: A research tool for target signature collection, algorithm validation, and emplacement studies, The |
Ask&Confirm: Active Detail Enriching for Cross- | modal | Retrieval with Partial Query |
Assisting Multi | modal | Named Entity Recognition by cross-modal auxiliary tasks |
Associating Multi- | modal | Brain Imaging Phenotypes and Genetic Risk Factors via a Dirty Multi-Task Learning Method |
Asymmetric Correlation Quantization Hashing for Cross- | modal | Retrieval |
Asymmetric cross- | modal | hashing with high-level semantic similarity |
Asymmetric Scalable Cross- | modal | Hashing |
Asymmetric Supervised Consistent and Specific Hashing for Cross- | modal | Retrieval |
Asymmetry-aware bilinear pooling in multi- | modal | data for head pose estimation |
Attention-Aware Deep Adversarial Hashing for Cross- | modal | Retrieval |
Attention-based Multi- | modal | Emotion Recognition from Art |
Attentive Cross- | modal | Fusion Network for RGB-D Saliency Detection |
Attentive, Multi- | modal | Laser Eye, An |
Attribute-Guided Multiple Instance Hashing Network for Cross- | modal | Zero-Shot Hashing |
Attribute-Image Person Re-identification via | modal | -Consistent Metric Learning |
Audio Visual Speaker Verification Based on Hybrid Fusion of Cross | modal | Features |
Audio-Visual Emotion Recognition With Preference Learning Based on Intended and Multi- | modal | Perceived Labels |
Audio-visual flow: A variational approach to multi- | modal | flow estimation |
Audio-Visual Instance Discrimination with Cross- | modal | Agreement |
Audiovisual Generalised Zero-shot Learning with Cross- | modal | Attention and Language |
Augmented Adversarial Training for Cross- | modal | Retrieval |
Autoencoder-Based Collaborative Attention GAN for Multi- | modal | Image Synthesis |
Automatic and Accurate Conflation of Different Road-Network Vector Data towards Multi- | modal | Navigation |
Automatic bi- | modal | emotion recognition system based on fusion of facial expressions and emotion extraction from speech |
Automatic image annotation based on Gaussian mixture model considering cross- | modal | correlations |
Automatic Multi- | modal | Dialogue Scene Indexing |
Automatic Quantification of Tumour Hypoxia From Multi- | modal | Microscopy Images Using Weakly-Supervised Learning Methods |
Automatic semantic modeling of structured data sources with cross- | modal | retrieval |
Automatic Sleep System Recommendation by Multi- | modal | RBG-Depth-Pressure Anthropometric Analysis |
AVGZSLNet: Audio-Visual Generalized Zero-Shot Learning by Reconstructing Label Features from Multi- | modal | Embeddings |
BadCM: Invisible Backdoor Attack Against Cross- | modal | Learning |
BalaGAN: Cross- | modal | Image Translation Between Imbalanced Domains |
Bayesian Characterization of Uncertainty in Multi- | modal | Image Registration |
BDNet: A BERT-based dual-path network for text-to-image cross- | modal | person re-identification |
BEAT: A Large-Scale Semantic and Emotional Multi- | modal | Dataset for Conversational Gestures Synthesis |
Benchmark for Multi- | modal | LiDAR SLAM with Ground Truth in GNSS-Denied Environments, A |
BEV-DG: Cross- | modal | Learning under Bird's-Eye View for Domain Generalization of 3D Semantic Segmentation |
Beyond a Pre-Trained Object Detector: Cross- | modal | Textual and Visual Context for Image Captioning |
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross | modal | Attention |
Beyond the Deep Metric Learning: Enhance the Cross- | modal | Matching with Adversarial Discriminative Domain Regularization |
Bi-attention | modal | Separation Network for Multimodal Video Fusion |
Bi- | modal | authentication in mobile environments using session variability modelling |
Bi- | modal | biometric authentication on mobile phones in challenging conditions |
Bi- | modal | Compositional Network for Feature Disentanglement |
bi- | modal | face recognition framework integrating facial expression with facial appearance, A |
Bi- | modal | First Impressions Recognition Using Temporally Ordered Deep Audio and Stochastic Visual Features |
Bi- | modal | Handwritten Text Corpus: Baseline Results, A |
Bi- | modal | Handwritten Text Recognition (BiHTR) ICPR 2010 Contest Report |
Bi- | modal | Progressive Mask Attention for Fine-Grained Recognition |
Bi- | modal | regression for Apparent Personality trait Recognition |
BiCro: Noisy Correspondence Rectification for Multi- | modal | ity Data via Bi-directional Cross-modal Similarity Consistency |
Bidirectional Cross- | modal | Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models |
Biometrics and forensics integration using deep multi- | modal | semantic alignment and joint embedding |
Biometrics, Cross- | modal | , Multi-Modal Systems, Multibiometrics, Combined Face and Other Features, Fusion |
BIPS: Bi- | modal | Indoor Panorama Synthesis via Residual Depth-Aided Adversarial Learning |
Body Motion Analysis for Multi- | modal | Identity Verification |
Boosted Multi- | modal | Supervised Latent Dirichlet Allocation for Social Event Classification |
Boosting Entity-Aware Image Captioning With Multi- | modal | Knowledge Graph |
Boosting LiDAR-Based Semantic Labeling by Cross- | modal | Training Data Generation |
Boosting Multi- | modal | Model Performance with Adaptive Gradient Modulation |
Boundary-Aware RGBD Salient Object Detection With Cross- | modal | Feature Sampling |
Brain Tumor Cell Density Estimation from Multi- | modal | MR Images Based on a Synthetic Tumor Growth Model |
Brain tumor segmentation based on the dual-path network of multi- | modal | MRI images |
Branch-Fusion-Net for Multi- | modal | Continuous Dimensional Emotion Recognition |
Branding: Fusion of Meta Data and Musculoskeletal Radiographs for Multi- | modal | Diagnostic Recognition |
Bridging Music and Image via Cross- | modal | Ranking Analysis |
Bridging the Gap between Multi-focus and Multi- | modal | : A Focused Integration Framework for Multi-modal Image Fusion |
Building a multi- | modal | Arabic corpus (MMAC) |
Building a Multi- | modal | Thesaurus from Annotated Images |
C2ST: Cross- | modal | Contextualized Sequence Transduction for Continuous Sign Language Recognition |
C3Net: Cross- | modal | Feature Recalibrated, Cross-Scale Semantic Aggregated and Compact Network for Semantic Segmentation of Multi-Modal High-Resolution Aerial Images |
CAD: Contextual Multi- | modal | Alignment for Dynamic AVQA |
Cali-NCE: Boosting Cross- | modal | Video Representation Learning with Calibrated Alignment |
Calibrank: Effective Lidar-Camera Extrinsic Calibration By Multi- | modal | Learning To Rank |
CAMP: Cross- | modal | Adaptive Message Passing for Text-Image Retrieval |
Cascade Attention Guided Residue Learning GAN for Cross- | modal | Translation |
CASIA-SURF CeFA: A Benchmark for Multi- | modal | Cross-Ethnicity Face Anti-spoofing |
CATNet: Cross- | modal | fusion for audio-visual speech recognition |
CAVER: Cross- | modal | View-Mixed Transformer for Bi-Modal Salient Object Detection |
CCAFusion: Cross- | modal | Coordinate Attention Network for Infrared and Visible Image Fusion |
CCANet: A Collaborative Cross- | modal | Attention Network for RGB-D Crowd Counting |
CCL: Cross- | modal | Correlation Learning With Multigrained Fusion by Hierarchical Network |
CEFusion: Multi- | modal | medical image fusion via cross encoder |
ChallenCap: Monocular 3D Capture of Challenging Human Performances using Multi- | modal | References |
Characteristic Views Extraction | modal | Based-on Deep Reinforcement Learning for 3D Model Retrieval |
CHOP: An orthogonal hashing method for zero-shot cross- | modal | retrieval |
Class Concentration with Twin Variational Autoencoders for Unsupervised Cross- | modal | Hashing |
Class consistent multi- | modal | fusion with binary features |
Class-Agnostic Object Detection with Multi- | modal | Transformer |
classification of multi- | modal | data with hidden conditional random field, The |
Clip retrieval using multi- | modal | biometrics in meeting archives |
Closing the Domain Gap for Cross- | modal | Visible-Infrared Vehicle Re-identification |
Clothing generation by multi- | modal | embedding: A compatibility matrix-regularized GAN model |
Cluster-wise unsupervised hashing for cross- | modal | similarity search |
Clustering adaptive canonical correlations for high-dimensional multi- | modal | data |
CMC2R: Cross- | modal | collaborative contextual representation for RGBT tracking |
CMD: Self-supervised 3D Action Representation Learning with Cross- | modal | Mutual Distillation |
CMGFNet: A deep cross- | modal | gated fusion network for building extraction from very high-resolution remote sensing images |
CMGNet: Collaborative multi- | modal | graph network for video captioning |
CMIR-NET: A deep learning based model for cross- | modal | retrieval in remote sensing |
CMLocate: A cross- | modal | automatic visual geo-localization framework for a natural environment without GNSS information |
CMX: Cross- | modal | Fusion for RGB-X Semantic Segmentation With Transformers |
CNN Based Yeast Cell Segmentation in Multi- | modal | Fluorescent Microscopy Data |
Co-inference for Multi- | modal | Scene Analysis |
Coarse-to-Fine Semantic Alignment for Cross- | modal | Moment Localization |
Coercive region-level registration for multi- | modal | images |
Collaborative Diffusion for Multi- | modal | Face Generation and Editing |
Collaborative Multi- | modal | Fusion Method Based on Random Variational Information Bottleneck for Gesture Recognition, A |
Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi- | modal | Highlight Detection in Movies |
Collaborative Quantization for Cross- | modal | Similarity Search |
Collaborative Uncertainty Benefits Multi-Agent Multi- | modal | Trajectory Forecasting |
Collecting Cross- | modal | Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception |
Collection of Visual Data in Climbing Experiments for Addressing the Role of Multi- | modal | Exploration in Motor Learning Efficiency |
Collective Affinity Learning for Partial Cross- | modal | Hashing |
Collective Reconstructive Embeddings for Cross- | modal | Hashing |
Color Texture Segmentation Based on the | modal | Energy of Deformable Surfaces |
Combined Spiral Transformation and Model-Driven Multi- | modal | Deep Learning Scheme for Automatic Prediction of TP53 Mutation in Pancreatic Cancer |
Combining Knowledge and Multi- | modal | Fusion for Meme Classification |
Combining Temporal and Multi- | modal | Approaches to Better Measure Accessibility to Banking Services |
Comparing Decision Fusion Paradigms Using k-NN Based Classifiers, Decision Trees and Logistic Regression in a Multi- | modal | Identity Verification Application |
Complementarity-aware cross- | modal | feature fusion network for RGB-T semantic segmentation |
Comprehensive Report on Machine Learning-Based Early Detection of Alzheimer's Disease Using Multi- | modal | Neuroimaging Data, A |
Computational Geometry-Based Scale-Space and | modal | Image Decomposition Application to Light Video-Microscopy Imaging |
Computing Range Flow from Multi- | modal | Kinect Data |
Conditional Sentence Generation and Cross- | modal | Reranking for Sign Language Translation |
Confidence-based dynamic cross- | modal | memory network for image aesthetic assessment |
Connected Vibrations: A | modal | Analysis Approach to Non-Rigid Motion Tracking |
Connecting Touch and Vision via Cross- | modal | Prediction |
Consistent multi- | modal | non-rigid registration based on a variational approach |
Constrained Bipartite Graph Learning for Imbalanced Multi- | modal | Retrieval |
Content-Based Music-Image Retrieval Using Self- and Cross- | modal | Feature Embedding Memory |
Contextual and Cross- | modal | Interaction for Multi-Modal Speech Emotion Recognition |
Continual Learning for Cross- | modal | Image-Text Retrieval Based on Domain-Selective Attention |
Continual learning in cross- | modal | retrieval |
Continuous cross- | modal | hashing |
Continuous Prediction of Lower-Limb Kinematics From Multi- | modal | Biomedical Signals |
Continuum regression for cross- | modal | multimedia retrieval |
Contra: (con)text (tra)nsformer for Cross- | modal | Video Retrieval |
Controlled Multi- | modal | Image Generation for Plant Growth Modeling |
COOKIE: Contrastive Cross- | modal | Knowledge Sharing Pre-training for Vision-Language Representation |
Correlation-Aware Attention Branch Network Using Multi- | modal | Data for Deterioration Level Estimation of Infrastructures |
Correspondence matching with | modal | clusters |
CORSMAL Challenge: Multi- | modal | Fusion and Learning for Robotics |
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross- | modal | Retrieval |
CPG3D: Cross- | modal | Priors Guided 3D Object Reconstruction |
Craquelurenet: Matching the Crack Structure In Historical Paintings for Multi- | modal | Image Registration |
Creating Something From Nothing: Unsupervised Knowledge Distillation for Cross- | modal | Hashing |
CroMM-VSR: Cross- | modal | Memory Augmented Visual Speech Recognition |
CroMo: Cross- | modal | Learning for Monocular Depth Estimation |
CropAndWeed Dataset: a Multi- | modal | Learning Approach for Efficient Crop and Weed Manipulation, The |
Cross and Learn: Cross- | modal | Self-supervision |
Cross Domain Multi- | modal | Dataset for Robust Face Anti-spoofing, A |
Cross | modal | Compression With Variable Rate Prompt |
Cross | modal | Disambiguation |
Cross | modal | Distillation for Supervision Transfer |
Cross | modal | Focal Loss for RGBD Face Anti-Spoofing |
Cross | modal | metric learning with multi-level semantic relevance |
Cross | modal | Multiscale Fusion Net for Real-time RGB-D Detection |
Cross | modal | Retrieval with Querybank Normalisation |
Cross | modal | similarity learning with active queries |
Cross | modal | Transformer: Towards Fast and Robust 3D Object Detection |
Cross | modal | Video Representations for Weakly Supervised Active Speaker Localization |
Cross | modal | ity Knowledge Distillation for Multi-modal Aerial View Object Classification |
Cross-Domain Image Captioning via Cross- | modal | Retrieval and Model Adaptation |
Cross-Fertilization between Studies on ICT Practices of Use and Cross- | modal | Analysis of Verbal and Nonverbal Communication |
Cross-Level Multi- | modal | Features Learning With Transformer for RGB-D Object Recognition |
Cross-Lingual Text Image Recognition via Multi-Hierarchy Cross- | modal | Mimic |
Cross- | modal | 360° Depth Completion and Reconstruction for Large-Scale Indoor Environment |
Cross- | modal | 3D Shape Generation and Manipulation |
Cross- | modal | Adaptive Dual Association for Text-to-Image Person Retrieval |
Cross- | modal | Adversarial Reprogramming |
Cross- | modal | alignment and translation for missing modality action recognition |
Cross- | modal | Analysis of Speech, Gestures, Gaze and Facial Expressions |
Cross- | modal | and Hierarchical Modeling of Video and Text |
Cross- | modal | Approach for Extracting Semantic Relationships Between Concepts Using Tagged Images, A |
Cross- | modal | Approach to Cleansing Weakly Tagged Images, A |
Cross- | modal | Attention Model for Fine-Grained Incident Retrieval from Dashcam Videos, A |
Cross- | modal | Attentional Context Learning for RGB-D Object Detection |
Cross- | modal | Background Suppression for Audio-Visual Event Localization |
Cross- | modal | categorisation of user-generated video sequences |
Cross- | modal | Causal Relational Reasoning for Event-Level Visual Question Answering |
Cross- | modal | Center Loss for 3D Cross-Modal Retrieval |
Cross- | modal | Clinical Graph Transformer for Ophthalmic Report Generation |
Cross- | modal | co-feedback cellular automata for RGB-T saliency detection |
Cross- | modal | Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting |
Cross- | modal | context-gated convolution for multi-modal sentiment analysis |
Cross- | modal | Contrastive Distillation for Instructional Activity Anticipation |
Cross- | modal | Contrastive Learning for Text-to-Image Generation |
Cross- | modal | Contrastive Learning Network for Few-Shot Action Recognition |
Cross- | modal | Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval |
Cross- | modal | Correlation Learning by Adaptive Hierarchical Semantic Aggregation |
Cross- | modal | correlation learning with deep convolutional architecture |
Cross- | modal | Cross-Domain Dual Alignment Network for RGB-Infrared Person Re-Identification |
Cross- | modal | Cross-Domain Moment Alignment Network for Person Search |
cross- | modal | crowd counting method combining CNN and cross-modal transformer, A |
Cross- | modal | Data Augmentation for Tasks of Different Modalities |
Cross- | modal | de-deviation for enhancing few-shot classification |
Cross- | modal | Deep Face Normals With Deactivable Skip Connections |
Cross- | modal | Deep Learning Applications: Audio-visual Retrieval |
Cross- | modal | Deep Networks For Document Image Classification |
Cross- | modal | Deep Variational Hand Pose Estimation |
Cross- | modal | Deep Variational Hashing |
Cross- | modal | Dense Passage Retrieval for Outside Knowledge Visual Question Answering |
Cross- | modal | Discrete Hashing |
Cross- | modal | discriminant adversarial network |
Cross- | modal | distillation for RGB-depth person re-identification |
Cross- | modal | domain adaptation for text-based regularization of image semantics in image retrieval systems |
Cross- | modal | dynamic convolution for multi-modal emotion recognition |
Cross- | modal | Dynamic Networks for Video Moment Retrieval With Text Query |
Cross- | modal | Embeddings for Video and Audio Retrieval |
Cross- | modal | Enhancement Network for Multimodal Sentiment Analysis |
Cross- | modal | Face Matching: Beyond Viewed Sketches |
Cross- | modal | face matching: Tackling visual abstraction using fine-grained attributes |
Cross- | modal | Facial Attribute Recognition with Geometric Features |
Cross- | modal | Fashion Search |
Cross- | modal | feature extraction and integration based RGBD saliency detection |
Cross- | modal | Feature Representation Learning and Label Graph Mining in a Residual Multi-Attentional CNN-LSTM Network for Multi-Label Aerial Scene Classification |
Cross- | modal | Food Retrieval: Learning a Joint Embedding of Food Images and Recipes With Semantic Consistency and Attention Mechanism |
Cross- | modal | fusion encoder via graph neural network for referring image segmentation |
Cross- | modal | Generation and Pair Correlation Alignment Hashing |
Cross- | modal | Graph With Meta Concepts for Video Captioning |
Cross- | modal | guidance based auto-encoder for multi-video summarization |
Cross- | modal | Hamming Hashing |
Cross- | modal | Hashing Based on Category Structure Preserving |
Cross- | modal | Hashing via Rank-Order Preserving |
Cross- | modal | Hierarchical Interaction Network for RGB-D Salient Object Detection |
Cross- | modal | Human-Robot Interaction |
Cross- | modal | Image Synthesis within Dual-Energy X-ray Security Imagery |
Cross- | modal | Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval |
Cross- | modal | Indexing, Cross-Modal Retrieval |
Cross- | modal | Knowledge Adaptation for Language-Based Person Search |
Cross- | modal | Knowledge Distillation for Action Recognition |
Cross- | modal | knowledge learning with scene text for fine-grained image classification |
Cross- | modal | knowledge reasoning for knowledge-based visual question answering |
Cross- | modal | Knowledge Transfer Without Task-Relevant Source Data |
Cross- | modal | Latent Space Alignment for Image to Avatar Translation |
Cross- | modal | Learning for Domain Adaptation in 3D Semantic Segmentation |
Cross- | modal | Learning in Real World |
Cross- | modal | Learning to Rank via Latent Joint Representation |
Cross- | modal | Learning with 3D Deformable Attention for Action Recognition |
Cross- | modal | Manifold Cutmix for Self-supervised Video Representation Learning |
Cross- | modal | Map Learning for Vision and Language Navigation |
Cross- | modal | Matching CNN for Autonomous Driving Sensor Data Monitoring |
Cross- | modal | motion regeneration using Multimodal Deep Belief Network |
Cross- | modal | Music Retrieval and Applications: An Overview of Key Methodologies |
Cross- | modal | Orthogonal High-Rank Augmentation for RGB-Event Transformer-trackers |
Cross- | modal | Pattern-Propagation for RGB-T Tracking |
Cross- | modal | Perceptionist: Can Face Geometry be Gleaned from Voices? |
Cross- | modal | Person Search: A Coarse-to-Fine Framework using Bi-Directional Text-Image Matching |
Cross- | modal | Progressive Comprehension for Referring Segmentation |
Cross- | modal | propagation network for generalized zero-shot learning |
Cross- | modal | Prototype Driven Network for Radiology Report Generation |
Cross- | modal | prototype learning for zero-shot handwritten character recognition |
Cross- | modal | Pyramid Translation for RGB-D Scene Recognition |
Cross- | modal | Ranking with Soft Consistency and Noisy Labels for Robust RGB-T Tracking |
Cross- | modal | Recipe Retrieval: How to Cook this Dish? |
Cross- | modal | Recurrent Semantic Comprehension for Referring Image Segmentation |
Cross- | modal | Relational Reasoning Network for Visual Question Answering |
Cross- | modal | Relationship Inference for Grounding Referring Expressions |
Cross- | modal | Retrieval and Semantic Refinement for Remote Sensing Image Captioning |
Cross- | modal | retrieval in challenging scenarios using attributes |
Cross- | modal | Retrieval Using Contrastive Learning of Visual-Semantic Embeddings |
Cross- | modal | Retrieval Using Multiordered Discriminative Structured Subspace Learning |
Cross- | modal | Retrieval via Deep and Bidirectional Representation Learning |
Cross- | modal | retrieval via label category supervised matrix factorization hashing |
Cross- | modal | Retrieval With CNN Visual Features: A New Baseline |
Cross- | modal | Retrieval With Noisy Correspondence via Consistency Refining and Mining |
Cross- | modal | Retrieval With Noisy Labels |
Cross- | modal | Retrieval With Partially Mismatched Pairs |
Cross- | modal | Scalable Hyperbolic Hierarchical Clustering |
Cross- | modal | Scene Graph Matching for Relationship-aware Image-Text Retrieval |
Cross- | modal | Scene Networks |
Cross- | modal | Self-Attention Network for Referring Image Segmentation |
Cross- | modal | Self-Taught Learning for Image Retrieval |
Cross- | modal | semantic correlation learning by Bi-CNN network |
Cross- | modal | Semantic Enhanced Interaction for Image-Sentence Retrieval |
Cross- | modal | Semantic Matching Generative Adversarial Networks for Text-to-Image Synthesis |
Cross- | modal | social image clustering and tag cleansing |
Cross- | modal | Speaker Verification and Recognition: A Multilingual Perspective |
Cross- | modal | Style Transfer |
Cross- | modal | Subspace Learning via Pairwise Constraints |
Cross- | modal | Supervision for Learning Active Speaker Detection in Video |
Cross- | modal | Target Retrieval for Tracking by Natural Language |
Cross- | modal | Text Steganography Against Synonym Substitution-Based Text Attack |
Cross- | modal | topic correlations for multimedia retrieval |
Cross- | modal | Transferable Adversarial Attacks from Images to Videos |
Cross- | modal | Transformer for RGB-D semantic segmentation of production workshop objects |
Cross- | modal | Transformers for Infrared and Visible Image Fusion |
Cross- | modal | Translation and Alignment for Survival Analysis |
Cross- | modal | Transmission Strategy |
Cross- | modal | Variational Alignment of Latent Spaces |
Cross- | modal | Variational Auto-Encoder for Content-Based Micro-Video Background Music Recommendation |
Cross- | modal | Variational Framework For Food Image Analysis, A |
Cross- | modal | Visual Question Answering for Remote Sensing Data: the International Conference on Digital Image Computing: Techniques and Applications (DICTA 2021) |
Cross- | modal | Weighting Network for RGB-D Salient Object Detection |
Cross-Platform Multi- | modal | Topic Modeling for Personalized Inter-Platform Recommendation |
Cross-specificity: modelling data semantics for cross- | modal | matching and retrieval |
Cross-Year Multi- | modal | Image Retrieval Using Siamese Networks |
CrossCLR: Cross- | modal | Contrastive Learning For Multi-modal Video Representations |
CrossFormer: Cross-guided attention for multi- | modal | object detection |
CrossFuser: Multi- | modal | Feature Fusion for End-to-End Autonomous Driving Under Unseen Weather Conditions |
CrossLocate: Cross- | modal | Large-scale Visual Geo-Localization in Natural Environments using Rendered Modalities |
CrossMatch: Source-Free Domain Adaptive Semantic Segmentation via Cross- | modal | Consistency Training |
CrossPoint: Self-Supervised Cross- | modal | Contrastive Learning for 3D Point Cloud Understanding |
CU-Net+: Deep Fully Interpretable Network for Multi- | modal | Image Restoration |
CueVideo: a system for cross- | modal | search and browse of video databases |
Cycle-Consistent Deep Generative Hashing for Cross- | modal | Retrieval |
DAGNet: Depth-aware Glass-like objects segmentation via cross- | modal | attention |
Dance Style Transfer with Cross- | modal | Transformer |
DASC: Dense adaptive self-correlation descriptor for multi- | modal | and multi-spectral correspondence |
DASC: Robust Dense Descriptor for Multi- | modal | and Multi-Spectral Correspondence Estimation |
Dataset and Benchmark for Large-Scale Multi- | modal | Face Anti-Spoofing, A |
DCMAI: A Dynamical Cross- | modal | Alignment Interaction Framework for Document Key Information Extraction |
Decomposed Cross- | modal | Distillation for RGB-based Temporal Action Detection |
Decouple Before Interact: Multi- | modal | Prompt Learning for Continual Visual Question Answering |
Decoupled Cross- | modal | Phrase-Attention Network for Image-Sentence Matching |
Deep Adaptively-Enhanced Hashing With Discriminative Similarity Guidance for Unsupervised Cross- | modal | Retrieval |
Deep Binary Reconstruction for Cross- | modal | Hashing |
Deep cascaded cross- | modal | correlation learning for fine-grained sketch-based image retrieval |
Deep Centralized Cross- | modal | Retrieval |
Deep continual hashing with gradient-aware memory for cross- | modal | retrieval |
Deep Convolutional Neural Network for Multi- | modal | Image Restoration and Fusion |
Deep Coupled ISTA Network for Multi- | modal | Image Super-Resolution |
Deep Coupled Metric Learning for Cross- | modal | Matching |
Deep Cross- | modal | Hashing |
Deep Cross- | modal | Hashing Based on Semantic Consistent Ranking |
Deep Cross- | modal | Image-Voice Retrieval in Remote Sensing |
Deep Cross- | modal | Projection Learning for Image-Text Matching |
Deep Cross- | modal | Representation Learning and Distillation for Illumination-Invariant Pedestrian Detection |
Deep Cross- | modal | Retrieval Between Spatial Image and Acoustic Speech |
Deep Cross- | modal | Steganography Using Neural Representations |
Deep Discrete Cross- | modal | Hashing for Cross-Media Retrieval |
Deep Dual- | modal | Traffic Objects Instance Segmentation Method Using Camera and LIDAR Data for Autonomous Driving |
Deep Heterogeneous Face Recognition Networks Based on Cross- | modal | Distillation and an Equitable Distance Metric |
Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross- | modal | Retrieval |
Deep Learning Based Multi- | modal | Fusion Architectures for Maritime Vessel Detection |
Deep learning for multi- | modal | classification of cloud, shadow and land cover scenes in PlanetScope and Sentinel-2 imagery |
Deep Memory Network for Cross- | modal | Retrieval |
Deep Multi- | modal | CNN for Multi-Instance Multi-Label Image Classification, A |
Deep Multi- | modal | Discriminative and Interpretability Network for Alzheimer's Disease Diagnosis |
Deep Multi- | modal | Explanation Model for Zero-Shot Learning, A |
Deep Multi- | modal | Network Based Automated Depression Severity Estimation |
Deep Multi- | modal | Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges |
Deep Multi- | modal | Representation Schemes for Federated 3d Human Action Recognition |
Deep Multi- | modal | Vehicle Detection in Aerial ISR Imagery |
Deep multi-scale and multi- | modal | fusion for 3D object detection |
Deep Multigraph Hierarchical Enhanced Semantic Representation for Cross- | modal | Retrieval |
Deep multi | modal | learning for cross-modal retrieval: One model for all tasks |
Deep Multiscale Fusion Hashing for Cross- | modal | Retrieval |
Deep Neighborhood-Preserving Hashing With Quadratic Spherical Mutual Information for Cross- | modal | Retrieval |
Deep Normalized Cross- | modal | Hashing with Bi-Direction Relation Reasoning |
Deep Perceptual Mapping for Cross- | modal | Face Recognition |
Deep Ranking Distribution Preserving Hashing for Robust Multi-Label Cross- | modal | Retrieval |
Deep Relation Embedding for Cross- | modal | Retrieval |
Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi- | modal | Fusion |
Deep robust multilevel semantic hashing for multi-label cross- | modal | retrieval |
Deep Self-correlation Descriptor for Dense Cross- | modal | Correspondence |
Deep self-enhancement hashing for robust multi-label cross- | modal | retrieval |
Deep Semantic-Aware Proxy Hashing for Multi-Label Cross- | modal | Retrieval |
Deep Supervised Cross- | modal | Retrieval |
Deep Supervised Dual Cycle Adversarial Network for Cross- | modal | Retrieval |
DeepForest: Novel Deep Learning Models for Land Use and Land Cover Classification Using Multi-Temporal and - | modal | Sentinel Data of the Amazon Basin |
DeepFusion: Lidar-Camera Deep Fusion for Multi- | modal | 3D Object Detection |
deeply supervised residual network for HEp-2 cell classification via cross- | modal | transfer learning, A |
DeepM^2CDL: Deep Multi-Scale Multi- | modal | Convolutional Dictionary Learning Network |
Deformable Feature Aggregation for Dynamic Multi- | modal | 3D Object Detection |
Deformable Registration of Multi- | modal | Microscopic Images Using a Pyramidal Interactive Registration-Learning Methodology |
Delivering Arbitrary- | modal | Semantic Segmentation |
Dense 2D-3D Indoor Prediction with Sound via Aligned Cross- | modal | Distillation |
Dense Cross- | modal | Correspondence Estimation With the Deep Self-Correlation Descriptor |
Depth Estimation of Multi- | modal | Scene Based on Multi-Scale Modulation |
Describe Images in a Boring Way: Towards Cross- | modal | Sarcasm Generation |
Describing Unseen Videos via Multi- | modal | Cooperative Dialog Agents |
Designing a holographic | modal | wavefront sensor for the detection of static ocular aberrations |
Detecting and Grounding Multi- | modal | Media Manipulation |
Development and Evaluation of a Mouse Emulator Using Multi- | modal | Real-time Head Tracking Systems with Facial Gesture Recognition as a Switching Mechanism |
DG3D: Generating High Quality 3D Textured Shapes by Learning to Discriminate Multi- | modal | Diffusion-Renderings |
DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross- | modal | Semantic Alignment |
DiffDis: Empowering Generative Diffusion Model with Cross- | modal | Discrimination Capability |
Dilated high-resolution network driven RGB-T multi- | modal | crowd counting |
DIME: An Online Tool for the Visual Comparison of Cross- | modal | Retrieval Models |
Discrete Joint Semantic Alignment Hashing for Cross- | modal | Image-Text Search |
Discrete Latent Factor Model for Cross- | modal | Hashing |
Discrete | modal | Decomposition: a new approach for the reflectance modeling and rendering of real surfaces |
discrete | modal | transform and its application to lossy image compression, The |
Discrete online cross- | modal | hashing |
discrete search method for multi- | modal | non-rigid image registration, A |
Discrete Semantic Matrix Factorization Hashing for Cross- | modal | Retrieval |
Discriminate Cross- | modal | Quantization for Efficient Retrieval |
Discriminative correlation hashing for supervised cross- | modal | retrieval |
Discriminative Cross- | modal | Transfer Learning and Densely Cross-Level Feedback Fusion for RGB-D Salient Object Detection |
Discriminative Dictionary Learning With Common Label Alignment for Cross- | modal | Retrieval |
Discriminative Hallucination for Multi- | modal | Few-Shot Learning |
Discriminative Multi- | modal | Feature Fusion for RGBD Indoor Scene Recognition |
Discriminative semantic transitive consistency for cross- | modal | learning |
Discriminative Supervised Hashing for Cross- | modal | Similarity Search |
Discriminative Vectorial Framework for Multi- | modal | Feature Representation, A |
Disentangled Cross- | modal | Transformer for RGB-D Salient Object Detection and Beyond |
Disentangled Representation Learning for Cross- | modal | Biometric Matching |
Disperse Asymmetric Subspace Relation Hashing for Cross- | modal | Retrieval |
Dispersive Delay and Comb Filters Using a | modal | Structure |
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross- | modal | Knowledge Distillation |
Distributed Training and Inference of Deep Learning Models for Multi- | modal | Land Cover Classification |
Distribution-Consistent | modal | Recovering for Incomplete Multimodal Learning |
Do Cross | modal | Systems Leverage Semantic Relationships? |
Domain Invariant Subspace Learning for Cross- | modal | Retrieval |
Double Augmentation: A | modal | Transforming Method for Ship Detection in Remote Sensing Imagery |
Double branch synergies with | modal | reinforcement for weakly supervised temporal action detection |
Drive Act: A Multi- | modal | Dataset for Fine-Grained Driver Behavior Recognition in Autonomous Vehicles |
Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross- | modal | Distillation |
DriverMHG: A Multi- | modal | Dataset for Dynamic Recognition of Driver Micro Hand Gestures and a Real-Time Recognition Framework |
Dual Path Multi- | modal | High-Order Features for Textual Content based Visual Question Answering |
Dual- and triple-stream RESUNET/UNET architectures for multi- | modal | liver segmentation |
dual- | modal | graph attention interaction network for person Re-identification, A |
Dual-supervised attention network for deep cross- | modal | hashing |
Dual-View Curricular Optimal Transport for Cross-Lingual Cross- | modal | Retrieval |
DurLAR: A High-Fidelity 128-Channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi- | modal | Autonomous Driving Applications |
Dyadformer: A Multi- | modal | Transformer for Long-Range Modeling of Dyadic Interactions |
Dynamic bi- | modal | fusion of images for the segmentation of pollen tubes in video |
Dynamic facial expression recognition with pseudo-label guided multi- | modal | pre-training |
Dynamic Pedobarography Transitional Objects by Lagrange's Equation with FEM, | modal | Matching and Optimization Techniques |
Dynamically Shifting Multi | modal | Representations via Hybrid-Modal Attention for Multimodal Sentiment Analysis |
Early Crop Classification via Multi- | modal | Satellite Data Fusion and Temporal Attention |
Edge-Based Multi- | modal | Registration and Application for Night Vision Devices |
Edge-Preserving Cross-Sharpening of Multi- | modal | Images |
Editorial paper for pattern recognition letters VSI on multi-view representation learning and multi- | modal | information representation |
EEG-Based Multi- | modal | Emotion Database with Both Posed and Authentic Facial Actions for Emotion Analysis, An |
Effective and efficient indexing in cross- | modal | hashing-based datasets |
Effective Fusion of Multi- | modal | Remote Sensing Data in a Fully Convolutional Network for Semantic Labeling |
Effectiveness Guided Cross- | modal | Information Sharing for Aligned RGB-T Object Detection |
Efficient and fast multi- | modal | foreground-background segmentation using RGBD data |
Efficient CNN Architecture for Multi- | modal | Aerial View Object Classification |
Efficient disentangled representation learning for multi- | modal | finger biometrics |
Efficient Graph-Based Spatio-Temporal Indexing Method for Task-Oriented Multi- | modal | Scene Data Organization, An |
Efficient multi- | modal | fusion on supergraph for scalable image annotation |
Efficient multi- | modal | geometric mean metric learning |
Efficient multi- | modal | image registration using local-frequency maps |
Efficient Parameter-Free Adaptive Multi- | modal | Hashing |
EGGNOG: A Continuous, Multi- | modal | Data Set of Naturally Occurring Gestures with Ground Truth Labels |
Egocentric Live 4D Perception (Ego4D) Dataset: A large-scale first-person video dataset, supporting research in multi- | modal | machine perception for daily life activity |
EgoCom: A Multi-Person Multi- | modal | Egocentric Communications Dataset |
EI-CLIP: Entity-aware Interventional Contrastive Learning for E-commerce Cross- | modal | Retrieval |
EigenTrajectory: Low-Rank Descriptors for Multi- | modal | Trajectory Forecasting |
Emphasizing Complementary Samples for Non-literal Cross- | modal | Retrieval |
End-to-End Adversarial-Attention Network for Multi- | modal | Clustering |
End-to-end Multi- | modal | Multi-Task Vehicle Control for Self-Driving Cars with Visual Perceptions |
Enhanced Multi | modal | Representation Learning with Cross-modal KD |
Enhanced speaker recognition based on intra- | modal | fusion and accent modeling |
Enhancing Alzheimer's Disease Diagnosis via Hierarchical 3D-FCN with Multi- | modal | Features |
Enhancing decision combination of face and fingerprint by exploitation of individual classifier space: An approach to multi- | modal | biometry |
Enhancing Multi- | modal | Features Using Local Self-attention for 3D Object Detection |
Enhancing pulmonary nodule detection via cross- | modal | alignment |
Enriched Music Representations With Multiple Cross- | modal | Contrastive Learning |
Ensemble Prior of Image Structure for Cross- | modal | Inference, An |
Entity-Graph Enhanced Cross- | modal | Pretraining for Instance-Level Product Retrieval |
Entity-Oriented Multi- | modal | Alignment and Fusion Network for Fake News Detection |
EPNet++: Cascade Bi-Directional Fusion for Multi- | modal | 3D Object Detection |
EV-Action: Electromyography-Vision Multi- | modal | Action Dataset |
Evaluation of 3D Feature Descriptors for Multi- | modal | Data Registration |
Evaluation of a multi- | modal | driver coaching function for electric vehicles |
evaluation of bi- | modal | facial appearance+facial expression face biometrics, An |
EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross- | modal | Knowledge Distillation |
Event-Based Fusion for Motion Deblurring with Cross- | modal | Attention |
Event-based Video Frame Interpolation with Cross- | modal | Asymmetric Bidirectional Motion Fields |
Everything at Once - Multi- | modal | Fusion Transformer for Video Retrieval |
Example-based cross- | modal | denoising |
Excitation-Reception Collinear Probe for Ultrasonic, Photoacoustic, and Thermoacoustic Tri- | modal | Volumetric Imaging, An |
Exemplar-based depth inpainting with arbitrary-shape patches and cross- | modal | matching |
EXIF as Language: Learning Cross- | modal | Associations between Images and Camera Metadata |
Experimental Comparison of Continuous-Wave and Frequency-Domain Fluorescence Tomography in a Commercial Multi- | modal | Scanner |
Experimental Results on Multi- | modal | Deepfake Detection |
Experts Collaboration Learning for Continual Multi- | modal | Reasoning |
Explicit Cross- | modal | Representation Learning for Visual Commonsense Reasoning |
Exploiting enhanced and robust RGB-D face representation via progressive multi- | modal | learning |
Exploiting the Manhattan-world assumption for extrinsic self-calibration of multi- | modal | sensor networks |
Exploring Self-Supervised Learning for Multi- | modal | Remote Sensing Pre-Training via Asymmetric Attention Fusion |
Exploring the Benefits of Cross- | modal | Coding |
Exposing and Mitigating Spurious Correlations for Cross- | modal | Retrieval |
Extracting a background image by a multi- | modal | scene background model |
Extrinsic calibration of multi- | modal | sensor arrangements with non-overlapping field-of-view |
Face Biometrics for Personal Identification: Multi-Sensory Multi- | modal | Systems |
Face Detection using Multi- | modal | Information |
Face Expression Recognition by Cross | modal | Data Association |
Face Recognition Using Multi- | modal | Low-Rank Dictionary Learning |
Face-iris multi- | modal | biometric system using multi-resolution Log-Gabor filter with spectral regression kernel discriminant analysis |
Facial Expression Recognition Based on Facial Region Segmentation and | modal | Value Approach |
Facial Expression Recognition Based on Multi- | modal | Features for Videos in the Wild |
Facial expression recognition using human machine interaction and multi- | modal | visualization analysis for healthcare applications |
Fail-Safe Multi- | modal | Localization Framework Using Heterogeneous Map-Matching Sources |
FAM3L: Feature-Aware Multi- | modal | Metric Learning for Integrative Survival Analysis of Human Cancers |
Farewell to Mutual Information: Variational Distillation for Cross- | modal | Person Re-Identification |
Fast and Robust Multi- | modal | Image Registration for 3D Knee Kinematics |
Fast Discrete Cross- | modal | Hashing Based on Label Relaxation and Matrix Factorization |
Fast Discrete Matrix Factorization Hashing for Large-scale Cross- | modal | Retrieval |
Fast Free-Vibration | modal | Analysis of 2-D Physics-Based Deformable Objects |
fast | modal | space transform for robust nonrigid shape retrieval, A |
Fast Multi- | modal | Approach to Facial Feature Detection, A |
Fast Unmediated Hashing for Cross- | modal | Retrieval |
Feature Integration via Back-Projection Ordering Multi- | modal | Gaussian Process Latent Variable Model for Rating Prediction |
Feature Neighbourhood Mutual Information for multi- | modal | image registration: An application to eye fundus imaging |
FELGA: Unsupervised Fragment Embedding for Fine-Grained Cross- | modal | Association |
Few-Shot Activity Recognition with Cross- | modal | Memory Network |
Few-Shot Image and Sentence Matching via Aligned Cross- | modal | Memory |
Few-Shot Learning with Visual Distribution Calibration and Cross- | modal | Distribution Alignment |
Fine-Grained Alignment for Cross- | modal | Recipe Retrieval |
Fine-grained bidirectional attentional generation and knowledge-assisted networks for cross- | modal | retrieval |
Fine-grained Image-Text Matching by Cross- | modal | Hard Aligning Network |
FLeak-Seg: Automated Fundus Fluorescein Leakage Segmentation via Cross- | modal | Attention Learning |
Flexible Multi- | modal | Graph-Based Segmentation |
Flexible Multi-Temporal and Multi- | modal | Framework for Sentinel-1 and Sentinel-2 Analysis Ready Data, A |
Flexible- | modal | Face Anti-Spoofing: A Benchmark |
Flood Detection Using Multi- | modal | and Multi-Temporal Images: A Comparative Study |
Forward Diffusion Guided Reconstruction as a Multi- | modal | Multi-Task Learning Scheme |
Frame Aggregation and Multi- | modal | Fusion Framework for Video-Based Person Recognition |
Frame-Wise Cross- | modal | Matching for Video Moment Retrieval |
Framework for the Comparative Analysis of Multi- | modal | Travel Demand: Case Study on Brisbane Network, A |
Framework for Unsupervised Segmentation of Multi- | modal | Medical Images, A |
Free-Form Multi- | modal | Multimedia Retrieval (4MR) |
Frequency-Aware Multi- | modal | Fine-Tuning for Few-Shot Open-Set Remote Sensing Scene Classification |
Frequency-Relevant Residual Learning for Multi- | modal | Image Denoising |
From Sparse to Dense: Semantic Graph Evolutionary Hashing for Unsupervised Cross- | modal | Retrieval |
Fs-DSM: Few-Shot Diagram-Sentence Matching via Cross- | modal | Attention Graph Model |
Fudan University: hierarchical video retrieval with adaptive multi- | modal | fusion |
FuseSeg: LiDAR Point Cloud Segmentation Fusing Multi- | modal | Data |
Fusing Multi- | modal | Data for Supervised Change Detection |
Fusion Encoder with Multi-Task Guidance for Cross- | modal | Text-Image Retrieval in Remote Sensing, A |
Fusion of Audio and Video Information for Multi | modal | Person Authentication |
Fusion of Auxiliary Information for Multi- | modal | Biometrics Authentication |
Fusion of Infrared and Range Data: Multi- | modal | Face Images |
Fusion of multi- | modal | lumbar spine images using Kekre's hybrid wavelet transform |
Fusion, General Multi- | modal | |
Fusion-Based Approach to Enhancing Multi- | modal | Biometric Recognition System Failure Prediction and Overall Performance, A |
GAFNet: A Global Fourier Self Attention Based Novel Network for multi- | modal | downstream tasks |
GAN-Tree: An Incrementally Learned Hierarchical Generative Framework for Multi- | modal | Data Distributions |
Gated contextual transformer network for multi- | modal | retinal image clinical description generation |
Gauge Field Model of | modal | Completion, A |
Gaussian Distributed Graph Constrained Multi- | modal | Gaussian Process Latent Variable Model for Ordinal Labeled Data |
Gaussian Kernel-Based Cross | modal | Network for Spatio-Temporal Video Grounding |
GCN-Based Multi- | modal | Multi-Label Attribute Classification in Anime Illustration Using Domain-Specific Semantic Features |
Generalised Zero-shot Learning with Multi- | modal | Embedding Spaces |
Generalized Coupled Dictionary Learning Approach With Applications to Cross- | modal | Matching |
Generalized Multi-View Embedding for Visual Recognition and Cross- | modal | Retrieval |
Generalized Semantic Preserving Hashing for Cross- | modal | Retrieval |
Generalized Semantic Preserving Hashing for N-Label Cross- | modal | Retrieval |
Generalized Semi-supervised and Structured Subspace Learning for Cross- | modal | Retrieval |
Generalized time warping for multi- | modal | alignment of human motion |
Generalized Zero-Shot Cross- | modal | Retrieval |
Generalized Zero-Shot Learning Via Multi- | modal | Aggregated Posterior Aligning Neural Network |
Generic Attention-model Explainability for Interpreting Bi- | modal | and Encoder-Decoder Transformers |
Generic wavelet-based image decomposition and reconstruction framework for multi- | modal | data analysis in smart camera applications |
Geographic mapping with unsupervised multi- | modal | representation learning from VHR images and POIs |
Geometry Sensitive Cross- | modal | Reasoning for Composed Query Based Image Retrieval |
Geospatial Computer Vision Based on Multi- | modal | Data: How Valuable Is Shape Information for the Extraction of Semantic Information? |
Gesture Desk an Integrated Multi- | modal | Gestural Workplace for Sonification |
Global and local multi- | modal | feature mutual learning for retinal vessel segmentation |
Global-Shared Text Representation Based Multi-Stage Fusion Transformer Network for Multi- | modal | Dense Video Captioning |
GlueGen: Plug and Play Multi- | modal | Encoders for X-to-Image Generation |
Good Vibrations: A | modal | Analysis Approach for Sequential Non-rigid Structure from Motion |
Gradient Descent Approach for Multi- | modal | Biometric Identification, A |
Gradient Intensity-Based Registration of Multi- | modal | Images of the Brain |
GraDual: Graph-based Dual- | modal | Representation for Image-Text Matching |
Graph Embedding Contrastive Multi- | modal | Representation Learning for Clustering |
Graph Pattern Loss Based Diversified Attention Network For Cross- | modal | Retrieval |
GraphAlign++: An Accurate Feature Alignment by Graph Matching for Multi- | modal | 3D Object Detection |
GraphAlign: Enhancing Accurate Feature Alignment by Graph matching for Multi- | modal | 3D Object Detection |
GraphCFC: A Directed Graph Based Cross- | modal | Feature Complementation Approach for Multimodal Conversational Emotion Recognition |
GrowBit: Incremental Hashing for Cross- | modal | Retrieval |
GuessWhat?! Visual Object Discovery through Multi- | modal | Dialogue |
Hadamard Coded Discrete Cross | modal | Hashing |
Hallucinating Faces: Global Linear | modal | Based Super-Resolution and Position Based Residue Compensation |
Haptic Signal Reconstruction for Cross- | modal | Communications |
Hard-Negatives or Non-Negatives? A Hard-Negative Selection Strategy for Cross- | modal | Retrieval Using the Improved Marginal Ranking Loss |
Hashing Cross- | modal | Manifold for Scalable Sketch-Based 3D Model Retrieval |
HATF: Multi- | modal | Feature Learning for Infrared and Visible Image Fusion via Hybrid Attention Transformer |
HCFN: Hierarchical cross- | modal | shared feature network for visible-infrared person re-identification |
Heri-Graphs: A Dataset Creation Framework for Multi- | modal | Machine Learning on Graphs of Heritage Values and Attributes with Social Media |
Hero: A Multi- | modal | Approach on Mobile Devices for Visual-aware Conversational Assistance in Industrial Domains |
Hetero-Manifold Regularisation for Cross- | modal | Hashing |
Heterogeneous Community Question Answering via Social-Aware Multi- | modal | Co-Attention Convolutional Matching |
Heterogeneous Feature Alignment and Fusion in Cross- | modal | Augmented Space for Composed Image Retrieval |
Heterogeneous Feature Selection With Multi- | modal | Deep Neural Networks and Sparse Group LASSO |
Heterogeneous image feature integration via multi- | modal | spectral clustering |
Heterogeneous Image Features Integration via Multi- | modal | Semi-supervised Learning Model |
Heterogeneous Interactive Learning Network for Unsupervised Cross- | modal | Retrieval |
Hi-Net: Hybrid-Fusion Network for Multi- | modal | MR Image Synthesis |
Hidden Gems: 4D Radar Scene Flow Learning Using Cross- | modal | Supervision |
Hidden Markov model-based multi- | modal | image fusion with efficient training |
Hierarchical Consensus Hashing for Cross- | modal | Retrieval |
Hierarchical Cross- | modal | Talking Face Generation With Dynamic Pixel-Wise Loss |
hierarchical framework for | modal | correspondence matching, A |
Hierarchical Latent Structure for Multi- | modal | Vehicle Trajectory Forecasting |
Hierarchical Multi- | modal | Attention Network for Time-Sync Comment Video Recommendation |
hierarchical multi- | modal | cross-attention model for face anti-spoofing, A |
Hierarchical Multi- | modal | Image Registration by Learning Common Feature Representations |
Hierarchical Semantic Structure Preserving Hashing for Cross- | modal | Retrieval |
High-Dimensional Sparse Hashing Framework for Cross- | modal | Retrieval, A |
High-Fidelity Generalized Emotional Talking Face Generation with Multi- | modal | Emotion Space Learning |
High-Resolution Depth Maps Imaging via Attention-Based Hierarchical Multi- | modal | Fusion |
High-Resolution Fast-Rotating Sound Localization Based on | modal | Composition Beamforming and Bayesian Inversion |
Histogram of the orientation of the weighted phase descriptor for multi- | modal | remote sensing image matching |
HM-ViT: Hetero- | modal | Vehicle-to-Vehicle Cooperative Perception with Vision Transformer |
Holistic Multi- | modal | Memory Network for Movie Question Answering |
Homogeneous Multi- | modal | Feature Fusion and Interaction for 3D Object Detection |
How much do cross- | modal | related semantics benefit image captioning by weighting attributes and re-ranking sentences? |
How to Cache Important Contents for Multi- | modal | Service in Dynamic Networks: A DRL-Based Caching Scheme |
How to Read Paintings: Semantic Art Understanding with Multi- | modal | Retrieval |
Human Shape from Silhouettes Using Generative HKS Descriptors and Cross- | modal | Neural Networks |
HuMMan: Multi- | modal | 4D Human Dataset for Versatile Sensing and Modeling |
Hybrid 3D feature description and matching for multi- | modal | data registration |
Hybrid Approach for Detecting Prerequisite Relations in Multi- | modal | Food Recipes, A |
Hybrid Contrastive Learning of Tri- | modal | Representation for Multimodal Sentiment Analysis |
hybrid framework for event detection using multi- | modal | features, A |
Hybrid MPI-MRI System for Dual- | modal | In Situ Cardiovascular Assessments of Real-Time 3D Blood Flow Quantification: A Pre-Clinical In Vivo Feasibility Investigation |
Hybrid Multi- | modal | Fusion for Human Action Recognition |
hybrid multi- | modal | visual data cross fusion network for indoor and outdoor scene segmentation, A |
HyperDense-Net: A Hyper-Densely Connected CNN for Multi- | modal | Image Segmentation |
Hypergraph-Based Multi- | modal | Representation for Open-Set 3D Object Retrieval |
Hyperspectral Image Denoising via Framelet Transformation Based Three- | modal | Tensor Nuclear Norm |
I spy with my little eye: Learning optimal filters for cross- | modal | stereo under projected patterns |
ICCL: Self-Supervised Intra- and Cross- | modal | Contrastive Learning with 2D-3D Pairs for 3D Scene Understanding |
iCmSC: Incomplete Cross- | modal | Subspace Clustering |
ICPR 2022 Challenge on Multi- | modal | Subtitle Recognition |
Identity-Guided Face Generation with Multi- | modal | Contour Conditions |
Im2Text and Text2Im: Associating Images and Texts for Cross- | modal | Retrieval |
Image description through fusion based recurrent multi- | modal | learning |
Image understanding via learning weakly-supervised cross- | modal | semantic translation |
Image-Text Retrieval With Cross- | modal | Semantic Importance Consistency |
Image-to-Video Person Re-Identification Using Three-Dimensional Semantic Appearance Alignment and Cross- | modal | Interactive Learning |
Image-to-video person re-identification with cross- | modal | embeddings |
Image2Reverb: Cross- | modal | Reverb Impulse Response Synthesis |
Imageability-Based Multi- | modal | Analysis of Urban Environments for Architects and Artists |
Imbalance knowledge-driven multi- | modal | network for land-cover semantic segmentation using aerial images and LiDAR point clouds |
IMCN: Identifying | modal | Contribution Network for Multimodal Sentiment Analysis |
Immersive System with Multi- | modal | Human-Computer Interaction, An |
Impedance-Optical Dual- | modal | Cell Culture Imaging With Learning-Based Information Fusion |
Implicit Attention-Based Cross- | modal | Collaborative Learning for Action Recognition |
Improved Point Proximity Matrix for | modal | Matching, An |
Improved Symmetric-SIFT for Multi- | modal | Image Registration |
Improvement of | modal | Matching Image Objects in Dynamic Pedobarography using Optimization Techniques |
Improving Cross- | modal | Constraints: Text Attribute Person Search With Graph Attention Networks |
Improving Cross- | modal | Image-Text Retrieval With Teacher-Student Learning |
Improving Cross- | modal | Retrieval with Set of Diverse Embeddings |
Improving Referring Expression Grounding With Cross- | modal | Attention-Guided Erasing |
Improving Single- | modal | Neuroimaging Based Diagnosis of Brain Disorders via Boosted Privileged Information Learning Framework |
Improving Supervised Cross- | modal | Retrieval with Semantic Graph Embedding |
Improving Text-image Matching with Adversarial Learning and Circle Loss for Multi- | modal | Steganography |
Improving the Kinect by Cross- | modal | Stereo |
Improving the Performance of Brain-Computer Interface Using Multi- | modal | Neuroimaging |
Improving visual-semantic embeddings by learning semantically-enhanced hard negatives for cross- | modal | information retrieval |
Improving Zero-shot Generalization and Robustness of Multi- | modal | Models |
IMRAM: Iterative Matching With Recurrent Attention Memory for Cross- | modal | Image-Text Retrieval |
Incremental approach for multi- | modal | face expression recognition system using deep neural networks |
Indescribable Multi- | modal | Spatial Evaluator |
Inelastic Deformation Invariant | modal | Representation for Non-rigid 3D Object Recognition |
Influence of Colour and Feature Geometry on Multi- | modal | 3D Point Clouds Data Registration |
Information Symmetry Matters: A | modal | -Alternating Propagation Network for Few-Shot Learning |
InfraParis: A multi- | modal | and multi-task autonomous driving dataset |
Infrared and Visible Cross- | modal | Image Retrieval Through Shared Features |
Infrared-visible cross- | modal | person re-identification via dual-attention collaborative learning |
Integrated Multi- | modal | Antenna With Coupled Radiating Structures (I-MARS) for 7T pTx Body MRI |
integrated multi- | modal | sensor network for video surveillance, An |
Integrating Adversarial Generative Network with Variational Autoencoders towards Cross- | modal | Alignment for Zero-Shot Remote Sensing Image Scene Classification |
Integrating Deep and Shallow Models for Multi- | modal | Depression Analysis: Hybrid Architectures |
Integrating information theory and adversarial learning for cross- | modal | retrieval |
Integrating Multi-Label Contrastive Learning With Dual Adversarial Graph Neural Networks for Cross- | modal | Retrieval |
Integrating multi- | modal | content analysis and hyperbolic visualization for large-scale news video retrieval and exploration |
Integration and Analysis of Multi- | modal | Geospatial Secondary Data to Inform Management of at-Risk Archaeological Sites |
Intelligent Terminal Based Privacy-Preserving Multi- | modal | Implicit Authentication Protocol for Internet of Connected Vehicles, An |
Inter- and Intra- | modal | Deformable Registration: Continuous Deformations Meet Efficient Optimal Linear Programming |
Inter-Intra | modal | Representation Augmentation With DCT-Transformer Adversarial Network for Image-Text Matching |
Inter- | modal | Masked Autoencoder for Self-Supervised Learning on Point Clouds |
Inter- | modal | ity Fusion Based Attention for Zero-Shot Cross-Modal Retrieval |
Interact before Align: Leveraging Cross- | modal | Knowledge for Domain Adaptive Action Recognition |
Interactive Video Search Platform for Multi- | modal | Retrieval with Advanced Concepts, An |
Interclass-Relativity-Adaptive Metric Learning for Cross- | modal | Matching and Beyond |
Interest Level Estimation via Multi- | modal | Gaussian Process Latent Variable Factorization |
Interference Mitigation Method for Millimeter-Wave Frequency-Modulation Continuous-Wave Radar Based on Outlier Detection and Variational | modal | Decomposition |
Interpretable Multi- | modal | Stacking-Based Ensemble Learning Method for Real Estate Appraisal |
Intra- | modal | Constraint Loss for Image-Text Retrieval |
Intraoperative Glioma Grading Using Neural Architecture Search and Multi- | modal | Imaging |
Inversion of Leaf Area Index in Citrus Trees Based on Multi- | modal | Data Fusion from UAV Platform |
iterative adaptive multi- | modal | stereo-vision method using mutual information, An |
Jacobi circle and annular polynomials: | modal | wavefront reconstruction from wavefront gradient |
Joint and individual matrix factorization hashing for large-scale cross- | modal | retrieval |
Joint audio-visual bi- | modal | codewords for video event detection |
Joint Cross- | modal | and Unimodal Features for RGB-D Salient Object Detection |
Joint Feature Selection and Graph Regularization for | modal | ity-Dependent Cross-Modal Retrieval |
Joint Feature Selection and Subspace Learning for Cross- | modal | Retrieval |
Joint Feature Synthesis and Embedding: Adversarial Cross- | modal | Retrieval Revisited |
Joint graph regularized dictionary learning and sparse ranking for multi- | modal | multi-shot person re-identification |
Joint Multi- | modal | Longitudinal Regression and Classification for Alzheimer's Disease Prediction |
Joint optimisation convex-negative matrix factorisation for multi- | modal | image collection summarisation based on images and tags |
Joint Representation Learning and Novel Category Discovery on Single- and Multi- | modal | Data |
Joint Semantic Preserving Sparse Hashing for Cross- | modal | Retrieval |
Joint Specifics and Consistency Hash Learning for Large-Scale Cross- | modal | Retrieval |
Joint Vessel Segmentation and Deformable Registration on Multi- | modal | Retinal Images Based on Style Transfer |
Joint- | modal | Label Denoising for Weakly-Supervised Audio-Visual Video Parsing |
JSSR: A Joint Synthesis, Segmentation, and Registration System for 3D Multi- | modal | Image Alignment of Large-scale Pathological CT Scans |
Juggling with representations: On the information transfer between imagery, point clouds, and meshes for multi- | modal | semantics |
Kernel Cross- | modal | Factor Analysis for Information Fusion With Application to Bimodal Emotion Recognition |
Kernelized Fuzzy | modal | Variation for Local Change Detection From Video Scenes |
Knowledge As Priors: Cross- | modal | Knowledge Generalization for Datasets Without Superior Knowledge |
Knowledge-Based Topic Model for Multi- | modal | Social Event Analysis |
Label consistent matrix factorization based hashing for cross- | modal | retrieval |
Label Consistent Matrix Factorization Hashing for Large-Scale Cross- | modal | Similarity Search |
Label Guided Discrete Hashing for Cross- | modal | Retrieval |
Label Prediction Framework For Semi-Supervised Cross- | modal | Retrieval |
Landmark Classification With Hierarchical Multi- | modal | Exemplar Feature |
Language-Guided Global Image Editing via Cross- | modal | Cyclic Mechanism |
Language-guided Multi- | modal | Fusion for Video Action Recognition |
Language-Guided Navigation via Cross- | modal | Grounding and Alternate Adversarial Learning |
LaPred: Lane-Aware Prediction of Multi- | modal | Future Trajectories of Dynamic Agents |
Large Margin Multi- | modal | Multi-Task Feature Extraction for Image Classification |
Large Margin Multi- | modal | Triplet Metric Learning |
Large-Margin Multi- | modal | Deep Learning for RGB-D Object Recognition |
Large-scale Few-shot Learning via Multi- | modal | Knowledge Discovery |
Large-Scale Outdoor Multi- | modal | Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction, A |
Large-Scale Urban Multiple- | modal | Transport Evacuation Model for Mass Gathering Events Considering Pedestrian and Public Transit System |
Laser-based multi-user multi- | modal | 3D displays |
LAT: Local area transform for cross | modal | correspondence matching |
Latent Space Semantic Supervision Based on Knowledge Distillation for Cross- | modal | Retrieval |
LCNME: Label Correction Using Network Prediction Based on Memorization Effects for Cross- | modal | Retrieval With Noisy Labels |
Leaky Gated Cross-Attention for Weakly Supervised Multi- | modal | Temporal Action Localization |
Learnable Depth-Sensitive Attention for Deep RGB-D Saliency Detection with Multi- | modal | Fusion Architecture Search |
Learnable PINs: Cross- | modal | Embeddings for Person Identity |
Learning a cross- | modal | hashing network for multimedia search |
Learning Aligned Cross- | modal | Representations from Weakly Aligned Data |
Learning an Augmented RGB Representation with Cross- | modal | Knowledge Distillation for Action Detection |
Learning by Aligning: Visible-Infrared Person Re-identification using Cross- | modal | Correspondences |
Learning Confidence Measures by Multi- | modal | Convolutional Neural Networks |
Learning Consistent Feature Representation for Cross- | modal | Multimedia Retrieval |
Learning Coupled Feature Spaces for Cross- | modal | Matching |
Learning Cross- | modal | Affinity for Referring Video Object Segmentation Targeting Limited Samples |
Learning Cross- | modal | Contrastive Features for Video Domain Adaptation |
Learning cross- | modal | correlations by exploring inter-word semantics and stacked co-attention |
Learning Cross- | modal | Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch |
Learning Cross- | modal | Deep Representations for Robust Pedestrian Detection |
Learning Cross- | modal | Embeddings for Cooking Recipes and Food Images |
Learning Cross- | modal | Embeddings With Adversarial Networks for Cooking Recipes and Food Images |
Learning Cross- | modal | Representations for Language-Based Image Manipulation |
Learning Cross- | modal | Retrieval with Noisy Labels |
Learning Cross- | modal | ity Representations From Multi-Modal Images |
Learning Deep Cross- | modal | Embedding Networks for Zero-Shot Remote Sensing Image Scene Classification |
Learning Discriminative Binary Codes for Large-scale Cross- | modal | Retrieval |
Learning Discriminative Motion Feature for Enhancing Multi- | modal | Action Recognition |
Learning From Noisy Correspondence With Tri-Partition for Cross- | modal | Matching |
Learning from the Master: Distilling Cross- | modal | Advanced Knowledge for Lip Reading |
Learning Hierarchical Cross- | modal | Association for Co-Speech Gesture Generation |
Learning Instance-Level Representation for Large-Scale Multi- | modal | Pretraining in E-Commerce |
Learning | modal | -Invariant and Temporal-Memory for Video-based Visible-Infrared Person Re-Identification |
Learning | modal | -Invariant Angular Metric by Cyclic Projection Network for VIS-NIR Person Re-Identification |
Learning Multi- | modal | Class-Specific Tokens for Weakly Supervised Dense Object Localization |
Learning multi- | modal | densities on Discriminative Temporal Interaction Manifold for group activity recognition |
Learning Multi- | modal | Features for Dense Matching-based Confidence Estimation |
Learning Multi- | modal | Nonlinear Embeddings: Performance Bounds and an Algorithm |
Learning Mutual Modulation for Self-supervised Cross- | modal | Super-Resolution |
Learning of Linear Video Prediction Models In A Multi- | modal | Framework for Anomaly Detection |
Learning Pedestrian Group Representations for Multi- | modal | Trajectory Prediction |
Learning SAR-Optical Cross | modal | Features for Land Cover Classification |
Learning similarity measure for multi- | modal | 3D image registration |
Learning Strategy for Amazon Deforestation Estimations Using Multi- | modal | Satellite Imagery, A |
Learning the Relative Importance of Objects from Tagged Images for Retrieval and Cross- | modal | Search |
Learning to Generate Object Segment Proposals with Multi- | modal | Cues |
Learning Unified Hyper-Network for Multi- | modal | MR Image Synthesis and Tumor Segmentation With Missing Modalities |
Learning unified sparse representations for multi- | modal | data |
Learning Visually Aligned Semantic Graph for Cross- | modal | Manifold Matching |
Less is Better: Exponential Loss for Cross- | modal | Matching |
Leveraging Multi- | modal | Analyses and Online Knowledge Base for Video Aboutness Generation |
Leveraging multi- | modal | fusion for graph-based image annotation |
LiDAR Observations of Multi- | modal | Swash Probability Distributions on a Dissipative Beach |
Linear Subspace Ranking Hashing for Cross- | modal | Retrieval |
Linking text and visual concepts semantically for cross | modal | multimedia search |
LIP-Loc: LiDAR Image Pretraining for Cross- | modal | Localization |
Lite-MDETR: A Lightweight Multi- | modal | Detector |
Liver Tumor Detection Via A Multi-Scale Intermediate Multi- | modal | Fusion Network on MRI Images |
Local circular patterns for multi- | modal | facial gender and ethnicity classification |
Local Intensity Mapping for Hierarchical Non-rigid Registration of Multi- | modal | Images Using the Cross-Correlation Coefficient |
Local multi- | modal | image matching based on self-similarity |
Local-to-Global Approach to Multi- | modal | Movie Scene Segmentation, A |
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross- | modal | Fusion |
Look&listen: Multi- | modal | Correlation Learning for Active Speaker Detection and Speech Enhancement |
Look, Imagine and Match: Improving Textual-Visual Cross- | modal | Retrieval with Generative Models |
Looking into Your Speech: Learning Cross- | modal | Affinity for Audio-visual Speech Separation |
Low-light pedestrian detection from RGB images using multi- | modal | knowledge distillation |
Low-Rank and Joint Sparse Representations for Multi- | modal | Recognition |
Low-Rank Tensor Bayesian Filter Framework For Multi- | modal | Analysis, A |
Low-Visibility Vehicle-Road Environment Perception Based on the Multi- | modal | Visual Features Fusion of Polarization and Infrared |
M2FNet: Multi- | modal | Fusion Network for Emotion Recognition in Conversation |
M2RNet: Multi- | modal | and multi-scale refined network for RGB-D salient object detection |
M2SIR: A multi | modal | sequential importance resampling algorithm for particle filters |
M33D: Learning 3D priors using Multi- | modal | Masked Autoencoders for 2D image and video understanding |
M3ANet: Multi- | modal | and Multi-Attention Fusion Network for Ship License Plate Recognition |
M3F: Multi- | modal | Continuous Valence-Arousal Estimation in the Wild |
M3L: Language-based Video Editing via Multi- | modal | Multi-Level Transformers |
M3LH: Multi- | modal | Multi-label Hashing for Large Scale Data Search |
M3TTS: Multi- | modal | text-to-speech of multi-scale style control for dubbing |
M5L: Multi- | modal | Multi-Margin Metric Learning for RGBT Tracking |
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi- | modal | Pretraining |
MAAS: Multi- | modal | Assignation for Active Speaker Detection |
Malignant melanoma dermoscopy image classification method based on multi- | modal | medical features |
MALip: | modal | Amplification Lipreading based on reconstructed audio features |
Manifold Learning for Multi- | modal | Image Registration |
MaPLe: Multi- | modal | Prompt Learning |
MAPNet: Multi- | modal | attentive pooling network for RGB-D indoor scene classification |
Mapping Multi- | modal | Brain Connectome for Brain Disorder Diagnosis via Cross-Modal Mutual Learning |
Margin-constrained multiple kernel learning based multi- | modal | fusion for affect recognition |
Mask Cross- | modal | Hashing Networks |
Masking | modal | ities for Cross-modal Video Retrieval |
MATNet: Exploiting Multi- | modal | Features for Radiology Report Generation |
MCEN: Bridging Cross- | modal | Gap between Cooking Recipes and Dish Images with Latent Variable Model |
MDETR: Modulated Detection for End-to-End Multi- | modal | Understanding |
MEP-3M: A large-scale multi- | modal | E-commerce product dataset |
Method and system for multi- | modal | component-based tracking of an object using robust information fusion |
MFFNet: Multi- | modal | Feature Fusion Network for V-D-T Salient Object Detection |
MFST: Multi- | modal | Feature Self-Adaptive Transformer for Infrared and Visible Image Fusion |
MIA-Net: Multi- | modal | Interactive Attention Network for Multi-Modal Affective Analysis |
Misalignment-Robust Joint Filter for Cross- | modal | Image Pairs |
Missing | modal | ity Robustness in Semi-Supervised Multi-Modal Semantic Segmentation |
Missing MRI Pulse Sequence Synthesis Using Multi- | modal | Generative Adversarial Network |
MIST: Multi- | modal | Iterative Spatial-Temporal Transformer for Long-form Video Question Answering |
Mix and Match Networks: Cross- | modal | Alignment for Zero-Pair Image-to-Image Translation |
MixReorg: Cross- | modal | Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation |
MM-Diffusion: Learning Multi- | modal | Diffusion Models for Joint Audio and Video Generation |
MM-TTA: Multi- | modal | Test-Time Adaptation for 3D Semantic Segmentation |
MM-ViT: Multi- | modal | Video Transformer for Compressed Video Action Recognition |
MMAct: A Large-Scale Dataset for Cross | modal | Human Action Understanding |
MMCAN: Multi- | modal | Cross-Attention Network for Free-Space Detection with Uncalibrated Hyperspectral Sensors |
MMEC: Multi- | modal | Ensemble Classifier for Protein Secondary Structure Prediction |
MMFC: Multi- | modal | Fusion Cascade Framework for Covid-19 Disease Course Classification |
MMG-Ego4D: Multi- | modal | Generalization in Egocentric Action Recognition |
MMM-GCN: Multi-Level Multi- | modal | Graph Convolution Network for Video-Based Person Identification |
MMSMCNet: | modal | Memory Sharing and Morphological Complementary Networks for RGB-T Urban Scene Semantic Segmentation |
MMSNet: Multi- | modal | scene recognition using multi-scale encoded features |
MMSS: Graph-based Multi- | modal | Story-oriented Video Summarization and Retrieval |
MMSS: Multi- | modal | Sharable and Specific Feature Learning for RGB-D Object Recognition |
MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi- | modal | Spatial-Temporal Vision Transformer |
MMTF: Multi- | modal | Temporal Fusion for Commonsense Video Question Answering |
MMTFN: Multi- | modal | multi-scale transformer fusion network for Alzheimer's disease diagnosis |
| modal | Activity-Based Stochastic Model for Estimating Vehicle Trajectories from Sparse Mobile Sensor Data |
| modal | Approach to Feature-Based Correspondence |
| modal | Control of an Attentive Vision System |
| modal | features for image texture classification |
| modal | Framework for Correspondence and Description, A |
| modal | Matching for Correspondence and Recognition |
| modal | Matching: A Method for Describing, Comparing, and Manipulating Digital Signals |
| modal | Parameters Identification of Bridge Structures from GNSS Data Using the Improved Empirical Wavelet Transform |
| modal | Regression-Based Atomic Representation for Robust Face Recognition and Reconstruction |
| modal | representations |
| modal | Space: A Physics-Based Model for Sequential Estimation of Time-Varying Shape from Monocular Video |
| modal | Symbolic Classifier for selecting time series models, A |
| modal | -Adaptive Gated Recoding Network for RGB-D Salient Object Detection |
| modal | -based phase retrieval for adaptive optics |
| modal | -based phase retrieval using Gaussian radial basis functions |
| modal | -based tomographic imaging from far-zone observations |
| modal | ity and Event Adversarial Networks for Multi-Modal Fake News Detection |
| modal | ity Compensation Network: Cross-Modal Adaptation for Action Recognition |
| modal | ity Mixer for Multi-modal Action Recognition |
| modal | ity Shifting Attention Network for Multi-Modal Video Question Answering |
| modal | ity-Aware OOD Suppression Using Feature Discrepancy for Multi-Modal Emotion Recognition |
| modal | ity-specific and shared generative adversarial network for cross-modal retrieval |
| modal | ity-Specific Cross-Modal Similarity Measurement With Recurrent Attention Network |
ModDrop: Adaptive Multi- | modal | Gesture Recognition |
Mode-shape interpretation: Re-thinking | modal | space for recovering deformable shapes |
Model May Fit You: User-Generalized Cross- | modal | Retrieval, The |
Modeling correlation between multi- | modal | continuous words for pLSA-based video classification |
Modeling Cross- | modal | Interaction in a Multi-detector, Multi-modal Tracking Framework |
Modeling Motion with Multi- | modal | Features for Text-Based Video Segmentation |
Modeling Thermal Sequence Signal Decreasing for Dual | modal | Password Breaking |
Modout: Learning Multi- | modal | Architectures by Stochastic Regularization |
MoDuL: Deep | modal | and Dual Landmark-wise Gated Network for Facial Expression Recognition |
Modular Framework for 2D/3D and Multi- | modal | Segmentation with Joint Super-Resolution, A |
Moment Retrieval via Cross- | modal | Interaction Networks With Query Reconstruction |
Monocular Real-Time Hand Shape and Motion Capture Using Multi- | modal | Data |
MoQA: A Multi- | modal | Question Answering Architecture |
More Than Just Attention: Improving Cross- | modal | Attentions with Contrastive Constraints for Image-Text Matching |
Moving Humans Detection Based on Multi- | modal | Sensor Fusion |
MRA-Net: Improving VQA Via Multi- | modal | Relation Attention Network |
MS2GAH: Multi-label semantic supervised graph attention hashing for robust cross- | modal | retrieval |
MSeg3D: Multi- | modal | 3D Semantic Segmentation for Autonomous Driving |
MTFH: A Matrix Tri-Factorization Hashing Framework for Efficient Cross- | modal | Retrieval |
MTMSN: Multi-Task and Multi- | modal | Sequence Network for Facial Action Unit and Expression Recognition |
Multi | modal | semantic indexing for image retrieval |
Multi-Agent Advanced Traveler Information System for Optimal Trip Planning in a Co- | modal | Framework, A |
Multi-channel sub-Nyquist cross-spectral estimation for | modal | analysis of vibrating structures |
Multi-Class Joint Subspace Learning for Cross- | modal | Retrieval |
Multi-Domain and Multi- | modal | Representation Disentangler for Cross-Domain Image Manipulation and Classification, A |
Multi-Evidence and Multi- | modal | Fusion Network for Ground-Based Cloud Recognition |
Multi-Facet Weighted Asymmetric Multi- | modal | Hashing Based on Latent Semantic Distribution |
Multi-Granularity Contrastive Cross- | modal | Collaborative Generation for End-to-End Long-Term Video Question Answering |
Multi-Hop Interactive Cross- | modal | Retrieval |
Multi-Kernel Supervised Hashing with Graph Regularization for Cross- | modal | Retrieval |
Multi-Label Adversarial Fine-Grained Cross- | modal | Retrieval |
Multi-label Cross- | modal | Retrieval |
Multi-label semantics preserving based deep cross- | modal | hashing |
Multi-level adversarial attention cross- | modal | hashing |
Multi-level context extraction and attention-based contextual inter- | modal | fusion for multimodal sentiment analysis and emotion classification |
Multi-Level Correlation Adversarial Hashing for Cross- | modal | Retrieval |
Multi-lingual and Multi- | modal | Speech Processing and Applications |
Multi-Manifold Deep Discriminative Cross- | modal | Hashing for Medical Image Retrieval |
Multi- | modal | (2-D and 3-D) face modeling and recognition using Attributed Relational Graph |
Multi- | modal | 2D + 3D Face Recognition Method with a Novel Local Feature Descriptor, A |
Multi- | modal | 2D and 3D biometrics for face recognition |
Multi- | modal | 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving |
Multi- | modal | 3D Object Detection in Autonomous Driving: A Survey |
Multi- | modal | 3d reconstruction and measurements of zebrafish larvae and its organs using axial-view microscopy |
Multi- | modal | abnormality detection in video with unknown data segmentation |
Multi- | modal | activity recognition from egocentric vision, semantic enrichment and lifelogging applications for the care of dementia |
Multi- | modal | Aerial View Image Challenge: Translation from Synthetic Aperture Radar to Electro-Optical Domain Results - PBVS 2023 |
Multi- | modal | Aerial View Object Classification Challenge Results - PBVS 2023 |
Multi- | modal | Aerial View Object Classification Challenge Results: PBVS 2022 |
Multi- | modal | Alignment using Representation Codebook |
Multi- | modal | and Cross-Modal for Lecture Videos Retrieval |
Multi- | modal | and Multi-Domain Embedding Learning for Fashion Retrieval and Analysis |
Multi- | modal | and multi-layout discriminative learning for placental maturity staging |
Multi- | modal | and Multi-spectral Registration for Natural Images |
Multi- | modal | Association based Grouping for Form Structure Extraction |
Multi- | modal | Attention System for Smart Environments, A |
multi- | modal | automatic image registration technique based on complex wavelets, A |
Multi- | modal | Background Model Initialization |
Multi- | modal | background subtraction using Gaussian Mixture Models |
Multi- | modal | Bifurcated Network for Depth Guided Image Relighting |
Multi- | modal | big-data management for film production |
Multi- | modal | Biometrics Involving the Human Ear |
Multi- | modal | Biometrics Pixel Level Fusion and KPCA-RBF Feature Classification for Single Sample Recognition Problem |
Multi- | modal | brain tumor image segmentation based on SDAE |
Multi- | modal | Brain Tumor Segmentation Using Cascaded 3D U-Net |
Multi- | modal | brain tumor segmentation via disentangled representation learning and region-aware contrastive learning |
Multi- | modal | Characteristic Guided Depth Completion Network |
Multi- | modal | classification of Alzheimer's disease using nonlinear graph fusion |
Multi- | modal | Clique-Graph Matching for View-Based 3D Model Retrieval |
Multi- | modal | Clustering for Multimedia Collections |
Multi- | modal | Co-Learning for Liver Lesion Segmentation on PET-CT Images |
Multi- | modal | Combined Route Choice Modeling in the MaaS Age Considering Generalized Path Overlapping Problem |
Multi- | modal | Complete Breast Segmentation |
Multi- | modal | Computer Vision for the Detection of Multi-scale Crowd Physical Motions and Behavior in Confined Spaces |
Multi- | modal | Context Propagation for Person Re-Identification With Wireless Positioning |
Multi- | modal | Contextual Graph Neural Network for Text Visual Question Answering |
Multi- | modal | Continual Test-Time Adaptation for 3D Semantic Segmentation |
Multi- | modal | Convolutional Dictionary Learning |
Multi- | modal | Correlated Network with Emotional Reasoning Knowledge for Social Intelligence Question-Answering |
Multi- | modal | Cross-Domain Alignment Network for Video Moment Retrieval |
Multi- | modal | Crowd Counting |
Multi- | modal | Curriculum Learning for Semi-Supervised Image Classification |
Multi- | modal | Cycle-Consistent Generalized Zero-Shot Learning |
Multi- | modal | data fusion for pain intensity assessment and classification |
Multi- | modal | Data Fusion for Person Authentication Using SVM |
Multi- | modal | data fusion using group-structured sparse canonical correlation analysis: A simulation study |
Multi- | modal | Deep Analysis for Multimedia |
Multi- | modal | Deep Clustering: Unsupervised Partitioning of Images |
Multi- | modal | deep feature learning for RGB-D object detection |
Multi- | modal | Deep Guided Filtering for Comprehensible Medical Image Processing |
Multi- | modal | deep learning for landform recognition |
Multi- | modal | Deep Learning: Challenges and Applications |
Multi- | modal | defect detection of residual oxide scale on a cold stainless steel strip |
Multi- | modal | Dense Video Captioning |
Multi- | modal | Depression Estimation Based on Sub-attentional Fusion |
Multi- | modal | Design of an Intelligent Transportation System |
Multi- | modal | Detection Fusion on a Mobile UGV for Wide-Area, Long-Range Surveillance |
Multi- | modal | Dialog Scene Detection Using Hidden Markov Models for Content-Based Multimedia Indexing |
Multi- | modal | Dictionary Learning for Image Separation With Application in Art Investigation |
Multi- | modal | Distance Metric Learning: A Bayesian Non-parametric Approach |
Multi- | modal | Domain Adaptation for Fine-Grained Action Recognition |
Multi- | modal | Dynamic Graph Transformer for Visual Grounding |
Multi- | modal | ear and face modeling and recognition |
Multi- | modal | Embedding for Main Product Detection in Fashion |
Multi- | modal | emotion analysis from facial expressions and electroencephalogram |
Multi- | modal | Emotion Reaction Intensity Estimation with Temporal Augmentation |
Multi- | modal | Emotion Recognition Using Canonical Correlations and Acoustic Features |
Multi- | modal | Emotion, Multimodal Emotion Recognition |
Multi- | modal | Event Topic Model for Social Event Analysis |
Multi- | modal | Expression Recognition in the Wild Using Sequence Modeling |
Multi- | modal | Extreme Classification |
Multi- | modal | Face Anti-Spoofing Based on Central Difference Networks |
Multi- | modal | Face Image Super-Resolutions in Tensor Space |
Multi- | modal | Face Recognition by Means of Augmented Normal Map and PCA |
Multi- | modal | Face Tracking in Multi-camera Environments |
Multi- | modal | face tracking using bayesian network |
Multi- | modal | Facial Affective Analysis based on Masked Autoencoder |
Multi- | modal | Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering |
Multi- | modal | feature fusion for geographic image annotation |
multi- | modal | feature fusion framework for kinect-based facial expression recognition using Dual Kernel Discriminant Analysis (DKDA), A |
Multi- | modal | Feature Fusion Network for Ghost Imaging Object Detection |
Multi- | modal | Feature Fusion Network with Adaptive Center Point Detector for Building Instance Extraction |
Multi- | modal | Feature Pyramid Transformer for RGB-Infrared Object Detection |
Multi- | modal | Fusion based on classifiers using reject options and Markov Fusion Networks |
Multi- | modal | Fusion for End-to-End RGB-T Tracking |
Multi- | modal | fusion method for human action recognition based on IALC |
Multi- | modal | Fusion Network for Rumor Detection with Texts and Images |
Multi- | modal | Fusion Network Guided by Feature Co-Occurrence for Urban Region Function Recognition, A |
Multi- | modal | fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection |
Multi- | modal | Fusion Transformer for End-to-End Autonomous Driving |
Multi- | modal | Fusion With Observation Points For Skeleton Action Recognition |
Multi- | modal | Gait Recognition via Effective Spatial-Temporal Feature Fusion |
multi- | modal | garden dataset and hybrid 3D dense reconstruction framework based on panoramic stereo images for a trimming robot, A |
Multi- | modal | Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion |
Multi- | modal | Gaze Following in Conversational Scenarios |
Multi- | modal | Gesture Recognition Using Skeletal Joints and Motion Trail Model |
Multi- | modal | Gesture Recognition, Multimodal Recognition |
Multi- | modal | Graph Learning for Disease Prediction |
Multi- | modal | Graph Neural Network for Joint Reasoning on Vision and Scene Text |
Multi- | modal | Graphical Model for Robust Recognition of Group Actions in Meetings from Disturbed Videos, A |
Multi- | modal | Graphical Model for Scene Analysis, A |
Multi- | modal | gray-level histogram modeling and decomposition |
Multi- | modal | Hierarchical Attention-Based Dense Video Captioning |
Multi- | modal | human aggression detection |
Multi- | modal | Human Authentication Using Silhouettes, Gait and RGB |
Multi- | modal | Human Identification System |
Multi- | modal | Human pose estimation based on probability distribution perception on a depth convolution neural network |
Multi- | modal | Human Verification Using Face and Speech |
Multi- | modal | Identification of State-Sponsored Propaganda on Social Media |
Multi- | modal | Identity Verification Based on Improved BP Neural Network |
Multi- | modal | image fusion based on saliency guided in NSCT domain |
Multi- | modal | Image Fusion Based on Weight Local Features and Novel Sum-modified-laplacian in Non-subsampled Shearlet Transform Domain |
Multi- | modal | Image Fusion via Deep Laplacian Pyramid Hybrid Network |
Multi- | modal | Image Re-ranking with Autoencoders and Click Semantics |
Multi- | modal | Image Registration Based on Gradient Orientations of Minimal Uncertainty |
Multi- | modal | Image Registration Based on Phase Exponent Differences of the Gaussian Pyramid |
Multi- | modal | image registration by minimizing Kullback-Leibler distance between expected and observed joint class histograms |
Multi- | modal | Image Registration by Quantitative-Qualitative Measure of Mutual Information (Q-MI) |
Multi- | modal | Image Registration Using Dirichlet-Encoded Prior Information |
Multi- | modal | image registration using fuzzy kernel regression |
Multi- | modal | image registration using line features and mutual information |
Multi- | modal | image registration using local frequency representation and computer-aided design (CAD) models |
Multi- | modal | Image Registration Using Polynomial Expansion and Mutual Information |
Multi- | modal | Imaging Genetics Data Fusion via a Hypergraph-Based Manifold Regularization: Application to Schizophrenia Study |
Multi- | modal | Information Fusion for Action Unit Detection in the Wild |
Multi- | modal | Information Integration for Document Retrieval |
Multi- | modal | information retrieval from broadcast video using OCR and speech recognition |
Multi- | modal | Interaction Graph Convolutional Network for Temporal Language Localization in Videos |
Multi- | modal | Interactive Video Retrieval with Temporal Queries |
Multi- | modal | Interface for Road Planning Tasks Using Vision, Haptics and Sound, A |
Multi- | modal | Intra-Operative Navigation During Distal Locking of Intramedullary Nails |
Multi- | modal | Joint Clustering With Application for Unsupervised Attribute Discovery |
Multi- | modal | joint embedding for fashion product retrieval |
Multi- | modal | laughter recognition in video conversations |
Multi- | modal | Learning for AU Detection Based on Multi-Head Fused Transformers |
Multi- | modal | Learning for Predicting the Genotype of Glioma |
Multi- | modal | Learning from Unpaired Images: Application to Multi-organ Segmentation in CT and MRI |
Multi- | modal | Learning With Generalizable Nonlinear Dimensionality Reduction |
Multi- | modal | Learning with Missing Modality via Shared-Specific Feature Modelling |
Multi- | modal | Masked Pre-training for Monocular Panoramic Depth Completion |
Multi- | modal | Matching Applied to Stereo |
Multi- | modal | Mean-Fields via Cardinality-Based Clamping |
Multi- | modal | medical image fusion using 2DPCA |
Multi- | modal | Medical Image Fusion Using 3-Stage Multiscale Decomposition and PCNN with Adaptive Arguments |
Multi- | modal | Meta Multi-Task Learning for Social Media Rumor Detection |
Multi- | modal | Meta-Transfer Fusion Network for Few-Shot 3D Model Classification |
multi- | modal | method based on the competitors of FVC2004 and on palm data combined with tokenised random numbers, A |
Multi- | modal | method for locating objects in images |
Multi- | modal | metric learning for vehicle re-identification in traffic surveillance environment |
Multi- | modal | motion dictionary learning for facial expression recognition |
Multi- | modal | Multi-Action Video Recognition |
Multi- | modal | Multi-Grained Embedding Learning for Generalized Zero-Shot Video Classification |
Multi- | modal | Multi-Objective Contrastive Learning for Sentinel-1/2 Imagery |
Multi- | modal | Multi-Scale Deep Learning for Large-Scale Image Annotation |
multi- | modal | multi-view dataset for human fall analysis and preliminary investigation on modality, A |
Multi- | modal | multiple kernel learning for accurate identification of Tourette syndrome children |
Multi- | modal | Mutual Attention and Iterative Interaction for Referring Image Segmentation |
Multi- | modal | Neural Conditional Ordinal Random Fields for agreement level estimation |
Multi- | modal | Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor |
Multi- | modal | Non-Line-of-Sight Passive Imaging |
Multi- | modal | non-rigid registration of medical images based on mutual information maximization |
Multi- | modal | nonlinear feature reduction for the recognition of handwritten numerals |
Multi- | modal | object detection and localization for high integrity driving assistance |
Multi- | modal | object detection via transformer network |
Multi- | modal | Object Tracking using Dynamic Performance Metrics |
Multi- | modal | Pain Intensity Recognition Based on the SenseEmotion Database |
Multi- | modal | Particle Filtering Tracking using Appearance, Motion and Audio Likelihoods |
Multi- | modal | Pedestrian Detection with Large Misalignment Based on Modal-Wise Regression and Multi-Modal IoU |
multi- | modal | perception based assistive robotic system for the elderly, A |
Multi- | modal | Person Identification in a Smart Environment |
Multi- | modal | Pyramid Feature Combination for Human Action Recognition |
Multi- | modal | Re-Identification, Multi-Modal Human Tracking |
Multi- | modal | Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval |
Multi- | modal | Recurrent Attention Networks for Facial Expression Recognition |
Multi- | modal | Reflection Removal Using Convolutional Neural Networks |
Multi- | modal | region selection approach for training object detectors |
Multi- | modal | Registration of SAR and Optical Satellite Images |
Multi- | modal | Relational Graph for Cross-Modal Video Moment Retrieval |
Multi- | modal | Remote Sensing Image Matching Considering Co-Occurrence Filter |
Multi- | modal | Representation Learning with Text-Driven Soft Masks |
Multi- | modal | Retinal Image Classification with Modality-Specific Attention Network |
multi- | modal | RGB-D object recognizer, A |
Multi- | modal | RGB-D Scene Recognition Across Domains |
Multi- | modal | RGB-Depth-Thermal Human Body Segmentation |
Multi- | modal | Sarcasm Detection and Humor Classification in Code-Mixed Conversations |
Multi- | modal | Scene Reconstruction using Perceptual Grouping Constraints |
Multi- | modal | Score Fusion and Decision Trees for Explainable Automatic Job Candidate Screening from Video CVs |
Multi- | modal | Segment Assemblage Network for Ad Video Editing with Importance-coherence Reward |
Multi- | modal | Segmental Models for On-line Handwriting Recognition |
Multi- | modal | Segmentation Using Markov Random Fields |
Multi- | modal | semantic image segmentation |
Multi- | modal | Semantic Inconsistency Detection in Social Media News Posts |
Multi- | modal | Sequence-to-sequence Model for Continuous Affect Prediction in the Wild Using Deep 3D Features |
Multi- | modal | Sequential Monte Carlo for On-Line Hierarchical Graph Structure Estimation in Model-based Scene Interpretation |
Multi- | modal | Siamese Network for Diagnostically Similar Lesion Retrieval in Prostate MRI |
Multi- | modal | Sign Language Spotting by Multi/one-shot Learning |
Multi- | modal | Solution for Unconstrained News Story Retrieval |
Multi- | modal | Sparse Coding Classifier Using Dictionaries with Different Number of Atoms, A |
Multi- | modal | spatial relational attention networks for visual question answering |
Multi- | modal | Spatio-Temporal Meteorological Forecasting with Deep Neural Network |
Multi- | modal | Spectral Image Super-Resolution |
Multi- | modal | Stacked Ensemble Model for Bipolar Disorder Classification, A |
Multi- | modal | Steganography Based on Semantic Relevancy |
Multi- | modal | Structure-Embedding Graph Transformer for Visual Commonsense Reasoning |
Multi- | modal | subspace learning with dropout regularization for cross-modal recognition and retrieval |
Multi- | modal | Subspace Learning with Joint Graph Regularization for Cross-Modal Retrieval |
Multi- | modal | summarization of key events and top players in sports tournament videos |
Multi- | modal | surface registration for markerless initial patient setup in radiation therapy using Microsoft's Kinect sensor |
Multi- | modal | system for locating heads and faces |
multi- | modal | system for the retrieval of semantic video events, A |
Multi- | modal | temporal attention models for crop mapping from satellite time series |
Multi- | modal | Temporal Convolutional Network for Anticipating Actions in Egocentric Videos |
Multi- | modal | Tensor Face for Simultaneous Super-Resolution and Recognition |
Multi- | modal | Text Recognition Networks: Interactive Enhancements Between Visual and Semantic Features |
Multi- | modal | Three-Stream Network for Action Recognition |
Multi- | modal | Topic Model for Image Annotation Using Text Analysis, A |
Multi- | modal | Tracking of Faces for Video Communications |
Multi- | modal | Tracking of Interacting Targets using Gaussian Approximations |
Multi- | modal | tracking of people using laser scanners and video camera |
Multi- | modal | tracking using texture changes |
Multi- | modal | Traffic Signal Control in Shared Space Street |
Multi- | modal | Trajectory Prediction of NBA Players |
Multi- | modal | Transformer Approach for Football Event Classification, A |
Multi- | modal | Transformer for RGB-D Salient Object Detection |
Multi- | modal | Transformer for Video Retrieval |
Multi- | modal | Transformer network for action detection, A |
Multi- | modal | Transformer With Global-Local Alignment for Composed Query Image Retrieval |
Multi- | modal | Tumor Segmentation With Deformable Aggregation and Uncertain Region Inpainting |
Multi- | modal | uniform deep learning for RGB-D person re-identification |
multi- | modal | universe of fast-fashion: the Visuelle 2.0 benchmark, The |
Multi- | modal | unsupervised domain adaptation for semantic image segmentation |
Multi- | modal | Unsupervised Feature Learning for RGB-D Scene Labeling |
Multi- | modal | user identification and object recognition surveillance system |
Multi- | modal | user interaction method based on gaze tracking and gesture recognition |
Multi- | modal | User Interactions in Controlled Environments |
Multi- | modal | Variational Faster R-CNN for Improved Visual Object Detection in Manufacturing |
Multi- | modal | Variational Graph Auto-Encoder for Recommendation Systems |
Multi- | modal | vehicle trajectory prediction based on mutual information |
Multi- | modal | Video Reasoning and Analyzing |
Multi- | modal | Video Reasoning and Analyzing Competition, The |
Multi- | modal | Video Retrieval in Virtual Reality with VITRIVR-VR |
Multi- | modal | Video Surveillance aided by Pyroelectric Infrared Sensors |
Multi- | modal | visual concept classification of images via Markov random walk over tags |
Multi- | modal | Visual Place Recognition in Dynamics-Invariant Perception Space |
Multi- | modal | , Cross-Modal Captioning, Image Captioning |
Multi- | modal | , Discriminative and Spatially Invariant CNN for RGB-D Object Labeling, A |
multi- | modal | , multi-atlas-based approach for Alzheimer detection via machine learning, A |
Multi- | modal | , multi-resource methods for placing Flickr videos on the map |
Multi- | modal | , Multi-touch Interaction with Maps in Disaster Management Applications |
Multi- | modal | /multi-scale convolutional neural network based in-loop filter design for next generation video codec |
Multi-Objective Geoacoustic Inversion of | modal | -Dispersion and Waveform Envelope Data Based on Wasserstein Metric, A |
Multi-Pathway Generative Adversarial Hashing for Unsupervised Cross- | modal | Retrieval |
Multi-Relational Deep Hashing for Cross- | modal | Search |
multi-resolution area-based technique for automatic multi- | modal | image registration, A |
Multi-Ridge Recognition Method Using Modified MOPSO Algorithm and its Application on | modal | Parameter Identification, A |
Multi-scale Cross- | modal | Transformer Network for RGB-D Object Detection |
Multi-scale relation reasoning for multi- | modal | Visual Question Answering |
Multi-sensory and Multi- | modal | Fusion for Sentient Computing |
Multi-source Multi- | modal | Activity Recognition in Aerial Video Surveillance |
Multi-Task Consistency-Preserving Adversarial Hashing for Cross- | modal | Retrieval |
Multi-task framework based on feature separation and reconstruction for cross- | modal | retrieval |
Multi-task hierarchical convolutional network for visual-semantic cross- | modal | retrieval |
Multi-task Learning Using Multi- | modal | Encoder-Decoder Networks with Shared Skip Connections |
Multi-Task Multi- | modal | Semantic Hashing for Web Image Retrieval with Limited Supervision |
Multi-view collective tensor decomposition for cross- | modal | hashing |
Multi-View Fusion Through Cross- | modal | Retrieval |
Multi-view Multi- | modal | Feature Embedding for Endomicroscopy Mosaic Classification |
Multi-View Variational Recurrent Neural Network for Human Emotion Recognition Using Multi- | modal | Biological Signals |
Multichannel Cross- | modal | Fusion Framework for Electron Tomography, A |
MultiMAE: Multi- | modal | Multi-task Masked Autoencoders |
Multi | modal | Discriminative Binary Embedding for Large-Scale Cross-Modal Retrieval |
Multi | modal | emotion recognition using cross modal audio-video fusion with attention and deep metric learning |
Multi | modal | Reaction: Information Modulation for Cross-Modal Representation Learning |
Multi | modal | ity-guided Image Style Transfer using Cross-modal GAN Inversion |
MultiNet: Multi- | modal | Multi-Task Learning for Autonomous Driving |
Multiple Kernel Learning Approach to Multi- | modal | Pedestrian Classification, A |
Multiple Object Tracking Via Multi-layer Multi- | modal | Framework |
Multiscale Cross- | modal | Homogeneity Enhancement and Confidence-Aware Fusion for Multispectral Pedestrian Detection |
Multiset Canonical Correlation Analysis: Texture Feature Level Fusion of Multiple Descriptors for Intra- | modal | Palmprint Biometric Recognition |
Multiview Facial Feature Tracking with a Multi- | modal | Probabilistic Model |
MURF: Mutually Reinforcing Multi- | modal | Image Registration and Fusion |
Multi- | modal | learning in photogrammetry and remote sensing |
Mutual Information for Multi- | modal | , Discontinuity-preserving Image Registration |
Mutual Quantization for Cross- | modal | Search with Noisy Labels |
NAPReg: Nouns As Proxies Regularization for Semantically Aware Cross- | modal | Embeddings |
Natural Language-Based Vehicle Retrieval with Explicit Cross- | modal | Representation Learning |
Neighbourhood Structure Preserving Cross- | modal | Embedding for Video Hyperlinking |
Neural Registration and Segmentation of White Matter Tracts in Multi- | modal | Brain MRI |
new multi- | modal | approach to bib number/text detection and recognition in Marathon images, A |
New Multi- | modal | Dataset for Human Affect Analysis, A |
New semi-supervised classification using a multi- | modal | feature joint L21-norm based sparse representation |
new similarity measure for multi- | modal | image registration, A |
News Video Classification Based on Multi- | modal | Information Fusion |
Non-linear and selective fusion of cross- | modal | images |
Non-Rigid 3D Multi- | modal | Registration Algorithm Using Partial Volume Interpolation and the Sum of Conditional Variance, A |
Non-Rigid Multi- | modal | Image Registration Using Cross-Cumulative Residual Entropy |
Non-rigid Multi- | modal | Registration of Coronary Arteries Using SIFTflow |
Non-sequential Multi-view Detection, Localization and Identification of People Using Multi- | modal | Feature Maps |
Non-Uniform Attention Network for Multi- | modal | Sentiment Analysis |
Nonlinear Discrete Cross- | modal | Hashing for Visual-Textual Data |
Nonlinear Graph Fusion for Multi- | modal | Classification of Alzheimer's Disease |
Nonparametric Feature Matching Based Conditional Random Fields for Gesture Recognition from Multi- | modal | Video |
Nonparametric Gesture Labeling from Multi- | modal | Data |
Nonparametric Priors on the Space of Joint Intensity Distributions for Non-Rigid Multi- | modal | Image Registration |
Novel Lidar Signal-Denoising Algorithm Based on Sparrow Search Algorithm for Optimal Variational | modal | Decomposition, A |
novel multi | modal | tracking method based on depth and semantic color features for human robot interaction, A |
Novel Multi- | modal | Image Registration Method Based on Corners, A |
Novel Multi- | modal | Integration and Propagation Model for Cross-Media Information Retrieval, A |
Novel Self-Supervised Cross- | modal | Image Retrieval Method in Remote Sensing, A |
novel strategy to balance the results of cross- | modal | hashing, A |
NTIRE 2021 Multi- | modal | Aerial View Object Classification Challenge |
Numerical Modeling and | modal | Analysis of Puranapul an Ancient Arch Bridge |
Object Recognition and Categorization Using | modal | Matching |
ObjectFusion: Multi- | modal | 3D Object Detection with Object-Centric Fusion |
Offshore Oil Slick Detection: From Photo-Interpreter to Explainable Multi- | modal | Deep Learning Models Using SAR Images and Contextual Data |
OIF-Net: An Optical Flow Registration-Based PET/MR Cross- | modal | Interactive Fusion Network for Low-Count Brain PET Image Denoising |
OLCH: Online Label Consistent Hashing for streaming cross- | modal | retrieval |
OMGH: Online Manifold-Guided Hashing for Flexible Cross- | modal | Retrieval |
OmniVec: Learning robust representations with cross | modal | sharing |
On | modal | Modeling for Medical Images: Underconstrained Shape Description and Data Compression |
On the Adversarial Robustness of Multi- | modal | Foundation Models |
On the regularization of image semantics by | modal | expansion |
On the Role of Correlation and Abstraction in Cross- | modal | Multimedia Retrieval |
Online Asymmetric Metric Learning With Multi-Layer Similarity Aggregation for Cross- | modal | Retrieval |
Online Asymmetric Similarity Learning for Cross- | modal | Retrieval |
Online Cross- | modal | Adaptation for Audio-Visual Person Identification With Wearable Cameras |
Online Fast Adaptive Low-Rank Similarity Learning for Cross- | modal | Retrieval |
Online Multi- | modal | Person Search in Videos |
Online multi- | modal | task-driven dictionary learning and robust joint sparse representation for visual tracking |
Open-Domain Retrieval Under Multi- | modal | Settings |
Open-Domain, Content-based, Multi- | modal | Fact-checking of Out-of-Context Images via Online Resources |
Open-Ended Video Question Answering via Multi- | modal | Conditional Adversarial Networks |
Open-Vocabulary Instance Segmentation via Robust Cross- | modal | Pseudo-Labeling |
Optimal Routing of Wide Multi- | modal | Energy and Infrastructure Corridors |
Order Preserving Bilinear Model for Person Detection in Multi- | modal | Data, An |
Panoramic face and ear image stitching in multi- | modal | recognition |
parallel cross- | modal | search engine over large-scale multimedia collections with interactive relevance feedback, A |
Parallel multi- | modal | background modeling |
Parametric Local Multi- | modal | Metric Learning for Person Re-identification |
Partial Multi- | modal | Hashing via Neighbor-Aware Completion Learning |
PCFN: Progressive Cross- | modal | Fusion Network for Human Pose Transfer |
Per-Cluster Ensemble Kernel Learning for Multi- | modal | Image Clustering With Group-Dependent Feature Selection |
Perception-Aware Cross- | modal | Signal Reconstruction: From Audio-Haptic to Visual |
Performance comparisons of multi- | modal | medical image registration algorithms |
Performance of Camera-Based Vibration Monitoring Systems in Input-Output | modal | Identification Using Shaker Excitation |
Persistent Stereo Visual Localization on Cross- | modal | Invariant Map |
Person instance graphs for mono-, cross- and multi- | modal | person recognition in multimedia data: application to speaker identification in TV broadcast |
Person tracking association using multi- | modal | systems |
Personalised, Multi- | modal | , Affective State Detection for Hybrid Brain-Computer Music Interfacing |
Perspectives and Prospects on Transformer Architecture for Cross- | modal | Tasks with Language and Vision |
PhoCaL: A Multi- | modal | Dataset for Category-Level Object Pose Estimation with Photometrically Challenging Objects |
Photomatch: An Open-source Multi-view and Multi- | modal | Feature Matching Tool for Photogrammetric Applications |
Physical querying with multi- | modal | sensing |
PipeNet: Selective | modal | Pipeline of Fusion Network for Multi-Modal Face Anti-Spoofing |
Pix2Map: Cross- | modal | Retrieval for Inferring Street Maps from Images |
PMMN: Pre-Trained Multi- | modal | Network for Scene Text Recognition |
PMR: Prototypical | modal | Rebalance for Multimodal Learning |
Point similarity measures for non-rigid registration of multi- | modal | data |
PointAugmenting: Cross- | modal | Augmentation for 3D Object Detection |
PointDC: Unsupervised Semantic Segmentation of 3D Point Clouds via Cross- | modal | Distillation and Super-Voxel Clustering |
PolSAR Image Classification Based on Multi- | modal | Contrastive Fully Convolutional Network |
Polysemous Visual-Semantic Embedding for Cross- | modal | Retrieval |
POP: Mining POtential Performance of New Fashion Products via Webly Cross- | modal | Query Expansion |
Positive Unlabeled Fake News Detection via Multi- | modal | Masked Transformer Network |
Potential Impact of Cycling on Urban Transport Energy and | modal | Share: A GIS-Based Methodology, The |
Practical Cross- | modal | Manifold Alignment for Robotic Grounded Language Learning |
Practical Membership Inference Attacks Against Large-Scale Multi- | modal | Models: A Pilot Study |
Prediction of Learning Abilities Based on a Cross- | modal | Evaluation of Non-verbal Mental Attributes Using Video-Game-Like Interfaces |
Preserving | modal | ity Structure Improves Multi-Modal Learning |
Preserving Semantic Neighborhoods for Robust Cross- | modal | Retrieval |
Probabilistic Embeddings for Cross- | modal | Retrieval |
Probabilistic Information Fusion for Multi- | modal | Image Segmentation |
Probabilistic Risk Metric for Highway Driving Leveraging Multi- | modal | Trajectory Predictions |
Probabilistic Semantic Model for Image Annotation and Multi- | modal | Image Retrieval, A |
Probabilistic Semi-Supervised Multi- | modal | Hashing |
Probability-based Global Cross- | modal | Upsampling for Pansharpening |
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross- | modal | Pretraining |
Progressive Cross- | modal | Semantic Network for Zero-Shot Sketch-Based Image Retrieval |
Progressive | modal | ity-complement aggregative multitransformer for domain multi-modal neural machine translation |
Promptlearner-clip: Contrastive Multi- | modal | Action Representation Learning with Context Optimization |
ProtoTransfer: Cross- | modal | Prototype Transfer for Point Cloud Segmentation |
Pure Versus Hybrid Transformers For Multi- | modal | Brain Tumor Segmentation: A Comparative Study |
Quality-aware dual- | modal | saliency detection via deep reinforcement learning |
Quantitative Validation of Multi- | modal | Image Fusion and Segmentation for Object Detection and Tracking, A |
Quantum Probability Driven Framework for Joint Multi- | modal | Sarcasm, Sentiment and Emotion Analysis, A |
Query specific re-ranking for improved cross- | modal | retrieval |
R2GAN: Cross- | modal | Recipe Retrieval With Generative Adversarial Network |
Radar High-Resolution Range Profile Rejection Based on Deep Multi- | modal | Support Vector Data Description |
Re-mine, Learn and Reason: Exploring the Cross- | modal | Semantic Correlations for Language-guided HOI detection |
Re-Synchronization Using the Hand Preceding Model for Multi- | modal | Fusion in Automatic Continuous Cued Speech Recognition |
Reading order detection in visually-rich documents with multi- | modal | layout-aware relation prediction |
Reading to Listen at the Cocktail Party: Multi- | modal | Speech Separation |
Real-time Triple- | modal | Photoacoustic, Ultrasound, and Magnetic Resonance Fusion Imaging of Humans |
Real-world Cross- | modal | Retrieval via Sequential Learning |
Reasoning on the Relation: Enhancing Visual Representation for Visual Question Answering and Cross- | modal | Retrieval |
Recent Developments in 3D Multi- | modal | Laser Imaging Applied to Cultural Heritage |
Recipe1M+: A Dataset for Learning Cross- | modal | Embeddings for Cooking Recipes and Food Images |
Reconstruction regularized low-rank subspace learning for cross- | modal | retrieval |
Recovering 6D Object Pose: A Review and Multi- | modal | Analysis |
Referring Image Segmentation via Cross- | modal | Progressive Comprehension |
Referring Segmentation in Images and Videos With Cross- | modal | Self-Attention Network |
Referring Segmentation via Encoder-Fused Cross- | modal | Attention Network |
Reinforced Cross- | modal | Matching and Self-Supervised Imitation Learning for Vision-Language Navigation |
Relation-Induced Multi- | modal | Shared Representation Learning for Alzheimer's Disease Diagnosis |
Replay: Multi- | modal | Multi-view Acted Videos for Casual Holography |
Residual UNet with Dual Attention: An ensemble residual UNet with dual attention for multi- | modal | and multi-class brain MRI segmentation |
Results and Analysis of ChaLearn LAP Multi- | modal | Isolated and Continuous Gesture Recognition, and Real Versus Fake Expressed Emotions Challenges |
Rethink Cross- | modal | Fusion in Weakly-Supervised Audio-Visual Video Parsing |
Retrieval From and Understanding of Large-Scale Multi- | modal | Medical Datasets: A Review |
Retrieval of textured images through the use of quantization and | modal | analysis |
Revamping Cross- | modal | Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning |
Reversible | modal | Conversion Model for Thermal Infrared Tracking |
RFNet: Region-aware Fusion Network for Incomplete Multi- | modal | Brain Tumor Segmentation |
RFNet: Unsupervised Network for Mutually Reinforcing Multi- | modal | Image Registration and Fusion |
RGB depth salient object detection via cross- | modal | attention and boundary feature guidance |
RGBD Salient Object Detection via Disentangled Cross- | modal | Fusion |
RI-LPOH: Rotation-Invariant Local Phase Orientation Histogram for Multi- | modal | Image Matching |
Rich Features Embedding for Cross- | modal | Retrieval: A Simple Baseline |
RIFT: Multi- | modal | Image Matching Based on Radiation-Variation Insensitive Feature Transform |
RISM: Single- | modal | Image Registration via Rank-Induced Similarity Measure |
Robust 3D Multi- | modal | Registration of MRI Volumes Using the Sum of Conditional Variance |
Robust 3D Semantic Segmentation Method Based on Multi- | modal | Collaborative Learning |
Robust and discrete matrix factorization hashing for cross- | modal | retrieval |
Robust and Flexible Discrete Hashing for Cross- | modal | Similarity Search |
Robust Cross- | modal | Representation Learning with Progressive Self-Distillation |
Robust Deep Multi- | modal | Learning Based on Gated Information Fusion Network |
Robust Dynamic Multi- | modal | Data Fusion: A Model Uncertainty Perspective |
Robust MR image super-resolution reconstruction with cross- | modal | edge-preserving regularization |
Robust Multi- | modal | and Multi-unit Feature Level Fusion of Face and Iris Biometrics |
Robust Multi- | modal | Group Action Recognition in Meetings from Disturbed Videos with the Asynchronous Hidden Markov Model |
Robust multi- | modal | method for recognizing objects |
Robust multi | modal | discrete hashing for cross-modal similarity search |
Robust Multispectral Pedestrian Detection via Uncertainty-aware Cross- | modal | Learning |
Robust online hashing with label semantic enhancement for cross- | modal | retrieval |
Robust Registration of Dissimilar Single and Multi- | modal | Images |
Robust Registration of Multi- | modal | Medical Images: Towards Real-Time Clinical Applications |
Robust visual question answering via semantic cross | modal | augmentation |
RODNet: Radar Object Detection using Cross- | modal | Supervision |
RONO: Robust Discriminative Learning with Noisy Labels for 2D-3D Cross- | modal | Retrieval |
Scalable Discrete and Asymmetric Unequal Length Hashing Learning for Cross- | modal | Retrieval |
Scalable multi-label canonical correlation analysis for cross- | modal | retrieval |
Scale Estimation of Monocular SfM for a Multi- | modal | Stereo Camera |
Scene Representation Based on Multi- | modal | 2D and 3D Features, A |
Score Selection Techniques for Fingerprint Multi- | modal | Biometric Authentication |
Scratch Each Other's Back: Incomplete Multi- | modal | Brain Tumor Segmentation Via Category Aware Group Self-Support Learning |
SCRATCH: A Scalable Discrete Matrix Factorization Hashing Framework for Cross- | modal | Retrieval |
SDT: A Synthetic Multi- | modal | Dataset For Person Detection And Pose Classification |
Searching Multi-Rate and Multi- | modal | Temporal Enhanced Networks for Gesture Recognition |
See More and Know More: Zero-shot Point Cloud Segmentation via Multi- | modal | Visual Data |
Seeing Voices and Hearing Faces: Cross- | modal | Biometric Matching |
Segmenting Microorganisms in Multi- | modal | Volumetric Datasets Using a Modified Watershed Transform |
Sejong face database: A multi- | modal | disguise face database |
Self-Paced Multi-Grained Cross- | modal | Interaction Modeling for Referring Expression Comprehension |
Self-similarity measure for multi- | modal | image registration |
Self-Supervised Adversarial Hashing Networks for Cross- | modal | Retrieval |
Self-Supervised Correlation Learning for Cross- | modal | Retrieval |
Self-Supervised Cross- | modal | Distillation for Thermal Infrared Tracking |
Self-supervised cross- | modal | visual retrieval from brain activities |
Self-Supervised Depth Completion Based on Multi- | modal | Spatio-Temporal Consistency |
Self-Supervised Distilled Learning for Multi- | modal | Misinformation Identification |
Self-Supervised Feature Learning via Exploiting Multi- | modal | Data for Retinal Disease Diagnosis |
Self-Supervised Intra- | modal | and Cross-Modal Contrastive Learning for Point Cloud Understanding |
Semantic Alignment Network for Multi- | modal | Emotion Recognition |
Semantic consistency cross- | modal | dictionary learning with rank constraint |
Semantic Disentanglement Adversarial Hashing for Cross- | modal | Retrieval |
Semantic Preserving Generative Adversarial Network for Cross- | modal | Hashing |
Semantic-Driven Interpretable Deep Multi- | modal | Hashing for Large-Scale Multimedia Retrieval |
Semantic-embedding Guided Graph Network for cross- | modal | retrieval |
Semantically Multi- | modal | Image Synthesis |
Semantically Supervised Maximal Correlation For Cross- | modal | Retrieval |
Semantically-Aware Aerial Reconstruction from Multi- | modal | Data |
Semantics Disentangling for Cross- | modal | Retrieval |
Semantics-aware Multi- | modal | Domain Translation: From LiDAR Point Clouds to Panoramic Color Images |
Semantics-Aware Spatial-Temporal Binaries for Cross- | modal | Video Retrieval |
Semi-supervised cross- | modal | common representation learning with vector-valued manifold regularization |
Semi-supervised cross- | modal | hashing via modality-specific and cross-modal graph convolutional networks |
Semi-supervised cross- | modal | image generation with generative adversarial networks |
Semi-supervised cross- | modal | representation learning with GAN-based Asymmetric Transfer Network |
Semi-Supervised Cross- | modal | Retrieval With Label Prediction |
Semi-Supervised Graph Convolutional Hashing Network For Large-Scale Cross- | modal | Retrieval |
Semi-Supervised Grounding Alignment for Multi- | modal | Feature Learning |
Semi-Supervised Knowledge Distillation for Cross- | modal | Hashing |
Semi-supervised Learning of Geospatial Objects through Multi- | modal | Data Integration |
Semi-supervised Speech-driven 3D Facial Animation via Cross- | modal | Encoding |
Semi-Supervised Urban Change Detection Using Multi- | modal | Sentinel-1 SAR and Sentinel-2 MSI Data |
Sentiment analysis of Chinese micro-blog based on multi- | modal | correlation model |
Sentiment-Aware Multi- | modal | Recommendation on Tourist Attractions |
Sequential Learning for Cross- | modal | Retrieval |
Sequential Parameter Estimation of | modal | Dispersion Curves with an Adaptive Particle Filter in Shallow Water: Experimental Results |
Shape gradient for multi- | modal | image segmentation using mutual information |
Shared Cross- | modal | Trajectory Prediction for Autonomous Driving |
Shared-Specific Feature Learning With Bottleneck Fusion Transformer for Multi- | modal | Whole Slide Image Analysis |
Shear-resize factorizations for fast multi- | modal | volume registration |
Show and Tell in the Loop: Cross- | modal | Circular Correlation Learning |
SiamCLIM: Text-Based Pedestrian Search Via Multi- | modal | Siamese Contrastive Learning |
Siamese Transformer for Saliency Prediction Based on Multi-Prior Enhancement and Cross- | modal | Attention Collaboration |
SIC DB: multi- | modal | database for person authentication |
Sign Spotting via Multi- | modal | Fusion and Testing Time Transferring |
Similarity Gaussian Process Latent Variable Model for Multi- | modal | Data Analysis |
Simple to complex cross- | modal | learning to rank |
Simplified Framework for Zero-shot Cross- | modal | Sketch Data Retrieval, A |
Simplified | modal | Analysis and Search for Reliable Shape Retrieval |
Simultaneous Depth and Spectral Imaging With a Cross- | modal | Stereo System |
Simultaneous multi- | modal | registration of multiple images based on multi-dimensional joint phase moment distributions |
Single Frame Semantic Segmentation Using Multi- | modal | Spherical Images |
Single Image 3D Shape Retrieval via Cross- | modal | Instance and Category Contrastive Learning |
Single- | modal | Incremental Terrain Clustering from Self-Supervised Audio-Visual Feature Learning |
Single- | modal | Video Analysis of Personality Traits using Low-Level Visual Features |
Skeleton Aware Multi- | modal | Sign Language Recognition |
SMAN: Stacked Multi | modal | Attention Network for Cross-Modal Image-Text Retrieval |
SMIN: Semi-Supervised Multi- | modal | Interaction Network for Conversational Emotion Recognition |
Social Aware Multi- | modal | Pedestrian Crossing Behavior Prediction |
Social E-commerce Tax Evasion Detection Using Multi- | modal | Deep Neural Networks |
Social Image-Text Sentiment Classification With Cross- | modal | Consistency and Knowledge Distillation |
Social multi- | modal | event analysis via knowledge-based weighted topic model |
Soil Moisture Retrieval Using GNSS-IR Based on Empirical | modal | Decomposition and Cross-Correlation Satellite Selection |
Solving Mixed- | modal | Jigsaw Puzzle for Fine-Grained Sketch-Based Image Retrieval |
Sound Source Localization is All about Cross- | modal | Alignment |
Space-Time Crop & Attend: Improving Cross- | modal | Video Representation Learning |
Sparse Distortionless | modal | Beamforming for Spherical Microphone Arrays |
Sparse Multi- | modal | Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images |
Sparse Multi- | modal | Hashing |
Sparse Relational Topical Coding on multi- | modal | data |
Sparse Tikhonov-Regularized Hashing for Multi- | modal | Learning |
Sparse-induced similarity measure: mono- | modal | image registration via sparse-induced similarity measure |
Sparse-to-dense Feature Matching: Intra and Inter domain Cross- | modal | Learning in Domain Adaptation for 3D Semantic Segmentation |
SparseFusion: Fusing Multi- | modal | Sparse Representations for Multi-Sensor 3D Object Detection |
spatial feature enhanced MMI algorithm for multi- | modal | wild-fire image registration, A |
Spatial Information Extraction Method Based on Multi- | modal | Social Media Data: A Case Study on Urban Inundation, A |
Spatial-Temporal Graphs for Cross- | modal | Text2Video Retrieval |
Spatially-Constrained Fisher Representation for Brain Disease Identification With Incomplete Multi- | modal | Neuroimages |
Spatio-channel Attention Blocks for Cross- | modal | Crowd Counting |
Speaker Tracking Using Multi- | modal | Fusion Framework |
Special Issue on Multi-Camera and Multi- | modal | Sensor Fusion |
Speech-Visual Emotion Recognition via | modal | Decomposition Learning |
Speech2Action: Cross- | modal | Supervision for Action Recognition |
Splenomegaly Segmentation on Multi- | modal | MRI Using Deep Convolutional Networks |
StacMR: Scene-Text Aware Cross- | modal | Retrieval |
Statistical Modeling of Multi- | modal | Medical Image Fusion Method Using C-CHMM and M-PCNN |
Statistics of Second Order Multi- | modal | Feature Events and Their Exploitation in Biological and Artificial Visual Systems |
StEP: Style-based Encoder Pre-training for Multi- | modal | Image Synthesis |
Stimulus Verification is a Universal and Effective Sampler in Multi- | modal | Human Trajectory Prediction |
Structural Representations for Multi- | modal | Image Registration Based on Modified Entropy |
Structurally noise resistant classifier for multi- | modal | person verification |
Structure Aware Multi-Graph Network for Multi- | modal | Emotion Recognition in Conversations |
Structure-Aware Cross- | modal | Transformer for Depth Completion |
Structured and Sparse Canonical Correlation Analysis as a Brain-Wide Multi- | modal | Data Fusion Approach |
Study of a Cross- | modal | Interactive Search Tool Using CLIP and Temporal Fusion, A |
Subject-dependent classification for robust idle state detection using multi- | modal | neuroimaging and data-fusion techniques in BCI |
SUMMIT: Source-Free Adaptation of Uni- | modal | Models to Multi-Modal Targets |
SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi- | modal | Targets |
Supervised Contrastive Learning for Robust and Efficient Multi- | modal | Emotion and Sentiment Analysis |
Supervised discrete cross- | modal | hashing based on kernel discriminant analysis |
Supervised Matrix Factorization Hashing for Cross- | modal | Retrieval |
Supervised multi- | modal | dictionary learning for clothing representation |
support vector approach for cross- | modal | search of images and texts, A |
SUPRA: Superpixel Guided Loss for Improved Multi- | modal | Segmentation in Endoscopy |
survey of approaches and challenges in 3D and multi- | modal | 3D-2D face recognition, A |
SynDrone: Multi- | modal | UAV Dataset for Urban Scenarios |
Synergism of Multi- | modal | Data for Mapping Tree Species Distribution: A Case Study from a Mountainous Forest in Southwest China |
Synthesizing Photorealistic Virtual Humans Through Cross- | modal | Disentanglement |
SYRER: Synergistic Relational Reasoning for RGB-D Cross- | modal | Re-Identification |
Systems issues in distributed multi- | modal | surveillance |
TAMM: A Task-Adaptive Multi- | modal | Fusion Network for Facial-Related Health Assessments on 3D Facial Images |
TaoHighlight: Commodity-Aware Multi- | modal | Video Highlight Detection in E-Commerce |
Targeted Adversarial Attack Against Deep Cross- | modal | Hashing Retrieval |
Task-Oriented Multi- | modal | Mutual Learning for Vision-Language Models |
Task-Oriented Multi- | modal | Question Answering For Collaborative Applications |
Teacher-Student Learning: Efficient Hierarchical Message Aggregation Hashing for Cross- | modal | Retrieval |
Temporal and Cross- | modal | Attention for Audio-Visual Zero-Shot Learning |
Temporal multi- | modal | mean |
Temporally Language Grounding With Multi- | modal | Multi-Prompt Tuning |
Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi- | modal | Video Similarity Evaluation |
Tensor Factorization for Leveraging Cross- | modal | Knowledge in Data-Constrained Infrared Object Detection |
Tensor-Train-Based Incremental High Order Dominant Z-Eigen Decomposition for Multi- | modal | Intelligent Transportation Prediction |
Text-Based Person Search via Cross- | modal | Alignment Learning |
Text-Guided Face Recognition using Multi-Granularity Cross- | modal | Contrastive Learning |
Text-Guided Object Detector for Multi- | modal | Video Question Answering |
Text2Pos: Text-to-Point-Cloud Cross- | modal | Localization |
Texture Recognition through | modal | Analysis of Spectral Peak Patterns |
Things That See: Context-Aware Multi- | modal | Interaction |
THInImg: Cross- | modal | Steganography for Presenting Talking Heads in Images |
Three-Stream Cross- | modal | Feature Aggregation Network for Light Field Salient Object Detection |
Three-stream RGB-D salient object detection network based on cross-level and cross- | modal | dual-attention fusion |
Time series classification using local distance-based features in multi- | modal | fusion networks |
Time-Domain Multi- | modal | Bone/Air Conducted Speech Enhancement |
Time-Lag Aware Multi- | modal | Variational Autoencoder Using Baseball Videos and Tweets for Prediction of Important Scenes |
TL;DW? Summarizing Instructional Videos with Task Relevance and Cross- | modal | Saliency |
TMM-Nets: Transferred Multi- to Mono- | modal | Generation for Lupus Retinopathy Diagnosis |
TMMF: Temporal Multi- | modal | Fusion for Single-Stage Continuous Gesture Recognition |
TMNet: Triple- | modal | interaction encoder and multi-scale fusion decoder network for V-D-T salient object detection |
Top-1 Corsmal Challenge 2020 Submission: Filling Mass Estimation Using Multi- | modal | Observations of Human-robot Handovers |
TopFusion: Using Topological Feature Space for Fusion and Imputation in Multi- | modal | Data |
Topic regression multi- | modal | Latent Dirichlet Allocation for image annotation |
Toward General Cross- | modal | Signal Reconstruction for Robotic Teleoperation |
Toward Multi- | modal | Conditioned Fashion Image Translation |
Towards a Gesture-Sound Cross- | modal | Analysis |
Towards a Theoretical Framework for Learning Multi- | modal | Patterns for Embodied Agents |
Towards All-in-One Pre-Training via Maximizing Multi- | modal | Mutual Information |
Towards Bridged Vision and Language: Learning Cross- | modal | Knowledge Representation for Relation Extraction |
Towards Continual Egocentric Activity Recognition: A Multi- | modal | Egocentric Activity Dataset for Continual Learning |
Towards Cross- | modal | Comparison of Human Motion Data |
Towards Explainable Interactive Multi- | modal | Video Retrieval with Vitrivr |
Towards Flexible Multi- | modal | Document Models |
Towards Good Practices for Multi- | modal | Fusion in Large-Scale Video Classification |
Towards Video Captioning with Naming: A Novel Dataset and a Multi- | modal | Approach |
Track: a Multi- | modal | Deep Architecture for Head Motion Prediction in 360° Videos |
Tracking Humans using Multi- | modal | Fusion |
Traffic emissions management using capacity formulation and multi- | modal | road space allocation |
Traffic Sign Recognition via Multi- | modal | Tree-Structure Embedded Multi-Task Learning |
Transformer Decoders with Multi- | modal | Regularization for Cross-Modal Food Retrieval |
Transformer Decoders with Multi-Modal Regularization for Cross- | modal | Food Retrieval |
Transformer Encoder With Multi- | modal | Multi-Head Attention for Continuous Affect Recognition |
Transformer vision-language tracking via proxy token guided cross- | modal | fusion |
Transformer-based Cross- | modal | Recipe Embeddings with Large Batch Training |
TransFusion: Multi- | modal | Fusion Network for Semantic Segmentation |
Translation, Scale and Rotation: Cross- | modal | Alignment Meets RGB-Infrared Vehicle Detection |
Trash to Treasure: Harvesting OOD Data with Cross- | modal | Matching for Open-Set Semi-Supervised Learning |
Tri-Attention fusion guided multi- | modal | segmentation network, A |
Tri- | modal | Person Re-identification with RGB, Depth and Thermal Features |
Tripartite Graph Models for Multi | modal | Image Retrieval |
Triplet-Based Deep Hashing Network for Cross- | modal | Retrieval |
TV Program Segmentation using Multi- | modal | Information Fusion |
Two Headed Dragons: Multi | modal | Fusion and Cross Modal Transactions |
Two-stage multi- | modal | MR images fusion method based on Parametric Logarithmic Image Processing (PLIP) Model |
Two-Stage Supervised Discrete Hashing for Cross- | modal | Retrieval |
Two-Step Registration on Multi- | modal | Retinal Images via Deep Neural Networks |
UC2: Universal Cross-lingual Cross- | modal | Vision-and-Language Pre-training |
UCTNet: Uncertainty-Aware Cross- | modal | Transformer Network for Indoor RGB-D Semantic Segmentation |
UMT: Unified Multi- | modal | Transformers for Joint Video Moment Retrieval and Highlight Detection |
Uncertainty-Aware Multi- | modal | Learning via Cross-Modal Random Network Prediction |
Uncertainty-Aware Multi-Modal Learning via Cross- | modal | Random Network Prediction |
Uncertainty-Guided Cross- | modal | Learning for Robust Multispectral Pedestrian Detection |
Understanding and Constructing Latent Modality Structures in Multi- | modal | Representation Learning |
Understanding Dark Scenes by Contrasting Multi- | modal | Observations |
Unified Adversarial Patch for Cross- | modal | Attacks in the Physical World |
Unified Adversarial Patch for Visible-Infrared Cross- | modal | Attacks in the Physical World |
Unified Information Fusion Network for Multi- | modal | RGB-D and RGB-T Salient Object Detection |
Unified Multi- | modal | Structure for Retrieving Tracked Vehicles through Natural Language Descriptions, A |
Unified Statistical and Information Theoretic Framework for Multi- | modal | Image Registration, A |
UniSeg: A Unified Multi- | modal | LiDAR Segmentation Network and the OpenPCSeg Codebase |
Unite and Conquer: Plug and Play Multi- | modal | Synthesis Using Diffusion Models |
UniTR: A Unified and Efficient Multi- | modal | Transformer for Bird's-Eye-View Representation |
UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi- | modal | Saliency Detection |
Universal Weighting Metric Learning for Cross- | modal | Matching |
Universal Weighting Metric Learning for Cross- | modal | Retrieval |
Unobtrusive multi- | modal | biometric recognition using activity-related signatures |
Unpaired Multi- | modal | Segmentation via Knowledge Distillation |
Unsupervised Contrastive Cross- | modal | Hashing |
Unsupervised Cross- | modal | Alignment for Multi-person 3d Pose Estimation |
Unsupervised Cross- | modal | Deep-Model Adaptation for Audio-Visual Re-identification with Wearable Cameras |
Unsupervised Cross- | modal | Hashing Method Robust to Noisy Training Image-Text Correspondences in Remote Sensing, An |
Unsupervised Cross- | modal | Hashing With Modality-Interaction |
Unsupervised Cross- | modal | Synthesis of Subject-Specific Scans |
Unsupervised Multi- | modal | Image Registration via Geometry Preserving Image-to-Image Translation |
Unsupervised Multi- | modal | Neural Machine Translation |
Unsupervised person clustering in videos with cross- | modal | communication |
Unsupervised Segmentation of Multi- | modal | Images by a Precise Approximation of Individual Modes with Linear Combinations of Discrete Gaussians |
Urban Observatory: A Multi- | modal | Imaging Platform for the Study of Dynamics in Complex Urban Systems, The |
Urban Road Network Partitioning Based on Bi- | modal | Traffic Flows With Multiobjective Optimization |
Using multi- | modal | 3D contours and their relations for vision and robotics |
Using Multi- | modal | Machine Learning for User Behavior Prediction in Simulated Smart Home for Extended Reality |
Utilizing Deep Learning Towards Multi- | modal | Bio-Sensing and Vision-Based Affective Computing |
V2VFormer++: Multi- | modal | Vehicle-to-Vehicle Cooperative Perception via Global-Local Transformer |
Variational Approach to Multi- | modal | Image Matching, A |
VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi- | modal | GRaphs |
Vegetation Land Segmentation with Multi- | modal | and Multi-Temporal Remote Sensing Images: A Temporal Learning Approach and a New Dataset |
Versatile Multi- | modal | Pre-Training for Human-Centric Perception |
Versatile Multi- | modal | System for Surface Profile Measurements Using a Wrist-Mounted Laser Device |
Video Dialog via Multi-Grained Convolutional Self-Attention Context Multi- | modal | Networks |
Video driven fire spread forecasting (f) using multi- | modal | LWIR and visual flame and smoke data |
Video Memorability Prediction Via Late Fusion of Deep Multi- | modal | Features |
Video Moment Localization via Deep Cross- | modal | Hashing |
Video Moment Retrieval With Cross- | modal | Neural Architecture Search |
Video Pivoting Unsupervised Multi- | modal | Machine Translation |
Video retrieval with multi- | modal | features |
Video Sampled Frame Category Aggregation and Consistent Representation for Cross- | modal | Retrieval |
Video search by multi- | modal | and clustering analysis |
Video-Based Cross- | modal | Auxiliary Network for Multimodal Sentiment Analysis |
Video-Music Retrieval with Fine-Grained Cross- | modal | Alignment |
Video-Rate Dual- | modal | Wide-Beam Harmonic Ultrasound and Photoacoustic Computed Tomography |
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross- | modal | Representation Learning |
VideoXum: Cross- | modal | Visual and Textural Summarization of Videos |
VIPL-HR: A Multi- | modal | Database for Pulse Estimation from Less-Constrained Face Video |
Vision-Based Dynamic Response Extraction and | modal | Identification of Simple Structures Subject to Ambient Excitation |
Vision-Based Multi- | modal | Framework for Action Recognition |
Vision-Dialog Navigation by Exploring Cross- | modal | Memory |
ViSTA: Vision and Scene Text Aggregation for Cross- | modal | Retrieval |
Visual Mapping and Multi- | modal | Localisation for Anywhere AR Authoring |
Visual Prompt Multi- | modal | Tracking |
Visual question answering with attention transfer and a cross- | modal | gating mechanism |
Visual Question Generation Under Multi-granularity Cross- | modal | Interaction |
Visual-Textual Cross- | modal | Interaction Network for Radiology Report Generation |
Visual-to-EEG cross- | modal | knowledge distillation for continuous emotion recognition |
VisualVoice: Audio-Visual Speech Separation with Cross- | modal | Consistency |
VLCDoC: Vision-Language contrastive pre-training model for cross- | modal | document classification |
VoP: Text-Video Co-Operative Prompt Tuning for Cross- | modal | Retrieval |
Wasserstein Coupled Graph Learning for Cross- | modal | Retrieval |
Watch, Listen and Tell: Multi- | modal | Weakly Supervised Dense Event Captioning |
Wavelet energy map: A robust support for multi- | modal | registration of medical images |
Weakly Aligned Cross- | modal | Learning for Multispectral Pedestrian Detection |
Weakly Supervised Video Emotion Detection and Prediction via Cross- | modal | Temporal Erasing Network |
Weakly-paired deep dictionary learning for cross- | modal | retrieval |
Weakly-Supervised Deep Image Hashing based on Cross- | modal | Transformer |
Weakly-Supervised Vessel Detection in Ultra-Widefield Fundus Photography via Iterative Multi- | modal | Registration and Learning |
Weakly-Supervised Video Object Grounding via Learning Uni- | modal | Associations |
Weighted Graph-Matching Using | modal | Clusters |
Weighted Graph-Structured Semantics Constraint Network for Cross- | modal | Retrieval |
weightless regression system for predicting multi- | modal | empathy, A |
Well-posedness of eight problems of multi- | modal | statistical image-matching |
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi- | modal | Language Models |
What Makes Training Multi- | modal | Classification Networks Hard? |
Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross- | modal | Denoising Networks |
Workshop on Multi-camera and Multi- | modal | Sensor Fusion Algorithms and Applications |
X-Align: Cross- | modal | Cross-View Alignment for Bird's-Eye-View Segmentation |
X-GACMN: An X-Shaped Generative Adversarial Cross- | modal | Network with Hypersphere Embedding |
X-ModalNet: A semi-supervised deep cross- | modal | network for classification of remote sensing data |
X-Pool: Cross- | modal | Language-Video Attention for Text-Video Retrieval |
X-Trans2Cap: Cross- | modal | Knowledge Transfer using Transformer for 3D Dense Captioning |
xMUDA: Cross- | modal | Unsupervised Domain Adaptation for 3D Semantic Segmentation |
XVO: Generalized Visual Odometry via Cross- | modal | Self-Training |
YOLOMM: You Only Look Once for Multi- | modal | Multi-tasking |
Zero-Shot and Few-Shot Video Question Answering with Multi- | modal | Prompts |
Zero-Shot Event Detection Using Multi- | modal | Fusion of Weakly Supervised Concepts |
1702 for modal
_ | modality | _ |
3-D Multi- | modality | Image Framework for Left Ventricle Motion Analysis, A |
3D Auto-Context-Based Locality Adaptive Multi- | modality | GANs for PET Synthesis |
3D Multi- | modality | Medical Image Registration Using Feature Space Clustering |
3D Reconstruction of Blood Vessels by Multi- | modality | Data Fusion Using Fuzzy and Markovian Modelling |
ABMDRNet: Adaptive-weighted Bi-directional | modality | Difference Reduction Network for RGB-T Semantic Segmentation |
Acoustic | modality | Based Hybrid Deep 1D CNN-BiLSTM Algorithm for Moving Vehicle Classification |
ADAPT: Vision-Language Navigation with | modality | -Aligned Action Prompts |
Adaptive | modality | Distillation for Separable Multimodal Sentiment Analysis |
Adding Gestures to Ordinary Mouse Use: a New Input | modality | for Improved Human-Computer Interaction |
Adversarial Approach to Discriminative | modality | Distillation for Remote Sensing Image Classification, An |
Adversarial Decoupling and | modality | -Invariant Representation Learning for Visible-Infrared Person Re-Identification |
Adversarial Disentanglement Spectrum Variations and Cross- | modality | Attention Networks for NIR-VIS Face Recognition |
Alleviating | modality | Bias Training for Infrared-Visible Person Re-Identification |
Anatomy-Regularized Representation Learning for Cross- | modality | Medical Image Segmentation |
Anchor-supported multi- | modality | hashing embedding for person re-identification |
Appearance Matters, So Does Audio: Revealing the Hidden Face via Cross- | modality | Transfer |
Applicability of Precipitation Products in the Endorheic Basin of the Yellow River under Multi-Scale in Time and | modality | |
ArabSign: A Multi- | modality | Dataset and Benchmark for Continuous Arabic Sign Language Recognition |
Are Multimodal Transformers Robust to Missing | modality | ? |
ASMFS: Adaptive-Similarity-Based Multi- | modality | Feature Selection for Classification of Alzheimer's Disease |
Assemblenet++: Assembling | modality | Representations via Attention Connections |
Asymmetric Color Transfer with Consistent | modality | Learning |
Asymmetric | modality | Translation for Face Presentation Attack Detection |
Attend to the Difference: Cross- | modality | Person Re-Identification via Contrastive Correlation |
Attention-Based Autism Spectrum Disorder Screening With Privileged | modality | |
Audio-visual Sensor Fusion Framework Using Person Attributes Robust to Missing Visual | modality | for Person Recognition |
Automatic 2-D/3-D Vessel Enhancement in Multiple | modality | Images Using a Weighted Symmetry Filter |
Automatic Hippocampal Subfield Segmentation from 3T Multi- | modality | Images |
Automatic Sentence | modality | Recognition in Children's Speech, and Its Usage Potential in the Speech Therapy |
Backfilled GEI: A Cross-Capture | modality | Gait Feature for Frontal and Side-View Gait Recognition, The |
Balanced and Essential | modality | -Specific and Modality-Shared Representations for Visible-Infrared Person Re-Identification |
Balanced and Essential Modality-Specific and | modality | -Shared Representations for Visible-Infrared Person Re-Identification |
Base-Derivative Framework for Cross- | modality | RGB-Infrared Person Re-Identification, A |
Basic finger inner-knuckle print: A new hand biometric | modality | |
BEV-Guided Multi- | modality | Fusion for Driving Perception |
Beyond Fusion: | modality | Hallucination-based Multispectral Fusion for Pedestrian Detection |
Beyond | modality | Alignment: Learning Part-Level Representation for Visible-Infrared Person Re-Identification |
Bi-directional Cross- | modality | Feature Propagation with Separation-and-aggregation Gate for RGB-D Semantic Segmentation |
BiCro: Noisy Correspondence Rectification for Multi- | modality | Data via Bi-directional Cross-modal Similarity Consistency |
Bidirectional Mapping-Based Domain Adaptation for Nucleus Detection in Cross- | modality | Microscopy Images |
Bilevel Integrated Model With Data-Driven Layer Ensemble for Multi- | modality | Image Fusion, A |
Biologically Motivated Cross- | modality | Sensory Fusion System for Automatic Target Recognition |
C2ANet: Cross-Scale and Cross- | modality | Aggregation Network for Scene Depth Super-Resolution |
cascaded framework with cross- | modality | transfer learning for whole heart segmentation, A |
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi- | modality | Image Fusion |
CF Distance: A New Domain Discrepancy Metric and Application to Explicit Domain Adaptation for Cross- | modality | Cardiac Image Segmentation |
CGMDRNet: Cross-Guided | modality | Difference Reduction Network for RGB-T Salient Object Detection |
Change Detection in Heterogeneous Remote Sensing Images Based on an Imaging | modality | -Invariant MDS Representation |
CIGAR: Cross- | modality | Graph Reasoning for Domain Adaptive Object Detection |
CIR-Net: Cross- | modality | Interaction and Refinement for RGB-D Salient Object Detection |
CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer With | modality | -Correlated Cross-Attention for Brain Tumor Segmentation |
CL-GAN: Contrastive Learning-based Generative Adversarial Network for | modality | Transfer with Limited Paired Data |
Clothes-Changing Person Re-identification with RGB | modality | Only |
CM-MaskSD: Cross- | modality | Masked Self-Distillation for Referring Image Segmentation |
CM-NAS: Cross- | modality | Neural Architecture Search for Visible-Infrared Person Re-Identification |
CMA-CLIP: Cross- | modality | Attention Clip for Text-Image Classification |
CMDA: Cross- | modality | Domain Adaptation for Nighttime Semantic Segmentation |
CMOS-GAN: Semi-Supervised Generative Adversarial Model for Cross- | modality | Face Image Synthesis |
COCGV: a method for multi- | modality | 3D volume registration |
Cohesive Multi- | modality | Feature Learning and Fusion for COVID-19 Patient Severity Prediction |
Comparative evaluation of 3D vs. 2D | modality | for automatic detection of facial action units |
COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only | modality | |
Compton Scattering Tomography: Feature Reconstruction and Rotation-Free | modality | |
Concept-Driven Multi- | modality | Fusion for Video Search |
Conditional Convolutional Neural Network for | modality | -Aware Face Recognition |
Context-guided ground truth sampling for multi- | modality | data augmentation in autonomous driving |
Context-Sensitive Single- | modality | Image Emotion Analysis: A Unified Architecture from Dataset Construction to CNN Classification |
ContextDesc: Local Descriptor Augmentation With Cross- | modality | Context |
Continuous Global Evidence-Based Bayesian | modality | Fusion for Simultaneous Tracking of Multiple Objects |
Contrastive Image Synthesis and Self-supervised Feature Adaptation for Cross- | modality | Biomedical Image Segmentation |
Controllable Text-to-Image Synthesis for Multi- | modality | MR Images |
Cooperative Light-Field Image Super-Resolution Based on Multi- | modality | Embedding and Fusion With Frequency Attention |
Counterfactual attention alignment for visible-infrared cross- | modality | person re-identification |
Cross | modality | Knowledge Distillation for Multi-modal Aerial View Object Classification |
Cross | modality | label fusion in multi-atlas segmentation |
Cross-Guided Feature Fusion with Intra- | modality | Reweighting for Multi-Spectral Pedestrian Detection |
Cross-modal alignment and translation for missing | modality | action recognition |
Cross- | modality | 3D Object Detection |
Cross- | modality | Anatomical Landmark Detection Using Histograms of Unsigned Gradient Orientations and Atlas Location Autocontext |
Cross- | modality | Attention and Multimodal Fusion Transformer for Pedestrian Detection |
Cross- | modality | attentive feature fusion for object detection in multispectral remote sensing imagery |
Cross- | modality | Augmentation of Brain MR Images Using a Novel Pairwise Generative Adversarial Network for Enhanced Glioma Classification |
Cross- | modality | Automatic Face Model Training from Large Video Databases |
Cross- | modality | Binary Code Learning via Fusion Similarity Hashing |
Cross- | modality | Bridging and Knowledge Transferring for Image Understanding |
Cross- | modality | Compensation Convolutional Neural Networks for RGB-D Action Recognition |
Cross- | modality | deep feature learning for brain tumor segmentation |
Cross- | modality | Double Bidirectional Interaction and Fusion Network for RGB-T Salient Object Detection |
Cross- | modality | Face Recognition via Heterogeneous Joint Bayesian |
Cross- | modality | Feature Fusion Network for Few-Shot 3D Point Cloud Classification |
Cross- | modality | Fusion and Progressive Integration Network for Saliency Prediction on Stereoscopic 3D Images |
Cross- | modality | hashing with partial correspondence |
Cross- | modality | Hierarchical Clustering and Refinement for Unsupervised Visible-Infrared Person Re-Identification |
Cross- | modality | Image Registration Using a Training-Time Privileged Third Modality |
Cross-Modality Image Registration Using a Training-Time Privileged Third | modality | |
Cross- | modality | Image Synthesis via Weakly Coupled and Geometry Co-Regularized Joint Dictionary Learning |
Cross- | modality | Knowledge Calibration Network for Video Corpus Moment Retrieval |
Cross- | modality | Knowledge Distillation Network for Monocular 3D Object Detection |
Cross- | modality | Learning Approach for Vessel Segmentation in Retinal Images, A |
Cross- | modality | Matching for Evaluating User Experience of Emerging Mobile EEG Technology |
Cross- | modality | Microblog Sentiment Prediction via Bi-Layer Multimodal Hypergraph Learning |
Cross- | modality | motion parameterization for fine-grained video prediction |
Cross- | modality | Neuroimage Synthesis: A Survey |
Cross- | modality | person re-identification using hybrid mutual learning |
Cross- | modality | Person Re-Identification via Modality Confusion and Center Aggregation |
Cross-Modality Person Re-Identification via | modality | Confusion and Center Aggregation |
Cross- | modality | Person Re-Identification via Modality-Aware Collaborative Ensemble Learning |
Cross-Modality Person Re-Identification via | modality | -Aware Collaborative Ensemble Learning |
Cross- | modality | person re-identification via multi-task learning |
Cross- | modality | Person Re-Identification With Shared-Specific Feature Transfer |
Cross- | modality | Personalization for Retrieval |
Cross- | modality | pose-invariant facial expression |
Cross- | modality | Proposal-Guided Feature Mining for Unregistered RGB-Thermal Pedestrian Detection |
Cross- | modality | Pyramid Alignment for Visual Intention Understanding |
Cross- | modality | Transformer for Visible-Infrared Person Re-Identification |
Cross- | modality | Transformer With Modality Mining for Visible-Infrared Person Re-Identification |
Cross-Modality Transformer With | modality | Mining for Visible-Infrared Person Re-Identification |
Cycle-Consistent Generative Rendering for 2D-3D | modality | Translation |
Data fusion through cross- | modality | metric learning using similarity-sensitive hashing |
Data gap decomposed by auxiliary | modality | for NIR-VIS heterogeneous face recognition |
DDFM: Denoising Diffusion Model for Multi- | modality | Image Fusion |
Deep Active Learning from Multispectral Data Through Cross- | modality | Prediction Inconsistency |
Deep Cross- | modality | Adaptation via Semantics Preserving Adversarial Learning for Sketch-Based 3D Shape Retrieval |
Deep hard | modality | alignment for visible thermal person re-identification |
Deep Learning of Radiometrical and Geometrical SAR Distorsions for Image | modality | translations |
Deep | modality | Assistance Co-Training Network for Semi-Supervised Multi-Label Semantic Decoding |
Deep | modality | Invariant Adversarial Network for Shared Representation Learning |
Deep Multi- | modality | Adversarial Networks for Unsupervised Domain Adaptation |
Deep multisensor learning for missing- | modality | all-weather mapping |
Deep Symmetric Adaptation Network for Cross- | modality | Medical Image Segmentation |
Dense | modality | Interaction Network for Audio-Visual Event Localization |
DGFNet: Depth-Guided Cross- | modality | Fusion Network for RGB-D Salient Object Detection |
Discover Cross- | modality | Nuances for Visible-Infrared Person Re-Identification |
Discriminative Cross- | modality | Attention Network for Temporal Inconsistent Audio-Visual Event Localization |
Discriminative multi- | modality | non-negative sparse graph model for action recognition |
Discriminative Multi- | modality | Speech Recognition |
Disease-Image-Specific Learning for Diagnosis-Oriented Neuroimage Synthesis With Incomplete Multi- | modality | Data |
Disentangle First, Then Distill: A Unified Framework for Missing | modality | Imputation and Alzheimer's Disease Diagnosis |
Distance Based Training for Cross- | modality | Person Re-Identification |
Diverse Embedding Expansion Network and Low-Light Cross- | modality | Benchmark for Visible-Infrared Person Re-identification |
DLFace: Deep local descriptor for cross- | modality | face recognition |
Domain Adaptation Meets Zero-Shot Learning: An Annotation-Efficient Approach to Multi- | modality | Medical Image Segmentation |
Domain-Agnostic Learning With Anatomy-Consistent Embedding for Cross- | modality | Liver Segmentation |
Double cross- | modality | progressively guided network for RGB-D salient object detection |
Drone-Based RGB-Infrared Cross- | modality | Vehicle Detection Via Uncertainty-Aware Learning |
Dual | modality | Collaborative Learning for Cross-Source Remote Sensing Retrieval |
Dual | modality | Prompt Tuning for Vision-Language Pre-Trained Model |
Dual Mutual Learning for Cross- | modality | Person Re-Identification |
Dual Polarization | modality | Fusion Network for Assisting Pathological Diagnosis |
DUAL-GLOW: Conditional Flow-Based Generative Model for | modality | Transfer |
Dual- | modality | 3D brain PET-CT image segmentation based on probabilistic brain atlas and classification fusion |
Dual- | modality | imager based on ultrasonic modulation of incoherent light in turbid medium |
Dual- | modality | spatiotemporal feature learning for spontaneous facial expression recognition in e-learning using hybrid deep neural network |
Dual- | modality | Talking-metrics: 3D Visual-Audio Integrated Behaviometric Cues from Speakers |
Dual- | modality | Vehicle Anomaly Detection via Bilateral Trajectory Tracing |
Dynamic Center Aggregation Loss With Mixed | modality | for Visible-Infrared Person Re-Identification |
Dynamic Fusion With Intra- and Inter- | modality | Attention Flow for Visual Question Answering |
Dynamic graph fusion label propagation for semi-supervised multi- | modality | classification |
Ea-GANs: Edge-Aware Generative Adversarial Networks for Cross- | modality | MR Image Synthesis |
Effective adaptation of multimedia documents with | modality | conversion |
Efficient Cross- | modality | Graph Reasoning for RGB-Infrared Person Re-Identification |
Efficient Deep Visual and Inertial Odometry with Adaptive Visual | modality | Selection |
efficient framework for visible-infrared cross | modality | person re-identification, An |
Efficient RGB-T Tracking via Cross- | modality | Distillation |
Electroencephalographic Phase-Amplitude Coupling in Simulated Driving With Varying | modality | -Specific Attentional Demand |
Enabling | modality | interactions for RGB-T salient object detection |
End-to-end cross- | modality | retrieval with CCA projections and pairwise ranking loss |
Enhanced fingerprint verification through novel matching | modality | |
Enhanced Invariant Feature Joint Learning via | modality | -Invariant Neighbor Relations for Cross-Modality Person Re-Identification |
Enhancing | modality | -Agnostic Representations via Meta-learning for Brain Tumor Segmentation |
Ensemble classification with modified SIFT descriptor for medical image | modality | |
Ensemble of deep convolutional neural networks based multi- | modality | images for Alzheimer's disease diagnosis |
Evaluation of a Low Cost EMG Sensor as a | modality | for Use in Virtual Reality Applications |
Explainable Medical Imaging Framework for | modality | Classifications Trained Using Small Datasets, An |
Exploration of Feature Detector Performance in the Thermal-Infrared | modality | , An |
Exploring Cross- | modality | Affective Reactions for Audiovisual Emotion Recognition |
Exploring | modality | -shared appearance features and modality-invariant relation features for cross-modality person Re-IDentification |
Exploring Task Structure for Brain Tumor Segmentation From Multi- | modality | MR Images |
Extended | modality | Propagation: Image Synthesis of Pathological Cases |
External Commonsense Knowledge as a | modality | for Social Intelligence Question-Answering |
Facial action unit detection: 3D versus 2D | modality | |
fast piecewise deformable method for multi- | modality | image registration, A |
Feature fusion and latent feature learning guided brain tumor segmentation and missing | modality | recovery network |
Feature-Supervised Action | modality | Transfer |
Fidelity-driven Optimization Reconstruction and Details Preserving Guided Fusion for Multi- | modality | Medical Image |
Financial time series forecasting with multi- | modality | graph neural network |
FMCNet: Feature-Level | modality | Compensation for Visible-Infrared Person Re-Identification |
From Within to Between: Knowledge Distillation for Cross | modality | Retrieval |
FULLER: Unified Multi- | modality | Multi-task 3D Perception via Multi-level Gradient Calibration |
Fully automatic 3D feature-based registration of multi- | modality | medical images |
Fusion of Intra- and Inter- | modality | Algorithms for Face-Sketch Recognition |
Fuzzy classification for multi- | modality | image fusion |
Generalizable Cross- | modality | Medical Image Segmentation via Style Augmentation and Dual Normalization |
Geometric Deep Learning Using Vascular Surface Meshes for | modality | -Independent Unruptured Intracranial Aneurysm Detection |
Gestures in Human-Computer Interaction: Just Another | modality | ? |
Graph-Based Progressive Fusion Network for Multi- | modality | Vehicle Re-Identification |
Graph-based semi-supervised learning with multi- | modality | propagation for large-scale image datasets |
Gray Augmentation Exploration with All- | modality | Center-Triplet Loss for Visible-Infrared Person Re-Identification |
HalluciDet: Hallucinating RGB | modality | for Person Detection Through Privileged Information |
HDNet: Multi- | modality | Hierarchy-Aware Decision Network for RGB-D Salient Object Detection |
Hepatic Lesion Segmentation by Combining Plain and Contrast-Enhanced CT Images with | modality | Weighted U-Net |
Heterogeneous Face Recognition by Margin-Based Cross- | modality | Metric Learning |
Hi-CMD: Hierarchical Cross- | modality | Disentanglement for Visible-Infrared Person Re-Identification |
Hierarchical Forgery Classifier on Multi- | modality | Face Forgery Clues |
Hierarchical Framework for Motion Trajectory Forecasting Based on | modality | Sampling, A |
Hierarchical hyperlingual-words for multi- | modality | face classification |
HPILN: a feature learning framework for cross- | modality | person re-identification |
HRTransNet: HRFormer-Driven Two- | modality | Salient Object Detection |
Human action recognition using hull convexity defect features with multi- | modality | setups |
Human forehead recognition: a novel biometric | modality | based on near-infrared laser backscattering feature image using deep transfer learning |
Human Lips as Emerging Biometrics | modality | |
Illumination-aware window transformer for RGBT | modality | fusion |
Image | modality | classification: a late fusion method based on confidence indicator and closeness matrix |
Improved DS acoustic-seismic | modality | fusion for ground-moving target classification in wireless sensor networks |
Improving Image Description with Auxiliary | modality | for Visual Localization in Challenging Conditions |
Improving Multispectral Pedestrian Detection by Addressing | modality | Imbalance Problems |
Improving RGB-D Salient Object Detection via | modality | -Aware Decoder |
Improving RGB-Infrared Object Detection by Reducing Cross- | modality | Redundancy |
Improving RGB-Infrared Pedestrian Detection by Reducing Cross- | modality | Redundancy |
Infrared-Visible Person Re-Identification Via Cross- | modality | Batch Normalized Identity Embedding And Mutual Learning |
Inshore Ship Detection Based on Multi- | modality | Saliency for Synthetic Aperture Radar Images |
Integration graph attention network and multi-centre constrained loss for cross- | modality | person re-identification |
Inter-Intra Cross- | modality | Self-Supervised Video Representation Learning by Contrastive Clustering |
Inter- | modality | Face Recognition |
Inter- | modality | Fusion Based Attention for Zero-Shot Cross-Modal Retrieval |
Inter- | modality | registration of NMRi and histological section images using neural networks regression in Gabor feature space |
Interactive Image Segmentation with Cross- | modality | Vision Transformers |
Interpretable Fusion Siamese Network for Multi- | modality | Remote Sensing Ship Image Retrieval, An |
Intrinsic Structured Graph Alignment Module With | modality | -Invariant Representations for NIR-VIS Face Recognition, An |
investigation into automated age estimation using sclera images: a novel | modality | , An |
Iterative registration for multi- | modality | retinal fundus photographs using directional vessel skeleton |
Joint Cross-Attention Network With Deep | modality | Prior for Fast MRI Reconstruction |
Joint Feature Selection and Graph Regularization for | modality | -Dependent Cross-Modal Retrieval |
Joint Sequence Learning and Cross- | modality | Convolution for 3D Biomedical Segmentation |
Joint spatiograms for multi- | modality | tracking with online update |
Joint Visual-Textual Sentiment Analysis Based on Cross- | modality | Attention Mechanism |
Large image | modality | labeling initiative using semi-supervised and optimized clustering |
Large-Scale Cross- | modality | Search via Collective Matrix Factorization Hashing |
Late fusion of deep learning and handcrafted visual features for biomedical image | modality | classification |
Latent Representation Learning for Alzheimer's Disease Diagnosis With Incomplete Multi- | modality | Neuroimaging and Genetic Data |
Learnable Irrelevant | modality | Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos |
Learnable manifold alignment (LeMA): A semi-supervised cross- | modality | learning framework for land cover and land use classification |
Learning based Multi- | modality | Image and Video Compression |
Learning Cross- | modality | Representations From Multi-Modal Images |
Learning cross- | modality | similarity for multinomial data |
Learning Discriminative Cross- | modality | Features for RGB-D Saliency Detection |
Learning Dual-fused | modality | -aware Representations for RGBD Tracking |
Learning Memory-Augmented Unidirectional Metrics for Cross- | modality | Person Re-identification |
Learning | modality | Interaction for Temporal Sentence Localization and Event Captioning in Videos |
Learning | modality | -invariant binary descriptor for crossing palmprint to palm-vein recognition |
Learning | modality | -invariant features for heterogeneous face recognition |
Learning | modality | -Specific Representations for Visible-Infrared Person Re-Identification |
Learning Visual Representation from | modality | -Shared Contrastive Language-Image Pre-training |
Learning with Privileged Information via Adversarial Discriminative | modality | Distillation |
Learning with Side Information through | modality | Hallucination |
LKDA-GAN: Cross- | modality | image synthesis via Generative Adversarial Network aggregating large kernel decomposable attention bottleneck block |
LLM: Learning Cross- | modality | Person Re-Identification via Low-Rank Local Matching |
Local cross- | modality | image alignment using unsupervised learning |
Locally Confined | modality | Fusion Network With a Global Perspective for Multimodal Human Affective Computing |
Low-cost Multispectral Scene Analysis with | modality | Distillation |
LSTM-MA: A LSTM Method with Multi- | modality | and Adjacency Constraint for Brain Image Segmentation |
M2FINet: | modality | -specific and Modality-shared Features Interaction Network for RGB-IR Person Re-Identification |
M3L: Multi- | modality | mining for metric learning in person re-Identification |
Machine Learned Texture Prior From Full-Dose CT Database via Multi- | modality | Feature Selection for Bayesian Reconstruction of Low-Dose CT |
Maintaining multi- | modality | through mixture tracking |
MARS: Learning | modality | -Agnostic Representation for Scalable Cross-Media Retrieval |
MCANet: Multiscale Cross- | modality | Attention Network for Multispectral Pedestrian Detection |
MCMT-GAN: Multi-Task Coherent | modality | Transferable GAN for 3D Brain Image Synthesis |
Medical image | modality | classification using discrete Bayesian networks |
MemBridge: Video-Language Pre-Training With Memory-Augmented Inter- | modality | Bridge |
MeshTalk: 3D Face Animation from Speech using Cross- | modality | Disentanglement |
METGAN: Generative Tumour Inpainting and | modality | Synthesis in Light Sheet Microscopy |
MFGNet: Dynamic | modality | -Aware Filter Generation for RGB-T Tracking |
MFHI: Taking | modality | -Free Human Identification as Zero-Shot Learning |
Mind the Gap: Learning | modality | -Agnostic Representations With a Cross-Modality UNet |
Missing | modality | Robustness in Semi-Supervised Multi-Modal Semantic Segmentation |
Missing | modality | Transfer Learning via Latent Low-Rank Constraint |
MixSpeech: Cross- | modality | Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition |
MMANet: Margin-Aware Distillation and | modality | -Aware Regularization for Incomplete Multimodal Learning |
| modality | adversarial neural network for visible-thermal person re-identification |
| modality | and Component Aware Feature Fusion for RGB-D Scene Classification |
| modality | and Event Adversarial Networks for Multi-Modal Fake News Detection |
| modality | Classification and Concept Detection in Medical Images Using Deep Transfer Learning |
| modality | Combination Techniques for Continuous Sign Language Recognition |
| modality | Compensation Network: Cross-Modal Adaptation for Action Recognition |
| modality | conversion for QoS management in universal multimedia access |
| modality | Converting Approach for Image Annotation to Overcome the Inconsistent Labels in Training Data, A |
| modality | Disentangled Discriminator for Text-to-Image Synthesis |
| modality | Distillation with Multiple Stream Networks for Action Recognition |
| modality | Exploration, Retrieval and Adaptation for Trajectory Prediction |
| modality | fusion for object tracking with training system and method |
| modality | Independent Elastography (MIE): A New Approach to Elasticity Imaging |
| modality | Meets Long-Term Tracker: A Siamese Dual Fusion Framework for Tracking UAV |
| modality | Mixer for Multi-modal Action Recognition |
| modality | Mixture Projections for Semantic Video Event Detection |
| modality | Shifting Attention Network for Multi-Modal Video Question Answering |
| modality | Synergy Complement Learning with Cascaded Aggregation for Visible-Infrared Person Re-Identification |
| modality | Unifying Network for Visible-Infrared Person Re-Identification |
| modality | -Agnostic Debiasing for Single Domain Generalization |
| modality | -Agnostic Learning for Radar-Lidar Fusion in Vehicle Detection |
| modality | -Aware OOD Suppression Using Feature Discrepancy for Multi-Modal Emotion Recognition |
| modality | -Aware Representation Learning for Zero-shot Sketch-based Image Retrieval |
| modality | -correlation-aware sparse representation for RGB-infrared object tracking |
| modality | -Free Feature Detector and Descriptor for Multimodal Remote Sensing Image Registration |
| modality | -Guided Subnetwork for Salient Object Detection |
| modality | -Independent Regression and Training for Improving Multispectral Pedestrian Detection |
| modality | -Induced Transfer-Fusion Network for RGB-D and RGB-T Salient Object Detection |
| modality | -Invariant Image Classification Based on Modality Uniqueness and Dictionary Learning |
| modality | -invariant Visual Odometry for Embodied Vision |
| modality | -Oriented Graph Learning Toward Outfit Compatibility Modeling |
| modality | -specific and shared generative adversarial network for cross-modal retrieval |
| modality | -Specific Cross-Modal Similarity Measurement With Recurrent Attention Network |
| modality | -Specific Information Disentanglement From Multi-Parametric MRI for Breast Tumor Segmentation and Computer-Aided Diagnosis |
| modality | -wise relational reasoning for one-shot sensor-based activity recognition |
Modeling Both Intra- and Inter- | modality | Uncertainty for Multimodal Fake News Detection |
Modselect: Automatic | modality | Selection for Synthetic-to-real Domain Generalization |
Motion Based X-Ray Imaging | modality | |
MSMFN: An Ultrasound Based Multi-Step | modality | Fusion Network for Identifying the Histologic Subtypes of Metastatic Cervical Lymphadenopathy |
Multi-interactive Feature Learning and a Full-time Multi- | modality | Benchmark for Image Fusion and Segmentation |
Multi-Label classification of multi- | modality | skin lesion via hyper-connected convolutional neural network |
Multi-Modal Learning with Missing | modality | via Shared-Specific Feature Modelling |
multi-modal multi-view dataset for human fall analysis and preliminary investigation on | modality | , A |
Multi-Modal Retinal Image Classification with | modality | -Specific Attention Network |
Multi- | modality | American Sign Language recognition |
Multi- | modality | and Multi-Scale Attention Fusion Network for Land Cover Classification from VHR Remote Sensing Images |
Multi- | modality | Associative Bridging through Memory: Speech Sound Recollected from Face Video |
Multi- | modality | Atherosclerosis Imaging and Diagnosis |
Multi- | modality | Cross Attention Network for Image and Sentence Matching |
Multi- | modality | Deep Restoration of Extremely Compressed Face Videos |
Multi- | modality | Diversity Fusion Network with Swintransformer for RGB-D Salient Object Detection |
Multi- | modality | Empowered Network for Facial Action Unit Detection |
Multi- | modality | Feature Transform: An Interactive Image Segmentation Approach |
Multi- | modality | Fusion and Gated Multi-Filter U-Net for Water Area Segmentation in Remote Sensing, A |
Multi- | modality | fusion learning for the automatic diagnosis of optic neuropathy |
Multi- | modality | Gesture Detection and Recognition with Un-supervision, Randomization and Discrimination |
Multi- | modality | Image Fusion Using the Nonsubsampled Contourlet Transform |
Multi- | modality | Image Registration Using Mutual Information Based on Gradient Vector Flow |
Multi- | modality | imagery database for plant phenotyping |
Multi- | modality | Imaging Enables Detailed Hemodynamic Simulations in Dissecting Aneurysms in Mice |
Multi- | modality | Latent Interaction Network for Visual Question Answering |
Multi- | modality | medical image fusion based on separable dictionary learning and Gabor filtering |
Multi- | modality | model-based registration in the cardiac domain |
Multi- | modality | movie scene detection using Kernel Canonical Correlation Analysis |
Multi- | modality | Multi-Task Recurrent Neural Network for Online Action Detection |
Multi- | modality | Network with Visual and Geometrical Information for Micro Emotion Recognition |
Multi- | modality | Self-Distillation for Weakly Supervised Temporal Action Localization |
Multi- | modality | Sensing and Data Fusion for Multi-Vehicle Detection |
Multi- | modality | Stereo with Varying Spatial, Temporal, and Spectral Resolution |
Multi- | modality | Transfer Based on Multi-Graph Optimization for Domain Adaptive Video Concept Annotation |
Multi- | modality | Vertebra Recognition in Arbitrary Views Using 3D Deformable Hierarchical Model |
Multi- | modality | -based Arabic sign language recognition |
Multi-Scale Transformer Network With Edge-Aware Pre-Training for Cross- | modality | MR Image Synthesis |
Multi-Visual- | modality | Human Activity Understanding |
Multicenter and Multichannel Pooling GCN for Early AD Diagnosis Based on Dual- | modality | Fused Brain Network |
Multimodal Affective Computing With Dense Fusion Transformer for Inter- and Intra- | modality | Interactions |
Multimodal biomedical image retrieval using hierarchical classification and | modality | fusion |
Multimodal Boosting: Addressing Noisy Modalities and Identifying | modality | Contribution |
Multimodal Emotion Recognition with | modality | -Pairwise Unsupervised Contrastive Loss |
Multimodal MR Synthesis via | modality | -Invariant Latent Representation |
Multimodal Reconstruct and Align Net for Missing | modality | Problem in Sentiment Analysis |
Multiple kernel learning based | modality | classification for medical images |
Mutual-Assistance Learning for Standalone Mono- | modality | Survival Analysis of Human Cancers |
New | modality | : Emoji Challenges in Prediction, Anticipation, and Retrieval |
novel and efficient deep learning approach for COVID-19 detection using X-ray imaging | modality | , A |
Online Multi-Face Tracking With Multi- | modality | Cascaded Matching |
Orthogonal | modality | Disentanglement and Representation Alignment Network for NIR-VIS Face Recognition |
Partial Unbalanced Feature Transport for Cross- | modality | Cardiac Image Segmentation |
POLO: Learning Explicit Cross- | modality | Fusion for Temporal Action Localization |
Pose aligned | modality | -invariant feature learning for NIR-VIS heterogeneous face recognition |
Preserving | modality | Structure Improves Multi-Modal Learning |
Prior-Aware Cross | modality | Augmentation Learning for Continuous Sign Language Recognition |
Privileged | modality | Distillation for Vessel Border Detection in Intracoronary Imaging |
Privileged | modality | Learning via Multimodal Hallucination |
Progressive | modality | Cooperation for Multi-Modality Domain Adaptation |
Progressive | modality | Reinforcement for Human Multimodal Emotion Recognition from Unaligned Multimodal Sequences |
Progressive | modality | -complement aggregative multitransformer for domain multi-modal neural machine translation |
PSIGAN: Joint Probabilistic Segmentation and Image Distribution Matching for Unpaired Cross- | modality | Adaptation-Based MRI Segmentation |
Psumnet: Unified | modality | Part Streams Are All You Need for Efficient Pose-based Action Recognition |
Quasi-Conformal Hybrid Multi- | modality | Image Registration and its Application to Medical Image Fusion |
Real-Time Cross- | modality | Correlation Filtering Method for Referring Expression Comprehension, A |
Real-time patch-based medical image | modality | propagation by GPU computing |
ReCoNet: Recurrent Correction Network for Fast and Efficient Multi- | modality | Image Fusion |
Recovery of audio-to-video synchronization through analysis of cross- | modality | correlation |
Registration and Enhancement of Double-Sided Degraded Manuscripts Acquired in Multispectral | modality | |
Relation-Aware Shared Representation Learning for Cancer Prognosis Analysis with Auxiliary Clinical Variables and Incomplete Multi- | modality | Data |
RELIEF-based | modality | weighting approach for multimodal information retrieval, A |
Representation Learning for Cross- | modality | Classification |
Rethinking Multimodal Content Moderation from an Asymmetric Angle with Mixed- | modality | |
Rethinking Semantic Image Compression: Scalable Representation with Cross- | modality | Transfer |
Rethinking Shared Features and Re-ranking for Cross- | modality | Person Re-identification |
Revisiting | modality | Imbalance In Multimodal Pedestrian Detection |
Revisiting | modality | -Specific Feature Compensation for Visible-Infrared Person Re-Identification |
RGB-D Salient Object Detection with Cross- | modality | Modulation and Selection |
RGB-Infrared Cross- | modality | Person Re-identification |
RGB-Infrared Cross- | modality | Person Re-Identification via Joint Pixel and Feature Alignment |
RGB-Infrared Person Re-identification via Image | modality | Conversion |
RGB-Infrared Person Re-Identification Via Multi- | modality | Relation Aggregation and Graph Convolution Network |
RGB-IR cross- | modality | person ReID based on teacher-student GAN model |
RGB-IR Person Re-identification by Cross- | modality | Similarity Preservation |
RGB-T tracking by | modality | difference reduction and feature re-selection |
Robust AMD Stage Grading with Exclusively OCTA | modality | Leveraging 3D Volume |
Robust LiDAR-Camera Alignment With | modality | Adapted Local-to-Global Representation |
Robust Multi- | modality | Multi-Object Tracking |
Sample-Adaptive GANs: Linking Global and Local Mappings for Cross- | modality | MR Image Synthesis |
SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross- | modality | Medical Image Segmentation |
See Finer, See More: Implicit | modality | Alignment for Text-based Person Retrieval |
Segmentation of multi- | modality | MR images by means of evidence theory for 3d reconstruction of brain tumors |
Self-Attentive Spatial Adaptive Normalization for Cross- | modality | Domain Adaptation |
Self-supervised Disentanglement of | modality | -Specific and Shared Factors Improves Multimodal Generative Models |
Self-supervised Feature Learning by Cross- | modality | and Cross-view Correspondences |
Semantic Event Fusion of Different Visual | modality | Concepts for Activity Recognition |
Semi-supervised cross-modal hashing via | modality | -specific and cross-modal graph convolutional networks |
Semi-Supervised Cross- | modality | Action Recognition by Latent Tensor Transfer Learning |
Sequential Discrete Hashing for Scalable Cross- | modality | Similarity Retrieval |
Shape-Based Statistical Inversion Method for EIT/URT Dual- | modality | Imaging, A |
Shared-boundary fusion for estimation of noisy multi- | modality | atherosclerotic plaque imagery |
Simple and effective visual question answering in a single | modality | |
Simple and Robust Framework for Cross- | modality | Medical Image Segmentation applied to Vision Transformers, A |
Simple Multi- | modality | Transfer Learning Baseline for Sign Language Translation, A |
Simultaneous Super-Resolution and Cross- | modality | Synthesis of 3D Medical Images Using Weakly-Supervised Joint Convolutional Sparse Coding |
Simultaneous Ultrasound and MRI System for Breast Biopsy: Compatibility Assessment and Demonstration in a Dual | modality | Phantom |
Single Pair Cross- | modality | Super Resolution |
Sparse low-rank fusion based deep features for missing | modality | face recognition |
Structure-Adaptive Feature Extraction and Representation for Multi- | modality | Lung Images Retrieval |
Structure-Aware Framework of Unsupervised Cross- | modality | Domain Adaptation via Frequency and Spatial Knowledge Distillation, A |
Structure-Driven Unsupervised Domain Adaptation for Cross- | modality | Cardiac Segmentation |
Syncretic | modality | Collaborative Learning for Visible Infrared Person Re-Identification |
SynSeg-Net: Synthetic Segmentation Without Target | modality | Ground Truth |
Target-aware Dual Adversarial Learning and a Multi-scenario Multi- | modality | Benchmark to Fuse Infrared and Visible for Object Detection |
TCGM: An Information-theoretic Framework for Semi-supervised Multi- | modality | Learning |
test-bed for computer-assisted fusion of multi- | modality | medical images, A |
Three Steps to Multimodal Trajectory Prediction: | modality | Clustering, Classification and Synthesis |
Top-Push Constrained | modality | -Adaptive Dictionary Learning for Cross-Modality Person Re-Identification |
Towards | modality | -Agnostic Person Re-identification with Descriptive Query |
Towards Real-Time Multi- | modality | 3-D Medical Image Registration |
Transformer-based Model for Preoperative Early Recurrence Prediction of Hepatocellular Carcinoma with Muti- | modality | MRI, A |
Translation, Association and Augmentation: Learning Cross- | modality | Re-Identification From Single-Modality Annotation |
Tri-Level | modality | -Information Disentanglement for Visible-Infrared Person Re-Identification |
Triple-attention interaction network for breast tumor classification based on multi- | modality | images |
Triplet interactive attention network for cross- | modality | person re-identification |
TS-NET: Combining | modality | Specific and Common Features for Multimodal Patch Matching |
Two-stage | modality | -graphs regularized manifold ranking for RGB-T tracking |
Two-Stream Video Classification with Cross- | modality | Attention |
Unbiased Multi- | modality | Guidance for Image Inpainting |
Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross- | modality | Regression Forest |
Understanding and Constructing Latent | modality | Structures in Multi-Modal Representation Learning |
UniDistill: A Universal Cross- | modality | Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View |
Unpaired Cross- | modality | Educed Distillation (CMEDL) for Medical Image Segmentation |
Unsupervised Bidirectional Cross- | modality | Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation |
Unsupervised Cross-Modal Hashing With | modality | -Interaction |
Unsupervised Cross- | modality | Adaptation via Dual Structural-Oriented Guidance for 3D Medical Image Segmentation |
Unsupervised Deep Cross- | modality | Spectral Hashing |
Unsupervised Fusion of Misaligned PAT and MRI Images via Mutually Reinforcing Cross- | modality | Image Generation and Registration |
Unsupervised | modality | -Transferable Video Highlight Detection With Representation Activation Sequence Learning |
Using Affect as a Communication | modality | to Improve Human-Robot Communication in Robot-Assisted Search and Rescue Scenarios |
Using | modality | Replacement to Facilitate Communication between Visually and Hearing-Impaired People |
Variational Bayes Inference Based Segmentation of Heterogeneous Lymphoma Volumes in Dual- | modality | PET-CT Images |
VEFNet: an Event-RGB Cross | modality | Fusion Network for Visual Place Recognition |
Video Event Detection via Multi- | modality | Deep Learning |
Virtual Imaging Platform for Multi- | modality | Medical Image Simulation, A |
Virtual Multi- | modality | Self-Supervised Foreground Matting for Human-Object Interaction |
Visible-Infrared Person Re-Identification via Cross- | modality | Interaction Transformer |
Visible-Infrared Person Re-Identification With | modality | -Specific Memory Network |
Visual Question Answering With Dense Inter- and Intra- | modality | Interactions |
Watch to Listen Clearly: Visual Speech Enhancement Driven Multi- | modality | Speech Recognition |
What | modality | Matters? Exploiting Highly Relevant Features for Video Advertisement Insertion |
What Would You Expect? Anticipating Egocentric Actions With Rolling-Unrolling LSTMs and | modality | Attention |
XMP-Font: Self-Supervised Cross- | modality | Pre-training for Few-Shot Font Generation |
478 for modality