| Prefix | Match (vil) | Suffix |
|---|---|---|
| e- | vil | : A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks |
| FAME- | vil | : Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks |
| | vil | -100: A New Dataset and A Baseline Model for Video Instance Lane Detection |