Tag: Multimodal | 小嗷犬

Multimodal

2024

[Paper Notes] LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Paper Notes, Multimodal

2024-12-08

[Paper Notes] Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation

Large Models, Paper Notes, Sign Language Translation, Multimodal

2024-12-01

[Paper Notes] Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions

Large Models, Paper Notes, Multimodal

2024-11-30

[Paper Notes] Number it: Temporal Grounding Videos like Flipping Manga

Large Models, Paper Notes, Multimodal

2024-11-24

[Paper Notes] Improved Baselines with Visual Instruction Tuning

Large Models, Paper Notes, Multimodal

2024-11-24

[Paper Notes] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Large Models, Paper Notes, Multimodal

2024-11-24

[Paper Notes] LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Large Models, Paper Notes, Multimodal

2024-11-23

[Paper Notes] BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Large Models, Paper Notes, Multimodal

2024-11-23

[Paper Notes] Improving Gloss-free Sign Language Translation by Reducing Representation Density

Large Models, Paper Notes, Sign Language Translation, Multimodal

2024-11-18

[Paper Notes] LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Large Models, Paper Notes, Multimodal

2024-11-17