Multimodal
2024
[Paper Notes] LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
[Paper Notes] Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
[Paper Notes] Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions
[Paper Notes] Number it: Temporal Grounding Videos like Flipping Manga
[Paper Notes] Improved Baselines with Visual Instruction Tuning
[Paper Notes] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
[Paper Notes] LLaVA-o1: Let Vision Language Models Reason Step-by-Step
[Paper Notes] BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
[Paper Notes] Improving Gloss-free Sign Language Translation by Reducing Representation Density
[Paper Notes] LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models