Tag: Large Models | 小嗷犬

Large Models

2024

[Paper Notes] Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Large Models, Paper Notes, Multimodal

2024-12-08

[Paper Notes] VisionZip: Longer is Better but Not Necessary in Vision Language Models

Large Models, Paper Notes, Multimodal

2024-12-08

[Paper Notes] Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation

Large Models, Paper Notes, Sign Language Translation, Multimodal

2024-12-01

[Paper Notes] Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions

Large Models, Paper Notes, Multimodal

2024-11-30

[Paper Notes] Number it: Temporal Grounding Videos like Flipping Manga

Large Models, Paper Notes, Multimodal

2024-11-24

[Paper Notes] Improved Baselines with Visual Instruction Tuning

Large Models, Paper Notes, Multimodal

2024-11-24

[Paper Notes] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Large Models, Paper Notes, Multimodal

2024-11-24

[Paper Notes] LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Large Models, Paper Notes, Multimodal

2024-11-23

[Paper Notes] BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Large Models, Paper Notes, Multimodal

2024-11-23

[Paper Notes] Improving Gloss-free Sign Language Translation by Reducing Representation Density

Large Models, Paper Notes, Sign Language Translation, Multimodal

2024-11-18