Tag: Large Models | 小嗷犬

Large Models

2024

[Paper Notes] Dense Connector for MLLMs

Large Models, Paper Notes, Multimodal

2024-11-03

[Paper Notes] Attention Prompting on Image for Large Vision-Language Models

Large Models, Paper Notes, Multimodal

2024-11-02

[Paper Notes] xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Large Models, Paper Notes, Multimodal

2024-10-27

[Paper Notes] xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Large Models, Paper Notes, Multimodal

2024-10-24

[Paper Notes] X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

Large Models, Paper Notes, Multimodal

2024-10-20

[Paper Notes] MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

Large Models, Paper Notes, Multimodal

2024-10-17

[Paper Notes] Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation

Large Models, Paper Notes, Sign Language Translation, Multimodal

2024-10-17

[Paper Notes] Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation

Large Models, Paper Notes, Sign Language Translation, Multimodal

2024-10-11

[Paper Notes] VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Large Models, Paper Notes, Multimodal

2024-10-08

[Paper Notes] Flamingo: a Visual Language Model for Few-Shot Learning

Large Models, Paper Notes, Multimodal

2024-09-30