Large Models
2024
[Paper Notes] VCoder: Versatile Vision Encoders for Multimodal Large Language Models
[Paper Notes] Dense Connector for MLLMs
[Paper Notes] Attention Prompting on Image for Large Vision-Language Models
[Paper Notes] xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
[Paper Notes] xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
[Paper Notes] X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
[Paper Notes] MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
[Paper Notes] Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation
[Paper Notes] Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation
[Paper Notes] VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs