Computer Vision Daily: A Quick Look at Papers with Open-Source Code
2026/5/7 17:14:53

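Highly recommended: for more cutting-edge papers and algorithm-optimization ideas aimed at top conferences or patent filings, see https://qcno08je5sgu.feishu.cn/. Topics covered include object detection, object tracking, image segmentation, video segmentation, Visual Grounding, visible-infrared fusion, multi-task learning, multimodal foundation models, text-to-image generation, autonomous driving, BEV, occupancy prediction, embodied intelligence, VLA, depth estimation, action recognition, facial expression recognition, 3D reconstruction, point cloud 3D detection, medical image segmentation, medical image object detection, medical large models, defect detection, anomaly detection, remote sensing image segmentation, remote sensing image change detection, digital humans, knowledge distillation, video understanding, 3D generation, pose estimation, image enhancement, crowd/object counting, video editing, image deraining, and many more.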

1. [Image Fusion] UniFusion: A Unified Image Fusion Framework with Robust Representation and Source-Aware Preservation

  • Paper: https://arxiv.org//pdf/2603.14214
  • Code: https://github.com/dusongcheng/UniFusion

2. [Multimodal Large Language Models] UAVBench and UAVIT-1M: Benchmarking and Enhancing MLLMs for Low-Altitude UAV Vision-Language Understanding

  • Paper: https://arxiv.org//pdf/2603.14336
  • Code: https://github.com/ZhanYang-nwpu/UAVBench-and-UAVIT-1M

3. [Multimodal Large Language Models] Deeper Thought, Weaker Aim: Understanding and Mitigating Perceptual Impairment during Reasoning in Multimodal Large Language Models

  • Paper: https://arxiv.org//pdf/2603.14184
  • Code (coming soon): https://github.com/Ivine11/VRGA

4. [Medical Large Models] (ICLR 2026) How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images

  • Paper: https://arxiv.org//pdf/2603.14323
  • Project page: How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
  • Code: https://github.com/Guimeng-Leo-Liu/Medical-MLLMs-Fail

5. [Person Re-Identification] (CVPR 2026) BIT: Matching-based Bi-directional Interaction Transformation Network for Visible-Infrared Person Re-Identification

  • Paper: https://arxiv.org//pdf/2603.14243
  • Code (coming soon): https://github.com/Xuan266/BIT

6. [Digital Humans] AvatarForcing: One-Step Streaming Talking Avatars via Local-Future Sliding-Window Denoising

  • Paper: https://arxiv.org//pdf/2603.14331
  • Project page: https://cuiliyuan121.github.io/AvatarForcing/
  • Code: https://github.com/KlingAIResearch/AvatarForcing/tree/main

7. [Vision-Language Navigation] AerialVLA: A Vision-Language-Action Model for UAV Navigation via Minimalist End-to-End Control

  • Paper: https://arxiv.org//pdf/2603.14363
  • Code: https://github.com/XuPeng23/AerialVLA

8. [Vision-Language Navigation] (ICLR 2026) All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation

  • Paper: https://arxiv.org//pdf/2603.14276
  • Project page: All-Day Multi-Scenes Lifelong Vision-And-Language Navigation With Tucker-Adaption
  • Code: https://github.com/Ganvin-Li/AlldayWalker

9. [Text-to-Image] Fair Benchmarking of Emerging One-Step Generative Models Against Multistep Diffusion and Flow Models

  • Paper: https://arxiv.org//pdf/2603.14186
  • Code: https://github.com/Harvard-AI-and-Robotics-Lab/FairBenchmarkingFlow

10. [Text-to-Video] Early Failure Detection and Intervention in Video Diffusion Models

  • Paper: https://arxiv.org//pdf/2603.14320
  • Code (coming soon): https://github.com/kaist-ami/Early-failure-video-diffusion

11. [Text-to-Video] Seeking Physics in Diffusion Noise

  • Paper: https://arxiv.org//pdf/2603.14294
  • Project page: Seeking Physics in Diffusion Noise
  • Code: coming soon

12. [Image Generation] Representation Alignment for Just Image Transformers is not Easier than You Think

  • Paper: https://arxiv.org//pdf/2603.14366
  • Code: https://github.com/kaist-cvml/PixelREPA

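The group brings together experts from many areas, including object detection, image segmentation, object tracking, Transformers, multimodal learning, NeRF, GANs, defect detection, salient object detection, keypoint detection, super-resolution reconstruction, SLAM, face analysis, OCR, biomedical imaging, 3D reconstruction, pose estimation, autonomous driving perception, depth estimation, video understanding, action recognition, image dehazing, image deraining, image restoration, image retrieval, lane detection, point cloud object detection, point cloud segmentation, image compression, motion prediction, neural network quantization, and model deployment. Members share technical knowledge, interview tips, and referral and hiring information from time to time.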
