Biography
I am an Associate Professor in the School of Computer Science at Fudan University, and a member of the Fudan Vision and Learning Laboratory. I received my Ph.D. in Computer Science from the University of Maryland with Prof. Larry Davis in 2020. My research interests are in computer vision and deep learning. My current research focuses particularly on large-scale video understanding, video generation, and efficient architectures.
I'm currently looking for students with strong coding skills who are excited to design algorithms for visual understanding. If you are interested in working with me, please send me an email.
Publications
- SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation.
European Conference on Computer Vision (ECCV), Milano, Italy, Sept., 2024.
- PromptFusion: Decoupling Stability and Plasticity for Continual Learning.
European Conference on Computer Vision (ECCV), Milano, Italy, Sept., 2024.
- MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing.
European Conference on Computer Vision (ECCV), Milano, Italy, Sept., 2024.
- OmniViD: A Generative Framework for Universal Video Understanding.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June, 2024.
- MotionEditor: Editing Video Motion via Content-Aware Diffusion.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June, 2024. code
- SimDA: Simple Diffusion Adapter for Efficient Video Generation.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June, 2024.
- Learning to Rank Patches for Unbiased Image Redundancy Reduction.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June, 2024.
- Synthesize Diagnose and Optimize: Towards Fine-Grained Vision-Language Understanding.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June, 2024. code
- BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June, 2024.
- Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation.
Advances in Neural Information Processing Systems (NeurIPS), New Orleans, USA, Dec., 2023.
- Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection.
Advances in Neural Information Processing Systems (NeurIPS), New Orleans, USA, Dec., 2023. code
- Implicit Temporal Modeling with Learnable Alignment for Video Recognition.
International Conference on Computer Vision (ICCV), Paris, France, Oct., 2023 (Oral) code
- Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization.
International Conference on Machine Learning (ICML), Hawaii, USA, July, 2023
- ResFormer: Scaling ViTs with Multi-Resolution Training.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June, 2023
- SVFormer: Semi-Supervised Video Transformer for Action Recognition.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June, 2023
- Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June, 2023
- Look Before You Match: Instance Understanding Matters in Video Object Segmentation.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June, 2023
- Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June, 2023
- Prototypical Residual Networks for Anomaly Detection and Localization.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June, 2023
- Enhancing the Self-Universality for Transferable Targeted Attacks.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June, 2023
- Vision Transformers are Good Mask Auto-Labelers.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June, 2023
- Towards Scalable Neural Representation for Diverse Videos.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June, 2023
- Resolving Task Confusion in Dynamic Expansion Architectures for Class Incremental Learning.
The AAAI Conference on Artificial Intelligence (AAAI), Washington DC, USA, Feb., 2023
- OmniVL: One Foundation Model for Image-Language and Video-Language Tasks.
Advances in Neural Information Processing Systems (NeurIPS), New Orleans, USA, Dec., 2022.
- Semi-Supervised Vision Transformers.
European Conference on Computer Vision (ECCV), Tel Aviv, October, 2022. code
- Efficient Video Transformers with Spatial-Temporal Token Selection.
European Conference on Computer Vision (ECCV), Tel Aviv, October, 2022. code
- Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors.
European Conference on Computer Vision (ECCV), Tel Aviv, October, 2022. code
- BEVT: BERT Pretraining of Video Transformers.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, June, 2022 code
- Cross-Modal Transferable Adversarial Attacks from Images to Videos.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, June, 2022
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, June, 2022
- ObjectFormer for Image Manipulation Detection and Localization.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, June, 2022
- FLAG: Adversarial Data Augmentation for Graph Neural Networks.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, June, 2022
- Boosting the Transferability of Video Adversarial Examples via Temporal Translation.
The AAAI Conference on Artificial Intelligence (AAAI), Virtual, Feb., 2022
- Attacking Video Recognition Models with Bullet-Screen Comments.
The AAAI Conference on Artificial Intelligence (AAAI), Virtual, Feb., 2022
- Towards Transferable Adversarial Attacks on Vision Transformers.
The AAAI Conference on Artificial Intelligence (AAAI), Virtual, Feb., 2022
- Rethinking Pseudo Labels for Semi-Supervised Object Detection.
The AAAI Conference on Artificial Intelligence (AAAI), Virtual, Feb., 2022
- Encoding Robustness to Image Style via Adversarial Feature Perturbations.
Advances in Neural Information Processing Systems (NeurIPS), Virtual, Dec., 2021.
- Deep Video Inpainting Detection.
British Machine Vision Conference (BMVC), Virtual, Oct., 2021
- GTA: Global Temporal Attention for Video Action Understanding.
British Machine Vision Conference (BMVC), Virtual, Oct., 2021
- VideoLT: Large-scale Long-tailed Video Recognition.
International Conference on Computer Vision (ICCV), Virtual, Oct., 2021
- Exploring Visual Engagement Signals for Representation Learning.
International Conference on Computer Vision (ICCV), Virtual, Oct., 2021
- 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June, 2021
- Intentonomy: a Dataset and Study towards Human Intent Understanding.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June, 2021 (Oral) code
- Efficient Object Embedding for Manipulated Image Retrieval.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June, 2021
- Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors.
European Conference on Computer Vision (ECCV), Virtual, August, 2020. code
- Learning from Noisy Anchors for One-stage Object Detection.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June, 2020
- LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition.
Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada, Dec., 2019. code
- FiNet: Compatible and Diverse Fashion Image Inpainting.
International Conference on Computer Vision (ICCV), Seoul, Korea, Oct., 2019. (Oral)
- ACE: Adapting to Changing Environments for Semantic Segmentation.
International Conference on Computer Vision (ICCV), Seoul, Korea, Oct., 2019
- AdaFrame: Adaptive Frame Selection for Fast Video Recognition.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, June, 2019
- The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, June, 2019
- Visual Content Recognition by Exploiting Semantic Feature Map with Attention and Multi-task Learning.
ACM Trans. Multimedia Comput. Commun. Appl. (ACM TOMM), vol. 15, issue 1, pp. 6:1-6:22, 2019.
- Self-Monitoring Navigation Agent via Auxiliary Progress Estimation.
International Conference on Learning Representations (ICLR), New Orleans, USA, May, 2019
- DCAN: Dual Channel-wise Alignment Networks for Unsupervised Scene Adaptation.
European Conference on Computer Vision (ECCV), Munich, Germany, September, 2018. code
- BlockDrop: Dynamic Inference Paths in Residual Networks.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June, 2018. (Spotlight) code
- VITON: An Image-based Virtual Try-on Network.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June, 2018. (Spotlight) code
- Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 40, Issue 2, pp. 352-364, 2018.
Fudan-Columbia Video Dataset (FCVID), one of the largest public Web video datasets with manual annotations.
- Deep Learning for Video Classification and Video Captioning.
In Frontiers of Multimedia Research, Shih-Fu Chang (Ed.), ACM Morgan & Claypool, New York, NY, USA, pp. 3-29, 2018
Surveying 100+ recent papers on video classification and captioning with deep learning.
- Weakly-Supervised Spatial Context Networks.
arXiv preprint arXiv:1704.02998
- Automatic Spatially-aware Fashion Concept Discovery.
International Conference on Computer Vision (ICCV), Venice, Italy, Oct., 2017
- Learning Fashion Compatibility with Bidirectional LSTMs.
ACM Multimedia (ACM MM), Mountain View, USA, Oct., 2017
- Learning Semantic Feature Map for Visual Content Recognition.
ACM Multimedia (ACM MM), Mountain View, USA, Oct., 2017
- Harnessing Object and Scene Semantics for Large-Scale Video Understanding.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, June, 2016. (Spotlight)
Featured in Tech2, ACM Technews
- Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification.
ACM Multimedia (ACM MM), Amsterdam, the Netherlands, Oct., 2016. (Oral Paper)
- Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification.
ACM Multimedia (ACM MM), Brisbane, Australia, Oct., 2015. (Oral Paper)
Obtains 91.3% accuracy on the UCF-101 dataset.
- Evaluating Two-Stream CNN for Video Classification.
ACM International Conference on Multimedia Retrieval (ICMR), Shanghai, China, June, 2015 motion CNN model
- Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification.
ACM Multimedia (ACM MM), Orlando, USA, Nov., 2014. (Oral Paper)
Professional Service
- Area Chair:
- Advances in Neural Information Processing Systems 2023-2024
- IEEE Conference on Computer Vision and Pattern Recognition 2023-2024
- Senior Program Committee:
- AAAI Conference on Artificial Intelligence 2023-2024
- International Joint Conference on Artificial Intelligence 2023