2025

Conference Publication

EasyCraft: a robust and efficient framework for automatic avatar crafting

Wang, Suzhen, Chen, Weijie, Zhang, Wei, Zhao, Minda, Li, Lincheng, Zhang, Rongsheng, Hu, Zhipeng and Yu, Xin (2025). EasyCraft: a robust and efficient framework for automatic avatar crafting. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN USA, 10-17 June 2025. New York, NY USA: IEEE Computer Society. doi: 10.1109/CVPR52734.2025.00524

2025

Conference Publication

Robust audio-visual segmentation via audio-guided visual convergent alignment

Liu, Chen, Li, Peike, Yang, Liying, Wang, Dadong, Li, Lincheng and Yu, Xin (2025). Robust audio-visual segmentation via audio-guided visual convergent alignment. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN USA, 10-17 June 2025. Piscataway, NJ USA: Institute of Electrical and Electronics Engineers. doi: 10.1109/cvpr52734.2025.02693

2025

Conference Publication

Blind bitstream-corrupted video recovery via metadata-guided diffusion model

Wang, Shuyun, Zhang, Hu, Shen, Xin, Wang, Dadong and Yu, Xin (2025). Blind bitstream-corrupted video recovery via metadata-guided diffusion model. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN USA, 10-17 June 2025. New York, NY USA: IEEE Computer Society. doi: 10.1109/CVPR52734.2025.02139

2025

Conference Publication

Dynamic derivation and elimination: audio visual segmentation with enhanced audio semantics

Liu, Chen, Yang, Liying, Li, Peike, Wang, Dadong, Li, Lincheng and Yu, Xin (2025). Dynamic derivation and elimination: audio visual segmentation with enhanced audio semantics. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN USA, 10-17 June 2025. Piscataway, NJ USA: Institute of Electrical and Electronics Engineers. doi: 10.1109/cvpr52734.2025.00298

2025

Conference Publication

M3GYM: a large-scale multimodal multi-view multi-person pose dataset for fitness activity understanding in real-world settings

Xu, Qingzheng, Cao, Ru, Shen, Xin, Du, Heming, Wang, Sen and Yu, Xin (2025). M3GYM: a large-scale multimodal multi-view multi-person pose dataset for fitness activity understanding in real-world settings. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, United States, 10-17 June 2025. Washington, DC, United States: IEEE Computer Society. doi: 10.1109/cvpr52734.2025.01147

2025

Conference Publication

Cross-view isolated sign language recognition challenge: design, results and future research

Shen, Xin, Du, Heming, Xu, Miao, Liu, Miaomiao and Yu, Xin (2025). Cross-view isolated sign language recognition challenge: design, results and future research. WWW '25: The ACM Web Conference 2025, Sydney, NSW Australia, 28 April-2 May 2025. New York, NY USA: Association for Computing Machinery. doi: 10.1145/3701716.3717522

2025

Conference Publication

MDAM 3: a misinformation detection and analysis framework for multitype multimodal media

Xu, Qingzheng, Du, Heming, Łukasik, Szymon, Zhu, Tianqing, Wang, Sen and Yu, Xin (2025). MDAM 3: a misinformation detection and analysis framework for multitype multimodal media. WWW '25: The ACM Web Conference 2025, Sydney, NSW Australia, 28 April-2 May 2025. New York, NY USA: Association for Computing Machinery. doi: 10.1145/3696410.3714498

2025

Conference Publication

FlashVTG: feature layering and adaptive score handling network for video temporal grounding

Cao, Zhuo, Zhang, Bingqing, Du, Heming, Yu, Xin, Li, Xue and Wang, Sen (2025). FlashVTG: feature layering and adaptive score handling network for video temporal grounding. 2025 Winter Conference on Applications of Computer Vision-WACV, Tucson, AZ, United States, 28 February-4 March 2025. Piscataway, NJ, United States: Institute of Electrical and Electronics Engineers. doi: 10.1109/wacv61041.2025.00894

2025

Conference Publication

Vision-based abnormal action dataset for recognising body motion disorders

Ying, Jiaying, Shen, Xin and Yu, Xin (2025). Vision-based abnormal action dataset for recognising body motion disorders. 37th Australasian Joint Conference on Artificial Intelligence, AI 2024, Melbourne, VIC, Australia, 25 - 29 November 2024. Singapore, Singapore: Springer Nature Singapore. doi: 10.1007/978-981-96-0351-0_33

2025

Conference Publication

Compound expression recognition via curriculum learning

Liu, Chen, Qiu, Feng, Zhang, Wei, Li, Lincheng, Wang, Dadong and Yu, Xin (2025). Compound expression recognition via curriculum learning. ECCV 2024 Workshops, Milan, Italy, 29 September - 4 October 2024. Heidelberg, Germany: Springer. doi: 10.1007/978-3-031-91581-9_20

2025

Conference Publication

Transferable attacks for semantic segmentation

He, Mengqi, Zhang, Jing and Yu, Xin (2025). Transferable attacks for semantic segmentation. 35th Australasian Database Conference, Gold Coast, QLD, Australia, 16-18 December 2024. Heidelberg, Germany: Springer. doi: 10.1007/978-981-96-1242-0_28

2025

Conference Publication

Affective behaviour analysis via progressive learning

Liu, Chen, Zhang, Wei, Qiu, Feng, Li, Lincheng, Wang, Dadong and Yu, Xin (2025). Affective behaviour analysis via progressive learning. ECCV 2024 Workshops, Milan, Italy, 29 September - 4 October 2024. Heidelberg, Germany: Springer. doi: 10.1007/978-3-031-91581-9_26

2024

Conference Publication

Machine Unlearning via Null Space Calibration

Chen, Huiqiang, Zhu, Tianqing, Yu, Xin and Zhou, Wanlei (2024). Machine Unlearning via Null Space Calibration. 33rd International Joint Conference on Artificial Intelligence (IJCAI), Jeju, South Korea, 3-9 August 2024. California: International Joint Conferences on Artificial Intelligence Organization. doi: 10.24963/ijcai.2024/40

2024

Conference Publication

DiPEx: Dispersing Prompt Expansion for class-agnostic object detection

Lim, Jia Syuen, Chen, Zhuoxiao, Baktashmotlagh, Mahsa, Chen, Zhi, Yu, Xin, Huang, Zi and Luo, Yadan (2024). DiPEx: Dispersing Prompt Expansion for class-agnostic object detection. 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada, 10-15 December 2024. San Mateo, CA, United States: Morgan Kaufmann Publishers. doi: 10.52202/079017-0781

2024

Conference Publication

An empirical analysis on spatial reasoning capabilities of large multimodal models

Shiri, Fatemeh, Guo, Xiao-Yu, Far, Mona Golestan, Yu, Xin, Haffari, Gholamreza and Li, Yuan-Fang (2024). An empirical analysis on spatial reasoning capabilities of large multimodal models. 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, United States, 12-16 November 2024. Kerrville, TX, United States: Association for Computational Linguistics (ACL). doi: 10.18653/v1/2024.emnlp-main.1195
