2024 Conference Publication: Wei, Tianqi, Chen, Zhi and Yu, Xin (2024). Snap and diagnose: an advanced multimodal retrieval system for identifying plant diseases in the wild. MMASIA '24, Auckland, New Zealand, 3-6 December 2024. New York, United States: ACM. doi: 10.1145/3696409.3700293
2024 Journal Article: Xu, Qingzheng, Chen, Huiqiang, Du, Heming, Zhang, Hu, Łukasik, Szymon, Zhu, Tianqing and Yu, Xin (2024). M3A: A multimodal misinformation dataset for media authenticity analysis. Computer Vision and Image Understanding, 249 104205. doi: 10.1016/j.cviu.2024.104205
2024 Book Chapter: Zhang, Hu, Xu, Jianhua, Tang, Tao, Sun, Haiyang, Yu, Xin, Huang, Zi and Yu, Kaicheng (2024). OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection. Lecture Notes in Computer Science (pp. 1-19). Cham: Springer Nature Switzerland. doi: 10.1007/978-3-031-72907-2_1
2024 Conference Publication: Wei, Tianqi, Chen, Zhi, Huang, Zi and Yu, Xin (2024). Benchmarking in-the-wild multimodal disease recognition and a versatile baseline. MM '24: The 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October-1 November 2024. New York, United States: Association for Computing Machinery. doi: 10.1145/3664647.3680599
2024 Journal Article: Du, Xiaobiao, Yu, Xin, Liu, Jinhui, Dai, Beifen and Xu, Feng (2024). Ethics-aware face recognition aided by synthetic face images. Neurocomputing, 600 128129. doi: 10.1016/j.neucom.2024.128129
2024 Conference Publication: Chen, Huiqiang, Zhu, Tianqing, Yu, Xin and Zhou, Wanlei (2024). Machine Unlearning via Null Space Calibration. 33rd International Joint Conference on Artificial Intelligence (IJCAI), Jeju, South Korea, 3-9 August 2024. California: International Joint Conferences on Artificial Intelligence Organization. doi: 10.24963/ijcai.2024/40
2024 Conference Publication: Qiu, Feng, Zhang, Wei, Liu, Chen, Li, Lincheng, Du, Heming, Guo, Tianchen and Yu, Xin (2024). Language-guided multi-modal emotional mimicry intensity estimation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, United States, 17-18 June 2024. Piscataway, NJ, United States: Institute of Electrical and Electronics Engineers. doi: 10.1109/cvprw63382.2024.00477
2024 Conference Publication: Qiu, Feng, Du, Heming, Zhang, Wei, Liu, Chen, Li, Lincheng, Guo, Tianchen and Yu, Xin (2024). Learning transferable compound expressions from Masked AutoEncoder pretraining. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, United States, 17-18 June 2024. Piscataway, NJ, United States: Institute of Electrical and Electronics Engineers. doi: 10.1109/cvprw63382.2024.00476
2024 Conference Publication: Zhang, Wei, Qiu, Feng, Liu, Chen, Li, Lincheng, Du, Heming, Guo, Tianchen and Yu, Xin (2024). An effective ensemble learning framework for affective behaviour analysis. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, United States, 17-18 June 2024. Piscataway, NJ, United States: Institute of Electrical and Electronics Engineers. doi: 10.1109/cvprw63382.2024.00479
2024 Journal Article: Zhao, Yuan, Liu, Bo, Zhu, Tianqing, Ding, Ming, Yu, Xin and Zhou, Wanlei (2024). Proactive image manipulation detection via deep semi-fragile watermark. Neurocomputing, 585 127593. doi: 10.1016/j.neucom.2024.127593
2024 Conference Publication: Lim, Jia Syuen, Chen, Zhuoxiao, Baktashmotlagh, Mahsa, Chen, Zhi, Yu, Xin, Huang, Zi and Luo, Yadan (2024). DiPEx: Dispersing Prompt Expansion for class-agnostic object detection. 38th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 10-15 December 2024. New York, NY, United States: Association for Computing Machinery.
2024 Journal Article: Liu, Chen, Li, Peike, Zhang, Hu, Li, Lincheng, Huang, Zi, Wang, Dadong and Yu, Xin (2024). BAVS: Bootstrapping audio-visual segmentation by integrating foundation knowledge. IEEE Transactions on Multimedia, 26, 10015-10028. doi: 10.1109/tmm.2024.3405622
2024 Journal Article: Sheng, Hongwei, Shen, Xin, Du, Heming, Zhang, Hu, Huang, Zi and Yu, Xin (2024). AI empowered Auslan learning for parents of deaf children and children of deaf adults. AI and Ethics, 4 (4), 1-11. doi: 10.1007/s43681-024-00457-y
2024 Journal Article: Zhang, Wei, Li, Lincheng, Ding, Yu, Chen, Wei, Deng, Zhigang and Yu, Xin (2024). Detecting facial action units from global-local fine-grained expressions. IEEE Transactions on Circuits and Systems for Video Technology, 34 (2), 983-994. doi: 10.1109/tcsvt.2023.3288903
2024 Conference Publication: Yu, Qingtao, Du, Heming, Liu, Chen and Yu, Xin (2024). When 3D bounding-box meets SAM: point cloud instance segmentation with weak-and-noisy supervision. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, United States, 3-8 January 2024. Piscataway, NJ, United States: IEEE. doi: 10.1109/wacv57701.2024.00368
2024 Conference Publication: Liu, Chen, Li, Peike Patrick, Yu, Qingtao, Sheng, Hongwei, Wang, Dadong, Li, Lincheng and Yu, Xin (2024). Benchmarking audio visual segmentation for long-untrimmed videos. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, United States, 16-22 June 2024. Washington, DC, United States: IEEE Computer Society. doi: 10.1109/CVPR52733.2024.02143
2024 Conference Publication: Wu, Yunjie, Meng, Yapeng, Hu, Zhipeng, Li, Lincheng, Wu, Haoqian, Zhou, Kun, Xu, Weiwei and Yu, Xin (2024). Text-guided 3D face synthesis - from generation to editing. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, United States, 16-22 June 2024. Washington, DC, United States: IEEE Computer Society. doi: 10.1109/CVPR52733.2024.00126
2024 Conference Publication: Shen, Xin, Du, Heming, Sheng, Hongwei, Wang, Shuyun, Chen, Hui, Chen, Huiqiang, Wu, Zhuojie, Du, Xiaobiao, Ying, Jiaying, Lu, Ruihan, Xu, Qingzheng and Yu, Xin (2024). MM-WLAuslan: multi-view multi-modal word-level Australian Sign Language recognition dataset. NeurIPS 2024, Vancouver, BC, Canada, 10-15 December 2024. Maryland Heights, MO, United States: Morgan Kaufmann Publishers.
2024 Journal Article: Wang, Suzhen, Ma, Yifeng, Ding, Yu, Hu, Zhipeng, Fan, Changjie, Lv, Tangjie, Deng, Zhidong and Yu, Xin (2024). StyleTalk++: A unified framework for controlling the speaking styles of talking heads. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46 (6), 4331-4347. doi: 10.1109/tpami.2024.3357808
2024 Journal Article: Rao, Qi, Yu, Xin, Li, Guang and Zhu, Linchao (2024). CMGNet: Collaborative multi-modal graph network for video captioning. Computer Vision and Image Understanding, 238 103864, 1-10. doi: 10.1016/j.cviu.2023.103864