Associate Professor

Xin Yu

Email: xin.yu@uq.edu.au

Positions

Affiliate of ARC COE for Children and Families Over the Lifecourse: ARC COE for Children and Families Over the Lifecourse; Faculty of Humanities, Arts and Social Sciences

Affiliate of Centre for Enterprise AI: Centre for Enterprise AI; Faculty of Engineering, Architecture and Information Technology

Honorary Associate Professor: School of Electrical Engineering and Computer Science; Faculty of Engineering, Architecture and Information Technology

Background

I am Xin Yu, an Associate Professor at the University of Queensland. I am a recipient of the Australian Research Council Discovery Early Career Researcher Award (DECRA) for 2023-2025 and an awardee of the Google Research Scholar Program in 2021. I am also a Google Visiting Faculty. Previously, I was a research fellow at the Australian National University (ANU). I received my PhD from the Australian National University under the supervision of Prof. Richard Hartley, Prof. Fatih Porikli and Dr. Basura Fernando. I also received a PhD from Tsinghua University, supervised by Prof. Li Zhang. I am interested in computer vision and machine learning.

My research covers a range of computer vision and machine learning tasks, especially efficient low-level image processing, image retrieval and localization, action recognition, 3D pose estimation, visual navigation, and sign language recognition and translation.

Availability

Associate Professor Xin Yu is available for supervision.

Research impacts

One of my research papers received the Best Paper Honorable Mention award at WACV 2020, a premier computer vision conference, and another was nominated for the Best Paper Award at CVPR 2020.

I received the Outstanding Reviewer Award at ECCV 2020, CVPR 2021 and ICCV 2021. CVPR, ICCV and ECCV are world-leading computer vision and machine learning conferences. My research interests include deep learning techniques, image processing, and computer vision tasks. I am a program committee member of top-tier computer vision and machine learning conferences, such as CVPR, ICCV, ECCV, ICML, ICLR and NeurIPS, and a reviewer for prestigious journals, such as TPAMI, IJCV and TIP.

I am happy to supervise self-motivated PhD and MPhil students. If you are an undergraduate student and would like to undertake your honours project, please send me an email.

182 works between 2011 and 2026

By type: Journal Article (61), Conference Publication (119), Book Chapter (2)

2025

Conference Publication

M3GYM: a large-scale multimodal multi-view multi-person pose dataset for fitness activity understanding in real-world settings

Xu, Qingzheng, Cao, Ru, Shen, Xin, Du, Heming, Wang, Sen and Yu, Xin (2025). M3GYM: a large-scale multimodal multi-view multi-person pose dataset for fitness activity understanding in real-world settings. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, United States, 10 - 17 June 2025. Washington, DC, United States: IEEE Computer Society. doi: 10.1109/cvpr52734.2025.01147

2025

Conference Publication

Cross-view isolated sign language recognition challenge: design, results and future research

Shen, Xin, Du, Heming, Xu, Miao, Liu, Miaomiao and Yu, Xin (2025). Cross-view isolated sign language recognition challenge: design, results and future research. WWW '25: The ACM Web Conference 2025, Sydney, NSW Australia, 28 April-2 May 2025. New York, NY USA: Association for Computing Machinery. doi: 10.1145/3701716.3717522

2025

Conference Publication

MDAM 3: a misinformation detection and analysis framework for multitype multimodal media

Xu, Qingzheng, Du, Heming, Łukasik, Szymon, Zhu, Tianqing, Wang, Sen and Yu, Xin (2025). MDAM 3: a misinformation detection and analysis framework for multitype multimodal media. WWW '25: The ACM Web Conference 2025, Sydney, NSW Australia, 28 April-2 May 2025. New York, NY USA: Association for Computing Machinery. doi: 10.1145/3696410.3714498

2025

Journal Article

ICE: interactive 3D game character facial editing via dialogue

Wu, Haoqian, Zhao, Minda, Hu, Zhipeng, Fan, Changjie, Li, Lincheng, Chen, Weijie, Zhao, Rui and Yu, Xin (2025). ICE: interactive 3D game character facial editing via dialogue. IEEE Transactions on Multimedia, 27, 3210-4223. doi: 10.1109/tmm.2025.3557611

2025

Conference Publication

FlashVTG: feature layering and adaptive score handling network for video temporal grounding

Cao, Zhuo, Zhang, Bingqing, Du, Heming, Yu, Xin, Li, Xue and Wang, Sen (2025). FlashVTG: feature layering and adaptive score handling network for video temporal grounding. 2025 Winter Conference on Applications of Computer Vision-WACV, Tucson, AZ, United States, 28 February-4 March 2025. Piscataway, NJ, United States: Institute of Electrical and Electronics Engineers. doi: 10.1109/wacv61041.2025.00894

2025

Conference Publication

TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm

Zhang, Bingqing, Cao, Zhuo, Du, Heming, Yu, Xin, Li, Xue, Liu, Jiajun and Wang, Sen (2025). TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm. 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ United States, 26 February - 6 March 2025. Piscataway, NJ United States: IEEE. doi: 10.1109/wacv61041.2025.00485

2025

Journal Article

DreamCar: leveraging car-specific prior for in-the-wild 3D car reconstruction

Du, Xiaobiao, Sun, Haiyang, Lu, Ming, Zhu, Tianqing and Yu, Xin (2025). DreamCar: leveraging car-specific prior for in-the-wild 3D car reconstruction. IEEE Robotics and Automation Letters, 10 (2), 1840-1847. doi: 10.1109/lra.2024.3523231

2025

Conference Publication

Vision-based abnormal action dataset for recognising body motion disorders

Ying, Jiaying, Shen, Xin and Yu, Xin (2025). Vision-based abnormal action dataset for recognising body motion disorders. 37th Australasian Joint Conference on Artificial Intelligence, AI 2024, Melbourne, VIC, Australia, 25 - 29 November 2024. Singapore, Singapore: Springer Nature Singapore. doi: 10.1007/978-981-96-0351-0_33

2025

Conference Publication

Compound expression recognition via curriculum learning

Liu, Chen, Qiu, Feng, Zhang, Wei, Li, Lincheng, Wang, Dadong and Yu, Xin (2025). Compound expression recognition via curriculum learning. ECCV 2024 Workshops, Milan, Italy, 29 September - 4 October 2024. Heidelberg, Germany: Springer. doi: 10.1007/978-3-031-91581-9_20

2025

Conference Publication

Who is Being Impersonated? Deepfake audio detection and impersonated identification via extraction of ID-specific features

Guo, Tianchen, Du, Heming, Huo, Huan, Liu, Bo and Yu, Xin (2025). Who is Being Impersonated? Deepfake audio detection and impersonated identification via extraction of ID-specific features. 24th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2024), Macau, China, 29-31 October 2024. Singapore: Springer. doi: 10.1007/978-981-96-1548-3_21

2025

Conference Publication

QGait: Toward Accurate Quantization for Gait Recognition

Tian, Senmao, Gao, Haoyu, Hong, Gangyi, Wang, Shuyun, Wang, Jingjie, Yu, Xin and Zhang, Shunli (2025). QGait: Toward Accurate Quantization for Gait Recognition. Institute of Electrical and Electronics Engineers Inc. doi: 10.1109/IJCB65343.2025.11411264

2025

Journal Article

TalkCLIP: talking head generation with text-guided expressive speaking styles

Ma, Yifeng, Wang, Suzhen, Ding, Yu, Ma, Bowen, Lv, Tangjie, Fan, Changjie, Hu, Zhipeng, Deng, Zhidong and Yu, Xin (2025). TalkCLIP: talking head generation with text-guided expressive speaking styles. IEEE Transactions on Multimedia, 27, 6335-6346. doi: 10.1109/tmm.2025.3581808

2025

Conference Publication

Affective behaviour analysis via progressive learning

Liu, Chen, Zhang, Wei, Qiu, Feng, Li, Lincheng, Wang, Dadong and Yu, Xin (2025). Affective behaviour analysis via progressive learning. ECCV 2024 Workshops, Milan, Italy, 29 September - 4 October 2024. Heidelberg, Germany: Springer. doi: 10.1007/978-3-031-91581-9_26

2025

Conference Publication

Transferable attacks for semantic segmentation

He, Mengqi, Zhang, Jing and Yu, Xin (2025). Transferable attacks for semantic segmentation. 35th Australasian Database Conference, Gold Coast, QLD, Australia, 16-18 December 2024. Heidelberg, Germany: Springer. doi: 10.1007/978-981-96-1242-0_28

2024

Conference Publication

CPT-VR: Improving Surface Rendering via Closest Point Transform with View-Reflection Appearance

Hu, Zhipeng, Zhang, Yongqiang, Liu, Chen, Li, Lincheng, Peng, Sida, Zhou, Xiaowei, Fan, Changjie and Yu, Xin (2024). CPT-VR: Improving Surface Rendering via Closest Point Transform with View-Reflection Appearance. 18th European Conference on Computer Vision, ECCV 2024, Milan, Italy, 29 September - 4 October 2024. Cham, Switzerland: Springer. doi: 10.1007/978-3-031-73464-9_14

2024

Conference Publication

FreeAvatar: robust 3D facial animation transfer by learning an expression foundation model

Qiu, Feng, Zhang, Wei, Liu, Chen, An, Rudong, Li, Lincheng, Ding, Yu, Fan, Changjie, Hu, Zhipeng and Yu, Xin (2024). FreeAvatar: robust 3D facial animation transfer by learning an expression foundation model. SA '24: SIGGRAPH Asia 2024, Tokyo, Japan, 3-6 December 2024. New York, NY, United States: ACM. doi: 10.1145/3680528.3687669

2024

Conference Publication

Snap and diagnose: an advanced multimodal retrieval system for identifying plant diseases in the wild

Wei, Tianqi, Chen, Zhi and Yu, Xin (2024). Snap and diagnose: an advanced multimodal retrieval system for identifying plant diseases in the wild. MMASIA ’24, Auckland, New Zealand, 3-6 December 2024. New York, United States: ACM. doi: 10.1145/3696409.3700293

2024

Journal Article

M3 A: A multimodal misinformation dataset for media authenticity analysis

Xu, Qingzheng, Chen, Huiqiang, Du, Heming, Zhang, Hu, Łukasik, Szymon, Zhu, Tianqing and Yu, Xin (2024). M3 A: A multimodal misinformation dataset for media authenticity analysis. Computer Vision and Image Understanding, 249, 104205. doi: 10.1016/j.cviu.2024.104205

2024

Book Chapter

OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection

Zhang, Hu, Xu, Jianhua, Tang, Tao, Sun, Haiyang, Yu, Xin, Huang, Zi and Yu, Kaicheng (2024). OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection. Lecture Notes in Computer Science. (pp. 1-19) Cham: Springer Nature Switzerland. doi: 10.1007/978-3-031-72907-2_1

2024

Conference Publication

Benchmarking in-the-wild multimodal disease recognition and a versatile baseline

Wei, Tianqi, Chen, Zhi, Huang, Zi and Yu, Xin (2024). Benchmarking in-the-wild multimodal disease recognition and a versatile baseline. MM '24: The 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October-1 November 2024. New York, United States: Association for Computing Machinery. doi: 10.1145/3664647.3680599

Current funding

2025 - 2026

Creation of an interactive online seaweed production map to support policy-making for the Indonesian seaweed industry

KONEKSI Environment and Climate Change Extension Support

2024 - 2027

AI-Empowered and Video-Based uplift of Paralympic classification systems (AQIRP project administered by Follow Me AI)

Follow Me AI Pty Ltd

2023 - 2028

Breaking the Communication Barrier for the Australian Deaf Community: Vision Based Australian Sign Language Translation and Production

Google Asia Pacific Pte Ltd

2023 - 2028

ARC Research Hub to Advance Timber for Australia's Future Built Environment

ARC Industrial Transformation Research Hubs

2023 - 2027

Analytics for the Australian Grains Industry (AAGI)

Grains Research & Development Corporation

Past funding

2024

Breaking the Communication Barrier for the Australian Deaf Community: Vision Based Australian Sign Language Translation and Production

Google Inc

2023 - 2024

Developing applications of satellite imagery for modelling environmental and social impacts of climate change on seaweed farming in Indonesia (KONEKSI Grant administered by Griffith University)

Griffith University

2023 - 2025

Advancing Human Perception: Countering Evolving Malicious Fake Visual Data

ARC Discovery Early Career Researcher Award

2023 - 2025

Two-way Auslan: Automatic Machine Translation of Australian Sign Language (ARC Discovery Project administered by ANU)

The Australian National University

Supervision history

Current supervision

Doctor of Philosophy

Integrating Deep Learning and Remote Sensing for Precision Agriculture in Staple Crops

Principal Advisor

Other advisors: Dr Miao Xu

Doctor of Philosophy

Towards Comprehensive Australian Sign Language Understanding: Datasets, Methodologies, and Systems

Principal Advisor

Other advisors: Professor Helen Huang, Dr Heming Du

Doctor of Philosophy

Human Posture Recognition Applied to Physical Activity

Principal Advisor

Other advisors: Professor Sean Tweedy

Doctor of Philosophy

Combating evolving deceptive fake visual information through deepfake detection

Principal Advisor

Other advisors: Dr Miao Xu

Doctor of Philosophy

Understanding Human Movements and Sport Performance Analysis

Principal Advisor

Other advisors: Dr Miao Xu

Doctor of Philosophy

Human Understanding in Sports

Principal Advisor

Other advisors: Associate Professor Sen Wang, Dr Heming Du

Doctor of Philosophy

Understanding Human Intention and Performance

Principal Advisor

Other advisors: Associate Professor Sen Wang

Doctor of Philosophy

Understanding Human Intention and Performance

Principal Advisor

Other advisors: Dr Heming Du, Dr Miao Xu

Doctor of Philosophy

Compressed Video Restoration

Principal Advisor

Other advisors: Dr Miao Xu, Dr Heming Du

Doctor of Philosophy

The Unlabeled Truth: Rethinking Medical Imaging Supervision for Foundation Models in the Wild

Associate Advisor

Other advisors: Associate Professor Mahsa Baktashmotlagh, Dr Heming Du

Doctor of Philosophy

Enhancing Robustness and Generalizability in Computational Models

Associate Advisor

Other advisors: Associate Professor Mahsa Baktashmotlagh

Doctor of Philosophy

Towards knowledge discovery from imperfect and evolving data

Associate Advisor

Other advisors: Dr Miao Xu

Doctor of Philosophy

Pose Estimation for Human with Disabilities

Associate Advisor

Other advisors: Dr Heming Du

Doctor of Philosophy

Multimodal foundation model design and analysis

Associate Advisor

Other advisors: Dr Miao Xu, Dr Heming Du

Doctor of Philosophy

Data driven approaches for smart farming

Associate Advisor

Other advisors: Professor Helen Huang

Doctor of Philosophy

Effective Visual Data Compression

Associate Advisor

Other advisors: Associate Professor Sen Wang, Dr Heming Du

Doctor of Philosophy

Remote Sensing Analysis in computer vision

Associate Advisor

Other advisors: Professor Helen Huang

Completed supervision

2026

Doctor of Philosophy

Towards Data-Driven Analysis of Handheld Fundus Videos

Principal Advisor

Other advisors: Associate Professor Mahsa Baktashmotlagh, Dr Heming Du

2026

Doctor of Philosophy

Object-Centric Audio-Visual Alignment for Sounding Source Segmentation

Principal Advisor

Other advisors: Associate Professor Sen Wang

Enquiries

For media enquiries about Associate Professor Xin Yu's areas of expertise, story ideas and help finding experts, contact our Media team:

communications@uq.edu.au
