Associate Professor

Xin Yu

Email: xin.yu@uq.edu.au

Positions

Affiliate of ARC COE for Children and Families Over the Lifecourse: ARC COE for Children and Families Over the Lifecourse; Faculty of Humanities, Arts and Social Sciences

Affiliate of Centre for Enterprise AI: Centre for Enterprise AI; Faculty of Engineering, Architecture and Information Technology

Honorary Associate Professor: School of Electrical Engineering and Computer Science; Faculty of Engineering, Architecture and Information Technology

Background

I am Xin Yu, an Associate Professor at the University of Queensland. I am a recipient of the Australian Research Council Discovery Early Career Researcher Award (DECRA) for 2023-2025 and an awardee of the prestigious Google Research Scholar Program in 2021. I am also a Google Visiting Faculty. Previously, I was a research fellow at the Australian National University (ANU). I received my PhD degree from the Australian National University under the supervision of Prof. Richard Hartley, Prof. Fatih Porikli and Dr. Basura Fernando. I also received a PhD degree from Tsinghua University, supervised by Prof. Li Zhang. I am interested in Computer Vision and Machine Learning topics.

My research topics include various computer vision and machine learning tasks, especially efficient low-level image processing, image retrieval and localization, action recognition, 3D pose estimation, visual navigation, and sign language recognition and translation.

Availability

Associate Professor Xin Yu is: Available for supervision

Research impacts

One of my research papers received the Best Paper Honorable Mention award at WACV 2020, a premier computer vision conference, and another paper was nominated for the Best Paper Award at CVPR 2020.

I received the Outstanding Reviewer Award at ECCV 2020, CVPR 2021 and ICCV 2021. CVPR, ICCV and ECCV are world-leading computer vision and machine learning conferences. My research interests include deep learning techniques, image processing, and computer vision tasks. I am a program committee member of top-tier computer vision and machine learning conferences, such as CVPR, ICCV, ECCV, ICML, ICLR and NeurIPS, and a reviewer for prestigious journals, such as TPAMI, IJCV and TIP.

I am happy to supervise self-motivated PhD and MPhil students. If you are an undergraduate student and would like to undertake your honours project with me, please drop me an email.

179 works between 2011 and 2026

All (179): Journal Article (60), Conference Publication (117), Book Chapter (2)

2024

Conference Publication

Learning transferable compound expressions from Masked AutoEncoder pretraining

Qiu, Feng, Du, Heming, Zhang, Wei, Liu, Chen, Li, Lincheng, Guo, Tianchen and Yu, Xin (2024). Learning transferable compound expressions from Masked AutoEncoder pretraining. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, United States, 17-18 June 2024. Piscataway, NJ, United States: Institute of Electrical and Electronics Engineers. doi: 10.1109/cvprw63382.2024.00476

2024

Conference Publication

An effective ensemble learning framework for affective behaviour analysis

Zhang, Wei, Qiu, Feng, Liu, Chen, Li, Lincheng, Du, Heming, Guo, Tianchen and Yu, Xin (2024). An effective ensemble learning framework for affective behaviour analysis. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, United States, 17-18 June 2024. Piscataway, NJ, United States: Institute of Electrical and Electronics Engineers. doi: 10.1109/cvprw63382.2024.00479

2024

Journal Article

Proactive image manipulation detection via deep semi-fragile watermark

Zhao, Yuan, Liu, Bo, Zhu, Tianqing, Ding, Ming, Yu, Xin and Zhou, Wanlei (2024). Proactive image manipulation detection via deep semi-fragile watermark. Neurocomputing, 585 127593. doi: 10.1016/j.neucom.2024.127593

2024

Conference Publication

DiPEx: Dispersing Prompt Expansion for class-agnostic object detection

Lim, Jia Syuen, Chen, Zhuoxiao, Baktashmotlagh, Mahsa, Chen, Zhi, Yu, Xin, Huang, Zi and Luo, Yadan (2024). DiPEx: Dispersing Prompt Expansion for class-agnostic object detection. 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada, 10-15 December 2024. San Mateo, CA, United States: Morgan Kaufmann Publishers. doi: 10.52202/079017-0781

2024

Journal Article

BAVS: Bootstrapping audio-visual segmentation by integrating foundation knowledge

Liu, Chen, Li, Peike, Zhang, Hu, Li, Lincheng, Huang, Zi, Wang, Dadong and Yu, Xin (2024). BAVS: Bootstrapping audio-visual segmentation by integrating foundation knowledge. IEEE Transactions on Multimedia, 26, 10015-10028. doi: 10.1109/tmm.2024.3405622

2024

Journal Article

AI empowered Auslan learning for parents of deaf children and children of deaf adults

Sheng, Hongwei, Shen, Xin, Du, Heming, Zhang, Hu, Huang, Zi and Yu, Xin (2024). AI empowered Auslan learning for parents of deaf children and children of deaf adults. AI and Ethics, 4 (4), 1-11. doi: 10.1007/s43681-024-00457-y

2024

Journal Article

Detecting facial action units from global-local fine-grained expressions

Zhang, Wei, Li, Lincheng, Ding, Yu, Chen, Wei, Deng, Zhigang and Yu, Xin (2024). Detecting facial action units from global-local fine-grained expressions. IEEE Transactions on Circuits and Systems for Video Technology, 34 (2), 983-994. doi: 10.1109/tcsvt.2023.3288903

2024

Conference Publication

When 3D bounding-box meets SAM: point cloud instance segmentation with weak-and-noisy supervision

Yu, Qingtao, Du, Heming, Liu, Chen and Yu, Xin (2024). When 3D bounding-box meets SAM: point cloud instance segmentation with weak-and-noisy supervision. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, United States, 3-8 January 2024. Piscataway, NJ, United States: IEEE. doi: 10.1109/wacv57701.2024.00368

2024

Conference Publication

Benchmarking audio visual segmentation for long-untrimmed videos

Liu, Chen, Li, Peike Patrick, Yu, Qingtao, Sheng, Hongwei, Wang, Dadong, Li, Lincheng and Yu, Xin (2024). Benchmarking audio visual segmentation for long-untrimmed videos. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, United States, 16-22 June 2024. Washington, DC, United States: IEEE Computer Society. doi: 10.1109/CVPR52733.2024.02143

2024

Conference Publication

Text-guided 3D face synthesis - from generation to editing

Wu, Yunjie, Meng, Yapeng, Hu, Zhipeng, Li, Lincheng, Wu, Haoqian, Zhou, Kun, Xu, Weiwei and Yu, Xin (2024). Text-guided 3D face synthesis - from generation to editing. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, United States, 16-22 June 2024. Washington, DC, United States: IEEE Computer Society. doi: 10.1109/CVPR52733.2024.00126

2024

Conference Publication

MM-WLAuslan: multi-view multi-modal word-level Australian Sign Language recognition dataset

Shen, Xin, Du, Heming, Sheng, Hongwei, Wang, Shuyun, Chen, Hui, Chen, Huiqiang, Wu, Zhuojie, Du, Xiaobiao, Ying, Jiaying, Lu, Ruihan, Xu, Qingzheng and Yu, Xin (2024). MM-WLAuslan: multi-view multi-modal word-level Australian Sign Language recognition dataset. 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada, 10-15 December 2024. San Mateo, CA, United States: Morgan Kaufmann Publishers. doi: 10.52202/079017-2227

2024

Journal Article

StyleTalk++: A unified framework for controlling the speaking styles of talking heads

Wang, Suzhen, Ma, Yifeng, Ding, Yu, Hu, Zhipeng, Fan, Changjie, Lv, Tangjie, Deng, Zhidong and Yu, Xin (2024). StyleTalk++: A unified framework for controlling the speaking styles of talking heads. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46 (6), 4331-4347. doi: 10.1109/tpami.2024.3357808

2024

Conference Publication

An empirical analysis on spatial reasoning capabilities of large multimodal models

Shiri, Fatemeh, Guo, Xiao-Yu, Far, Mona Golestan, Yu, Xin, Haffari, Gholamreza and Li, Yuan-Fang (2024). An empirical analysis on spatial reasoning capabilities of large multimodal models. 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, United States, 12-16 November 2024. Kerrville, TX, United States: Association for Computational Linguistics (ACL). doi: 10.18653/v1/2024.emnlp-main.1195

2024

Journal Article

CMGNet: Collaborative multi-modal graph network for video captioning

Rao, Qi, Yu, Xin, Li, Guang and Zhu, Linchao (2024). CMGNet: Collaborative multi-modal graph network for video captioning. Computer Vision and Image Understanding, 238 103864, 1-10. doi: 10.1016/j.cviu.2023.103864

2024

Journal Article

MarkerNet: A divide-and-conquer solution to motion capture solving from raw markers

Hu, Zhipeng, Tang, Jilin, Li, Lincheng, Hou, Jie, Xin, Haoran, Yu, Xin and Bu, Jiajun (2024). MarkerNet: A divide-and-conquer solution to motion capture solving from raw markers. Computer Animation and Virtual Worlds, 35 (1) e2228, 1-19. doi: 10.1002/cav.2228

2024

Journal Article

EmotionGesture: audio-driven diverse emotional co-speech 3D gesture generation

Qi, Xingqun, Liu, Chen, Li, Lincheng, Hou, Jie, Xin, Haoran and Yu, Xin (2024). EmotionGesture: audio-driven diverse emotional co-speech 3D gesture generation. IEEE Transactions on Multimedia, 26, 10420-10430. doi: 10.1109/tmm.2024.3407692

2024

Conference Publication

EfficientDreamer: high-fidelity and stable 3D creation via orthogonal-view diffusion priors

Hu, Zhipeng, Zhao, Minda, Zhao, Chaoyi, Liang, Xinyue, Li, Lincheng, Zhao, Zeng, Fan, Changjie, Zhou, Xiaowei and Yu, Xin (2024). EfficientDreamer: high-fidelity and stable 3D creation via orthogonal-view diffusion priors. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, United States, 16-22 June 2024. Washington, DC, United States: IEEE Computer Society. doi: 10.1109/CVPR52733.2024.00473

2024

Journal Article

CBARF: cascaded bundle-adjusting neural radiance fields from imperfect camera poses

Fu, Hongyu, Yu, Xin, Li, Lincheng and Zhang, Li (2024). CBARF: cascaded bundle-adjusting neural radiance fields from imperfect camera poses. IEEE Transactions on Multimedia, 26, 9304-9315. doi: 10.1109/tmm.2024.3388929

2024

Conference Publication

MMOOC: a multimodal misinformation dataset for out-of-context news analysis

Xu, Qingzheng, Du, Heming, Chen, Huiqiang, Liu, Bo and Yu, Xin (2024). MMOOC: a multimodal misinformation dataset for out-of-context news analysis. 29th Australasian Conference, ACISP 2024, Sydney, NSW, Australia, 15–17 July 2024. Heidelberg, Germany: Springer. doi: 10.1007/978-981-97-5101-3_24

2024

Conference Publication

AS-NeRF: learning auxiliary sampling for generalizable novel view synthesis from sparse views

Tang, Jilin, Li, Lincheng, Qi, Xingqun, Chen, Yingfeng, Fan, Changjie and Yu, Xin (2024). AS-NeRF: learning auxiliary sampling for generalizable novel view synthesis from sparse views. 2024 IEEE International Conference on Multimedia and Expo (ICME), Niagara Falls, ON, Canada, 15-19 July 2024. Washington, DC, United States: IEEE Computer Society. doi: 10.1109/ICME57554.2024.10688126

Current funding

2025 - 2026

Creation of an interactive online seaweed production map to support policy-making for the Indonesian seaweed industry

KONEKSI Environment and Climate Change Extension Support

Open grant

2024 - 2027

AI-Empowered and Video-Based uplift of Paralympic classification systems (AQIRP project administered by Follow Me AI)

Follow Me AI Pty Ltd

Open grant

2023 - 2028

Breaking the Communication Barrier for the Australian Deaf Community: Vision Based Australian Sign Language Translation and Production

Google Asia Pacific Pte Ltd

Open grant

2023 - 2028

ARC Research Hub to Advance Timber for Australia's Future Built Environment

ARC Industrial Transformation Research Hubs

Open grant

2023 - 2027

Analytics for the Australian Grains Industry (AAGI)

Grains Research & Development Corporation

Open grant

Past funding

2024

Breaking the Communication Barrier for the Australian Deaf Community: Vision Based Australian Sign Language Translation and Production

Google Inc

Open grant

2023 - 2024

Developing applications of satellite imagery for modelling environmental and social impacts of climate change on seaweed farming in Indonesia (KONEKSI Grant administered by Griffith University)

Griffith University

Open grant

2023 - 2025

Advancing Human Perception: Countering Evolving Malicious Fake Visual Data

ARC Discovery Early Career Researcher Award

Open grant

2023 - 2025

Two-way Auslan: Automatic Machine Translation of Australian Sign Language (ARC Discovery Project administered by ANU)

The Australian National University

Open grant

Supervision history

Current supervision

Doctor Philosophy

Compressed Video Restoration

Principal Advisor

Other advisors: Dr Miao Xu, Dr Heming Du

Doctor Philosophy

Human Posture Recognition Applied to Physical Activity

Principal Advisor

Other advisors: Professor Sean Tweedy

Doctor Philosophy

Understanding Human Intention and Performance

Principal Advisor

Other advisors: Dr Heming Du, Dr Miao Xu

Doctor Philosophy

Understanding Human Intention and Performance

Principal Advisor

Other advisors: Associate Professor Sen Wang

Doctor Philosophy

Towards Comprehensive Australian Sign Language Understanding: Datasets, Methodologies, and Systems

Principal Advisor

Other advisors: Professor Helen Huang, Dr Heming Du

Doctor Philosophy

Understanding Human Movements and Sport Performance Analysis

Principal Advisor

Other advisors: Dr Miao Xu

Doctor Philosophy

Combating evolving deceptive fake visual information through deepfake detection

Principal Advisor

Other advisors: Dr Miao Xu

Doctor Philosophy

Towards Data-Driven Analysis of Handheld Fundus Videos

Principal Advisor

Other advisors: Associate Professor Mahsa Baktashmotlagh, Dr Heming Du

Doctor Philosophy

Human Understanding in Sports

Principal Advisor

Other advisors: Associate Professor Sen Wang, Dr Heming Du

Doctor Philosophy

Integrating Deep Learning and Remote Sensing for Precision Agriculture in Staple Crops

Principal Advisor

Other advisors: Dr Miao Xu

Doctor Philosophy

Enhancing Robustness and Generalizability in Computational Models

Associate Advisor

Other advisors: Associate Professor Mahsa Baktashmotlagh

Doctor Philosophy

Data driven approaches for smart farming

Associate Advisor

Other advisors: Professor Helen Huang

Doctor Philosophy

Towards knowledge discovery from imperfect and evolving data

Associate Advisor

Other advisors: Dr Miao Xu

Doctor Philosophy

Remote Sensing Analysis in computer vision

Associate Advisor

Other advisors: Professor Helen Huang

Doctor Philosophy

Effective Visual Data Compression

Associate Advisor

Other advisors: Associate Professor Sen Wang, Dr Heming Du

Doctor Philosophy

Pose Estimation for Human with Disabilities

Associate Advisor

Other advisors: Dr Heming Du

Doctor Philosophy

The Unlabeled Truth: Rethinking Medical Imaging Supervision for Foundation Models in the Wild

Associate Advisor

Other advisors: Associate Professor Mahsa Baktashmotlagh, Dr Heming Du

Doctor Philosophy

Multimodal foundation model design and analysis

Associate Advisor

Other advisors: Dr Miao Xu, Dr Heming Du

Completed supervision

2026

Doctor Philosophy

Object-Centric Audio-Visual Alignment for Sounding Source Segmentation

Principal Advisor

Other advisors: Associate Professor Sen Wang

Enquiries

For media enquiries about Associate Professor Xin Yu's areas of expertise, story ideas and help finding experts, contact our Media team:

communications@uq.edu.au
