Yujun Cai

Email: yujun.cai@uq.edu.au

Availability

Dr Yujun Cai is available for supervision

Qualifications

Doctor of Philosophy of Electrical Engineering and Computer Science, Nanyang Technological University

45 works between 2018 and 2025

All (45): Journal Article (4), Conference Publication (41)

2025

Conference Publication

LatentHOI: on the generalizable hand object motion generation with latent hand diffusion

Li, Muchen, Christen, Sammy, Wan, Chengde, Cai, Yujun, Liao, Renjie, Sigal, Leonid and Ma, Shugao (2025). LatentHOI: on the generalizable hand object motion generation with latent hand diffusion. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, United States, 10-17 June 2025. Washington, DC, United States: IEEE Computer Society. doi: 10.1109/CVPR52734.2025.01623

2025

Conference Publication

Vulnerability of LLMs to vertically aligned text manipulations

Li, Zhecheng, Wang, Yiwei, Hooi, Bryan, Cai, Yujun, Xiong, Zhen, Peng, Nanyun and Chang, Kai-Wei (2025). Vulnerability of LLMs to vertically aligned text manipulations. 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria, 27 July-1 August 2025. Stroudsburg, PA USA: Association for Computational Linguistics. doi: 10.18653/v1/2025.acl-long.978

2025

Conference Publication

Exploring visual vulnerabilities via multi-loss adversarial search for jailbreaking vision-language models

Hao, Shuyang, Hooi, Bryan, Liu, Jun, Chang, Kai-Wei, Huang, Zi and Cai, Yujun (2025). Exploring visual vulnerabilities via multi-loss adversarial search for jailbreaking vision-language models. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, United States, 10-17 June 2025. Washington, DC, United States: IEEE Computer Society. doi: 10.1109/cvpr52734.2025.01852

2025

Journal Article

SED-MVS: segmentation-driven and edge-aligned deformation multi-view stereo with depth restoration and occlusion constraint

Yuan, Zhenlong, Yang, Zhidong, Cai, Yujun, Wu, Kuangxin, Liu, Mufan, Zhang, Dapeng, Jiang, Hao, Li, Zhaoxin and Wang, Zhaoqi (2025). SED-MVS: segmentation-driven and edge-aligned deformation multi-view stereo with depth restoration and occlusion constraint. IEEE Transactions on Circuits and Systems for Video Technology, 35 (11), 11244-11257. doi: 10.1109/TCSVT.2025.3574473

2025

Conference Publication

SemVink: Advancing VLMs' Semantic Understanding of Optical Illusions via Visual Global Thinking

Li, Sifan, Cai, Yujun and Wang, Yiwei (2025). SemVink: Advancing VLMs' Semantic Understanding of Optical Illusions via Visual Global Thinking. Association for Computational Linguistics (ACL). doi: 10.18653/v1/2025.emnlp-main.1381

2025

Conference Publication

CON-RECALL: Detecting Pre-training Data in LLMs via Contrastive Decoding

Wang, Cheng, Wang, Yiwei, Hooi, Bryan, Cai, Yujun, Peng, Nanyun and Chang, Kai-Wei (2025). CON-RECALL: Detecting Pre-training Data in LLMs via Contrastive Decoding. 31st International Conference on Computational Linguistics, Abu Dhabi, United Arab Emirates, 19-24 January 2025. Stroudsburg, PA, United States: Association for Computational Linguistics (ACL).

2025

Conference Publication

DRS: Deep question reformulation with structured output

Li, Zhecheng, Wang, Yiwei, Hooi, Bryan, Cai, Yujun, Peng, Nanyun and Chang, Kai-Wei (2025). DRS: Deep question reformulation with structured output. 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, 27 July-1 August 2025. Stroudsburg, PA, United States: Association for Computational Linguistics. doi: 10.18653/v1/2025.findings-acl.666

2025

Conference Publication

DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning

Wu, Hang, Chen, Hongkai, Cai, Yujun, Liu, Chang, Ye, Qingwen, Yang, Ming-Hsuan and Wang, Yiwei (2025). DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning. Association for Computational Linguistics (ACL). doi: 10.18653/v1/2025.emnlp-main.1334

2025

Conference Publication

Energy-Calibrated VAE with Test Time Free Lunch

Luo, Yihong, Qiu, Siya, Tao, Xingjian, Cai, Yujun and Tang, Jing (2025). Energy-Calibrated VAE with Test Time Free Lunch. 18th European Conference on Computer Vision (ECCV), Milan, Italy, 29 September-4 October 2024. Heidelberg, Germany: Springer. doi: 10.1007/978-3-031-73013-9_19

2025

Conference Publication

Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLMs

Xiong, Zhen, Cai, Yujun, Li, Zhecheng and Wang, Yiwei (2025). Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLMs. Association for Computational Linguistics (ACL). doi: 10.18653/v1/2025.emnlp-main.896

2025

Conference Publication

Tricking retrievers with influential tokens: an efficient black-box corpus poisoning attack

Wang, Cheng, Wang, Yiwei, Cai, Yujun and Hooi, Bryan (2025). Tricking retrievers with influential tokens: an efficient black-box corpus poisoning attack. 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, Albuquerque, New Mexico, 29 April-4 May 2025. Albuquerque, New Mexico: Association for Computational Linguistics (ACL). doi: 10.18653/v1/2025.naacl-long.210

2025

Conference Publication

VistaWise: Building Cost-Effective Agent with Cross-Modal Knowledge Graph for Minecraft

Fu, Honghao, Ren, Junlong, Chai, Qi, Ye, Deheng, Cai, Yujun and Wang, Hao (2025). VistaWise: Building Cost-Effective Agent with Cross-Modal Knowledge Graph for Minecraft. Association for Computational Linguistics (ACL). doi: 10.18653/v1/2025.emnlp-main.1111

2024

Conference Publication

STMG: a machine learning microgesture recognition system for supporting thumb-based VR/AR input

Kin, Kenrick, Wan, Chengde, Koh, Ken, Marin, Andrei, Camgöz, Necati Cihan, Zhang, Yubo, Cai, Yujun, Kovalev, Fedor, Ben-Zacharia, Moshe, Hoople, Shannon, Nunes-Ueno, Marcos, Sanchez-Rodriguez, Mariel, Bhargava, Ayush, Wang, Robert, Sauser, Eric and Ma, Shugao (2024). STMG: a machine learning microgesture recognition system for supporting thumb-based VR/AR input. CHI '24: CHI Conference on Human Factors in Computing Systems, Honolulu, HI USA, 11-16 May 2024. New York, NY USA: Association for Computing Machinery. doi: 10.1145/3613904.3642702

2024

Conference Publication

Social diffusion: long-term multiple human motion anticipation

Tanke, Julian, Zhang, Linguang, Zhao, Amy, Tang, Chengcheng, Cai, Yujun, Wang, Lezi, Wu, Po-Chen, Gall, Juergen and Keskin, Cem (2024). Social diffusion: long-term multiple human motion anticipation. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1-6 October 2023. Piscataway, NJ USA: Institute of Electrical and Electronics Engineers. doi: 10.1109/ICCV51070.2023.00880

2024

Conference Publication

LLMs are good action recognizers

Qu, Haoxuan, Cai, Yujun and Liu, Jun (2024). LLMs are good action recognizers. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, United States, 16-22 June 2024. Washington, DC, United States: IEEE Computer Society. doi: 10.1109/CVPR52733.2024.01741

2024

Conference Publication

DisC-GS: discontinuity-aware Gaussian splatting

Qu, Haoxuan, Li, Zhuoling, Rahmani, Hossein, Cai, Yujun and Liu, Jun (2024). DisC-GS: discontinuity-aware Gaussian splatting. 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada, 10-15 December 2024. San Mateo, CA, United States: Morgan Kaufmann Publishers. doi: 10.52202/079017-3566

2024

Conference Publication

emg2pose: a large and diverse benchmark for surface electromyographic hand pose estimation

Salter, Sasha, Warren, Richard, Schlager, Collin, Spurr, Adrian, Han, Shangchen, Bhasin, Rohin, Cai, Yujun, Walkington, Peter, Bolarinwa, Anuoluwapo, Wang, Robert, Danielson, Nathan, Merel, Josh, Pnevmatikakis, Eftychios and Marshall, Jesse (2024). emg2pose: a large and diverse benchmark for surface electromyographic hand pose estimation. 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada, 10-15 December 2024. San Mateo, CA, United States: Morgan Kaufmann Publishers. doi: 10.52202/079017-1770

2024

Conference Publication

6D-Diff: a keypoint diffusion framework for 6D object pose estimation

Xu, Li, Qu, Haoxuan, Cai, Yujun and Liu, Jun (2024). 6D-Diff: a keypoint diffusion framework for 6D object pose estimation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, United States, 16-22 June 2024. Piscataway, NJ, United States: IEEE Computer Society. doi: 10.1109/CVPR52733.2024.00924

2023

Conference Publication

LMC: large model collaboration with cross-assessment for training-free open-set object recognition

Qu, Haoxuan, Hui, Xiaofei, Cai, Yujun and Liu, Jun (2023). LMC: large model collaboration with cross-assessment for training-free open-set object recognition. NIPS'23: 37th International Conference on Neural Information Processing Systems, New Orleans, LA USA, 10-16 December 2023. Maryland Heights, MO USA: Morgan Kaufmann Publishers. doi: 10.5555/3666122.3668138

2023

Conference Publication

Primacy effect of ChatGPT

Wang, Yiwei, Cai, Yujun, Chen, Muhao, Liang, Yuxuan and Hooi, Bryan (2023). Primacy effect of ChatGPT. 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Singapore, 6-10 December 2023. Kerrville, TX USA: Association for Computational Linguistics. doi: 10.18653/v1/2023.emnlp-main.8

Available projects

Multi-Modal Perception for Context-Aware Systems

Develop algorithms for multi-modal perception, integrating visual, textual and other data modalities to enhance contextual understanding.

Supervision history

Current supervision

Doctor of Philosophy

Multi-Modal Perception for Context-Aware Systems

Principal Advisor

Other advisors: Dr Miao Xu

Doctor of Philosophy

Towards Scalable Sign Language Understanding: From Streaming Translation to Low Resource Transfer

Associate Advisor

Other advisors: Dr Heming Du, Associate Professor Xin Yu

Doctor of Philosophy

Multimodal Learning for Advanced Graph Data Analysis

Associate Advisor

Other advisors: Professor Helen Huang, Dr Ruihong Qiu

Doctor of Philosophy

Multi-Modal Perception for Context-Aware Systems

Associate Advisor

Other advisors: Dr Miao Xu

Enquiries

For media enquiries about Dr Yujun Cai's areas of expertise, story ideas and help finding experts, contact our Media team:

communications@uq.edu.au
