Publication record
Publications
Research across reliable AI, self-correction, causal representation learning, multimodal systems, and world models.
ICML 2026 Oral · FMs for Science, ICLR 2026 Workshop
CausalGame: Benchmarking Causal Thinking of LLM Agents in Games
Zhenhao Chen*, Yongqiang Chen*, Chenxi Liu*, Junchi Yu, Xiangchen Song, Zijian Li, Jialin Li, Philip Torr, Bo Han, Kun Zhang
An interactive benchmark for evaluating how LLM agents design experiments, reason from biased evidence, and recover hidden mechanisms.
LLM agents · AI scientist · benchmark · experimentation
ICML 2025
Reflection-Window Decoding: Text Generation with Selective Refinement
Zeyu Tang*, Zhenhao Chen*, Xiangchen Song, Loka Li, Yunlong Deng, Yifan Shen, Guangyi Chen, Peter Spirtes, Kun Zhang
A decoding strategy that lets language models selectively revisit and refine past tokens during generation.
reasoning · decoding · LLMs
arXiv Preprint
Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of LLMs
Loka Li*, Zhenhao Chen*, Guangyi Chen*, Yixuan Zhang, Yusheng Su, Eric Xing, Kun Zhang
A study of when models can detect and correct their own errors without external oracles, with confidence as a key factor.
self-correction · alignment · LLMs
ICML 2024
CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation
Guangyi Chen*, Yifan Shen*, Zhenhao Chen*, Xiangchen Song, Yuewen Sun, Weiran Yao, Xiao Liu, Kun Zhang
A causal representation learning approach for temporal data and video understanding under non-invertible generation.
representation learning · temporal data · causal learning
CVPR 2023
Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction
Guangyi Chen*, Zhenhao Chen*, Shunxing Fan, Kun Zhang
BOSampler uses unsupervised Bayesian optimization to adaptively mine potential future paths for trajectory prediction.
trajectory prediction · Bayesian optimization · CVPR
ICLR 2026 Workshop on AI with Recursive Self-Improvement · Spotlight
CausalEvolve: Towards Open-Ended Discovery with Causal Scratchpad
Yongqiang Chen, Chenxi Liu, Zhenhao Chen, Tongliang Liu, Bo Han, Kun Zhang
A study of causal scratchpads as a mechanism for open-ended AI discovery and recursive self-improvement.
AI discovery · self-improvement · agents
AAAI 2026
ManipLVM-R1: Reinforcement Learning for Reasoning in Embodied Manipulation with Large Vision-Language Models
Zirui Song, Guangxian Ouyang, Mingzhe Li, Yuheng Ji, Chenxi Wang, Zixiang Xu, Zeyu Zhang, Xiaoqing Zhang, Qian Jiang, Zhenhao Chen, Zhongzhi Li, Rui Yan, Xiuying Chen
Reinforcement learning for reasoning-oriented embodied manipulation with large vision-language models.
embodied AI · vision-language models · reinforcement learning
ACL 2026
ServImage: An Image Generation and Editing Benchmark from Real-world Commercial Imaging Services
Fengxian Ji, Jingpu Yang, Zirui Song, Lang Gao, Junhong Liang, Zhenhao Chen, Jinghui Zhang, Xiuying Chen
A benchmark connecting image generation and editing outputs to economic value in real-world commercial design tasks.
image generation · benchmark · commercial design
ACL 2026 Findings
PrefIx: Understand and Adapt to User Preference in Human-Agent Interaction
Jialin Li, Zhenhao Chen, Hanjun Luo, Hanan Salam
A benchmark and interaction framework for evaluating whether LLM agents infer, adapt to, and respect user preferences.
LLM agents · personalization · human-agent interaction
ACL 2026
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
Zirui Song, Qian Jiang, Mingxuan Cui, Mingzhe Li, Lang Gao, Zeyu Zhang, Zixiang Xu, Yanbo Wang, Chenxi Wang, Guangxian Ouyang, Zhenhao Chen, Xiuying Chen
A benchmark for evaluating jailbreak vulnerabilities in large audio-language models under realistic adversarial audio attacks.
audio-language models · safety · benchmark
ICLR 2026
PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
Loka Li, Wong Yu Kang, Minghao Fu, Guangyi Chen, Zhenhao Chen, Gongxu Luo, Yuewen Sun, Salman Khan, Peter Spirtes, Kun Zhang
A collection of multimodal datasets annotated with LLM-inferred behavior traits.
multimodal · datasets · behavior traits
NAACL 2025 Long Paper
Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies
Zirui Song, Guangxian Ouyang, Meng Fang, Hongbin Na, Zijing Shi, Zhenhao Chen, Yujie Fu, Zeyu Zhang, Shiyu Jiang, Miao Fang, Ling Chen, Xiuying Chen
A robotics and language study on proactive anomaly detection and resolution in daily environments.
robotics · anomaly detection · language
arXiv Preprint
MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot
Zirui Song, Yaohang Li, Meng Fang, Zhenhao Chen, Zecheng Shi, Yuan Huang, Ling Chen
A multimodal agent collaboration copilot for operating system workflows.
multimodal agents · copilot · systems
ECCV 2024
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai, Zirui Song, Dayan Guan, Zhenhao Chen, Xing Luo, Chenyu Yi, Alex Kot
A benchmark for evaluating cross-style visual capability in large multimodal models.
multimodal · benchmark · vision
ICML 2024
Empowering Graph Invariance Learning with Deep Spurious Infomax
Tianjun Yao, Yongqiang Chen, Zhenhao Chen, Kai Hu, Zhiqiang Shen, Kun Zhang
A graph invariance learning method focused on spurious information and robust representation learning.
graph learning · invariance · ICML