Publication record

Publications

Research across reliable AI, self-correction, causal representation learning, multimodal systems, and world models.


ICML 2026 Oral · FMs for Science, ICLR 2026 Workshop

CausalGame: Benchmarking Causal Thinking of LLM Agents in Games

Zhenhao Chen^*, Yongqiang Chen^*, Chenxi Liu^*, Junchi Yu, Xiangchen Song, Zijian Li, Jialin Li, Philip Torr, Bo Han, Kun Zhang

An interactive benchmark for evaluating how LLM agents design experiments, reason from biased evidence, and recover hidden mechanisms.

LLM agents · AI scientist · benchmark · experimentation

Project · Code

ICML 2025

Reflection-Window Decoding: Text Generation with Selective Refinement

Zeyu Tang^*, Zhenhao Chen^*, Xiangchen Song, Loka Li, Yunlong Deng, Yifan Shen, Guangyi Chen, Peter Spirtes, Kun Zhang

A decoding strategy that lets language models selectively revisit and refine past tokens during generation.

reasoning · decoding · LLMs

Paper

arXiv Preprint

Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of LLMs

Loka Li^*, Zhenhao Chen^*, Guangyi Chen^*, Yixuan Zhang, Yusheng Su, Eric Xing, Kun Zhang

A study of when models can detect and correct their own errors without external oracles, with confidence as a key factor.

self-correction · alignment · LLMs

Paper

ICML 2024

CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation

Guangyi Chen^*, Yifan Shen^*, Zhenhao Chen^*, Xiangchen Song, Yuewen Sun, Weiran Yao, Xiao Liu, Kun Zhang

A causal representation learning approach for temporal data and video understanding under non-invertible generation.

representation learning · temporal data · causal learning

PMLR

CVPR 2023

Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction

Guangyi Chen^*, Zhenhao Chen^*, Shunxing Fan, Kun Zhang

BOSampler uses unsupervised Bayesian optimization to adaptively mine potential future paths for trajectory prediction.

trajectory prediction · Bayesian optimization · CVPR

Paper

ICLR 2026 Workshop on AI with Recursive Self-Improvement · Spotlight

CausalEvolve: Towards Open-Ended Discovery with Causal Scratchpad

Yongqiang Chen, Chenxi Liu, Zhenhao Chen, Tongliang Liu, Bo Han, Kun Zhang

A study of causal scratchpads as a mechanism for open-ended AI discovery and recursive self-improvement.

AI discovery · self-improvement · agents

AAAI 2026

ManipLVM-R1: Reinforcement Learning for Reasoning in Embodied Manipulation with Large Vision-Language Models

Zirui Song, Guangxian Ouyang, Mingzhe Li, Yuheng Ji, Chenxi Wang, Zixiang Xu, Zeyu Zhang, Xiaoqing Zhang, Qian Jiang, Zhenhao Chen, Zhongzhi Li, Rui Yan, Xiuying Chen

Reinforcement learning for reasoning-oriented embodied manipulation with large vision-language models.

embodied AI · vision-language models · reinforcement learning

ACL 2026

ServImage: An Image Generation and Editing Benchmark from Real-world Commercial Imaging Services

Fengxian Ji, Jingpu Yang, Zirui Song, Lang Gao, Junhong Liang, Zhenhao Chen, Jinghui Zhang, Xiuying Chen

A benchmark connecting image generation and editing outputs to economic value in real-world commercial design tasks.

image generation · benchmark · commercial design

Paper

ACL 2026 Findings

PrefIx: Understand and Adapt to User Preference in Human-Agent Interaction

Jialin Li, Zhenhao Chen, Hanjun Luo, Hanan Salam

A benchmark and interaction framework for evaluating whether LLM agents infer, adapt to, and respect user preferences.

LLM agents · personalization · human-agent interaction

Paper

ACL 2026

Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models

Zirui Song, Qian Jiang, Mingxuan Cui, Mingzhe Li, Lang Gao, Zeyu Zhang, Zixiang Xu, Yanbo Wang, Chenxi Wang, Guangxian Ouyang, Zhenhao Chen, Xiuying Chen

A benchmark for evaluating jailbreak vulnerabilities in large audio-language models under realistic adversarial audio attacks.

audio-language models · safety · benchmark

Paper

ICLR 2026

PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits

Loka Li, Wong Yu Kang, Minghao Fu, Guangyi Chen, Zhenhao Chen, Gongxu Luo, Yuewen Sun, Salman Khan, Peter Spirtes, Kun Zhang

A multimodal dataset resource centered on LLM-inferred behavior traits.

multimodal · datasets · behavior traits

Dataset

NAACL 2025 Long Paper

Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies

Zirui Song, Guangxian Ouyang, Meng Fang, Hongbin Na, Zijing Shi, Zhenhao Chen, Yujie Fu, Zeyu Zhang, Shiyu Jiang, Miao Fang, Ling Chen, Xiuying Chen

A robotics and language study on proactive anomaly detection and resolution in daily environments.

robotics · anomaly detection · language

arXiv Preprint

MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot

Zirui Song, Yaohang Li, Meng Fang, Zhenhao Chen, Zecheng Shi, Yuan Huang, Ling Chen

A multimodal agent collaboration copilot for operating system workflows.

multimodal agents · copilot · systems

ECCV 2024

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

Rizhao Cai, Zirui Song, Dayan Guan, Zhenhao Chen, Xing Luo, Chenyu Yi, Alex Kot

A benchmark for evaluating cross-style visual capability in large multimodal models.

multimodal · benchmark · vision

Project

ICML 2024

Empowering Graph Invariance Learning with Deep Spurious Infomax

Tianjun Yao, Yongqiang Chen, Zhenhao Chen, Kai Hu, Zhiqiang Shen, Kun Zhang

A graph invariance learning method focused on spurious information and robust representation learning.

graph learning · invariance · ICML