Stone Tong ZHANG

I am currently a research intern at vivo AI Research, working on controllable image generation, latent world models, and multimodal reasoning. Previously, I received my M.S. in Computer Engineering from the University of California, Irvine and my B.Eng. from Southern University of Science and Technology. My research focuses on bridging vision and language through structured representation, world simulation, and step-by-step visual reasoning. Feel free to get in touch!

Email / GitHub / LinkedIn

“What magical trick makes us intelligent? The power of intelligence stems from our vast diversity, not from any single, perfect principle.”

— Marvin Minsky, The Society of Mind, p. 308

Sep. 2019–Jun. 2023, Southern University of Science and Technology, B.Eng., Computer Science and Engineering
Sep. 2022–Jun. 2024, University of California, Irvine, M.S., Computer Engineering

Academic Projects

Unintended Side Effects of Defense Mechanisms in Large Language Models

Prof. Haizhou Li, CUHK-Shenzhen
PyTorch, LaTeX

Conducted a comprehensive analysis of 11 defense mechanisms applied to 6 LLMs, evaluating their impact on model performance, over-refusal, and token overhead
Proposed 9 meta-defenders to systematically analyze trade-offs between safety and utility in model responses
Provided actionable insights on designing robust LLMs for safety-critical applications

Human-Readable SVG Generation with Vision Language Models

Assistant Prof. Haohan Wang, UIUC
PyTorch

Developed S²VG², an innovative method integrating vision language models for scalable vector graphics (SVG) generation
Curated the SVG-SHAPE dataset to benchmark SVG generation and model reasoning capabilities
Demonstrated state-of-the-art performance on both LLM-based SVG reasoning and vision metrics

One-shot Controllable Head Avatar Creation

Prof. Xiaohui Xie, UCI
PyTorch

Pioneered CVTHead, a framework for point-based neural rendering from monocular images
Conducted comprehensive benchmarks against leading methods for cross-identity reenactment
Achieved state-of-the-art results on VoxCeleb1 and VoxCeleb2 with improved efficiency (paper accepted to WACV 2024)

Trajectory Prediction and Driving Video Caption

Assistant Prof. Hao Zhao, THU
NumPy, PyTorch

Predicted trajectories on a new interactive motion dataset using AgentFormer and Trajectron++
Trained a novel end-to-end transformer that generates descriptions and explanations of driving videos
Demonstrated state-of-the-art performance (co-authored paper accepted to ICRA 2023)

Professional Experience

Image Editing via Reasoning

vivo AI Research
PyTorch, Python

Developed an energy-based heatmap refinement algorithm for high-fidelity image editing
Explored VLM + CoT paradigms for multi-round editing via “generate–reflect–revise” visual reasoning

Lightweight OCR Models for OpenCV

OpenCV @ Google Summer of Code
PyTorch, ONNX, C++

Implemented the text detection component of the PP-OCRv2 model in OpenCV Zoo via ONNX
Developed a high-level C++ API for PP-OCRv2 in OpenCV
Built evaluation metrics for text detection (AP, Recall, Precision, Hmean) in OpenCV Zoo

Publications

Tong Zhang, Yiming Chen, Simin Chen, Zexin Li, Xianghu Yue, Cong Liu, Chenyu You, Wei Yang, Haizhou Li, and Tao Xie, “Unintended Side Effects of Defense Mechanisms in Large Language Models: A Comprehensive Study”, Under Review, 2025.
Tong Zhang, Haoyang Liu, Peiyan Zhang, Yuxuan Cheng, and Haohan Wang, “Beyond Pixels: Exploring Human-Readable SVG Generation for Simple Images with Vision Language Models”, Preprint, 2024.
Haoyu Ma, Tong Zhang, Shanlin Sun, Xiangyi Yan, Kun Han, and Xiaohui Xie, “CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer”, IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024.
Bu Jin, Xinyu Liu, Yupeng Zheng, Pengfei Li, Hao Zhao, Tong Zhang, Yuhang Zheng, Guyue Zhou, and Jingjing Liu, “ADAPT: Action-aware Driving Caption Transformer”, IEEE International Conference on Robotics and Automation (ICRA), 2023.

Facts about Me

Hometown: Wuhan
Idol: Richard Feynman
Dream: To be a great researcher and design influential software
I enjoy finding potential research topics in active discussions and take pride in my creativity. I also write poems and blog posts.
