ClawPet
A voice-first embodied AI companion with simple physical expression, agent-to-agent chat, and a 3D-printable shell for desktop life.
I am a Ph.D. student in Computer Science at Fudan University, advised by Prof. Tao Xie and Prof. Wei Yang. Previously, I received my M.S. in Computer Engineering from the University of California, Irvine and my B.Eng. from the Southern University of Science and Technology.
My research focuses on coding agents and multimodal reasoning. Feel free to get in touch!
“What magical trick makes us intelligent? The power of intelligence stems from our vast diversity, not from any single, perfect principle.”
— Marvin Minsky, The Society of Mind
An AI-powered mock interview experience for practicing technical and behavioral interviews with interactive feedback.
A reusable LaTeX blog system template with blog-style landing pages, automated compilation, and GitHub Pages deployment.
Conducted a comprehensive analysis of 11 defense mechanisms applied to 6 LLMs, evaluating their impact on model performance, over-refusal, and token overhead.
Developed S²VG², a method that integrates vision-language models for scalable vector graphics (SVG) generation.
Developed CVTHead, a framework for point-based neural rendering of head avatars from monocular images.
Predicted trajectories on a new interactive motion dataset using AgentFormer and Trajectron++.
For a more complete publication list, please see my CV.
Tong Zhang, Yiming Chen, Simin Chen, Zexin Li, Xianghu Yue, Cong Liu, Chenyu You, and Haizhou Li, “Unintended Side Effects of Defense Mechanisms in Large Language Models: A Comprehensive Study”, Under Review.
Tong Zhang, Haoyang Liu, Peiyan Zhang, Yuxuan Cheng, and Haohan Wang, “Beyond Pixels: Exploring Human-Readable SVG Generation for Simple Images with Vision Language Models”, Preprint, 2024.
Haoyu Ma, Tong Zhang, Shanlin Sun, Xiangyi Yan, Kun Han, and Xiaohui Xie, “CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer”, IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024.
Bu Jin, Xinyu Liu, Yupeng Zheng, Pengfei Li, Hao Zhao, Tong Zhang, Yuhang Zheng, Guyue Zhou, and Jingjing Liu, “ADAPT: Action-aware Driving Caption Transformer”, IEEE International Conference on Robotics and Automation (ICRA), 2023.
Sep. 2025 – Dec. 2025
Developed a WeChat Mini Program agent based on vision-language models, and designed agent memory and experience-extraction mechanisms.
May 2025 – Aug. 2025
Proposed an energy-based attention-refinement method for image editing, and explored generate–reflect–reason loops for multi-round controllable optimization.
Nov. 2024 – Apr. 2025
Studied defense mechanisms for large language models, their side effects, and token compression for audio language models.
May 2022 – Sep. 2022
Implemented the PP-OCRv2 detection model in OpenCV Zoo, along with high-level C++ APIs and evaluation metrics for text detection.