ClawPet
A voice-first embodied AI companion with simple physical expression, agent-to-agent chat, and a 3D-printable shell for desktop life.
I am a Ph.D. student in Computer Science at Fudan University, advised by Prof. Tao Xie and Prof. Wei Yang. Previously, I received my M.S. in Computer Engineering from the University of California, Irvine and my B.Eng. from the Southern University of Science and Technology.
My research focuses on coding agents and multimodal reasoning. Feel free to get in touch!
“What magical trick makes us intelligent? The power of intelligence stems from our vast diversity, not from any single, perfect principle.”
— Marvin Minsky, The Society of Mind
An AI-powered mock interview experience for practicing technical and behavioral interviews with interactive feedback.
A reusable LaTeX blog system template with blog-style landing pages, automated compilation, and GitHub Pages deployment.
A systematic study of how jailbreak defenses trade off safety against task performance, over-refusal, and inference cost. Benchmarks 11 defenses across 6 open-source LLMs and 5 datasets, organizing them by operational strategy to expose their deployment-time side effects.
Developed S²VG², a method that uses vision-language models to generate scalable vector graphics (SVG).
Pioneered CVTHead, a framework for point-based neural rendering from monocular images.
Predicted trajectories on a new interactive motion dataset using AgentFormer and Trajectron++.
For a more complete publication list, please see my CV.
Tong Zhang, Zexin Li, and Simin Chen, “When LLM Defenses Backfire: Characterizing Safety, Performance, and Cost Trade-offs”, Under Review.
Sinin Zhang, Yunfei Xie, Yuxuan Cheng, Haoyu Zhang, and Tong Zhang, “PhysNote: Self-Knowledge Notes for Evolvable Physical Reasoning in Vision-Language Model”, ES-Reasoning @ ICLR, 2026.
Tong Zhang, Haoyang Liu, Peiyan Zhang, Yuxuan Cheng, and Haohan Wang, “Beyond Pixels: Exploring Human-Readable SVG Generation for Simple Images with Vision Language Models”, Preprint, 2024.
Haoyu Ma, Tong Zhang, Shanlin Sun, Xiangyi Yan, Kun Han, and Xiaohui Xie, “CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer”, IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024.
Bu Jin, Xinyu Liu, Yupeng Zheng, Pengfei Li, Hao Zhao, Tong Zhang, Yuhang Zheng, Guyue Zhou, and Jingjing Liu, “ADAPT: Action-aware Driving Caption Transformer”, IEEE International Conference on Robotics and Automation (ICRA), 2023.
Sep. 2025 – Dec. 2025
Developed a WeChat Mini Program agent based on vision-language models, and designed its agent memory and experience-extraction mechanisms.
May 2025 – Aug. 2025
Proposed an energy-based attention refinement method for image editing, and explored generate–reflect–reason loops for multi-round controllable optimization.
Nov. 2024 – Apr. 2025
Studied defense mechanisms for large language models and their side effects, as well as token compression for audio language models.
May 2022 – Sep. 2022
Implemented PP-OCRv2 text detection in OpenCV Zoo, along with high-level C++ APIs and evaluation metrics for text detection.