Zhiqiu Xu / 徐之秋 / Oscar

I am a Ph.D. student at the University of Pennsylvania, advised by Prof. Mayur Naik. I seek to understand how neural networks work and to develop simple methods that improve neural network training.

I completed my Master of Science degree at UC Berkeley, advised by Prof. Trevor Darrell, where I worked on computer vision and representation learning.

Before that, I was an undergraduate student at UC Berkeley, majoring in Computer Science and Applied Mathematics. I also interned at Twitter.

I was closely mentored by Dr. Zhuang Liu and Ph.D. candidate Ruiqi Zhong at UC Berkeley. They are the primary reasons I dove into research, and they are also why I am open to mentoring junior students on research. See the Mentorship section.

Email / Google Scholar / Twitter / GitHub / WeChat


A Coefficient Makes SVRG Effective
Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu
ICLR 2025

In this work, we demonstrate the potential of SVRG in optimizing real-world neural networks. We introduce a multiplicative coefficient α to control the strength of the variance reduction term and adjust it through a linear decay schedule. We name our method α-SVRG.
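The idea can be illustrated with a minimal sketch. It assumes the standard SVRG gradient estimate with the variance reduction term scaled by α, and a schedule that decays α linearly to zero. The function names, the list-based gradients, and the decay endpoint are illustrative assumptions, not the paper's implementation.

```python
from typing import List


def linear_decay_alpha(alpha0: float, step: int, total_steps: int) -> float:
    """Decay the coefficient linearly from alpha0 at step 0 to 0 at total_steps.

    The endpoint of 0 is an assumption made for this sketch.
    """
    return alpha0 * (1.0 - step / total_steps)


def alpha_svrg_gradient(
    grad: List[float],            # mini-batch gradient at the current weights
    snap_grad: List[float],       # same mini-batch, gradient at the snapshot weights
    snap_full_grad: List[float],  # full-batch gradient at the snapshot weights
    alpha: float,
) -> List[float]:
    """Return grad - alpha * (snap_grad - snap_full_grad), element-wise.

    alpha = 0 recovers plain SGD; alpha = 1 recovers standard SVRG.
    """
    return [g - alpha * (gs - mu) for g, gs, mu in zip(grad, snap_grad, snap_full_grad)]
```

With `alpha=0.0` the estimate reduces to the plain mini-batch gradient, so the coefficient interpolates between SGD and standard SVRG.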

Initializing Models with Larger Ones
Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu
ICLR 2024 (Spotlight)

We introduce weight selection, a method for initializing models by selecting a subset of weights from a pretrained larger model. With no extra cost, it improves the accuracy of a smaller model and reduces the training time needed to reach a given accuracy level.
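As a rough illustration, the sketch below initializes a small 2-D weight matrix from a larger one by keeping its leading rows and columns. The selection rule (leading block), the nested-list representation, and the function name `select_weights` are assumptions made for this example; the paper's actual selection procedure may differ.

```python
from typing import List

Matrix = List[List[float]]


def select_weights(large: Matrix, rows: int, cols: int) -> Matrix:
    """Copy the leading rows x cols block of a larger weight matrix.

    Hypothetical helper: keeps the first `rows` rows and first `cols`
    columns so the result matches the smaller layer's shape.
    """
    if rows > len(large) or cols > len(large[0]):
        raise ValueError("target shape must not exceed the larger matrix")
    return [row[:cols] for row in large[:rows]]


# Example: a 3x4 "pretrained" matrix used to initialize a 2x3 layer.
large_weights = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]
small_weights = select_weights(large_weights, rows=2, cols=3)
```

The selection involves only indexing, which is why the initialization adds no training cost.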

Dropout Reduces Underfitting
Zhuang Liu*, Zhiqiu Xu*, Joseph Jin, Zhiqiang Shen, Trevor Darrell (* equal contribution)
ICML 2023

We propose early dropout and late dropout. Early dropout helps underfitting models fit the data better and achieve lower training loss. Late dropout helps improve the generalization performance of overfitting models.
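A minimal sketch of the two schedules, assuming dropout is simply switched on or off at a cutoff epoch. The function name, the `mode` strings, and the epoch-based cutoff are illustrative assumptions rather than the paper's code.

```python
def dropout_rate(epoch: int, cutoff_epoch: int, p: float, mode: str) -> float:
    """Return the dropout probability to use at a given epoch.

    mode="early": dropout is active before cutoff_epoch, then disabled.
    mode="late":  dropout is disabled before cutoff_epoch, then active.
    """
    if mode not in ("early", "late"):
        raise ValueError("mode must be 'early' or 'late'")
    before_cutoff = epoch < cutoff_epoch
    if mode == "early":
        return p if before_cutoff else 0.0
    return 0.0 if before_cutoff else p


# Example: a 10-epoch run with the switch at epoch 3.
early = [dropout_rate(e, cutoff_epoch=3, p=0.1, mode="early") for e in range(10)]
late = [dropout_rate(e, cutoff_epoch=3, p=0.1, mode="late") for e in range(10)]
```

In a training loop, the returned value would be assigned to the model's dropout layers at the start of each epoch.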

Anytime Dense Prediction with Confidence Adaptivity
Zhuang Liu, Zhiqiu Xu, Hung-ju Wang, Trevor Darrell, Evan Shelhamer
ICLR 2022

Our full method, named Anytime Dense Prediction with Confidence (ADP-C), achieves the same level of final accuracy as HRNet-w48 while significantly reducing total computation.


Mentorship

Just like every other field in human society, the flourishing academic community relies on passing the torch of knowledge. If you are interested in my research or in deep learning topics and want to chat with me, please don't hesitate to reach out. Here is a list of students I have mentored. One of them said I was a legendary mentor, and let's trust his judgment this once.

  • Joseph Jin
  • Maolin Mao
  • Yanjie Chen

Academic Services

Reviewer for ICLR, ICML, NeurIPS, CVPR, ICCV, ECCV.

  • My Chinese name is pronounced Shoe-Jee-Chyo. I will be very happy if you try to pronounce it in front of me.
  • Occasionally, I post vlogs on bilibili. Here's my homepage.
  • My profile photo was shot in a studio; I never dress like that in real life.
  • I proudly named myself Oscar after watching Shark Tale when I was 6 years old.

Website design from Jon Barron, borrowed from Ruiqi Zhong.