I am a final-year Ph.D. candidate at Harvard University, advised by Prof. Todd Zickler.
I also work closely with Prof. Ko Nishino and have spent two wonderful summers in Kyoto.
Email: xinranhan [at] g [dot] harvard [dot] edu
Previously, I graduated from the University of Pennsylvania, majoring in Mathematics and Computer Science.
During my undergrad, I was fortunate to work with Prof. Jianbo Shi
and Prof. Dan Roth.
My research lies at the intersection of generative models, computer vision, and multimodal learning.
I develop efficient architectures, representations, and inference-time algorithms for visual perception and reasoning,
drawing on insights from mathematical modeling and human visual perception.
More broadly, I aim to build perceptually grounded visual systems that generalize robustly and learn efficiently.
Some papers are highlighted.
We introduce derivative representation alignment (dREPA) for image-to-video generation and show it improves
subject consistency and leads to better generalization across artistic styles.
We show that a novel pixel-space video diffusion model trained from scratch estimates accurate
shape and material from short videos, and also produces diverse shape and material samples for
ambiguous input images.
We present a bottom-up, patch-based diffusion model for monocular shape from shading that produces multimodal outputs,
similar to multistable perception in humans.
We present new theoretical insight into the equivalence of multi-task and single-task learning
for stationary kernels and develop MPHD for model pre-training on heterogeneous domains.
We present a neural model for inferring a curvature field from shading images that is invariant under lighting and texture variations,
drawing on perceptual insights and mathematical derivations.