I am a final-year Ph.D. candidate at Harvard University, advised by Prof. Todd Zickler.
I also work closely with Prof. Ko Nishino and have spent two wonderful summers in Kyoto.
Email: xinranhan [at] g [dot] harvard [dot] edu
Previously, I graduated from the University of Pennsylvania with majors in Mathematics and Computer Science.
During my undergrad, I was fortunate to work with Prof. Jianbo Shi
and Prof. Dan Roth.
I'm broadly interested in computer vision, generative models, and multimodal learning.
Specifically, my research combines data-driven methods with mathematical modeling and
insights from the human visual system. My goal is to build generative world models that
are robust, ambiguity-aware, and capable of efficient, human-like generalization.
Some papers are highlighted.
We introduce derivative representation alignment (dREPA) for image-to-video generation and show it improves
subject consistency and leads to better generalization across artistic styles.
We show that a novel pixel-space video diffusion model trained from scratch estimates accurate
shape and material from short videos, and also produces diverse shape and material samples for
ambiguous input images.
We present a bottom-up, patch-based diffusion model for monocular shape from shading that produces multimodal outputs,
similar to multistable perception in humans.
We present new theoretical insights into the equivalence of multi-task and single-task learning
for stationary kernels and develop MPHD for model pre-training on heterogeneous domains.
We present a neural model for inferring a curvature field from shading images that is invariant to lighting and texture variations,
drawing on perceptual insights and mathematical derivations.