Research

My research lies at the intersection of generative models, computer vision, and multimodal learning.

I develop efficient architectures, representations and inference-time algorithms for visual perception and reasoning, drawing on insights from mathematical modeling and human visual perception. More broadly, I aim to build perceptually grounded visual systems that generalize robustly and learn efficiently.

Xinran (Nicole) Han, Matias Mendieta, Moein Falahatgar
Preprint, 2025 · Work done during internship at Apple

We introduce derivative representation alignment (dREPA) for image-to-video generation and show it improves subject consistency and leads to better generalization across artistic styles.

Xinran (Nicole) Han, Ko Nishino, Todd Zickler
NeurIPS, 2025

We show that a pixel-space video diffusion model trained from scratch estimates accurate shape and material from short videos, and produces diverse shape and material samples for ambiguous input images.

Xinran (Nicole) Han, Todd Zickler, Ko Nishino
NeurIPS, 2024 · Spotlight · Top 2%

We present a bottom-up, patch-based diffusion model for monocular shape from shading that produces multimodal outputs, similar to multistable perception in humans.

Zhou Fan, Xinran Han, Zi Wang
Transactions on Machine Learning Research (TMLR), February 2024

We present new theoretical insight into the equivalence of multi-task and single-task learning for stationary kernels and develop MPHD for model pre-training on heterogeneous domains.

Xinran Han, Todd Zickler
NeurIPS Workshop on Symmetry and Geometry in Neural Representations (PMLR 228), 2023

We present a neural model for inferring a curvature field from shading images that is invariant under lighting and texture variations, drawing on perceptual insights and mathematical derivations.

Soham Dan*, Xinran Han*, Dan Roth (* equal contribution)
Findings of EMNLP, 2021

We show that auxiliary objectives and instruction augmentation improve spatial reasoning in the 'blocks world' task, especially under limited data.

Ziqiang Zheng, Yang Wu, Xinran Han, Jianbo Shi
ECCV, 2020 · Oral · Top 2%

We introduce ForkGAN, a task-agnostic image translation model that effectively disentangles domain-specific and domain-invariant information.

Invited Talks
Mar 2026
Generative Models for Perceptually-Consistent Computer Vision
Stanford University & UC Berkeley
Feb 2026
Perception as Generation: Navigating Ambiguity with Diffusion Models
Computer Science Colloquium, Harvard University
May 2025
Computational Models Exhibit Invariance and Multistability in Shape from Shading
Vision Sciences Society (VSS) Annual Meeting · [Abstract]
Jan 2025
Towards Aligning Human and Computer Shape Perception
Boston University
Jun 2024
Multistable Shape from Shading Emerges from Patch Diffusion
Kyoto University
Dec 2023
Curvature Fields from Shading Fields
New England Computer Vision Workshop (NECV)
Miscellaneous

Outside of research, I enjoy visiting art museums, watching movies and reading about philosophy and psychology.