Spring 2026 · University of Missouri

MATH 9787: Mathematical Foundations of AI

Invited speaker series, concluded, 6 talks.

Invited Speakers

  • Mar 30 2026
    12:00 - 1:00 pm Zoom

    Qingsong Wang, UCSD

    Steering Diffusion Models

    Guidance mechanisms enable controllable generation from diffusion models at inference time. Classifier guidance steers sampling using gradients from a noise-aware classifier, offering principled control but requiring a separately trained network. Classifier-free guidance eliminates the external classifier by interpolating conditional and unconditional predictions, yet demands paired training.

    Training-free methods such as universal guidance repurpose off-the-shelf networks, but rely on per-step gradient optimization that is expensive and often unstable.

    In this talk, I present a general recipe for efficiently steering unconditional diffusion models without gradient guidance during inference. Our approach rests on two structural observations. First, noise alignment: even at early, highly corrupted stages of the reverse process, coarse semantic steering is possible using a lightweight, offline-computed guidance signal, with no per-step or per-sample gradients required.

    Second, transferable concept vectors: a concept direction in activation space, once learned, transfers across both timesteps and samples. A single fixed steering vector learned near low noise levels remains effective when injected at intermediate noise levels for every generation trajectory, providing refined conditional control at negligible cost. These directions are identified via Recursive Feature Machines (RFM), a backpropagation-free feature learning method.

    Experiments on CIFAR-10, ImageNet, and CelebA demonstrate improved accuracy and generation quality over gradient-based guidance, with significant inference speedups.

  • Apr 8 2026
    12:00 - 1:00 pm Zoom

    Jakiw Pidstrigach, Gridmatic

    Fine-tuning and Steering of Diffusions with Non-Differentiable Rewards

    We consider stochastic differential equations that are modified by reward functions or likelihood-based weights in order to promote specific events. This perspective applies both to diffusion-type models used in generative modeling and to SDEs describing physical phenomena such as molecular dynamics or weather. The main emphasis is on rewards that are non-smooth or singular, as they appear in conditioning, threshold objectives, and rare-event simulation.

    We discuss diffusion bridges as a central example, where one seeks typical trajectories connecting prescribed endpoints, for instance during a molecular transition between stable states or between two atmospheric configurations. We also discuss fine-tuning of diffusion models with non-differentiable rewards, motivated by applications that prioritize tail events and other low-probability regions.

  • Apr 10 2026
    12:00 - 1:00 pm Zoom

    Qing Qu, University of Michigan

    Harnessing Low-Dimensionality for Generalizable and Trustworthy Generative AI

    Generative AI has rapidly transformed machine learning, with diffusion and autoregressive models achieving unprecedented performance across vision, language, and scientific discovery.

    Despite this success, our theoretical understanding still lags far behind practice: why do these models generalize so effectively from finite data in high dimensions? In this talk, I present a mathematical framework that shows that intrinsic low-dimensional structure is the key to understanding this phenomenon and provides a foundation for building more trustworthy generative AI.

    Through the lens of mixtures of low-rank Gaussian models, I show that learning high-dimensional distributions can be reduced to a canonical subspace clustering problem. This connection yields provable guarantees: the sample complexity scales with the intrinsic dimension of the data, rather than the ambient dimension, thereby breaking the curse of dimensionality for generalization.

    I will then turn to the role of representation learning in generalization, using two-layer denoising autoencoders as a tractable model to show that the optimal representations and weight structures differ fundamentally between the memorization and generalization regimes. These results offer a unified perspective on how generative models both learn meaningful structure in latent spaces and synthesize new data in high dimensions.

    We translate these theoretical insights into practical guidelines for controlled generation, ensuring model safety and privacy. Finally, we contrast the generalization performance of diffusion and autoregressive models in the context of state prediction for stochastic dynamical systems. These findings inform new data assimilation methods, provide critical insights across many scientific applications, and establish a foundation for next-generation generative modeling.

  • Apr 15 2026
    12:00 - 1:00 pm Strickland Hall 109

    Binxu Wang, Harvard

    Diffusion Models Through the Linear Lens: Exact Analysis of Sampling, Learning, Receptive Fields, and Consistency

    Diffusion models are powerful generative systems, yet their internal mechanisms remain difficult to analyze. Taking a physicist's approach, we study the simplest tractable case: a diffusion model with a linear score function.

    A key duality links architecture and distribution -- a Gaussian dataset implies a linear optimal score, and a linear score network implies the learned distribution is the Gaussian approximation of the data. This duality enables fully analytical treatment across four aspects of diffusion models.

    Sampling dynamics. The linear score yields a closed-form, low-dimensional, rotation-like sampling trajectory governed by data covariance -- and precisely predicts the early phase of pretrained diffusion models, revealing dominant linear structure across a wide range of noise scales.

    Learning dynamics. Deep linear networks admit analytical training dynamics, uncovering a spectral bias: structure is learned first along the top eigendimensions of the data.

    Receptive field structure. The effective receptive field is shaped by data covariance rather than architectural priors -- it need not be local or equivariant -- yielding predictions that extend recent work by Kamb and Ganguli.

    Sample consistency. Using random matrix theory, we predict sensitivity to dataset resampling, identifying which noise seeds yield consistent versus variable outputs.

    This work shows how a tractable linear regime provides a rigorous analytical lens into the sampling, learning, receptive field structure, and consistency of diffusion models -- with insights that extend surprisingly far into the nonlinear setting.

  • Apr 22 2026
    12:00 - 1:00 pm Zoom

    Ye He, Georgia Tech

    Diffusion Model's Generalization via Data-Dependent Ridge Manifolds

    When a diffusion model is not memorizing the training samples, what does it generate, and why? In this talk, I will describe a quantitative framework for understanding the distribution produced by a learned diffusion model through a data-driven geometric object: a log-density ridge manifold of the smoothed training distribution.

    This manifold acts as a backbone for generation and reveals a three-stage inference behavior: trajectories first reach the ridge, then align in normal directions, and finally slide along tangent directions.

    This perspective allows us to quantify how training error influences generation in different directions, and to explain when inter-mode generations arise. I will also present a random feature example in which the model's inductive bias can be decomposed explicitly into architectural bias and optimization error, and tracked along the inference dynamics. Experiments on synthetic multimodal distributions and MNIST latent diffusion support the theory in both low- and high-dimensional settings.

  • Apr 29 2026
    12:00 - 1:00 pm Zoom

    Mason Kamb, Stanford

    Local Theories of Diffusion Model Generalization in High Dimensions

    Modern generative diffusion models are distinguished by their ability to generalize, consistently and robustly, in very high-dimensional spaces. They produce a combinatorial explosion of novel images from a relatively small training set, subverting normal concerns about the curse of dimensionality. Yet, their generations also sometimes fall short, exhibiting distinctive flaws such as spatial inconsistency (e.g., excess limbs).

    I will discuss an analytical theory that, making only the assumptions of A) locality and B) (broken) equivariance, explains 1) how models are able to generalize combinatorially, mixing and matching from different images in their training data, 2) why models are able to generalize consistently and robustly in high-dimensional spaces, and 3) the mechanistic origins of spatial consistency issues such as the "excess limbs" phenomenon.

    This theory is totally solvable in terms of the training dataset, and we show that it is able to predict on a case-by-case basis the behavior of certain classes of weak diffusion models: 1) small convolutional neural networks, and 2) diffusion models early in their training process. I will then comment on what is still needed to further explain the mysteries of generalization in more powerful models.