S&DS 659: Mathematics of Deep Learning

Description

The goal of this course is to provide an introduction to selected topics in deep learning theory. I will present a number of mathematical models and theoretical concepts that have emerged in recent years to understand neural networks.

Lectures: Wednesdays 4:00pm–5:50pm
Office Hours: Thursdays 4:00pm–5:00pm, Kline Tower 1049

Prerequisites: I will not assume a specific background in machine learning, let alone neural networks. On the other hand, I will assume a degree of mathematical maturity, in particular in linear algebra, analysis, and probability theory (at the level of S&DS 241/541).

Assignments: You will scribe one lecture during the semester, write a report on a research topic related to deep learning theory, and give a presentation at the end of the semester.

Course syllabus

  • Week 1: General Introduction

    • Empirical risk minimization and the classical paradigm of statistical learning.

    • Tractability via overparametrization, implicit bias.

    • Universal approximation, Barron's theorem, uniform convergence.

  • Week 2: Generalization and Uniform Convergence

    • Basics of uniform convergence theory.

    • Norm-based uniform convergence for multilayer NNs.

  • Week 3: Implicit Bias

    • Implicit bias of learning algorithms.

    • Examples of mirror descent and steepest descent.

  • Week 4: Benign Overfitting/Double Descent

    • Overfitting and double descent phenomena.

    • Benign overfitting in linear regression, self-induced regularization.

    • Inner-product kernels on the sphere.

  • Week 5: Lazy Regime and NTK

    • Lazy training regime in optimization.

    • Global convergence of two-layer NNs.

    • Neural Tangent Kernel.

  • Week 6: Kernel Methods

    • Background on kernel methods.

    • Deterministic equivalents for ridge regression.

    • Curse of dimensionality; lower bounds for learning with linear methods.

  • Week 7: Mean-Field Description

    • Infinite-width limits and the μP (maximal update) parametrization.

    • Mean-field theory for two-layer NNs: McKean–Vlasov PDE and optimal transport formulations.

    • Global convergence guarantees.

  • Week 8: Linear Methods vs Feature Selection vs Feature Learning

    • Convex neural networks.

    • Case study: multi-index functions.

    • Staircase mechanism.

  • Week 9: Power and Limitations of Differentiable Learning

    • Computational hardness of deep learning.

    • Poly-time universality of SGD on NNs.

  • Week 10: High-Dimensional Landscapes and Dynamics

    • Landscape concentration.

    • Dynamics on non-convex problems in high dimensions.

  • Week 11: Transformers, Attention, and In-Context Learning

    • Transformer architecture and the attention layer.

    • In-context linear regression.

  • Week 12: Edge-of-Stability, Neural Scaling Laws, Emergence, and Beyond

    • Review of several empirical phenomena, including edge of stability, neural scaling laws, and emergence.

    • Open-ended discussion.

  • Week 13: In-class presentations + pizza