0.0 Course directors
Saul Kato ([email protected]), Christoph Kirst ([email protected])
TAs: Jackson Borchardt ([email protected]), May Li ([email protected]), Jianhui Chen ([email protected])
0.1 Course description
Many machine learning approaches can be thought of as a process of encoding high-dimensional data items into a low-dimensional space, then (optionally) decoding them back into a high-dimensional data space. This paradigm encompasses dimensionality reduction, feature learning, classification, and, of particular recent excitement, generative models. It has even been proposed as a model of human cognition. This course will survey uses of encoder-decoder models in current neuroscience research and teach the basic theory and practice of applying these models. Lectures will be given by neuroscientists and machine learning practitioners from UCSF and elsewhere.
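As a concrete illustration of the encode/decode idea, here is a minimal sketch that uses PCA as a linear encoder-decoder. The synthetic data, the choice of 2 latent dimensions, and the function names `encode` and `decode` are assumptions made for illustration, not course material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 200 samples in 10 dimensions
# that vary along only 2 latent directions, plus a little noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

# Fit PCA via the SVD of the mean-centered data.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2]  # top-2 principal directions, shape (2, 10)


def encode(x):
    """Map high-dimensional data to a low-dimensional code."""
    return (x - mean) @ components.T


def decode(z):
    """Map a low-dimensional code back to the data space."""
    return z @ components + mean


Z = encode(X)        # shape (200, 2)
X_hat = decode(Z)    # shape (200, 10)
reconstruction_mse = float(np.mean((X - X_hat) ** 2))
print(Z.shape, X_hat.shape, reconstruction_mse)
```

Because the data were built from 2 latent directions, the 2-dimensional code reconstructs them almost perfectly; with real data the reconstruction error tells you how much is lost at a given code size.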
0.2 Schedule
MWF 9:30-11:30am | Room MH-1407 (M) or 1406 (WF) in Mission Hall
First class: Monday April 20, 2026
Last class: Friday May 8, 2026
8 lectures from instructors and guests, plus student presentations at the last class.
Lecture schedule, subject to change:
https://docs.google.com/spreadsheets/d/1TUW6xFYSmklDIJ-tOkWLwdzBeZT9NnlhX1eUfArqnOE/edit?gid=0#gid=0
0.3 Project, group or individual
Proposal:
250-word maximum proposal (.pdf), due Sunday April 26 at midnight, emailed to [email protected]. Figures optional. References optional but appreciated.
Be sure to answer:
(1) What data are you analyzing?
(2) What question(s) would you like to ask of your data?
(3) What model / algorithm will you try first?
(4) What "positive computational control" (labeled ground truth data and/or synthetic data) will you validate your model on?
(5) What performance metric(s) will you assess your model with?
Deliverables:
(1) .ipynb (Jupyter) notebook or git repo with a step-by-step readme, due Thursday May 7 at midnight, emailed to [email protected]. Just a URL is fine. [We will try to run your code.]
(2) 10 minute presentation on May 8 (on Google Slides, max 10 slides)
(3) course evaluation filed by the end of class on May 8 (important!).
Rubric:
you will get a Pass if you:
(1) have acquired or found experimental/real-world data to analyze
(2) have created a positive control dataset (human-labeled ground truth data and/or synthetic data)
(3) have shown your model to give good results on your positive control data (and can explain what “good” means)
(4) have measured the performance of your model on real-world (held out) probe data, or even better, done some jackknifing
you will get a Pass- if you:
do all of the above but do not properly hold out your probe data
you will get a Pass+ if you:
(1) use your model to "hallucinate" new data
(2a) show that your model outperforms baseline naive analysis (e.g. "guess the mean")
and/or
(2b) tweak your model at least once to improve model performance
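The rubric items above (positive control, held-out probe data, a "guess the mean" baseline, jackknifing) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the synthetic linear data, the least-squares model, the 90/30 split, and the helper names `fit`, `predict`, and `mse` are all hypothetical choices for illustration, and your project will likely use a different model and metric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Positive control (assumed for illustration): synthetic data with known weights.
n_samples, n_features = 120, 3
true_w = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(n_samples, n_features))
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

# Hold out probe data that the model never sees during fitting.
perm = rng.permutation(n_samples)
train_idx, probe_idx = perm[:90], perm[90:]


def fit(X, y):
    """Least-squares fit with an intercept column appended."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict(w, X):
    return np.column_stack([X, np.ones(len(X))]) @ w


def mse(a, b):
    return float(np.mean((a - b) ** 2))


w = fit(X[train_idx], y[train_idx])
model_mse = mse(predict(w, X[probe_idx]), y[probe_idx])

# Naive baseline: always guess the training-set mean.
baseline_mse = mse(np.full(len(probe_idx), y[train_idx].mean()), y[probe_idx])

# Jackknife: refit with each training sample left out in turn,
# then estimate the standard error of each fitted weight.
n = len(train_idx)
jack_w = np.array([
    fit(np.delete(X[train_idx], i, axis=0), np.delete(y[train_idx], i))
    for i in range(n)
])
jack_se = np.sqrt((n - 1) / n * np.sum((jack_w - jack_w.mean(axis=0)) ** 2, axis=0))

print("model MSE:", model_mse, "baseline MSE:", baseline_mse)
print("fitted weights:", w[:3], "jackknife SE:", jack_se[:3])
```

On the positive control, "good" here means two checkable things: the fitted weights land close to `true_w`, and the probe-set error is well below the guess-the-mean baseline.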
Project tip: if you need to get up and running in a Python notebook with a minimum of fuss, this is a good way to do it: https://colab.research.google.com/
0.4 Bibliography
Lecture 1: the encoder-decoder framework
Background reading
No BS Guide to Linear Algebra (2020, Savov)
Principal Component Analysis (2002, Jolliffe) [pdf for UC] Yes, you can easily write a whole book on PCA. Yes, it is worth reading.
Pattern Recognition and Machine Learning (2006, Bishop) [pdf] probably the most used pre-deep-learning-era ML textbook. Highly readable
Generative Deep Learning, 2nd edition, O’Reilly Series (2023, Foster)
History of neural networks
McCulloch, Warren S.; Pitts, Walter (1943-12-01). "A logical calculus of the ideas immanent in nervous activity". The Bulletin of Mathematical Biophysics. 5 (4): 115–133. doi:10.1007/BF02478259
Rosenblatt, Frank (1958) "The perceptron: A probabilistic model for information storage and organization in the brain". Psychological Review. 65 (6): 386–408. doi:10.1037/h0042519
Minsky, Marvin; Papert, Seymour (1988) Perceptrons: An Introduction to Computational Geometry, expanded edition (first published 1969). MIT Press. The proof that single-layer perceptrons cannot compute the XOR function is often credited with helping to trigger the first “AI winter”.
Rumelhart D., Hinton G., Williams R. (1986) “Learning representations by back-propagating errors”. Nature 323:533–536. Showed how to train a multi-layer neural network by gradient descent, using the chain rule to compute d(cost function)/d(parameter) for all network parameters.
Universal function approximation
Palm G (1979) On the representation and approximation of nonlinear systems. Part II: Discrete time. Biol Cybern 34:49-52
Kolmogoroff, A.N. (1957) “On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition” (Russian). Dokl. Akad. Nauk. SSSR 114:953-956; 1957; AMS Transl. 2:55-59; 1963.
Hornik K, Stinchcombe M, White H. (1989) “Multilayer Feedforward Networks are Universal Approximators” Neural Networks, Vol. 2, pp. 359-366
VAEs
Kingma D, Welling M (2013) “Auto-Encoding Variational Bayes”. arXiv:1312.6114
2D embedding methods: t-SNE and UMAP
https://pair-code.github.io/understanding-umap/ awesome interactive comparison
Transformers
Vaswani, A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, and Polosukhin I (2017) “Attention is all you need.” Advances in Neural Information Processing Systems.
https://www.youtube.com/@statquest explainer YouTube videos about neural networks
Intro to Neuronal Networks:
https://ucsf.box.com/s/q8tcvu4hsugnik6a0w3rylv0b2zou1ri
Autograd via PyTorch:
https://ucsf.box.com/s/693gbycfrx5ag2avxtuwa06j041g47ii
Dimension Reduction:
https://ucsf.box.com/s/q8tcvu4hsugnik6a0w3rylv0b2zou1ri
https://ucsf.box.com/s/eeahu5apn7osp4bq2nseud0rjzlll1rd
https://ucsf.box.com/s/rldjjiscapz8bk1fw10wtx5hm05xrzn1
Transformer / SSM:
https://ucsf.box.com/s/61579d5enk2g2r7xxhhvlgwyhrm5cbth
Questions: email saul.kato /at/ ucsf.edu, christoph.kirst /at/ ucsf.edu
Day 1. Coding exercise