Math 285 Course Page

MATH 285: Selected Topics in High Dimensional Data Modeling

Fall 2015, San Jose State University

Course description

This is an advanced topics course in machine learning with big data [syllabus]. Topics to be covered include:
  1. Singular value decomposition (SVD)
  2. Dimensionality reduction
  3. Spectral clustering
  4. Subspace clustering
  5. Compressive sensing
  6. Dictionary learning
and their applications to image processing. There is no required textbook; we will cover material from various sources (papers, websites, etc.).

Useful textbooks

Some chapters of the following books overlap with the material taught in this course:

Homework

Course project

This course ends with a project, to be reported as an oral presentation in class and/or a written report (see here for instructions).

Learning resources

MATLAB resources

Suggested papers

Principal Component Analysis (PCA)

Multidimensional Scaling (MDS)

Isometric Feature Map (ISOmap)

Kernel Principal Component Analysis (Kernel PCA)

  • This is a relatively easy-to-read paper on Kernel PCA (you can ignore the sections about active shape models)
  • Here is a nice blog that tries to explain Kernel PCA with the Gaussian kernel (also called RBF kernel)
  • Read this paper for a mathematical derivation of Kernel PCA; a longer version of the paper is available at this link
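
As a companion to the readings above, here is a minimal sketch of the first two steps of Kernel PCA with the Gaussian (RBF) kernel: building the kernel matrix and centering it in feature space. It is not taken from any of the linked papers. The function names and the bandwidth parameter `sigma` are illustrative assumptions, and the kernel is assumed to be parameterized as exp(-||x - y||^2 / (2 sigma^2)). The eigendecomposition of the centered matrix, which yields the principal components, is left out.

```python
import math

def gaussian_kernel_matrix(points, sigma):
    """Return the n x n matrix K with K[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    `sigma` is an assumed bandwidth parameter; other sources write the
    same kernel as exp(-gamma * ||x - y||^2) with gamma = 1 / (2 sigma^2).
    """
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            # The kernel is symmetric, so fill both entries at once.
            K[i][j] = K[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return K

def center_kernel_matrix(K):
    """Center a symmetric kernel matrix in feature space.

    Entry (i, j) becomes K[i][j] - mean(row i) - mean(column j) + mean(K).
    Because K is assumed symmetric, the column means equal the row means.
    """
    n = len(K)
    row_means = [sum(row) / n for row in K]
    grand_mean = sum(row_means) / n
    return [
        [K[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]
```

After centering, every row and column of the matrix sums to zero (up to rounding), which corresponds to subtracting the mean from the data in feature space before ordinary PCA.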

Clustering basics and kmeans clustering

See below for two excellent lectures:

How to initialize kmeans:
  • kmeans++ [slides] [paper]. It is implemented in MATLAB R2014b as the default initialization.
  • kmeans// (parallelized kmeans++ for large data sets) [paper]
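
For orientation, the kmeans++ seeding rule referenced above can be sketched as follows: the first center is drawn uniformly at random, and each subsequent center is drawn with probability proportional to its squared distance from the nearest center already chosen. This is a minimal pure-Python illustration, not the MATLAB implementation; the function names and the optional `rng` argument are assumptions made for this example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centers from `points` by D^2-weighted sampling."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # First center: chosen uniformly at random.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] is the squared distance from points[i] to its nearest center so far.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample index i with probability d2[i] / total.
            r = rng.random() * total
            cumulative = 0.0
            idx = len(points) - 1  # default guards against rounding at the end
            for i, w in enumerate(d2):
                cumulative += w
                if r < cumulative:
                    idx = i
                    break
        centers.append(points[idx])
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, squared_distance(p, points[idx]))
              for old, p in zip(d2, points)]
    return centers
```

The returned centers would then be passed to the usual kmeans iterations as the starting point. Passing a seeded `random.Random` instance makes the seeding reproducible.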
How to determine the number of clusters:

Spectral clustering

  • A (long) tutorial on spectral clustering [paper]
  • Normalized cuts and image segmentation [paper] [software]
  • On spectral clustering: analysis and an algorithm [paper]
  • Self-tuning spectral clustering [paper] [webpage]

Subspace clustering

Dictionary learning


Data sets


Useful course websites


Instructor feedback

This is an experimental course in data science, being taught at SJSU for the first time. Your feedback (as early as possible) is encouraged and greatly appreciated, and the instructor will seriously consider it to improve the course experience for you and your classmates. Please submit your anonymous feedback through this page.