# Unsupervised Learning - Course Notes
## Overview

- Unsupervised learning is subjective: there are no ground-truth labels to evaluate results against.
- It is particularly useful during Exploratory Data Analysis (EDA).
- Prerequisite: linear algebra
## Two Main Tasks

### 1. Dimensionality Reduction

- Requires centered data
- Dimension m needs to be determined
- Example: Principal Component Analysis (PCA)
### 2. Clustering

- Ill-posed problem (there is no single correct clustering)
- Euclidean distance performs poorly in high dimensions
- Cluster number k needs to be determined
- Example: K-means clustering
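The claim about Euclidean distance can be illustrated with a small experiment (a sketch; the sample sizes and the uniform data are arbitrary choices): as the dimension grows, the relative contrast between the nearest and farthest point collapses, so "nearest" becomes almost meaningless.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(d, n=500):
    """Relative contrast (max - min) / min of Euclidean distances
    from one reference point to n - 1 random points in [0, 1]^d."""
    X = rng.random((n, d))
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    return (dists.max() - dists.min()) / dists.min()

# In low dimensions the nearest and farthest points differ a lot;
# in high dimensions all pairwise distances concentrate around
# roughly the same value, so the contrast shrinks drastically.
print(distance_contrast(2))     # large contrast
print(distance_contrast(1000))  # much smaller contrast
```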
## Principal Component Analysis (PCA)

### Key Terms
Principal Components (PCs):
- First PC, Second PC, ...
- Each PC is a linear combination of the data features, weighted by a loading vector
Loading Vectors:
- The loadings are normalized (each loading vector has unit length)
- Loading vectors are the slopes (directions) of the PCs in feature space
- Each loading indicates how much the corresponding feature contributes to the PC
Singular Value Decomposition (SVD):
- Formula: X = U S V^T
  - U: left singular vectors (orthogonal matrix)
  - S: diagonal matrix of singular values (in decreasing order)
  - V: right singular vectors (orthogonal matrix); its columns are the loading vectors
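Putting the terms together, PCA can be sketched directly from the SVD of the centered data matrix (NumPy; the function and variable names here are illustrative, not from the notes):

```python
import numpy as np

def pca(X, m):
    """Project X onto its first m principal components via SVD."""
    Xc = X - X.mean(axis=0)                  # PCA requires centered data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:m]                        # rows: unit-norm loading vectors
    scores = Xc @ loadings.T                 # PC scores: data times loadings
    return scores, loadings, S

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
scores, loadings, S = pca(X, 2)              # reduce 5 features to 2 PCs
```

Note that the first PC captures at least as much variance as the second, because the singular values in S are sorted in decreasing order.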
### Choosing Dimensions

Choose the target dimension by keeping only the m largest singular values in S (and the corresponding columns of U and V).

Information loss can be measured from S:

- Retained variance vs. total variance: the fraction of variance kept by the first m components, computed from the squared singular values
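The retained-variance criterion can be computed from the singular values alone, since the variance along the i-th PC is proportional to the squared singular value. A minimal sketch (the example values are made up):

```python
import numpy as np

def retained_variance(S, m):
    """Fraction of total variance kept by the first m principal
    components, given the singular values S of the centered data."""
    var = S ** 2            # variance along PC i is proportional to S_i^2
    return var[:m].sum() / var.sum()

S = np.array([5.0, 3.0, 1.0, 0.5])    # hypothetical singular values
print(retained_variance(S, 2))        # (25 + 9) / 35.25, about 0.96
```

A common rule of thumb is to pick the smallest m whose retained variance exceeds some threshold (e.g., 90%).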
## K-means Clustering

### Algorithm Approach

Directly optimizing the K-means objective (the within-cluster sum of squares) is NP-hard, so K-means uses an iterative algorithm instead: alternately assign each point to its nearest centroid, then move each centroid to the mean of its cluster.
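A minimal sketch of this iterative scheme (Lloyd's algorithm) in NumPy; the random initialization and the toy two-blob data are illustrative choices:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: iteratively refine k centroids instead of
    solving the NP-hard objective exactly."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster.
        new_centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break                              # assignments stopped changing
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, size=(20, 2)),   # blob near (0, 0)
               rng.normal(5, 0.1, size=(20, 2))])  # blob near (5, 5)
labels, centers = kmeans(X, 2)
```

Each iteration can only decrease the within-cluster sum of squares, but the result depends on the initialization, so in practice K-means is often restarted several times.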
### Determining K

The cluster number K must be fixed in advance; heuristic approaches such as the elbow method can help find a good value.
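One such heuristic is the elbow method: run K-means for a range of K values and pick the K where the within-cluster sum of squares (the "inertia") stops dropping sharply. A sketch in NumPy; the deterministic farthest-point seeding is an assumption made here to keep the example reproducible, not part of the notes:

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the
    point farthest from the centers chosen so far."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans_inertia(X, k, n_iter=50):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    c = farthest_point_init(X, k)
    for _ in range(n_iter):
        lab = np.linalg.norm(X[:, None] - c[None], axis=2).argmin(axis=1)
        c = np.array([X[lab == j].mean(axis=0) if (lab == j).any() else c[j]
                      for j in range(k)])
    lab = np.linalg.norm(X[:, None] - c[None], axis=2).argmin(axis=1)
    return ((X - c[lab]) ** 2).sum()

rng = np.random.default_rng(1)
# Three well-separated blobs, so the elbow should appear at k = 3.
X = np.vstack([rng.normal(m, 0.2, size=(30, 2)) for m in (0, 5, 10)])
inertias = {k: kmeans_inertia(X, k) for k in (1, 2, 3, 4, 5)}
# Inertia drops sharply up to k = 3 (the true number of blobs),
# then levels off: that bend is the "elbow".
```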
## Key Takeaways
- Unsupervised learning is subjective and exploratory
- PCA: Reduces dimensions while preserving variance
- K-means: Iterative clustering with predetermined K
- Both methods require parameter tuning (m for PCA, k for K-means)