Principal Component Analysis (PCA) to compress computer graphics data

Principal Component Analysis (PCA) is used to compress precomputed radiance transfer (PRT) data. In this article, I explain how PCA can be applied to computer graphics data, following the slides on CPCA from SIGGRAPH 2003.


First, consider a 2D surface with 2D normals.
There are 14 surface points, so we need 14 * 2 = 28 floats to store the normals.


Next, these normals can be mapped to the unit circle.
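As a small illustration of this mapping, the sketch below normalizes a few 2D normals so that they lie on the unit circle. The input values are made up for the example and are not taken from the slides.

```python
import numpy as np

# Hypothetical, unnormalized 2D normals (illustrative values only).
raw = np.array([[3.0, 4.0], [0.0, 2.0], [-1.0, 1.0]])

# Dividing each normal by its length places it on the unit circle.
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)

# A point on the unit circle is fully described by its angle.
theta = np.arctan2(unit[:, 1], unit[:, 0])
print(unit)
print(np.rad2deg(theta))
```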


We can then cluster the normals into 3 groups. This clustering is called VQ ( vector quantization http://d.hatena.ne.jp/hanecci/20121223 ).


If we use each cluster's mean as its representative normal, the data shrinks from (14 * 2) to (3 * 2) floats, plus a small cluster index per surface point.
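The VQ step can be sketched with a plain k-means loop. This is only a minimal sketch: the 14 normals are hypothetical values chosen to form three angular groups, and `make_normals` and `vector_quantize` are helper names invented for this example, not something from the slides.

```python
import numpy as np

def make_normals():
    # Hypothetical data: 14 unit normals in three angular groups (6 + 6 + 2).
    angles = np.deg2rad([20, 25, 30, 35, 40, 45,
                         80, 85, 90, 95, 100, 105,
                         150, 155])
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

def nearest_center(points, centers):
    # Index of the closest center for every point.
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def vector_quantize(points, k, iters=20):
    # Plain k-means (Lloyd's algorithm), started from evenly spaced samples.
    centers = points[np.linspace(0, len(points) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = nearest_center(points, centers)
        for c in range(k):
            if np.any(labels == c):
                # Move the center to the mean of its members.
                centers[c] = points[labels == c].mean(axis=0)
    return centers, nearest_center(points, centers)

normals = make_normals()
codebook, labels = vector_quantize(normals, k=3)

# Every normal is replaced by the mean of its cluster.
approx = codebook[labels]
vq_error = np.sum((normals - approx) ** 2)
print("stored floats:", codebook.size, "squared error:", vq_error)
```

Only the 3 * 2 codebook floats and one cluster index per point need to be kept; all normals within a cluster collapse to the same representative, which is where the approximation error comes from.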


Alternatively, we can compress the data with PCA, which expresses the data as a lower-dimensional linear approximation.
For this sample, we first compute the mean of the normals (2 parameters) and a 1D PCA axis (2 parameters). Then we compute a scalar weight for each surface point (14 parameters). So the data shrinks from 14 * 2 = 28 floats to 2 + 2 + 1 * 14 = 18 floats. To compute the scalar weight of a surface point, we subtract the mean from its normal and project the result onto the 1D PCA axis.
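The mean, axis, and weights described above can be computed with an SVD of the centered data. The following is a minimal sketch under the assumption that the 14 normals are the hypothetical values below; using `np.linalg.svd` to obtain the PCA axis is one common choice, not necessarily what the slides use.

```python
import numpy as np

# Hypothetical data: 14 unit normals (illustrative values only).
angles = np.deg2rad([20, 25, 30, 35, 40, 45,
                     80, 85, 90, 95, 100, 105,
                     150, 155])
normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (14, 2)

mean = normals.mean(axis=0)        # 2 floats
centered = normals - mean          # subtract the mean from every normal

# Rows of vt are the principal directions, ordered by decreasing singular value.
_, s, vt = np.linalg.svd(centered, full_matrices=False)
axis = vt[0]                       # 2 floats, unit length

# Project each centered normal onto the 1D PCA axis.
weights = centered @ axis          # 14 floats

# Reconstruction: mean + weight * axis for every surface point.
reconstructed = mean + np.outer(weights, axis)
pca_error = np.sum((normals - reconstructed) ** 2)

stored = mean.size + axis.size + weights.size
print("stored floats:", stored, "squared error:", pca_error)
```

The squared reconstruction error equals the squared singular value of the discarded direction, so it is large whenever the normals do not lie close to a single line.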


But the approximation error is large for both VQ and PCA on their own. So we first apply VQ and then compute PCA within each cluster. This method is called CPCA ( Clustered PCA ). In this example, with clusters of 6, 6, and 2 points, CPCA stores ( 2 + 2 + 1 * 6 ) + ( 2 + 2 + 1 * 6 ) + ( 2 + 2 + 1 * 2 ) = 26 floats instead of (14 * 2) = 28.
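CPCA can be sketched by running a 1D PCA separately inside each cluster. To keep the example short, the cluster labels are hard-coded to the 6 / 6 / 2 split instead of being computed by VQ; the data values and the helper name `fit_pca_1d` are assumptions made for this illustration.

```python
import numpy as np

def fit_pca_1d(points):
    # Mean, first principal axis, and per-point scalar weights.
    mean = points.mean(axis=0)
    centered = points - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    weights = centered @ axis
    return mean, axis, weights

# Hypothetical data: 14 unit normals in three angular groups.
angles = np.deg2rad([20, 25, 30, 35, 40, 45,
                     80, 85, 90, 95, 100, 105,
                     150, 155])
normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Assumed VQ result: clusters of 6, 6, and 2 points.
labels = np.array([0] * 6 + [1] * 6 + [2] * 2)

stored = 0
cpca_error = 0.0
for c in np.unique(labels):
    members = normals[labels == c]
    mean, axis, weights = fit_pca_1d(members)
    approx = mean + np.outer(weights, axis)
    # Per cluster: 2 (mean) + 2 (axis) + 1 weight per member.
    stored += mean.size + axis.size + weights.size
    cpca_error += np.sum((members - approx) ** 2)

print("stored floats:", stored, "squared error:", cpca_error)
```

Within a cluster the normals cover only a short arc of the circle, so a single line fits them closely and the per-cluster residual stays small.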



This example is 2D and only serves to explain VQ, PCA, and CPCA. CPCA works well when we handle high-dimensional vectors. ( http://d.hatena.ne.jp/hanecci/20130102/p2 )