Sparse PCA
Mar 21st 2005Wallabynerd alert
My advisor emailed me late last night about a statistics lecture one of our brand-new collaborators was giving today. He had to leave early today, so guess who got to go instead?
It was actually a lot better than I’d feared — it was indeed a non-technical survey of “non-linear dimensionality reduction by manifold embedding”, which meant a little less than one equation per slide. And parts of it were really very relevant to what we’re doing, so it was good background.
One thing she talked about was “sparse”
The problem with “classical” PCA is that first, it’s hard to know how many components should be used. A particular program might give us a list of 50 different components, most of which describe variation we just don’t care about — i.e., noise. But should we take the 3 biggest components? or 4? or 5? Secondly, PCA requires a *linear* relationship between the different components… that is, C1 + C2 + C3 = Data. What if, instead, it’s C1 + C2 + C3^2 = Data? PCA will fail, and we won’t necessarily notice that it’s failed. Third, PCA requires a pretty good-sized chunk of data… if there’s 10 different ways the data could vary (i.e., 10 dimensions), then you really need to measure the effects of varying all of them. Finally, for large data sets it can take a *lot* of computer time to calculate the principal components.
“Sparse” PCA apparently improves on at least two of these: It handles sparse data sets (n < < p) by selecting a subset of the data (or of the sample space, rather) which *is* nicely sampled. And, since it only needs to work with this smaller subset of the data, the calculations get a lot simpler. How? By first representing the data in some other form, using a Fourier transform or wavelets. The interesting (high-signal) parts of the transformed data then get PCA’d (yes, it’s a verb — “principal-component analyzed”, maybe?), the first few components are used to reconstruct the transformed data, and then the reconstruction is reverse-transformed back to a nice clean result. Their algorithm also throws in the bonus prize that there’s a handy objective standard for picking the appropriate number of components (dimensions), rather than guessing.
The rest of the talk was mostly about how to handle cases where the data aren’t regularly spaced, or data sets in which there are non-linear relationships. I don’t think that’s particularly applicable to our stuff, although the postdoc I dragged along seemed to disagree — I think he picked out “non-linear” as matching the fact that there’s a non-linear relationship between the intensity of light absorbed by a sample and the concentration of the absorbing species. But that’d be the actual data at each point, surely, rather than, say, a non-linear dependence on the sampling depth into a specimen? Although I guess he’s right, it *is* non-linear in the sense she was talking about. Of course, it’s nicely logarithmic, so it’s pretty easy to transform it to linear data.
Anyway. Now I actually want to try some of this… it’s a very rare “theory” talk that makes me do that. ![]()
One Response to “Sparse PCA”
Wow! This is terrrific stuff! Sounds wonderfull! I’d like to hear more as it developes. I did just a tiny little bit, many, many years ago at Lockeed Georgia Company, in Marietta, Ga. This was the summer that your mother and I were first married… I was tasked with converting a fortran (there simply wasn’t anything else, in those days, and it served us well) program for the IBM 7094, a transistorized(!) computer that I had done systems programming in assembler on the summer before, just after graduation and commission ceremonies. I was to put the program onto their brand-new IBM 360-95, a timesharing computer, using integrated circuits(!)
and this program was a linear or nor-linear modelling least-squares regression analysis.
Well, I did it, then I got to help engineers learn how to use it… More anon.