Principal Component Analysis (PCA) aims to represent high-dimensional data in fewer dimensions while keeping as much of the original information as possible. In this demo, the data points sit in 2D space but are themselves 1-dimensional, since all the points are collinear. Thus, we can describe these data completely with a single vector pointing along that line, reducing the dimensionality of the data without losing any information.

As you click and drag the data points around, you will see two perpendicular line segments representing the directions of maximum and minimum variance in the data. The length of each segment grows as more of the variance falls along that direction. Any time the data points look like a line, the largest component aligns with it.
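The two segments in the demo are the eigenvectors of the data's covariance matrix, scaled by how much variance lies along each. A minimal sketch of that computation (the collinear points along direction (2, 1) are my own made-up example, not the demo's data):

```python
import numpy as np

# Hypothetical collinear 2D data: points spaced along the direction (2, 1)
t = np.linspace(-3, 3, 20)
points = np.column_stack([2 * t, t])

# PCA via eigendecomposition of the covariance matrix
cov = np.cov(points, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues in ascending order

# The eigenvector paired with the largest eigenvalue is the direction
# of maximum variance; the other is the direction of minimum variance.
principal = eigenvectors[:, -1]   # aligns with (2, 1), up to sign
smallest = eigenvalues[0]         # ~0: no variance perpendicular to the line
```

Because the points sit exactly on a line, the smallest eigenvalue is (numerically) zero, which is the demo's observation that one vector captures everything.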

I've heard that a good rule of thumb is to discard the smallest component—and keep the principal component—if the length of the longest component is at least 99% of the summed length of both components. This way we don't lose too much information in the process. It is unlikely that this ratio will be exactly 100%, as it was at the start of this demo (where the data sat perfectly on a line), since some noise is inherent in most, if not all, practical data.
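That rule of thumb is easy to check numerically. A sketch, assuming "length" means the standard deviation along each component (the square root of each eigenvalue), and using made-up noisy near-linear data:

```python
import numpy as np

# Hypothetical near-linear data: points along (2, 1) plus small Gaussian noise
rng = np.random.default_rng(0)
t = rng.normal(size=200)
points = np.column_stack([2 * t, t]) + rng.normal(scale=0.05, size=(200, 2))

cov = np.cov(points, rowvar=False)
eigenvalues = np.linalg.eigvalsh(cov)   # ascending order
lengths = np.sqrt(eigenvalues)          # std dev along each component

# The 99% rule of thumb: keep only the principal component if its
# length dominates the summed length of both components.
ratio = lengths[-1] / lengths.sum()
keep_only_principal = ratio >= 0.99
```

With noise this small the ratio lands very close to, but not exactly at, 100%, which matches the intuition above.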

Notice that when we move a data point so far that it becomes an extreme outlier (when we have "gross sparse noise"), PCA behaves poorly. To address this problem, we can turn to Robust Principal Component Analysis (RPCA), which tries to decompose the data into the sum of a low-rank matrix and a sparse noise matrix.
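You can see PCA's sensitivity to gross sparse noise directly: corrupting a single point swings the principal direction away from the line the rest of the data sits on. A small sketch of that effect (the data and the corrupted point are my own illustrative choices; this demonstrates the failure, not the RPCA fix itself):

```python
import numpy as np

def principal_direction(points):
    """Unit eigenvector of the covariance matrix with the largest eigenvalue."""
    cov = np.cov(points, rowvar=False)
    _, vecs = np.linalg.eigh(cov)
    return vecs[:, -1]

# Collinear points along the direction (2, 1)
t = np.linspace(-3, 3, 20)
clean = np.column_stack([2 * t, t])

# Corrupt one point: move it far off the line ("gross sparse noise")
corrupted = clean.copy()
corrupted[0] = [0.0, 50.0]

d_clean = principal_direction(clean)       # aligns with (2, 1)
d_bad = principal_direction(corrupted)     # dragged toward the outlier
```

RPCA would instead split `corrupted` into a low-rank part (the line) plus a sparse part (the single outlier), recovering the clean direction.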

EDIT — Here is a great visual explanation of PCA: http://setosa.io/ev/principal-component-analysis/