I used Python to generate a correlated data set for testing and then plotted a basic linear least-squares fit. The result looked a bit strange to me, because the line doesn't really seem to pass "centrally" through the data. It looks a little "tilted":

[Figure: the generated data with the least-squares fit line]

So, instead, I diagonalized the covariance matrix to obtain the eigenvector that gives the direction of maximum variance. This is shown by the black arrow in the figure below, and it does point in the direction I would expect:

[Figure: the same data with the maximum-variance eigenvector drawn as a black arrow]

I am looking for an intuitive explanation of what is going on here. I know the two are not measuring the same thing: the fit minimizes the sum of the squared "errors", i.e. the vertical distances between the measured values and the fitted model, whereas the eigenvector is chosen to maximize variance. Still, I am surprised by the result. I would have expected the fit to go through the center of the data cluster rather than show this bias.

Does minimizing the vertical distances somehow break the symmetry of the problem? Is it inappropriate to apply a basic linear fit to such data?
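For reference, these are the standard closed-form slopes of the lines being compared, written in terms of the sample covariance entries $s_{xx}$, $s_{yy}$, $s_{xy}$ (notation introduced here for illustration, not from the original post). The numbers on the right use the population covariance implied by `dependency = [[30, 30], [30, 2]]`, namely $s_{xx} = 1800$, $s_{yy} = 904$, $s_{xy} = 960$, so they are approximate targets for a 20000-point sample rather than measured results.

$$
\begin{aligned}
\hat\beta_{\text{OLS}} &= \frac{s_{xy}}{s_{xx}} = \frac{960}{1800} \approx 0.533
  &&\text{(regress } y \text{ on } x\text{: vertical distances)}\\
\beta_{\text{major}} &= \frac{s_{yy} - s_{xx} + \sqrt{(s_{yy}-s_{xx})^2 + 4 s_{xy}^2}}{2 s_{xy}} \approx 0.637
  &&\text{(top eigenvector: perpendicular distances)}\\
\beta_{x \sim y} &= \frac{s_{yy}}{s_{xy}} = \frac{904}{960} \approx 0.942
  &&\text{(regress } x \text{ on } y\text{: horizontal distances)}
\end{aligned}
$$

For $s_{xy} > 0$ the major-axis slope always lies between the two regression slopes. OLS treats $x$ as exact and penalizes only vertical misfit, so the part of $y$ that is uncorrelated with $x$ pulls the slope toward zero (regression toward the mean); for jointly Gaussian data like this, the OLS line estimates $E[y \mid x]$ rather than the long axis of the ellipse. With an intercept it still passes through the centroid $(\bar x, \bar y)$, so only the slope differs. Swapping the roles of $x$ and $y$ gives a third line, which is exactly the asymmetry that choosing a response variable introduces.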
Here is the code:

```python
import numpy as np
import matplotlib.pyplot as plt

def get_correlated_dataset(n, dependency, mu, scale):
    latent = np.random.randn(n, 2)
    dependent = latent.dot(dependency)
    scaled = dependent * scale
    scaled_with_offset = scaled + mu
    return scaled_with_offset[:, 0], scaled_with_offset[:, 1]

# Generate Data
dependency = [[30, 30], [30, 2]]
mu = [150, -100]
scale = 1
x, y = get_correlated_dataset(20000, dependency, mu, scale)

# Calculate direction of maximum variance (eigenvector of covariance matrix)
covariance_matrix = np.cov(np.stack((x, y), axis=0))
EIGVALS, EIGVECS = np.linalg.eig(covariance_matrix)
vec = EIGVECS[:, np.argmax(EIGVALS)]  # Get the eigenvector corresponding to largest eigenvalue

# Magnitude of vector, 3 standard deviations, and mean (for plotting arrow)
vec_norm = np.sqrt(vec[0]**2 + vec[1]**2)
sigma3 = np.sqrt(EIGV...
```
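A minimal sketch for checking the difference numerically, assuming NumPy only (no plotting). It reuses the question's generator but passes an explicit `np.random.default_rng` so runs are reproducible; `compare_lines` and the decile binning are hypothetical helpers added for illustration. It computes the OLS slope with `np.polyfit`, the major-axis slope with `np.linalg.eigh` (intended for symmetric matrices such as a covariance matrix), and the slope of the reverse regression, then compares binned means of `y` against the OLS line and the major axis.

```python
import numpy as np

def get_correlated_dataset(n, dependency, mu, scale, rng):
    # Same construction as in the question; the rng argument is an addition
    # so that results are reproducible.
    latent = rng.standard_normal((n, 2))
    dependent = latent @ np.asarray(dependency, dtype=float)
    scaled_with_offset = dependent * scale + np.asarray(mu, dtype=float)
    return scaled_with_offset[:, 0], scaled_with_offset[:, 1]

def compare_lines(x, y):
    """Hypothetical helper: slopes of three different 'best' lines."""
    cov = np.cov(x, y)  # 2x2 sample covariance matrix
    s_xy, s_yy = cov[0, 1], cov[1, 1]

    # 1) OLS of y on x: minimizes squared vertical distances.
    ols_slope, ols_intercept = np.polyfit(x, y, 1)

    # 2) Major axis (first principal component): eigh returns eigenvalues in
    #    ascending order, so the last column is the max-variance direction.
    #    The eigenvector's sign is arbitrary, but the ratio v[1]/v[0] is not.
    _, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]
    pca_slope = v[1] / v[0]

    # 3) OLS of x on y, re-expressed as a slope in the (x, y) plane:
    #    minimizes squared horizontal distances.
    reverse_slope = s_yy / s_xy

    return ols_slope, ols_intercept, pca_slope, reverse_slope

rng = np.random.default_rng(0)
x, y = get_correlated_dataset(20000, [[30, 30], [30, 2]], [150, -100], 1, rng)
ols_slope, ols_intercept, pca_slope, reverse_slope = compare_lines(x, y)

print(f"OLS y~x slope:    {ols_slope:.3f}")
print(f"major-axis slope: {pca_slope:.3f}")
print(f"OLS x~y slope:    {reverse_slope:.3f}")

# The OLS line (with intercept) passes through the centroid of the data.
print("OLS residual at centroid:", ols_intercept + ols_slope * x.mean() - y.mean())

# Binned conditional means of y, using deciles of x as bins.
edges = np.quantile(x, np.linspace(0, 1, 11))
idx = np.clip(np.digitize(x, edges[1:-1]), 0, 9)
x_bin = np.array([x[idx == k].mean() for k in range(10)])
y_bin = np.array([y[idx == k].mean() for k in range(10)])

# Both lines pass through the centroid; they differ only in slope.
ols_line = ols_intercept + ols_slope * x_bin
pca_line = y.mean() + pca_slope * (x_bin - x.mean())
print("mean |binned y - OLS line|:  ", np.abs(y_bin - ols_line).mean())
print("mean |binned y - major axis|:", np.abs(y_bin - pca_line).mean())
```

For this `dependency` matrix the OLS slope should come out near 0.53 and the major-axis slope near 0.64, and the binned means should sit much closer to the OLS line than to the major axis: the least-squares fit is tracking the average `y` at each `x`, which is a different target from the direction of maximum variance.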