Singular value decomposition (SVD) is based on eigenvalue computation: it generalizes the eigendecomposition of a square matrix A to any matrix M of dimension m×n. When we decompose a rank-r matrix M with the SVD, we write it as M = UDV^T, where D is a diagonal matrix (all values are 0 except the diagonal) and need not be square. This section works out the relationship between eigendecomposition and singular value decomposition.

First, some background. A set of vectors spans a space if every other vector in the space can be written as a linear combination of the spanning set; equivalently, the span of a set of vectors is the set of all points obtainable by linear combination of the original vectors. The transpose of a row vector is a column vector with the same elements, and vice versa; in NumPy you can use the transpose() method to calculate the transpose. For rectangular matrices, some interesting relationships hold as well. Using the transpose and the dot product, we can define the length (also called the 2-norm) of a vector u as $\|u\| = \sqrt{u^T u}$. To normalize a vector u, we simply divide it by its length to obtain $n = u / \|u\|$; the normalized vector n is still in the same direction as u, but its length is 1.

To build geometric intuition, consider a 2×2 matrix. If A is a rotation matrix, then y = Ax is the vector which results after rotation of x by θ, and if B is a stretching matrix, Bx is the result of stretching x in the x-direction by a constant factor k; similarly, we can have a stretching matrix in the y-direction. Listing 1 shows how these matrices can be applied to a vector x and visualized in Python.

For a symmetric matrix, eigendecomposition is possible, A = QΛQ^{-1}, and the inverse follows directly from it:

$$A^{-1} = (Q \Lambda Q^{-1})^{-1} = Q \Lambda^{-1} Q^{-1}.$$

In the SVD, each left singular vector ui gives a direction along which the matrix stretches a vector. For the example matrix A mentioned before, we can calculate its projection matrices; its singular values are σ1 = 11.97, σ2 = 5.57, σ3 = 3.25, and the rank of A is 3. NumPy has a function called svd() which can do the same computation for us. Keeping only the first k singular values gives an approximation Ak of A; the smaller the distance between Ak and A, the better Ak approximates A, and the truncated factors still multiply to give a matrix of the same shape as A, which is the approximation. The reconstruction keeps the overall pattern, but the actual values of its elements are a little lower now. For example, to reconstruct the image using the first 30 singular values we only need to keep the first 30 σi, ui, and vi, which means storing 30(1+480+423) = 27120 values.

This is the idea behind dimensionality reduction: we convert the data points to a lower-dimensional version such that, if l is less than n, it requires less space for storage. The decoding is a simple matrix multiplication, and we will find the encoding function from the decoding function. In PCA terms, Z1 is the linear combination of X = (X1, X2, ..., X_m) in the m-dimensional space whose first component has the largest variance possible. What exactly is a principal component, and what is an empirical orthogonal function (EOF)? In the climate example used later, the first SVD mode (SVD1) explains 81.6% of the total covariance between the two fields, and the second and third SVD modes explain only 7.1% and 3.2%. In the toy example, since A^T A is a symmetric matrix and has two non-zero eigenvalues, its rank is 2. Finally, we load the dataset used in the face examples: the fetch_olivetti_faces() function has already been imported in Listing 1.
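To make the truncation step concrete, here is a minimal sketch, not one of the article's original listings: the matrix A and the choice k = 2 are made up purely for illustration.

import numpy as np

# A small made-up matrix standing in for the examples above.
A = np.array([[3.0, 1.0, 2.0],
              [1.0, 4.0, 0.0],
              [2.0, 0.0, 5.0],
              [0.0, 2.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # s holds the singular values in decreasing order

k = 2                                              # keep the first k singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # rank-k approximation of A

# The Frobenius norm of the difference measures how well A_k approximates A.
print("singular values:", s)
print("approximation error:", np.linalg.norm(A - A_k, "fro"))

Storing U[:, :k], s[:k], and Vt[:k, :] instead of A is exactly the storage saving discussed above.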
Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are important matrix factorization techniques with many applications in machine learning and other fields. Eigendecomposition is only defined for square matrices, and the concept is very important in fields such as computer vision and machine learning that use dimension-reduction methods like PCA. In the eigendecomposition equation, each of the eigenvectors ui is normalized, so they are unit vectors, and the columns of the eigenvector matrix are the corresponding eigenvectors in the same order as the eigenvalues. It is important to note that these eigenvalues are not necessarily different from each other, and some of them can be equal. In NumPy, the eigendecomposition routine returns a tuple: the first element is an array that stores the eigenvalues, and the second element is a 2-d array that stores the corresponding eigenvectors.

Let us look at the geometry of a 2×2 matrix. The vector Av is the vector v transformed by the matrix A; likewise, y is the transformed vector of x, and the plotted vector is the transformation of v1 by A. For a general (non-symmetric) matrix, the eigenvectors are linearly independent, but they are not orthogonal (refer to Figure 3), and they do not show the correct directions of stretching for this matrix after transformation. For a symmetric matrix, however, the eigenvectors of A are orthogonal, which means each pair of them is perpendicular.

Now let $A = U\Sigma V^T$ be the SVD of A, where \( \mV \in \real^{n \times n} \) is an orthogonal matrix. The vectors Avi are perpendicular to each other, as shown in Figure 15. If A is symmetric, then A = A^T, so AA^T = A^TA = A^2, and this is the key link between the two decompositions: what is the connection between these two approaches, and why perform PCA of the data by means of the SVD of the data? In the PCA setting, let the real-valued data matrix $\mathbf X$ be of $n \times p$ size, where $n$ is the number of samples and $p$ is the number of variables; then $v_i$ is the $i$-th principal component, or PC, and $\lambda_i$, the $i$-th eigenvalue of $S$, is also equal to the variance of the data along the $i$-th PC.

Truncation also explains denoising. If the data has low-rank structure (i.e., we use a cost function to measure the fit between the given data and its approximation) and Gaussian noise is added to it, we find the first singular value which is larger than the largest singular value of the noise matrix, keep all those values, and truncate the rest. The result is a matrix that is only an approximation of the noiseless matrix that we are looking for; in particular, the noise in the first element, which is represented by u2, is not eliminated. If we approximate A using only the first singular value, the rank of Ak will be one, and Ak multiplied by x will be a line (Figure 20, right). In the upcoming learning modules, we will highlight the importance of SVD for processing and analyzing datasets and models — in the face dataset, for example, some people believe that the eyes are the most important feature of your face.
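The following sketch, with made-up matrix values and not taken from the original listings, checks numerically that the right singular vectors of A are eigenvectors of A^T A and that each squared singular value equals the corresponding eigenvalue.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))              # an arbitrary (hypothetical) 5x3 matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
evals, evecs = np.linalg.eigh(A.T @ A)   # eigendecomposition of the symmetric matrix A^T A

# eigh returns eigenvalues in increasing order; reverse to match the singular values.
order = np.argsort(evals)[::-1]
print(np.allclose(s**2, evals[order]))   # True: sigma_i^2 = lambda_i(A^T A)

# Each right singular vector equals an eigenvector of A^T A up to sign.
for i in range(3):
    v_svd, v_eig = Vt[i], evecs[:, order[i]]
    print(np.allclose(v_svd, v_eig) or np.allclose(v_svd, -v_eig))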
The SVD has some interesting algebraic properties and conveys important geometrical and theoretical insights about linear transformations. Note that U and V are square orthogonal matrices, and the two sides of the decomposition remain equal if we multiply both sides by any positive scalar. Geometrically, the transformation can be decomposed into three sub-transformations: 1. rotation, 2. re-scaling, 3. rotation. As mentioned before, an eigenvector simplifies the matrix multiplication into a scalar multiplication: we can clearly observe that the directions of both vectors are the same, and the transformed vector is just a scaled version of the original vector v. We plotted the eigenvectors of A in Figure 3 (the red and green arrows are the basis vectors), and it was mentioned that they do not show the directions of stretching for Ax. An important property of symmetric matrices — and this is not a coincidence — is that an n×n symmetric matrix has n linearly independent and orthogonal eigenvectors, and it has n real eigenvalues corresponding to those eigenvectors. Positive semidefinite matrices additionally guarantee that all eigenvalues are non-negative, and positive definite matrices guarantee that they are strictly positive. For the 2×2 symmetric example, we first calculate its eigenvalues and eigenvectors: as you see, it has two eigenvalues.

The vectors u1, ..., ur span Ax and form a basis for col A, so the number of these vectors is the dimension of col A, i.e., the rank of A. When we pick k vectors from this set, Ak x is written as a linear combination of u1, u2, ..., uk. In some cases it is desirable to ignore irrelevant details to avoid the phenomenon of overfitting, and the truncated SVD does exactly that: using the SVD we can represent the same data using only 15·3 + 25·3 + 3 = 123 units of storage (corresponding to the truncated U, V, and D in the example above). To decide how many singular values to keep in the presence of noise, we can use the ideas from the paper by Gavish and Donoho on optimal hard thresholding for singular values. For the pseudoinverse, V and U come from the SVD, and we make D^+ by transposing D and inverting all of its non-zero diagonal elements. In the autoencoder view of PCA, the decoding function has to be a simple matrix multiplication.

In the previous example, we stored our original image in a matrix and then used SVD to decompose it; first look at the ui vectors generated by svd(). You can check that the array s in Listing 22 has 400 elements, so we have 400 non-zero singular values and the rank of the matrix is 400 (this data set contains 400 images). Finally, the ui and vi vectors reported by svd() may have the opposite sign of the ui and vi vectors that were calculated in Listings 10-12. For more on the connection to PCA, check out the post "Relationship between SVD and PCA" — and therein lies the importance of SVD.
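To illustrate that connection, here is a small sketch, under assumed random data (n = 100 samples, p = 4 variables, not from the article), that performs PCA both by eigendecomposition of the covariance matrix and by SVD of the centred data matrix and confirms the two agree.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))                 # hypothetical data matrix
Xc = X - X.mean(axis=0)                       # centre the data (column means become zero)
n = Xc.shape[0]

# Route 1: eigendecomposition of the covariance matrix S = Xc^T Xc / (n-1).
S = Xc.T @ Xc / (n - 1)
evals, evecs = np.linalg.eigh(S)
evals, evecs = evals[::-1], evecs[:, ::-1]    # sort in decreasing order

# Route 2: SVD of the centred data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

print(np.allclose(evals, s**2 / (n - 1)))     # lambda_i = sigma_i^2 / (n-1)
# Rows of Vt (columns of V) are the principal directions, up to sign.
print(all(np.allclose(Vt[i], evecs[:, i]) or np.allclose(Vt[i], -evecs[:, i])
          for i in range(4)))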
To find the sub-transformations, we compute the SVD. The singular value decomposition is similar to the eigendecomposition, except this time we write A as a product of three matrices in which U and V are orthogonal matrices: SVD is a way to factorize a matrix into singular vectors and singular values, and it can be calculated by calling the svd() function. In SVD, the roles played by \( \mU, \mD, \mV^T \) are similar to those of \( \mQ, \mLambda, \mQ^{-1} \) in eigendecomposition. M is factorized into three matrices U, D, and V, and it can be expanded as a linear combination of orthonormal basis directions (u and v) with coefficients σ. U and V are both orthonormal matrices, which means U^T U = V^T V = I, where I is the identity matrix, and the set {vi} is an orthonormal set. MIT professor Gilbert Strang has a wonderful lecture on the SVD, and he includes an existence proof for the SVD.

For the truncated version, we can choose to keep only the first r columns of U, the first r columns of V, and the r×r sub-matrix of D; i.e., instead of taking all the singular values and their corresponding left and right singular vectors, we only take the r largest singular values and their corresponding vectors. Here σ2 is rather small. Since we will use the same matrix D to decode all the points, we can no longer consider the points in isolation. In the face example, each vector ui will have 4096 elements, and the column means have been subtracted and are now equal to zero. In this figure, I have tried to visualize an n-dimensional vector space. Since the rank of A^TA is 2, all the vectors A^TAx lie on a plane. The outer product ui ui^T is called a projection matrix, and the projected vector is then multiplied by σi. Though the direction of the reconstructed n is almost correct, its magnitude is smaller compared to the vectors in the first category. The image has been reconstructed using the first 2, 4, and 6 singular values; this can be seen in Figure 25.

A few definitions we will need. The transpose has some important properties. If A is an m×p matrix and B is a p×n matrix, the matrix product C = AB (which is an m×n matrix) is defined as $c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}$. For example, the rotation matrix in a 2-d space can be defined as $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$; this matrix rotates a vector about the origin by the angle θ (with counterclockwise rotation for a positive θ). If v is an eigenvector of A, then so is any rescaled vector sv for s ∈ R, s ≠ 0, and the transformed vector is a scaled version (scaled by the eigenvalue λ) of the initial vector v. If we assume that each eigenvector ui is an n×1 column vector, then the transpose of ui is a 1×n row vector. Say the matrix A is a real symmetric matrix (square and known); then it can be decomposed as A = QΛQ^T, where Q is an orthogonal matrix composed of the eigenvectors of A and Λ is a diagonal matrix. Eigendecomposition and SVD can also be used for Principal Component Analysis (PCA), which can be performed via the singular value decomposition of the data matrix X.
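The expansion of M as a combination of orthonormal directions can be checked numerically. This is a minimal sketch with an assumed random matrix, not one of the article's listings: it verifies that Ax equals the sum of the terms σi (vi^T x) ui and that keeping r terms gives a rank-r matrix.

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 3))              # a made-up matrix for illustration
x = rng.normal(size=3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Expand Ax as a linear combination of the orthonormal directions u_i,
# with coefficients sigma_i * (v_i^T x).
Ax = sum(s[i] * (Vt[i] @ x) * U[:, i] for i in range(3))
print(np.allclose(Ax, A @ x))

# Keeping only the first r rank-one terms sigma_i u_i v_i^T is the truncated version.
r = 2
A_r = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(r))
print(A_r.shape, np.linalg.matrix_rank(A_r))   # (4, 3), rank 2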
Now let us consider the following matrix A and apply it to the unit circle. The initial vectors x on the left side form a circle, but the transformation matrix changes this circle and turns it into an ellipse. Next we compute the SVD of A and apply the individual transformations to the unit circle in the order in which they act on a vector: applying V^T first rotates the circle; applying the diagonal matrix D then produces a scaled version of the circle; and applying the last rotation, U, gives the final ellipse. We can now clearly see that this is exactly the same as what we obtained when applying A directly to the unit circle — that is the role of U and V, both orthogonal matrices (a matrix whose columns form an orthonormal set is called an orthogonal matrix). The general effect of a matrix A on the vectors x is therefore a combination of rotation and stretching, and the singular value σi scales the length of the vector along ui. This factorization is related to the polar decomposition; however, explaining that is beyond the scope of this article.

Some supporting facts. In an n-dimensional space, to find the coordinate of x along ui, we draw a hyper-plane passing through x and parallel to all the other basis vectors except ui and see where it intersects the ui axis; the matrix whose columns are the basis vectors is called the change-of-coordinate matrix. We need a symmetric matrix to express x as a linear combination of the eigenvectors as in the equation above; we showed that A^T A is a symmetric matrix, so it has n real eigenvalues and n linearly independent and orthogonal eigenvectors, which can form a basis for the n-element vectors that it can transform (in the R^n space). The number of non-zero (positive) singular values of a matrix is equal to its rank, so since the rank of A^T A in the example is 2, all the vectors A^T Ax lie on a plane. The matrix X^T X is called the covariance matrix when we centre the data around 0. In many contexts, the squared L² norm may be undesirable because it increases very slowly near the origin.

For the pseudoinverse, suppose D is the diagonal matrix of singular values; then D^+ is obtained by taking the reciprocal of each non-zero diagonal element and transposing, and A^+ = VD^+U^T. When the columns of A are linearly independent we get A^+A = I, and in the same way, when its rows are linearly independent, AA^+ = I.

For a symmetric matrix A, applying A to an arbitrary vector x can be analyzed with either decomposition. Hence $A = U\Sigma V^T = W \Lambda W^T$, and

$$A^2 = U \Sigma^2 U^T = V \Sigma^2 V^T = W \Lambda^2 W^T,$$

so you can calculate the singular values of a symmetric matrix directly from its eigenvalues. Among other applications, SVD can be used to perform principal component analysis (PCA), since there is a close relationship between both procedures — can we apply the SVD concept to the data distribution itself? In the reconstruction example, the objective is to lose as little precision as possible, yet in the reconstructed vector the second element (which did not contain noise) now has a lower value compared to the original vector (Figure 36). We can simply use y = Mx to find the corresponding image of each label (x can be any of the vectors ik, and y will be the corresponding fk); when the product is written in block form, the only difference is that each element is now a vector itself and should be transposed too. All the Code Listings in this article are available for download as a Jupyter notebook from GitHub at: https://github.com/reza-bagheri/SVD_article.
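The rotation–scaling–rotation picture can be verified without plotting. Below is a small sketch (the 2×2 matrix values are made up, not the article's example) that applies V^T, then D, then U to points on the unit circle and checks that the result matches applying A directly.

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])               # any 2x2 matrix works; these values are illustrative

U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points on the unit circle (2 x 200)

step1 = Vt @ circle            # first rotation/reflection (V^T)
step2 = np.diag(s) @ step1     # scaling along the axes by the singular values
step3 = U @ step2              # final rotation (U)

print(np.allclose(step3, A @ circle))   # same ellipse as applying A directly

Plotting circle, step1, step2, and step3 (e.g. with matplotlib) reproduces the circle-to-ellipse figures described above.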
But the eigenvectors of a symmetric matrix are orthogonal too. A symmetric matrix is orthogonally diagonalizable; this decomposition comes from a general theorem in linear algebra, and some work does have to be done to motivate its relation to PCA. You can see in Chapter 9 of Essential Math for Data Science that you can use eigendecomposition to diagonalize a matrix (make the matrix diagonal). Here, the columns of \( \mU \) are known as the left-singular vectors of matrix \( \mA \). Why is SVD useful? In Strang's treatment, first come the dimensions of the four subspaces (Figure 7.3). As a separate notion, a matrix is called singular if and only if it has a determinant of 0.

If the set of vectors B = {v1, v2, v3, ..., vn} forms a basis for a vector space, then every vector x in that space can be uniquely specified using those basis vectors, and the coordinate of x relative to this basis B is the vector of coefficients in that combination; in fact, when we write a vector in R^n, we are already expressing its coordinates relative to the standard basis. If B is a p×n matrix, each row vector bi^T is the i-th row of B; again, the first subscript refers to the row number and the second subscript to the column number, and the transpose of a product is the product of the transposes in the reverse order. So A^T A is equal to its transpose, and it is a symmetric matrix. Now, if we use ui as a basis, we can decompose a vector n and find its orthogonal projection onto ui; for the rank-one term built from ui, the corresponding eigenvalue of ui is λi (the same as for A), but all the other eigenvalues are zero.

To understand how the image information is stored in each of these matrices, we can study a much simpler image. In Figure 24, the first 2 matrices can capture almost all the information about the left rectangle in the original image, and if we use a lower rank like 20 we can significantly reduce the noise in the image. As an example, suppose that we want to calculate the SVD of a matrix; this can be seen in Figure 32.

What is the relationship between SVD and PCA? Principal component analysis (PCA) is usually explained via an eigendecomposition of the covariance matrix; another approach to the PCA problem, resulting in the same projection directions wi and feature vectors, uses the singular value decomposition (SVD, [Golub1970, Klema1980, Wall2003]) for the calculations. Let X be the centred data matrix whose rows are $x_i^T - \mu^T$. PCA needs the data normalized, ideally in the same units; Listing 2 shows how this can be done in Python (this subtraction of the column means is also called broadcasting). Instead of fitting one point at a time, we must minimize the Frobenius norm of the matrix of errors computed over all dimensions and all points, and we will start by finding only the first principal component (PC). In real-world data we don't obtain plots as clean as the ones above, and it is hard to solve PCA with the correlation matrix of a dataset without relating it to its singular value decomposition. Or, in other words, how do we use the SVD of the data matrix to perform dimensionality reduction? The sketch below makes the symmetric-matrix case concrete.
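Here is a minimal sketch with a made-up symmetric matrix (not from the article's listings) showing that its eigenvectors are orthonormal and that they diagonalize it, i.e. B = QΛQ^T.

import numpy as np

B = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # a made-up symmetric matrix

lam, Q = np.linalg.eigh(B)               # real eigenvalues, orthonormal eigenvectors

print(np.allclose(Q.T @ Q, np.eye(3)))            # eigenvectors are orthonormal
print(np.allclose(Q @ np.diag(lam) @ Q.T, B))     # B = Q Lambda Q^T (orthogonal diagonalization)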
So to find each coordinate ai, we just need to draw a line perpendicular to an axis of ui through point x and see where it intersects it (refer to Figure 8). As Figure 8 (left) shows, when the eigenvectors are orthogonal (like i and j in R²), we just need to draw a line that passes through point x and is perpendicular to the axis whose coordinate we want to find. These axes are called the standard basis for R^n, and for each label k, all the elements are zero except the k-th element. The right-hand-side plot is a simple example of the left equation. To plot the vectors, the quiver() function in matplotlib has been used. (A full video list and slides on this material are available at https://www.kamperh.com/data414/.)

We know that the singular values are the square roots of the eigenvalues of A^T A (σi = √λi), as shown before. The derivation is short: since A = UDV^T,

$$A^T A = (U D V^T)^T (U D V^T) = V D U^T U D V^T = V D^2 V^T,$$

which has exactly the form QΛQ^T of an eigendecomposition. What does this tell you about the relationship between the eigendecomposition and the singular value decomposition? They both split up A into the same r matrices σi ui vi^T of rank one: column times row. Let A be an m×n matrix with rank A = r; then the number of non-zero singular values of A is r, and since they are positive and labeled in decreasing order, we can write them as σ1 ≥ σ2 ≥ ... ≥ σr > 0. So we first make an r×r diagonal matrix with diagonal entries σ1, σ2, ..., σr, and as a result we already have enough vectors ui to form U; we also know what shape D must have in the example. So x is a 3-d column vector, but Ax is not a 3-dimensional vector: x and Ax exist in different vector spaces.

So t is the set of all the vectors in x which have been transformed by A, and Figure 2 shows the plots of x and t and the effect of the transformation on two sample vectors x1 and x2 in x. For a vector like x2 in Figure 2, the effect of multiplying by A is like multiplying it by a scalar quantity λ: if λ is an eigenvalue of A, then there exist non-zero x, y ∈ R^n such that Ax = λx and y^T A = λ y^T. For each of these eigenvectors we can use the definition of length and the rule for the product of transposed matrices, and we assume that the corresponding eigenvalue of vi is λi. But before explaining how the length can be calculated, we need to get familiar with the transpose of a matrix and the dot product. In fact, in Listing 3 the column u[:,i] is the eigenvector corresponding to the eigenvalue lam[i]. Now let me try another matrix: we can plot the eigenvectors on top of the transformed vectors by substituting this new matrix into Listing 5; another example is a matrix whose eigenvectors are not linearly independent. For the rank-one approximation, the result of the transformation is a straight line, not an ellipse.

We want to calculate the stretching directions for a non-symmetric matrix, but how can we define the stretching directions mathematically, and how does it work? If we can find the orthogonal basis and the stretching magnitudes, can we characterize the data? When we deal with a matrix (as a tool for collecting data formed by rows and columns) of high dimension, is there a way to make it easier to understand the data information and find a lower-dimensional representative of it? Note that this derivation is specific to the case of l = 1 and recovers only the first principal component.
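The coordinate-finding step amounts to orthogonal projection onto each basis vector. The sketch below, with an assumed randomly generated orthonormal basis (not the article's example), shows that the coordinates ai = ui^T x reconstruct x exactly.

import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(3, 3))
Q, _ = np.linalg.qr(M)                   # columns of Q form an (arbitrary) orthonormal basis u1, u2, u3

x = np.array([1.0, -2.0, 0.5])           # a made-up vector

a = Q.T @ x                              # coordinate a_i = u_i^T x (orthogonal projection onto u_i)
x_rebuilt = sum(a[i] * Q[:, i] for i in range(3))    # x = sum_i (u_i^T x) u_i

print(np.allclose(x_rebuilt, x))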
Here is an example showing how to calculate the SVD of a matrix in Python; in this example we are going to use the Olivetti faces dataset from the Scikit-learn library, since we can store an image in a matrix. Here I focus on a 3-d space to be able to visualize the concepts. A vector is a quantity which has both magnitude and direction, and we can think of a matrix A as a transformation that acts on a vector x by multiplication to produce a new vector Ax. In the eigenvalue equation Ax = λx, A is a square matrix, x is an eigenvector, and λ is an eigenvalue. So the vector Ax can be written as a linear combination of the eigenvectors: if u1, u2, ..., un are the eigenvectors of A and λ1, λ2, ..., λn are their corresponding eigenvalues (assume, for example, that the eigenvalues λi have been sorted in descending order), then A can be written as $A = \sum_i \lambda_i u_i u_i^T$. Multiplying ui ui^T by x gives the orthogonal projection of x onto ui, so ui ui^T acts as a projection matrix; in the earlier example, it projects all the vectors in x onto the line y = 2x. As mentioned before, this can also be done using the projection matrix: the orthogonal projections of Ax1 onto u1 and u2 are shown in an earlier figure, and by simply adding them together we get Ax1. The trace of a matrix is the sum of its eigenvalues, and it is invariant with respect to a change of basis.

You can find the singular vectors by considering how A as a linear transformation morphs a unit sphere $\mathbb S$ in its domain to an ellipse: the principal semi-axes of the ellipse align with the $u_i$ and the $v_i$ are their preimages, and any dimensions with zero singular values are essentially squashed. One useful example of a matrix norm is the spectral norm, kMk 2, which equals the largest singular value. For a symmetric matrix A, the singular values are the absolute values of the eigenvalues of A. SVD enables us to discover some of the same kind of information as the eigendecomposition reveals; however, the SVD is more generally applicable, and as a consequence it appears in numerous algorithms in machine learning. You can now easily see that A was not symmetric.

In the PCA setting, the variance of the data along a direction (if the data are centered) is simply the average value of $x_i^2$ along that direction. Given $V^T V = I$, we get XV = UΣ; Z1 is the so-called first component of X, corresponding to the largest σ1 since σ1 ≥ σ2 ≥ ... ≥ σp ≥ 0, and u1 is the so-called normalized first principal component:

$$u_i = \frac{1}{\sqrt{(n-1)\lambda_i}} X v_i .$$

In the encoder–decoder view, we want to reduce the distance between x and its reconstruction g(c). One drawback is interpretability: in real-world regression analysis it is hard to say which variables are most important, because each component is a linear combination of the original feature space. In the climate example, the right field is the winter mean SSR over the SEALLH. We can normalize the Avi vectors by dividing them by their lengths; now we have a set {u1, u2, ..., ur} which is an orthonormal basis for Ax, which is r-dimensional. For example, in Figure 26 we have the image of the national monument of Scotland, which has 6 pillars (in the image), and the matrix corresponding to the first singular value can capture the number of pillars in the original image. From here you can move on to other advanced topics in mathematics or machine learning.
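The identities XV = UΣ and $u_i = X v_i / \sqrt{(n-1)\lambda_i}$ can be checked directly. This is a sketch with assumed random centred data (not the Olivetti dataset), verifying that the principal component scores equal UΣ and that u1 is the normalized first principal component.

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))
X = X - X.mean(axis=0)                   # centred data, n = 50 samples
n = X.shape[0]

U, s, Vt = np.linalg.svd(X, full_matrices=False)
lam = s**2 / (n - 1)                     # eigenvalues of the covariance matrix

Z = X @ Vt.T                             # principal component scores: XV = U Sigma
print(np.allclose(Z, U * s))             # columns of U Sigma are the (unnormalized) components

u1 = X @ Vt[0] / np.sqrt((n - 1) * lam[0])   # u_1 = X v_1 / sqrt((n-1) lambda_1)
print(np.allclose(u1, U[:, 0]))              # the normalized first principal component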
The proof is not deep, but it is better covered in a linear algebra course. The eigendecomposition mathematically explains an important property of the symmetric matrices that we saw in the plots before: we call the vectors on the unit circle x (so ||x|| = 1) and plot their transformation by the original matrix, Cx. When A is symmetric, instead of calculating Avi (where vi is an eigenvector of A^T A), we can simply use ui (an eigenvector of A) to obtain the directions of stretching, and this is exactly what we did in the eigendecomposition process. Explicitly, if $w_i$ are the columns of the matrix $W$, then

$$A = W \Lambda W^T = \sum_{i=1}^n w_i \lambda_i w_i^T = \sum_{i=1}^n w_i \left| \lambda_i \right| \text{sign}(\lambda_i) w_i^T ,$$

which (up to reordering) is a singular value decomposition of A. Likewise, for the covariance matrix,

$$S = V \Lambda V^T = \sum_{i = 1}^r \lambda_i v_i v_i^T ,$$

and the maximum of $v^T S v$ over unit vectors v orthogonal to the first k−1 eigenvectors is λk, and this maximum is attained at vk. Similarly, u2 shows the average direction for the second category. In the denoising example, some details might be lost, but what we get is a less noisy approximation of the white background that we expect to have if there is no noise in the image.
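The decomposition above can be checked numerically. This is a minimal sketch with a made-up symmetric matrix that has one negative eigenvalue, so that the sign(λi) factor actually matters; it confirms that the rank-one sum reproduces A and that the singular values are the absolute eigenvalues.

import numpy as np

A = np.array([[2.0,  1.0],
              [1.0, -1.0]])              # made-up symmetric matrix with a negative eigenvalue

lam, W = np.linalg.eigh(A)               # eigendecomposition: A = W diag(lam) W^T
s = np.linalg.svd(A, compute_uv=False)   # singular values of A

# Rebuild A as sum_i |lambda_i| * (sign(lambda_i) w_i) w_i^T -- this is an SVD of A.
A_rebuilt = sum(abs(lam[i]) * np.outer(np.sign(lam[i]) * W[:, i], W[:, i]) for i in range(2))
print(np.allclose(A_rebuilt, A))

# The singular values are therefore the |eigenvalues|, sorted in decreasing order.
print(np.allclose(s, np.sort(np.abs(lam))[::-1]))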