Factor Analysis and Principal Component Method - Must for CBSE NET Psychology
In an experiment there are several variables which affect the dependent variable (the explored variable). Sometimes the independent variables themselves are found to be correlated. In such cases the technique of factor analysis is used to reduce a number of correlated variables to a smaller number of uncorrelated factors. There are two types of factor analysis: confirmatory and exploratory.
Confirmatory and Exploratory Factor Analysis
A confirmatory factor analysis, as the name suggests, is used to confirm a hypothesis. That is, the investigator already has a hypothesis about which factors underlie a set of variables, and factor analysis is used to confirm that those are indeed the factors.
FA and PCM
When there is no guiding hypothesis, exploratory factor analysis is used to find the underlying factors. The factors in factor analysis are meant to represent “real world” constructs such as depression, anxiety, and intelligence. In contrast, in principal components analysis (PCA, referred to in this tutorial also as the principal component method, PCM), components are simply geometrical abstractions that may or may not map to any real world constructs.
Another difference between the two approaches is that, in PCA, all of the observed variance is analyzed, whereas in factor analysis (FA) only the shared variance is analyzed. To elaborate, in mathematical terms, the difference between PCA and FA is found in the values that are put in the diagonal of the correlation matrix. In PCA, 1.00’s are put in the diagonal meaning that all of the variance in the matrix is to be accounted for (including variance unique to each variable, variance common among variables, and error variance). That would, therefore, by definition, include all of the variance in the variables. In contrast, in FA, the communalities are put in the diagonal meaning that only the variance shared with other variables is to be accounted for (excluding variance unique to each variable and error variance). That would, therefore, by definition, include only variance that is common among the variables.
Since only shared variance is used in FA, we can say that PCA analyzes variance and FA analyzes covariance. When a researcher wants to analyze only the shared variance (as in situations where there is a theory drawn from previous research about the relationships among the variables), FA is probably the better choice, because it excludes unique and error variances and shows what is going on in the covariance, or common variance. In exploratory factor analysis (EFA), where the researcher is just exploring without a theory to see what patterns emerge in the data, it makes more sense to perform PCA (and thereby include unique and error variances), to see what patterns emerge in all of the variance. In other words, in this case it is not “safe” to ignore any variance.
Factor analysis assumes that the covariation in the observed variables is due to the presence of one or more latent variables (factors) that exert causal influence on these observed variables. An example of such a causal structure is presented below.
The ovals represent the latent (unmeasured) factors of “satisfaction with supervision” and “satisfaction with pay.” These factors are latent in the sense that they are assumed to actually exist in the employee’s belief systems, but cannot be measured directly. However, they do exert an influence on the employee’s responses to the seven items that constitute the job satisfaction questionnaire described earlier (these seven items are represented as the squares labelled V1-V7 in the figure). It can be seen that the “supervision” factor exerts influence on items V1-V4 (the supervision questions), while the “pay” factor exerts influence on items V5-V7 (the pay items). Researchers use factor analysis when they believe that certain latent factors exist that exert causal influence on the observed variables they are studying. Exploratory factor analysis helps the researcher identify the number and nature of these latent factors. In contrast, principal component analysis makes no assumption about an underlying causal model. Principal component analysis is simply a variable reduction procedure that (typically) results in a relatively small number of components that account for most of the variance in a set of observed variables.
Variance of Original Data in PCM or FA
Observed variables are standardized in the course of the analysis. This means that each variable is transformed so that it has a mean of zero and a variance of one. In PCM, since no variance is ignored, the “total variance” in the data set is simply the sum of the variances of these observed variables. Because each standardized variable contributes one unit of variance, the total variance equals the number of observed variables.
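The standardization step can be sketched in a few lines of Python with NumPy. This is only an illustration: the data are randomly generated, and the sizes (200 respondents, 7 items) and variable names are assumptions, not values from any real study.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 respondents answering 7 questionnaire items.
X = rng.normal(loc=5.0, scale=2.0, size=(200, 7))

# Standardize each variable: subtract its mean, divide by its standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

variances = Z.var(axis=0, ddof=1)   # one variance per standardized variable
total_variance = variances.sum()    # sum of the variances of all variables

print(np.round(variances, 6))
print(round(float(total_variance), 6))
```

Each standardized variable has a variance of one, so the total variance comes out equal to the number of variables.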
Factor Analysis and PCM at High Level
Here is a flowchart describing the process for doing factor analysis or PCM. Note that the choice between PCM and factor analysis depends heavily on the type of research.
Mechanics of PCM
Principal components analysis is a procedure for identifying a smaller number of uncorrelated variables, called “principal components”, from a large set of data. The goal of principal components analysis is to explain the maximum amount of variance with the fewest number of principal components.
Technically, a principal component can be defined as a linear combination of optimally-weighted observed variables such that it explains the maximum variance in the variables. The words “linear combination” refer to the fact that scores on a component are created by adding together scores on the observed variables being analyzed. “Optimally weighted” refers to the fact that the observed variables are weighted in such a way that the resulting components account for a maximal amount of variance in the data set. The weights are produced using the eigen-equations of the covariance matrix (for standardized variables, this is the correlation matrix R). PC analyzes and reproduces a version of the R matrix that has 1’s in the diagonal. Each value of 1.00 corresponds to the total variance of one standardized measured variable, and the initial set of p components must have sums of squared correlations for each variable across all components that sum to 1.00. This is interpreted as evidence that a p-component PC model can reproduce all the variances of each standardized measured variable.
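As a rough sketch of these mechanics, the following Python snippet extracts all p components from a correlation matrix with NumPy's eigen-decomposition. The data are simulated and the variable names (`loadings`, `row_ss`, `R_reproduced`) are illustrative assumptions; the point is only to check that the squared correlations of each variable across all p components sum to 1.00 and that R is reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 300 cases, 4 variables that share a common source.
common = rng.normal(size=(300, 1))
X = common + rng.normal(size=(300, 4))

R = np.corrcoef(X, rowvar=False)      # p x p correlation matrix, 1s on the diagonal

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order,
# so we re-sort them from largest to smallest.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings: correlation of each variable (row) with each component (column).
loadings = eigvecs * np.sqrt(eigvals)

row_ss = (loadings ** 2).sum(axis=1)  # per variable, summed over all p components
R_reproduced = loadings @ loadings.T  # the full p-component model reproduces R

print(np.round(row_ss, 6))
print(np.allclose(R_reproduced, R))
```

The columns of `eigvecs` are the optimal weights, and the eigenvalues are the amounts of variance each component accounts for.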
Number of Components in PCM.
Technically, the number of components extracted in a principal component analysis is equal to the number of observed variables being analyzed. However, in most analyses, only the first few components account for meaningful amounts of variance, so only these first few components are retained, interpreted, and used in subsequent analyses (such as in multiple regression analyses). We assume that the remaining components account for only trivial amounts of variance.
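One way to see how quickly the variance is used up is to look at the proportion of total variance each component accounts for. The sketch below does this on simulated data. The 80% cut-off is a purely hypothetical choice made for illustration and is not a rule stated in this tutorial.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: 6 observed variables driven by 2 underlying sources plus noise.
sources = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X = sources @ mixing + 0.5 * rng.normal(size=(500, 6))

R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # one eigenvalue per component

proportion = eigvals / eigvals.sum()   # share of total variance per component
cumulative = np.cumsum(proportion)     # running total across components

threshold = 0.80                       # hypothetical cut-off, an assumption
# Smallest number of components whose cumulative share reaches the threshold.
n_retained = int(np.searchsorted(cumulative, threshold) + 1)

print(len(eigvals), "components extracted")
print(np.round(proportion, 3))
print(n_retained, "components retained")
```

As many components are extracted as there are variables, but only the first `n_retained` would be carried forward under this assumed cut-off.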
How to Interpret the Components of PCM
As shown in the figure above, we have used PCA to explain the data in terms of two components (the green axes). The first component extracted in a principal component analysis accounts for a maximal amount of total variance in the observed variables. Under typical conditions, this means that the first component will be correlated with at least some of the observed variables. It may be correlated with many. The second component extracted will have two important characteristics. First, this component will account for a maximal amount of variance in the data set that was not accounted for by the first component. Second, it will be uncorrelated with the first component. That is, the correlation between components 1 and 2 will be zero. The remaining components that are extracted in the analysis display the same two characteristics: each component accounts for a maximal amount of variance in the observed variables that was not accounted for by the preceding components, and is uncorrelated with all of the preceding components. A principal component analysis proceeds in this fashion, with each new component accounting for progressively smaller and smaller amounts of variance (this is why only the first few components are usually retained and interpreted). When the analysis is complete, the resulting components will display varying degrees of correlation with the observed variables, but are completely uncorrelated with one another. The components that are extracted in the analysis partition the total variance: perhaps the first component will account for 3.2 units of total variance; perhaps the second component will account for 2.1 units. The analysis continues in this way until all of the variance in the data set has been accounted for.
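These properties can be checked numerically. The following sketch, on simulated data with assumed sizes and names, computes component scores and inspects their covariance matrix: the off-diagonal entries (covariances between components) should be zero, and the diagonal entries (variance units per component) should decrease and add up to the number of variables.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: 400 cases, 5 correlated variables.
X = rng.normal(size=(400, 3)) @ rng.normal(size=(3, 5)) + rng.normal(size=(400, 5))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables

R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]                  # largest component first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Z @ eigvecs                       # one column of scores per component
score_cov = np.cov(scores, rowvar=False)   # covariance matrix of the components

print(np.round(score_cov, 3))              # diagonal matrix: components uncorrelated
print(np.round(eigvals, 3), round(float(eigvals.sum()), 3))
```

The diagonal of `score_cov` matches the eigenvalues, which is how the total variance gets partitioned among the components.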
Orthogonal versus Oblique Solutions of PCM
As noted above, principal component analysis most often results in an orthogonal solution. An orthogonal solution is one in which the components remain uncorrelated. However, it is possible to perform a principal component analysis that results in correlated components. Such a solution is called an oblique solution. In some situations, oblique solutions are superior to orthogonal solutions because they produce cleaner, more easily-interpreted results. However, oblique solutions are also somewhat more complicated to interpret, compared to orthogonal solutions.
Process of FA
The most widely used method in factor analysis is principal axis factoring (PAF). PC and PAF are based on slightly different versions of the correlation matrix R (which includes the entire set of correlations among measured X variables). In contrast with PCM, in PAF we replace the 1s in the diagonal of the correlation matrix R with estimates of communality that represent the proportion of variance in each measured X variable that is predictable from or shared with other X variables in the dataset. Many programs use multiple regression to obtain an initial communality estimate for each variable.
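A minimal sketch of how such initial communality estimates might be obtained is shown below. It assumes the estimate for each variable is the R² from regressing that variable on all the others (the squared multiple correlation), which can be read off the inverse of R. The text does not say which estimate a given program uses, so treat this as one common option, on simulated data.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical data: 4 variables influenced by one common factor plus noise.
factor = rng.normal(size=(500, 1))
X = factor @ np.array([[0.8, 0.7, 0.6, 0.5]]) + rng.normal(size=(500, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)

# Squared multiple correlation of each variable with all the others.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# Cross-check for the first variable with an explicit multiple regression.
y, others = Z[:, 0], Z[:, 1:]
beta, *_ = np.linalg.lstsq(others, y, rcond=None)
r2_first = 1.0 - ((y - others @ beta) ** 2).sum() / (y ** 2).sum()

# Reduced correlation matrix: communality estimates replace the 1s on the diagonal.
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)

print(np.round(smc, 3))
print(round(float(r2_first), 3))
```

The off-diagonal correlations are left untouched; only the diagonal changes between the matrix analyzed by PCM and the one analyzed by PAF.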
The correlation matrix represents the inter-correlations between the variables. Our objective with the analysis is to reduce the dimensionality of this matrix so that the variables which correlate very well with each other (i.e. if one increases the other also increases in the same proportion) can be reduced to one underlying variable, called a “factor”. The obtained factors create dimensions that can be visualized as classification axes along which the measurement variables can be plotted. Think about this in the same terms as we think about the components of a point or a line along the x-axis and y-axis in two dimensions. In three dimensions the components are projected along the x, y and z axes. These axes are the factors. Just as a point in a two-dimensional plane is expressed in terms of its values along the x and y axes, we can express the original variables in terms of their projections or values along these factors. This projection of the scores of the original variables on the factors leads to two results: factor scores and factor loadings. Factor scores are “the scores of a subject on a factor”, while factor loadings are the “correlation of the original variable with a factor”. The factor scores can then, for example, be used as new scores in multiple regression analysis, while the factor loadings are especially useful in determining the “substantive importance of a particular variable to a factor”, by squaring the factor loading. This works because a factor loading is, after all, a correlation, and the squared correlation between a variable and a factor gives the proportion of that variable’s variance accounted for by the factor. This is important information in interpreting and naming the factors. Very generally, this is the basic idea of factor analysis.
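To make loadings and scores concrete, here is a small sketch on simulated data. For simplicity it extracts two factors with the principal component method rather than PAF, which is an assumption made only to keep the example short. The item pattern, sample size, and names are hypothetical as well.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical 7-item questionnaire: items 0-3 share one source, items 4-6 another.
f = rng.normal(size=(600, 2))
pattern = np.array([[.8, 0], [.7, 0], [.8, 0], [.6, 0],
                    [0, .8], [0, .7], [0, .8]])
X = f @ pattern.T + 0.6 * rng.normal(size=(600, 7))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
keep = np.argsort(eigvals)[::-1][:2]          # indices of the two largest eigenvalues
eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]

loadings = eigvecs * np.sqrt(eigvals)         # 7 x 2: variable-by-factor correlations
scores = Z @ eigvecs / np.sqrt(eigvals)       # 600 x 2: each subject's score per factor

# Direct check: correlate every variable with every column of scores.
direct = np.array([[np.corrcoef(Z[:, i], scores[:, j])[0, 1] for j in range(2)]
                   for i in range(7)])

squared = loadings ** 2                       # variance in each item explained per factor
communality = squared.sum(axis=1)             # explained by both factors together

print(np.round(loadings, 2))
print(np.round(communality, 2))
```

The `direct` correlations agree with `loadings`, and each squared loading is the share of an item's variance accounted for by that factor. The columns of `scores` could be passed on to a later analysis such as multiple regression.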
In the next tutorial we will look at the math behind PCM and FA and at how to actually interpret the results. We will also look at their applications and at how they have been used traditionally.
- Published on: July 9, 2016