# Factor Analysis and Principal Component Method - Must for CBSE NET Psychology


In an experiment there are several variables that affect the dependent variable (the explained variable). Sometimes the independent variables themselves are found to be correlated. In such cases the technique of factor analysis is used to reduce a number of correlated variables to a smaller number of uncorrelated factors. There are two types of factor analysis:

## Confirmatory and Exploratory Factor Analysis

A confirmatory factor analysis, as the name suggests, is used to confirm a hypothesis. That is, the investigator already has a theory about the factors underlying a set of variables, and factor analysis is used to confirm that those are indeed the factors.

## FA and PCM

When there is no defining hypothesis, exploratory factor analysis is used to discover the underlying factors. The factors in factor analysis are “real-world” variables such as depression, anxiety, and intelligence. In contrast, in principal components analysis (PCA), components are simply geometrical abstractions that may or may not map to any real-world constructs.

Another difference between the two approaches is that in PCA all of the observed variance is analyzed, whereas in factor analysis (FA) only the shared variance is analyzed. To elaborate in mathematical terms, the difference between PCA and FA lies in the values that are put in the diagonal of the correlation matrix. In PCA, 1.00’s are put in the diagonal, meaning that all of the variance in the matrix is to be accounted for (including variance unique to each variable, variance common among variables, and error variance). That would, therefore, by definition, include all of the variance in the variables. In contrast, in FA the communalities are put in the diagonal, meaning that only the variance shared with other variables is to be accounted for (excluding variance unique to each variable and error variance). That would, therefore, by definition, include only variance that is common among the variables.
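This diagonal substitution can be illustrated numerically. The sketch below (Python with NumPy; the data are simulated purely for illustration) builds a correlation matrix, which PCA would analyze as-is with 1.00’s on the diagonal, and then replaces the diagonal with communality estimates (here the squared multiple correlation, SMC, a common initial estimate) as FA does:

```python
import numpy as np

# Hypothetical data: 200 respondents, 4 correlated questionnaire items
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.ones((1, 4)) + rng.normal(size=(200, 4))

R = np.corrcoef(X, rowvar=False)   # correlation matrix: 1.00's on the diagonal
print(np.diag(R))                  # PCA analyzes this matrix as-is

# For FA, replace the diagonal with communality estimates:
# the squared multiple correlation (SMC) of each variable with the others.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_fa = R.copy()
np.fill_diagonal(R_fa, smc)        # only shared variance remains on the diagonal
print(np.diag(R_fa))               # every entry is now below 1.00
```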

Since only shared variance is used in FA, we can say that PCA analyzes variance while FA analyzes covariance. When a researcher wants to analyze only the shared variance (as in situations where there is a theory, drawn from previous research, about the relationships among the variables), FA should probably be used so that unique and error variances are excluded and the covariance, or common variance, can be examined on its own. When the researcher is just exploring, without a theory, to see what patterns emerge in the data, it makes more sense to perform PCA (and thereby include unique and error variances), just to see what patterns emerge in all of the variance. In other words, in this case it is not “safe” to ignore any variance.

Factor analysis assumes that the covariation in the observed variables is due to the presence of one or more latent variables (factors) that exert causal influence on these observed variables. An example of such a causal structure is presented below.

The ovals represent the latent (unmeasured) factors of “satisfaction with supervision” and “satisfaction with pay.” These factors are latent in the sense that they are assumed to actually exist in the employee’s belief systems, but cannot be measured directly. However, they do exert an influence on the employee’s responses to the seven items that constitute the job satisfaction questionnaire described earlier (these seven items are represented as the squares labelled V1-V7 in the figure). It can be seen that the “supervision” factor exerts influence on items V1-V4 (the supervision questions), while the “pay” factor exerts influence on items V5-V7 (the pay items). Researchers use factor analysis when they believe that certain latent factors exist that exert causal influence on the observed variables they are studying. Exploratory factor analysis helps the researcher identify the number and nature of these latent factors. In contrast, principal component analysis makes no assumption about an underlying causal model. Principal component analysis is simply a variable reduction procedure that (typically) results in a relatively small number of components that account for most of the variance in a set of observed variables.

## Variance of Original Data in PCM or FA

Observed variables are standardized in the course of the analysis: each variable is transformed so that it has a mean of zero and a variance of one. In PCM, since no variance is ignored, the “total variance” in the data set is simply the sum of the variances of these observed variables.
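A quick numerical check (Python/NumPy; the raw scores are simulated for illustration): after standardization every variable has variance one, so the total variance equals the number of variables.

```python
import numpy as np

rng = np.random.default_rng(1)
# Three hypothetical variables on very different scales
X = rng.normal(loc=5.0, scale=[1.0, 3.0, 0.5], size=(100, 3))

Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize: mean 0, variance 1
total_variance = Z.var(axis=0).sum()
print(total_variance)                     # equals the number of variables: 3
```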

## Factor Analysis and PCM at High Level

Here is a flowchart describing the process for doing factor analysis or PCM. Note that the choice between PCM and factor analysis is highly dependent on the type of research.

## Mechanics of PCM

Principal components analysis is a procedure for identifying a smaller number of uncorrelated variables, called “principal components”, from a large set of data. The goal of principal components analysis is to explain the maximum amount of variance with the fewest number of principal components.

Technically, a principal component can be defined as a linear combination of optimally weighted observed variables such that it explains the maximum amount of variance in those variables. The words “linear combination” refer to the fact that scores on a component are created by adding together scores on the observed variables being analyzed. “Optimally weighted” refers to the fact that the observed variables are weighted in such a way that the resulting components account for a maximal amount of variance in the data set. The weights are produced using the eigen-equations of the correlation (or covariance) matrix. PCA analyzes and reproduces a version of the R matrix that has 1’s in the diagonal. Each value of 1.00 corresponds to the total variance of one standardized measured variable, and for the initial set of p components, the sums of squared correlations (loadings) for each variable across all components sum to 1.00. This is interpreted as evidence that a p-component PC model can reproduce all the variance of each standardized measured variable.
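The “optimally weighted linear combination” idea can be sketched in NumPy (simulated data, not from the source): the eigenvectors of R supply the weights, component scores are linear combinations of the standardized variables, and each variable’s squared loadings across all p components sum to 1.00, reproducing its full standardized variance.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))              # five hypothetical observed variables
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)

# Eigen-decomposition of R produces the optimal weights (eigenvectors)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Component scores: linear combinations of the observed variables
scores = Z @ eigvecs

# Loadings: correlations between variables and components
loadings = eigvecs * np.sqrt(eigvals)

# Each variable's squared loadings across all p components sum to 1.00
print((loadings ** 2).sum(axis=1))
```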

## Number of Components in PCM

Practically, the number of components extracted in a principal component analysis is equal to the number of observed variables being analyzed. However, in most analyses only the first few components account for meaningful amounts of variance, so only these first few components are retained, interpreted, and used in subsequent analyses (such as multiple regression). The remaining components are assumed to account for only trivial amounts of variance.
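The sketch below (Python/NumPy, simulated data with two built-in factors) shows this in practice: six variables yield six components, but the first two account for most of the variance. The eigenvalue-greater-than-one retention rule used here (the Kaiser criterion) is a common heuristic, not something prescribed by the text above.

```python
import numpy as np

rng = np.random.default_rng(3)
F = rng.normal(size=(500, 2))                    # two hypothetical latent factors
load = np.array([[.8, .7, .6, 0, 0, 0],
                 [0, 0, 0, .8, .7, .6]]).T
X = F @ load.T + 0.4 * rng.normal(size=(500, 6)) # six observed variables

R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # one eigenvalue per variable

explained = eigvals / eigvals.sum()              # proportion of total variance
print(np.round(np.cumsum(explained), 3))         # first two components dominate

n_retained = int((eigvals > 1.0).sum())          # Kaiser criterion: eigenvalue > 1
print(n_retained)
```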

## How to Interpret the Components of PCM

As shown in the figure above, we have used PCA to explain the data in terms of two components (the green axes). The first component extracted in a principal component analysis accounts for a maximal amount of total variance in the observed variables. Under typical conditions, this means that the first component will be correlated with at least some of the observed variables; it may be correlated with many.

The second component extracted will have two important characteristics. First, it will account for a maximal amount of variance in the data set that was not accounted for by the first component. Second, it will be uncorrelated with the first component: the correlation between components 1 and 2 will be zero. The remaining components extracted in the analysis display the same two characteristics: each component accounts for a maximal amount of variance in the observed variables that was not accounted for by the preceding components, and is uncorrelated with all of the preceding components.

A principal component analysis proceeds in this fashion, with each new component accounting for progressively smaller amounts of variance (which is why only the first few components are usually retained and interpreted). When the analysis is complete, the resulting components display varying degrees of correlation with the observed variables but are completely uncorrelated with one another. The components extracted in the analysis partition the total variance: perhaps the first component accounts for 3.2 units of total variance and the second for 2.1 units. The analysis continues in this way until all of the variance in the data set has been accounted for.
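Both properties, that the eigenvalues partition the total variance and that the component scores are completely uncorrelated with one another, can be verified directly (Python/NumPy; the mixing matrix below is invented to create correlated variables):

```python
import numpy as np

rng = np.random.default_rng(6)
B = np.array([[1.0, 0.5, 0.3, 0.2],     # hypothetical mixing matrix that
              [0.0, 1.0, 0.5, 0.3],     # induces correlations among the
              [0.0, 0.0, 1.0, 0.5],     # four observed variables
              [0.0, 0.0, 0.0, 1.0]])
X = rng.normal(size=(300, 4)) @ B
Z = (X - X.mean(axis=0)) / X.std(axis=0)

R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Z @ eigvecs                     # component scores

# The eigenvalues partition the total variance (4 units for 4 variables)
print(np.round(eigvals, 3), eigvals.sum())

# The components are completely uncorrelated with one another
C = np.corrcoef(scores, rowvar=False)
print(np.round(C, 6))                    # identity matrix
```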

## Orthogonal versus Oblique Solutions of PCM

As said above, principal component analysis most often results in an orthogonal solution. An orthogonal solution is one in which the components remain uncorrelated. However, it is possible to perform a principal component analysis that results in correlated components; such a solution is called an oblique solution. In some situations, oblique solutions are superior to orthogonal solutions because they produce cleaner, more easily interpreted results. However, oblique solutions are also somewhat more complicated to interpret than orthogonal solutions.

## Process of FA

The most widely used method in factor analysis is the principal axis factoring (PAF) method. Both PC and PAF are based on slightly different versions of the R correlation matrix (which includes the entire set of correlations among the measured X variables). In contrast with PCM, in PAF we replace the 1’s in the diagonal of the correlation matrix R with estimates of communality, which represent the proportion of variance in each measured X variable that is predictable from, or shared with, the other X variables in the dataset. Many programs use multiple regression to obtain an initial communality estimate for each variable.

## Principal Axis Method

In the principal axis factoring method, we make an initial estimate of the common variance in which the communalities are less than 1. This initial estimate assumes that the communality of each variable is equal to the squared multiple correlation of that variable with the other variables. The principal axis factoring method is implemented by replacing the main diagonal of the correlation matrix (which consists of all ones) with these initial estimates of the communalities. Principal component extraction is then applied to this revised version of the correlation matrix.
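The steps above can be sketched in a few lines of NumPy (the single-factor data are simulated for illustration; real programs typically also iterate the communality estimates, which this minimal sketch omits):

```python
import numpy as np

rng = np.random.default_rng(4)
F = rng.normal(size=(500, 1))                         # one hypothetical factor
X = F @ np.array([[.8, .7, .6, .5]]) + 0.5 * rng.normal(size=(500, 4))

R = np.corrcoef(X, rowvar=False)

# Initial communality estimates: squared multiple correlation (SMC)
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)        # replace the 1's with communalities

# Principal-axis step: eigen-decompose the reduced correlation matrix
eigvals, eigvecs = np.linalg.eigh(R_reduced)
order = np.argsort(eigvals)[::-1]
k = 1                                   # number of factors to extract
loadings = eigvecs[:, order[:k]] * np.sqrt(eigvals[order[:k]])
print(loadings.ravel())                 # loadings on the single common factor
```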

## Scree Plot

From the point of view of exploratory analysis, the eigenvalues of PCA are inflated component loadings, i.e., contaminated with error variance. The eigenvalue for a given factor measures the variance in all the variables that is accounted for by that factor. The Cattell scree test plots the components on the X-axis (in order of decreasing eigenvalue) and the corresponding eigenvalues on the Y-axis. As one moves to the right, toward later components, the eigenvalues drop. When the drop ceases and the curve makes an elbow toward a less steep decline, Cattell’s scree test says to drop all further components after the one starting the elbow. This rule is sometimes criticized as being amenable to researcher-controlled “fudging”: because picking the elbow can be subjective (the curve may have multiple elbows or be a smooth curve), the researcher may be tempted to set the cut-off at the number of factors desired by their research agenda.
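Without drawing the plot, the elbow can be approximated numerically by looking at the successive drops between sorted eigenvalues (Python/NumPy; the two-factor data are simulated so a clear elbow exists, which real data may not provide):

```python
import numpy as np

rng = np.random.default_rng(5)
F = rng.normal(size=(400, 2))                     # two hypothetical factors
load = np.array([[.8, .8, .7, 0, 0, 0],
                 [0, 0, 0, .7, .6, .6]]).T
X = F @ load.T + 0.5 * rng.normal(size=(400, 6))

eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
drops = -np.diff(eigvals)                 # size of each successive drop

print(np.round(eigvals, 2))               # the scree-plot Y values
print(np.round(drops, 2))
elbow_after = int(np.argmax(drops)) + 1   # the largest drop ends the steep part
print(elbow_after)                        # components beyond this form the scree
```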

## Factor Rotation

Factor rotation is a procedure in which the eigenvectors (factors) are rotated in an attempt to achieve simple structure. It refers to any of several methods in factor analysis by which the researcher attempts to relate the calculated factors to theoretical entities. In factor or principal-components analysis, the factor axes (dimensions) identified in the initial extraction are rotated in order to obtain simple, interpretable factors.

In other words, rotation is any of a variety of methods (explained below) used to further analyze initial PCA or EFA results with the goal of making the pattern of loadings clearer, or more pronounced. This process is designed to reveal the simple structure.

In our question the components came out to be correlated; that is, the factors are oblique.

Rotation of components is done differently depending upon whether the factors are believed to be correlated (oblique) or uncorrelated (orthogonal).

## Different Types of Rotation

As mentioned earlier, rotation methods are either orthogonal or oblique. Simply put, orthogonal rotation methods assume that the factors in the analysis are uncorrelated. There are four main orthogonal methods: equamax, orthomax, quartimax, and varimax. In contrast, oblique rotation methods assume that the factors are correlated. There are some 15 oblique methods, of which the most important ones are direct oblimin and promax.

As we said above, since the factors were oblique, the researcher would have applied one of the oblique methods for rotation: oblimin or promax.
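Promax itself begins from an orthogonal varimax solution and then relaxes it toward correlated factors. The mechanics of rotation can therefore be sketched with varimax (Python/NumPy; the loading matrix below is invented, with a simple structure deliberately blurred by a 36-degree rotation). Because the rotation matrix is orthogonal, each variable's communality is unchanged; only the pattern of loadings becomes cleaner.

```python
import numpy as np

def varimax(L, tol=1e-8, max_iter=100):
    """Kaiser's varimax rotation of a p-by-k loading matrix via SVD."""
    p, k = L.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        Lam = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lam ** 3 - Lam @ np.diag((Lam ** 2).sum(axis=0)) / p)
        )
        R = u @ vt
        d = s.sum()
        if d - d_old < tol:
            break
        d_old = d
    return L @ R, R

# Hypothetical simple-structure loadings, blurred by a rotation
L0 = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
theta = np.pi / 5
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
L_rot, R = varimax(L0 @ mix)

# Communalities (row sums of squared loadings) survive the rotation intact
print(np.round((L_rot ** 2).sum(axis=1), 3))
print(np.round(((L0 @ mix) ** 2).sum(axis=1), 3))
```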

Following are other important considerations for deciding on the rotation. Thurstone (1947) first proposed and argued for five criteria that need to be met for simple structure to be achieved:

Each variable should produce at least one zero loading on some factor.

Each factor should have at least as many zero loadings as there are factors.