Correlation matrix validity

When entering or running a Capital Market Simulation in ProVal, you may get a message that the “correlation matrix cannot be factored”, meaning that the matrix you entered is not a valid correlation matrix. This article discusses the problem, its causes, and a suggested solution.

Background

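A correlation matrix is symmetric, has ones on its diagonal, and has the mathematical property of being positive semi-definite, meaning that all of its eigenvalues are greater than or equal to zero. This implies that the determinant of the matrix is greater than or equal to zero, although a non-negative determinant alone is not sufficient. A matrix that does not have this property cannot be used to represent correlations between random variables.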
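The positive semi-definite condition can be checked numerically by inspecting the eigenvalues. The following is a minimal sketch in Python with NumPy, not ProVal's own validation code; the function name is_valid_correlation and the tolerance tol are illustrative assumptions.

```python
import numpy as np

def is_valid_correlation(matrix, tol=1e-10):
    """Return True if matrix is symmetric, has a unit diagonal,
    and has no eigenvalue below -tol (hypothetical helper)."""
    m = np.asarray(matrix, dtype=float)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        return False
    if not np.allclose(m, m.T) or not np.allclose(np.diag(m), 1.0):
        return False
    # eigvalsh is intended for symmetric matrices; eigenvalues come back ascending
    return bool(np.linalg.eigvalsh(m)[0] >= -tol)

# Every pairwise correlation lies in [-1, 1], yet the system as a whole is invalid
pairwise_ok = np.array([[1.0, 0.9, -0.9],
                        [0.9, 1.0, 0.9],
                        [-0.9, 0.9, 1.0]])
smallest = np.linalg.eigvalsh(pairwise_ok)[0]  # about -0.8, so not positive semi-definite
```

In this made-up example, class A moves strongly with B, and B moves strongly with C, so A and C cannot also move strongly against each other.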

Further, if the matrix is singular (i.e. the determinant is exactly equal to zero), this indicates that one or more of the variables is a linear combination of the other variables. In order to use such a matrix to generate random numbers, one or more dimensions of variability must effectively be removed.

As part of the process of generating random numbers with the input correlations, ProVal computes the Cholesky factorization of the correlation matrix. If the matrix is not positive semi-definite or if it is singular, the factorization will fail. When the Cholesky factorization of your input correlation matrix fails, you will receive the message that the “matrix cannot be factored”. Throughout this article, we refer to a matrix that cannot be factored as “invalid”.
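To illustrate the role of the Cholesky factorization, here is a minimal sketch in Python with NumPy. It is an assumption about the general technique, not a description of ProVal's internal code; try_cholesky and the sample matrices are hypothetical.

```python
import numpy as np

def try_cholesky(corr):
    """Return the lower-triangular Cholesky factor, or None if factoring fails."""
    try:
        return np.linalg.cholesky(np.asarray(corr, dtype=float))
    except np.linalg.LinAlgError:
        return None

# A valid matrix factors, and the factor turns independent normals into correlated ones
valid = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
factor = try_cholesky(valid)
rng = np.random.default_rng(42)
independent = rng.standard_normal((100_000, 2))
correlated = independent @ factor.T
sample_corr = np.corrcoef(correlated, rowvar=False)[0, 1]  # close to the input 0.5

# A matrix that is not positive semi-definite cannot be factored
not_psd = np.array([[1.0, 0.9, -0.9],
                    [0.9, 1.0, 0.9],
                    [-0.9, 0.9, 1.0]])

# A singular matrix (the two variables are perfectly correlated) also fails
singular = np.array([[1.0, 1.0],
                     [1.0, 1.0]])

results = [try_cholesky(not_psd), try_cholesky(singular)]
```

Both entries of results are None here, which corresponds to the situation in which the factoring message would appear.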

Computing individual correlations between pairs of asset classes by using historical sample data does not necessarily produce a valid correlation matrix. See more on this in the section entitled Common Causes of Invalid Matrices below.

Adjusting the Matrix in ProVal

In the Correlations input dialog of the Asset Classes topic, clicking OK will trigger validation of the matrix. If the matrix does not factor, ProVal will give you the option to automatically adjust the matrix. If you click Adjust…, ProVal will compute a valid correlation matrix that is as close as possible to your input matrix. You will then have the option of accepting or rejecting the suggested replacement matrix.

ProVal computes this adjusted matrix according to the method described in “Computing the nearest correlation matrix - a problem from finance”, by Nicholas J. Higham, in the IMA Journal of Numerical Analysis (2002) 22, 329-343. The algorithm produces the valid correlation matrix that is nearest to the original matrix, where “nearness” is measured by the sum of squared differences between the entries of the two matrices (the squared Frobenius norm of their difference). ProVal’s implementation of this algorithm assumes that the weighting matrix (W in the journal article) is the identity matrix, effectively meaning that all correlations are equally important in measuring nearness.
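The general idea of the unweighted method can be sketched as alternating projections with Dykstra's correction: repeatedly project onto the set of positive semi-definite matrices, then onto the set of matrices with a unit diagonal. The Python code below is a minimal sketch under that reading of the article, not ProVal's implementation; the function name, tolerance, and iteration limit are assumptions.

```python
import numpy as np

def nearest_correlation(a, tol=1e-10, max_iter=1000):
    """Sketch of the unweighted (W = identity) alternating projections method."""
    a = np.asarray(a, dtype=float)
    y = (a + a.T) / 2.0        # start from the symmetric part of the input
    ds = np.zeros_like(y)      # Dykstra's correction term
    for _ in range(max_iter):
        r = y - ds
        # Project onto the positive semi-definite cone: clip negative eigenvalues at zero
        w, v = np.linalg.eigh(r)
        x = (v * np.maximum(w, 0.0)) @ v.T
        ds = x - r
        # Project onto the symmetric matrices with ones on the diagonal
        y_new = x.copy()
        np.fill_diagonal(y_new, 1.0)
        change = np.linalg.norm(y_new - y, "fro")
        y = y_new
        if change <= tol * np.linalg.norm(y, "fro"):
            break
    return y

invalid = np.array([[1.0, 0.9, -0.9],
                    [0.9, 1.0, 0.9],
                    [-0.9, 0.9, 1.0]])
adjusted = nearest_correlation(invalid)
# For this input the off-diagonal entries move from +/-0.9 to approximately +/-0.5
```

Because the last step restores the unit diagonal, the returned matrix can have a smallest eigenvalue that is negative at the level of the tolerance. A production implementation would need to handle this before factoring, for example with a tighter tolerance or a small final adjustment.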

If you use this feature to produce a valid matrix, it is up to you to decide whether the resulting matrix is acceptable for your work. If you do not accept this adjusted matrix, you will need to enter a valid correlation matrix before the simulation can be run.

Common Causes of Invalid Matrices

As mentioned above, calculating correlations from historical data does not guarantee that the resulting correlation matrix will be valid. Although individual correlations between pairs of asset classes may be reasonable in isolation, the system as a whole may be mathematically unsound. Some of the causes of this are discussed here.

In practice, it is often the case that the data available is not sufficient for calculating the entire correlation matrix. For example, suppose we wish to include 8 asset classes in our study, and we assign them letter names A-H. Say we have 80 years of returns data for classes A-G, but that class H is relatively new, and only 5 years of data is available. Two ideas may come to mind as to how to calculate the correlation matrix, both of which risk generating an invalid matrix:

1. Use of inconsistent data sets - Using the 80 years of data for all pairs between classes A-G, and the 5 years of data for all pairs that include class H may produce correlations that are invalid when taken as a whole. The underlying reason is that the sample variance of, say, class A, when measured over the last 80 years is not equal to the sample variance when measured over the last 5 years. This inconsistency may cause invalid sample correlations, but not in all cases.
2. Use of fewer data points than the number of asset classes - Using only the most recent 5 years of data will avoid problem #1, but creates a new problem. Since the number of data points is less than the number of asset classes, there is not enough information to create an 8 x 8 matrix of linearly independent rows of correlations, and the resulting sample correlation matrix will be singular.
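The second problem can be reproduced with made-up data. The sketch below, in Python with NumPy, draws 5 random "annual returns" for 8 hypothetical asset classes and inspects the resulting sample correlation matrix; the data is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_classes = 5, 8
returns = rng.normal(size=(n_years, n_classes))  # illustrative data, not real returns

# Columns are asset classes, rows are years
corr = np.corrcoef(returns, rowvar=False)        # 8 x 8 sample correlation matrix

# Subtracting each column's mean removes one degree of freedom,
# so the rank can be at most n_years - 1 = 4
rank = np.linalg.matrix_rank(corr)
eigenvalues = np.linalg.eigvalsh(corr)           # ascending; the smallest are zero up to rounding
```

With 8 classes and a rank of at most 4, at least four eigenvalues are zero apart from rounding error, so the matrix is singular no matter what the data looks like.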

In addition to the problem of incomplete data, other factors may cause the matrix to be invalid. When pairs of asset classes are highly correlated (the correlation is near 1 or -1), this brings the determinant of the matrix near zero and narrows the range of values the remaining correlations can take, which increases the likelihood that the matrix will not be valid when all classes are taken into account. Also, limited floating-point precision in computer calculations can in some cases cause calculated correlation matrices to be invalid.
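The effect of a highly correlated pair can be seen in the three-class case, where the determinant condition gives explicit bounds on the third correlation once the other two are fixed. The Python sketch below uses hypothetical function names and made-up correlation values.

```python
import numpy as np

def third_correlation_bounds(rho_ab, rho_ac):
    """Range of rho_bc that keeps a 3 x 3 correlation matrix positive semi-definite."""
    center = rho_ab * rho_ac
    half_width = np.sqrt((1.0 - rho_ab**2) * (1.0 - rho_ac**2))
    return center - half_width, center + half_width

def min_eigenvalue(rho_ab, rho_ac, rho_bc):
    m = np.array([[1.0, rho_ab, rho_ac],
                  [rho_ab, 1.0, rho_bc],
                  [rho_ac, rho_bc, 1.0]])
    return np.linalg.eigvalsh(m)[0]

# A and B highly correlated: rho_bc is confined to roughly 0.373 to 0.617
low, high = third_correlation_bounds(0.99, 0.5)

# A and B weakly correlated: rho_bc may lie roughly between -0.749 and 0.949
wide_low, wide_high = third_correlation_bounds(0.2, 0.5)

inside = min_eigenvalue(0.99, 0.5, 0.5)    # positive, so the matrix is valid
outside = min_eigenvalue(0.99, 0.5, 0.3)   # negative, so the matrix is invalid
```

A value of 0.3 for the third correlation looks unremarkable in isolation, but it falls outside the narrow range allowed once A and B are correlated at 0.99.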

References

Gentle, James E. (2007). Matrix Algebra - Theory, Computations, and Applications in Statistics. New York, NY: Springer Science + Business Media, LLC. ISBN 978-0-387-70872-0.

Holton, Glyn A. (2003). Value-at-Risk - Theory and Practice. San Diego, CA: Academic Press. ISBN 0-12-345010-0.