1 Introduction
Despite its use in areas such as information geometry, the role of multilinear algebra in statistical theory has been limited. However, as soon as determinants arise in some statistical context, particularly in multivariate analysis, one can claim that we are using multilinear algebra or multilinear geometry. This is true of previous work of the authors (Pronzato et al. 2017, 2018, 2019), which related the expected volume of random simplices, represented by determinants, to the determinants of covariance matrices and marginal covariance matrices; see also Gillard et al. (2022), where the technique of simplicial distances developed in Pronzato et al. (2017, 2018) has been used for detection of outliers and cluster analysis. The expected volumes of simplices have also played a part in definitions of dispersion orderings in previous work (Giovagnoli and Wynn 1995). The ideas can be traced back to the seminal work of Hotelling (1992) in canonical correlation analysis (CCA) and Wilks (Wilks 1932, 1960) in generalised variance. Results of Sect. 4 dealing with cross-covariances can be used in widening the interpretations of the techniques of standard CCA as well as various extensions of CCA, including regularized CCA (Tenenhaus and Tenenhaus 2011) and deep CCA (Andrew et al. 2013). Note also the extensive use of cross-covariances in the methodology of time series analysis and forecasting called singular spectrum analysis; see Golyandina and Zhigljavsky (2013); Golyandina et al. (2018). The main aim of this paper is to promote the idea that exterior algebra is a natural environment in which to study and extend such formulae, and to show that the inner product in exterior algebra is the key formula for our purposes.
We start with an elementary discussion. In statistics and probability theory variances and covariances are closely related to metrics. If $X$ and $Y$ are two jointly distributed one-dimensional random variables and $\mathbb{E}$ denotes expectation with respect to their joint distribution, then
$$\mathbb{E}(X-Y)^2 = \mathrm{var}(X) + \mathrm{var}(Y) - 2\,\mathrm{cov}(X,Y) + (\mathbb{E}X - \mathbb{E}Y)^2.$$
If $X', X''$ are two independent copies of the random variable $X$ then
$$\mathbb{E}(X'-X'')^2 = 2\,\mathrm{var}(X). \qquad (1)$$
If $X$ is a random $n$-vector with covariance matrix
$$\Sigma = \mathbb{E}\big[(X-\mathbb{E}X)(X-\mathbb{E}X)^{\top}\big],$$
then for the Euclidean distance and i.i.d. copies $X', X''$
$$\mathbb{E}\,\|X'-X''\|^2 = 2\,\mathrm{tr}(\Sigma).$$
The cross-covariance matrix between two random $n$-vectors, $X$ and $Y$, is
$$C(X,Y) = \mathbb{E}\big[(X-\mathbb{E}X)(Y-\mathbb{E}Y)^{\top}\big].$$
In this case, $\mathrm{tr}\,C(X,Y)$ can be considered as an overall measure of covariance. The present paper revisits the authors' papers (Pronzato et al. 2017, 2018) with a straightforward use of the exterior product.
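As a small numerical illustration (our sketch, not part of the paper's development), the following Python snippet checks the identity $\mathbb{E}\|X'-X''\|^2 = 2\,\mathrm{tr}(\Sigma)$ by Monte Carlo; the Gaussian distribution, the dimension and the sample size are arbitrary choices.

```python
# Minimal Monte Carlo check (ours) of E||X' - X''||^2 = 2 tr(Sigma)
# for i.i.d. copies X', X'' of a zero-mean random n-vector.
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 200_000

# Build a random covariance matrix Sigma and draw two independent copies of X.
M = rng.standard_normal((n, n))
Sigma = M @ M.T
L = np.linalg.cholesky(Sigma)
X1 = rng.standard_normal((N, n)) @ L.T   # first i.i.d. copy of X
X2 = rng.standard_normal((N, n)) @ L.T   # second, independent copy

lhs = np.mean(np.sum((X1 - X2) ** 2, axis=1))   # empirical E||X' - X''||^2
rhs = 2 * np.trace(Sigma)
print(lhs, rhs)   # the two values agree up to Monte Carlo error
```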
In the first part of the paper we will consider standard vectors, that is, vectors extending from the origin, so that simplices are formed with one vertex at the origin. But in the spirit of our previous work we return briefly in Sect. 5 to what we term affine simplices. For example, in one dimension this is the length of the line from point $x^{(1)}$ to point $x^{(2)}$, and a triangle in three or more dimensions is described by three points $x^{(1)}, x^{(2)}, x^{(3)}$ away from the origin. Sections 3 and 4 cover generalised covariances and cross-covariances, Sect. 6 develops a duality based on the Hodge star operator, and Sect. 7 discusses a natural application to dispersion orderings.
2 Exterior algebra
Our calculations are based on the $n$-dimensional base vector space $\mathbb{R}^n$ over $\mathbb{R}$, with vectors written as column vectors $x = (x_1, \ldots, x_n)^{\top}$.
Looking forward to the next section, we will write a random vector in $\mathbb{R}^n$ as $X$ and use independent identically distributed random (vector) copies $X^{(1)}, \ldots, X^{(p)}$ of a random $n$-vector $X$; similarly for $Y$.
We label the standard unit vectors in $\mathbb{R}^n$ as $e_1, \ldots, e_n$, so that we may express a vector as
$$x = \sum_{i=1}^{n} x_i e_i. \qquad (2)$$
Note that any basis of independent vectors may be used, but the standard basis is easier conceptually. The book by Darling (1994) is an excellent introduction.
The outer product of two vectors is written $x \wedge y$. Starting with basis vectors we write formal expressions $e_i \wedge e_j$, which lie in a formal vector space $\Lambda^2(\mathbb{R}^n)$ whose basis vectors are all ordered pairs $e_i \wedge e_j$, $i < j$. Then, we have the decomposition
$$x \wedge y = \sum_{i<j} (x_i y_j - x_j y_i)\, e_i \wedge e_j.$$
The coefficients $x_i y_j - x_j y_i$ are the determinants of $2 \times 2$ matrices formed from the appropriate entries of $x$ and $y$, and are twice the signed areas of the triangles formed by the corresponding 2-vectors and the origin.
Starting with the basis of $\mathbb{R}^n$, the following rules uniquely define the wedge product. Given real scalars $a, b$ and vectors $x, y, z$:
- 1. $(a x + b y) \wedge z = a\,(x \wedge z) + b\,(y \wedge z)$,
- 2. $x \wedge y = -\,(y \wedge x)$,
- 3. $x \wedge x = 0$.
We interpret the terms $e_i \wedge e_j$ as an abstract coding or place-holder for the two-dimensional space spanned by $e_i$ and $e_j$, but assigned an orientation expressed by a sign. From the above axioms it follows that $e_i \wedge e_i = 0$ and $e_i \wedge e_j = -\,e_j \wedge e_i$, so that
$$x \wedge y = \sum_{i<j} (x_i y_j - x_j y_i)\, e_i \wedge e_j = -\, y \wedge x,$$
which shows the importance of signs.
The machinery extends to the space $\Lambda^p(\mathbb{R}^n)$ of higher exterior powers, and we define the $p$th wedge product for vectors $x^{(1)}, \ldots, x^{(p)}$ by
$$x^{(1)} \wedge \cdots \wedge x^{(p)} = \sum_{i_1 < \cdots < i_p} d_{i_1 \cdots i_p}\; e_{i_1} \wedge \cdots \wedge e_{i_p},$$
where $d_{i_1 \cdots i_p}$ is the determinant giving the $p$-dimensional volumes for directions coordinated by the terms $e_{i_1} \wedge \cdots \wedge e_{i_p}$:
$$d_{i_1 \cdots i_p} = \det \begin{pmatrix} x^{(1)}_{i_1} & \cdots & x^{(p)}_{i_1} \\ \vdots & & \vdots \\ x^{(1)}_{i_p} & \cdots & x^{(p)}_{i_p} \end{pmatrix}.$$
A key construction for us is the inner product on $\Lambda^p(\mathbb{R}^n)$. When $p = 1$, for $x, y$ in $\Lambda^1(\mathbb{R}^n) = \mathbb{R}^n$ we define $\langle x, y \rangle$ to be the standard inner product. The inner product on $\Lambda^p(\mathbb{R}^n)$ is defined as
$$\big\langle x^{(1)} \wedge \cdots \wedge x^{(p)},\; y^{(1)} \wedge \cdots \wedge y^{(p)} \big\rangle = \det\Big( \big\langle x^{(i)}, y^{(j)} \big\rangle \Big)_{i,j = 1, \ldots, p},$$
where the inner product on the right-hand side is the standard inner product. A matrix formulation is sometimes useful. Thus, the matrix $\big(\langle x^{(i)}, y^{(j)} \rangle\big)_{i,j}$ is $X^{\top} Y$, where $X = (x^{(1)}, \ldots, x^{(p)})$ and $Y = (y^{(1)}, \ldots, y^{(p)})$ are the $n \times p$ matrices with the indicated columns.
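The following short Python sketch (ours; the variable names are illustrative) verifies numerically that the inner product of two wedge products equals $\det(X^{\top}Y)$ and, by the Binet–Cauchy theorem, also the sum over all $p$-subsets $I$ of products of the corresponding $p \times p$ minors.

```python
# Sketch (ours): <x1^...^xp, y1^...^yp> = det(X^T Y)
#              = sum over p-subsets I of det(X_I) * det(Y_I)  (Binet-Cauchy).
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, p = 5, 3
X = rng.standard_normal((n, p))   # columns x^(1), ..., x^(p)
Y = rng.standard_normal((n, p))   # columns y^(1), ..., y^(p)

gram = np.linalg.det(X.T @ Y)     # the wedge inner product
binet_cauchy = sum(
    np.linalg.det(X[list(I), :]) * np.linalg.det(Y[list(I), :])
    for I in combinations(range(n), p)
)
print(gram, binet_cauchy)         # equal up to floating point
```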
In order to avoid too much notation we will refer to an ordered subset of size $p$ by $I = \{i_1 < \cdots < i_p\} \subseteq \{1, \ldots, n\}$, being careful to fix the context. Thus $\sum_I$ is the summation over all ordered subsets $I$. This notation is also used to index marginal variables. Thus, $x_I$ is the vector with entries, in order, $x_{i_1}, \ldots, x_{i_p}$, and $X_I, Y_I$ are the marginal random vectors corresponding to $I$.
3 Expectations, generalised variances and covariances
For the random version of $x$ in (2) we write
$$X = \sum_{i=1}^{n} X_i e_i,$$
where $\mathbb{E}$ denotes expectation with respect to the full joint distribution. We assume that all random vectors have zero mean to make formulae a little easier to handle. Thus we define the covariance matrix of a random $X$ as
$$\Sigma_X = \mathbb{E}\big[X X^{\top}\big],$$
the cross-covariance between random vectors $X$ and $Y$ as
$$C(X, Y) = \mathbb{E}\big[X Y^{\top}\big],$$
and the full covariance matrix between $X$ and $Y$ as
$$\Sigma_{(X,Y)} = \begin{pmatrix} \Sigma_X & C(X, Y) \\ C(X, Y)^{\top} & \Sigma_Y \end{pmatrix}.$$
Definition 3.1
For two random variables $X, Y$ with values in $\mathbb{R}^n$ define the generalised variances and the generalised covariance respectively by the following determinants: $\det \Sigma_X$, $\det \Sigma_Y$ and $\det C(X, Y)$.
These definitions will be used for marginal vectors in dimension $p$ for all index sets $I$, so that we write, for example, $\det C(X_I, Y_I)$. The following is essentially similar to the result in Pronzato (1998), but with an alternative proof.
Lemma 3.2
Let $X$ and $Y$ be two random $p$-vectors and let $X^{(1)}, \ldots, X^{(p)}$ and $Y^{(1)}, \ldots, Y^{(p)}$ be two sets of i.i.d. copies of $X$ and $Y$, respectively. Then
$$\mathbb{E}\Big[\det\big(X^{(1)}, \ldots, X^{(p)}\big)\, \det\big(Y^{(1)}, \ldots, Y^{(p)}\big)\Big] = p!\, \det C(X, Y).$$
Proof
The Sylvester formula for the inverse of an invertible matrix $A$ is
$$A^{-1} = \frac{\mathrm{adj}(A)}{\det A},$$
where the $(i, j)$ entry of the adjugate $\mathrm{adj}(A)$ is, with appropriate sign, the determinant of the cofactor matrix formed by deleting row $j$ and column $i$ of $A$. If $A$ is invertible and $a, b$ are $n$-vectors we have the well known formula
$$\det\big(A + a b^{\top}\big) = \det(A)\,\big(1 + b^{\top} A^{-1} a\big).$$
We shall need the more general version, which applies whether or not $A$ is invertible:
$$\det\big(A + a b^{\top}\big) = \det(A) + b^{\top} \mathrm{adj}(A)\, a.$$
The proof now proceeds by induction on $p$. The case $p = 1$ is immediate. Now, writing $\mathbf{X} = (X^{(1)}, \ldots, X^{(p)})$ and $\mathbf{Y} = (Y^{(1)}, \ldots, Y^{(p)})$,
$$\det(\mathbf{X})\det(\mathbf{Y}) = \det\big(\mathbf{X}\mathbf{Y}^{\top}\big) = \det\Big(A + X^{(p)} \big(Y^{(p)}\big)^{\top}\Big) = \det(A) + \big(Y^{(p)}\big)^{\top} \mathrm{adj}(A)\, X^{(p)}, \qquad A = \sum_{k<p} X^{(k)} \big(Y^{(k)}\big)^{\top}.$$
The first term on the right is zero because the matrix $A$, a sum of $p-1$ rank-one terms, does not have full rank. Then
$$\mathbb{E}\Big[\big(Y^{(p)}\big)^{\top} \mathrm{adj}(A)\, X^{(p)}\Big] = \mathrm{tr}\Big(\mathbb{E}\big[\mathrm{adj}(A)\big]\, C(X, Y)\Big) \overset{(*)}{=} (p-1)!\; \mathrm{tr}\Big(\mathrm{adj}\big(C(X, Y)\big)\, C(X, Y)\Big),$$
where the transition (*) uses the independence between copies together with the induction hypothesis applied to each cofactor of $A$. Then, whether or not $C(X, Y)$ is invertible, the last formula reduces, by the property of adjugates $\mathrm{adj}(C)\, C = \det(C)\, I_p$, to
$$(p-1)!\; p\, \det C(X, Y) = p!\, \det C(X, Y),$$
as required. $\square$
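A Monte Carlo check of Lemma 3.2 may be helpful. The following Python sketch (ours, with an arbitrary Gaussian choice of joint distribution) compares the empirical mean of the product of determinants with $p!\,\det C(X,Y)$.

```python
# Monte Carlo sketch (ours) of Lemma 3.2:
# E[ det(X^(1),...,X^(p)) det(Y^(1),...,Y^(p)) ] = p! det C(X,Y).
import math
import numpy as np

rng = np.random.default_rng(2)
p, N = 3, 400_000

# Random PSD joint covariance for the 2p-vector (X, Y).
M = rng.standard_normal((2 * p, 2 * p))
S = M @ M.T
L = np.linalg.cholesky(S)
C = S[:p, p:]                                  # cross-covariance C(X, Y)

Z = rng.standard_normal((N, p, 2 * p)) @ L.T   # p i.i.d. copies of (X, Y) per sample
Xc = Z[:, :, :p]   # N samples of the p x p matrix of X-copies (rows = copies)
Yc = Z[:, :, p:]

lhs = np.mean(np.linalg.det(Xc) * np.linalg.det(Yc))
rhs = math.factorial(p) * np.linalg.det(C)
print(lhs, rhs)    # agree up to Monte Carlo error
```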
Recall our notation for margins, namely that $X_I^{(1)}, \ldots, X_I^{(p)}$ and $Y_I^{(1)}, \ldots, Y_I^{(p)}$ are the $I$-margins of $p$ independent copies of the $n$-vectors $X$ and $Y$, respectively. Then, using the inner product in $\Lambda^p(\mathbb{R}^n)$, we have the key lemma of the paper.
Lemma 3.3
Let $\big(X^{(i)}, Y^{(i)}\big)$, $i = 1, \ldots, p$, be independent copies of the extended base vector $(X, Y)$. Then
$$\mathbb{E}\,\big\langle X^{(1)} \wedge \cdots \wedge X^{(p)},\; Y^{(1)} \wedge \cdots \wedge Y^{(p)} \big\rangle = \mathbb{E}\,\det\big(\mathbf{X}^{\top}\mathbf{Y}\big) = p! \sum_I \det C\big(X_I, Y_I\big),$$
where the sum is over all (ordered) index sets $I$ of size $p$.
Proof
The first equality is from the definition of the inner product. The second follows by expanding $\det(\mathbf{X}^{\top}\mathbf{Y})$ by the Binet–Cauchy theorem and applying Lemma 3.2 to every term. $\square$
Replacing $Y$ by $X$ in the two lemmas replaces all cross-covariance matrices by covariance matrices $C(X, X) = \Sigma_X$, that is,
$$\mathbb{E}\,\big\| X^{(1)} \wedge \cdots \wedge X^{(p)} \big\|^2 = p! \sum_I \det \Sigma_{X_I}.$$
Note that before taking expectation the quantity
$$\frac{1}{p!}\, \big\| X^{(1)} \wedge \cdots \wedge X^{(p)} \big\|$$
is the volume of the $p$-dimensional simplex spanned by the $X^{(i)}$ and the origin, as studied in Pronzato et al. (2017). We thus have a decomposition of the expectation of the square of this volume in terms of the covariances of the $p$-margins of the original random variable $X$. In the case of $X, Y$ the wedge-product formula gives a new type of covariance based on the product of the signed areas of two random simplices, one for $X$ and one for $Y$.
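To illustrate, the following Python sketch (ours) checks the variance case of Lemma 3.3 by Monte Carlo: the empirical mean of $\|X^{(1)}\wedge\cdots\wedge X^{(p)}\|^2 = \det(\mathbf{X}^{\top}\mathbf{X})$ is compared with $p!\sum_I \det\Sigma_{X_I}$.

```python
# Monte Carlo sketch (ours) of Lemma 3.3 with Y = X:
# E ||X^(1) ^ ... ^ X^(p)||^2 = p! * sum over size-p index sets I of det Sigma_{X_I}.
import math
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
n, p, N = 4, 2, 300_000

M = rng.standard_normal((n, n))
Sigma = M @ M.T
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((N, p, n)) @ L.T        # p i.i.d. copies per sample

# ||X^(1) ^ ... ^ X^(p)||^2 = det(G), G the p x p Gram matrix of the copies.
G = X @ np.swapaxes(X, 1, 2)
lhs = np.mean(np.linalg.det(G))

rhs = math.factorial(p) * sum(
    np.linalg.det(Sigma[np.ix_(I, I)]) for I in combinations(range(n), p)
)
print(lhs, rhs)    # agree up to Monte Carlo error
```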
4 Generalised cross-covariances and correlations
4.1 Definitions and a key property
As mentioned in the introduction, $\det C(X, Y)$, considered as a generalised cross-covariance, is not as well known as Wilks's generalised variance $\det \Sigma_X$. Despite this we can proceed to the following definition derived from Lemma 3.3.
Definition 4.1
The generalised $p$-cross-covariance of two random $n$-vectors $X$ and $Y$ is defined as
$$\sigma_p(X, Y) = \sum_I \det C\big(X_I, Y_I\big),$$
and the $p$-covariance for $X$ (similarly, for $Y$) as
$$\sigma_p(X) = \sigma_p(X, X) = \sum_I \det \Sigma_{X_I},$$
where the summation is over all ordered $p$-index sets $I$.
The only difference from the formula in Lemma 3.3 is the removal of the multiplier p!. Given the definitions of the p-generalised variances in Pronzato et al. (2017), we have the following natural definition:
Definition 4.2
The generalised $p$-correlation between random $n$-vectors $X$ and $Y$ is defined as
$$\rho_p(X, Y) = \frac{\sigma_p(X, Y)}{\sqrt{\sigma_p(X)\, \sigma_p(Y)}} = \frac{\sum_I \det C(X_I, Y_I)}{\sqrt{\big(\sum_I \det \Sigma_{X_I}\big)\big(\sum_I \det \Sigma_{Y_I}\big)}},$$
where the summations are over all ordered $p$-index sets $I$.
It is easily established that
$$-1 \le \rho_p(X, Y) \le 1$$
for all $p$ by using the requirement that the joint covariance matrix of $X$ and $Y$ must be non-negative definite.
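For concreteness, here is a small Python function (ours; the name gen_p_corr and the $\rho_p$ notation follow Definition 4.2) computing the generalised $p$-correlation directly from $\Sigma_X$, $\Sigma_Y$ and $C(X, Y)$; on any valid joint covariance the values stay in $[-1, 1]$.

```python
# Sketch (ours): the generalised p-correlation of Definition 4.2.
import numpy as np
from itertools import combinations

def gen_p_corr(Sigma_X, Sigma_Y, C, p):
    """rho_p(X, Y) computed by summing determinants over all p-index sets."""
    n = Sigma_X.shape[0]
    idx = list(combinations(range(n), p))
    num = sum(np.linalg.det(C[np.ix_(I, I)]) for I in idx)
    den_x = sum(np.linalg.det(Sigma_X[np.ix_(I, I)]) for I in idx)
    den_y = sum(np.linalg.det(Sigma_Y[np.ix_(I, I)]) for I in idx)
    return num / np.sqrt(den_x * den_y)

# Example: a valid joint covariance built from a random PSD matrix.
rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((2 * n, 2 * n))
S = M @ M.T
Sigma_X, Sigma_Y, C = S[:n, :n], S[n:, n:], S[:n, n:]
for p in range(1, n + 1):
    print(p, gen_p_corr(Sigma_X, Sigma_Y, C, p))   # each value lies in [-1, 1]
```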
An interesting analysis arises in the full $n$-dimensional case when, for random $n$-vectors $X, Y$, $\Sigma_X = \Sigma_Y = I_n$, the identity. We may arrive at this special case en route to computing canonical correlations, and we shall refer to this case as being canonical. Thus, using spectral square roots, if we take two random $n$-vectors $U, V$ and set $X = \Sigma_U^{-1/2} U$ and $Y = \Sigma_V^{-1/2} V$, then $C(X, Y)$ is the canonical cross-correlation matrix and the covariance matrix for $(X, Y)$ is
$$\Sigma_{(X,Y)} = \begin{pmatrix} I_n & C(X, Y) \\ C(X, Y)^{\top} & I_n \end{pmatrix}.$$
This study of canonical correlation goes back to Hotelling (1992).
The fine structure of the relationship between X and Y can be studied via the cross-correlation matrix C(X, Y). We have the following lemmas.
Lemma 4.3
For $n$-vectors $X, Y$ with $\Sigma_X = \Sigma_Y = I_n$, $C = C(X, Y)$ is a valid cross-correlation matrix if and only if
$$C C^{\top} \preceq I_n,$$
where $\preceq$ is the Loewner ordering, with equality if and only if
$$Y = C^{\top} X \quad \text{almost surely},$$
which, in turn, holds if and only if
$$C C^{\top} = C^{\top} C = I_n.$$
Proof
If equality holds then $C C^{\top} = I_n$, so that all the eigenvalues of $C C^{\top}$ are unity. This forces
$$\mathbb{E}\Big[\big(Y - C^{\top} X\big)\big(Y - C^{\top} X\big)^{\top}\Big] = I_n - C^{\top} C = 0,$$
and $C^{\top} C$ must be the identity projector, so that $Y = C^{\top} X$ almost surely. The converse is immediate. $\square$
The condition implies that $C(X, Y)$ is a rotation: formally, a member of the orthogonal group $O(n)$. So we have the informal statement that all extreme cross-correlation matrices $C$ are related to rotations.
4.2 Two examples
Example 1
Let $n = 2$ and consider the covariance matrix in canonical form above. Then
$$\Sigma_{(X,Y)} = \begin{pmatrix} I_2 & C \\ C^{\top} & I_2 \end{pmatrix}.$$
If $C C^{\top} = I_2$ then the general solution can be written
$$C = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \quad \text{or} \quad C = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix},$$
for an angle $\theta$.
In this case the set of $C(X, Y)$ is a representation of the orthogonal group $O(2)$. For multiples of $\theta = \pi/2$, we have the subgroup of permutation and sign-change matrices, the dihedral group of order 8, with elements
$$\pm\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad \pm\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},\quad \pm\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},\quad \pm\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
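The subgroup claim can be checked mechanically. The following Python sketch (ours) enumerates the $2 \times 2$ signed permutation matrices, confirming that there are 8 of them and that they are closed under multiplication.

```python
# Sketch (ours): the 2 x 2 signed permutation matrices form a group of
# order 8, obtained from the matrices C(theta) at multiples of theta = pi/2.
import numpy as np
from itertools import product

mats = []
for perm, s1, s2 in product([(0, 1), (1, 0)], [1, -1], [1, -1]):
    M = np.zeros((2, 2))
    M[0, perm[0]], M[1, perm[1]] = s1, s2
    mats.append(M)

print(len(mats))                  # 8 elements
key = lambda M: tuple(M.astype(int).ravel())
table = {key(M) for M in mats}
# Closure: every product of two elements is again in the set.
print(all(key(A @ B) in table for A in mats for B in mats))   # True
```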
Example 2
Take $n = 3$, again in canonical form, and
$$C = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
so that $C(X, Y)$ is a member of $O(3)$. We compute $C C^{\top} = I_3$. For example, if $\theta = \pi/2$, then we have
$$C = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
which is a member of the group of permutations and sign changes, as expected.
4.3 The eigenvalues of C
The eigenvalues of $C$ may be complex, but the condition $C C^{\top} \preceq I_n$ in Lemma 4.3 imposes restrictions.
Lemma 4.4
For $n$-vectors $X, Y$ with $\Sigma_X = \Sigma_Y = I_n$, every eigenvalue $\lambda$ of the cross-correlation matrix $C = C(X, Y)$ satisfies $|\lambda| \le 1$.
Proof
We carry out the proof for the complex case. Let $z = u + iv$, with $u$ and $v$ real and $z \ne 0$, be the eigenvector corresponding to $\lambda$. Then $\bar{\lambda}$, the complex conjugate of $\lambda$, is the eigenvalue for the conjugate of $z$, namely $\bar{z} = u - iv$. Since $C^{\top} C \preceq I_n$ and $C z = \lambda z$,
$$|\lambda|^2\, z^{\ast} z = (C z)^{\ast} (C z) = z^{\ast} C^{\top} C z \le z^{\ast} z,$$
and cancelling $z^{\ast} z > 0$ gives the result. $\square$
It is natural to ask whether, in the canonical cross-correlation case, the matrix $C(X, Y)$ has a representation which might be thought of as a kind of PCA for cross-correlations. This is indeed the case, but since $C(X, Y)$ is not necessarily symmetric we need the Jordan form decomposition.
In the case that the eigenvalues of $C(X, Y)$ are real and distinct there exists a matrix $Q$ such that
$$Q\, C(X, Y)\, Q^{-1} = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n),$$
and if there are repeated roots then $\Lambda$ has the usual Jordan block decomposition.
Complex eigenvalues occur in conjugate pairs $\lambda = a \pm i b$. For distinct conjugate pairs there is a version of the Jordan decomposition which gives blocks of the form
$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix},$$
with extended forms when complex roots are repeated.
When the roots of $C(X, Y)$ are real we have the equivalent linear representation
$$C\big(QX,\; Q^{-\top}Y\big) = Q\, C(X, Y)\, Q^{-1} = \Lambda,$$
so that the pairs $\big((QX)_j, (Q^{-\top}Y)_j\big)$ have scalar cross-covariances $\lambda_j$. But in the complex case we have, for pairs $\lambda = a \pm ib$, the $2 \times 2$ blocks
$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix} = r \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}, \qquad r = \sqrt{a^2 + b^2},$$
that is, scaled rotations. Note, however, that the matrix $\Lambda$ is, in general, no longer the cross-covariance between $X$ and $Y$, but between $QX$ and $Q^{-\top}Y$. That is, transforming $C(X, Y)$ to the Jordan canonical form may affect the canonical representation $\Sigma_X = \Sigma_Y = I_n$.
In several fields this analysis is used to indicate the presence of feedback; examples occur in control theory and in the closely related Granger causality in economics. We can, of course, have a mixture of both real and complex eigenvalues.
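A quick numerical illustration (ours) of Lemma 4.4: shrinking an arbitrary square matrix by its spectral norm gives a valid cross-correlation matrix with $C C^{\top} \preceq I_n$, whose eigenvalues then lie in the closed unit disc and whose complex eigenvalues occur in conjugate pairs.

```python
# Sketch (ours): eigenvalues of a valid canonical cross-correlation matrix.
import numpy as np

rng = np.random.default_rng(5)
n = 4
B = rng.standard_normal((n, n))
C = B / np.linalg.norm(B, 2)   # spectral norm 1  =>  C C^T <= I (Loewner)

eig = np.linalg.eigvals(C)
print(eig)                               # complex ones come in conjugate pairs
print(np.abs(eig).max() <= 1 + 1e-12)    # True: all moduli at most 1
```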
5 The affine case
In Pronzato et al. (2017) the authors consider what we call here the affine case, motivated by (1). To aid explanation, consider the first interesting example, namely triangles in three dimensions.
Consider three i.i.d. copies of $X$ in $\mathbb{R}^3$, labelled as points $A$, $B$, $C$, respectively. They form a triangle $ABC$ whose squared area is
$$\frac{1}{4}\,\big\|(C - A) \times (C - B)\big\|^2 = \frac{1}{4}\,\det\big(M^{\top} M\big), \qquad M = \big(C - A,\; C - B\big).$$
In both cases we are considering the vectors from $A$ to $C$ and from $B$ to $C$. We can then expand by the Binet–Cauchy lemma and write the last expression as
$$\frac{1}{4} \sum_{1 \le i < j \le 3} \det\begin{pmatrix} (C-A)_i & (C-B)_i \\ (C-A)_j & (C-B)_j \end{pmatrix}^{2}.$$
This can be expressed using the wedge inner product as
$$\frac{1}{4}\, \big\langle (C-A) \wedge (C-B),\; (C-A) \wedge (C-B) \big\rangle = \frac{1}{4}\, \big\|(C-A) \wedge (C-B)\big\|^2.$$
It is natural to consider the covariance case, namely
$$\big\langle (C-A) \wedge (C-B),\; (C'-A') \wedge (C'-B') \big\rangle,$$
where $A', B', C'$ are three i.i.d. copies of a second random vector $Y$, the expansion of which is
$$\sum_{1 \le i < j \le 3} \det\begin{pmatrix} (C-A)_i & (C-B)_i \\ (C-A)_j & (C-B)_j \end{pmatrix} \det\begin{pmatrix} (C'-A')_i & (C'-B')_i \\ (C'-A')_j & (C'-B')_j \end{pmatrix}.$$
Taking expectations we see that our generalised 2-covariance is the expectation of a sum of products of signed areas from blades of dimension 2. We then adapt the analysis of Sect. 3 to the affine case by extending with a vector of ones. Thus we replace vectors $X$ by $\big(1, X^{\top}\big)^{\top}$ and use the general version of the formulae.
Generalising the above argument, Lemma 3.3 is replaced by
Lemma 5.1
Let $\big(X^{(i)}, Y^{(i)}\big)$, $i = 0, 1, \ldots, p$, be independent copies of the base vector $(X, Y)$. Then
$$\mathbb{E}\,\Big\langle \big(X^{(1)} - X^{(0)}\big) \wedge \cdots \wedge \big(X^{(p)} - X^{(0)}\big),\; \big(Y^{(1)} - Y^{(0)}\big) \wedge \cdots \wedge \big(Y^{(p)} - Y^{(0)}\big) \Big\rangle = (p+1)! \sum_I \det C\big(X_I, Y_I\big).$$
When $Y$ is replaced by $X$ we obtain the main result in Pronzato et al. (2017). The results also extend in a natural way to an affine version of the covariance representation developed in Sect. 4, with the analogous explanation in terms of the product of volumes of affine simplices.
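The following Monte Carlo sketch (ours) illustrates Lemma 5.1 in the variance case $Y = X$ with $p = n = 2$, where the sum over margins reduces to the single term $\det \Sigma$.

```python
# Monte Carlo sketch (ours) of Lemma 5.1 with Y = X and p = n = 2:
# the expected squared signed parallelogram area of the random triangle
# X^(0), X^(1), X^(2) equals (p+1)! det Sigma = 3! det Sigma.
import math
import numpy as np

rng = np.random.default_rng(6)
n = p = 2
N = 400_000
M = rng.standard_normal((n, n))
Sigma = M @ M.T
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((N, p + 1, n)) @ L.T   # p + 1 i.i.d. points per sample

D = X[:, 1:, :] - X[:, :1, :]                  # differences X^(i) - X^(0)
dets = np.linalg.det(D)                        # signed 2 x (triangle area)
lhs = np.mean(dets ** 2)
rhs = math.factorial(p + 1) * np.linalg.det(Sigma)
print(lhs, rhs)    # agree up to Monte Carlo error
```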
6 Hodge star operator and the cross-covariance Pfaffian
The Hodge star construction, in the general case $1 \le p \le n$, shows that for elements $u$ in $\Lambda^p(\mathbb{R}^n)$ and $v$ in $\Lambda^{n-p}(\mathbb{R}^n)$ there is a mapping, called the Hodge star operator, which takes $u$ into its Hodge star dual $\star u$ in $\Lambda^{n-p}(\mathbb{R}^n)$, such that
$$u \wedge v = \langle \star u,\; v \rangle\; e_1 \wedge \cdots \wedge e_n.$$
We study the case $n = 2p$, so that $\Lambda^p(\mathbb{R}^n)$ and $\Lambda^{n-p}(\mathbb{R}^n)$ coincide and both have dimension $\binom{2p}{p}$. Taking expectation and suppressing the multiplier $p!$, we have the identity
$$\mathbb{E}\Big[ X^{(1)} \wedge \cdots \wedge X^{(p)} \wedge Y^{(1)} \wedge \cdots \wedge Y^{(p)} \Big] = p!\; \delta(C)\; e_1 \wedge \cdots \wedge e_n \qquad (3)$$
or, equivalently,
$$p!\; \delta(C) = \mathbb{E}\,\big\langle \star\big(X^{(1)} \wedge \cdots \wedge X^{(p)}\big),\; Y^{(1)} \wedge \cdots \wedge Y^{(p)} \big\rangle. \qquad (4)$$
Definition 6.1
Let $\big(X^{(i)}, Y^{(i)}\big)$, $i = 1, \ldots, p$, be independent copies of possibly correlated $n$-vectors $X, Y$ (with $n = 2p$) with cross-covariance $C$. Define $\delta(C)$, equivalently, by (3) or (4) above, as the (generalised) dual cross-covariance of $C$.
Expand each wedge product in determinant form, so that
$$X^{(1)} \wedge \cdots \wedge X^{(p)} = \sum_I d_I\, e_I, \qquad Y^{(1)} \wedge \cdots \wedge Y^{(p)} = \sum_J g_J\, e_J, \qquad (5)$$
where $d_I = \det(\mathbf{X}_I)$ and $g_J = \det(\mathbf{Y}_J)$ are the $p \times p$ minors with row sets $I$ and $J$. Then
$$X^{(1)} \wedge \cdots \wedge X^{(p)} \wedge Y^{(1)} \wedge \cdots \wedge Y^{(p)} = \sum_{I, J} d_I\, g_J\; e_I \wedge e_J.$$
From the Hodge star theory the values of $\star e_I$ are all known. In summary, each $\star e_I$ is a particular complementary base element $e_{I^c}$ with an appropriate sign $\varepsilon(I)$, defined by $e_I \wedge e_{I^c} = \varepsilon(I)\, e_1 \wedge \cdots \wedge e_n$. Then, rearranging (5), we transfer the star, again with appropriate sign, to the coefficients, noting that $e_I \wedge e_J = 0$ unless $J = I^c$, and write
$$X^{(1)} \wedge \cdots \wedge X^{(p)} \wedge Y^{(1)} \wedge \cdots \wedge Y^{(p)} = \sum_{I} \varepsilon(I)\, d_I\, g_{I^c}\; e_1 \wedge \cdots \wedge e_n. \qquad (6)$$
We are now able to match terms in the Binet–Cauchy expansion in (6), apply Lemma 3.2 to each term, and write
$$\delta(C) = \sum_{I} \varepsilon(I)\, \det C\big(X_I, Y_{I^c}\big). \qquad (7)$$
In particular, (7) gives a representation of $\delta(C)$ in terms of determinants of cross-covariance matrices, but with complementary index sets, rather than matched index sets as in Lemma 3.3.
Example 3
For $p = 1$, $n = 2$ and
$$C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}$$
we have
$$\delta(C) = c_{12} - c_{21}.$$
Example 4
For $p = 2$, $n = 4$ and $C = (c_{ij})_{i,j=1}^{4}$ we obtain
$$\delta(C) = -\Big[(c_{12} - c_{21})(c_{34} - c_{43}) - (c_{13} - c_{31})(c_{24} - c_{42}) + (c_{14} - c_{41})(c_{23} - c_{32})\Big].$$
It turns out that $\delta(C)$ is a recognisable quantity which is the subject of considerable research with many applications in diverse fields, namely the Pfaffian of $C - C^{\top}$; see Dress and Wenzel (1995).
The Pfaffian of an antisymmetric square matrix $A$ ($A^{\top} = -A$) of even order is a special polynomial function of the entries of $A$, with integer coefficients, and with the property
$$\mathrm{Pf}(A)^2 = \det(A).$$
In our case we set
$$A = C - C^{\top}.$$
The following is the main result of this section; the proof can be developed using the arguments above, but will be included in a subsequent, more technical version.
Lemma 6.2
If $n$ is even, then the dual cross-covariance $\delta(C)$ of the cross-covariance matrix $C$ is equal, with appropriate sign ($(-1)^{p(p-1)/2}$ in our convention), to the Pfaffian of the antisymmetric matrix $A = C - C^{\top}$, and is the square root, with appropriate sign, of $\det\big(C - C^{\top}\big)$.
Proof
The following is a sketch. For $n$ even, we first define a class of permutations that map $\{1, \ldots, n\}$ into blocks which consist of (disjoint) ordered pairs. For example, for $n = 4$ we may have the pairing $\{(1, 4), (2, 3)\}$, the pairs being $(1, 4)$ and $(2, 3)$. Let $\pi = \{(i_1, j_1), \ldots, (i_p, j_p)\}$, with $i_k < j_k$ for each pair, denote such a pairing. Then for any antisymmetric matrix $A$ with $A^{\top} = -A$ we have
$$\mathrm{Pf}(A) = \sum_{\pi} \mathrm{sgn}(\pi) \prod_{k=1}^{p} A_{i_k j_k}, \qquad (8)$$
where $\mathrm{sgn}(\pi)$ is the sign of the associated permutation. We then use the fact that the $p$ pairs $\big(X^{(k)}, Y^{(k)}\big)$ are i.i.d. with mean zero. Many of the terms obtained by expanding the determinants in (7) are zero. Close inspection shows that the remaining terms give (8). $\square$
This representation shows that $\delta(C)$ is a function of the differences $c_{ij} - c_{ji}$. In the case $p = 1$ ($n = 2$) we have
$$\delta(C) = c_{12} - c_{21}.$$
We can check this is equal to the determinant representation above.
This points to $\delta(C)$ being a rather special measure of the symmetry of $C$. The following is well known: for any real antisymmetric matrix $A$ there is an orthogonal matrix $Q$ such that $Q^{\top} A Q$ has the form of antisymmetric $2 \times 2$ blocks on the diagonal, but with zero diagonal:
$$Q^{\top} A Q = \mathrm{diag}\left( \begin{pmatrix} 0 & \mu_1 \\ -\mu_1 & 0 \end{pmatrix}, \ldots, \begin{pmatrix} 0 & \mu_p \\ -\mu_p & 0 \end{pmatrix} \right).$$
In our case $A = C - C^{\top}$ and $A^{\top} = -A$. In our earlier notation, we can consider
$$C\big(Q^{\top}X,\; Q^{\top}Y\big) = Q^{\top}\, C(X, Y)\, Q.$$
In addition,
$$Q^{\top} A Q = C\big(Q^{\top}X,\; Q^{\top}Y\big) - C\big(Q^{\top}X,\; Q^{\top}Y\big)^{\top},$$
which is the antisymmetrized version of the covariance of the variables $Q^{\top}X, Q^{\top}Y$. Let $\mu_1, \ldots, \mu_p$ denote the block coefficients above. In this case, since $\mathrm{Pf}\big(Q^{\top} A Q\big) = \det(Q)\, \mathrm{Pf}(A) = \mu_1 \mu_2 \cdots \mu_p$ and $\det(Q) = \pm 1$, we have
$$\delta(C) = \pm\, \mu_1 \mu_2 \cdots \mu_p.$$
In summary, we can, after transformation, express $\delta(C)$ as a simple measure of symmetry.
It has been mentioned several times that the main concept in this and the authors' previous papers is to show that certain types of generalised variances and cross-covariances are proportional to the expected volume, or squared volume, of random simplices. It should be pointed out, then, that the random coefficient in (6) is proportional to the (signed) volume of a random simplex in $\mathbb{R}^n$ formed by the $p$ random pairs $\big(X^{(k)}, Y^{(k)}\big)$. From the properties of the Pfaffian, the expectation of this quantity is zero (for even $n$) if and only if the antisymmetrized cross-covariance matrix $C - C^{\top}$ between $X$ and $Y$ is singular.
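As a computational aside (ours, not part of the paper), the Pfaffian in (8) can be evaluated by the standard recursive expansion along the first row, and checked against the property $\mathrm{Pf}(A)^2 = \det(A)$ for the antisymmetric matrix $A = C - C^{\top}$.

```python
# Sketch (ours): Pfaffian via expansion along the first row, then a check
# of Pf(A)^2 = det(A) for A = C - C^T.
import numpy as np

def pfaffian(A):
    """Recursive expansion along the first row; fine for small even n."""
    n = A.shape[0]
    if n == 0:
        return 1.0
    total = 0.0
    rest = list(range(1, n))
    for j in range(1, n):
        sign = (-1) ** (j + 1)            # alternating sign of the expansion
        keep = [k for k in rest if k != j]
        total += sign * A[0, j] * pfaffian(A[np.ix_(keep, keep)])
    return total

rng = np.random.default_rng(7)
n = 4
C = rng.standard_normal((n, n))           # an arbitrary cross-covariance
A = C - C.T                               # its antisymmetrized version
pf = pfaffian(A)
print(pf ** 2, np.linalg.det(A))          # equal up to floating point
```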
7 Stochastic dominance
Recall that standard stochastic dominance $U \preceq V$ is defined for univariate random variables $U, V$ with cdfs $F_U, F_V$ respectively by $F_U(t) \ge F_V(t)$ for all $t$. Now, starting with the squared volume of the $p$-dimensional parallelepiped spanned by the columns of an $n \times p$ matrix $X$, there is a natural way to introduce a form of stochastic dominance, usually referred to as a dispersion ordering. This is an extension of the version introduced in Giovagnoli and Wynn (1995) and studied by others, e.g. Ayala and López-Díaz (2009).
Definition 7.1
For two random $n$-vectors $Z_1$ and $Z_2$, let $\mathbf{Z}_1$ and $\mathbf{Z}_2$ be the $n \times p$ matrices whose columns are given respectively by $p$ i.i.d. copies of $Z_1$ and $Z_2$, and write $d(\mathbf{Z}) = \det\big(\mathbf{Z}^{\top}\mathbf{Z}\big)$. Then define $Z_1 \preceq_p Z_2$ if and only if
$$d(\mathbf{Z}_1) \preceq d(\mathbf{Z}_2)$$
in the sense of the standard stochastic dominance above.
Here we study the linear case by finding the class of matrices $A$ such that
$$d(A\mathbf{Z}) \le d(\mathbf{Z})$$
for all $\mathbf{Z}$, which (with abuse of notation) would immediately imply
$$A Z \preceq_p Z$$
for any random vector $Z$.
If $X$ is the $n \times p$ matrix $\big(x^{(1)}, \ldots, x^{(p)}\big)$, then
$$d(X) = \det\big(X^{\top} X\big)$$
and
$$d(AX) = \det\big(X^{\top} A^{\top} A X\big).$$
So, we are required to find the class of matrices $A$ such that
$$\det\big(X^{\top} A^{\top} A X\big) \le \det\big(X^{\top} X\big)$$
for all $n \times p$ matrices $X$.
Let the spectral decomposition (SVD) of $A^{\top} A$ be
$$A^{\top} A = Q^{\top} \Lambda\, Q,$$
where $Q Q^{\top} = I_n$, the identity, and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ holds the ordered eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$. We replace $X$ by $Q^{\top} X$, so that the condition becomes
$$\det\big(X^{\top} \Lambda X\big) \le \det\big(X^{\top} X\big) \quad \text{for all } n \times p \text{ matrices } X.$$
The required conditions on $A$ then reduce to conditions on the $\lambda_i$. We single out the matrix holding the $p$ largest eigenvalues: $\Lambda_p = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$.
Theorem 7.2
For an $n \times n$ matrix $A$,
$$d(AX) \le d(X) \quad \text{for all } n \times p \text{ matrices } X$$
if and only if $\det \Lambda_p \le 1$, that is, if and only if $\lambda_1 \lambda_2 \cdots \lambda_p \le 1$.
Proof. Following the above working it is enough to show that $\det\big(Y^{\top} \Lambda Y\big) \le \det\big(Y^{\top} Y\big)$ for all $n \times p$ matrices $Y$ if and only if $\det \Lambda_p \le 1$.
Split a matrix $Y$ into the $p \times p$ matrix $Y_1$ and the $(n-p) \times p$ matrix $Y_2$: $Y^{\top} = \big(Y_1^{\top}, Y_2^{\top}\big)$.
(i) Assume first that $\det\big(Y^{\top} \Lambda Y\big) \le \det\big(Y^{\top} Y\big)$ for all matrices $Y$. Choose $Y_1$ as the $p \times p$ identity matrix and $Y_2$ as an $(n-p) \times p$ matrix of zeros. Then $\det\big(Y^{\top} \Lambda Y\big) = \det \Lambda_p$ and $\det\big(Y^{\top} Y\big) = 1$, so that $\det \Lambda_p \le 1$.
(ii) Assume now that $\det \Lambda_p \le 1$. If $Y$ has rank less than $p$ then both sides vanish, so assume $Y$ has full rank $p$. Expanding $\det\big(Y^{\top} \Lambda Y\big)$ and $d(Y) = \det\big(Y^{\top} Y\big)$ by Binet–Cauchy, we obtain
$$\det\big(Y^{\top} \Lambda Y\big) = \sum_{I} \Big(\prod_{i \in I} \lambda_i\Big) \det\big(Y_I\big)^2, \qquad \det\big(Y^{\top} Y\big) = \sum_{I} \det\big(Y_I\big)^2, \qquad (9)$$
where $Y_I$ is the $p \times p$ submatrix of $Y$ with rows indexed by $I$. As the eigenvalues are ordered, every ordered index set $I = \{i_1 < \cdots < i_p\}$ satisfies $i_k \ge k$, so that $\lambda_{i_k} \le \lambda_k$ and
$$\prod_{i \in I} \lambda_i \le \prod_{k=1}^{p} \lambda_k = \det \Lambda_p \le 1.$$
From (9), we obtain
$$\det\big(Y^{\top} \Lambda Y\big) \le \det \Lambda_p \sum_I \det\big(Y_I\big)^2 = \det \Lambda_p\, \det\big(Y^{\top} Y\big) \le \det\big(Y^{\top} Y\big).$$
The last two inequalities imply the result. $\square$
Lemma 7.3
Assume that $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ with $0 < \lambda_i \le 1$ for all $i$ and $C$ is positive definite. Then
$$\sum_I \det\big((\Lambda C \Lambda)_I\big) \le \sum_I \det\big(C_I\big).$$
Proof. It is enough to claim that the left-hand side is monotonically increasing as a function of each $\lambda_i$, which follows since $\det\big((\Lambda C \Lambda)_I\big) = \big(\prod_{i \in I} \lambda_i^2\big) \det\big(C_I\big)$ with $\det(C_I) > 0$. $\square$
Lemma 7.4
Assume $A^{\top} A = \Lambda = \mathrm{diag}\big(u_1^2, \ldots, u_n^2\big)$, where $u$ is a vector with all non-zero components. Then $A Z \preceq_p Z$ if and only if the product of the $p$ largest values $u_i^2$ is at most one. This can hold despite the fact that $A^{\top} A \preceq I_n$ is not true in general.
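A final numerical sketch (ours) illustrates Theorem 7.2 and the point of Lemma 7.4: the dispersion ordering can hold even when $A^{\top}A \preceq I_n$ fails, provided the product of the $p$ largest eigenvalues is at most one.

```python
# Sketch (ours) of Theorem 7.2: with Lambda = diag(lambda_1 >= ... >= lambda_n),
# d(AX) <= d(X) for all n x p matrices X as soon as the product of the p
# largest eigenvalues is <= 1, even when some lambda_i > 1
# (so A^T A <= I in the Loewner sense fails).
import numpy as np

rng = np.random.default_rng(8)
n, p = 4, 2
lam = np.array([1.8, 0.5, 0.3, 0.1])   # lam[0] > 1 but lam[0]*lam[1] = 0.9 <= 1
A = np.diag(np.sqrt(lam))              # so A^T A = diag(lam)

ok = True
for _ in range(10_000):
    X = rng.standard_normal((n, p))
    dAX = np.linalg.det(X.T @ A.T @ A @ X)
    dX = np.linalg.det(X.T @ X)
    ok &= dAX <= dX + 1e-9
print(ok)   # True: the dispersion ordering holds although A^T A <= I fails
```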
8 Conclusion
The expectation of the squared volume of random simplices formed by i.i.d. random vectors is a natural generalisation of the expectation of squared length. In the latter case we obtain sums of variances (traces), and in the case of simplices sums of determinants of marginal covariance matrices. The expression in terms of determinants leads to a natural generalisation of Wilks's generalised variance. Exterior algebra gives a framework in which marginal determinants can be handled, in a sense simultaneously, via a generalised inner product. There are two special developments: generalised covariances/correlations and an application to generalised dispersion orderings.