Introduction

Despite its use in areas such as information geometry, the role of multilinear algebra in statistical theory has been limited. However, as soon as determinants arise in some statistical context, particularly in multivariate analysis, one can claim that multilinear algebra, or multilinear geometry, is being used. This is true of previous work of the authors (Pronzato et al. 2017, 2018, 2019), which related the expected volume of random simplices, represented by determinants, to the determinants of covariance matrices and marginal covariance matrices; see also Gillard et al. (2022), where the technique of simplicial distances developed in Pronzato et al. (2017, 2018) was used for outlier detection and cluster analysis. The expected volumes of simplices have also played a part in definitions of dispersion orderings (Giovagnoli and Wynn 1995). The ideas can be traced back to the seminal work of Hotelling (1992) on canonical correlation analysis (CCA) and of Wilks (1932, 1960) on generalised variance. The results of Sect. 4 dealing with cross-covariances can be used to widen the interpretation of standard CCA, as well as of various extensions of CCA including regularized CCA (Tenenhaus and Tenenhaus 2011) and deep CCA (Andrew et al. 2013). Note also the extensive use of cross-covariances in singular spectrum analysis, a methodology for time series analysis and forecasting; see Golyandina and Zhigljavsky (2013) and Golyandina et al. (2018). The main aim of this paper is to promote the idea that exterior algebra is a natural environment in which to study and extend such formulae, and to show that the inner product in exterior algebra is the key formula for our purposes.

We start with an elementary discussion. In statistics and probability theory, variances and covariances are closely related to metrics. If X and Y are two jointly distributed one-dimensional random variables and $\mathbb{E}$ denotes expectation with respect to their joint distribution then

$$\mathbb{E}(X-Y)^2 = \mathbb{E}X^2 + \mathbb{E}Y^2 - 2\,\mathbb{E}(XY).$$

If $X_1, X_2$ are two independent copies of the random variable X then

$$\mathbb{E}(X_1 - X_2)^2 = 2\,\mathrm{var}(X).$$

If X is a random vector with covariance matrix

$$\Sigma = \mathbb{E}\big[(X - \mathbb{E}X)(X - \mathbb{E}X)^{\top}\big],$$

then for the Euclidean distance and i.i.d. copies $X_1, X_2$ of X,

$$\mathbb{E}\,\|X_1 - X_2\|^2 = 2\,\mathrm{tr}(\Sigma). \qquad (1)$$

The cross-covariance matrix between two random n-vectors, X and Y, is

$$C(X,Y) = \mathbb{E}\big[(X - \mathbb{E}X)(Y - \mathbb{E}Y)^{\top}\big].$$

In this case, $\mathrm{tr}\,C(X,Y)$ can be considered as an overall measure of covariance. The present paper revisits the authors' papers (Pronzato et al. 2017, 2018) with a straightforward use of the exterior product.
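As a quick numerical illustration of (1), the following minimal sketch (ours, in Python with numpy; the particular Σ and the sample size are arbitrary choices, not from the paper) estimates $\mathbb{E}\|X_1 - X_2\|^2$ by Monte Carlo and compares it with $2\,\mathrm{tr}(\Sigma)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary 3x3 covariance matrix (illustrative choice, not from the paper).
A = rng.standard_normal((3, 3))
Sigma = A @ A.T

# Two independent copies X1, X2 of X ~ N(0, Sigma).
N = 200_000
X1 = rng.multivariate_normal(np.zeros(3), Sigma, size=N)
X2 = rng.multivariate_normal(np.zeros(3), Sigma, size=N)

# Monte Carlo estimate of E||X1 - X2||^2 against 2 tr(Sigma), cf. (1).
mc = np.mean(np.sum((X1 - X2) ** 2, axis=1))
print(mc, 2 * np.trace(Sigma))  # the two values should nearly agree
```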

In the first part of the paper we consider standard vectors, that is, vectors extending from the origin, so that simplices are formed with one vertex at the origin. But in the spirit of our previous work we return briefly in Sect. 5 to what we term affine simplices: for example, in one dimension this is the length of the line segment between two points, and in three or more dimensions a triangle is described by three points away from the origin. Sections 5 and 6 cover generalised covariances and cross-covariances, and Sect. 7 discusses a natural application to dispersion orderings.

Exterior algebra

Our calculations are based on an n-dimensional base vector space $\mathbb{R}^n$ over $\mathbb{R}$, with vectors $x$ written as column vectors

$$x = (x_1, \ldots, x_n)^{\top}. \qquad (2)$$

Looking forward to the next section, we will write a random vector in $\mathbb{R}^n$ as $X$ and use independent identically distributed random (vector) copies $X_1, X_2, \ldots$ of a random n-vector X; similarly for Y.

We label the standard unit vectors in $\mathbb{R}^n$ as $e_1, \ldots, e_n$, so that we may express a vector $x$ as

$$x = \sum_{i=1}^{n} x_i e_i.$$

Note that any basis of independent vectors may be used, but the standard basis is easier conceptually. The book by Darling (1994) is an excellent introduction.

The outer product of two vectors $x, y \in \mathbb{R}^n$ is written $x \wedge y$. Starting with basis vectors, we write formal expressions which lie in a formal vector space $\Lambda^2(\mathbb{R}^n)$ whose basis vectors are all ordered pairs $e_i \wedge e_j$, $i < j$. Then we have the decomposition

$$x \wedge y = \sum_{i < j} (x_i y_j - x_j y_i)\, e_i \wedge e_j.$$

The coefficients are the determinants of $2 \times 2$ matrices formed from the appropriate entries of x and y, and are signed areas of the triangles formed by the corresponding 2-vectors and the origin.
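As a concrete illustration (our sketch; the function name wedge2 and the test vectors are ours), the following computes the coefficients of $x \wedge y$ as $2 \times 2$ minors; in $\mathbb{R}^3$ they coincide, up to order and sign, with the components of the cross product.

```python
import numpy as np
from itertools import combinations

def wedge2(x, y):
    """Coefficients of x ^ y on the basis elements e_i ^ e_j, i < j."""
    return {(i, j): x[i] * y[j] - x[j] * y[i]
            for i, j in combinations(range(len(x)), 2)}

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 4.0])
print(wedge2(x, y))
# In R^3 the three minors match the cross product up to order and sign:
# cross[0] = c_(1,2), cross[1] = -c_(0,2), cross[2] = c_(0,1).
print(np.cross(x, y))
```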

Starting with the basis $e_1, \ldots, e_n$ of $\mathbb{R}^n$, the following rules uniquely define the wedge product. Given real scalars a, b and vectors x, y, z:

  1. $(ax + by) \wedge z = a\,(x \wedge z) + b\,(y \wedge z)$,
  2. $x \wedge (ay + bz) = a\,(x \wedge y) + b\,(x \wedge z)$,
  3. $x \wedge x = 0$.

We interpret the terms $e_i \wedge e_j$ as an abstract coding, or place-holder, for the two-dimensional space spanned by $e_i$ and $e_j$, but assigned an orientation expressed by a sign. From the above axioms it follows that $x \wedge y = -\,y \wedge x$, so that

$$e_i \wedge e_j = -\,e_j \wedge e_i,$$

which shows the importance of signs.

The machinery extends to the spaces of higher exterior powers $\Lambda^p(\mathbb{R}^n)$, and we define the pth wedge product for vectors $x_1, \ldots, x_p$ by

$$x_1 \wedge \cdots \wedge x_p = \sum_{i_1 < \cdots < i_p} d_{i_1 \cdots i_p}\, e_{i_1} \wedge \cdots \wedge e_{i_p},$$

where $d_{i_1 \cdots i_p}$ is the $p \times p$ determinant formed from rows $i_1, \ldots, i_p$ of the matrix with columns $x_1, \ldots, x_p$; these determinants give the p-dimensional volumes for the directions coordinated by the basis terms $e_{i_1} \wedge \cdots \wedge e_{i_p}$.

A key construction for us is the inner product on $\Lambda^p(\mathbb{R}^n)$. When $p = 1$, for $x, y$ in $\mathbb{R}^n$ we define $\langle x, y \rangle = x^{\top} y$. The inner product on $\Lambda^p(\mathbb{R}^n)$ is defined as

$$\big\langle x_1 \wedge \cdots \wedge x_p,\; y_1 \wedge \cdots \wedge y_p \big\rangle = \det\big( \langle x_i, y_j \rangle \big)_{i,j = 1, \ldots, p},$$

where the inner product on the right-hand side is the standard inner product. A matrix formulation is sometimes useful: the matrix $(\langle x_i, y_j \rangle)$ is $X^{\top} Y$, where $X$ and $Y$ are the $n \times p$ matrices with columns $x_1, \ldots, x_p$ and $y_1, \ldots, y_p$.
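The matrix formulation invites a direct numerical check. The sketch below (ours) verifies that the inner product computed as $\det(X^{\top}Y)$ agrees with the Binet–Cauchy expansion as a sum over p-subsets of rows, which is used repeatedly below.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, p = 5, 3
X = rng.standard_normal((n, p))  # columns x_1, ..., x_p
Y = rng.standard_normal((n, p))  # columns y_1, ..., y_p

# Inner product on Lambda^p as the determinant of the p x p matrix X^T Y.
lhs = np.linalg.det(X.T @ Y)

# Binet-Cauchy: the same quantity as a sum over p-subsets I of rows.
rhs = sum(np.linalg.det(X[I, :]) * np.linalg.det(Y[I, :])
          for I in combinations(range(n), p))

print(lhs, rhs)  # equal up to floating-point error
```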

In order to avoid too much notation we will refer to an ordered subset of size p by $I = \{i_1 < \cdots < i_p\}$, being careful to fix the context. Thus $\sum_I$ is the summation over all $\binom{n}{p}$ ordered subsets $I$. This notation is also used to index marginal variables: $x_I$ is the vector with entries, in order, $x_{i_1}, \ldots, x_{i_p}$, and $X_I, Y_I$ are the marginal random vectors corresponding to $I$.

Expectations, generalised variances and covariances

For the random version of x in (2) we write

$$X = (X_1, \ldots, X_n)^{\top}.$$

If $\mathbb{E}$ denotes expectation with respect to the full joint distribution, then $\mathbb{E}X = (\mathbb{E}X_1, \ldots, \mathbb{E}X_n)^{\top}$.

We assume that all random vectors have zero mean, to make the formulae a little easier to handle. Thus we define the covariance matrix of a random X as

$$\Sigma_X = \mathbb{E}\big(XX^{\top}\big),$$

the cross-covariance between random vectors X and Y as

$$C(X,Y) = \mathbb{E}\big(XY^{\top}\big),$$

and the full covariance matrix between X and Y as

$$\Sigma_{(X,Y)} = \begin{pmatrix} \Sigma_X & C(X,Y) \\ C(X,Y)^{\top} & \Sigma_Y \end{pmatrix}.$$

Definition 3.1

For two random variables X, Y with values in $\mathbb{R}^n$, define the generalised variances and the generalised covariance respectively by the following determinants: $\det \Sigma_X$, $\det \Sigma_Y$ and $\det C(X,Y)$.

These definitions will be used for marginal vectors $X_I, Y_I$ in dimension p for all $p \le n$, so that we write, for example, $\det C(X_I, Y_I)$. The following is essentially similar to the result in Pronzato (1998), but with an alternative proof.

Lemma 3.2

Let X and Y be two random p-vectors and let $X_1, \ldots, X_p$ and $Y_1, \ldots, Y_p$ be two sets of iid copies of X and Y, respectively. Then

$$\mathbb{E}\,\big\langle X_1 \wedge \cdots \wedge X_p,\; Y_1 \wedge \cdots \wedge Y_p \big\rangle = p!\,\det C(X,Y).$$

Proof

The Sylvester formula for the inverse of an invertible $n \times n$ matrix A is

$$A^{-1} = \frac{1}{\det A}\,\mathrm{adj}(A),$$

where the (i, j) entry of the adjugate $\mathrm{adj}(A)$ is, with appropriate sign, the determinant of the $(n-1) \times (n-1)$ matrix formed by deleting row j and column i of A. If A is invertible and a, b are n-vectors we have the well-known formula

$$\det\big(A + ab^{\top}\big) = \det(A)\,\big(1 + b^{\top} A^{-1} a\big).$$

We shall need the more general version, which applies whether or not A is invertible:

$$\det\big(A + ab^{\top}\big) = \det(A) + b^{\top}\,\mathrm{adj}(A)\,a.$$
The proof now proceeds by induction on p. The case $p = 1$ is immediate. Now,

The first term on the right is zero because the matrix does not have full rank. Then

where the transition (*) uses the independence between copies. Then, whether or not the matrix is invertible, the last formula reduces, by the property of adjugates, to:

as required. $\square$
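A Monte Carlo check of Lemma 3.2 may be helpful. In the sketch below (our construction: X standard normal and $Y = MX + \text{noise}$, so that $C(X,Y) = M^{\top}$; all names are illustrative), the empirical mean of $\langle X_1 \wedge \cdots \wedge X_p,\, Y_1 \wedge \cdots \wedge Y_p\rangle$, which for p-vectors equals $\det(\mathbf{X})\det(\mathbf{Y})$, is compared with $p!\,\det C(X,Y)$.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(2)
p = 3
M = 0.5 * rng.standard_normal((p, p))

def draw_pair():
    """One copy of (X, Y): X ~ N(0, I_p) and Y = M X + noise, so C(X, Y) = M^T."""
    X = rng.standard_normal(p)
    Y = M @ X + rng.standard_normal(p)
    return X, Y

N = 100_000
acc = 0.0
for _ in range(N):
    copies = [draw_pair() for _ in range(p)]      # p independent copies of (X, Y)
    Xm = np.column_stack([x for x, _ in copies])  # columns X_1, ..., X_p
    Ym = np.column_stack([y for _, y in copies])
    acc += np.linalg.det(Xm) * np.linalg.det(Ym)  # <X_1^...^X_p, Y_1^...^Y_p>

# Compare with p! det C(X, Y); agreement is up to Monte Carlo error.
print(acc / N, factorial(p) * np.linalg.det(M.T))
```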

Recall our notation for margins, namely that $X_I$ and $Y_I$ denote the I-margins of p independent copies of the n-vectors X and Y, respectively. Then, using the inner product in $\Lambda^p(\mathbb{R}^n)$, we have the key lemma of the paper.

Lemma 3.3

Let $(X_i, Y_i)$, $i = 1, \ldots, p$, be independent copies of the extended base vector (X, Y). Then

$$\mathbb{E}\,\big\langle X_1 \wedge \cdots \wedge X_p,\; Y_1 \wedge \cdots \wedge Y_p \big\rangle = \mathbb{E}\,\det\big(\mathbf{X}^{\top}\mathbf{Y}\big) = p! \sum_{I} \det C(X_I, Y_I),$$

where $\mathbf{X}$ and $\mathbf{Y}$ are the $n \times p$ matrices with columns $X_1, \ldots, X_p$ and $Y_1, \ldots, Y_p$, and the sum is over all (ordered) index sets I of size p.

Proof

The first equality is from the definition of the inner product. The second follows by expanding $\det(\mathbf{X}^{\top}\mathbf{Y})$ by the Binet–Cauchy theorem and applying Lemma 3.2 to every term. $\square$

Replacing Y by X in the two lemmas replaces all cross-covariance matrices by covariance matrices C(X, X), that is,

$$\mathbb{E}\,\big\| X_1 \wedge \cdots \wedge X_p \big\|^2 = p! \sum_{I} \det \Sigma_{X_I}.$$

Note that before taking expectation the quantity

$$\big\| X_1 \wedge \cdots \wedge X_p \big\|$$

is the volume of the p-dimensional simplex spanned by the $X_i$ (up to the factor p!), as studied in Pronzato et al. (2017). We thus have a decomposition relating the expectation of the square of this volume to the covariances of the p-margins of the original random variable X. In the case of X and Y, the wedge-product formula gives a new type of covariance based on the product of the signed areas of two random simplices, one for X and one for Y.
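The following sketch (ours; the same synthetic joint distribution device as above, now with $n > p$) illustrates Lemma 3.3: the Monte Carlo mean of $\det(\mathbf{X}^{\top}\mathbf{Y})$ is compared with $p!$ times the sum of determinants of the marginal cross-covariances.

```python
import numpy as np
from math import factorial
from itertools import combinations

rng = np.random.default_rng(3)
n, p = 4, 2
M = 0.4 * rng.standard_normal((n, n))
C = M.T  # with X ~ N(0, I_n) and Y = M X + noise, C(X, Y) = E[X Y^T] = M^T

N = 100_000
acc = 0.0
for _ in range(N):
    X = rng.standard_normal((n, p))          # columns are copies X_1, ..., X_p
    Y = M @ X + rng.standard_normal((n, p))  # matched copies Y_1, ..., Y_p
    acc += np.linalg.det(X.T @ Y)            # inner product of the two wedges

# p! times the sum of det C(X_I, Y_I) over all p-subsets I of {1, ..., n}.
rhs = factorial(p) * sum(np.linalg.det(C[np.ix_(I, I)])
                         for I in combinations(range(n), p))
print(acc / N, rhs)  # close, up to Monte Carlo error
```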

Generalised cross-covariances and correlations

4.1 Definitions and a key property

As mentioned in the introduction, $\det C(X,Y)$, considered as a generalised cross-covariance, is not as well known as Wilks's generalised variance $\det \Sigma_X$. Despite this we can proceed to the following definition, derived from Lemma 3.3.

Definition 4.1

The generalised p-cross-covariance of two random n-vectors X and Y is defined as

$$\sum_{I} \det C(X_I, Y_I),$$

and the p-covariance for X (similarly for Y) as

$$\sum_{I} \det \Sigma_{X_I},$$

where the summation is over all ordered p-index sets I.

The only difference from the formula in Lemma 3.3 is the removal of the multiplier p!. Given the definitions of the p-generalised variances in Pronzato et al. (2017), we have the following natural definition:

Definition 4.2

The generalised p-correlation between random n-vectors X and Y is defined as

$$\rho_p(X, Y) = \frac{\sum_{I} \det C(X_I, Y_I)}{\Big(\sum_{I} \det \Sigma_{X_I}\Big)^{1/2} \Big(\sum_{I} \det \Sigma_{Y_I}\Big)^{1/2}},$$

where the summations are over all ordered p-index sets I.

It is easily established that

$$-1 \le \rho_p(X, Y) \le 1$$

for all p, by using the requirement that the joint covariance matrix of X and Y must be non-negative definite.
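Since the generalised p-correlation is a deterministic function of the covariance blocks, it can be computed exactly by enumerating p-subsets. A small sketch (ours; the helper name sum_of_minors and the random choice of joint covariance are illustrative):

```python
import numpy as np
from itertools import combinations

def sum_of_minors(C, p):
    """Sum over ordered p-index sets I of det C[I, I]."""
    n = C.shape[0]
    return sum(np.linalg.det(C[np.ix_(I, I)]) for I in combinations(range(n), p))

rng = np.random.default_rng(4)
n, p = 4, 2

# A valid joint covariance for (X, Y), partitioned into its n x n blocks.
B = rng.standard_normal((2 * n, 2 * n))
S = B @ B.T
Sx, Cxy, Sy = S[:n, :n], S[:n, n:], S[n:, n:]

rho_p = sum_of_minors(Cxy, p) / np.sqrt(sum_of_minors(Sx, p) * sum_of_minors(Sy, p))
print(rho_p)  # lies in [-1, 1], in line with the bound above
```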

An interesting analysis arises in the full n-dimensional case when, for random n-vectors X, Y, the covariance matrices $\Sigma_X$ and $\Sigma_Y$ are both the $n \times n$ identity. We may arrive at this special case en route to computing canonical correlations, and we shall refer to this case as being canonical. Thus, using spectral square roots, if we take two random n-vectors U, V and set $X = \Sigma_U^{-1/2} U$ and $Y = \Sigma_V^{-1/2} V$, then C(X, Y) is the canonical cross-correlation matrix and the covariance matrix for (X, Y) is

$$\begin{pmatrix} I_n & C(X,Y) \\ C(X,Y)^{\top} & I_n \end{pmatrix}.$$

This study of canonical correlation goes back to Hotelling (1992).

The fine structure of the relationship between X and Y can be studied via the cross-correlation matrix C(XY). We have the following lemmas.

Lemma 4.3

(i) For n-vectors X, Y with $\Sigma_X = \Sigma_Y = I_n$, the matrix $C = C(X,Y)$ is a valid cross-correlation matrix if and only if

$$CC^{\top} \preceq I_n,$$

where $\preceq$ is the Loewner ordering, with equality if and only if

$$CC^{\top} = I_n,$$

which, in turn, holds if and only if $Y = C^{\top} X$ almost surely.

Proof

If equality holds in the Loewner ordering, then all the eigenvalues of $CC^{\top}$ are unity. This forces

$$CC^{\top} = I_n,$$

and $CC^{\top}$ must be the identity projector. The converse is immediate. $\square$

The condition $CC^{\top} = I_n$ implies that C(X, Y) is a rotation: formally, a member of the orthogonal group $O(n)$. So we have the informal statement that all extreme cross-correlation matrices C are related to rotations.

4.2 Two examples

Example 1

Let $n = 2$ and consider the covariance matrix in canonical form above. Then $C = C(X,Y)$ satisfies $CC^{\top} \preceq I_2$. If $CC^{\top} = I_2$ then the general solution can be written

$$C(X,Y) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

for an angle $\theta$.

In this case the set of C(X, Y) is a representation of the rotation group O(2). For multiples of $\pi/4$ we have the subgroup which is the dihedral group of order 16, of permutations and sign changes, with elements and representations:
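To illustrate the canonical $n = 2$ case numerically (our construction: X standard normal and $Y = R^{\top}X$ for a rotation R, which forces $\Sigma_X = \Sigma_Y = I_2$ and $C(X,Y) = R$):

```python
import numpy as np

rng = np.random.default_rng(5)
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Canonical case: X ~ N(0, I_2) and Y = R^T X, so Sigma_X = Sigma_Y = I_2
# and C(X, Y) = E[X Y^T] = R, an element of O(2).
N = 200_000
X = rng.standard_normal((N, 2))   # rows are copies of X
Y = X @ R                         # row i is X_i^T R, i.e. Y_i = R^T X_i
C_hat = X.T @ Y / N               # empirical cross-covariance
print(C_hat)                      # close to R
print(C_hat @ C_hat.T)            # close to the 2 x 2 identity
```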

Example 2

Take $n = 3$, again in canonical form, and

so that C(X, Y) is a member of $O(3)$. We compute:

For example, if , then we have  and

which is a member of $O(3)$, as expected.

4.3 The eigenvalues of C

The eigenvalues of C may be complex, but the condition $CC^{\top} \preceq I_n$ in Lemma 4.3 imposes restrictions.

Lemma 4.4

For n-vectors X, Y with $\Sigma_X = \Sigma_Y = I_n$, every eigenvalue $\lambda$ of the cross-correlation matrix $C = C(X,Y)$ satisfies $|\lambda| \le 1$.

Proof

We carry out the proof for the complex case. Let $z = u + iv$, with u and v real and $v \neq 0$, be the eigenvector corresponding to an eigenvalue $\lambda$. Then $\bar\lambda$, the complex conjugate of $\lambda$, is the eigenvalue for the conjugate of z, namely $\bar z = u - iv$. Since $CC^{\top} \preceq I_n$ and $\|Cz\|^2 = |\lambda|^2\,\|z\|^2$, we have

$$|\lambda|^2\, \|z\|^2 = \|Cz\|^2 \le \|z\|^2,$$

and cancelling $\|z\|^2$ gives the result. $\square$

It is natural to ask whether, in the canonical cross-correlation case, the matrix C(X, Y) has a representation which might be thought of as a kind of PCA for cross-correlations. This is indeed the case, but since C(X, Y) is not necessarily symmetric we need the Jordan form decomposition.

In the case that the eigenvalues $\lambda_1, \ldots, \lambda_n$ of C(X, Y) are real and distinct, there exists a matrix Q such that

$$Q^{-1}\, C(X,Y)\, Q = \mathrm{diag}(\lambda_1, \ldots, \lambda_n),$$

and if there are repeated roots then $Q^{-1} C(X,Y) Q$ has the usual Jordan block decomposition.

Complex eigenvalues occur in conjugate pairs $a \pm ib$. For distinct conjugate pairs there is a version of the Jordan decomposition (the real Jordan form) which gives $Q^{-1} C(X,Y) Q$ blocks of the form

$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix},$$

with extended forms when complex roots are repeated.
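A quick check (ours; the values of a and b are arbitrary) that the real $2 \times 2$ block has the stated conjugate pair of eigenvalues, and that it acts as a rotation combined with a scaling by $r = |a + ib|$:

```python
import numpy as np

a, b = 0.6, 0.3
block = np.array([[a, b],
                  [-b, a]])

# The eigenvalues of the real block are the conjugate pair a +/- ib.
print(np.linalg.eigvals(block))

# The block is a scaled rotation: r * R(phi) with r = |a + ib|.
r, phi = np.hypot(a, b), np.arctan2(-b, a)
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])
print(np.allclose(block, r * R))  # True
```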

When the roots of C(XY) are real we have the equivalent linear representation

But in the complex case we have for pairs :

Note, however, that the matrix $Q^{-1} C(X,Y) Q$ is, in general, no longer the covariance between X and Y, but between transformed vectors (for instance $Q^{-1}X$ and $Q^{\top}Y$). That is, transforming C(X, Y) to Jordan form may affect the canonical representation in which X and Y have identity covariances.

In several fields this analysis is used to indicate the presence of feedback. Examples are in control theory and the closely related Granger causality in economics. We can, of course, have a mixture of both real and complex eigenvalues.

The affine case

In Pronzato et al. (2017) the authors consider what we call here the affine case, motivated by (1). To aid explanation, consider the first interesting example, namely triangles in three dimensions.

Consider three i.i.d. copies $X_1, X_2, X_3$ in $\mathbb{R}^3$, labelled as points A, B, C, respectively. They form a triangle ABC whose squared area is

$$\tfrac{1}{4}\,\big\| (X_1 - X_3) \wedge (X_2 - X_3) \big\|^2.$$

Here we are considering the vectors between A and C and between B and C. We can then expand by the Binet–Cauchy lemma and write the last expression as

This can be expressed using the wedge inner product as

It is natural to consider the covariance case, namely

$$\mathbb{E}\,\big\langle (X_1 - X_3) \wedge (X_2 - X_3),\; (Y_1 - Y_3) \wedge (Y_2 - Y_3) \big\rangle,$$

the expansion of which is

Taking expectations, we see that our generalised 2-covariance is the expectation of a sum of products of signed areas from blades of dimension 2. We then adapt the analysis of Sect. 3 to the affine case by extending with a vector of ones: we replace each vector X by $(1, X^{\top})^{\top}$ and use the general versions of the formulae

Generalising the above argument, Lemma 3.3 is replaced by

Lemma 5.1

Let $(X_i, Y_i)$, $i = 1, \ldots, p+1$, be independent copies of the base vector (X, Y). Then

When Y is replaced by X we obtain the main result in Pronzato et al. (2017). The results also extend in a natural way to an affine version of the development of the covariance representation in Sect. 4, with the analogous interpretation in terms of products of volumes of affine simplices.
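Two small checks (ours, with an arbitrary Σ) may make the affine devices concrete: first, that appending a row of ones turns twice the signed triangle area into a single determinant; second, the identity $\mathbb{E}[(2 \cdot \text{area})^2] = 3!\,\det\Sigma$ for three i.i.d. points in the plane, which can be checked by hand and is the $Y = X$, $p = 2$ case of the affine development.

```python
import numpy as np

rng = np.random.default_rng(6)

# 'Extend with ones': for points A, B, C in the plane, the determinant of the
# coordinates augmented by a row of ones equals det([B - A, C - A]),
# i.e. twice the signed area of the triangle ABC.
A, B, C = rng.standard_normal((3, 2))
M = np.vstack([np.ones(3), np.column_stack([A, B, C])])
print(np.linalg.det(M), np.linalg.det(np.column_stack([B - A, C - A])))

# Monte Carlo check of E[(2 * area)^2] = 3! det(Sigma) for three i.i.d.
# points in R^2 (Sigma is an arbitrary illustrative choice).
L = rng.standard_normal((2, 2))
Sigma = L @ L.T
N = 200_000
P = rng.multivariate_normal(np.zeros(2), Sigma, size=(N, 3))
U, V = P[:, 1] - P[:, 0], P[:, 2] - P[:, 0]
twice_area = U[:, 0] * V[:, 1] - U[:, 1] * V[:, 0]
print(np.mean(twice_area ** 2), 6 * np.linalg.det(Sigma))
```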

Hodge star operator and the cross-covariance Pfaffian

The Hodge star construction, in the general case of $\Lambda^p(\mathbb{R}^n)$, shows that for elements $u$ in $\Lambda^p(\mathbb{R}^n)$ and $v$ in $\Lambda^{n-p}(\mathbb{R}^n)$ there is a mapping, called the Hodge star operator, which takes $u$ into its Hodge star dual $\star u$ in $\Lambda^{n-p}(\mathbb{R}^n)$ such that

$$u \wedge v = \langle \star u,\, v \rangle\; e_1 \wedge \cdots \wedge e_n.$$

We study the case $n = 2p$, so that X and Y both have dimension p. Taking expectation and suppressing subscripts, we have the identity

(3)
(4)

Definition 6.1

Let $(X_i, Y_i)$, $i = 1, \ldots, p$, be i.i.d. copies of possibly correlated p-vectors with cross-covariance C. Define, equivalently by (3) or (4) above, the (generalised) dual cross-covariance of C.

Expand in determinant form, so that:

Then

(5)

From the Hodge star theory the values of  are all known. In summary, each  is a particular complementary base element of  with an appropriate sign.

Then, rearranging (5) we transfer the star, again with appropriate sign, to , and write

(6)

We are now able to match terms in the Binet–Cauchy expansion in (6) and write

(7)

In particular, (7) gives a representation of  in terms of determinants of  covariance matrices, but with complementary index sets, rather than matched index sets as in Lemma 3.3.

Example 3

For  and

we have

Example 4

For  and

we obtain

It turns out that this is a recognisable quantity which is the subject of considerable research with many applications in diverse fields, namely the Pfaffian of C; see Dress and Wenzel (1995).

The Pfaffian $\mathrm{pf}(A)$ of an antisymmetric square matrix A ($A^{\top} = -A$) is a special polynomial function of the entries of A, with integer coefficients, and with the property

$$\mathrm{pf}(A)^2 = \det(A).$$
In our case we set

The following is the main result of this section; the proof can be developed using the arguments above, but will be included in a subsequent, more technical version.

Lemma 6.2

If n is even, then the dual cross-covariance of the $p \times p$ cross-covariance matrix C is equal to the Pfaffian of the antisymmetric matrix defined above, and is the square root, with appropriate sign, of its determinant.

Proof

The following is a sketch. For n even, we first define a class of permutations that map $\{1, \ldots, n\}$ into blocks consisting of (disjoint) ordered pairs; for example, for $n = 4$ we may have the pairs (1, 4) and (2, 3). Then for any antisymmetric $n \times n$ matrix A with entries $a_{ij}$ we have

(8)

We then use the fact that the p pairs $(X_i, Y_i)$ are i.i.d. with mean zero. Many of the terms obtained by expanding the determinant in (7) are zero; close inspection shows that the remaining terms give (8).

This representation shows that the dual cross-covariance is a function of the differences $C_{ij} - C_{ji}$. In the case $p = 2$ we have

We can check this is equal to the determinant representation above.
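In dimension 4 the Pfaffian has a closed form, and the defining property $\mathrm{pf}(A)^2 = \det(A)$ is easy to verify numerically (our sketch):

```python
import numpy as np

def pfaffian4(A):
    """Pfaffian of a 4 x 4 antisymmetric matrix, via the classical closed form."""
    return A[0, 1] * A[2, 3] - A[0, 2] * A[1, 3] + A[0, 3] * A[1, 2]

rng = np.random.default_rng(7)
B = rng.standard_normal((4, 4))
A = B - B.T                                  # a generic antisymmetric matrix
print(pfaffian4(A) ** 2, np.linalg.det(A))   # pf(A)^2 = det(A)
```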

This points to the dual cross-covariance being a rather special measure of the symmetry of C. The following is well known: for any real $n \times n$ antisymmetric matrix A there is an orthogonal matrix Q such that $Q^{\top} A Q$ has the form of $2 \times 2$ antisymmetric blocks on the diagonal, but with zero diagonal:

In our case  and . In our earlier notation, we can consider

In addition,

which is the antisymmetrized version of the covariance  of variables . Let . In this case  and since  we have

In summary, we can, after transformation, express  as a simple measure of symmetry.

It has been mentioned several times that the main concept in this and the authors' previous papers is to show that certain types of generalised variances and cross-covariances are proportional to the expected volume, or squared volume, of random simplices. It should be pointed out, then, that the determinant in (7) is proportional to the (signed) volume of a random simplex in $\mathbb{R}^n$ formed by the p random pairs $(X_i, Y_i)$. From the properties of the Pfaffian, this quantity is zero (for even n) if and only if the cross-covariance matrix between X and Y is zero.

Stochastic dominance

Recall that standard stochastic dominance $U \preceq V$ is defined for univariate random variables U, V with cdfs $F_U, F_V$, respectively, by requiring $F_U(t) \ge F_V(t)$ for all t. Now, starting with the squared volume $d(X)$ of the p-dimensional simplex spanned by the columns of an $n \times p$ matrix X, there is a natural way to introduce a form of stochastic dominance, usually referred to as dispersion ordering. This is an extension of the version introduced in Giovagnoli and Wynn (1995) and studied by others, e.g. Ayala and López-Díaz (2009).

Definition 7.1

For two random n-vectors $Z_1$ and $Z_2$, let $\mathbf{Z}_1$ and $\mathbf{Z}_2$ be the matrices whose columns are given by p iid copies of $Z_1$ and $Z_2$, respectively. Then define $Z_1 \preceq Z_2$ if and only if

Here we study the linear case by finding the class of $n \times n$ matrices A such that if

for all Z, which (with abuse of notation) would immediately imply

for any random vector Z.

If X is the $n \times p$ matrix

Then,

and

So, we are required to find the class of $n \times n$ matrices A such that

for all $n \times p$ matrices X.

Let the SVD of  be

where $Q$ is orthogonal ($QQ^{\top} = I_n$, the $n \times n$ identity) and $\lambda$ is the vector of ordered eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$.

We replace X by , so that

The required conditions on A then reduce to conditions on the $\lambda_i$. We single out the $p \times p$ matrix holding the p largest eigenvalues: $\mathrm{diag}(\lambda_1, \ldots, \lambda_p)$.

Theorem 7.2

For an $n \times n$ matrix A,

$$AZ \preceq Z \quad \text{for all random n-vectors } Z$$

if and only if $\lambda_1 \lambda_2 \cdots \lambda_p \le 1$.

Proof. Following the above working, it is enough to show that $d(AY) \le d(Y)$ for all $n \times p$ matrices Y if and only if $\lambda_1 \cdots \lambda_p \le 1$.

Split an $n \times p$ matrix Y into a $p \times p$ matrix $Y_1$ and an $(n-p) \times p$ matrix $Y_2$.

(i) Assume first that $d(AY) \le d(Y)$ for all $n \times p$ matrices Y. Choose $Y_1$ as the identity $p \times p$ matrix and $Y_2$ as the $(n-p) \times p$ matrix of zeros. Then $d(Y) = 1$ and $d(AY) = \lambda_1 \lambda_2 \cdots \lambda_p$, so necessarily $\lambda_1 \cdots \lambda_p \le 1$.

(ii) Assume now that $\lambda_1 \cdots \lambda_p \le 1$ and that $Y_1$ has full rank p. Expanding $d(AY)$ and d(Y), we obtain

As  and  are non-degenerate,

This gives  where  and

As all diagonal elements of  are smaller than or equal to , we obtain

(9)

Moreover, all diagonal elements of  are smaller than or equal to ,

where these inequalities are valid in the Loewner sense.

Now since  and the matrices  and  are non-degenerate,

and from (9), we obtain

The last two inequalities imply the result. $\square$
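To illustrate the flavour of Theorem 7.2 (our sketch; the matrix A below is a contraction, so in particular the product of the p largest eigenvalues of $A^{\top}A$ is at most 1, and the sufficient direction predicts $d(AZ) \le d(Z)$):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 4, 2

# A contraction: all singular values of A are below 1, so the product of the
# p largest eigenvalues of A^T A is below 1 as well.
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q1 @ np.diag([0.9, 0.8, 0.6, 0.5]) @ Q2.T

def d(Zs):
    """Average squared p-volume: mean of det(Z^T Z) over the sampled matrices."""
    return np.mean([np.linalg.det(Z.T @ Z) for Z in Zs])

N = 20_000
Zs = [rng.standard_normal((n, p)) for _ in range(N)]
print(d([A @ Z for Z in Zs]), "<=", d(Zs))  # d(AZ) <= d(Z), as predicted
```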

Lemma 7.3

Assume that  with  for all i and C is positive definite. Then

Proof. It is enough to show that the quantity is monotonic as a function of each $\lambda_i$. $\square$

Lemma 7.4

Assume  with  and u is a vector with all non-zero components.  if and only if . This inequality is true despite the fact that  is not true in general.

 

Conclusion

The expectation of the squared volume of random simplices formed by iid random vectors is a natural generalisation of the expectation of squared length. In the latter case we obtain sums of variances (traces), and in the case of simplices, sums of the determinants of marginal covariance matrices. The expression in terms of determinants leads to a natural generalisation of Wilks's generalised variance. Exterior algebra gives a framework in which marginal determinants can be handled, in a sense simultaneously, via a generalised inner product. There are two special developments: generalised covariances/correlations, and an application to generalised dispersion orderings.
