Department of Ecology and Evolution

State University of New York at Stony Brook, Stony Brook, NY 11794-5245

Fred L. Bookstein

Institute of Gerontology

University of Michigan, Ann Arbor, MI 48109-2007

Leslie F. Marcus†

F. James Rohlf

Department of Ecology and Evolution

State University of New York at Stony Brook, Stony Brook, NY 11794-5245

This glossary provides definitions for terms, concepts, and methods
frequently encountered in morphometric literature and discussions.
It includes entries for technical terms with more-or-less special
meaning in shape analysis and biological morphometrics (e.g.,
preshape, warps, anisotropy) and some of the casual jargon that
may be completely foreign to newcomers to the field (e.g., books
of various color - Red, Blue, Orange, and Black). Many definitions
provide the general idea behind each entry instead of a technically
or mathematically rigorous treatment. As such, they are intended
to give readers an intuitive understanding of a particular entry
that will allow them to follow the main ideas in the literature
without becoming unduly distracted, at first, with technical details.
Unless otherwise indicated, the following general notation has
been used: *n* - number of specimens, *p* - number of
points/landmarks, *k* - number of dimensions. A superscript *t*
refers to the transpose of a matrix (e.g., $\mathbf{A}^{t}$,
but that may not be displayed properly by
all WWW browsers).
Members of
the morphometrics community, especially the subscribers to the
MORPHMET electronic mailing list, have helped greatly in the selection
of terms to be included in the glossary.

Note: many of the mathematical symbols and equations are patched into this file as images since HTML (the language used to prepare these WWW pages) does not support mathematical symbols. For this reason, many symbols will not appear on text-only WWW browsers and may not line-up well with the rest of the text. Superscripts and subscripts will not display properly on all WWW browsers.

† Deceased.

**α** - In a relative warps analysis, this is the exponent used to rescale partial warps before computing their principal components, the relative warps (see Rohlf's chapter in the Black Book). Scale invariant multivariate analyses using rescaled principal warp scores, such as canonical variates analysis, are not affected by the choice of α (see Rohlf, 1996, NATO volume "white book").

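**⊗** - The
Kronecker tensor product or direct product. The Kronecker tensor product of
matrices **X** and **Y**, written as **X** ⊗ **Y**,
results in a large matrix formed by taking all possible products of the elements
of **X** and those of **Y**. For example, if **X** and **Y** are
2x2 then **X** ⊗ **Y** results in a 4x4 matrix:

$$\mathbf{X}\otimes\mathbf{Y} =
\begin{bmatrix} x_{11}\mathbf{Y} & x_{12}\mathbf{Y} \\ x_{21}\mathbf{Y} & x_{22}\mathbf{Y} \end{bmatrix} =
\begin{bmatrix}
x_{11}y_{11} & x_{11}y_{12} & x_{12}y_{11} & x_{12}y_{12} \\
x_{11}y_{21} & x_{11}y_{22} & x_{12}y_{21} & x_{12}y_{22} \\
x_{21}y_{11} & x_{21}y_{12} & x_{22}y_{11} & x_{22}y_{12} \\
x_{21}y_{21} & x_{21}y_{22} & x_{22}y_{21} & x_{22}y_{22}
\end{bmatrix}$$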
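As an informal illustration of the Kronecker product of two 2x2 matrices, the following sketch uses NumPy's `np.kron`. The numerical values of `X` and `Y` are arbitrary and chosen only for display.

```python
import numpy as np

# Two small 2x2 matrices with arbitrary illustrative values.
X = np.array([[1, 2],
              [3, 4]])
Y = np.array([[0, 5],
              [6, 7]])

# np.kron replaces each element x_ij of X by the 2x2 block x_ij * Y,
# giving a 4x4 matrix of all pairwise products of elements.
XY = np.kron(X, Y)
print(XY)
```

The upper-left 2x2 block of the result is 1 times `Y` and the lower-right block is 4 times `Y`.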

**accuracy** - The closeness of a measurement or estimate to
its true value. See precision.

**affine superimposition** - A superimposition for which the
associated transformations are all affine. See affine transformation.

**affine transformation** - A transformation
for which parallel lines remain parallel. Affine transformations of the plane take
squares into parallelograms and take circles into ellipses, all of
the same shape. Affine transformations of a
3-dimensional space take cubes into parallelepipeds (sheared bricks)
and spheres into ellipsoids, all of the same shape.
Similar results are produced in higher dimensional spaces. Equivalent
to "uniform transformation".

As far as form is concerned (that is, ignoring translation
and rotation), any affine transformation can be diagrammed
as a *pure strain* taking a square to a rectangle on the same
axes. In studies of shape, where scale is ignored as well, the picture
is the same but now the sum of the squares of the axes is unchanging.
Still ignoring scale (that is, as far as shape is concerned),
any affine transformation can also be diagrammed as a *pure shear*
taking a square into a parallelogram of unchanged base segment
and height. This diagram of shear came into morphometrics
via an application to principal components analysis somewhat
before it was applied to landmark-based shape (see shear,
Kendall's shape space, and tangent space).

**allometry** - Any change of shape with size. It describes
any deviation of the bivariate relation from the simple functional form *y*/*x*
= *c*, where *c* is a constant and *x* and *y*
are size measures in units of the same dimension. See Klingenberg, 1996,
NATO volume "white book".

**anisotropy** - Anisotropy is a descriptor of one aspect of
an affine transformation. In two dimensions, this is the ratio
of the axes of the ellipse into which a circle is transformed
by an affine transformation. In general, it is the maximum ratio
of extension of length in one direction to extension in a perpendicular
direction.
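A minimal sketch of how anisotropy might be computed, assuming the affine transformation is given by the matrix of its linear part (translation plays no role). The function name `anisotropy` and the example matrices are illustrative assumptions. The singular values of the matrix are the semi-axis lengths of the ellipse into which a unit circle is taken, so their ratio gives the anisotropy.

```python
import numpy as np

def anisotropy(linear_part):
    """Ratio of the longest to the shortest axis of the image of a unit circle."""
    # Singular values are the semi-axis lengths of the ellipse (or ellipsoid)
    # into which the unit circle (or sphere) is mapped; NumPy returns them
    # sorted from largest to smallest.
    s = np.linalg.svd(np.asarray(linear_part, dtype=float), compute_uv=False)
    return s[0] / s[-1]

# Hypothetical example transformations.
stretch = np.array([[2.0, 0.0], [0.0, 1.0]])   # doubles lengths along x only
theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
shear = np.array([[1.0, 0.5], [0.0, 1.0]])

print(anisotropy(stretch), anisotropy(rotation), anisotropy(shear))
```

A pure rotation has anisotropy 1, and rotating after stretching leaves the anisotropy of the stretch unchanged.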

**asymptotically unbiased estimator** - An estimator, $\hat{\theta}$,
with an expected value that converges
on the parametric value it is estimating, $\theta$,
as sample size goes to infinity:
$E(\hat{\theta}) \to \theta$ as $n \to \infty$. See unbiased
estimator and consistent estimator.

**baseline** - For a system of
two-point shape coordinates
for landmarks in a plane, the baseline is the line
connecting the pair of landmarks that are assigned to fixed locations
(0,0) and (1,0) in the construction. In general, baselines work
better if they are closely aligned with the long axis of the mean
landmark shape and pass near the centroid of that mean shape (see
the Orange Book).
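The construction can be sketched with complex arithmetic, under the assumption that landmarks are stored as a *p* x 2 array and that the two baseline landmarks are distinct. The function name `bookstein_coordinates` and its arguments are hypothetical. Subtracting the first baseline point and dividing by the baseline vector sends the two baseline landmarks to (0,0) and (1,0).

```python
import numpy as np

def bookstein_coordinates(landmarks, i=0, j=1):
    """Two-point shape coordinates with landmarks i and j as the baseline."""
    xy = np.asarray(landmarks, dtype=float)
    # Code each landmark as a complex number x + iy.
    z = xy[:, 0] + 1j * xy[:, 1]
    # Translate landmark i to the origin, then rotate and rescale so that
    # landmark j falls on (1, 0); complex division does both at once.
    w = (z - z[i]) / (z[j] - z[i])
    return np.column_stack([w.real, w.imag])

# A hypothetical triangle with baseline from (0, 0) to (2, 0).
triangle = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
bc = bookstein_coordinates(triangle)
print(bc)
```

Only the coordinates of the landmarks off the baseline carry shape information in this scheme; here the third landmark lands at (0.5, 1).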

**bending energy** - Bending energy is a metaphor borrowed
for use in morphometrics from the mechanics of thin metal plates.
Imagine a configuration of landmarks that has been printed on
an infinite, infinitely thin, flat metal plate, and suppose that
the differences in coordinates of these same landmarks in another
picture are taken as vertical displacements of this plate perpendicular
to itself, one Cartesian coordinate at a time. The bending energy
of one of these out-of-plane "shape changes" is the
(idealized) energy that would be required to bend the metal plate
so that the landmarks were lifted or lowered appropriately.

While in physics bending energy is a real quantity, measured in appropriate units (g cm² sec⁻²), there is an alternate formula that remains meaningful in morphometrics: bending energy is proportional to the integral of the summed squared second derivatives of the "vertical" displacement - the extent to which it varies from a uniform tilt. The bending energy of a shape change is the sum of the bending energies that apply to any two perpendicular coordinates in which the metaphor is evaluated. The bending energy of an affine transformation is zero since it corresponds to a tilting of the plate without any bending. The value obtained for the bending energy corresponding to a given displacement is inversely proportional to scale. Such quantities should not be interpreted as measures of dissimilarity (e.g., taxonomic or evolutionary distance) between two forms.

**bending energy matrix** - The formula for bending energy
(see above) - the formula whose value is proportional to that
integral of those summed squared second derivatives - is a quadratic
form (usually written $\mathbf{L}_{k}^{-1}$)
determined by the coordinates of the landmarks
of the reference form. That is, if **h** is a vector describing
the heights of a plate above a set of landmarks, then bending
energy is $\mathbf{h}^{t}\mathbf{L}_{k}^{-1}\mathbf{h}$.
In morphometrics, the bending energy of a general
transformation is the sum
$\mathbf{x}^{t}\mathbf{L}_{k}^{-1}\mathbf{x} + \mathbf{y}^{t}\mathbf{L}_{k}^{-1}\mathbf{y}$
of the bending energy of its horizontal
*x*-component, modeled as a "vertical" plate, plus
the bending energy of its vertical *y*-component, modeled
similarly as a "vertical" plate.
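The quadratic form can be sketched numerically for landmarks in two dimensions. This is only an illustration under stated assumptions: it uses the thin-plate kernel $U(r) = r^{2}\log r^{2}$, it reports energy only up to the constant of proportionality, and it requires distinct, non-collinear reference landmarks. The function names and the example configuration are hypothetical.

```python
import numpy as np

def bending_energy_matrix(reference):
    """Upper-left p x p block of the inverse of the thin-plate spline matrix L."""
    ref = np.asarray(reference, dtype=float)
    p = ref.shape[0]
    # Squared distances between all pairs of reference landmarks.
    d2 = ((ref[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    # Assumed kernel U(r) = r^2 log r^2, with U(0) = 0.
    K = np.zeros_like(d2)
    nonzero = d2 > 0
    K[nonzero] = d2[nonzero] * np.log(d2[nonzero])
    # Affine part: a column of ones followed by the x and y coordinates.
    P = np.hstack([np.ones((p, 1)), ref])
    L = np.zeros((p + 3, p + 3))
    L[:p, :p] = K
    L[:p, p:] = P
    L[p:, :p] = P.T
    return np.linalg.inv(L)[:p, :p]

def bending_energy(reference, target):
    """Sum of the quadratic forms for the x- and y-coordinates of the target."""
    B = bending_energy_matrix(reference)
    tgt = np.asarray(target, dtype=float)
    x, y = tgt[:, 0], tgt[:, 1]
    return x @ B @ x + y @ B @ y

# Hypothetical reference: the corners of a unit square plus its center.
ref = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]], dtype=float)
affine_target = ref @ np.array([[1.2, 0.3], [0.0, 0.9]]) + np.array([2.0, -1.0])
bent_target = ref.copy()
bent_target[4, 1] += 0.2   # displace only the central landmark

print(bending_energy(ref, affine_target), bending_energy(ref, bent_target))
```

The affine target gives an energy of zero up to rounding error, while displacing the central landmark alone gives a positive value.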

**biplot** - A single diagram that represents two separate
scatterplots on the same pair of axes. One scatter is of some
pair of columns of the matrix **U** of the singular value decomposition
of a matrix **S**, and the other scatter is of the matching
pair of columns of **V**. When **S** is a centered data
matrix, the effect is to plot principal component loadings and
scores on the same diagram. See Marcus (Black Book) for an
in-depth discussion.

**Black Book** - Marcus, L. F., E. Bello, A. García-Valdecasas
(eds.). 1993. *Contributions to Morphometrics*. Museo Nacional
de Ciencias Naturales Monografias: Madrid.

See also Blue Book, Orange Book, Red Book, and Reyment's Black Book.

**Blue Book** - Rohlf, F. J. and F. L. Bookstein (eds.). 1990.
*Proceedings of the Michigan Morphometrics Workshop*. Special
Publication No. 2, University of Michigan Museum of Zoology: Ann
Arbor.

See also Black Book, Orange Book, Red Book, and Reyment's Black Book.

**Bookstein coordinates** - See
two-point shape coordinates.

**canonical** - A canonical description of any statistical
situation is a description in terms of extracted vectors that
have especially simple ordered relationships. For instance, a
canonical correlations analysis describes the relation between
two lists of variables in terms of two lists of linear combinations
that show a remarkable pattern of zero correlations. Each score
(linear combination) from either list is correlated with no other
combination from its list and with only one score from the other
list.

**canonical correlation analysis** - A multivariate method
for assessing the associations between two sets of variables within
a data set. The analysis focuses on pairs of linear combinations
of variables (one for each set) ordered by the magnitude of their
correlations with each other. The first such pair is determined
so as to have the maximal correlation of any such linear combinations.
Subsequent pairs have maximal correlation subject to the constraint
of being orthogonal to those previously determined.

**canonical variates analysis** - A method of multivariate
analysis in which the variation among groups is expressed relative
to the pooled within-group covariance matrix. Canonical variates
analysis finds linear transformations of the data which maximize
the among group variation relative to the pooled within-group
variation. The canonical variates then may be displayed as an
ordination to show the group centroids and scatter within groups.
This may be thought of as a "data reduction" method
in the sense that one wants to describe among group differences
in few dimensions. The canonical variates are uncorrelated; however,
the vectors of coefficients are not orthogonal, as they are in Principal
Component Analysis. The method is closely related to multivariate
analysis of variance (MANOVA), multiple discriminant analysis,
and canonical correlation analysis. A critical assumption is
that the within-group variance-covariance structure is similar across
groups; otherwise the pooling of the data over groups is not very sensible.

**Centroid Size** - Centroid Size is the square root of the
sum of squared distances of a set of landmarks from their centroid,
or, equivalently up to a constant factor that depends only on the
number of landmarks, the square root of the sum of the variances
of the landmarks about that centroid in *x*- and *y*-directions.
Centroid Size is used in geometric morphometrics because it is approximately
uncorrelated with every shape variable when landmarks are distributed
around mean positions by independent noise of the same small variance
at every landmark and in every direction. Centroid Size is the
size measure used to scale a configuration of landmarks so they
can be plotted as a point in Kendall's shape space. The denominator
of the formula for the Procrustes distance between two sets of
landmark configurations is the product of their Centroid Sizes.
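A minimal sketch of the first form of the definition, assuming the landmarks are stored as a *p* x *k* array with one row per landmark. The function name `centroid_size` and the example square are illustrative.

```python
import numpy as np

def centroid_size(landmarks):
    """Square root of the summed squared distances of landmarks from their centroid."""
    X = np.asarray(landmarks, dtype=float)
    # Subtract the centroid (the mean of each coordinate) from every landmark.
    centered = X - X.mean(axis=0)
    return np.sqrt((centered ** 2).sum())

# Hypothetical configuration: the corners of a square of side 2.
square = np.array([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])
print(centroid_size(square))
```

Each corner lies at squared distance 2 from the centroid, so the value is the square root of 8. Translating the configuration leaves the result unchanged, and doubling all coordinates doubles it.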

**cluster analysis** - A method of analysis that represents
multivariate variation in data as a series of sets. In biology,
the sets are often constructed in a hierarchical manner and shown in
the form of a tree-like diagram called a dendrogram.

**coefficient** - A coefficient, in general, is a number multiplying
a function. In multivariate data analysis, usually the "function"
is a variable measured over the cases of the analysis, and the
coefficients multiply these variable values before we add them
up to form a score. A coefficient is not the same as a loading.

**complex numbers** - Complex numbers are an algebraic way of
coding points in the ordinary Euclidean plane so that translation
(shift of position) corresponds to the addition of complex numbers
and both rescaling (enlargement or shrinking) and rotation correspond
to multiplication of complex numbers. In this system of notation,
invented by Gauss, the *x*-axis is identified with the "real
numbers" (ordinary decimal numbers) and the *y*-axis
is identified with "imaginary numbers" (the square
roots of negative numbers). When you multiply points on this
axis by themselves according to the rules, you get negative points
on the "real" axis just defined. Many operations on
data in two dimensions can be proved valid more directly if they
are written out as operations on complex numbers.
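The correspondence between complex arithmetic and operations on points in the plane can be tried directly with Python's built-in complex type. The particular point and the amounts of shift, rotation, and enlargement below are arbitrary choices for illustration.

```python
import cmath

# The point (2, 1) coded as the complex number 2 + 1i.
z = complex(2.0, 1.0)

# Translation is addition: shift by (3, -1).
translated = z + complex(3.0, -1.0)

# Rotation is multiplication by a number of unit length: 90 degrees
# counterclockwise corresponds to exp(i * pi / 2), which is i.
rotated = z * cmath.exp(1j * cmath.pi / 2)

# Rescaling is multiplication by a real number.
enlarged = z * 2.0

# A point on the imaginary axis multiplied by itself is a negative real number.
i_squared = 1j * 1j

print(translated, rotated, enlarged, i_squared)
```

A single multiplication by a general complex number combines the rotation and the rescaling in one step.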

**consensus configuration** - A single set of landmarks intended
to represent the central tendency of an observed sample for the
production of superimpositions, of a weight matrix, or some other
morphometric purpose. Often a consensus configuration is computed
to optimize some measure of fit to the full sample: in particular,
the Procrustes mean shape is computed to minimize the sum of squared
Procrustes distances from the consensus landmarks to those of the sample.

**consistent estimator** - An estimator,
$\hat{\theta}$, that converges in probability on the parametric value
it is estimating, $\theta$, as sample size goes to infinity:
$\lim_{n \to \infty} P(|\hat{\theta} - \theta| > \varepsilon) = 0$
for any positive $\varepsilon$. Asymptotically unbiased estimators are consistent
estimators if their variance goes to zero as sample size goes to infinity. See
unbiased estimator.

**coordinates** - A set of parameters that locate a point in
some geometrical space. Cartesian coordinates, for instance,
locate a point on a plane or in physical space by projection onto
perpendicular lines through one single point, the origin. The
elements of any vector may be thought of as coordinates in a geometric
sense.

**correlation** - Relation between two or more
variables. Frequently the word is used for Pearson's product-moment correlation,
which is the covariance divided by the product of the standard deviations,
$r_{XY} = s_{XY}/(s_{X}s_{Y})$.
This correlation coefficient is +1 or -1 when all values fall on a straight
line that is not parallel to either axis. However, there are also Kendall, Spearman,
tetrachoric, etc. correlations, which measure other aspects of the relation between
two variables.

**covariance** - A measure of the degree to which two variables vary together.
Computed as $s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(X_{i}-\bar{X})(Y_{i}-\bar{Y})$ for
two variables X and Y in a sample of size *n*. See correlation.
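As a small numerical sketch, the covariance and the product-moment correlation can be computed as follows. The divisor *n*-1 is assumed here (the usual sample estimate), and the data values are made up for illustration.

```python
import numpy as np

# Made-up paired measurements on n = 5 specimens.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
n = len(x)

# Covariance: summed products of deviations from the means, divided by n - 1.
cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)

# Correlation: covariance divided by the product of the standard deviations
# (ddof=1 makes NumPy use the same n - 1 divisor).
r_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

print(cov_xy, r_xy)
```

The results agree with NumPy's own `np.cov` and `np.corrcoef`, which use the same divisor by default.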

**covariant** - A covariant of a particular shape change is
a shape variable whose gradient vector as a function of changes
in any complete set of shape coordinates lies precisely along
the change in question.

For transformations of triangles, the relation between invariants and covariants is a rotation by 90 degrees in the shape-coordinate plane. For more than three landmarks, a given transformation has only one direction of covariants, but a full plane (four landmarks) or hyperplane (five or more landmarks) of invariants (see the Orange Book). See invariant.

**curved space** - A space with coordinates and a distance
function such that the area of circles, volume of spheres, etc.
are not proportional to the appropriate power of the radius,
e.g., Kendall's shape space. In curved spaces, the usual intuitions
about what "straight lines" can be expected to do will
be faulty. For instance, corresponding to every triangular shape
in Kendall's shape space, there is another that is "as far
from it as possible," just like there is a point on the surface
of the earth as far as possible from where you now sit.

***D*** - See 1) generalized distance or 2) fractal dimension.

**$D^{2}$** - Squared Mahalanobis, or generalized,
distance.

**deficient coordinate** - In addition to landmark locations,
a digitizer can be used to supply information of other sorts.
For example, a point can be used to encode part of the information
about a curving arc by identifying the spot at which the arc lies
farthest from some other image structure (perhaps another such
curving arc). The null model of independent Gaussian noise does
not apply to position along the tangent direction of the curve
that is digitized in this way, and so that Cartesian coordinate
is "deficient." The usual model of independent Gaussian
noise is inapplicable in principle for such points.
See Type III landmark.

**degrees of freedom** - Given a set of parameters estimated from the data,
the "degrees of freedom" of some statistic is the number of independent
observations *required* to compute the statistic. For example, the variance
has *n*-1 degrees of freedom because only *n*-1 of the observations
are needed for its computation given the sample mean. The missing observation
can be computed as $x_{n} = n\bar{x} - \sum_{i=1}^{n-1} x_{i}$.

**dilation** - Increase of length in a particular direction,
or along a particular interlandmark segment.

**discriminant analysis** - A broad class of methods concerned
with the development of rules for assigning unclassified objects/specimens
to previously defined groups. See discriminant function.

**discriminant function** - A discriminant function is used
to assign an observation to one of a set of groups. A linear discriminant
function takes a vector of observations from a specimen and multiplies
it by a vector of coefficients to produce a score which can be
used to classify the specimen as belonging to one or another predefined
group. See discriminant analysis.

**distance** - This term has several meanings in morphometrics;
it should never be used without a prefixed adjective to qualify
it, e.g., Euclidean distance, Mahalanobis distance, Procrustes
distance, taxonomic distance.

**edgel** - An extension of the notion of landmark to include
partial information about a curve through the landmark. An edgel
specifies rotation of a direction through a landmark, extension
along a direction through a landmark, or both. The formula for
thin-plate splines on landmarks can be extended to encompass data
about edgels as well. They are intended eventually to circumvent
any need for deficient coordinates in multivariate morphometric
analysis. See Little (1996, NATO volume "white book") and Bookstein and Green, 1993,
A feature space for edgels in images with landmarks, *Journal
of Mathematical Imaging and Vision* 3: 231-261.

**EDMA** - See Euclidean distance matrix analysis.

**eigenshapes** - Principal components for outline data. An
eigenshape analysis begins with the selection of a distance function
between pairs of outlines. At the end one gets "eigenshapes,"
which have the properties of principal component vectors (uncorrelated,
describing the sample in decreasing order of variance) and also
are outline shapes themselves, so that the scores for each specimen
of the sample can be combined to produce a new outline shape that
approximates it in some possibly useful way. Eigenshapes apply
to curves as relative warps apply to landmark shape. See the chapter
by Lohmann and Schweitzer in the Blue Book and that by Sampson, 1996,
NATO volume "white book".

**eigenvalues** - Eigenvalues, $\lambda_{i}$,
are the diagonal elements of the diagonal matrix $\boldsymbol{\Lambda}$ in the equation: $\mathbf{S}\mathbf{E} = \mathbf{E}\boldsymbol{\Lambda}$.
In the common data analysis case, **S** is a symmetrical variance-covariance
matrix, **E** is a matrix of eigenvectors, and the eigenvalues are real and non-negative.
The order of the columns of **E** and of the corresponding diagonal elements of $\boldsymbol{\Lambda}$ is arbitrary, but by convention they
are usually sorted from largest to smallest eigenvalue. See eigenvectors
and singular value decomposition.

**eigenvectors** - In the equation given to define eigenvalues,
**E** contains the eigenvectors. In the common data analysis
case, **E** is an orthonormal matrix (i.e.,
$\mathbf{E}^{t}\mathbf{E}=\mathbf{I}$
and $\mathbf{E}\mathbf{E}^{t}=\mathbf{I}$).
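A brief sketch of the common data analysis case, assuming a symmetric variance-covariance matrix computed from simulated data (the data and the variable names are illustrative only). NumPy's `np.linalg.eigh` returns eigenvalues in ascending order, so they are re-sorted here to follow the largest-to-smallest convention.

```python
import numpy as np

# Simulated data: 50 specimens measured on 3 variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))

# Symmetric variance-covariance matrix (variables are in columns).
S = np.cov(data, rowvar=False)

# eigh is intended for symmetric matrices and returns ascending eigenvalues.
eigenvalues, E = np.linalg.eigh(S)

# Re-sort eigenvalues and the matching columns of E from largest to smallest.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
E = E[:, order]

print(eigenvalues)
```

After sorting, `S @ E` equals `E @ np.diag(eigenvalues)` up to rounding error, and `E.T @ E` is the identity matrix.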

**elliptic Fourier analysis** - A type of outline analysis
in which differences in *x* and *y* (and possibly *z*)
coordinates of an outline are fit separately as a function of
arc length by Fourier analysis. The chapter by Rohlf in the
Blue Book provides an overview of various methods of fitting curves
to outline data.

**Euclidean distance** - Defined as:
$d_{ij} = \sqrt{\sum_{l=1}^{k}(x_{il}-x_{jl})^{2}}$
for coordinates $x_{il}$ and $x_{jl}$ of points *i* and *j* in *k* dimensions.

**Euclidean distance matrix analysis** - EDMA. A method for
the statistical analysis of full matrices of all interlandmark
distances, averaging elementwise within samples, and then comparing
those averages between samples by computing the ratios of corresponding
mean distances. See Lele, S. and J. T. Richtsmeier, 1991, Euclidean
distance matrix analysis: a coordinate free approach for comparing
biological shapes using landmark data, *American Journal of
Physical Anthropology*, 86:415-428.
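The description above can be sketched in a much simplified form: compute all interlandmark distances for each specimen, average them elementwise within each sample, and take ratios of corresponding means. This is only an illustration of the idea, not the estimators or tests of the published method, and the function names and the toy samples are hypothetical.

```python
import numpy as np

def form_matrix(landmarks):
    """Matrix of all interlandmark Euclidean distances for one specimen."""
    X = np.asarray(landmarks, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def mean_distance_ratios(sample_a, sample_b):
    """Ratios of elementwise mean interlandmark distances, sample A over sample B."""
    mean_a = np.mean([form_matrix(s) for s in sample_a], axis=0)
    mean_b = np.mean([form_matrix(s) for s in sample_b], axis=0)
    # Each pair of landmarks is used once: the upper triangle without the diagonal.
    upper = np.triu_indices_from(mean_a, k=1)
    return mean_a[upper] / mean_b[upper]

# Toy samples: two triangles, and the same triangles at twice the size.
sample_a = [np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
            np.array([[0.0, 0.0], [1.2, 0.0], [0.0, 0.8]])]
sample_b = [2.0 * s for s in sample_a]

ratios = mean_distance_ratios(sample_a, sample_b)
print(ratios)
```

Because the second sample is an exact enlargement of the first, every ratio is the same; ratios that differ among landmark pairs would indicate a difference in shape.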

**Euclidean space** - A space where distances between two points
are defined as Euclidean distances in some system of coordinates.

**factor analysis** - Factor analysis is a multivariate
technique for describing a set of measured variables
in terms of a set of causal or underlying variables. A factor
model can be characterized in terms of path diagrams to show relations
between measured variables and factors. See the chapter by Marcus
in the Blue Book and Reyment and Joreskog, 1993, *Applied Factor
Analysis in the Natural Sciences*, Cambridge University Press:
Cambridge, United Kingdom.

**FESA** - See finite element scaling analysis.

**fiber** - The set of preshapes (configurations that have
been centered at the origin and scaled to unit centroid size)
that differ only by a rotation. It is the path, through preshape
space, followed by a centered and scaled configuration under all
possible rotations.

**figure** - A representation of an object by the coordinates
of a specified set of points, the landmarks.

**figure space** - The 2*p*- or 3*p*-space of figures,
i. e., the original coordinate data vectors.

**finite element scaling analysis** - Without the word "scaling,"
finite element analysis is a computational system for continuum
mechanics that estimates the deformations (fully detailed changes
of position of all component particles) that are expected to result
from a specified pattern of stresses (forces) upon a mechanical
system. As applied in morphometrics, FESA solves the inverse
problem of estimating the strains representing the hypothetical
forces that deformed one specimen into another. These results
are a function of the "finite elements" into which the
space between the landmarks is subdivided. FESA can be compared
with the thin-plate spline, which interpolates a set of landmark
coordinates under an entirely different set of assumptions.

**form** - In morphometrics, we represent the form of an object
by a point in a space of form variables, which are measurements
of a geometric object that are unchanged by translations and rotations.
If you allow for reflections, forms stand for all the figures
that have all the same interlandmark distances. A form is usually
represented by one of its figures at some specified location and
in some specified orientation. When represented in this way, location
and orientation are said to have been "removed."

**form space** - The space of figures with differences due
to location and orientation removed. It is of 2*p*-3 dimensions
for two-dimensional coordinate data and 3*p*-6 dimensions for three-dimensional
coordinate data.

**Fourier analysis** - In morphometrics, the decomposition
of an outline into a weighted sum of sine and cosine functions.
The chapter by Rohlf in the Blue Book provides an overview of
this and other methods of analyzing outline data.

**fractal dimension** - *D*. A measure of the complexity
of a structure assuming a consistent pattern of self-similarity
(structural complexity at smaller scales is mathematically indistinguishable
from that at larger scales) over all scales considered. See the
chapter by Slice in the Black Book.

**generalized distance** - *D*. A synonym
for Mahalanobis distance. Defined by the equation for two row vectors $\mathbf{x}_{i}$
and $\mathbf{x}_{j}$ for two individuals, and *p*
variables as: $D_{ij} = \sqrt{(\mathbf{x}_{i}-\mathbf{x}_{j})\mathbf{S}^{-1}(\mathbf{x}_{i}-\mathbf{x}_{j})^{t}}$,
where **S** is the *p*x*p* variance-covariance matrix. It takes
into consideration the variance and correlation of the variables in measuring
distances between points, i.e., differences in directions in which there is
less variation within groups are given greater weight than are differences in
directions in which there is more variation.
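A short sketch of the computation, assuming the variance-covariance matrix **S** is already available and is nonsingular. The function name `generalized_distance` and the example matrix are illustrative assumptions.

```python
import numpy as np

def generalized_distance(x_i, x_j, S):
    """Mahalanobis distance between two observations given covariance matrix S."""
    d = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    # Solving S w = d avoids forming the inverse of S explicitly.
    return np.sqrt(d @ np.linalg.solve(S, d))

# Hypothetical covariance matrix: variance 4 along the first variable,
# variance 1 along the second, and no correlation.
S = np.array([[4.0, 0.0],
              [0.0, 1.0]])
origin = np.array([0.0, 0.0])

along_first = generalized_distance([2.0, 0.0], origin, S)
along_second = generalized_distance([0.0, 2.0], origin, S)
print(along_first, along_second)
```

The two points are equally far from the origin in the Euclidean sense, but the difference along the less variable second axis counts for twice as much. With **S** equal to the identity matrix the function returns the ordinary Euclidean distance.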

**generalized superimposition** - The superimposition of a
set of configurations onto their consensus configuration. The
fitting may involve least-squares, resistant-fit, or other algorithms
and may be strictly orthogonal or allow affine transformations.

**geodesic distance** - The length of the shortest path between
two points in a suitable geometric space (one for which curving
paths have lengths). On a sphere, it is the distance between two
points as measured along a great circle.

**geometric morphometrics** - Geometric morphometrics is a
collection of approaches for the multivariate statistical analysis
of Cartesian coordinate data, usually (but not always) limited
to landmark point locations. The "geometry" referred
to by the word "geometric" is the geometry of Kendall's
shape space: the estimation of mean shapes and the description
of sample variation of shape using the geometry of Procrustes
distance. The multivariate part of geometric morphometrics is
usually carried out in a linear tangent space to the non-Euclidean
shape space in the vicinity of the mean shape.

More generally, it is the class of morphometric methods that preserve complete information about the relative spatial arrangements of the data throughout an analysis. As such, these methods allow for the visualization of group and individual differences, sample variation, and other results in the space of the original specimens.

**great circle** - A circle on a sphere with a diameter equal
to that of the sphere. The shortest path connecting two points
on the surface of a sphere lies along the great circle passing
through the points. See geodesic distance.

**homology** - The notion of homology bridges the language
of geometric morphometrics and the language of its biological
or biomathematical applications. In theoretical biology, only
the explicit entities of evolution or development, such as molecules,
organs or tissues, can be "homologous." Following D'Arcy
Thompson, morphometricians often apply the concept instead to
discrete geometric structures, such as points or curves, and,
by a further extension, to the multivariate descriptors (e.g.,
partial warp scores) that arise as part of most multivariate analyses.
In this context, the term "homologous" has no meaning
other than that the same name is used for corresponding parts
in different species or developmental stages. To declare something
"homologous" is simply to assert that we want to talk
about processes affecting such structures as if they had a consistent
biological or biomechanical meaning. Similarly, to declare an
interpolation (such as a thin-plate spline) a "homology map"
means that one intends to refer to its features as if they had
something to do with valid biological explanations pertaining
to the regions between the landmarks, about which we have no data.

**Hotelling's $T^{2}$** - See $T^{2}$.

**hyperplane** - A *k*-1 dimensional subspace of a *k*-dimensional
space. A hyperplane is typically characterized by the vector to
which it is orthogonal.

**hyperspace** - A space of more than three dimensions.

**hypersphere** - A generalization of the idea of a sphere
to a space of greater than three dimensions.

**hypervolume** - A generalization of the idea of volume to
a space of more than three dimensions.

**invariant** - An invariant, generally speaking, is a quantity
that is unchanged (even though its formula may have changed) when
one changes some inessential aspect of a measurement. For instance,
Euclidean distance is an invariant under translation or rotation
of one's coordinate system, and ratio of distances in the same
direction is an invariant under affine transformations. In the
morphometrics of triangles, the invariants of a particular transformation
are the shape variables that do not change under that transformation
(see the Orange Book).
See covariant.

**isometry** - An isometry is a transformation of a geometric
space that leaves distances between points unchanged. If the
space is the Euclidean space of a picture or an organism, and
the distances are distances between landmarks, the isometries
are the Euclidean translations, rotations, and reflections. If
the distances are Procrustes distances between shapes, the isometries
(for the simplest case, landmarks in two dimensions) are the rotations
of Kendall's shape space.
For triangles, these can be visualized
as ordinary rotations of Kendall's "spherical blackboard."

**isotropic** - Invariant with respect to direction. Isotropic
errors have the same statistical distribution in all directions
implying equal variance and zero correlation between the original
variables (e.g., axis coordinates).

**Kendall's shape space** - The fundamental geometric construction,
due to David Kendall, underlying geometric morphometrics. Kendall's
shape space provides a complete geometric setting for analyses
of Procrustes distances
among arbitrary sets of landmarks. Each
point in this shape space represents the shape of a configuration
of points in some Euclidean space, irrespective of size, position,
and orientation. In shape space, scatters of points correspond
to scatters of entire landmark configurations, not merely scatters
of single landmarks. Most multivariate methods of geometric morphometrics
are linearizations of statistical analyses of distances and directions
in this underlying space.
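For landmarks in two dimensions, the idea that a shape is what remains after size, position, and orientation are ignored can be sketched with complex numbers. This is a minimal illustration under stated assumptions: it takes the distance between two shapes to be the great-circle angle between their centered, unit-size configurations after optimal rotation, it does not remove reflections, and the function names are hypothetical.

```python
import numpy as np

def preshape(landmarks):
    """Center a 2D configuration at the origin and scale it to unit centroid size."""
    X = np.asarray(landmarks, dtype=float)
    z = X[:, 0] + 1j * X[:, 1]
    z = z - z.mean()
    return z / np.sqrt((np.abs(z) ** 2).sum())

def shape_distance(a, b):
    """Angle (in radians) between two planar shapes after optimal rotation."""
    za, zb = preshape(a), preshape(b)
    # The modulus of the complex inner product is unchanged by rotating
    # either configuration, so orientation drops out.
    c = np.abs(np.vdot(za, zb))
    return np.arccos(min(c, 1.0))

# A hypothetical triangle, a moved copy of it, and a differently shaped triangle.
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
moved = 3.0 * tri @ rot.T + np.array([5.0, -2.0])
other = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 2.0]])

print(shape_distance(tri, moved), shape_distance(tri, other))
```

The moved copy lies at essentially zero distance from the original triangle, while the differently proportioned triangle lies at a clearly positive distance.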

For additional entries, see part 2.

Revised Feb. 12, 2009 by F. James Rohlf