Taille, Allometrie, et Elimination de la Taille en Morphometrie Multivariable

Paris, February 27 to March 1, 1995

ORGANIZERS: Groupe de Travail Morphometrie et Analyse de Forme, Museum National d'Histoire Naturelle.

SPEAKERS: C. P. Klingenberg (Edmonton), R. Reyment (Uppsala), N. Yoccoz (Lyon), J. B. Kazmierczak and M. Baylac (Paris).

The course was organized by M. Baylac, J.P. Hugot and C.P. Klingenberg and was intended for French students and researchers (only a couple of non-French intruders managed to sneak in). Attendees were mainly interested in the use of morphometric tools for systematics, functional morphology and comparative anatomy.

The general idea of the course was to give a practical and theoretical review of the concept of allometry in biology and its implications for the study of size and shape variation. Methods for estimating size and for eliminating size factors in morphometric studies were thoroughly reviewed with numerous examples. Software demonstrations included SAS, NTSYS, SYSTAT, PRIMER (Reyment), and ADE (Yoccoz). A truly stimulating discussion on the always controversial matter of "Morphometrics and Phylogenetic Reconstruction" concluded the workshop.

Michel Baylac inaugurated the workshop by setting the boundaries of the topics examined within the current range of morphometric tools. The workshop dealt only with morphometric analysis using linear distances. I guess that decision was made primarily because distance-based morphometrics is still popular. Baylac agreed that, although geometric methods are the trend to follow these days, there is still a wide range of applications where traditional morphometrics can give satisfactory results. Nonetheless, the topics reviewed are part of the basics of every morphometric study, and thus they serve equally well for geometric or distance-oriented analysis.

Klingenberg introduced the concept of allometry in biology. He began with Huxley's formulation of allometry, based on the log regression model, and the use of simple bivariate plots to show the relative growth of parts in organisms. Klingenberg distinguished between the various levels of allometry: static, ontogenetic and evolutionary. This distinction was illustrated with one of his water strider studies, explaining how these levels of allometry should be analyzed and especially how to compare allometric vectors by looking at the angles formed between them. Another relevant application of allometric patterns concerns the study of heterochronic events. Klingenberg reviewed the connection between heterochrony and allometry in water striders, concluding that the current model connecting both concepts (processes) should be revised. He used a graphic model to show that size data cannot substitute for age in studies of heterochrony because of the allometric relationship between age and size of the traits studied.
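To make the bivariate case concrete, here is a minimal sketch of my own (not code shown at the course). It assumes Huxley's power law in the usual form y = b * x**a, fitted by ordinary least squares on logs, and compares two allometric vectors by the angle between them. The function names and the data are hypothetical.

```python
import math

def allometric_fit(x, y):
    """Least-squares fit of log(y) = log(b) + a * log(x); returns (a, b)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    a = sxy / sxx                 # allometric coefficient (slope on logs)
    b = math.exp(my - a * mx)     # scaling constant
    return a, b

def angle_between(u, v):
    """Angle in degrees between two allometric vectors, ignoring their sign."""
    dot = sum(p * q for p, q in zip(u, v))
    nu = math.sqrt(sum(p * p for p in u))
    nv = math.sqrt(sum(q * q for q in v))
    cosine = min(1.0, abs(dot) / (nu * nv))   # guard against rounding above 1
    return math.degrees(math.acos(cosine))

# Hypothetical data generated exactly from y = 2 * x**1.5
x = [1.0, 2.0, 4.0, 8.0]
y = [2.0 * v ** 1.5 for v in x]
a, b = allometric_fit(x, y)
```

A slope of 1 corresponds to isometry; the angle function ignores sign because an eigenvector and its negative describe the same axis.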

After this introduction to the meaning and relevance of allometry in the study of morphological variation, Klingenberg described why the multivariate generalization of the bivariate plots used by Huxley to define allometric growth corresponds specifically to the first PC extracted from the covariance matrix of log-transformed data. Using a very elegant set of graphics, Klingenberg made a geometrical dissection of PCA and its connection to regression in a simple bivariate case. The biplot was introduced as the most convenient tool for summarizing the results of PCA geometrically. Though PCA was familiar to the participants, the beauty of Chris's explanations was his visual approach to the mathematical operations performed in the analysis. Any appeal to algebraic expressions was neatly avoided: data points were just points in bivariate plots, eigenvectors were drawn as real vectors in the plane, and even covariance matrices appeared without numbers, represented by a grid with peaks of different heights expressing the association (covariance) between variables. Very intuitive indeed.
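The multivariate allometry vector described above can be sketched with NumPy as follows. This is an illustration under my own assumptions, not the material shown in the lecture: the function name, the simulated traits and their exponents are hypothetical.

```python
import numpy as np

def allometric_pc1(X):
    """Return PC1 of the covariance matrix of log-transformed measurements.

    X: (n_specimens, p_traits) array of positive linear distances.
    Also returns the share of total variance explained by PC1.
    """
    logX = np.log(np.asarray(X, dtype=float))
    S = np.cov(logX, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    pc1 = eigvecs[:, -1]
    if pc1.sum() < 0:                      # fix the arbitrary sign
        pc1 = -pc1
    return pc1, eigvals[-1] / eigvals.sum()

# Hypothetical data: three traits growing with exponents 1.0, 1.2 and 0.8
rng = np.random.default_rng(0)
log_size = rng.uniform(1.0, 3.0, 200)
slopes = np.array([1.0, 1.2, 0.8])
logX = np.outer(log_size, slopes) + rng.normal(0, 0.02, (200, 3))

pc1, share = allometric_pc1(np.exp(logX))
# Rescaled loadings: every trait would get 1.0 under isometry
coeffs = pc1 * np.sqrt(len(pc1))
```

Ratios between loadings recover the bivariate allometric coefficients between pairs of traits, which is why PC1 of the log covariance matrix generalizes Huxley's plots.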

Beyond the common concepts of PCA, there was some insight into problems rarely seen in the books. To answer the question of how reliable our estimates are, Klingenberg explained the use of bootstrapping in the estimation of eigenvectors and eigenvalues. Bootstrap estimates of the standard errors of PC parameters are the only way to estimate their stability. This validation of PC results is seldom made in morphometric studies, mainly because it requires some degree of programming.
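The amount of programming involved is modest. The following is a rough sketch of a nonparametric bootstrap for PC1, assuming specimens are resampled with replacement and each bootstrap eigenvector is sign-aligned with the original one. It is not Klingenberg's SAS/IML routine; the function names, the number of replicates and the data are hypothetical.

```python
import numpy as np

def pc1_of_logs(X):
    """PC1 loadings and eigenvalue from the covariance matrix of log data."""
    S = np.cov(np.log(X), rowvar=False)
    vals, vecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    return vecs[:, -1], vals[-1]

def bootstrap_pc1(X, n_boot=200, seed=0):
    """Bootstrap standard errors of the PC1 loadings and of its eigenvalue."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    ref_vec, _ = pc1_of_logs(X)
    n, p = X.shape
    vecs = np.empty((n_boot, p))
    vals = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)     # resample specimens with replacement
        v, lam = pc1_of_logs(X[idx])
        if v @ ref_vec < 0:             # align sign with the original PC1
            v = -v
        vecs[b], vals[b] = v, lam
    return vecs.std(axis=0, ddof=1), vals.std(ddof=1)

# Hypothetical data: 100 specimens, three traits
rng = np.random.default_rng(1)
log_size = rng.uniform(1.0, 3.0, 100)
logX = np.outer(log_size, [1.0, 1.2, 0.8]) + rng.normal(0, 0.05, (100, 3))
se_loadings, se_eigenvalue = bootstrap_pc1(np.exp(logX))
```

The sign alignment matters: without it, eigenvectors that flip sign between replicates would inflate the standard errors artificially.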

Because the goal of morphometrics is the study of size and shape variation, one of the basic steps often required is standardization for size. Various techniques for removing size were reviewed during the course. Klingenberg explained the difference between removing isometric size (i.e., ratio corrections) and removing allometric size, the latter being more appropriate for morphometric data simply because isometry rarely occurs. He recommended Burnaby's popular method: projection of the data onto the space orthogonal to PC1. The resulting data matrix has one dimension fewer, but with the advantage that subsequent analyses based on it can be interpreted as containing size-free information.
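As an illustration of this projection, here is a minimal sketch assuming a single group, log-transformed data and PC1 as the size vector. The function name and the simulated data are hypothetical, and Burnaby's original formulation is more general than this one-vector case.

```python
import numpy as np

def burnaby_projection(logX, size_vec=None):
    """Project centered log data onto the space orthogonal to a size vector.

    If no size vector is given, PC1 of the covariance matrix is used.
    Returns the size-adjusted data and the unit size vector.
    """
    logX = np.asarray(logX, dtype=float)
    centered = logX - logX.mean(axis=0)
    if size_vec is None:
        S = np.cov(centered, rowvar=False)
        size_vec = np.linalg.eigh(S)[1][:, -1]   # eigenvector of largest eigenvalue
    v = np.asarray(size_vec, dtype=float)
    v = v / np.linalg.norm(v)
    P = np.eye(len(v)) - np.outer(v, v)          # projector orthogonal to v
    return centered @ P, v

# Hypothetical data: 200 specimens, three log-transformed traits
rng = np.random.default_rng(3)
log_size = rng.uniform(1.0, 3.0, 200)
logX = np.outer(log_size, [1.0, 1.2, 0.8]) + rng.normal(0, 0.02, (200, 3))
adjusted, v = burnaby_projection(logX)
```

The adjusted matrix keeps its three columns but has rank two, which is the lost dimension mentioned above; any later analysis has to allow for that singularity.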

Yoccoz discussed Mosimann's approach of transforming variables into log shape and log size variables. Log shape variables correspond to log ratios of variables relative to a given size variable (e.g., weight, maximum length), whereas log size variables are constructed as simple logs of size variables (e.g., weight, maximum length), of a sum of dimensions (e.g., length plus width), or of the geometric mean of a series of variables, among other possibilities. Use of log shape variables is equivalent to isometric scaling, and subsequent statistical analysis (MANOVA, CVA) also produces size-free results. Kazmierczak presented factorial logarithmic analysis, identical in spirit to Mosimann's log shape ratios but this time introduced in a very general way, rooted in econometrics and showing its similarities with currency exchange rates! This lecture suggested different perspectives on Mosimann's approach. Michel Baylac presented his study of the correspondence between the results of applying different methods of size correction (including factorial correspondence analysis) to the same data set. Using a Procrustes superimposition of the ordinations obtained from each method, Baylac constructed a cluster diagram suggesting that Burnaby's procedures using Multiple Group PCA (MGPCA) or Common PCA (CPCA) show the closest similarity, whereas ordinations from correspondence analysis produced the most distant results.
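A small sketch of the log shape ratio idea, assuming the geometric mean of all measurements as the size variable (one of several choices mentioned above). The function name and the two specimens are hypothetical.

```python
import math

def mosimann(measurements):
    """Split one specimen's measurements into log size and log shape ratios.

    Size is taken here as the geometric mean of all measurements, so
    log size is simply the mean of the logs.
    """
    logs = [math.log(m) for m in measurements]
    log_size = sum(logs) / len(logs)
    log_shape = [v - log_size for v in logs]   # log(x_i / geometric mean)
    return log_size, log_shape

# Two hypothetical specimens with identical proportions
small = [2.0, 4.0, 8.0]
large = [6.0, 12.0, 24.0]    # three times larger in every dimension

size_small, shape_small = mosimann(small)
size_large, shape_large = mosimann(large)
```

Both specimens receive the same log shape vector and differ only in log size, which is what makes this an isometric correction: any allometric change in proportions would remain in the shape variables.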

Despite being widely used, standard PCA should not be applied when dealing with multiple groups because the model considers only one group in the data matrix. In the choice between the two methods available (MGPCA and CPCA), Klingenberg recommended CPCA because it places fewer restrictions on the data: its only assumption is that the within-group principal axes are parallel, whereas MGPCA also requires equal within-group covariance structure. The use of multiple-group PCA becomes particularly critical when removing size from our data. In such a case, CPCA or MGPCA procedures should be used to find the size vector common to all the groups in the data set.
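The effect of ignoring group structure can be seen in a short sketch. It assumes that MGPCA amounts to extracting PC1 from the pooled within-group covariance matrix; CPCA needs an iterative fitting algorithm and is not attempted here. The function name, the group labels and the simulated data are hypothetical.

```python
import numpy as np

def pooled_within_pc1(logX, groups):
    """PC1 of the pooled within-group covariance matrix of log data."""
    logX = np.asarray(logX, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    p = logX.shape[1]
    W = np.zeros((p, p))
    for g in labels:
        Xg = logX[groups == g]
        D = Xg - Xg.mean(axis=0)        # center each group on its own mean
        W += D.T @ D
    W /= len(logX) - len(labels)        # pooled degrees of freedom
    v = np.linalg.eigh(W)[1][:, -1]     # eigenvector of largest eigenvalue
    return v if v.sum() >= 0 else -v

# Hypothetical data: two groups sharing one growth axis, shifted in mean shape
rng = np.random.default_rng(2)
true_dir = np.array([1.0, 1.2, 0.8])
true_dir /= np.linalg.norm(true_dir)
n = 100
scores = rng.uniform(1.0, 2.0, 2 * n)
logX = np.outer(scores, true_dir) + rng.normal(0, 0.02, (2 * n, 3))
groups = np.repeat(["A", "B"], n)
logX[groups == "B"] += np.array([0.0, 1.5, -1.5])

within = pooled_within_pc1(logX, groups)
# Ordinary PCA on the whole sample, ignoring the groups
naive = np.linalg.eigh(np.cov(logX, rowvar=False))[1][:, -1]
```

In this simulation the ordinary PC1 is dominated by the difference between group means, while the pooled within-group PC1 recovers the shared growth axis.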

Reyment's lecture drew attention to the problem of outliers in the data, suggesting methods to detect influential observations and ways to test the robustness of PCA, CVA, and correspondence analysis. He explained Krzanowski's procedures for selecting influential variables and influential observations in PCA. True to his reputation, Reyment devoted a great deal of his presentation to entertaining the audience with historical references to the etymology of morphometric jargon and the biographies of the famous statisticians who gave their names to the methods used today. He reviewed the basic concepts of Q and R modes of analyzing data matrices and some of the differences between Principal Factor and Principal Component analysis. On the last day, Baylac briefly discussed the connection between the concepts and methods used to study allometry with simple distances and with geometric morphometrics.

Some case studies illustrated the use of the techniques reviewed and their application to systematic and functional studies. Anne Marie Vachot showed the beauty and the difficulties of understanding the systematics of frogs from Madagascar. Christine Berge showed the functional implications of pelvic morphology in primates, including humans. Marc Delcorso showed his latest results in the analysis of human fossils using geometric morphometrics. Finally, Klingenberg showed a complete example of an ecomorphological analysis of allometric patterns in ten Antarctic fish species. The study described the evolutionary allometric patterns between species and their implications for adaptation to benthic or pelagic life. While the relationship between ecology and morphology appeared strong in his example, there was no connection between morphological differentiation and phylogenetic history, suggesting that environment, rather than phylogeny, played a major role in modulating the morphology of the fishes studied.

Software demonstrations were aimed at implementing the methodology reviewed during the course in some popular statistical packages: SAS, SYSTAT and NTSYS. The problem faced by users who want to do non-standard transformations of their data (e.g., correction for size) is that the manuals of statistical packages seldom contain instructions for these procedures (NTSYS may be the only exception). The user then has to improvise ways to solve this problem, and that is not only dangerous but also very irritating and time-consuming, to say the least.

For these reasons, participants were happy to see Klingenberg show the use of SAS in this context. His presentation included PCA, CPCA, bootstrap estimates of PC parameters, angles between PCs, and projection of data into size vectors to get size-invariant scores that can be used in further analysis (ANOVA, MANOVA, PC of between group size-invariant dispersion matrix, ontogenetic scaling. . .). Most of the analyses were based upon SAS IML routines written by Klingenberg, which unfortunately require a moderate knowledge of SAS and IML to adapt them to specific user needs.

Baylac described how to use NTSYS for allometric analysis. NTSYS contains the tools for most of the applications reviewed and has the advantage over SAS of a simpler user interface. It is easy to write batch files containing a series of instructions that link several NTSYS modules. Baylac presented a set of ready-to-use NTSYS batch files that reproduce most of the analyses required for allometric studies. This set includes Burnaby's method (single group and multiple group), Thorpe's Multiple Group PCA and Log Factor Analysis using a double-centered data matrix. The batch files composed by Baylac are thoroughly commented and easy to modify for your own use. Reyment presented his own software, which accompanies his "Primer of Multivariate Methods in Geology" and is available by FTP from the MORPHMET server. The package is a complete set of multivariate analyses covering most of the methods used in quantitative biology. As Reyment himself admits, the programs were written with the aim of being practical, especially for geologists, and little attention has been paid to the user interface or the output. This set of programs, with its extensive documentation explaining thoroughly the mathematical and biological considerations of each method, will certainly make excellent material for biometry students.

The three-day course concluded with an open discussion on "Morphometrics and Phylogenetic Reconstruction" moderated by Klingenberg. The discussion was open to the public and attracted a number of researchers interested in phylogeny estimation in general, whether using morphological characters or not. Klingenberg introduced the subject by explaining the problems with gap-coding morphometric scores into characters that contain phylogenetic information and express polarity and character derivation. The discussion was very enthusiastic, and many good points were raised for and against phylogenetic inferences from morphometrics. Questions like "what is a character?" and "what can be done when genotype and phenotype suggest different phylogenies?" sparked lively debate, and everybody enjoyed it. Perhaps the general consensus was that, though phylogenetic inferences from morphometric data shouldn't be discarded "per se", there may be other data that are better suited for this purpose.

The discussion was a fine ending for a fine workshop. I think that most participants were happy with the course. The material covered addressed common problems encountered by morphometricians, and that is always welcome. The lectures had the right balance between biology and statistics in most cases, thanks to the careful preparation of transparencies and graphic displays. Most lectures were easy to follow, or at least gave a feeling for the key issues one should be cautious about when using each method. Perhaps the weakest point of the course was the little time available for hands-on practice with computers. Most of the time devoted to computer practice was spent on software demonstrations, which left the attendees too tired to work with their own data. This was unfortunate, since discussions over your own data are always the most enlightening.

But don't think that participants got away without talking to each other. Not at all! In the evenings, the organizers were smart enough to set up an ambush with the infallible bait of a varied assortment of national food products including, of course, local wine. This set the perfect stage for informal discussions about all the technicalities reviewed during the day.

In summary, the course fulfilled its purpose of addressing the issue of ". . . all that you have been doing with Principal Components and never understood very well why . . ." or something like that. Now we understand things much better, thanks to Michel Baylac, Jean Pierre Hugot and Chris Klingenberg.

Santiago Reig, Museo Nacional de Ciencias Naturales, CSIC, Jose Gutierrez Abascal 2, 28006 MADRID, Spain. Mustela@pinar1.csic.es