TPSREGR - Thin-plate splines regression analysis

F. James Rohlf
6 July 1993

Department of Ecology and Evolution
State University of New York at Stony Brook
Stony Brook, NY 11794-5245
Phone: 631-632-8580
f.james.rohlf at stonybrook.edu

The purpose of this program is to regress the shape of a collection of specimens (captured as coordinates of landmarks) onto an independent variable. The independent variable might be size in a study of allometry or it could be longitude, temperature, or any other variable of interest. The program regresses the partial warp scores (the weight matrix of partial warp scores) onto the independent variable and then plots a thin-plate spline as a function of the independent variable so that one can see the shape change associated with larger or smaller values of the independent variable. An alternative is to read in a vector giving an explicit linear combination to be used.

The reference configuration must be supplied as a file (it can be computed by the GRF, GRF_ND, or TPSRW programs). Either the raw coordinate data (usually the most convenient way to use the program) or files giving the weight matrix, principal warps and their eigenvalues must be provided. The regression makes most sense if the reference configuration corresponds to the average of the independent variable.

There are two versions of the program: one for DOS real mode and another for DOS protected mode (DPMI). Their use is identical except that the protected mode version requires that the RTM.EXE program be present (it will be loaded automatically in order to switch TPSREGR into protected mode). Unless you are using software that provides DPMI services (e.g., Windows, 386Max, OS/2) the file DPMI16BI.OVL must also be present. The protected mode version requires an 80386 or 80486 computer and is able to use both ordinary RAM and extended memory so that larger datasets can be processed. It does not make use of overlay files.
At present the DOS version can handle a maximum of 500 specimens and 100 landmark points. The DPMI version can handle 2000 specimens and 200 landmarks. No attempt was made to push these limits to their maxima. Please contact me if these limitations are a problem.

Note: this version requires a new set of BGI graphics driver files.

The program is still under development -- please be patient!

To use the program:

1. Type its name at the DOS prompt:

      > TPSREGR

2. A menu will be displayed. The legal options at a given time will be shown highlighted.

3. First choose option 1 to specify the name of either the raw data file or the file containing the weight matrix (computed, for example, by the TPSRW program). If you provide a name for the data file then the program will not ask for the name of the weight matrix file. To provide the weight matrix, leave the name of the data file blank.

   If you supply the name of a data file then you will be asked whether you would like the x,y-projections of the uniform component to be added to the weight matrix computed by the program. This is of interest if you would like to see the extent to which the uniform component can be predicted by the independent variable.

   If you supply a weight matrix (such as output by the TPSRW program), you will be asked whether the "retain affine" option was used. This is so the program can ignore the additional 6 columns added to the matrix. While regression of all the affine parameters might sometimes be of interest, it is not obvious how their effect should be plotted so they are ignored for now.

   All files must be in NTSYS-pc compatible formats.

   "NTSYSpc" format: The format is the same as used by the Fourier program in NTSYSpc. There can be comment lines, followed by a matrix header line, possibly followed by label lines, and finally followed by x,y-coordinates as in the "matrix" format described above.

      " fake data for 4 specimens (identical) with p=5 landmarks
      1 4 10 0
      1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9 0.0
      1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9 0.0
      1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9 0.0
      1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9 0.0

   The program will ask for a name to be given to an output listing file. Various numerical results will be stored in this file. If a file already exists with the name you specify you will be asked whether to overwrite (and hence destroy) the old file or to append the new information to the end of the old file.

   Finally, the program will ask for the name of a file giving a list of pairs of landmarks to connect in output plots. This can sometimes make the plots easier to visualize. This input is optional. The input format is that for a graph matrix in NTSYS-pc. An example is as follows for 5 landmarks and 3 links (or edges of a graph) to be drawn. The value for the length of each edge does not matter but it must be provided.

      " Example of link matrix (type=7) for 5 landmarks & 3 edges
      " to be shown.
      7 5 3 0
      1 3 0.0
      3 4 0.0
      1 4 0.0

4. Next, select option 2 to specify the reference configuration. It must be provided as an NTSYS-pc compatible file. An average consensus configuration can be computed by the TPSRW and GRF_ND programs. The reference file can be dimensions x landmarks, landmarks x dimensions, or strung out as a single array in the order: 1x, 1y, 2x, 2y, etc. for a total of 2p elements.

   If a weight matrix was provided (rather than a data matrix) then the program will ask what value was used for the exponential weight, alpha, in computing the weight matrix. The published accounts of relative warp analysis (Bookstein, 1991, or PMMW pp. 246-248) are equivalent to alpha = 1 (which corresponds to an inverse bending energy metric). It does not matter (for this program) what value of alpha was used.

   The reference configuration should correspond to a specimen that is average for the population.

5. Next choose option 3 or 4 to either read in the file containing the independent variable (the program will estimate the relationship with the partial warps by using regression) or to read a vector giving the weights for each partial warp (perhaps a discriminant function vector). In both cases the vector must be an NTSYS-pc file with only 1 row (or column) containing values. For option 3, its length must be the number of objects (specimens) in the data file. For option 4 the length must be equal to 2(p-3) if the uniform component is not added or 2(p-3)+2 if the uniform component is added to the weight matrix, where p = number of landmarks.

6. Next choose option 5 to perform the computations.

   If an independent variable was read and the sample size is larger than the number of parameters, then you will be given a choice of whether to:

      (L) regress W on X using least-squares (the usual case),
      (M) use major axis regression (PCA, see Biometry section 15.7), or
      (I) use a multiple regression of X on W (but then the estimated relation has to be inverted, see the next paragraph).

   Enter the letter (L, M, or I) for your choice. Note: multiple regression cannot be used unless there are more observations than parameters being estimated. The menu choice will not be shown if that method cannot be used.

   The "inverse" is computed as follows. First, the usual regression is made of the variables in the weight matrix (W) onto the independent variable. Interpret the coefficients (ignoring the intercept) as defining a gradient vector through the centroid of the space (it points in the direction in which the fitted hyperplane is steepest). Project the points onto this vector to determine the relationship between it and the independent variable. Use that relationship to predict the values in the weight matrix as a function of the independent variable. This often does not work very well. You may wish to try it just to obtain a significance test for the relationship between shape and the independent variable.
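As a rough illustration of choice (L) above, the following Python sketch regresses each column of a weight matrix W onto an independent variable x by ordinary least squares and then predicts the scores at a chosen value of x. It is not the program's own code: the function names, the list-of-rows layout for W, and the toy numbers are all assumptions made for the example.

```python
def regress_scores_on_x(W, x):
    """Ordinary least-squares fit of each column of W on x.

    W is a list of rows (one row of scores per specimen) and x holds one
    value of the independent variable per specimen. Both the layout and
    the function name are hypothetical, chosen only for this sketch.
    Returns (intercepts, slopes) with one entry per column of W.
    """
    n = len(x)
    if n != len(W) or n < 2:
        raise ValueError("need one x value per specimen and at least 2 specimens")
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the independent variable has no variation")
    intercepts, slopes = [], []
    for j in range(len(W[0])):
        col = [row[j] for row in W]
        w_mean = sum(col) / n
        # Sum of cross-products between x and this column of scores
        sxw = sum((xi - x_mean) * (wi - w_mean) for xi, wi in zip(x, col))
        slope = sxw / sxx
        slopes.append(slope)
        intercepts.append(w_mean - slope * x_mean)
    return intercepts, slopes


def predict_scores(intercepts, slopes, x_value):
    """Predicted score for each column of W at a given value of x."""
    return [a + b * x_value for a, b in zip(intercepts, slopes)]


# Toy data (made up): 4 specimens, 2 columns of scores
x = [1.0, 2.0, 3.0, 4.0]
W = [[0.5, -1.0], [1.0, -2.0], [1.5, -3.0], [2.0, -4.0]]
intercepts, slopes = regress_scores_on_x(W, x)
predicted = predict_scores(intercepts, slopes, 5.0)
```

The predicted scores are the kind of values that would then be used to weight the warps when drawing the spline for a larger or smaller value of the independent variable.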
   If an independent vector was read, then you will be given two choices: (P) project the objects onto the vector and then use least-squares to regress the partial warps onto the projection or (W) use the coefficients as is to weight the principal warps. Enter the letter (P or W) for your choice. Use (W) to visualize a particular warp or combination of warps and uniform component. The plot produced when the (P) option is used often does not correspond to the particular warp you tried to select since other partial warp scores may be highly correlated with it in your sample of specimens. With the (W) option the scale is arbitrary. Be prepared to press the "+" or "-" keys many times in order to view the plot.

   The numerical results will be written to the listing file. Messages will be displayed on the screen showing progress through the computations. If the program runs out of memory you may only get a message that says "Out of memory!".

7. The last choice, "C", should be specified before you try to get hardcopy of the plots shown in the other menu choices. See the section "Graphics hardcopy" below for more information.

8. Next you can choose any of the plotting options (options 6 - 7). See below for information about each type of plot.

6 - Plot partial warp scores against the independent variable

This menu item plots the partial warp scores for each specimen against the independent variable. If the uniform components are included then they can also be plotted (they are placed at the end). Note: this option cannot be selected if an independent vector is used.

1. Press the "+" and "-" keys to cycle through the partial warps.

2. Press "L" to toggle the labelling of the specimens.

3. Press "P" for graphics hardcopy.

4. Press the "ESC" key to exit.

7 - Plot regression as a spline

This plot shows the thin-plate spline. It will cycle through displays to give an animated display of the spline being deformed for larger and then smaller values of the independent variable.

1. You can select the magnitude of the range of the independent variable by pressing the "M" key followed by pressing the "+" and "-" keys. The value of X is displayed. When the correlations with the independent variable are very small you will have to greatly enlarge the range in order to see any effect. This is the default mode. The initial displayed range is half of the observed range. Pressing "+" will double this range.

2. If the uniform components are included then you can press the "U" key to toggle their contributions off and on. Likewise, you can press the "N" key to toggle the display of the nonuniform (local deformation) components off and on. A message will be displayed at the upper left of the screen to indicate their current status.

3. If you press "C" the landmarks for each specimen will be connected by a series of lines in the order in which the landmarks were entered. Press "C" again to turn this display off.

4. Press the "L" key to display labels for the points. Press it again to turn off their display.

5. Press the "V" key to display displacement vectors. Press it again to turn them off. These vectors are plotted only on the untransformed grid. The end points of the vectors are the locations of each point after the transformation that is about to be applied. The vectors are usually similar to the relative warp loading vectors. Sometimes they are quite different.

6. To print a copy of the graphics screen press "P". The program will then prompt you to press either the "+" or the "-" key. This allows you to specify whether to output the spline based on the positive or negative warping of the space. The screen will clear until the plot is complete.

7. Press the "E" key to toggle the plot of the line segments between pairs of landmarks on and off. This uses the link file if present.

8. Press the "R" key to reset the display back to the default.

9. For fast computers you can press the "D" key followed by the "+" key to increment (by 0.5 sec) the delay between successive displays. If you increase it too much you can decrease the delay by pressing the "-" key. The computer will beep if you attempt to reduce the delay below 0 (you cannot speed up the computations by having a negative delay!).

10. Press the ESC key to exit.

----------------------------------------------------------------
Configuration for graphics hardcopy

A window will be displayed that lists the various devices and their modes. Another window will then be displayed that asks for a device or file name. If you would like the output written to a file for later use then enter a valid file name. The name should be short to allow for the fact that the program will append a number so that each picture can be stored in a separate file.

To have the output sent directly to a printer attached to a printer port enter LPT1 or LPT2. For output directly to a printer or a plotter attached to a serial port enter COM1 or COM2. In the latter case you must also specify the baud rate, parity, number of data bits, and whether or not to use XON/XOFF protocol. The available baud rates are: 300, 1200, 2400, 4800, and 9600. Parity can be N (none), E (even), or O (odd). The number of data bits can be 7 or 8. Use the symbol "X" to indicate XON/XOFF. These codes are entered after the port name. For example, for a plotter attached to COM1 and working with 2400 baud, no parity, 8 data bits, and using XON/XOFF enter the following:

      COM1,2400,N,8,X

The following printers are supported: Epson 9-pin printers (including Epson FX and MX, IBM Graphics Printer and Proprinter, and Panasonic and OkiData ["native" or with Epson or IBM emulation]), Epson 24-pin printers (includes Epson LQ, NEC Pinwriter, and Panasonic printers with Epson emulation), and Toshiba P321 24-pin printer. The Epson 9-pin and 24-pin color dot matrix printers are supported.
The HP LaserJet (all models), HP DeskJet (all models), and Canon LBP-8 laser and inkjet printers are supported.

The following plotters are supported: HP7470, HP7475, and HP7585. Many other plotters are compatible with these plotters.

If the plotting information is written to a file it can be read by many word processors, desktop publishing programs, and by graphics programs. In addition, you can select output formats of CGM, GEM IMG, PCX, WordPerfect WPG, and TIFF (both compressed and uncompressed). MS Windows bitmap files (BMP) are also supported. These are useful in order to import the graphics into various desktop publishing and "paint" programs where you can add annotations, delete unwanted details, etc.

BGI files

These are the files that provide the graphics support to the program. You only need to have the BGI files on your disk for the devices you expect to use. If you do not have the proper graphics BGI file you will not be able to see a plot on the screen. If the proper BGI file for graphics hardcopy is not present the program will exit back to the main menu without any error message. The correspondence between BGI files and devices is given below.

Graphics adapters:

   _CANON.BGI    Canon LBP-8 printer
   _CFX.BGI      9-pin color dot matrix
   _CLQ.BGI      24-pin color dot matrix
   _DIC.BGI      Kodak Diconix printer
   _DJ.BGI       HP DeskJet printer
   _DJC.BGI      HP Color DeskJet printer
   _DMPL.BGI     DM/PL plotters
   _FX.BGI       Epson 9-pin printers
   _HP7470.BGI   HP7470 plotter
   _HP7475.BGI   HP7475 plotter
   _HP7550.BGI   HP7550 plotter
   _HP7585.BGI   HP7585 plotter
   _LQ.BGI       Epson 24-pin printers
   _LJ.BGI       HP LaserJet printer
   _LJ3R.BGI     HP LaserJet III printer
   _OKI92.BGI    Okidata 92 native mode
   _PJET.BGI     HP PaintJet
   _PP24.BGI     24-pin dot matrix
   _TJ.BGI       HP ThinkJet printer

For the above devices you will need to know how each is attached to your computer (printer or serial port). In the case of a serial port you will also need to know the baud rate, parity, number of data bits, and whether the XON/XOFF protocol is used.
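To make the device-name conventions concrete, here is a small Python sketch that checks a device or file name such as COM1,2400,N,8,X against the rules stated in the hardcopy configuration section (ports, baud rates, parity codes, data bits, and the "X" flag). It is only an illustration: the function name and the returned dictionary are hypothetical, and treating a missing "X" as "XON/XOFF off" is an assumption, since the program's actual parsing is not documented here.

```python
VALID_BAUD = {300, 1200, 2400, 4800, 9600}
VALID_PARITY = {"N", "E", "O"}
VALID_DATA_BITS = {7, 8}


def parse_device_spec(spec):
    """Classify a hardcopy destination as a parallel port, serial port, or file.

    Hypothetical helper for illustration only; not part of TPSREGR.
    """
    parts = [p.strip().upper() for p in spec.split(",")]
    name = parts[0]
    if name in ("LPT1", "LPT2"):
        if len(parts) != 1:
            raise ValueError("printer ports take no extra settings")
        return {"kind": "parallel", "port": name}
    if name in ("COM1", "COM2"):
        if len(parts) not in (4, 5):
            raise ValueError("expected PORT,BAUD,PARITY,BITS[,X]")
        baud = int(parts[1])
        parity = parts[2]
        data_bits = int(parts[3])
        # Assumption: XON/XOFF is off unless a trailing "X" is given
        xon_xoff = len(parts) == 5
        if xon_xoff and parts[4] != "X":
            raise ValueError("the only flag allowed after the data bits is X")
        if baud not in VALID_BAUD:
            raise ValueError("unsupported baud rate: %d" % baud)
        if parity not in VALID_PARITY:
            raise ValueError("parity must be N, E, or O")
        if data_bits not in VALID_DATA_BITS:
            raise ValueError("data bits must be 7 or 8")
        return {"kind": "serial", "port": name, "baud": baud,
                "parity": parity, "data_bits": data_bits,
                "xon_xoff": xon_xoff}
    # Anything else is treated as an output file name
    return {"kind": "file", "name": spec.strip()}


plotter = parse_device_spec("COM1,2400,N,8,X")
printer = parse_device_spec("LPT1")
to_file = parse_device_spec("RATPLOT")
```

Here RATPLOT is a made-up file name; any short, valid DOS file name would be classified the same way.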
Graphics file formats:

   _AI.BGI       Adobe Illustrator Postscript
   _BMP.BGI      MS Windows bitmap files
   _CGM.BGI      CGM files
   _DXF.BGI      AutoCad
   _IMG.BGI      GEM IMG files
   _PCX.BGI      PCX paint file format
   _TIFF.BGI     Compressed TIFF format
   _UTIFF.BGI    Uncompressed TIFF format
   _WPG.BGI      WordPerfect WPG files

The BGI files whose names begin with "_" are part of the GRAF/DRIVE package from Flemming Software, as is the GCOPY.EXE program that can be used to copy graphic files to a printer or plotter. Type GCOPY and instructions will be displayed.

----------------------------------------------------------------
Sample data files

A set of data files is provided as an example. They are the rat calvarial growth dataset described on pages 408-414 in the "orange book" (Bookstein, 1991). The file RATS.NTS contains the x,y-coordinates in NTSYS-pc compatible format. Each specimen is a row of the matrix and the 16 columns correspond to the x and y coordinates for the 8 landmarks. The RATS.REF file contains coordinates that can be used as a reference and the RATS.SIZ file contains the centroid size of each specimen. The file RATS.LNK is an example of a matrix of edge links (not actually needed for this dataset since the landmarks were digitized in a logical order so that the "Connect" option will link them automatically).

The file RATS.V1 provides an example of an independent vector that can be used to display the first principal warp. It is of length 10, so it assumes that the uniform components are not added. Use option (W) when computing the regression.

Note: the specimens are ordered as in Bookstein (1991), specimen 1 ages 7 to 150 days, then specimen 2 ages 7 to 150 days, etc.

----------------------------------------------------------------
Output listing

The program produces a rather modest listing file containing printouts of some of the matrices involved. Of most interest will be the listing giving the mean partial warp scores and the regression coefficients on the independent variable.
It may be of interest to discover which principal warps are most correlated with the independent variable. If the first few warps are highly correlated then the effect of the independent variable is localized. If only the last few partial warps are highly correlated then there are large-scale effects. The uniform components have very large correlations with the independent variable.

As one might expect, you may find that the three different regression techniques yield quite different coefficients. When the correlations are all very low you may have to greatly magnify the plot of the splines in order to see anything happening (if the correlations were all equal to zero then no amount of magnification would show any "action" in the plot).

----------------------------------------------------------------
Changes from previous version

7/6/93   Recompiled for BP7 and added support for DOS DPMI mode. Increased the size of datasets that could be processed.