MOVE - Interactive mixed method parsimony
(c) Copyright 1986-1993 by Joseph Felsenstein and by the University of
Washington. Written by Joseph Felsenstein. Permission is granted to copy this
document provided that no fee is charged for it and that this copyright notice
is not removed.
MOVE is an interactive parsimony program, inspired by Wayne Maddison and
David Maddison's marvellous program MacClade, which is written for Apple
Macintosh computers. MOVE reads in a data set which is prepared in almost the
same format as one for the mixed method parsimony program MIX. It allows the
user to choose an initial tree, and displays this tree on the screen. The user
can look at different characters and the way their states are distributed on
that tree, given the most parsimonious reconstruction of state changes for that
particular tree. The user then can specify how the tree is to be rearraranged,
rerooted or written out to a file. By looking at different rearrangements of
the tree the user can manually search for the most parsimonious tree, and can
get a feel for how different characters are affected by changes in the tree
This program is compatible with fewer computer systems than the other
programs in PHYLIP. It can be adapted to PCDOS systems or to any system whose
screen or terminals emulate DEC VT52 or VT100 terminals (such as, for example,
Zenith Z19, Z29, and Z49 terminals, Telnet programs for logging in to remote
computers over a TCP/IP network, VT100-compatible windows in the X windowing
system, and any terminal compatible with ANSI standard terminals). For any
other screen types, there is a generic option which does not make use of screen
graphics characters to display the character states. This will be less
effective, as the states will be less easy to see when displayed.
The input data file is set up almost identically to the data files for
MIX. The sole exception is that the user trees are not contained in the input
file, but in the tree file, which is used both for input of the starting tree
(if desired) and for output of the final tree. Note that this means that the
user tree supplied on input will possibly be overwritten. The W (Weights), A
(Ancestors), M (Mixed methods) and F (Factors) are possible options specified
in the input file (some must also be chosen in the menu).
The user interaction starts with the program presenting a menu. The menu
looks like this:
Interactive mixed parsimony algorithm, version 3.4
Settings for this run:
X Use Mixed method? No
P Parsimony method? Wagner
A Use ancestral states in input file? No
O Outgroup root? No, use as outgroup species 1
T Use Threshold parsimony? No, use ordinary parsimony
U Initial tree (arbitrary, user, specify)? Arbitrary
0 Graphics type (IBM PC, VT52, ANSI)? ANSI
L Number of lines on screen? 24
Are these settings correct? (type Y or the letter for one to change)
The O (Outgroup), T (Threshold), and 0 (Graphics type) options are the usual
ones and are described in the main documentation file. The L option allows the
program to take advanatage of larger screens if available. The X (Mixed
method) option is described in the Discrete Characters Programs docmentation
file. It requires information on the parsimony methods for different
characters to be in the input file. The U (initial tree) option allows the
user to choose whether the initial tree is to be arbitrary, interactively
specified by the user, or read from a tree file. Typing U causes the program
to change among the three possibilities in turn. I would recommend that for a
first run, you allow the tree to be set up arbitrarily (the default), as the
"specify" choice is difficult to use and the "user tree" choice requires that
you have available a tree file with the tree topology of the initial tree. If
you wish to set up some particular tree you can also do that by the
rearrangement commands specified below. The T (threshold) option allows a
continuum of methods between parsimony and compatibility. Thresholds less than
or equal to 1.0 do not have any meaning and should not be used: they will
result in a tree dependent only on the input order of species and not at all on
After the initial menu is displayed and the choices are made, the program
then sets up an initial tree and displays it. Below it will be a one-line menu
of possible commands, which looks like this:
NEXT? (Options: R # + - S . T U W O F C H ? X Q) (H or ? for Help)
If you type H or ? you will get a single screen showing a description of each
of these commands in a few words. Here are slightly more detailed
R ("Rearrange"). This command asks for the number of a node which is to be
removed from the tree. It and everything to the right of it on the tree is
to be removed (by breaking the branch immediately below it). The command
also asks for the number of a node below which that group is to be
inserted. If an impossible number is given, the program refuses to carry
out the rearrangement and asks for a new command. The rearranged tree is
displayed: it will often have a different number of steps than the
original. If you wish to undo a rearrangement, use the Undo command, for
which see below.
# This command, and the +, - and S commands described below, determine which
character has its states displayed on the branches of the trees. The
initial tree displayed by the program does not show states of sites. When
# is typed, the program does not ask the user which character is to be
shown but automatically shows the states of the next binary character that
is not compatible with the tree (the next character that does not perfectly
fit the current tree). The search for this character "wraps around" so
that if it reaches the last character without finding one that is not
compatible with the tree, the search continues at the first character; if
no incompatible character is found the current character is shown, and if
no current character is shown then the first character is shown. The
display takes the form of different symbols or textures on the branches of
the tree. The state of each branch is actually the state of the node above
it. A key of the symbols or shadings used for states 0, 1 and ? are shown
next to the tree. State ? means that either state 0 or state 1 could exist
at that point on the tree, and that the user may want to consider the
different possibilities, which are usually apparent by inspection.
+ This command is the same as # except that it goes forward one character,
showing the states of the next character. If no character has been shown,
using + will cause the first character to be shown. Once the last
character has been reached, using + again will show the first character.
- This command is the same as + except that it goes backwards, showing the
states of the previous character. If no character has been shown, using -
will cause the last character to be shown. Once character number 1 has
been reached, using - again will show the last character.
S ("Show"). This command is the same as + and - except that it causes the
program to ask you for the number of a character. That character is the
one whose states will be displayed. If you give the character number as 0,
the program will go back to not showing the states of the characters.
. This command simply causes the current tree to be redisplayed. It is of
use when the tree has partly disappeared off of the top of the screen owing
to too many responses to commands being printed out at the bottom of the
T ("Try rearrangements"). This command asks for the name of a node. The
part of the tree at and above that node is removed from the tree. The
program tries to re-insert it in each possible location on the tree (this
may take some time, and the program reminds you to wait). Then it prints
out a summary. For each possible location the program prints out the
number of the node to the right of the place of insertion and the number of
steps required in each case. These are divided into those that are better,
tied, or worse than the current tree. Once this summary is printed out,
the group that was removed is inserted into its original position. It is
up to you to use the R command to actually carry out any the arrangements
that have been tried.
U ("Undo"). This command reverses the effect of the most recent
rearrangement, outgroup re-rooting, or flipping of branches. It returns to
the previous tree topology. It will be of great use when rearranging the
tree and when a rearrangement proves worse than the preceding one -- it
permits you to abandon the new one and return to the previous one without
remembering its topology in detail.
W ("Write"). This command writes out the current tree onto a tree output
file. If the file already has been written to by this run of MOVE, it will
ask you whether you want to replace the contents of the file, add the tree
to the end of the file, or not write out the tree to the file. The tree is
written in the standard format used by PHYLIP (a subset of the New
Hampshire standard). It is in the proper format to serve as the User-
Defined Tree for setting up the initial tree in a subsequent run of the
program. Note that if you provided the initial tree topology in a tree
file and replace its contents, that initial tree will be lost.
O ("Outgroup"). This asks for the number of a node which is to be the
outgroup. The tree will be redisplayed with that node as the left
descendant of the bottom fork. Under some options (for example the Camin-
Sokal parsimony method or the Ancestor state options), the number of steps
required on the tree may change on re-rooting. Note that it is possible to
use this to make a multi-species group the outgroup (i.e., you can give the
number of an interior node of the tree as the outgroup, and the program
will re-root the tree properly with that on the left of the bottom fork).
F ("Flip"). This asks for a node number and then flips the two branches at
that node, so that the left-right order of branches at that node is
changed. This does not actually change the tree topology (or the number of
steps on that tree) but it does change the appearance of the tree.
C ("Clade"). When the data consist of more than 12 species (or more than
half the number of lines on the screen if this is not 24), it may be
difficult to display the tree on one screen. In that case the tree will be
squeezed down to one line per species. This is too small to see all the
interior states of the tree. The C command instructs the program to print
out only that part of the tree (the "clade") from a certain node on up.
The program will prompt you for the number of this node. Remember that
thereafter you are not looking at the whole tree. To go back to looking at
the whole tree give the C command again and enter "0" for the node number
when asked. Most users will not want to use this option unless forced to.
H ("Help"). Prints a one-screen summary of what the commands do, a few
words for each command.
? ("?"). A synonym for H. Same as Help command.
X ("Exit"). Exit from program. If the current tree has not yet been saved
into a file, the program will ask you whether it should be saved.
Q ("Quit"). A synonym for X. Same as the eXit command.
In the input file the W (Weights) option is available, as usual. The A
(Ancestral states) and X (Mixed methods) also require the options to be
declared on the first line of the input file and information to be present in
the input file. If the Ancestral States option is invoked the A option must
also be chosen from the menu. Note that the X option requires that the option
M (Mixture) be declared and the mixture information provided in the input file.
This is admittedly confusing. The F (Factors) option is available in this
program. It is used to inform the program which groups of characters are to be
counted together in computing the number of characters compatible with the
tree. Thus if three binary characters are all factors of the same multistate
character, the multistate character will be counted as compatible with the tree
only if all three factors are compatible with it.
ADAPTING THE PROGRAM TO YOUR COMPUTER AND TO YOUR TERMINAL
As we have seen, the initial menu of the program allows you to choose
among four screen types (PCDOS, Ansi, VT52 and none). If you want to avoid
having to make this choice every time, you can change some of the constants at
the beginning of the program to have it initialize itself in the proper way,
and recompile it. Among the constants at the beginning of the program you will
find three that determine which kind of screen graphics the program will use.
These constants are "ibmpc0", "vt520", and "ansi0". In the distribution
version of the programs, "ansi0" is set to true and the others to false, so
that the version will work with ANSI compatible terminals.
On the other hand if you have a terminal compatible with DEC's VT52, but
not with the ANSI terminal, you should change the constant "ansi0" to false and
"vt520" to true. If you have instead a terminal which is compatible with IBM
PC graphics, you should set the constant "ibmpc0" to true and the others to
false. If your terminal is compatible with none of these, you will have to
set the constants all false, in which case special graphics characters will not
be used to indicate character states, but only "*" for 1, "=" for 0, and "."
for ? states. This is less easy to look at.
The program should work successfully on DEC VAX systems under either the
VMS or the Unix operating systems without any other changes.
MORE ABOUT THE PARSIMONY CRITERION
MOVE uses as its numerical criterion the Wagner and Camin-Sokal parsimony
methods in mixture, where each character can have its method specified
separately. The program defaults to carrying out Wagner parsimony.
The Camin-Sokal parsimony method explains the data by assuming that
changes 0 --> 1 are allowed but not changes 1 --> 0. Wagner parsimony allows
both kinds of changes. (This under the assumption that 0 is the ancestral
state, though the program allows reassignment of the ancestral state, in which
case we must reverse the state numbers 0 and 1 throughout this discussion).
The criterion is to find the tree which requires the minimum number of changes.
The Camin- Sokal method is due to Camin and Sokal (1965) and the Wagner method
to Eck and Dayhoff (1966) and to Kluge and Farris (1969).
Here are the assumptions of these two methods:
1. Ancestral states are known (Camin-Sokal) or unknown (Wagner).
2. Different characters evolve independently.
3. Different lineages evolve independently.
4. Changes 0 --> 1 are much more probable than changes 1 --> 0 (Camin-
Sokal) or equally probable (Wagner).
5. Both of these kinds of changes are a priori improbable over the
evolutionary time spans involved in the differentiation of the group in
6. Other kinds of evolutionary event such as retention of polymorphism
are far less probable than 0 --> 1 changes.
7. Rates of evolution in different lineages are sufficiently low that two
changes in a long segment of the tree are far less probable than one change in
a short segment.
That these are the assumptions of parsimony methods has been documented in
a series of papers of mine: (1973a, 1978b, 1979, 1981b, 1983b, 1988b). For an
opposing view arguing that the parsimony methods make no substantive
assumptions such as these, see the papers by Farris (1983) and Sober (1983a,
1983b), but also read the exchange between Felsenstein and Sober (1986).
At the beginning of the program are a series of constants, which can be
changed to help adapt the program to different computer systems. "nmlngth" is
the length of the species names. "screenlines" specifies the number of lines
per screen, which you will normally want to leave at its default value of 24.
I have already described the constants "ibmpc0", "vt520", and "ansi0" for
specifying the terminal type.
Below is a test data set, but we cannot show the output it generates
because of the interactive nature of the program.
-------------------------------TEST DATA SET----------------------------