version 3.5c

                       DNAPARS -- DNA Parsimony Program

(c) Copyright  1986-1993  by  Joseph  Felsenstein  and  by  the  University  of
Washington.  Written by Joseph Felsenstein.  Permission is granted to copy this
document provided that no fee is charged for it and that this copyright  notice
is not removed.

     This program carries out unrooted parsimony (analogous  to  Wagner  trees)
(Eck  and  Dayhoff, 1966; Kluge and Farris, 1969) on DNA sequences.  The method
of Fitch (1971) is used to count the number of changes  of  base  needed  on  a
given tree.  Other than that, the algorithm is a direct modification of program
WAGNER  (an  ancestor  of  MIX  which  was  formerly  in  this  package).   The
assumptions of this method are exactly analogous to those of MIX:

     1.  Each site evolves independently.

     2.  Different lineages evolve independently.

     3.  The probability of a base substitution at a given site is  small  over
the lengths of time involved in a branch of the phylogeny.

     4.  The expected amounts of change in different branches of the  phylogeny
do not vary by so much that two changes in a high-rate branch are more probable
than one change in a low-rate branch.

     5.  The expected amounts of change do not vary enough among sites that two
changes in one site are more probable than one change in another.

That these are the assumptions of parsimony methods has been  documented  in  a
series  of  papers  of mine: (1973a, 1978b, 1979, 1981b, 1983b, 1988b).  For an
opposing  view  arguing  that  the  parsimony  methods  make   no   substantive
assumptions  such  as  these, see the papers by Farris (1983) and Sober (1983a,
1983b, 1988), but also read the exchange between Felsenstein and Sober (1986).

     Change from an occupied site to a  deletion  is  counted  as  one  change.
Reversion from a deletion to an occupied site is allowed and is also counted as
one change.  Note that this in effect assumes that a deletion N bases long is N
separate events.

     The input data is standard.  The first line of the input file contains the
number  of  species  and  the  number of sites.  If the Weights option is being
used, there must also be a W in this first line to signal its presence.   There
are  only  two options requiring information to be present in the input file, W
(Weights) and U (User tree).  All options other than W are  invoked  using  the

     Next come the species data.  Each sequence starts on a  new  line,  has  a
ten-character  species  name  that  must  be blank-filled to be of that length,
followed immediately by the species data in the one-letter code.  The sequences
must  either  be  in the "interleaved" or "sequential" formats described in the
Molecular Sequence Programs document.  The I option selects between them.   The
sequences  can  have internal blanks in the sequence but there must be no extra
blanks at the end of the terminated line.  Note that a blank  is  not  a  valid
symbol for a deletion.

     The options are selected using an interactive menu.  The menu  looks  like


DNA parsimony algorithm, version 3.5c

Setting for this run:
  U                 Search for best tree?  Yes
  J   Randomize input order of sequences?  No. Use input order
  O                        Outgroup root?  No, use as outgroup species  1
  T              Use Threshold parsimony?  No, use ordinary parsimony
  M           Analyze multiple data sets?  No
  I          Input sequences interleaved?  Yes
  0   Terminal type (IBM PC, VT52, ANSI)?  ANSI
  1    Print out the data at start of run  No
  2  Print indications of progress of run  Yes
  3                        Print out tree  Yes
  4          Print out steps in each site  No
  5  Print sequences at all nodes of tree  No
  6       Write out trees onto tree file?  Yes

Are these settings correct? (type Y or the letter for one to change)

The user either types "Y" (followed, of course, by a  carriage-return)  if  the
settings  shown  are to be accepted, or the letter or digit corresponding to an
option that is to be changed.

     The options U, J, O, T, M, and 0 are the usual ones.  They  are  described
in  the  main  documentation  file of this package.  Option I is the same as in
other molecular sequence programs and is described in  the  documentation  file
for the sequence programs.

     The O (outgroup) option will have no effect if the U  (user-defined  tree)
option  is  in  effect.  The user trees (U option) should be rooted bifurcating
trees.  The  T  (threshold)  option  allows  a  continuum  of  methods  between
parsimony  and compatibility.  Thresholds less than or equal to 1.0 do not have
any meaning and should not be used: they will result in a tree  dependent  only
on the input order of species and not at all on the data!

     Output is standard: if option 1 is toggled on, the data  is  printed  out,
with  the  convention  that "." means "the same as in the first species".  Then
comes a list of equally parsimonious trees, and (if option 2 is toggled  on)  a
table  of the number of changes of state required in each character.  If option
5 is toggled on, a table is printed out  after  each  tree,  showing  for  each
branch whether there are known to be changes in the branch, and what the states
are inferred to have been at the top end of the branch.  If the inferred  state
is  a  "?" or one of the IUB ambiguity symbols, there will be multiple equally-
parsimonious assignments of states; the user must work these out for themselves
by  hand.   A  "?" in the reconstructed states means that in addition to one or
more bases, a deletion may or may not be present.  If option 6 is left  in  its
default  state the trees found will be written to a tree file, so that they are
available to be used in other programs.

     If the U (User Tree) option is used and more than one  tree  is  supplied,
the program also performs a statistical test of each of these trees against the
best tree.  This test, which  is  a  version  of  the  test  proposed  by  Alan
Templeton  (1983)  and  evaluated  in a test case by me (1985a).  It is closely
parallel to a test using log likelihood differences due to Kishino and Hasegawa
(1989), and uses the mean and variance of step differences between trees, taken
across sites.  If the mean is more than 1.96 standard deviations different then
the trees are declared significantly different.  The program prints out a table
of the steps for each tree, the differences of each  from  the  best  one,  the
variance  of  that quantity as determined by the step differences at individual
sites, and a conclusion as to whether that tree  is  or  is  not  significantly
worse than the best one.

     The program is a straightforward  relative  of  MIX  and  runs  reasonably
quickly, especially with many sites and few species.

--------------------------TEST DATA SET---------------------------------

   5   13

-------- CONTENTS OF OUTPUT FILE (if all numerical options are on) --------

DNA parsimony algorithm, version 3.5c

Name            Sequences
----            ---------

Beta         ..G..C.... ..C
Gamma        C.UU.C.U.. C.A
Delta        GGUA.UU.GG CC.
Epsilon      GGGA.CU.GG CCC

One most parsimonious tree found:

     +--3  +--Delta
     !  !
  +--2  +-----Gamma
  !  !
--1  +--------Beta

  remember: this is an unrooted tree!

requires a total of     19.000

 steps in each site:
         0   1   2   3   4   5   6   7   8   9
    0!       2   1   3   2   0   2   1   1   1
   10!   1   1   1   3

From    To     Any Steps?    State at upper node
                             ( . means same as in the node below it on tree)

          1                AABGTSGCCA AAY
   1      2        maybe   .....C.... ...
   2      3         yes    V.KD...... C..
   3      4         yes    GG.A..T.GG .C.

   4   Epsilon     maybe   ..G....... ..C
   4   Delta        yes    ..T..T.... ..T
   3   Gamma        yes    C.TT...T.. ..A
   2   Beta        maybe   ..G....... ..C
   1   Alpha       maybe   ..C..G.... ..T