Morphological phylogeny with phangorn

Load packages

The package phangorn (Schliep 2011) (with the dependency ape (Paradis and Schliep 2019)) was used for this analyses (starting from a nexus file). If you start with a .csv file, dplyr (Wickham et al. 2022) and tidyr (Wickham and Girlich 2022) can be used for data manipulation.
set.seed is used to get reproducible results.

#install.packages("ape")
#install.packages("phangorn")

# for the latest development version, you can use:
#remotes::install_github("KlausVigo/phangorn")

#install.packages("tidyr")
#install.packages("dplyr")
library(ape)
library(phangorn)
library(tidyr)
library(dplyr)
set.seed(102)

Load data

from a .csv file

morpho_data <- read.csv("Matrix.csv")

After reading in the .csv file, the data has to be changed into the right format to be read as phyDat by phangorn.

morpho_data$species <- gsub(" ", "_", morpho_data$species)

morpho_data <- morpho_data %>% 
  mutate_if(is.numeric, as.character) %>%
  pivot_longer(cols = -species, names_to = "trait", values_to = "value") %>%
  pivot_wider(names_from = species, values_from = value) %>%
  select(-trait)

from a .nex (nexus) file

morpho_data <- read.nexus.data("morpho_matrix_full_names.nex")

`phyDat` format for subsequent analyses

For morphometric data, type="USER" needs to be selected and a vector reflecting the data has to be given to levels. Here we have the states 0 to 8.

morpho_pd <- phyDat(morpho_data, type = "USER", levels=as.character(0:8), ambiguity="?")

Parsimony Trees

pratchet from phangorn uses the parsimony ratchet (Nixon 1999). all=TRUE gives back all possible trees with the same parsimony score. k (minimum number of iterations when no change in the parsimony score is detected) around 50 is sufficient for this analysis. For unordered data method="fitch" (the default) is much quicker and gives back the same results as method="sankoff". The following acctran assigns branch lengths to the trees.

tree_pars <- pratchet(morpho_pd, k = 50, minit = 1000,
                     maxit = 100000, all=TRUE, trace = 0)
tree_pars <- acctran(tree_pars, morpho_pd)
tree_pars

## 14 phylogenetic trees

There are 14 trees with lowest parsimony score.
The outgroup is Collohmannia johnstoni, so we root the trees accordingly.

rooted_tree <- root(tree_pars, "Collohmannia_johnstoni", resolve.root=TRUE, edgelabel=TRUE)

To balance out the root, we write a function and apply it to all trees in the multiPhylo object.

pretty_root <- function(x){
  root <-  getRoot(x)
  ind <- which(x$edge[,1] == root)
  el <- sum(x$edge.length[ind]) / 2
  x$edge.length[ind] <- el
  x$node.label[1] = ""
  x
} 

for (i in 1:length(rooted_tree)){
  rooted_tree[[i]] <- pretty_root(rooted_tree[[i]])
}

We can plot the first tree with the bootstrap values (rounded to two digits) as node labels.

par(mar = c(1,1,4,1))
plot(rooted_tree[[1]], type = "phy", edge.width=1) 
title(main="Parsimony tree")
nodelabels(round(as.numeric(rooted_tree[[1]]$node.label), digits = 2), frame = "none", adj=c(1,1), cex=.75)
add.scale.bar()

We could also plot all trees, to get a quick overview (in this case without the node labels):

par(mar = c(1,1,4,1))
plot(rooted_tree, type = "phy", cex=.5, edge.width=2)

For this paper, we wrote the trees in Newick-format to edit them in FigTree (Rambaut 2018). All trees can be written into one file.

write.tree(tree_pars, file = "ParsimonyTrees.trees")

Maximum Likelihood Trees

For maximum likelihood trees, the function pml_bb from phangorn can be used. For unordered morphometric data, model="ER+ASC" should be used.

ER: equal rates, all transitions occur at the same rate
ASC: ascertainment bias correction (Lewis 2001) corrects for the fact that there are no constant sites (as is mostly the case with morphometric data); otherwise the branch lengths may be overestimated

We calculate one tree with a strict clock (method="ultrametric") and one with method="unrooted". Ultrafast bootstrapping (Minh, Nguyen, and Haeseler 2013) values are calculated with this function automatically.

# strict clock
ml_ULTRA <- pml_bb(morpho_pd, model="ER+ASC", rearrangement = "stochastic",
                       method="ultrametric",
                       control=pml.control(trace = 0))

# unrooted
ml <- pml_bb(morpho_pd, model="ER+ASC", rearrangement = "stochastic",
                 control=pml.control(trace = 0))

If we only want the tree object, we can get it with:

# strict clock
tree_ml_ULTRA <- ml_ULTRA$tree

# unrooted
tree_ml <- ml$tree

Rooting (or in the case of the ultrametric tree: rerooting) the tree can again be done with root and our function pretty_root, with the outgroup Collohmannia johnstoni.

# strict clock
rooted_ml_ULTRA_tree <- root(tree_ml_ULTRA, "Collohmannia_johnstoni", resolve.root = TRUE,
                             edgelabel=TRUE)
rooted_ml_ULTRA_tree <- pretty_root(rooted_ml_ULTRA_tree)

# relaxed clock
rooted_ml_tree <- root(tree_ml, "Collohmannia_johnstoni", resolve.root = TRUE,
                             edgelabel=TRUE)
rooted_ml_tree <- pretty_root(rooted_ml_tree)

Then we can plot the two trees.

par(mar = c(1,1,4,1), mfrow=2:1)
plot(rooted_ml_ULTRA_tree, type = "phy", edge.width=1) 
title(main = "Strict clock (ultrametric)")
nodelabels(round(as.numeric(rooted_ml_ULTRA_tree$node.label), digits = 2), frame = "none", adj=c(1,1), cex=.75)
add.scale.bar()

plot(rooted_ml_ULTRA_tree, type = "phy", edge.width=1) 
title(main = "Unrooted method")
nodelabels(round(as.numeric(rooted_ml_ULTRA_tree$node.label), digits = 2), frame = "none", adj=c(1,1), cex=.75)
add.scale.bar()

For this paper, we also wrote those trees in Newick-format, to edit them in FigTree (Rambaut 2018).

write.tree(tree_ml_ULTRA, file = "MLTree_ULTRA.tree")
write.tree(tree_ml, file = "MLTree.tree")

Bayesian inference

The bayesian trees were inferred with BEAST2 (Drummond et al. 2012; Bouckaert et al. 2019) using the morph-models package (Lewis 2001). The nexus-file was loaded into BEAUti and most of the default settings were used. Chain length was set to 50 million, trace and tree log every 1000. The random seed was set to 102. Again, we used two clock methods:

strict clock (Zuckerkandl and Pauling 1965)
relaxed clock (Drummond et al. 2006)

The trace was later inspected in Tracer (Rambaut et al. 2018) and the .trees files were summarized using the TreeAnnotator tool (maximum clade credibility tree with 0.5 posterior probability limit, median height nodes and 25% burn-in). For visualising FigTree (Rambaut 2018) was used.

Session Info

## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8   
## [3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                   
## [5] LC_TIME=German_Germany.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dplyr_1.0.10    tidyr_1.2.1     phangorn_2.11.1 ape_5.6-2      
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.9       highr_0.9        pillar_1.8.1     bslib_0.4.0     
##  [5] compiler_4.2.2   jquerylib_0.1.4  tools_4.2.2      digest_0.6.29   
##  [9] tibble_3.1.8     jsonlite_1.8.2   evaluate_0.16    lifecycle_1.0.2 
## [13] nlme_3.1-160     lattice_0.20-45  pkgconfig_2.0.3  rlang_1.0.5     
## [17] Matrix_1.5-1     fastmatch_1.1-3  igraph_1.3.4     cli_3.4.0       
## [21] yaml_2.3.5       parallel_4.2.2   xfun_0.33        fastmap_1.1.0   
## [25] stringr_1.4.1    knitr_1.40       generics_0.1.3   sass_0.4.2      
## [29] vctrs_0.4.1      tidyselect_1.1.2 grid_4.2.2       glue_1.6.2      
## [33] R6_2.5.1         fansi_1.0.3      rmarkdown_2.17   purrr_0.3.4     
## [37] magrittr_2.0.3   ellipsis_0.3.2   codetools_0.2-18 htmltools_0.5.3 
## [41] quadprog_1.5-8   utf8_1.2.2       stringi_1.7.8    cachem_1.0.6

References

Bouckaert, Remco, Timothy G Vaughan, Joëlle Barido-Sottani, Sebastián Duchêne, Mathieu Fourment, Alexandra Gavryushkina, Joseph Heled, et al. 2019. “BEAST 2.5: An Advanced Software Platform for Bayesian Evolutionary Analysis.” PLoS Computational Biology 15 (4): e1006650.

Drummond, Alexei J, Simon Y W Ho, Matthew J Phillips, and Andrew Rambaut. 2006. “Relaxed Phylogenetics and Dating with Confidence.” PLoS Biology 4 (5): e88.

Drummond, Alexei J, Marc A Suchard, Dong Xie, and Andrew Rambaut. 2012. “Bayesian Phylogenetics with BEAUti and the BEAST 1.7.” Molecular Biology and Evolution 29 (8): 1969–73.

Lewis, Paul O. 2001. “A Likelihood Approach to Estimating Phylogeny from Discrete Morphological Character Data.” Systematic Biology 50 (6): 913–25.

Minh, Bui Quang, Minh Anh Thi Nguyen, and Arndt von Haeseler. 2013. “Ultrafast Approximation for Phylogenetic Bootstrap.” Molecular Biology and Evolution 30 (5): 1188–95.

Nixon, Kevin C. 1999. “The Parsimony Ratchet, a New Method for Rapid Parsimony Analysis.” Cladistics 15 (4): 407–14.

Paradis, E., and K. Schliep. 2019. “Ape 5.0: An Environment for Modern Phylogenetics and Evolutionary Analyses in R.” Bioinformatics 35: 526–28.

Rambaut, Andrew. 2018. FigTree V1.4.4. Edinburgh, Scotland: Institute of Evolutionary Biology, University of Edinburgh. http://tree.bio.ed.ac.uk/software/figtree/.

Rambaut, Andrew, Alexei J Drummond, Dong Xie, Guy Baele, and Marc A Suchard. 2018. “Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7.” Systematic Biology 67 (5): 901–4.

Schliep, K. P. 2011. “Phangorn: Phylogenetic Analysis in r.” Bioinformatics 27 (4): 592–93. https://doi.org/10.1093/bioinformatics/btq706.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2022. Dplyr: A Grammar of Data Manipulation.

Wickham, Hadley, and Maximilian Girlich. 2022. Tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr.

Zuckerkandl, Emile, and Linus Pauling. 1965. “Molecules as Documents of Evolutionary History.” Journal of Theoretical Biology 8 (2): 357–66.

Morphological phylogeny with `phangorn`

Iris Bardel-Kahr

Last compiled on 17 Februar, 2023

Load packages

Load data

from a .csv file

from a .nex (nexus) file

`phyDat` format for subsequent analyses

Parsimony Trees

Maximum Likelihood Trees

Bayesian inference

Session Info

References

Morphological phylogeny with phangorn

Iris Bardel-Kahr

Last compiled on 17 Februar, 2023

Load packages

Load data

from a .csv file

from a .nex (nexus) file

phyDat format for subsequent analyses

Parsimony Trees

Maximum Likelihood Trees

Bayesian inference

Session Info

References

Morphological phylogeny with `phangorn`

`phyDat` format for subsequent analyses