phangorn
The package phangorn
(Schliep
2011) (with the dependency ape
(Paradis and Schliep 2019)) was used for this
analyses (starting from a nexus file). If you start
with a .csv file, dplyr
(Wickham et al. 2022) and tidyr
(Wickham and Girlich 2022) can be used for
data manipulation.
set.seed
is used to get reproducible results.
#install.packages("ape")
#install.packages("phangorn")
# for the latest development version, you can use:
#remotes::install_github("KlausVigo/phangorn")
#install.packages("tidyr")
#install.packages("dplyr")
library(ape)
library(phangorn)
library(tidyr)
library(dplyr)
set.seed(102)
morpho_data <- read.csv("Matrix.csv")
After reading in the .csv file, the data has to be
changed into the right format to be read as phyDat
by
phangorn
.
morpho_data$species <- gsub(" ", "_", morpho_data$species)
morpho_data <- morpho_data %>%
mutate_if(is.numeric, as.character) %>%
pivot_longer(cols = -species, names_to = "trait", values_to = "value") %>%
pivot_wider(names_from = species, values_from = value) %>%
select(-trait)
morpho_data <- read.nexus.data("morpho_matrix_full_names.nex")
phyDat
format for subsequent analysesFor morphometric data, type="USER"
needs to be selected
and a vector reflecting the data has to be given to levels
.
Here we have the states 0 to 8.
morpho_pd <- phyDat(morpho_data, type = "USER", levels=as.character(0:8), ambiguity="?")
pratchet
from phangorn uses the parsimony
ratchet (Nixon 1999).
all=TRUE
gives back all possible trees with the same
parsimony score. k
(minimum number of iterations when no
change in the parsimony score is detected) around 50 is
sufficient for this analysis. For unordered data
method="fitch"
(the default) is much quicker and gives back
the same results as method="sankoff"
. The following
acctran
assigns branch lengths to the trees.
tree_pars <- pratchet(morpho_pd, k = 50, minit = 1000,
maxit = 100000, all=TRUE, trace = 0)
tree_pars <- acctran(tree_pars, morpho_pd)
tree_pars
## 14 phylogenetic trees
There are 14 trees with lowest parsimony
score.
The outgroup is Collohmannia johnstoni, so we root the trees
accordingly.
rooted_tree <- root(tree_pars, "Collohmannia_johnstoni", resolve.root=TRUE, edgelabel=TRUE)
To balance out the root, we write a function and apply it to all
trees in the multiPhylo
object.
pretty_root <- function(x){
root <- getRoot(x)
ind <- which(x$edge[,1] == root)
el <- sum(x$edge.length[ind]) / 2
x$edge.length[ind] <- el
x$node.label[1] = ""
x
}
for (i in 1:length(rooted_tree)){
rooted_tree[[i]] <- pretty_root(rooted_tree[[i]])
}
We can plot the first tree with the bootstrap values (rounded to two digits) as node labels.
par(mar = c(1,1,4,1))
plot(rooted_tree[[1]], type = "phy", edge.width=1)
title(main="Parsimony tree")
nodelabels(round(as.numeric(rooted_tree[[1]]$node.label), digits = 2), frame = "none", adj=c(1,1), cex=.75)
add.scale.bar()
We could also plot all trees, to get a quick overview (in this case without the node labels):
par(mar = c(1,1,4,1))
plot(rooted_tree, type = "phy", cex=.5, edge.width=2)
For this paper, we wrote the trees in Newick-format to edit them in FigTree (Rambaut 2018). All trees can be written into one file.
write.tree(tree_pars, file = "ParsimonyTrees.trees")
For maximum likelihood trees, the function pml_bb
from
phangorn can be used. For unordered morphometric
data, model="ER+ASC"
should be used.
We calculate one tree with a strict clock
(method="ultrametric"
) and one with
method="unrooted"
. Ultrafast bootstrapping (Minh, Nguyen, and Haeseler 2013) values are
calculated with this function automatically.
# strict clock
ml_ULTRA <- pml_bb(morpho_pd, model="ER+ASC", rearrangement = "stochastic",
method="ultrametric",
control=pml.control(trace = 0))
# unrooted
ml <- pml_bb(morpho_pd, model="ER+ASC", rearrangement = "stochastic",
control=pml.control(trace = 0))
If we only want the tree object, we can get it with:
# strict clock
tree_ml_ULTRA <- ml_ULTRA$tree
# unrooted
tree_ml <- ml$tree
Rooting (or in the case of the ultrametric tree: rerooting) the tree
can again be done with root
and our function
pretty_root
, with the outgroup Collohmannia
johnstoni.
# strict clock
rooted_ml_ULTRA_tree <- root(tree_ml_ULTRA, "Collohmannia_johnstoni", resolve.root = TRUE,
edgelabel=TRUE)
rooted_ml_ULTRA_tree <- pretty_root(rooted_ml_ULTRA_tree)
# relaxed clock
rooted_ml_tree <- root(tree_ml, "Collohmannia_johnstoni", resolve.root = TRUE,
edgelabel=TRUE)
rooted_ml_tree <- pretty_root(rooted_ml_tree)
Then we can plot the two trees.
par(mar = c(1,1,4,1), mfrow=2:1)
plot(rooted_ml_ULTRA_tree, type = "phy", edge.width=1)
title(main = "Strict clock (ultrametric)")
nodelabels(round(as.numeric(rooted_ml_ULTRA_tree$node.label), digits = 2), frame = "none", adj=c(1,1), cex=.75)
add.scale.bar()
plot(rooted_ml_ULTRA_tree, type = "phy", edge.width=1)
title(main = "Unrooted method")
nodelabels(round(as.numeric(rooted_ml_ULTRA_tree$node.label), digits = 2), frame = "none", adj=c(1,1), cex=.75)
add.scale.bar()
For this paper, we also wrote those trees in Newick-format, to edit them in FigTree (Rambaut 2018).
write.tree(tree_ml_ULTRA, file = "MLTree_ULTRA.tree")
write.tree(tree_ml, file = "MLTree.tree")
The bayesian trees were inferred with BEAST2 (Drummond et al. 2012; Bouckaert et al. 2019) using the morph-models package (Lewis 2001). The nexus-file was loaded into BEAUti and most of the default settings were used. Chain length was set to 50 million, trace and tree log every 1000. The random seed was set to 102. Again, we used two clock methods:
The trace was later inspected in Tracer (Rambaut et al. 2018) and the .trees
files were summarized using the TreeAnnotator tool (maximum
clade credibility tree with 0.5 posterior probability limit, median
height nodes and 25% burn-in). For visualising FigTree (Rambaut 2018) was used.
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8
## [3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C
## [5] LC_TIME=German_Germany.utf8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_1.0.10 tidyr_1.2.1 phangorn_2.11.1 ape_5.6-2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.9 highr_0.9 pillar_1.8.1 bslib_0.4.0
## [5] compiler_4.2.2 jquerylib_0.1.4 tools_4.2.2 digest_0.6.29
## [9] tibble_3.1.8 jsonlite_1.8.2 evaluate_0.16 lifecycle_1.0.2
## [13] nlme_3.1-160 lattice_0.20-45 pkgconfig_2.0.3 rlang_1.0.5
## [17] Matrix_1.5-1 fastmatch_1.1-3 igraph_1.3.4 cli_3.4.0
## [21] yaml_2.3.5 parallel_4.2.2 xfun_0.33 fastmap_1.1.0
## [25] stringr_1.4.1 knitr_1.40 generics_0.1.3 sass_0.4.2
## [29] vctrs_0.4.1 tidyselect_1.1.2 grid_4.2.2 glue_1.6.2
## [33] R6_2.5.1 fansi_1.0.3 rmarkdown_2.17 purrr_0.3.4
## [37] magrittr_2.0.3 ellipsis_0.3.2 codetools_0.2-18 htmltools_0.5.3
## [41] quadprog_1.5-8 utf8_1.2.2 stringi_1.7.8 cachem_1.0.6