Divergence-time estimation based on molecular phylogenies and the fossil record has provided insights into fundamental questions of evolutionary biology. In Bayesian node dating, phylogenies are commonly time calibrated by specifying calibration densities on nodes that represent clades with known fossil occurrences. Unfortunately, the optimal shape of these calibration densities is usually unknown, so they are often chosen arbitrarily, which directly affects the reliability of the resulting age estimates. In addition, alternative approaches such as the fossilized birth-death (FBD) model and total-evidence dating assume that the fossil records of all clades in the phylogeny are the product of the same underlying fossil sampling rate, even though this rate has been shown to differ strongly between higher level taxa.
We here develop a flexible new approach to Bayesian age estimation that combines advantages of node dating and the FBD model. In our new approach, calibration densities are defined on the basis of first fossil occurrences and sampling rate estimates that can be specified separately for all clades. We verify our approach with a large number of simulated data sets, and compare its performance to that of the FBD model.
We find that our approach produces reliable age estimates that are robust to model violation, on par with the FBD model. By applying our approach to a large data set including sequence data from over species of teleost fishes as well as carefully selected fossil constraints, we recover a timeline of teleost diversification that is incompatible with previously assumed vicariant divergences of freshwater fishes.
Our results instead provide strong evidence for transoceanic dispersal of cichlids and other groups of teleost fishes.

In phylogenetic analyses, molecular sequence data are commonly used to infer not only the relationships between species, but also the divergence times between them.
Evidence for the existence of molecular clocks was initially derived from relative rate tests Sarich and Wilson and has since been corroborated by a large body of literature. However, it has been shown that the rate of the molecular clock often differs between lineages Drummond et al. To allow the estimation of absolute divergence dates from sequence data, a calibration of the rate of the molecular clock is required.
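The need for calibration follows from simple arithmetic. Under a strict clock, a pairwise distance only fixes the product of rate and time. The minimal sketch below assumes a single strict rate shared by both lineages. The function names and numbers are hypothetical and are chosen only to show that an external time estimate is needed to recover the rate.

```python
def divergence_time(distance, rate):
    """Time since two lineages split under a strict molecular clock.

    distance: pairwise genetic distance (substitutions per site)
    rate: substitution rate per site per time unit along each lineage
    Both lineages accumulate substitutions after the split, hence the factor 2.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


def clock_rate(distance, calibration_time):
    """Rate implied by a pairwise distance and an externally known split time."""
    if calibration_time <= 0:
        raise ValueError("calibration_time must be positive")
    return distance / (2.0 * calibration_time)


# Hypothetical calibration: a split known to be 20 time units old, distance 0.04
rate = clock_rate(0.04, 20.0)  # 0.001 substitutions per site per time unit
# The calibrated rate can then date another split with distance 0.01
t_other = divergence_time(0.01, rate)
```

Without the calibration, a distance of 0.04 fits rate 0.001 with time 20 just as well as rate 0.002 with time 10.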
This calibration can be obtained from serially sampled DNA sequences, if the range of sampling times is wide enough to allow accumulation of substantial genetic differences between the first and last sampling event Drummond et al.
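For serially sampled sequences, a common exploratory check is a root-to-tip regression: genetic distance from the root is plotted against sampling time. The following is a minimal least-squares sketch under a strict-clock assumption, with hypothetical data. It is not the Bayesian tip-dating machinery used in programs such as BEAST.

```python
def root_to_tip_regression(sampling_times, root_to_tip_distances):
    """Least-squares fit of root-to-tip distance against sampling time.

    Under a strict clock, distance = rate * (sampling_time - root_time), so the
    slope estimates the rate and the x-intercept estimates the root time.
    """
    n = len(sampling_times)
    if n < 2 or n != len(root_to_tip_distances):
        raise ValueError("need at least two paired observations")
    mean_t = sum(sampling_times) / n
    mean_d = sum(root_to_tip_distances) / n
    sxx = sum((t - mean_t) ** 2 for t in sampling_times)
    if sxx == 0:
        raise ValueError("sampling times must span a nonzero range")
    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(sampling_times, root_to_tip_distances))
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal in the data")
    intercept = mean_d - rate * mean_t
    root_time = -intercept / rate  # time at which the fitted line reaches zero distance
    return rate, root_time


# Hypothetical data: four samples collected over 15 years
times = [2000.0, 2005.0, 2010.0, 2015.0]
distances = [0.020, 0.030, 0.040, 0.050]
rate, root_time = root_to_tip_regression(times, distances)
```

The denominator `sxx` grows with the spread of sampling times. This is one way to see why the sampling interval must be wide enough for substantial genetic differences to accumulate.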
This is often the case for rapidly evolving viruses Smith et al. However, for macroevolutionary studies aiming to estimate divergence times on the order of tens or hundreds of millions of years, other sources of information are required. The fossil record is the usual source of such information. Because of its incompleteness, however, the origin of a clade will almost always predate the preservation of its oldest known fossil. As a result, fossils can provide absolute minimum clade ages, but are usually less informative regarding the maximum ages of clades Benton and Donoghue. In node dating, this information is therefore incorporated through calibration densities, which are probability distributions placed on the ages of calibrated nodes. Unfortunately, the optimal parameterization of these distributions is usually unknown but has been shown to have a strong influence on the resulting age estimates Ho and Phillips. In addition, the effect of inaccurate calibration densities can only partially be corrected with larger molecular data sets Yang and Rannala. Other shortcomings of node dating have also been identified.
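The sensitivity to parameterization can be made concrete with one commonly used form of calibration density, an offset exponential. Here the fossil age is a hard minimum and the mean excess age is a free choice. The sketch below is illustrative only. The fossil age and both mean values are hypothetical and are not taken from any analysis in this study.

```python
import math


def offset_exponential_pdf(age, fossil_age, mean_excess):
    """Calibration density for a node age (time before present).

    The fossil age acts as a hard minimum; mean_excess is the expected time
    between the node and the fossil, a parameter that is often chosen without
    strong justification.
    """
    if mean_excess <= 0:
        raise ValueError("mean_excess must be positive")
    if age < fossil_age:
        return 0.0  # a clade cannot be younger than its oldest fossil
    rate = 1.0 / mean_excess
    return rate * math.exp(-rate * (age - fossil_age))


def prob_older_than(age, fossil_age, mean_excess):
    """Prior probability that the node is older than `age`."""
    if age <= fossil_age:
        return 1.0
    return math.exp(-(age - fossil_age) / mean_excess)


# Same hypothetical fossil (50 time units old), two arbitrary choices of mean_excess
for mean_excess in (5.0, 20.0):
    print(mean_excess, round(prob_older_than(60.0, 50.0, mean_excess), 3))
```

With the same fossil, the prior probability that the node is more than 10 time units older than the fossil is about 0.14 for one choice and about 0.61 for the other.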
As described by Heled and Drummond, calibration densities interact with each other and with the tree prior to produce marginal prior distributions of node ages that may differ substantially, and in unpredictable ways, from the specified calibration densities. The application of recently introduced calibrated tree priors can compensate for this effect, but becomes computationally expensive when more than a handful of calibrations are used in the analysis Heled and Drummond. Node dating has also been criticized for ignoring most of the information from the fossil record, as only the oldest known fossils of each clade are used to define calibration densities Ronquist et al.
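A deliberately simplified example shows the interaction between calibration densities. It uses two nested nodes, each with an offset-exponential calibration, and no tree prior beyond the requirement that the parent be older than the child. This minimal sketch draws both ages independently and rejects draws that violate the ordering. It is not Heled and Drummond's analysis, and all numbers are hypothetical.

```python
import random


def sample_ordered_prior(n_draws, parent_min, parent_mean_excess,
                         child_min, child_mean_excess, seed=1):
    """Joint draws of (parent, child) ages conditioned on parent > child."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_draws:
        parent = parent_min + rng.expovariate(1.0 / parent_mean_excess)
        child = child_min + rng.expovariate(1.0 / child_mean_excess)
        if parent > child:  # a parent node must be older than its child
            kept.append((parent, child))
    return kept


draws = sample_ordered_prior(100_000, 40.0, 5.0, 30.0, 20.0)
mean_parent = sum(p for p, _ in draws) / len(draws)
mean_child = sum(c for _, c in draws) / len(draws)
print(f"parent: specified mean 45.0, effective mean {mean_parent:.1f}")
print(f"child:  specified mean 50.0, effective mean {mean_child:.1f}")
```

The effective marginal prior of the parent shifts older than specified and that of the child shifts markedly younger, even though neither calibration was changed.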
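Furthermore, node dating relies on the correct taxonomic assignment of fossils to clades, and may produce misleading age estimates if fossils are misplaced on the phylogeny Marshall; Forest; Ho and Phillips. As alternatives to node dating, two approaches have recently been developed: total-evidence dating and the fossilized birth-death (FBD) model. In total-evidence dating, fossils are included as tips in the phylogeny alongside extant taxa.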
The position of these tips is determined as part of the phylogenetic analysis, based on morphological character data that are required for all included fossils and at least some of the extant taxa Pyron; Ronquist et al. The total-evidence approach is conceptually appealing because it is able to account for uncertainty in the phylogenetic position of fossils, and it allows a more complete representation of the fossil record than node dating.
However, due to the requirement of a morphological character matrix, total-evidence dating is limited to groups that share sufficient numbers of homologous characters Grimm et al.

In the FBD model, it is assumed that the fossils that are ultimately sampled and used in the study have been preserved along branches of the complete species tree following a constant-rate Poisson process.
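Fossil preservation as a constant-rate Poisson process has two convenient consequences. On a branch of length L with sampling rate psi, the expected number of fossils is psi * L, and the probability of no fossil is exp(-psi * L). The sketch below simulates a single branch to check both. The names and parameter values are hypothetical and are not part of any FBD implementation.

```python
import math
import random


def sample_fossil_ages(branch_start, branch_end, psi, rng):
    """Fossil ages along one branch under a homogeneous Poisson process.

    Ages are times before present: branch_start is the older end of the
    branch and branch_end the younger end. psi is the sampling rate.
    """
    if branch_start < branch_end or psi <= 0:
        raise ValueError("need branch_start >= branch_end and psi > 0")
    ages = []
    age = branch_start
    while True:
        age -= rng.expovariate(psi)  # exponential gap to the next fossil, forward in time
        if age <= branch_end:
            return ages
        ages.append(age)


rng = random.Random(42)
psi, length = 0.1, 10.0
draws = [sample_fossil_ages(length, 0.0, psi, rng) for _ in range(50_000)]
empty_fraction = sum(1 for d in draws if not d) / len(draws)
mean_count = sum(len(d) for d in draws) / len(draws)
expected_empty = math.exp(-psi * length)  # P(no fossil on the branch)
expected_count = psi * length             # mean number of fossils
```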
Unlike in total-evidence dating, a morphological character matrix is not required to place fossil taxa in the phylogeny, although it can be used for this purpose, as in Wright et al. Instead, fossils are assigned to clades through the specification of topological constraints. Recent implementations of the FBD model also allow one to specify priors on fossil ages, to account for the often large uncertainties associated with them, as well as time intervals within which rates are assumed constant but between which they are free to vary.
However, a limitation that remains even in newer FBD implementations is the assumption that all clades existing in a given time interval are subject to the same rates of diversification and fossil sampling.
Especially in higher level phylogenies, this assumption is unlikely to be met, as substantial clade-specific differences in these rates have been identified in many groups Foote and Sepkoski ; Alfaro et al. The FBD model further assumes that the fossils included in the analysis represent either the complete set or a random sample of the known fossil record of a clade.
Presumably as a consequence of these difficulties, node dating has remained popular despite its drawbacks, and was applied in all recent phylogenomic time-tree analyses of groups above the order level dos Reis et al. Here, we develop a new approach for Bayesian phylogenetic divergence-time estimation that is related to node dating, but infers the optimal shape of calibration densities from a combination of the first fossil occurrence age of a given clade and independently assessed estimates of the sampling rate and the diversification rates.
This approach, therefore, overcomes a major problem of node dating, the fact that calibration densities are often chosen arbitrarily despite their strong influence on the resulting age estimates Heath et al. In contrast to node dating, calibration densities in our approach are not directly applied to node ages, but to the age of origin of clades, and as a consequence, knowledge about the sister groups of calibrated clades is not required.
Our approach is suitable for time calibration of higher level phylogenies combining groups with different sampling characteristics, as the sampling rate can be specified independently for each clade. Using a wide range of simulations, we assess the optimal scheme by which to select clades for calibration. We show that CladeAge calibration densities, as produced by our approach, can result in age estimates comparable to or better than those produced with the FBD model, provided that the input rate estimates are correctly specified and only the oldest fossil of each clade is used for calibration.
We use our new approach together with a large and partially new molecular data set for teleost fishes to address the long-standing question of whether freshwater cichlid fishes from India, Madagascar, Africa and the Neotropics diverged before or after continental separation Chakrabarty; Sparks and Smith; Genner et al. Our results strongly support divergence of freshwater fishes long after continental separation, implying multiple marine dispersal events not only in cichlid fishes but also in other freshwater groups included as out-groups in our phylogeny.
This is equivalent to assuming a uniform prior probability distribution for the age of the clade, which is justified for calibration densities, as these probability densities will be multiplied with a nonuniform tree prior at a later stage, during the divergence time analysis.
Thus, any nonuniform prior assumptions about the clade origin can be incorporated via the tree prior. Finally, all estimates of probability densities are scaled so that the total probability mass becomes 1.
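The role of the uniform prior and the final rescaling can be illustrated on a grid. This minimal sketch uses a hypothetical toy likelihood: a single lineage with no speciation or extinction, so the first fossil appears an exponential waiting time after the origin. It does not reproduce how CladeAge accounts for speciation and extinction rates. It only shows that a constant prior leaves the likelihood's shape unchanged and that rescaling makes the total probability mass 1.

```python
import math


def toy_calibration_density(fossil_age, psi, max_age, n_points=2001):
    """Normalized density for clade origin age on a grid (toy model).

    Toy likelihood: a single lineage, no speciation or extinction, fossils
    sampled at rate psi, so the first fossil appears an exponential waiting
    time after the origin. With a uniform prior the posterior is proportional
    to this likelihood; it is then rescaled to integrate to 1.
    """
    if not (max_age > fossil_age and psi > 0 and n_points >= 2):
        raise ValueError("invalid arguments")
    step = (max_age - fossil_age) / (n_points - 1)
    ages = [fossil_age + i * step for i in range(n_points)]
    prior = 1.0  # uniform prior: the same constant for every age on the grid
    unnormalized = [prior * psi * math.exp(-psi * (a - fossil_age)) for a in ages]
    # trapezoidal rule for the total probability mass on the grid
    mass = step * (sum(unnormalized) - 0.5 * (unnormalized[0] + unnormalized[-1]))
    return ages, [u / mass for u in unnormalized]


ages, density = toy_calibration_density(fossil_age=50.0, psi=0.2, max_age=100.0)
```

Any nonuniform belief about the clade's origin is deliberately left out here. In a full analysis it would enter through the tree prior.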
The calculation of calibration densities, as described above, requires estimates of the fossil sampling rate, as well as of the speciation and extinction rates, which can be obtained externally, for example from the fossil record alone Silvestro et al. Examples demonstrating the shape of CladeAge calibration densities, based on exactly known (a) or uncertain (b) ages of the first fossil record, are shown in Fig.

Figure caption: Exemplary CladeAge calibration densities. Probability densities for the age of a clade for which the age of the earliest fossil is known exactly (a) or is assumed to lie within a given range, with a uniform fossil age probability within this range (b). The gray area in (b) indicates the fossil age uncertainty.

Figure caption: Four alternative calibration schemes for CladeAge calibration densities. In scheme A, each fossil is used to constrain the age of origin of all clades for which this fossil represents the earliest record. Scheme D is similar to scheme C, except that only the older one of two fossils in two sister clades is used as an age constraint. The frequency distributions of binned waiting times are shown in gray, and CladeAge probability density distributions for the same settings are indicated with dashed black lines. The total number of waiting times sampled is given in each plot.
Under the assumption of constant rates of diversification and sampling as well as a uniform prior probability for node ages, CladeAge calibration densities approximate the probability density for the age of a clade, given the age of the oldest fossil record of this clade.
These probability distributions are, therefore, suitable as constraints on clade ages in Bayesian divergence-time estimation. As scheme B would allow one or more of the clades to appear younger than the fossil itself, it seems reasonable to specify, in addition to the CladeAge calibration density for the most inclusive clade, the fossil age as a strict minimum age for the least inclusive clade when using this scheme.
Furthermore, if two sister clades both possess a fossil record, these fossils could be used to constrain the ages of both of the two clades.
However, as the ages of the two clades are necessarily linked by their simultaneous divergence, two time constraints would effectively be placed on one and the same node. In contrast to node dating, where at most one calibration density is placed on each node, the model used to calculate CladeAge calibration densities considers each clade individually, and could thus be biased if the selection of clades for calibration were based on information about their sister clades. Figure 2a illustrates the four different calibration schemes.

As CladeAge calibration densities approximate the probability densities of clade ages conditional on the age of the first fossil record of this clade, they are also expected to approximate frequency distributions of observed waiting times between the origin of a clade and the appearance of its first fossil record in a sufficiently large sample of simulated phylogenetic trees.
Since these waiting times can be sampled according to the above four schemes, we can determine the optimal calibration scheme by comparison of waiting time frequency distributions with CladeAge calibration densities.
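Waiting times of this kind can be sampled directly by Monte Carlo simulation. Under constant rates and a uniform prior on the origin age, and ignoring truncation at the present, the density for a clade's origin given a first fossil of age f is approximately the distribution of f + w, where w is the waiting time. The sketch below simulates single clades under a constant-rate birth-death process with fossil sampling and bins the waiting times into a normalized histogram. It is a simulation stand-in for illustration, not the way CladeAge computes its densities, and the rate values are hypothetical.

```python
import random


def waiting_time_to_first_fossil(lam, mu, psi, max_time, rng):
    """Time from a clade's origin to its first fossil (forward in time).

    Constant-rate birth-death process (speciation lam, extinction mu) with
    fossil sampling at rate psi on every lineage. Returns None if the clade
    dies out unsampled or no fossil appears within max_time.
    """
    n, t = 1, 0.0
    per_lineage = lam + mu + psi
    while n > 0:
        t += rng.expovariate(n * per_lineage)  # time to the next event on any lineage
        if t > max_time:
            return None
        u = rng.random() * per_lineage  # event type chosen in proportion to its rate
        if u < psi:
            return t
        n += 1 if u < psi + lam else -1
    return None


def waiting_time_density(waits, bin_width, n_bins):
    """Histogram of waiting times rescaled so the bars integrate to 1."""
    if not waits:
        raise ValueError("no waiting times to bin")
    counts = [0] * n_bins
    for w in waits:
        i = int(w // bin_width)
        if i < n_bins:
            counts[i] += 1
    total = len(waits) * bin_width
    return [c / total for c in counts]


rng = random.Random(7)
waits = [w for w in (waiting_time_to_first_fossil(0.1, 0.05, 0.02, 200.0, rng)
                     for _ in range(20_000)) if w is not None]
density = waiting_time_density(waits, bin_width=5.0, n_bins=40)
```

Comparing `density` bin by bin with a calibration density computed for the same rates is the kind of check described above. Different calibration schemes correspond to different rules for which clades contribute waiting times.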
Waiting time frequency distributions recorded from relatively young clades can be biased, because only waiting times shorter than the clade age can be recorded (otherwise, no fossil of the clade would have been recorded at all). This is because the root node represents the oldest node that can be constrained with fossils in these clades, and thus waiting times between the root and these fossils are recorded with both schemes A and B.
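The direction of this bias can be seen in a toy setting. Waiting times are drawn from an exponential distribution with a hypothetical sampling rate, which assumes a single lineage without diversification, and only those shorter than the clade age are kept. The recorded mean then falls well below the untruncated mean 1/psi for young clades. The sketch compares the simulation with the analytical mean of a truncated exponential, 1/psi - T/(exp(psi*T) - 1).

```python
import math
import random


def recorded_mean_wait(psi, clade_age, n, rng):
    """Mean of exponential waiting times that are shorter than clade_age.

    Longer waits are never observed, because the clade would have left no
    fossil before the present.
    """
    kept = [w for w in (rng.expovariate(psi) for _ in range(n)) if w < clade_age]
    return sum(kept) / len(kept)


def truncated_mean(psi, clade_age):
    """Analytical mean of an exponential(psi) variable conditioned on w < clade_age."""
    return 1.0 / psi - clade_age / math.expm1(psi * clade_age)


rng = random.Random(11)
psi = 0.1  # hypothetical sampling rate; untruncated mean wait is 1/psi = 10
results = {age: (recorded_mean_wait(psi, age, 100_000, rng), truncated_mean(psi, age))
           for age in (5.0, 20.0, 100.0)}
```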
If further divergence events occurred between the root and the fossil, the root does not represent the youngest node that can be constrained with the fossil, and thus the waiting time between the root and the fossil is not recorded with schemes C and D (see Fig.). Differences between schemes A and B become apparent with less strict clade age thresholds, when clades are also included that do not represent the oldest possible clade to be constrained with a given fossil.
Comparisons for all other tested clade age thresholds are shown in Supplementary Fig. However, for all but the youngest clade age thresholds, scheme A produces a frequency distribution that is virtually identical in shape to the distribution of CladeAge calibration densities.
This suggests that when CladeAge calibration densities are used for time calibration, they should be applied strictly to constrain all clades for which a given fossil represents the first occurrence, even if the same fossil is used to constrain multiple nodes, and even if more than one constraint is placed on one and the same node.
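Scheme A can be expressed as a simple tree traversal: every clade with a fossil record receives a constraint from its oldest fossil. As a result, one fossil may constrain several nested clades, and both members of a pair of sister clades may be constrained. The minimal sketch below assumes a hypothetical data layout, a dictionary of child clades and a dictionary of fossils assigned to their smallest containing clade. It is not the input format of CladeAge or BEAST.

```python
from typing import Dict, List, Optional, Tuple

Fossil = Tuple[str, float]  # (fossil name, age before present)


def scheme_a_constraints(children: Dict[str, List[str]],
                         fossils: Dict[str, List[Fossil]],
                         root: str) -> List[Tuple[str, str, float]]:
    """List (clade, fossil, age) constraints following calibration scheme A.

    Every clade with a fossil record is constrained by its oldest fossil, so
    one fossil may constrain several nested clades, and sister clades may
    both be constrained.
    """
    constraints: List[Tuple[str, str, float]] = []

    def oldest(clade: str) -> Optional[Fossil]:
        best: Optional[Fossil] = None
        candidates = list(fossils.get(clade, []))
        for child in children.get(clade, []):
            found = oldest(child)  # oldest fossil within the child clade
            if found is not None:
                candidates.append(found)
        for fossil in candidates:
            if best is None or fossil[1] > best[1]:
                best = fossil
        if best is not None:
            constraints.append((clade, best[0], best[1]))
        return best

    oldest(root)
    return constraints


# Hypothetical tree: root -> (A, B), A -> (A1, A2); ages in time units
children = {"root": ["A", "B"], "A": ["A1", "A2"]}
fossils = {"A1": [("f1", 60.0)], "A2": [("f2", 40.0)], "B": [("f3", 30.0)]}
for constraint in scheme_a_constraints(children, fossils, "root"):
    print(constraint)
```

In this example, fossil f1 constrains A1, A and the root. The sister clades A1 and A2 are both constrained, so two constraints act on the node at which they diverge.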
To compare the performance of the four calibration schemes A to D more extensively, we simulated phylogenetic data sets including fossil records and sequence alignments, and used CladeAge calibration densities to estimate clade ages in BEAST v. For comparison, we also used the same simulated data sets to estimate clade ages with the FBD model implemented in the Sampled Ancestors package Gavryushkina et al.
If the time units used in these simulations are considered to be millions of years, the sampling and diversification rates used here are comparable to those found in empirical data sets Jetz et al. In separate sets of simulations, branch-specific substitution rates were modeled with one of two clock models, one of which was an uncorrelated molecular clock Drummond et al. The branch lengths and substitution rates were used to simulate nucleotide sequence evolution according to the unrestricted empirical codon model of Kosiol et al.
For each of the two clock models and each of the three sampling rates, we generated 50 replicate data sets. An example of a data set simulated with these settings is illustrated in Supplementary Fig.
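The size of this simulation design follows directly from the stated counts: 2 clock models x 3 sampling rates x 50 replicates gives 300 data sets. The short enumeration below sketches the design grid. The labels are placeholders, because the second clock model and the three sampling rate values are not specified in this section.

```python
from itertools import product

# Placeholder labels: only the uncorrelated clock and the counts come from the text
clock_models = ["uncorrelated_clock", "second_clock_model"]
sampling_rates = ["sampling_rate_1", "sampling_rate_2", "sampling_rate_3"]
n_replicates = 50

design = [
    {"clock": clock, "sampling": rate, "replicate": rep}
    for clock, rate, rep in product(clock_models, sampling_rates,
                                    range(1, n_replicates + 1))
]
n_datasets = len(design)  # 2 clock models x 3 sampling rates x 50 replicates
```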