The Problem with HVR


In the late 1990s scientists identified two areas of human mitochondrial DNA. One was dubbed the 'coding region' - positions 575 to 16000. The genetic material here was said to be involved in the basic processes of cellular life, so most mutations here would be damaging to the life of the cell. They would result in improper functioning of basic biochemical processes, ending with the death of the cell, and if it was an egg cell from the mother, the death of the new life in the womb. Such mutations would not be passed on to descendants.

Nevertheless, there were some locations where mutations could occur and be passed on to descendants. They were either neutral, or conferred some advantage or minimal disadvantage on the offspring. The changes here provided the long-range genetic 'clock' that allowed the descent of mankind from 'genetic Eve' to be determined. The people of the earth were assigned to 'haplogroups' based on a tree of descent derived from coding-region mutations being passed on to descendants farther down the tree. Across the entire coding region, the rate at which such mutations occurred and were passed on was estimated at around 0.03 changes/site/million years (range of various calculations 0.0126 to 0.0609).

The other area of the mtDNA - called the D-region, the non-coding region, the highly variable section (HVS), or the highly variable region (HVR) - went from positions 16001 to 16568 (HVR1), and then from 1 to 574 (HVR2). This area was thought only to assist in aligning the mtDNA during replication, so any mutations would be neutral and have no effect on the organism. Since mutations that occurred here would not be removed due to deleterious impacts on the organism, they would accumulate at a much faster rate. These could be used, it was believed, to study the descent within haplogroups, in some cases on genealogical time scales. These were the original basis for genealogical genetics.

The rate of retained mutations in the HVR was estimated in various studies at around 0.5 changes/site/million years (range of various calculations 0.0865 to 1.7957) - more than 15 times the rate in the coding region.

Most scientific papers studying the deep descent of human populations used only the coding region mutations rather than the 'fast moving' HVR changes.

However, when one studies full-sequence mtdna results within a haplogroup, one finds that the number of coding region mutations from the putative ancestral sequence of the haplogroup is often greater than the number of HVR changes. For example, the original study with a number of complete sequences of the W haplogroup was done among Finnish subjects. These showed little variation in the HVR1 and HVR2 areas (zero or no changes from the 'defining' W haplotype) but two to five changes in the coding region.

One would expect the two clocks to be running together - i.e. on average, the more coding region changes, the older the lineage, and the more HVR mutations. But this does not occur at all! In fact, within full-sequenced haplogroup W individuals, there is no correlation between the number of coding region changes and the number of HVR changes. There is, however, an upper limit on the combined number of changes. This indicates that the standard deviation of the number of changes is too large for the antiquity of a node to be determined from HVR or coding region analysis alone. Taken together, however, they improve the chance of a meaningful result (all of these comparisons are adjusted to reflect changes from the ancestral type, not from CRS).

What's going on here?

The first point to be made is that while the retained mutation rate is 15 times lower in the coding region, there are also nearly 15 times more locations there than in the HVR. Therefore, using the averages indicated in studies above, in 10,000 years one would expect to see around (15,425 x 0.03 x 10,000 / 1,000,000 = ) 4.6 changes in the coding region and (1,141 x 0.5 x 10,000 / 1,000,000 = ) 5.7 changes in the HVR - essentially the same number!
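The arithmetic above can be reproduced with a short Python sketch. The site counts and rates are the figures quoted in this article; the function name expected_changes and the variable names are illustrative only. Running the low and high ends of each quoted rate range through the same formula also gives a feel for how wide the uncertainty is.

```python
# Sketch of the expected-change arithmetic used in the text.
# Site counts and rates are the article's figures; names are illustrative.

def expected_changes(sites, rate_per_site_per_myr, years):
    """Expected retained changes = sites x rate x elapsed time in Myr."""
    return sites * rate_per_site_per_myr * years / 1_000_000

CODING_SITES = 15_425
HVR_SITES = 1_141
YEARS = 10_000

# (low, average, high) rates in changes/site/million years, as quoted above
CODING_RATES = (0.0126, 0.03, 0.0609)
HVR_RATES = (0.0865, 0.5, 1.7957)

coding = [expected_changes(CODING_SITES, r, YEARS) for r in CODING_RATES]
hvr = [expected_changes(HVR_SITES, r, YEARS) for r in HVR_RATES]

print("coding region (low, avg, high):", [round(x, 1) for x in coding])
print("HVR           (low, avg, high):", [round(x, 1) for x in hvr])
```

With the average rates this gives about 4.6 and 5.7 changes, matching the figures in the text. With the extremes of the quoted ranges, the expected counts for the two regions overlap heavily.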

This has important implications for descent trees constructed using only coding region, or only HVR, or only HVR1. Consider a descent tree where we have full knowledge of the real descent, as opposed to that inferred using analytic techniques:

Gray, Greenhill and Ross note:
To investigate the ability of phylogenetic methods to recover the true phylogeny in cases with known horizontal transmission, we have begun to analyze the phylogeny of football... Tree A shows the known history of football-type sports, and Tree B shows a phylogeny constructed from traits such as the presence or absence of scrums and fullbacks. The estimated tree is more similar to the true tree than would be expected by chance... However, the estimated tree does contain a few striking departures ("The Pleasures and Perils of Darwinizing Culture (with phylogenies)", by Russell D Gray, Simon J Greenhill and Robert M Ross, University of Auckland, New Zealand, 2007)
It can be seen that each level of detail provides an approximation, close to the truth, but which can be overturned entirely when further information is obtained.

A good example of this in the W phylogeny is the absolutely distinctive motif in HVR2 (143A - 192C - 194T - 196C). When one constructs a descent chart using only coding-region changes, this motif occurs at three different places in the W descent tree as clearly otherwise defined by coding region changes: at FI9495, UN0619, FI9460, and IN4018:


This remarkable sequence cannot be due to shared ancestry, since there are multiple distinctive coding region changes separating the individuals. Nor can it be due to some common laboratory problem or contamination, since in 8000+ mitosearch individuals it occurs only in W individuals, and the same motif has been detected in the HVR2 by a number of different test services. However, the probability of the same four mutations occurring independently four different times is astronomically small.
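To give a rough sense of scale, the following Python sketch estimates how unlikely independent recurrence of a four-site motif would be. It is a back-of-envelope model and not a result from the data: it assumes every HVR site mutates at the average rate of 0.5 changes/site/million years quoted earlier, that sites change independently, and that each lineage has a hypothetical window of 10,000 years. The function names are illustrative.

```python
import math

# Rough sketch only. Assumptions (not taken from the article's data):
#  - every HVR site mutates at the average rate of 0.5 changes/site/Myr
#  - sites change independently of one another (Poisson process)
#  - a hypothetical window of 10,000 years per lineage
RATE_PER_SITE_PER_MYR = 0.5
YEARS = 10_000
MOTIF_SITES = 4  # positions 143, 192, 194, 196

def p_site_changes(rate_per_myr, years):
    """Probability that one given site changes at least once in the window."""
    return -math.expm1(-rate_per_myr * years / 1_000_000)

def p_motif_in_lineage(rate_per_myr, years, n_sites):
    """Probability that all n_sites change within a single lineage."""
    return p_site_changes(rate_per_myr, years) ** n_sites

def p_independent_origins(p_motif, n_lineages):
    """Probability that n_lineages each acquire the motif independently."""
    return p_motif ** n_lineages

p_motif = p_motif_in_lineage(RATE_PER_SITE_PER_MYR, YEARS, MOTIF_SITES)
print(f"one lineage: {p_motif:.2e}")
for n in (2, 3, 4):
    print(f"{n} lineages: {p_independent_origins(p_motif, n):.2e}")
```

The sketch counts a change to any base at each position, so requiring the specific bases in the motif would lower the figures further. It does not model mutational hot spots, which would raise them.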

However, if a descent tree is constructed from full-mtDNA sequences giving greater weight to coding region mutations, but including HVR mutations, then the entire problem is resolved, and a much neater and more logical descent tree is inferred:

Occam's Razor therefore demands that, despite the 'purity' of using only coding-region mutations to deduce subgroups within a haplotype, HVR changes cannot be neglected.

The Importance of Geography

Often neglected in resolving problems in descent trees is the information provided by geography. Consider three possible descent trees for some of the major branches of part of the W3 subgroup:

In these trees, the ancestral origin of each subject's family is indicated (IT = Italy, UN = unknown, GB = Britain, FI = Finland, IN = India). We can see that only Alternate C puts all Finnish W3's on the same descendant branch. This perhaps makes it more likely to be the correct tree than A or B, each of which puts one Finnish descendant on its own major arm of the tree and the other two on another.

One final observation is that the coding region is not so stable, and the HVR not so unstable, as often depicted. In 8,612 HVR1 sequences uploaded to mitosearch.org, 52% of the HVR1 loci have never shown a mutation. In 4,015 cases where HVR2 was tested, 54% of the HVR2 loci have never shown a mutation. On the other hand, in 3,219 published full sequences, 76% of the coding region has never shown a mutation (and therefore is 'not neutral'). This is not as great a difference as often portrayed, and leads one to wonder if the 'neutrality' of many of these mutations is as great as normally assumed.


Comments? Corrections? E-mail Pausanias