Things are More Complicated Than You Thought
No sooner have you understood the basic ideas than you find things are more complicated than you thought. Such is life. Time to grow up...
MTDNA Mutations
The most common mutations are a change of the letter at a location to its chemically similar partner (a 'transition') - A to G, G to A, C to T, or T to C.
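The four changes listed above are what geneticists call transitions: A and G belong to one chemical family (purines), C and T to the other (pyrimidines), and a transition stays within the family. A change that crosses families is a transversion. A minimal sketch of that distinction follows; the function name substitution_kind is made up for illustration.

```python
# The two chemical families of DNA letters.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_kind(old, new):
    """Return 'transition' or 'transversion' for a one-letter change.

    Hypothetical helper for illustration only.
    """
    old, new = old.upper(), new.upper()
    if old == new or not {old, new} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid substitution: {old}>{new}")
    # Same family (A<->G or C<->T) is a transition; anything else crosses families.
    same_family = {old, new} <= PURINES or {old, new} <= PYRIMIDINES
    return "transition" if same_family else "transversion"


print(substitution_kind("C", "T"))  # transition
print(substitution_kind("A", "T"))  # transversion
```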
Next most common is the insertion of an extra letter. Insertions are represented in your results by a dot followed by the number of the inserted letter. For example, 315.1C means an extra C was inserted after position 315; 315.2C means a second extra C was inserted there, making two in all.
Deletions can also occur, where a letter in the sequence is missing. These are indicated by a dash, for example 16183-.
Occasionally the test results show 'heteroplasmy' - meaning the mtdna in your cells shows more than one result for the same position. This can happen because a single cell contains hundreds to thousands of mtdna copies, groups of which may have mutated and carry different letters at the same location. Your results are based on the average over many millions of cells, some of which may carry a certain mutation in all their copies while others are unchanged. These ambiguous results are indicated by the letters Y (C or T - example 16093Y); R (A or G - example 16034R); M (A or C - example 16183M); W (A or T - example 16189W); and N (any of G, A, T, or C - example 16192N). U, S, K, V, H, B, D, and X are also used to indicate other combinations of results for the same location.
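The notation described above (plain substitutions, dotted insertions, dashed deletions, and ambiguity letters) is regular enough to sort by machine. Here is a rough sketch, not any testing company's actual format specification. The function name parse_result and the dictionary layout are invented for illustration, and only the two-letter ambiguity codes plus N are covered.

```python
import re

# Standard IUPAC codes for two-letter mixtures, plus N for "any letter".
AMBIGUITY = {
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "N": "ACGT",
}

# A position, an optional ".n" insertion index, then a letter or "-" for a deletion.
PATTERN = re.compile(r"^(\d+)(?:\.(\d+))?([A-Z]|-)$")


def parse_result(text):
    """Classify one entry such as '16292T', '315.1C', '16183-' or '16093Y'.

    Hypothetical helper for illustration only.
    """
    match = PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised result: {text!r}")
    position, insert_index, symbol = match.groups()
    entry = {"position": int(position), "symbol": symbol}
    if insert_index is not None:
        if symbol == "-":
            raise ValueError(f"an insertion needs a letter: {text!r}")
        entry["kind"] = "insertion"
        entry["insert_index"] = int(insert_index)
    elif symbol == "-":
        entry["kind"] = "deletion"
    elif symbol in AMBIGUITY:
        entry["kind"] = "heteroplasmy"
        entry["possible_letters"] = AMBIGUITY[symbol]
    elif symbol in "ACGT":
        entry["kind"] = "substitution"
    else:
        raise ValueError(f"letter not handled in this sketch: {text!r}")
    return entry


for item in ["16292T", "315.1C", "16183-", "16093Y"]:
    print(item, parse_result(item)["kind"])
```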
Two Clocks Running at Different Speeds
The Coding Region (CR)
Scientists have identified two areas of the human mitochondrial DNA. One was dubbed the 'coding region' - positions 575 to 16000. The genetic material here is involved in the basic processes of cellular life, so most mutations here would be damaging to the cell. They would result in improper functioning of basic biochemical processes, ending with the death of the cell. If this were an egg cell from the mother, it would mean the death of any new life in the womb. Such mutations would not be passed on to descendants.
Nevertheless, there were some locations where mutations could occur and be passed on to descendants. These mutations were either neutral, or conferred some advantage or only a minimal disadvantage on the offspring. The changes here provided the long-range genetic 'clock' that allowed the descent of mankind from 'genetic Eve' to be determined. The people of the earth were assigned to 'haplogroups' based on a tree of descent derived from these coding-region mutations being passed on to descendants farther down the tree. Across the entire coding region, the rate at which such mutations occurred and were passed on was estimated at around 0.03 changes per location per million years (various calculations range from 0.0126 to 0.0609).
The Highly Variable Regions (HVR)
The other area of the mtdna - called the D-region, the non-coding region, the highly variable segments (HVS), or the highly variable regions (HVR) - went from positions 16001 to 16569 (HVR1), and then from 1 to 574 (HVR2). This area was thought only to assist in aligning the mtdna during replication, so any mutations would be neutral and have no effect on the organism. Mutations that occurred here would not be removed due to bad effects on cell function, and therefore would accumulate at a much faster rate. The rate of mutations that are passed on in the HVR was estimated at around 0.5 changes per location per million years - more than 15 times faster than the coding region (various calculations range from 0.0865 to 1.7957). These could be used, it was believed, to study descent within haplogroups, in some cases on genealogical time scales. These were the original basis for genealogical genetics.
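Since the mtdna is a circle, the numbering wraps around: HVR1 sits at the top end of the numbers and HVR2 at the bottom, with the coding region in between. A small sketch of a lookup using the boundaries given in this article. The function name region_of is made up, and the total length of 16569 is an assumption taken from the HVR1 range quoted further down this page.

```python
MTDNA_LENGTH = 16569  # assumed total number of positions


def region_of(position):
    """Name the region a position falls in, using this article's boundaries.

    Hypothetical helper for illustration only.
    """
    if not 1 <= position <= MTDNA_LENGTH:
        raise ValueError(f"position out of range: {position}")
    if position <= 574:
        return "HVR2"
    if position <= 16000:
        return "CR"
    return "HVR1"


print(region_of(315))    # HVR2
print(region_of(8994))   # CR
print(region_of(16292))  # HVR1
```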
Two Clocks, Two Speeds
At first one would think there would be far more HVR mutations than CR mutations, but this is not the case. While the retained mutation rate is about 15 times lower in the coding region, there are also nearly 15 times more locations there than in the HVR. Therefore, using the averages indicated above, in 10,000 years one would expect to see around 4.6 changes in the coding region (15,425 locations x 0.03 changes per location per million years / 1,000,000 years x 10,000 years). In the HVR this would be 5.7 changes (1,141 locations x 0.5 changes per location per million years / 1,000,000 years x 10,000 years). Given the uncertainties in the estimates, these are nearly the same number!
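The arithmetic above is easy to check, and it is worth running the low and high ends of the quoted rate ranges through the same formula to see how wide the uncertainty really is. A short sketch using only the figures given in this article; the function name expected_changes is invented for illustration.

```python
def expected_changes(locations, rate_per_location_per_myr, years):
    """Expected number of retained changes over a span of years.

    Hypothetical helper: locations x rate per million years, scaled to the span.
    """
    return locations * rate_per_location_per_myr / 1_000_000 * years


YEARS = 10_000

# Averages quoted in the article.
print(f"CR  average: {expected_changes(15_425, 0.03, YEARS):.1f}")    # 4.6
print(f"HVR average: {expected_changes(1_141, 0.5, YEARS):.1f}")      # 5.7

# Low and high ends of the quoted ranges.
print(f"CR  low:  {expected_changes(15_425, 0.0126, YEARS):.1f}")     # 1.9
print(f"CR  high: {expected_changes(15_425, 0.0609, YEARS):.1f}")     # 9.4
print(f"HVR low:  {expected_changes(1_141, 0.0865, YEARS):.1f}")      # 1.0
print(f"HVR high: {expected_changes(1_141, 1.7957, YEARS):.1f}")      # 20.5
```

The two ranges overlap almost completely, which is the point: on these estimates the two clocks cannot be told apart by the total number of changes alone.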
The difference is that, with roughly 15 times fewer locations absorbing about the same number of changes, any particular HVR location is hit more often, so a mutation there is more likely to be flipped 'back' to its original value. Take haplogroup W as an example. The founding ancestor had a 16292T mutation. 10.5% of W's don't show the 16292T mutation - i.e. the location has mutated back to the 16292C of the Cambridge Reference Sequence. Over the many tens of thousands of years since 'Genetic Eve', this makes the HVR useless for constructing the 'big' descent tree of mankind. But on the time scale within a haplogroup, the HVR can be nearly as reliable a guide as the CR changes. Often the number of coding region mutations from the putative ancestral sequence of the haplogroup is greater than the number of HVR changes. For example, the original study that included a number of complete sequences of the W haplogroup was done among Finnish W1 subjects. Many showed little variation in the HVR1 and HVR2 areas (few or no changes from the 'defining' W1 haplotype) but two to five changes in the coding region.
One might expect the two clocks to run together - i.e. on average, the more coding region changes, the older the lineage, and the more HVR mutations. But this does not show up at all, because the average number of mutations is so low (fewer than ten within most haplogroups that emerged after the last Ice Age). Among fully sequenced haplogroup W individuals, there is no correlation between the number of coding region changes and the number of HVR changes.
What this also means is that, depending on your luck, your HVR1 or HVR2 result alone may provide a unique indicator of your ancestry. However, it also may not - you may have a 'vanilla' result typical for your haplogroup, or a mutation that is shared by several subgroups within the haplogroup. In such cases only an mtdna full genome sequence will clarify the situation.
Useful for Genealogy?
The mtdna clock is too slow to tell you, for example, that someone is your fifth cousin because of a difference in your mtdna results. On average, even if you have the complete mtdna sequenced, there is only one change every millennium or so. If you only had the HVR sequenced, that's one every two millennia. However, having the same mtdna as someone else can be a great aid in genealogical research, provided you've either had an FGS done or are lucky enough to have a distinctive HVR result. In that case it may indicate that you share a common immigrant ancestor. For example, all those with the 'French W' motif in North America seem to trace their ancestry back to Marie Marguerie, who migrated to Quebec in 1641. Another person, a Hungarian of German heritage whose ancestors were likely resettled there from Bavaria in the 18th century, has a full FGS match with an American whose ancestors came from a certain town in Bavaria. This makes it very likely that this town is where her ancestors came from in the 1700s, even without documentary evidence.
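The 'one change per millennium' and 'two millennia' figures follow from the average rates and location counts used earlier in this article. A quick sketch of that calculation; the function name changes_per_year is made up, and the inputs are the article's averages, which carry wide uncertainty.

```python
def changes_per_year(locations, rate_per_location_per_myr):
    """Average number of retained changes per year (hypothetical helper)."""
    return locations * rate_per_location_per_myr / 1_000_000


cr = changes_per_year(15_425, 0.03)   # coding region
hvr = changes_per_year(1_141, 0.5)    # HVR1 + HVR2

# Average waiting time between changes is the reciprocal of the yearly rate.
print(f"full sequence: one change per {1 / (cr + hvr):.0f} years")  # 968
print(f"HVR only:      one change per {1 / hvr:.0f} years")         # 1753
```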
On a deeper level, your mtdna results can provide details on the prehistoric migrations of your ancestors. Were they among the immigrants to Europe who brought agriculture to the continent 6000 years ago? Or the others who brought horses and the wheel 5000 years ago? Or the even later Slavs, Norsemen, Magyars, and Ashkenazi Jews, who migrated into various parts of Europe in historical times? Your mtdna result may provide the answer!
HVR, HVS, and all that
Looking around the net, you'll quickly find that while FTDNA and others call the areas tested HVR1 and HVR2 (highly-variable regions 1 and 2), in other places HVS-I and HVS-II are mentioned (not even getting into HVR3 and HVS-III).
As with a lot of things involving science, it turns out to be messy ('exact science' = oxymoron). The definition of HVS can vary from paper to paper (and sometimes isn't even given). HVR1 extends from location 16001 to 16569, whereas HVS-I was 16090-16365 in older papers and 16037-16518 in more recent ones. Similarly, HVR2 is locations 1 to 574, whereas the older version of HVS-II was 68 to 263 and the later version is 74 to 300. The reason for the narrower ranges was to save money when conducting a large number of tests, and also to ignore pesky loci that seemed never to change or changed too readily ('hot spots'). The issue of 'hot spots' - loci to ignore because including them turns your nice network of descent and relationship into a spaghetti-like diagram - is controversial. Which ones to ignore seems to vary from haplogroup to haplogroup. 16519 is widely seen as the most unstable of hot spots, but in haplogroup W it is as solid as a rock. In haplogroup W, it is location 119 that seems to spring up everywhere in the diagram.
The bottom line - when comparing lots of results, you have to go to the 'lowest common denominator' HVS-I / HVS-II ranges, otherwise you won't be able to use a lot of the published data.
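In practice that means throwing away any of your own results that fall outside the range everyone tested. Here is a sketch of one way to do it, assuming 'lowest common denominator' means the overlap of the older and newer HVS ranges quoted above. The function names and the sample list of results are invented for illustration and are not anyone's real data.

```python
import re

# HVS ranges quoted in this article (older papers, newer papers).
HVS1_RANGES = [(16090, 16365), (16037, 16518)]
HVS2_RANGES = [(68, 263), (74, 300)]


def overlap(ranges):
    """Return the span covered by every range in the list (hypothetical helper)."""
    start = max(low for low, _ in ranges)
    end = min(high for _, high in ranges)
    if start > end:
        raise ValueError("the ranges do not overlap")
    return start, end


def position_of(result):
    """Pull the leading position number out of an entry like '315.1C'."""
    return int(re.match(r"\d+", result).group())


def trim(results, spans):
    """Keep only the results whose position lies inside one of the spans."""
    return [r for r in results
            if any(low <= position_of(r) <= high for low, high in spans)]


common = [overlap(HVS1_RANGES), overlap(HVS2_RANGES)]
print(common)  # [(16090, 16365), (74, 263)]

sample = ["16223T", "16292T", "16519C", "73G", "119C", "189G", "315.1C"]  # made up
print(trim(sample, common))  # ['16223T', '16292T', '119C', '189G']
```

Note that under this assumption even position 73 falls just outside the shared HVS-II span, so a result there would be dropped before comparing against the published data.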
So now you know.