BMC Genetics
Research article

BioMed Central

Open Access

Insertion-deletion polymorphisms (indels) as genetic markers in natural populations
Ülo Väli1,2, Mikael Brandström1, Malin Johansson1 and Hans Ellegren*1
Address: 1Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden and 2Institute of Agricultural and Environmental Sciences, Estonian University of Life Sciences, Tartu, Estonia
Email: Ülo Väli -; Mikael Brandström -; Malin Johansson -; Hans Ellegren* -
* Corresponding author

Published: 22 January 2008 BMC Genetics 2008, 9:8 doi:10.1186/1471-2156-9-8

Received: 29 September 2007 Accepted: 22 January 2008

© 2008 Väli et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: We introduce the use of short insertion-deletion polymorphisms (indels) for genetic analysis of natural populations.

Results: Sequence reads from light shot-gun sequencing efforts of different dog breeds were aligned to the dog genome reference sequence and gaps corresponding to indels were identified. One hundred candidate markers (4-bp indels) were selected and genotyped in unrelated dogs (n = 7) and wolves (n = 18). Eighty-one and 76 out of 94 could be validated as polymorphic loci in the respective sample. Mean indel heterozygosity in a diverse set of wolves was 19%, and 74% of the loci had a minor allele frequency of >10%. Indels found to be polymorphic in wolves were subsequently genotyped in a highly bottlenecked Scandinavian wolf population. Fifty-one loci turned out to be polymorphic, showing their utility even in a population with low genetic diversity. In this population, individual heterozygosities measured at indel and microsatellite loci were highly correlated.

Conclusion: With an increasing amount of sequence information gathered from non-model organisms, we suggest that indels will come to form an important source of genetic markers, easy and cheap to genotype, for studies of natural populations.

Background
Advancement in population and evolutionary genetic research has been accompanied by, or perhaps better phrased, been a consequence of, continuous improvement in the way genetic similarity or dissimilarity between genomes is assessed. Seen in a long-term perspective, genetic marker methodology has evolved from focusing on phenotypes, via immunological parameters and proteins, to genotypes. Following their introduction to the study of natural populations about 15 years ago [1-3], microsatellites or short simple tandem repeats have been the genotype-based marker approach of choice for many applications where the relatedness between individuals, populations or species is sought. Preceding and subsequently in parallel to this, non-repetitive DNA sequence variation has been assessed through various approaches, including DNA sequencing, restriction fragment length polymorphism (RFLP) analysis, single strand conformation polymorphism (SSCP) analysis, random amplified polymorphic DNA (RAPD) analysis and amplified fragment length polymorphism (AFLP) analysis [4]. More recently, single nucleotide polymorphisms (SNPs) are increasingly finding their application in studies of natural populations [5,6].

The benefits of microsatellites are several and well known. They are multi-allelic, show high heterozygosity and are relatively easy to analyse at moderate cost. Because of the high polymorphism information content, a rather limited number of markers suffices for many applications in molecular ecology and population genetics. It is usually not too difficult to isolate the required markers from DNA libraries [7] or to employ markers originally developed for related species [8].

SNPs have merit as genetic markers for other reasons. They are very common, with genomic densities outnumbering that of microsatellites by orders of magnitude. Large numbers of individuals may be genotyped at a large number of loci by simple and fast automatic methods, and data interpretation is usually straightforward [5,9]. Moreover, SNP variation at protein-coding genes and in other functionally constrained regions of the genome is likely to form the main genetic background to phenotypic variation. Furthermore, biallelic SNPs evolve in a manner well described by simple mutation models. There are good reasons to believe that they will in many cases gradually come to replace the use of microsatellites in molecular ecology and population genetics/genomics research [6].

Unfortunately, however useful, both microsatellites and SNPs suffer from some shortcomings. The complex and heterogeneous mutation pattern of microsatellites [10] introduces ambiguities to further data analysis. Genotyping errors may occur because of stutter bands and technical artefacts (allelic dropouts, null alleles, false alleles, size homoplasy) [11]. As for SNPs, many more markers are needed to get the same amount of information [6,9]. Moreover, despite the many elegant genotyping methods available [9], most of them are relatively costly at small or medium scales, and require special equipment for high-throughput genotyping.
With a few years' lag phase, the introduction of new genetic markers to the study of natural populations has generally followed methodological developments made in the genetic analysis of model organisms [4]. Currently, there is an increasing focus on polymorphisms in the form of short insertions and deletions (indels) in genomic research of humans [12,13] and model species such as Drosophila melanogaster [14] and chicken (Gallus gallus) [15]. Indels have been recognised as an abundant source of genetic markers that are widely spread across the genome, though not as common as SNPs. For instance, Mills et al. [13] used data from re-sequencing surveys to identify 415,436 indels segregating in human populations, and they estimated that among the total number of >10 million polymorphisms known in humans, some 1.5 million represent indels. Clearly, this indicates that indels could form a very common class of genetic markers also in non-model species, and this is particularly so given that genetic diversity in many natural populations typically seems to be higher than in humans [5,6,16]. Most importantly, indels can be genotyped with simple procedures based on size separation. Another advantage is the minuscule chance of two indel mutations of exactly the same length happening at the same genomic position, meaning that shared indels can confidently be seen as representing identity-by-descent [cf. 17].

In this study we present a test of the usefulness of indel markers in natural populations. We use a bioinformatics approach to survey dog shot-gun reads [18] for the presence of indels and, based on this, we design a pipeline for development of PCR-based indel markers. We subsequently genotype 100 indels in natural wolf populations and compare the results with data on microsatellite variability obtained from the same animals.


Results
There are ≈100,000 shot-gun reads available from each of 9 different dog breeds, sequence data that come in addition to data obtained from the partial [19] or full genome sequencing [18] of two dogs. We surveyed 200,000 of these trace reads for the occurrence of short insertion and deletion polymorphisms, as detected by alignment against the reference sequence of one female boxer [18]. Note that there is essentially no sequence overlap among trace reads, so the alignments consistently took the form of only two alleles drawn from the population of dogs. In total, this yielded 30,116 length polymorphisms, corresponding to about one length variant every 2400 bp. Consistent with what has been found in other organisms [cf. 12,13,15,20], the great majority of indels were very short, with a dominance of 1-bp events (Table 1). From these polymorphism data we chose 4-bp indels for further analysis since they are easily scored by size separation and relatively abundant in the genome.

We selected 100 4-bp non-repetitive indels located within unique sequence. They were spread across the canine genome and consistently represented autosomal loci, the great majority of them likely to reside in non-protein-coding sequence. Of the 100, 94 could be readily amplified and scored and were selected for further analysis (Table 2). Using conventional genotyping based on fragment length separation in a DNA sequencing instrument, 81 out of the 94 putative markers were found to be polymorphic in a screening of 7 dogs and 76 of them were polymorphic in a global sample of 18 wolves (Figure 1a). As PCR primers were designed to generate amplicons of varying size within the 70-120-bp interval, combinations of multiplex reactions (three markers per PCR) were readily formed.


Table 1: Number and density of indels found.

Indel size (bp)   Count   Density (indels per million bp)
1                 20558   284.6
2                 3352    46.4
3                 1942    26.9
4                 2185    30.2
5                 678     9.4
6                 436     6.0
7                 297     4.1
8                 297     4.1
9                 219     3.0
10                152     2.1
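
As a rough consistency check, the counts and densities in Table 1 can be related to the total number of length variants and to their average spacing. The sketch below is illustrative only: the variable names are hypothetical, and the amount of aligned sequence is not taken from a reported figure but inferred here as count divided by density.

```python
# Counts and densities copied from Table 1 (indel size in bp -> value).
counts = {1: 20558, 2: 3352, 3: 1942, 4: 2185, 5: 678,
          6: 436, 7: 297, 8: 297, 9: 219, 10: 152}
density_per_mbp = {1: 284.6, 2: 46.4, 3: 26.9, 4: 30.2, 5: 9.4,
                   6: 6.0, 7: 4.1, 8: 4.1, 9: 3.0, 10: 2.1}

# Total number of length variants over all size classes.
total_indels = sum(counts.values())

# Summed density over all size classes (indels per million bp).
total_density = sum(density_per_mbp.values())

# Assumption: aligned sequence length is inferred as count / density,
# it is not a value stated in the table itself.
implied_aligned_mbp = total_indels / total_density

# Average distance between neighbouring length variants, in bp.
mean_spacing_bp = 1e6 / total_density

# Pairwise density expressed per bp.
indels_per_bp = total_density / 1e6

print(f"{total_indels} indels over roughly {implied_aligned_mbp:.0f} Mbp")
print(f"about one indel every {mean_spacing_bp:.0f} bp ({indels_per_bp:.1e} per bp)")
```

Under these assumptions the table is internally consistent with the total of 30,116 length polymorphisms and a spacing of about one variant every 2400 bp.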

This allowed simultaneous amplification, and consequently simultaneous genotyping within a single capillary, of several markers even when using the same fluorophore (Figure 1b). In wolves, 74% of the polymorphic loci had a minor allele frequency of >10% and 49% of >20%. The average observed and expected heterozygosities were 19.4% and 26.1%, respectively, in wolves, and 26.8% and 35.5% in dogs. The distribution of wolf heterozygosities is shown in Figure 2.

The 76 indels found to be polymorphic in the global sample of wolves were subsequently genotyped in 27 wolves from a Swedish population. Fifty-one loci were polymorphic and showed an observed mean heterozygosity of 25.3%, or 17.0% if all 76 markers are included. The same wolves were also genotyped for a set of 20 microsatellites known to be informative in this population [e.g. 21]. Expected heterozygosities for these loci ranged from 28% to 75%. There was a positive correlation between mean heterozygosity at indel and microsatellite loci in individual wolves (r² = 0.41, P < 0.001; Figure 3).
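
The two mean heterozygosities reported for the Swedish wolves (25.3% over the 51 polymorphic loci, 17.0% over all 76 loci) are related by simple arithmetic, since a monomorphic locus contributes zero heterozygosity. A minimal sketch of that relation follows; the function name is hypothetical and introduced only for illustration.

```python
def mean_over_all_loci(mean_polymorphic, n_polymorphic, n_total):
    """Mean heterozygosity over all loci, given the mean over polymorphic loci.

    Assumes every non-polymorphic locus has heterozygosity exactly zero.
    """
    if not 0 < n_polymorphic <= n_total:
        raise ValueError("need 0 < n_polymorphic <= n_total")
    return mean_polymorphic * n_polymorphic / n_total


# Values from the text: 51 polymorphic loci out of 76, mean 25.3 %.
swedish_all_loci = mean_over_all_loci(25.3, 51, 76)
print(f"{swedish_all_loci:.1f} %")
```

Rounded to one decimal, this reproduces the 17.0% given for all 76 markers.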

Discussion
Our study shows the feasibility of using large-scale genomic sequence data for extracting putative insertion and deletion polymorphisms, marker loci that can subsequently be validated as informative genetic markers at a population level. It also demonstrates the feasibility of transferring genomic data from a model species to a natural population of a close relative. Dogs were domesticated from wolves 10,000–100,000 years ago [22-24], and their divergence has since been accentuated by strong artificial selection during domestication. Finally, by genotyping indels and microsatellites in the same wolves, it also shows that polymorphism levels of the two marker types are highly correlated.

A lack of large-scale genome sequence information has until now hampered the introduction of indels as genetic markers in non-model species. It can be anticipated that this will change in the near future. There is a rapid increase in the number of genome sequencing initiatives, and new sequencing technology, like "454-sequencing" [25], offers immense possibilities for generating massive amounts of sequence data from hitherto uncharacterised genomes. Importantly, the depth of sequence coverage provided by the new technology means that it is well suited for sequence analysis of pools of individuals, from which a wealth of polymorphism data can be obtained [26]. For example, if 100 Mb of sequence is generated from each of two individuals (with a 1 Gb genome) in two megasequencing runs, and with an indel density of 1 every 2 kb in pairwise comparisons, several hundred indels are expected to be detected.

Indel density has not been as well characterized in natural populations as nucleotide diversity. In domestic chicken, the pairwise heterozygosity for indels is 2 × 10⁻⁴ per bp [15]. In a natural population of collared flycatchers, Backström et al. [27] found a similar occurrence of indels, 1–2 × 10⁻⁴ per bp. In this study we found about 30,000 indels in 72 Mbp of dog sequence, which translates into a heterozygosity of 4 × 10⁻⁴ per bp. This includes length variants in unique sequence as well as in repetitive DNA, like microsatellites. Using a similar search algorithm and a similar type of shot-gun vs. genomic reference data set for chicken, we recently found that about half of all length variants detected in this way represent tandem repeats [15]. This would suggest that in dogs, the heterozygosity for short non-repetitive indels is about 2 × 10⁻⁴ per bp, similar to chicken. Moreover, the length distribution of dog indels (Table 1) shows congruence with such data from chicken.

The Swedish wolf population was functionally extinct by the 1960s–1970s but has subsequently recovered to a current size of well over 100 individuals [28]. All contemporary Scandinavian wolves are thought to originate from only three founders, eastern immigrants that arrived in Sweden around 1980 and 1990, respectively [21]. The strong bottleneck, subsequent inbreeding and the associated loss of genetic diversity experienced by this population [21,29] give the opportunity to test the utility of indel markers in a small and endangered natural population. The finding that about 50% of in silico predicted indels from pairwise sequence comparisons of dog alleles are informative in this wolf population confirms the usefulness of indel markers even in a population with limited genetic diversity. The mean heterozygosity of the 51 polymorphic indels within the Scandinavian wolf population (25%) is somewhat lower than what has been observed for 21 SNPs (34%) in the same population [29]. However, those SNPs


Table 2: Location of selected indel markers (position on the respective chromosome in the dog genome assembly), primer sequences, amplicon length and expected heterozygosities (He) from the genotyping of 18 wolves sampled worldwide.

No.  Chr  Position   Primer F                     Primer R                     Length (bp)  He
1    23   35377868   ccaggcttgtgtgaagctct         gccttgttggtttcagtggt         126          0.43
2    21   44954960   tgtcatttggccagatctctaa       ttggattaaccctaccacacg        117          0.26
3    5    56432744   catgctgcttgaagtgcaata        tctctgtgtgcctctcatgaat       123          0.00
4    6    31824728   cacaatgaccacttattaaagattaca  ctaggatgagagcccagctt         139          0.00
5    29   17604384   tgtcaggtttcatatccttttgtg     gaactatccttaaatagaaccaatgc   111          0.29
6    15   52623379   ttcacatccatctgtcttgga        acccgggagtttgcctatac         125          0.20
7    12   4364137    ctcctgttccctccagca           ggaccatgctgtggatctg          115          0.17
8    30   32665884   agaccagggtctgaatttgc         tttccaaggtcccaccacta         116          0.24
9    7    52754426   ttcacaaattgctatacctaaaaatg   ttcctgtgggcataataatca        136          0.48
10   26   32666193   tccaagaacaaagaagtaatgtaaaa   ggaattgatttactgatagtgagatg   130          0.39
11   37   18098142   gaaaggtccctctgaattgaa        tctgtgctcttcactggaaaaa       122          0.18
12   14   38060282   gtgtgctctaggggccatt          gaatgaaatcatggaagagcaa       115          0.30
13   17   46184901   gaagggacaaaaccttggaa         tgaactaccctcgtgatcca         139          0.37
14   28   38785954   aaaggagggcttgcagtttt         ttctccttttagaccctttgtca      90           0.32
15   25   8846761    tgccttagcgttggcatt           tggtgctctttcttgttgga         169          0.36
16   34   7543229    caggagcaaagtaagggtaatca      gctttggtattgttgattctattgtaa  90           0.44
17   20   21505420   aatggggacaccagtcactt         tgggagttctggctccac           139          0.00
18   1    32746664   tcctgcggcagtttgg             ccaagattgtgcatgtcagg         104          0.36
19   23   42171234   caaaggcaagaaggcagatg         tacatggtccctgtgttcca         81           0.51
20   35   25523850   ttagcgatgttgagcgttttt        ttcccttaagaaataggcagagg      92           Excluded
21   22   41896467   tgcaaaggagtgggaattatc        tggatgttaaaaacctggtatattgt   164          0.37
22   30   32665884   ccaagccccttccaatacta         tttccaaggtcccaccacta         92           0.29
23   10   55417044   tgctttgcatgttacattcttca      catgtcatagtcacatgctgtacg     151          0.32
24   22   20849044   tattgctgccctgtttcaga         gctgaaggaaatatctgttgaatg     102          0.16
25   24   40849741   ctgcggtctcacatccttag         tgagggggatttgatctctt         69           0.14
26   31   8148287    tctgctcaggtttagccttg         ggatgcaagaaaatctgctg         80           0.43
27   3    67261495   ttactcccagctctgtgcat         gacccaggtggggatatcta         88           0.06
28   26   37097342   tgcccccactactcttgc           aaagggtgatggtcctttga         98           0.09
29   1    109880549  tgttgagcccttgaaatgag         agcgaaaagtggcagtgg           68           0.51
30   13   20047698   tggctgccccatcttatg           cacaatggcagaacacgag          78           0.06
31   10   13999933   gccttcttcctctgcctct          ctcaggcaggcaaataaaaa         90           0.13
32   5    6092689    gcttgggaaatcatggtca          gctaaggaaagcaagctgga         100          0.50
33   3    52499600   tctgactggcctccttcg           attcaagtgtgcccgagag          67           0.51
34   2    87005110   gccgccgtgtcttgtc             cgaatgcgtgcttaccg            80           0.07
35   5    16794632   cgatgctggtgaggaagc           ccatccctgagccacct            91           Excluded
36   1    40637985   aagggccgatgccagt             caggttcttgtttccccaaa         100          0.42
37   14   16593885   cccaggtgccccttattt           ggctcatgctgctctgg            110          0.00
38   38   3128143    gcttcccttgtttctttcca         tgcccatgtaccaaatgaa          121          0.51
39   6    78974466   gtagggcaagcggcaag            tgcttcctggacatttgga          110          0.14
40   4    68490768   ttgcttgggaacatggag           gcccttgtcatccactagga         121          0.00
41   1    41518626   cctggtgcaggttgcag            ccttcggagcccatgc             106          0.42
42   19   29399024   caggacacttgcaccagatt         gagcagaggtgaggctgaa          120          0.00
43   12   27463695   cagtagccaaattgtggaagc        accacgtagtcttgacccattc       68           0.30
44   16   10834862   gttcccttctcagaggacca         caatgagtgaagggggtcag         69           0.00
45   14   38539141   gagtggcacacgagcactt          gcaggactgtctggaggttg         68           0.31
46   8    60840667   tgcctgagggagctgtatatg        tctcattgtggagcaaagacat       75           0.51
47   5    36083663   gctttgttgtaagcagcgata        tgtgagaaactccattgcctta       74           0.21
48   7    53313567   agaaggggcagacttgagg          tccctcatttcacaagctga         75           0.19
49   1    88201810   tggctcattgatttgtgattct       ggccagctcttcttgttgag         78           0.00
50   14   61925547   gggttctctagggagatgacaa       aggacccaagtggattctga         83           Excluded
51   7    17212546   gtcatggtgacatcgcagtt         gcctcatgccaatgagagac         85           0.12
52   2    67041786   gatggccgattgtacatcaa         tggttgcagggaagattagg         95           0.34
53   5    3965764    ccgtctagttgtcgggtgtt         tgcagtatttagggtggagga        94           0.43
54   34   24388648   cccttgtaaaggggaggaga         tggctctgaatttaggcattt        94           0.48
55   12   23053279   agctctcctgctgtgattttt        tgcagacaaatggactgaaga        98           0.44
56   20   29687857   tgagcacgaagaggtagagaag       tcaagtgcaagtcaccaaact        100          0.43
57   30   37964459   cgtgaatggtccaaaatgat         catcagcatttccagagttctt       100          0.26
58   13   42973844   tttctgggcaaaaacagtga         tttagatgggagggaatggtt        108          Excluded


Table 2 (Continued): Location of selected indel markers (position on the respective chromosome in the dog genome assembly), primer sequences, amplicon length and expected heterozygosities (He) from the genotyping of 18 wolves sampled worldwide.

No.  Chr  Position   Primer F                     Primer R                     Length (bp)  He
59   5    91665974   accttctgtttcccctttgg         agagcgcagcagagatgact         103          0.00
60   4    43098715   gtttacaccacccagcctga         aggcagttacaggcattaatca       109          0.40
61   20   40585796   cccatccctgaggaagagag         ggacacccgcatttctgtc          114          0.12
62   7    8798933    cccagaaaacaagagaaggaaa       tgttggcagatctcatggtc         114          0.00
63   14   23314695   acccaccagattggctaaaa         tcgaacaggtccagtttacatt       116          0.26
64   26   35769409   cctggcttaaccccttacct         aaccgctatcccacattctg         96           0.00
65   4    56495029   tcccactgtagcttgaaaacg        ttcaggtattgctgtccaaaaa       95           0.51
66   17   19537687   atgctttgcagagatgcttg         tgctcatgtcagaacagagagg       94           0.31
67   31   26928202   atgaacaagcaccccaaaac         cagtccacttcaatgcacca         68           0.27
68   10   66801300   ccccaaattaagggaagttca        tccatgaccaaatctgcatc         70           0.12
69   10   63275151   tccagaacttggagagaatcaa       tccaggcaagactatgagca         70           0.30
70   5    58727403   ctggaccacatatggcttga         tgccggagagatacgtgtaa         72           0.00
71   5    29583832   ccacacagtgttgcctgtagt        ggaaagcacagaaagttgtgaa       73           0.49
72   11   25068367   cccctgcttgtgctctctc          ctctcagcaacccacagagat        75           0.49
73   19   54304531   aagccttgcacttgagcttg         ccattggaaaggcacgtact         77           0.27
74   30   39573069   gcttttgtcatgaaacaccaa        ccaccagatgctcaagtctg         79           0.48
75   19   55236292   gcccacagggtctttattttt        gcttgggtctcttcctctctc        78           0.33
76   3    53761240   ctgtggtgcagcggtttag          ggagcctgacatgggactt          78           0.00
77   4    90768017   gttctccctgtgtctgactgtt       aaagaccaaggggtgaaaga         84           0.13
78   11   57430384   agggacagacccactaagtgtc       tccttaggcgacatggagac         86           Excluded
79   36   32128918   tgatatagccgaagtcaggaag       aatggaaagatcatccagacag       89           0.40
80   14   49479392   tttcaaattgctgaatgttgg        ccctggtctccaggatcac          89           0.47
81   27   13108059   cagccaattggacacaaaaa         aaaatcaagatgtcagcagagg       88           0.00
82   6    61627505   gtcatgatcccatggtccta         gtagaggggcagagggagag         88           0.00
83   36   14699428   gtgttttcttttgggcaagg         tgcaaccaacacacagatga         92           0.49
84   32   36108071   cccagtgttggtcacatataca       gcgatgaaaattgggaaaga         96           Excluded
85   17   57586353   gtgattagggttgaggggaga        ttttccatcaaggtttgtcca        97           0.42
86   5    16750608   aattttccaggaggctttgg         ttccctcctgctgatctagg         98           0.06
87   19   43306776   tggggtaaatcagtgagtgaag       ggcattaagagaagcctgctg        99           0.07
88   29   43945653   aacccgaataacattaggagga       tgggtttaagctggttacgg         104          0.51
89   5    31951980   cccagaaatccacttaatgacc       tgtgttaccagggctaggttc        104          0.44
90   7    50783779   ggttagcttagctcctctccaa       agccacatgctgaaaggaag         104          0.00
91   31   10957463   ttggcaactgccttacaataaa       ttgaatgtggacatgaaacaaa       106          0.47
92   16   6629437    gaaatgggaaggttttattcca       ggtgctgacaacagaaaacct        109          0.46
93   5    75136583   aggacacagacagatgtgagga       ccgaagaggaatctgcactc         111          0.00
94   3    89213750   gagaacttcgatgtgagggaat       ctctcccaccaaaaatctcct        113          0.29
95   21   51457175   aggcattcagggtgttaaaaa        tggtgaactggaaagtagctga       108          0.51
96   12   53379600   ttcccaaggagttggagaga         gctgagggcagctgtgttat         68           0.00
97   14   58723304   ccctgtggtaaccatcaacc         caaagtgaacaagcaaagcaa        67           0.51
98   3    11002853   ccactggccctaagtgactg         ttagggttttaaaggctgtgc        70           0.06
99   30   30015918   tcaggttttggatttgaagga        taagcacaaccattagctcca        72           0.06
100  7    65831405   gaggttcaaatttcccatatcc       gggatccatgcaaaatagttc        73           0.37

were initially identified from a screening of a limited number of Scandinavian wolves, so there was an ascertainment bias in favour of markers with high polymorphism information content. Generally, for those indels and SNPs that represent neutral markers, there should be no reason to believe that heterozygosity for polymorphic loci differs between the two marker categories. Indels in coding sequence are likely to be deleterious more often than point mutations, at least indels that cause frame-shift mutations, which should act to reduce their diversity due to negative selection. On the other hand, point mutations in coding sequence may potentially be subject to positive selection more often than indels, which also reduces diversity. In any case, although probably comparable to SNPs, indels do show less variation than microsatellites. Thus, to obtain the same resolution power in relatedness analyses, a higher number of biallelic markers is needed compared to multiallelic microsatellites [30-32]. However, the rich abundance of indels in genome sequence surveys and the ease with which they are genotyped (Figure 1a) and multiplexed (Figure 1b) add to their benefit. Moreover, it is possible to design microarrays specifically for short indels, by which genotyping costs become very low [32].

Conclusion
With an increasing amount of sequence information gathered from non-model organisms, we suggest that indels will come to form an important source of genetic markers, easy and cheap to genotype, for studies of natural populations.

Figure 1: (a) Genotyping of a 4-bp indel locus in wolves showing (upper panel) a homozygote for the longer allele, (mid panel) a heterozygote and (lower panel) a homozygote for the shorter allele. (b) Multiplex amplification and simultaneous genotyping in a single capillary of five indel markers in one individual heterozygous for all these markers. The long and short alleles of markers 1–5 are labelled. All markers show some form of extra fragments that likely represent PCR artefacts. These may be shorter (marker 2) or longer (markers 3–5) than the amplified allele, or both shorter and longer (marker 1).

Methods
Samples
Genomic DNA was extracted from wolf tissue samples using standard phenol-chloroform extraction protocols or the DNeasy Tissue Kit (Qiagen). Altogether 18 samples, from Sweden (5), Finland (3), Spain (3), Russia (2) and Canada (5), were used to test the amplification ability and to get a first idea of the polymorphism of indels. Seven domestic dog samples from different breeds (Dachshund, Dalmatian, Gordon Setter, Greenland Dog, Lakeland Terrier, Pyrenean Mountain Dog, Welsh Corgi) were also added. Subsequently, we tested the ability of indel markers to analyse the genetic diversity at the intra-populational level using tissue samples of 27 wolves collected between 1985 and 2005 from roadkills or shot animals from Sweden [21,29].

Figure 2: Distribution of observed heterozygosities at 94 indel loci genotyped in 18 wolves from five populations worldwide.
Selection of markers
A total of about 200,000 dog trace read sequences were obtained from GenBank. These sequence tags were almost exclusively derived from light shot-gun sequencing of unrelated dogs that was done in conjunction with the sequencing of the dog genome [18]. An automated pipeline was set up to survey the sequences for potential indels and to design primers. The initial step in the pipeline was to place all STS sequences onto the dog genome. This was done using local NCBI BLAST [33], with a conservative setting requiring an E value of less than 10⁻⁷⁰. To avoid possible duplicated loci, all cases where there was more than one BLAST hit were discarded. Next, the BLAST results were surveyed for 4-bp indels, recognised as 4-bp gaps in alignments of shot-gun reads and the genome reference sequence. To avoid selection of microsatellites, only those 4-bp indels where neither of the flanks was identical to the indel were used for further processing. For each indel with at least 70 bp of flanking sequence on both sides, Primer3 [34] was used for primer design. Primers were requested from the program for fragment lengths between 70 and 120 bp. The primers were constrained by a required melting temperature between 58 and 62°C, as well as a primer length between 19 and 22 bp. Finally, the primers were evaluated with regard to self-complementarity, as well as for the possibility of the resulting product forming a hairpin. This was done through a simple complementarity testing procedure in which possible self-complementarity at the sharp end of the hairpin was scored highest, with the score decreasing inwards. The top 100 loci passing through all steps were picked for screening. Primers were fluorescently labelled with either FAM, HEX, or TET. The same animals were also genotyped for a set of 20 autosomal microsatellites, as described in ref. 20: c2001, c2006, c2010, c2017, c2054, c2079, c2088 and c2096, vWF, u109, u173, u225, u250 and u253 and PEZ01, PEZ03, PEZ05, PEZ06, PEZ08 and PEZ12.
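
The gap-screening step described above can be illustrated with a short sketch. It is not the pipeline used in the study: it assumes, for illustration only, that each read-versus-reference alignment is already available as two equal-length gapped strings (with "-" marking gaps), and the names `Indel` and `find_indels` are hypothetical. It keeps gaps of exactly 4 bp, requires at least 70 bp of flanking sequence on both sides, and discards candidates whose immediate flank repeats the indel sequence, as a simple guard against tandem repeats.

```python
import re
from typing import Iterator, NamedTuple


class Indel(NamedTuple):
    column: int        # alignment column where the gap starts
    sequence: str      # bases present in one sequence, missing in the other
    absent_from: str   # "read" or "reference": the sequence carrying the gap


def find_indels(ref_aln: str, read_aln: str,
                size: int = 4, min_flank: int = 70) -> Iterator[Indel]:
    """Yield gaps of exactly `size` bp that pass the flank checks.

    Assumption: ref_aln and read_aln are the two rows of one pairwise
    alignment, of equal length, using '-' for gap positions.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("alignment rows must have equal length")
    pairs = ((ref_aln, read_aln, "reference"), (read_aln, ref_aln, "read"))
    for gapped, other, name in pairs:
        for match in re.finditer(r"-+", gapped):
            start, end = match.span()
            if end - start != size:
                continue  # only gaps of the requested length
            indel = other[start:end].upper()
            # Flanks are taken from the ungapped row, with any gaps removed.
            left = other[:start].replace("-", "").upper()
            right = other[end:].replace("-", "").upper()
            if len(left) < min_flank or len(right) < min_flank:
                continue  # not enough sequence for primer design
            if left[-size:] == indel or right[:size] == indel:
                continue  # flank identical to indel: likely tandem repeat
            yield Indel(start, indel, name)


# Toy alignment: 80 bp flanks around a 4-bp deletion in the read.
left_flank = "ACGTTGCA" * 10
right_flank = "TTAACCGA" * 10
reference_row = left_flank + "GGCC" + right_flank
read_row = left_flank + "----" + right_flank
candidates = list(find_indels(reference_row, read_row))
print(candidates)
```

Comparing only the directly adjacent 4 bp is one possible reading of "none of the flanks were identical to the indel"; a stricter repeat filter could be substituted without changing the overall structure.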
Genotyping and data analysis
Amplification by polymerase chain reaction (PCR) was performed in a 10 µl solution containing 20 ng DNA, 0.25 U AmpliTaq Gold polymerase with 1× AmpliTaq Gold PCR buffer (Applied Biosystems), 2.5 mM MgCl2, 0.3 µM of each primer and 0.4 mM dNTP. The PCR profile for the indel markers included initial heating at 95°C for 5 min, followed by 35 cycles of 95°C for 30 s, 58°C for 30 s and 72°C for 1 min, and a final extension at 72°C for 10 min. The profile for microsatellites included an initial denaturation step of 95°C for 10 min, 11 touch-down cycles with 94°C for 30 s, 58°C for 30 s (decreasing by 0.5°C in each cycle) and 72°C for 1 min, then 28 cycles of 94°C for 30 s, 52°C for 30 s and 72°C for 1 min, and a final extension of 72°C for 10 min. PCR products were run on a MegaBACE 1000 capillary sequencer (Amersham Biosciences) and analyzed using the accompanying software Genetic Profiler 2.2. Observed and expected heterozygosities were calculated using Microsatellite Toolkit for MS Excel [35], and the correlation between the observed individual heterozygosities according to indel and microsatellite data was estimated.
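
For readers who wish to reproduce this kind of summary without the Excel add-in, the quantities involved can be sketched in a few lines. The code below is an illustration, not the toolkit's implementation: the function names are hypothetical, genotypes are assumed to be given as pairs of allele labels (with None for a failed genotype), and expected heterozygosity is computed with the small-sample correction 2n/(2n - 1), which is an assumption on our part.

```python
from collections import Counter
from math import sqrt


def observed_heterozygosity(genotypes):
    """Fraction of typed individuals carrying two different alleles."""
    typed = [g for g in genotypes if g is not None]
    return sum(a != b for a, b in typed) / len(typed)


def expected_heterozygosity(genotypes, unbiased=True):
    """1 - sum(p_i^2) over allele frequencies at one locus.

    With unbiased=True the value is multiplied by 2n/(2n - 1), where 2n is
    the number of sampled allele copies (assumed small-sample correction).
    """
    alleles = [a for g in genotypes if g is not None for a in g]
    n_copies = len(alleles)
    freqs = [count / n_copies for count in Counter(alleles).values()]
    h = 1.0 - sum(p * p for p in freqs)
    if unbiased:
        h *= n_copies / (n_copies - 1)
    return h


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical locus: 18 individuals, one heterozygote ("L" long, "S" short).
rare_locus = [("L", "L")] * 17 + [("L", "S")]
print(observed_heterozygosity(rare_locus), expected_heterozygosity(rare_locus))
```

Squaring the value returned by `pearson_r` for per-individual indel and microsatellite heterozygosities gives an r² of the kind reported in the Results.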

Figure 3: Correlation between average observed individual indel (51) and microsatellite (20) heterozygosities in 22 Swedish wolves.

Authors' contributions
ÜV carried out the molecular studies and performed the data analysis. MB participated in the design of the study, selected markers and designed primers. MJ participated in the molecular analyses. HE conceived of and coordinated the study, and wrote the paper together with ÜV. All authors read and approved the final manuscript.



Acknowledgements
We thank Annika Einarsson for technical assistance, Jennifer Leonard and Carles Vilà for wolf samples, and two anonymous reviewers for useful comments on the manuscript. Financial support was obtained from the Norwegian and Swedish Natural Environmental Protection Agencies. ÜV was supported by the fellowship of the Visby programme from the Swedish Institute.

References
1. Schlotterer C, Amos B, Tautz D: Conservation of polymorphic simple sequence loci in Cetacean species. Nature 1991, 354(6348):63-65.
2. Ellegren H: DNA typing of museum birds. Nature 1991, 354(6349):113-113.
3. Ellegren H: Polymerase-chain-reaction (PCR) analysis of microsatellites - a new approach to studies of genetic relationships in birds. Auk 1992, 109(4):886-895.
4. Schlotterer C: The evolution of molecular markers - just a matter of fashion? Nature Reviews Genetics 2004, 5(1):63-69.
5. Brumfield RT, Beerli P, Nickerson DA, Edwards SV: The utility of single nucleotide polymorphisms in inferences of population history. Trends in Ecology & Evolution 2003, 18(5):249-256.
6. Morin PA, Luikart G, Wayne RK, SNP workshop group: SNPs in ecology, evolution and conservation. Trends in Ecology & Evolution 2004, 19(4):208-216.
7. Zane L, Bargelloni L, Patarnello T: Strategies for microsatellite isolation: A review. Molecular Ecology 2002, 11(1):1-16.
8. Primmer CR, Moller AP, Ellegren H: A wide-range survey of cross-species microsatellite amplification in birds. Molecular Ecology 1996, 5(3):365-378.
9. Syvänen AC: Accessing genetic variation: Genotyping single nucleotide polymorphisms. Nature Reviews Genetics 2001, 2(12):930-942.
10. Ellegren H: Microsatellites: Simple sequences with complex evolution. Nature Reviews Genetics 2004, 5(6):435-445.
11. Pompanon F, Bonin A, Bellemain E, Taberlet P: Genotyping errors: Causes, consequences and solutions. Nature Reviews Genetics 2005, 6(11):847-859.
12. Bhangale TR, Rieder MJ, Livingston RJ, Nickerson DA: Comprehensive identification and characterization of diallelic insertion-deletion polymorphisms in 330 human candidate genes. Human Molecular Genetics 2005, 14(1):59-69.
13. Mills RE, Luttig CT, Larkins CE, Beauchamp A, Tsui C, Pittard WS, Devine SE: An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Research 2006, 16(9):1182-1190.
14. Ometto L, Stephan W, De Lorenzo D: Insertion/deletion and nucleotide polymorphism data reveal constraints in Drosophila melanogaster introns and intergenic regions. Genetics 2005, 169(3):1521-1527.
15. Brandström M, Ellegren H: The genomic landscape of short insertion and deletion polymorphisms in the chicken (Gallus gallus) genome: A high frequency of deletions in tandem duplicates. Genetics 2007, 176(3):1691-1701.
16. Ellegren H: Molecular evolutionary genomics of birds. Cytogenet Genome Res 2007, 117(1-4):120-130.
17. Shedlock AM, Okada N: SINE insertions: Powerful tools for molecular systematics. Bioessays 2000, 22(2):148-160.
18. Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M, Clamp M, Chang JL, Kulbokas EJ, Zody MC, Mauceli E, Xie XH, Breen M, Wayne RK, Ostrander EA, Ponting CP, Galibert F, Smith DR, deJong PJ, Kirkness E, Alvarez P, Biagi T, Brockman W, Butler J, Chin CW, Cook A, Cuff J, Daly MJ, DeCaprio D, Gnerre S, Grabherr M, Kellis M, Kleber M, Bardeleben C, Goodstadt L, Heger A, Hitte C, Kim L, Koepfli KP, Parker HG, Pollinger JP, Searle SMJ, Sutter NB, Thomas R, Webber C, Lander ES, Plat BIGS: Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 2005, 438(7069):803-819.
19. Kirkness EF, Bafna V, Halpern AL, Levy S, Remington K, Rusch DB, Delcher AL, Pop M, Wang W, Fraser CM, Venter JC: The dog genome: Survey sequencing and comparative analysis. Science 2003, 301(5641):1898-1903.
20. Brandström M, Ellegren H: The genomic landscape of short insertion and deletion polymorphisms in the chicken (Gallus gallus) genome: A high frequency of deletions in tandem duplicates. Genetics 2007, 176(3):1691-1701.
21. Vilà C, Sundqvist AK, Flagstad O, Seddon J, Bjornerfeldt S, Kojola I, Casulli A, Sand H, Wabakken P, Ellegren H: Rescue of a severely bottlenecked wolf (Canis lupus) population by a single immigrant. Proceedings of the Royal Society of London Series B-Biological Sciences 2003, 270(1510):91-97.
22. Leonard JA, Wayne RK, Wheeler J, Valadez R, Guillen S, Vila C: Ancient DNA evidence for Old World origin of New World dogs. Science 2002, 298(5598):1613-1616.
23. Savolainen P, Zhang YP, Luo J, Lundeberg J, Leitner T: Genetic evidence for an East Asian origin of domestic dogs. Science 2002, 298(5598):1610-1613.
24. Vilà C, Savolainen P, Maldonado JE, Amorim IR, Rice JE, Honeycutt RL, Crandall KA, Lundeberg J, Wayne RK: Multiple and ancient origins of the domestic dog. Science 1997, 276(5319):1687-1689.
25. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen ZT, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer MLI, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu PG, Begley RF, Rothberg JM: Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005, 437(7057):376-380.
26. Barbazuk WB, Emrich SJ, Chen HD, Li L, Schnable PS: SNP discovery via 454 transcriptome sequencing. Plant Journal 2007, 51:910-918.
27. Backström N, Fagerberg S, Ellegren H: Genomics of natural bird populations: A gene-based set of reference markers evenly spread across the avian genome. Molecular Ecology 2007, 17:964-980.
28. Wabakken P, Sand H, Liberg O, Bjarvall A: The recovery, distribution, and population dynamics of wolves on the Scandinavian peninsula, 1978-1998. Canadian Journal of Zoology-Revue Canadienne De Zoologie 2001, 79(4):710-725.
29. Seddon JM, Parker HG, Ostrander EA, Ellegren H: SNPs in ecological and conservation studies: A test in the Scandinavian wolf population. Molecular Ecology 2005, 14(2):503-511.
30. Anderson EC, Garza JC: The power of single-nucleotide polymorphisms for large-scale parentage inference. Genetics 2006, 172(4):2567-2582.
31. Glaubitz JC, Rhodes OE, Dewoody JA: Prospects for inferring pairwise relationships with single nucleotide polymorphisms. Molecular Ecology 2003, 12(4):1039-1047.
32. Salathia N, Lee HN, Sangster TA, Morneau K, Landry CR, Schellenberg K, Behere AS, Gunderson KL, Cavalieri D, Jander G, Queitsch C: Indel arrays: An affordable alternative for genotyping. Plant J 2007, 51(4):727-737.
33. Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research 1997, 25(17):3389-3402.
34. Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Methods in Molecular Biology 2000, 132:365-386.
35. Park SDE: Trypanotolerance in West African cattle and the population genetic effects of selection. PhD thesis. University of Dublin; 2001.
