VariationHunter/CommonLaw: Tool(s) for structural variation (insertion, deletion and inversion) discovery on one or multiple genomes simultaneously.


The method has two main parts.

1) Clustering of discordant paired-end read mappings
2) Selection of the SVs from the clusters created in the previous step.

Clustering Algorithm

The discordant paired-end read mappings should be in the DIVET format, which mrFAST produces.

1 - Concatenate and filter the DIVET files generated with mrFAST:

The input file of mappings is in DIVET format and should look like this:

IL5_286:1:311:10:40 chr1 142188880 142188915 R = 141736787 141736822 F deletion 3 23.930555 0.00257801939733326435
IL5_286:1:317:831:353 chr16 28530262 28530297 R = 28517264 28517299 F deletion 0 26.263889 1.00000000000000000000
IL5_286:1:37:711:983 chr1 152551838 152551873 F = 152560844 152560879 R deletion 0 23.777779 1.00000000000000000000
IL5_286:1:30:707:467 chr10 41694806 41694841 R = 41686145 41686180 F deletion 4 24.666666 0.00000000860654392199

The fields are as follows:

<readN> <chroN> <ML_{1,L}> <ML_{1,R}> <OR_{1}> <chroN2> <ML_{2,L}> <ML_{2,R}> <OR_{2}> <SVtype> <ED> <AvgPhred> <ProbBasedPhred>

<readN>: the name of the read
<chroN>: the chr. the read has mapped to
<ML_{1,L}>: Mapping Location of left side of the first end-pair of the read
<ML_{1,R}>: Mapping Location of right side of the first end-pair of read
<OR_{1}> : orientation of the mapping of the first end-pair {R: Reverse, F:Forward}
<chroN2> : the chromosome the second end maps to; printed as = if it is the same as <chroN>
<ML_{2,L}>: Mapping Location of left side of the second end-pair of the read
<ML_{2,R}> : Mapping Location of right side of the second end-pair of the read
<OR_{2}> : orientation of the mapping of the second end-pair {R: Reverse, F: Forward}
<SVtype>: the type of variation this mapping of paired-end is suggesting {insertion, deletion, inversion}
<ED>: total edit distance of the mapping of this paired-end
<AvgPhred>: the average phred score of this paired-end read
<ProbBasedPhred>: the probability of paired-end mapping to that location solely based on the phred score and
the edit distance.
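The field layout above can be exercised with a short parsing sketch. The field names and the parser are illustrative, not part of VariationHunter:

```python
# Illustrative sketch: parse one DIVET line into named fields.
# Field names follow the description above; the code itself is not part of VH.
DIVET_FIELDS = [
    "readN", "chroN", "ML_1L", "ML_1R", "OR_1", "chroN2",
    "ML_2L", "ML_2R", "OR_2", "SVtype", "ED", "AvgPhred", "ProbBasedPhred",
]

def parse_divet_line(line):
    record = dict(zip(DIVET_FIELDS, line.split()))
    # Integer coordinates and edit distance; float scores.
    for key in ("ML_1L", "ML_1R", "ML_2L", "ML_2R", "ED"):
        record[key] = int(record[key])
    for key in ("AvgPhred", "ProbBasedPhred"):
        record[key] = float(record[key])
    return record

line = ("IL5_286:1:311:10:40 chr1 142188880 142188915 R = 141736787 "
        "141736822 F deletion 3 23.930555 0.00257801939733326435")
rec = parse_divet_line(line)
print(rec["SVtype"], rec["ED"])  # deletion 3
```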

How to calculate <ProbBasedPhred>:
This is the probability of a paired-end being mapped to the given position, based on the phred scores of its mismatches.
Assume the mapping has k mismatches at positions n_1, n_2, ..., n_k, and let phred(n_i) denote the phred score at position n_i. With ProbBasedPhred() = 1 for a mapping with no mismatches (as in the example lines above with edit distance 0), the recurrence is:

ProbBasedPhred(n_1, n_2, ..., n_k) = ProbBasedPhred(n_1, n_2, ..., n_(k-1)) * (1/1000 + 10^(-phred(n_k)/10) - (1/1000) * 10^(-phred(n_k)/10))
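The recurrence above can be sketched in a few lines of Python. The base case of 1.0 for zero mismatches is an assumption consistent with the ED=0 example lines:

```python
# Sketch of the ProbBasedPhred recurrence above (not part of the tool itself).
# `phred_scores` is the list of phred values at the k mismatch positions
# n_1..n_k; zero mismatches gives 1.0, as in the ED=0 example lines.
def prob_based_phred(phred_scores):
    prob = 1.0  # base case: no mismatches
    for q in phred_scores:
        err = 10 ** (-q / 10)  # phred error probability at this position
        # per-mismatch factor from the recurrence:
        prob *= 1/1000 + err - (1/1000) * err
    return prob

print(prob_based_phred([]))        # 1.0
print(prob_based_phred([30, 30]))  # two mismatches, both at phred 30
```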

Note that mrFast produces the DIVET file itself.

2 - Create maximal clusters:

VH -c AllChro -i initInfo -l sample.lib -r genome.satellite.bed -g -o sample.cluster.out -t -x 500 -p 0.001

The chromosome list file gives chromosome names and chromosome lengths (the numbers may be larger than the actual length).

The library file (sample.lib) has one line per library:

library_name sample_name sample.vh min_cut max_cut read_length
... continue as necessary if the first line is >1

For example, for a single individual (NA12878) with two libraries:

NA12878-lib1 NA12878 NA12878-1.DIVET.vh 93 184 36
NA12878-lib2 NA12878 NA12878-2.DIVET.vh 125 443 36

Calculate min_cut = mean - 4*std and max_cut = mean + 4*std of the insert-size distribution. These numbers are the ones given to mrFAST or mrsFAST as the --min/--max parameters in the mapping stage.

For multiple individuals (e.g. the NA12878/NA12891/NA12892 trio):

NA12878-lib1 NA12878 NA12878-1.DIVET.vh 93 184 36
NA12878-lib2 NA12878 NA12878-2.DIVET.vh 125 443 36
NA12891-lib1 NA12891 NA12891-1.DIVET.vh 145 440 36
NA12892-lib1 NA12892 NA12892-1.DIVET.vh 115 410 36
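The min_cut/max_cut computation can be sketched as follows. The insert sizes and the choice of population standard deviation are illustrative assumptions; the document only specifies mean ± 4*std:

```python
# Illustrative sketch of the min_cut/max_cut computation described above:
# mean - 4*std and mean + 4*std of the library's insert-size distribution.
import statistics

def cut_offs(insert_sizes):
    mean = statistics.mean(insert_sizes)
    # population std; the document does not specify which estimator
    std = statistics.pstdev(insert_sizes)
    return int(mean - 4 * std), int(mean + 4 * std)

# Hypothetical insert sizes for one library:
min_cut, max_cut = cut_offs([130, 140, 138, 145, 132, 141, 139, 136])
print(min_cut, max_cut)  # pass these to mrFAST/mrsFAST as --min/--max
```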

VariationHunter/CommonLaw selection Algorithm:

The command for running the Set Cover (selection) part of VariationHunter and CommonLaw is as follows (these are the required fields):

setCover -l sample.lib -r -c sample.cluster.out -t numberOfTopCalls -o sample.Out.SV
Additional (optional) flags:

-w weightFile: a file assigning a weight to each combination of individuals. For n individuals there are 2^n - 1 combinations, and therefore 2^n - 1 weights. Assuming the input is the trio of NA12878, NA12891 and NA12892, a sample weight file looks like this:

NA12878 : 100
NA12891 : 20
NA12892 : 20
NA12878 NA12891 : 5
NA12878 NA12892 : 5
NA12891 NA12892 : 10
NA12878 NA12891 NA12892 : 2

Note 1) The larger the weight, the less likely CommonLaw is to pick that combination.
Note 2) If the -w option is not used and the input is multi-individual, the CommonLaw algorithm will assign a weight of 1 to every possible combination.
Note 3) The default value is 0.
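The 2^n - 1 combinations that a weight file must cover can be enumerated mechanically. This sketch (illustrative, not part of CommonLaw) lists them for the trio above, using the weight of 1 that is assigned when -w is absent:

```python
# Illustrative sketch: enumerate the 2^n - 1 individual combinations that a
# -w weight file must cover, in the style of the trio example above.
from itertools import combinations

individuals = ["NA12878", "NA12891", "NA12892"]
combos = [c for r in range(1, len(individuals) + 1)
          for c in combinations(individuals, r)]
print(len(combos))  # 7 (i.e. 2^3 - 1)
for combo in combos:
    # weight 1 is the default used when -w is not given
    print(" ".join(combo), ":", 1)
```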

Interpreting the output:

The output file lists the calls, each followed by the paired-end reads supporting it. The output looks as follows:

Chr:chr1 Start_Outer:10105 Start_Inner:10468 End_Inner:176912 End_Outer:176989 SVtype:D sup:11 Sum_Weight:0 AvgEditDits:4.636364 Lib:6878 LibSup:2 LibHurScore:1.3 AvgEditDistInd:1 Lib:6879 LibSup:4 LibHurScore:2.4 AvgEditDistInd:2 Lib:6880 LibSup:0 LibHurScore:0 AvgEditDistInd:0 minDelLen:166028 maxDelLen:166448
D0A81ACXX_177:8:1303:2818:78987:1#0 chr1 10110 176912 2 0.251581 F R 6878.PE 6878
C059EACXX_133:5:1204:8980:105851:1#0 chr1 10146 176934 3 0.0372024 F R 6878.PE 6878
C05MRACXX_123:8:1308:11157:199354:1#0 chr1 10151 176989 3 0.0231612 F R 6879.PE 6879
C02EEACXX_228:2:1306:4928:144278:1#0 chr1 10153 176937 5 0.015873 F R 6879.PE 6879
D051CACXX_86:1:2203:16675:109505:1#0 chr1 10286 176926 5 0.0298211 F R 6879.PE 6879
D09P4ACXX_131:5:2102:4253:141916:1#0 chr1 10105 176920 5 0.5 F R 6879.PE 6879
D09P4ACXX_131:5:2108:6007:165040:1#0 chr1 10217 176945 5 0.0227273 F R 6879.PE 6879
D0A9DACXX_85:2:1304:21086:88162:1#0 chr1 10455 176942 5 0.0168919 F R 6879.PE 6879
C02EEACXX_228:2:1108:17262:9610:1#0 chr1 10109 176937 6 0.03125 F R 6879.PE 6879

The first line gives the information about the predicted SV, in the order shown above.
For each individual, the following information is also provided:
  • Lib: Name of the individual
  • LibSup: Number of paired-end reads from this particular individual supporting the SV
  • LibHurScore: The total heuristic score of this SV for this individual. (Note that if the option -wh is not used, LibHurScore will be equal to LibSup.)
  • AvgEditDistInd: The average edit distance of the mapping from this individual supporting the SV

Following the SV information line, the set of paired-end read mappings supporting this SV is given, one mapping per line:

ReadName chroName pos1 pos2 edit_distance heuristic_score orientation_1 orientation_2 library_name individual_name
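The Key:Value tokens of an SV information line can be split into a dictionary with a short sketch. The parser is illustrative, not part of the tool; repeated per-library keys (Lib, LibSup, ...) are collected into lists:

```python
# Illustrative sketch: parse the "Key:Value" tokens of an SV line from the
# setCover output. Repeated per-library keys (Lib, LibSup, LibHurScore,
# AvgEditDistInd) are kept as lists, one entry per individual.
def parse_sv_line(line):
    record = {}
    for token in line.split():
        key, _, value = token.partition(":")
        record.setdefault(key, []).append(value)
    return record

sv = parse_sv_line(
    "Chr:chr1 Start_Outer:10105 Start_Inner:10468 End_Inner:176912 "
    "End_Outer:176989 SVtype:D sup:11 Sum_Weight:0 AvgEditDits:4.636364 "
    "Lib:6878 LibSup:2 Lib:6879 LibSup:4"
)
print(sv["Chr"][0], sv["SVtype"][0], sv["LibSup"])  # chr1 D ['2', '4']
```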
