This software application was developed and tested in a 64-bit Linux (CentOS 6.5) environment.
Running it requires gSearch, gcc (the C compiler), Python, and R.
Name | Description | Download |
---|---|---|
LR Filter | A set of shell scripts, C source code, and Python programs for logistic regression based filtering | Download |
All database files were originally downloaded from the UCSC Table Browser and converted to appropriate formats for LR Filter.
Name | Description | Download |
---|---|---|
RepeatMasker track | BED file for the RepeatMasker track | Download |
dbSNP131 (SNV) | dbSNP (Build ID: 131) database for SNVs in Genome Variation Format (GVF) | Download |
dbSNP131 (INS) | dbSNP (Build ID: 131) database for insertions in GVF | Download |
dbSNP131 (DEL) | dbSNP (Build ID: 131) database for deletions in GVF | Download |
RefSeq genes | A RefSeq gene model database in gSearch Format (GSF) | Download |
We provide a sample training variant call file and a set of gold standard variants for it.
Name | Description | Download |
---|---|---|
12877_ill_hg19.gvf | An example training variant call file in GVF for NA12877 (an individual of CEPH/Utah Pedigree 1463) prepared using the Illumina CASAVA pipeline | Download |
12877_ill_hg19_gs.gvf | An example set of gold standard variants for NA12877, consisting of the variants concordantly called by both Complete Genomics and Illumina platforms | Download |
12877_ill.gvf.filter | The logistic regression filter built using the training and gold standard variant call files in GVF for NA12877 | Download |
12877_ill.gvf.summary | Summary of the logistic regression filter built using the training and gold standard variant call files in GVF for NA12877 | Download |
...
chr1 VCF SNV 28863 28863 9 + . ID=36;Reference_seq=C;Variant_seq=A;Genotype=heterozygous;gene_component_detail=Ensembl|WASH7P|ENST00000423562|intron,Ensembl|WASH7P|ENST00000438504|intron,Ensembl|WASH7P|ENST00000488147|intron,Ensembl|WASH7P|ENST00000538476|intron,Ensembl|WASH7P|ENST00000430492|intron,UCSC|WASH7P|uc001aah.4|intron,UCSC|WASH7P|uc009vir.3|intron,UCSC|WASH7P|uc009viq.3|intron,UCSC|WASH7P|uc001aac.4|intron,UCSC|WASH7P|uc009viv.2|intron,UCSC|WASH7P|uc009viw.2|intron,UCSC|WASH7P|uc009vix.2|intron,UCSC|WASH7P|uc009viy.2|intron,UCSC|WASH7P|uc009viz.2|intron,UCSC|WASH7P|uc010nxs.1|intron,UCSC|WASH7P|uc009vjb.1|intron,UCSC|WASH7P|uc009vje.2|intron,UCSC|WASH7P|uc009vjf.2|intron,RefSeq|WASH7P|NR_024540|intron;gene_component_overlap=1;
chr1 VCF SNV 30923 30923 56 + . ID=37;Reference_seq=G;Variant_seq=T;Genotype=homozygous;gene_component_detail=Ensembl|MIR1302-11|ENST00000473358|intron,Ensembl|MIR1302-11|ENST00000469289|intron;gene_component_overlap=1;
chr1 VCF SNV 49298 49298 68 + . ID=39;Reference_seq=T;Variant_seq=C;Genotype=homozygous;
chr1 VCF SNV 51459 51459 5 + . ID=40;Reference_seq=G;Variant_seq=A;Genotype=heterozygous;
chr1 VCF SNV 51476 51476 5 + . ID=41;Reference_seq=T;Variant_seq=C;Genotype=heterozygous;
chr1 VCF SNV 51928 51928 24 + . ID=42;Reference_seq=G;Variant_seq=A;Genotype=heterozygous;
chr1 VCF SNV 52238 52238 106 + . ID=43;Reference_seq=T;Variant_seq=G;Genotype=homozygous;
chr1 VCF SNV 54676 54676 19 + . ID=46;Reference_seq=C;Variant_seq=T;Genotype=heterozygous;
chr1 VCF SNV 54708 54708 16 + . ID=47;Reference_seq=G;Variant_seq=C;Genotype=heterozygous;
chr1 VCF SNV 54716 54716 18 + . ID=48;Reference_seq=C;Variant_seq=T;Genotype=heterozygous;
chr1 VCF SNV 54844 54844 19 + . ID=49;Reference_seq=G;Variant_seq=A;Genotype=homozygous;
chr1 VCF SNV 55164 55164 102 + . ID=50;Reference_seq=C;Variant_seq=A;Genotype=homozygous;
chr1 VCF SNV 58211 58211 26 + . ID=51;Reference_seq=A;Variant_seq=G;Genotype=homozygous;
chr1 VCF SNV 61442 61442 36 + . ID=53;Reference_seq=A;Variant_seq=G;Genotype=homozygous;
...
...
chr1 VCF SNV 52238 52238 106 + . ID=43;Reference_seq=T;Variant_seq=G;Genotype=homozygous;
chr1 VCF SNV 55164 55164 102 + . ID=50;Reference_seq=C;Variant_seq=A;Genotype=homozygous;
chr1 VCF SNV 58211 58211 26 + . ID=51;Reference_seq=A;Variant_seq=G;Genotype=homozygous;
chr1 VCF SNV 61442 61442 36 + . ID=53;Reference_seq=A;Variant_seq=G;Genotype=homozygous;
chr1 VCF SNV 69511 69511 45 + . ID=60;Reference_seq=A;Variant_seq=G;Genotype=homozygous;gene_component_detail=Ensembl|OR4F5|ENST00000335137|CDS,CCDS|OR4F5|CCDS30547.1|CDS,UCSC|OR4F5|uc001aal.1|CDS,RefSeq|OR4F5|NM_001005484|CDS;gene_component_overlap=1;
chr1 VCF SNV 128798 128798 120 + . ID=112;Reference_seq=C;Variant_seq=T;Genotype=homozygous;gene_component_detail=Ensembl|RP11-34P13.7|ENST00000477740|intron,Ensembl|RP11-34P13.7|ENST00000471248|intron;gene_component_overlap=1;
chr1 VCF SNV 548491 548491 38 + . ID=218;Reference_seq=C;Variant_seq=T;Genotype=homozygous;gene_component_detail=Ensembl|RP5-857K21.4|ENST00000440200|intron;gene_component_overlap=1.0000;
chr1 VCF deletion 567240 567240 770 + . ID=24;Reference_seq=G;Variant_seq=-;Genotype=homozygous;gene_component_detail=Ensembl|RP5-857K21.4|ENST00000440200|intron,Ensembl|RP5-857K21.6|ENST00000414273|CDS;gene_component_overlap=1;
chr1 VCF deletion 688055 688055 90 + . ID=27;Reference_seq=A;Variant_seq=-;Genotype=homozygous;
chr1 VCF SNV 704367 704367 45 + . ID=326;Reference_seq=T;Variant_seq=C;Genotype=homozygous;gene_component_detail=Ensembl|RP11-206L10.2|ENST00000428504|intron,UCSC|LOC100288069|uc001abo.3|intron,RefSeq|LOC100288069|NR_033908|intron;gene_component_overlap=1;
...
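The GVF records shown above follow a nine-column layout (sequence, source, type, start, end, score, strand, phase, attributes). As a reading aid, here is a minimal Python sketch of how one such record could be split into fields. It is not part of LR Filter; the function name `parse_gvf_record` is hypothetical, and the sketch assumes whitespace-separated columns and simple `key=value;` attributes, as in these excerpts.

```python
def parse_gvf_record(line):
    """Parse one GVF record into a dict; attributes become a nested dict."""
    # Assumption: columns are separated by whitespace and the ninth column
    # (the attribute string) contains no whitespace, as in the excerpts.
    fields = line.strip().split(None, 8)
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    seqid, source, variant_type, start, end, score, strand, phase, attr_text = fields

    attributes = {}
    for item in attr_text.split(";"):
        if "=" in item:  # skips the empty item after the trailing ';'
            key, value = item.split("=", 1)
            attributes[key] = value

    return {
        "seqid": seqid,
        "source": source,
        "type": variant_type,
        "start": int(start),
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


record = parse_gvf_record(
    "chr1 VCF SNV 49298 49298 68 + . "
    "ID=39;Reference_seq=T;Variant_seq=C;Genotype=homozygous;"
)
print(record["start"], record["score"], record["attributes"]["Genotype"])
```

This simple split on `;` is only suitable for records whose attribute values contain no `;` themselves.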
.SNV.hetero
(Intercept) quality_score rmsk dbsnp gene_model
-8.36093490 1.80831214 -0.07303558 1.11260619 10.84397095
SNV_typeA->G SNV_typeA->T SNV_typeC->A SNV_typeC->G SNV_typeC->T
0.26527300 -0.22657708 0.07260413 0.13455575 0.36617201
SNV_typeG->A SNV_typeG->C SNV_typeG->T SNV_typeT->A SNV_typeT->C
0.37280697 0.13363085 0.07604954 -0.26103353 0.24608697
SNV_typeT->G
-0.00743212
...
12877_ill_hg19.gvf.SNV.hetero
Call:
glm(formula = yval ~ ., family = binomial(), data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.0565 0.0016 0.1642 0.4581 3.6338
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.360935 0.022054 -379.105 < 2e-16 ***
quality_score 1.808312 0.003781 478.238 < 2e-16 ***
rmsk -0.073036 0.005123 -14.255 < 2e-16 ***
dbsnp 1.112606 0.006768 164.390 < 2e-16 ***
gene_model 10.843971 0.375903 28.848 < 2e-16 ***
SNV_typeA->G 0.265273 0.012985 20.429 < 2e-16 ***
SNV_typeA->T -0.226577 0.015798 -14.342 < 2e-16 ***
SNV_typeC->A 0.072604 0.015710 4.622 3.81e-06 ***
SNV_typeC->G 0.134556 0.016282 8.264 < 2e-16 ***
SNV_typeC->T 0.366172 0.012835 28.529 < 2e-16 ***
SNV_typeG->A 0.372807 0.012845 29.023 < 2e-16 ***
SNV_typeG->C 0.133631 0.016261 8.218 < 2e-16 ***
SNV_typeG->T 0.076050 0.015733 4.834 1.34e-06 ***
SNV_typeT->A -0.261034 0.015725 -16.599 < 2e-16 ***
SNV_typeT->C 0.246087 0.012965 18.980 < 2e-16 ***
SNV_typeT->G -0.007432 0.015995 -0.465 0.642
--- ...
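The summary above is the output of a binomial `glm` fit, so the fitted model maps a linear combination of features to a probability through the logistic function. The Python sketch below illustrates that mapping using the heterozygous-SNV coefficients listed above. It is an illustration only: how LR Filter encodes or scales each feature (for example `quality_score`) is not described here, so the feature values in the example are hypothetical and the result is not expected to reproduce the `lr_score` values written by the tool. The A->C substitution has no coefficient of its own because it is absent from the list, which is consistent with it being the reference level of the `SNV_type` factor.

```python
import math

# Coefficients copied from the .SNV.hetero filter shown above.
COEFFICIENTS = {
    "(Intercept)": -8.36093490,
    "quality_score": 1.80831214,
    "rmsk": -0.07303558,
    "dbsnp": 1.11260619,
    "gene_model": 10.84397095,
    "SNV_typeA->G": 0.26527300,
    "SNV_typeA->T": -0.22657708,
    "SNV_typeC->A": 0.07260413,
    "SNV_typeC->G": 0.13455575,
    "SNV_typeC->T": 0.36617201,
    "SNV_typeG->A": 0.37280697,
    "SNV_typeG->C": 0.13363085,
    "SNV_typeG->T": 0.07604954,
    "SNV_typeT->A": -0.26103353,
    "SNV_typeT->C": 0.24608697,
    "SNV_typeT->G": -0.00743212,
}


def logistic_probability(features, coefficients=COEFFICIENTS):
    """Return the logistic of intercept + sum(coefficient * feature value)."""
    z = coefficients["(Intercept)"]
    for name, value in features.items():
        z += coefficients[name] * value
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical, already-encoded feature values for one heterozygous C->T SNV.
example = {"quality_score": 4.0, "rmsk": 1.0, "dbsnp": 1.0, "SNV_typeC->T": 1.0}
print(logistic_probability(example))
```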
We provide a sample variant call file for variant prioritization, together with the logistic regression filter built from the training variant call file and its gold standard variants described above. The sample variant file annotated with filtering scores is also provided.
Name | Description | Download |
---|---|---|
12878_ill_hg19.gvf | An example variant call file in GVF for NA12878 (an individual of the CEPH/Utah pedigree) prepared using the Illumina CASAVA pipeline | Download |
12877_ill.gvf.filter | The logistic regression filter built using the training and gold standard variant call files in GVF for NA12877 | Download |
12878_ill_hg19.filtered_lr.gvf | The example variant call file in GVF for NA12878 with filtering scores predicted by the logistic regression filter above | Download |
...
chr1 VCF SNV 20250 20250 11 + . ID=22;Reference_seq=T;Variant_seq=C;Genotype=heterozygous;
chr1 VCF SNV 28376 28376 49 + . ID=24;Reference_seq=G;Variant_seq=A;Genotype=homozygous;
chr1 VCF SNV 28563 28563 53 + . ID=25;Reference_seq=A;Variant_seq=G;Genotype=homozygous;
chr1 VCF SNV 28835 28835 10 + . ID=26;Reference_seq=A;Variant_seq=G;Genotype=heterozygous;
chr1 VCF SNV 30923 30923 39 + . ID=27;Reference_seq=G;Variant_seq=T;Genotype=homozygous;
chr1 VCF SNV 31029 31029 5 + . ID=28;Reference_seq=G;Variant_seq=A;Genotype=heterozygous;
chr1 VCF SNV 52238 52238 53 + . ID=31;Reference_seq=T;Variant_seq=G;Genotype=homozygous;
chr1 VCF SNV 54586 54586 13 + . ID=32;Reference_seq=T;Variant_seq=C;Genotype=heterozygous;
chr1 VCF SNV 54676 54676 129 + . ID=33;Reference_seq=C;Variant_seq=T;Genotype=heterozygous;
chr1 VCF SNV 54708 54708 14 + . ID=34;Reference_seq=G;Variant_seq=C;Genotype=heterozygous;
...
.SNV.hetero
(Intercept) quality_score rmsk dbsnp gene_model
-8.36093490 1.80831214 -0.07303558 1.11260619 10.84397095
SNV_typeA->G SNV_typeA->T SNV_typeC->A SNV_typeC->G SNV_typeC->T
0.26527300 -0.22657708 0.07260413 0.13455575 0.36617201
SNV_typeG->A SNV_typeG->C SNV_typeG->T SNV_typeT->A SNV_typeT->C
0.37280697 0.13363085 0.07604954 -0.26103353 0.24608697
SNV_typeT->G
-0.00743212
...
...
chr1 VCF SNV 20250 20250 11 + . ID=22;Reference_seq=T;Variant_seq=C;Genotype=heterozygous;repeat_with_rmsk_tag_detail=L3;dbsnp_tag_detail=ID=960;Reference_seq=T;Variant_seq=C;,ID=961;Reference_seq=T;Variant_seq=A;,ID=962;Reference_seq=T;Variant_seq=G;;T->C_SNV;lr_score=1.08205372613e-06;
chr1 VCF SNV 28376 28376 49 + . ID=24;Reference_seq=G;Variant_seq=A;Genotype=homozygous;dbsnp_tag_detail=ID=1724;Reference_seq=G;Variant_seq=A;,ID=1725;Reference_seq=G;Variant_seq=C;,ID=1726;Reference_seq=G;Variant_seq=T;,ID=1727;Reference_seq=G;Variant_seq=A;;G->A_SNV;lr_score=5.07639376526e-12;
chr1 VCF SNV 28563 28563 53 + . ID=25;Reference_seq=A;Variant_seq=G;Genotype=homozygous;dbsnp_tag_detail=ID=1751;Reference_seq=A;Variant_seq=G;,ID=1752;Reference_seq=A;Variant_seq=C;,ID=1753;Reference_seq=A;Variant_seq=T;;A->G_SNV;lr_score=4.09326257894e-12;
chr1 VCF SNV 28835 28835 10 + . ID=26;Reference_seq=A;Variant_seq=G;Genotype=heterozygous;dbsnp_tag_detail=ID=1784;Reference_seq=A;Variant_seq=G;,ID=1785;Reference_seq=A;Variant_seq=G;,ID=1786;Reference_seq=A;Variant_seq=G;;A->G_SNV;lr_score=9.1658815874e-07;
chr1 VCF SNV 30923 30923 39 + . ID=27;Reference_seq=G;Variant_seq=T;Genotype=homozygous;repeat_with_rmsk_tag_detail=(TC)n;dbsnp_tag_detail=ID=1965;Reference_seq=G;Variant_seq=A;,ID=1966;Reference_seq=G;Variant_seq=C;;G->T_SNV;lr_score=1.0523479336e-11;
chr1 VCF SNV 31029 31029 5 + . ID=28;Reference_seq=G;Variant_seq=A;Genotype=heterozygous;repeat_with_rmsk_tag_detail=MLT1A;dbsnp_tag_detail=ID=1972;Reference_seq=G;Variant_seq=A;,ID=1973;Reference_seq=G;Variant_seq=C;,ID=1974;Reference_seq=G;Variant_seq=T;;G->A_SNV;lr_score=3.10132320581e-06;
chr1 VCF SNV 52238 52238 53 + . ID=31;Reference_seq=T;Variant_seq=G;Genotype=homozygous;repeat_with_rmsk_tag_detail=AT_rich;dbsnp_tag_detail=ID=2967;Reference_seq=T;Variant_seq=G;;T->G_SNV;lr_score=3.92584308003e-12;
chr1 VCF SNV 54586 54586 13 + . ID=32;Reference_seq=T;Variant_seq=C;Genotype=heterozygous;repeat_with_rmsk_tag_detail=L2;dbsnp_tag_detail=ID=3012;Reference_seq=T;Variant_seq=C;;T->C_SNV;lr_score=7.9993499285e-07;
chr1 VCF SNV 54676 54676 129 + . ID=33;Reference_seq=C;Variant_seq=T;Genotype=heterozygous;repeat_with_rmsk_tag_detail=L2;dbsnp_tag_detail=ID=3013;Reference_seq=C;Variant_seq=T;;C->T_SNV;lr_score=8.74542353761e-09;
chr1 VCF SNV 54708 54708 14 + . ID=34;Reference_seq=G;Variant_seq=C;Genotype=heterozygous;repeat_with_rmsk_tag_detail=L2;G->C_SNV;lr_score=1.86216375082e-06;
...
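For prioritization, the `lr_score` attribute at the end of each scored record is the value of interest. Because `dbsnp_tag_detail` embeds its own `;` and `=` characters, splitting the attribute column on `;` is unreliable for these records; a regular expression that targets `lr_score=` directly is a simpler option. The Python sketch below is an illustration, not part of LR Filter, and the helper names `lr_score` and `rank_by_lr_score` are hypothetical. Whether a higher or lower score should rank first is not stated on this page, so the sort direction is left as a parameter.

```python
import re

# Matches the numeric value of the lr_score attribute, including
# scientific notation such as 1.86216375082e-06.
LR_SCORE_PATTERN = re.compile(r"lr_score=([0-9.eE+-]+)")


def lr_score(record):
    """Return the lr_score of one GVF record as a float, or None if absent."""
    match = LR_SCORE_PATTERN.search(record)
    return float(match.group(1)) if match else None


def rank_by_lr_score(records, descending=True):
    """Return (score, record) pairs sorted by score; unscored records are dropped."""
    scored = [(lr_score(r), r) for r in records]
    scored = [(s, r) for s, r in scored if s is not None]
    return sorted(scored, key=lambda pair: pair[0], reverse=descending)


records = [
    "chr1 VCF SNV 52238 52238 53 + . ID=31;Reference_seq=T;Variant_seq=G;"
    "Genotype=homozygous;repeat_with_rmsk_tag_detail=AT_rich;"
    "dbsnp_tag_detail=ID=2967;Reference_seq=T;Variant_seq=G;;T->G_SNV;"
    "lr_score=3.92584308003e-12;",
    "chr1 VCF SNV 54708 54708 14 + . ID=34;Reference_seq=G;Variant_seq=C;"
    "Genotype=heterozygous;repeat_with_rmsk_tag_detail=L2;G->C_SNV;"
    "lr_score=1.86216375082e-06;",
]
for score, record in rank_by_lr_score(records):
    print(score, record.split()[3])
```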