DNM Filter is a Python script that filters false-positive de novo mutations (DNMs) using information from family members outside the trio: the siblings and offspring of the case.
Mendelian inheritance errors (MIEs) can be identified by comparing the variant call files of a trio. Briefly, a heterozygous variant in the case is designated an MIE if neither parent carries the same variant. An MIE can be due to a germline DNM or to a sequencing error. Usually, more than 95% of the MIEs discovered by simply comparing the variant call files of a trio, prepared by next-generation sequencing, are due to sequencing errors, i.e., they are false-positive DNMs. A number of methods exist for distinguishing true DNMs from false-positive DNMs, based on joint variant calling, machine learning, and ensemble genotyping. In any case, false-positive DNMs can be filtered further if family members other than the trio are available.
DNM Filter requires variant call files from the siblings and offspring of the case. It uses gSearch to compare MIEs (the DNM candidates) with the variants found in siblings and offspring. First, a DNM candidate from a trio is filtered out if any sibling of the case carries the same variant. Given the widely accepted human germline mutation rate, such a candidate is highly likely to be due to a sequencing error. Second, a DNM candidate found in any child (offspring) of the case is regarded as a highly probable true positive.
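The two rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in dnmfilter.py: the `Variant` tuple and the `classify_candidates` function are hypothetical names, variants are compared by chromosome, position, reference, and alternate allele, and the real script delegates the comparison to gSearch. The category names follow the output file names used in the examples further down, and a candidate shared with a sibling is treated as low confidence even if an offspring also carries it.

```python
from typing import Dict, Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key; not DNM Filter's internal representation."""
    chrom: str
    pos: int
    ref: str
    alt: str


def classify_candidates(
    candidates: Iterable[Variant],
    sibling_variants: Iterable[Variant],
    offspring_variants: Iterable[Variant],
) -> Dict[str, List[Variant]]:
    """Sort DNM candidates into low, high, and intermediate confidence."""
    siblings = set(sibling_variants)
    offspring = set(offspring_variants)
    result: Dict[str, List[Variant]] = {
        "low_confidence": [],
        "high_confidence": [],
        "intermediate": [],
    }
    for variant in candidates:
        if variant in siblings:
            # Rule 1: shared with a sibling, so probably a sequencing error.
            result["low_confidence"].append(variant)
        elif variant in offspring:
            # Rule 2: transmitted to a child, so probably a true DNM.
            result["high_confidence"].append(variant)
        else:
            result["intermediate"].append(variant)
    return result
```

Checking siblings first keeps the high-confidence set restricted to candidates that are shared with offspring but not with siblings.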
This software application was developed and tested on a 64-bit Linux (CentOS 6.5) environment.
It requires gSearch and Python to run.
Name | Description | Download |
---|---|---|
dnmfilter.py | A Python script for filtering false-positive DNMs and prioritizing highly likely DNMs using information from family members outside the trio | Download |
Name | Usage | Description |
---|---|---|
Input DNM call file | -i <input_variant_call_file> | A list of DNMs in GVF or VCF (format is determined from the file name) |
Sibling/Offspring list | -f <family_variant_call_file_list> | A text file listing variant call files from siblings or offspring |
Output DNM call file prefix | -o <output_file_prefix> | User-defined prefix of output files |
Path to gSearch executable | -d </path/to/gsearch> | (optional) Path to the gsearch program |
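For readers who want to wrap or script the tool, the options in the table can be mirrored with `argparse`. The sketch below is not taken from dnmfilter.py. It assumes that -i, -f, and -o are mandatory (only -d is marked optional in the table) and that "determined from file name" means the file extension. `build_parser`, `detect_format`, and the `dest` names are hypothetical.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the four options listed in the table."""
    parser = argparse.ArgumentParser(prog="dnmfilter.py")
    parser.add_argument("-i", dest="input_file", required=True,
                        help="list of DNMs in GVF or VCF")
    parser.add_argument("-f", dest="family_list", required=True,
                        help="text file listing sibling/offspring call files")
    parser.add_argument("-o", dest="output_prefix", required=True,
                        help="prefix of the output files")
    parser.add_argument("-d", dest="gsearch_dir", default=None,
                        help="(optional) path to the gsearch program")
    return parser


def detect_format(filename: str) -> str:
    """Guess the format from the extension (an assumption, see above)."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in (".vcf", ".gvf"):
        return ext[1:].upper()
    raise ValueError(f"cannot determine format from file name: {filename!r}")
```

With this sketch, `build_parser().parse_args(["-i", "input.vcf", "-f", "exfam_all", "-o", "filtered"])` leaves `gsearch_dir` as `None`, which matches -d being optional.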
Sample files for DNM Filter
Name | Description | Download |
---|---|---|
input.vcf | An example of input DNM call file | Download |
sib1.vcf | An example of family variant call file from sibling #1 | Download |
sib2.vcf | An example of family variant call file from sibling #2 | Download |
off1.vcf | An example of family variant call file from offspring #1 | Download |
off2.vcf | An example of family variant call file from offspring #2 | Download |
off3.vcf | An example of family variant call file from offspring #3 | Download |
./dnmfilter.py -i input.vcf -f exfam_sib_only -o filtered
Family variant call file list (exfam_sib_only):
S sib1.vcf S sib2.vcf
Outputs filtered.low_confidence.vcf (DNM candidates shared with siblings) and filtered.intermediate.vcf (DNM candidates not shared with siblings).
./dnmfilter.py -i input.vcf -f exfam_off_only -o filtered
Family variant call file list (exfam_off_only):
O off1.vcf O off2.vcf O off3.vcf
Outputs filtered.high_confidence.vcf (DNM candidates shared with offspring) and filtered.intermediate.vcf (DNM candidates not shared with offspring).
./dnmfilter.py -i input.vcf -f exfam_all -o filtered
Family variant call file list (exfam_all):
S sib1.vcf S sib2.vcf O off1.vcf O off2.vcf O off3.vcf
Outputs filtered.high_confidence.vcf (DNM candidates shared with offspring but not with siblings), filtered.low_confidence.vcf (DNM candidates shared with siblings), and filtered.intermediate.vcf (all other DNM candidates).
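The three examples follow one pattern: each entry in the family list is a tag (S for sibling, O for offspring) followed by a file name, and the set of output files depends on which tags are present. The sketch below reproduces that pattern. It is an illustration, not the script's own parser: `parse_family_list` and `expected_outputs` are hypothetical names, entries are read as whitespace-separated tag/file pairs (so file names containing spaces are not handled), and the `.vcf` suffix is taken from the examples.

```python
from typing import Dict, List


def parse_family_list(text: str) -> Dict[str, List[str]]:
    """Split a family list into sibling ("S") and offspring ("O") files."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected pairs of <tag> <file>")
    members: Dict[str, List[str]] = {"S": [], "O": []}
    for tag, path in zip(tokens[0::2], tokens[1::2]):
        if tag not in members:
            raise ValueError(f"unknown tag: {tag!r}")
        members[tag].append(path)
    return members


def expected_outputs(prefix: str, members: Dict[str, List[str]]) -> List[str]:
    """List the output files the examples report for a given family list."""
    names = []
    if members["O"]:
        # Offspring present: candidates they share are high confidence.
        names.append(f"{prefix}.high_confidence.vcf")
    if members["S"]:
        # Siblings present: candidates they share are low confidence.
        names.append(f"{prefix}.low_confidence.vcf")
    # Remaining candidates are written in all three examples.
    names.append(f"{prefix}.intermediate.vcf")
    return names
```

For the exfam_sib_only list, `expected_outputs("filtered", parse_family_list("S sib1.vcf S sib2.vcf"))` yields the low-confidence and intermediate file names, as in the first example.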
If gsearch is not found through the PATH environment variable, give its directory with -d:
./dnmfilter.py -d /home/user1/myprogs -i input.vcf -f exfam_all -o filtered
(when gsearch is located in the directory /home/user1/myprogs)
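To check beforehand whether -d is needed, the lookup can be approximated with the standard library. This is a sketch under assumptions, not what dnmfilter.py does internally: `find_gsearch` is a hypothetical helper, and the executable is assumed to be named gsearch, as in the example above.

```python
import shutil
from typing import Optional


def find_gsearch(directory: Optional[str] = None) -> Optional[str]:
    """Return the full path to gsearch, or None if it cannot be found.

    With no directory, the PATH environment variable is searched;
    otherwise only the given directory is searched.
    """
    return shutil.which("gsearch", path=directory)
```

If `find_gsearch()` returns `None` but `find_gsearch("/home/user1/myprogs")` returns a path, the command needs `-d /home/user1/myprogs`.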