Class CollectIndependentReplicateMetrics


  • @DocumentedFeature
    @ExperimentalFeature
    public class CollectIndependentReplicateMetrics
    extends CommandLineProgram
    A command-line program (CLP) that, given a BAM and a VCF with genotypes for the same sample, estimates the rate of independent replication of reads within the BAM. That is, it estimates the fraction of reads that look like duplicates (in the MarkDuplicates sense of the word) but are actually independent observations of the data. When Unique Molecular Identifiers (UMIs) are present, it also collects metrics on the utility of the UMIs for increasing coverage.

    The estimation is based on duplicate-sets of size 2 and 3 and gives separate estimates from each. The assumption is that the duplication rate (biological or otherwise) is independent of the duplicate-set size. A significant difference between the two rates may be an indication that this assumption is incorrect.

    The duplicate sets are found using the mate-cigar tag (MC), which is added by MergeBamAlignment or FixMateInformation. This program will not work without the MC tag.

    Explanation of the calculation behind the estimation can be found in the IndependentReplicateMetric class.

    The calculation assumes a diploid organism (more accurately, it assumes that only two alleles can appear at a HET site and that these two alleles appear with equal probability). It requires as input a VCF with genotypes for the sample in question. NOTE: This class is very much in alpha stage and still under heavy development (feel free to join!).
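
The equal-probability assumption above suggests a simple back-of-the-envelope estimate for duplicate sets of size 2, sketched here. If two reads in a set are true duplicates they show the same allele at a HET site; if they are independent observations they show different alleles half of the time, so the independent fraction is roughly twice the fraction of sets whose reads disagree. This is only an illustration under those assumptions (it ignores sequencing error), not the calculation in IndependentReplicateMetric; the class and method names are hypothetical.

```java
// Illustrative sketch only; names are hypothetical and not part of Picard.
final class DoubletonReplicationSketch {

    /**
     * Estimates the fraction of size-2 duplicate sets that are independent observations.
     *
     * @param discordantSets number of size-2 sets whose two reads show different alleles at a HET site
     * @param totalSets      number of size-2 sets overlapping a HET site
     */
    static double estimateIndependentRate(long discordantSets, long totalSets) {
        if (totalSets <= 0 || discordantSets < 0 || discordantSets > totalSets) {
            throw new IllegalArgumentException("counts must satisfy 0 <= discordantSets <= totalSets, totalSets > 0");
        }
        // Independent pairs disagree with probability 1/2, so P(disagree) = rate / 2.
        double rate = 2.0 * discordantSets / totalSets;
        // Sampling noise can push the raw estimate above 1; cap it at 1.
        return Math.min(1.0, rate);
    }

    public static void main(String[] args) {
        // Hypothetical counts: 50 discordant sets out of 1000.
        System.out.println(estimateIndependentRate(50, 1000));
    }
}
```

An analogous estimate from sets of size 3 would need to account for mixtures of duplicated and independent reads, which is why the two estimates are reported separately and compared.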

    • Field Detail

      • INPUT

        @Argument(shortName="I",
                  doc="Input (indexed) BAM/CRAM file.")
        public File INPUT
      • OUTPUT

        @Argument(shortName="O",
                  doc="Write metrics to this file")
        public File OUTPUT
      • MATRIX_OUTPUT

        @Argument(shortName="MO",
                  doc="Write the confusion matrix (of UMIs) to this file",
                  optional=true)
        public File MATRIX_OUTPUT
      • VCF

        @Argument(shortName="V",
                  doc="Input VCF file")
        public File VCF
      • MINIMUM_GQ

        @Argument(shortName="GQ",
                  doc="minimal value for the GQ field in the VCF to use variant site.",
                  optional=true)
        public Integer MINIMUM_GQ
      • MINIMUM_MQ

        @Argument(shortName="MQ",
                  doc="minimal value for the mapping quality of the reads to be used in the estimation.",
                  optional=true)
        public Integer MINIMUM_MQ
      • MINIMUM_BQ

        @Argument(shortName="BQ",
                  doc="minimal value for the base quality of a base to be used in the estimation.",
                  optional=true)
        public Integer MINIMUM_BQ
      • SAMPLE

        @Argument(shortName="ALIAS",
                  doc="Name of sample to look at in VCF. Can be omitted if VCF contains only one sample.",
                  optional=true)
        public String SAMPLE
      • STOP_AFTER

        @Argument(doc="Number of sets to examine before stopping.",
                  optional=true)
        public Integer STOP_AFTER
      • BARCODE_TAG

        @Argument(doc="Barcode SAM tag.",
                  optional=true)
        public String BARCODE_TAG
      • BARCODE_BQ

        @Argument(doc="Barcode Quality SAM tag.",
                  optional=true)
        public String BARCODE_BQ
      • MINIMUM_BARCODE_BQ

        @Argument(shortName="MBQ",
                  doc="minimal value for the base quality of all the bases in a molecular barcode, for it to be used.",
                  optional=true)
        public Integer MINIMUM_BARCODE_BQ
      • FILTER_UNPAIRED_READS

        @Argument(shortName="FUR",
                  doc="Whether to filter unpaired reads from the input.",
                  optional=true)
        public boolean FILTER_UNPAIRED_READS
      • PROGRESS_STEP_INTERVAL

        @Argument(fullName="PROGRESS_STEP_INTERVAL",
                  doc="The interval between which progress will be displayed.",
                  optional=true)
        public int PROGRESS_STEP_INTERVAL
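
As a rough illustration of how the fields above map onto a command line, the sketch below assembles an argument list in Picard's NAME=value style from the required INPUT, VCF and OUTPUT arguments plus the optional MINIMUM_GQ. The helper class, the file names and the threshold value are hypothetical; only the argument names come from this page.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; this helper is hypothetical and not part of Picard.
final class ReplicateMetricsArgsSketch {

    /** Builds a NAME=value argument list; minimumGq may be null to leave the default in place. */
    static List<String> buildArgs(String bam, String vcf, String metricsOut, Integer minimumGq) {
        List<String> args = new ArrayList<>();
        args.add("CollectIndependentReplicateMetrics"); // tool name
        args.add("INPUT=" + bam);                       // indexed BAM/CRAM carrying MC tags
        args.add("VCF=" + vcf);                         // genotypes for the same sample
        args.add("OUTPUT=" + metricsOut);               // metrics file to write
        if (minimumGq != null) {
            args.add("MINIMUM_GQ=" + minimumGq);        // optional GQ threshold
        }
        return args;
    }

    public static void main(String[] ignored) {
        // Hypothetical file names and threshold.
        List<String> args = buildArgs("sample.bam", "sample.vcf", "replicates.metrics", 90);
        System.out.println(String.join(" ", args));
    }
}
```

The printed line is what would follow `java -jar picard.jar` on a shell command line; optional arguments that are omitted keep their defaults.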
    • Constructor Detail

      • CollectIndependentReplicateMetrics

        public CollectIndependentReplicateMetrics()
    • Method Detail

      • doWork

        protected int doWork()
        Description copied from class: CommandLineProgram
        Do the work after the command line has been parsed. A RuntimeException may be thrown by this method and is reported appropriately.
        Specified by:
        doWork in class CommandLineProgram
        Returns:
        program exit status.