Class MarkDuplicatesWithMateCigarIterator

  • All Implemented Interfaces:
    htsjdk.samtools.SAMRecordIterator, htsjdk.samtools.util.CloseableIterator<htsjdk.samtools.SAMRecord>, Closeable, AutoCloseable, Iterator<htsjdk.samtools.SAMRecord>

    public class MarkDuplicatesWithMateCigarIterator
    extends Object
    implements htsjdk.samtools.SAMRecordIterator
    Iterates over a coordinate-sorted stream of SAM records (supplied as an iterator) and either marks or removes duplicates, depending on how the iterator is configured. This class relies on the coordinate sort order as well as the mate cigar (MC) optional SAM tag.
    • Constructor Detail

      • MarkDuplicatesWithMateCigarIterator

        public MarkDuplicatesWithMateCigarIterator(htsjdk.samtools.SAMFileHeader header,
                                                   htsjdk.samtools.util.CloseableIterator<htsjdk.samtools.SAMRecord> iterator,
                                                   OpticalDuplicateFinder opticalDuplicateFinder,
                                                   htsjdk.samtools.DuplicateScoringStrategy.ScoringStrategy duplicateScoringStrategy,
                                                   int toMarkQueueMinimumDistance,
                                                   boolean removeDuplicates,
                                                   boolean skipPairsWithNoMateCigar,
                                                   int maxRecordsInRam,
                                                   int blockSize,
                                                   List<File> tmpDirs)
                                            throws PicardException
        Initializes the mark duplicates iterator.
        Parameters:
        header - the SAM header
        iterator - an iterator over the SAM records to consider
        opticalDuplicateFinder - the algorithm for optical duplicate detection
        duplicateScoringStrategy - the scoring strategy for choosing duplicates. This cannot be SUM_OF_BASE_QUALITIES.
        toMarkQueueMinimumDistance - the minimum distance over which records are buffered before they are marked
        removeDuplicates - true to remove duplicates, false to mark duplicates
        skipPairsWithNoMateCigar - true to not return mapped pairs with no mate cigar, false otherwise
        maxRecordsInRam - the maximum number of records to hold in memory before spilling records to disk
        blockSize - the size of the blocks in the underlying buffer/queue
        tmpDirs - the temporary directories to use if we spill records to disk
        Throws:
        PicardException - if the inputs are not in coordinate sort order
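The difference between removeDuplicates = true and removeDuplicates = false can be illustrated with a minimal, standard-library-only sketch. It is not Picard's implementation: DemoRecord, MarkOrRemoveIterator, and MarkOrRemoveDemo are hypothetical names, and the sketch assumes that duplicate detection has already set a flag on each record. It only shows how one flag can switch an iterator between returning flagged duplicates and dropping them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a SAM record; NOT htsjdk.samtools.SAMRecord.
final class DemoRecord {
    final String name;
    final boolean duplicate; // assumption: duplicate detection has already run

    DemoRecord(String name, boolean duplicate) {
        this.name = name;
        this.duplicate = duplicate;
    }
}

// Illustrative iterator: drops duplicates when removeDuplicates is true,
// otherwise returns every record with its duplicate flag left in place.
final class MarkOrRemoveIterator implements Iterator<DemoRecord> {
    private final Iterator<DemoRecord> source;
    private final boolean removeDuplicates;
    private DemoRecord pending; // next record to return, or null if none is buffered

    MarkOrRemoveIterator(Iterator<DemoRecord> source, boolean removeDuplicates) {
        this.source = source;
        this.removeDuplicates = removeDuplicates;
    }

    @Override
    public boolean hasNext() {
        while (pending == null && source.hasNext()) {
            DemoRecord candidate = source.next();
            if (removeDuplicates && candidate.duplicate) {
                continue; // remove mode: the duplicate is never returned
            }
            pending = candidate;
        }
        return pending != null;
    }

    @Override
    public DemoRecord next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        DemoRecord result = pending;
        pending = null;
        return result;
    }
}

final class MarkOrRemoveDemo {
    // Collects record names, appending "*" to records flagged as duplicates.
    static List<String> names(List<DemoRecord> input, boolean removeDuplicates) {
        List<String> out = new ArrayList<>();
        Iterator<DemoRecord> it = new MarkOrRemoveIterator(input.iterator(), removeDuplicates);
        while (it.hasNext()) {
            DemoRecord r = it.next();
            out.add(r.duplicate ? r.name + "*" : r.name);
        }
        return out;
    }

    public static void main(String[] args) {
        List<DemoRecord> input = Arrays.asList(
                new DemoRecord("r1", false),
                new DemoRecord("r2", true),
                new DemoRecord("r3", false));
        System.out.println(names(input, false)); // prints [r1, r2*, r3]
        System.out.println(names(input, true));  // prints [r1, r3]
    }
}
```

In mark mode the caller still sees every record and can inspect the flag; in remove mode the duplicate never leaves the iterator.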
    • Method Detail

      • logMemoryStats

        public void logMemoryStats(htsjdk.samtools.util.Log log)
      • assertSorted

        public htsjdk.samtools.SAMRecordIterator assertSorted(htsjdk.samtools.SAMFileHeader.SortOrder sortOrder)
        Establishes that records returned by this iterator are expected to be in the specified sort order. If this method has been called, then implementers must throw an IllegalStateException from next() when a record is read that violates the sort order. This method may be called multiple times over the course of an iteration, changing the expected sort order if desired: from the time it is called, the iterator validates whatever sort order is set, or stops validating if it is set to null or SAMFileHeader.SortOrder.unsorted. If this method is not called, then no validation of the iterated records is done.
        Specified by:
        assertSorted in interface htsjdk.samtools.SAMRecordIterator
        Parameters:
        sortOrder - The order in which records are expected to be returned
        Returns:
        This SAMRecordIterator
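The assertSorted contract can be sketched with a generic wrapper that uses only the Java standard library. This is an illustration rather than the htsjdk or Picard code: SortCheckingIterator and SortCheckingDemo are hypothetical names, and a java.util.Comparator stands in for SAMFileHeader.SortOrder. Restarting validation from the next record after each assertSorted call is a choice made for this sketch.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;

// Hypothetical generic wrapper; not part of htsjdk or Picard.
final class SortCheckingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private Comparator<T> expectedOrder; // null means "do not validate"
    private T previous;                  // last record returned while validating

    SortCheckingIterator(Iterator<T> source) {
        this.source = source;
    }

    // May be called repeatedly; passing null switches validation off.
    SortCheckingIterator<T> assertSorted(Comparator<T> order) {
        this.expectedOrder = order;
        this.previous = null; // sketch choice: validate from the next record onward
        return this;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        T current = source.next();
        if (expectedOrder != null && previous != null
                && expectedOrder.compare(previous, current) > 0) {
            throw new IllegalStateException(
                    "Record " + current + " violates the expected sort order after " + previous);
        }
        previous = current;
        return current;
    }
}

final class SortCheckingDemo {
    // Counts how many records are returned before a sort violation (or the end of input).
    static int countBeforeViolation(Iterator<Integer> it) {
        int returned = 0;
        try {
            while (it.hasNext()) {
                it.next();
                returned++;
            }
        } catch (IllegalStateException e) {
            // Stop at the first out-of-order record.
        }
        return returned;
    }

    public static void main(String[] args) {
        SortCheckingIterator<Integer> checked =
                new SortCheckingIterator<Integer>(Arrays.asList(1, 2, 5, 3).iterator())
                        .assertSorted(Comparator.<Integer>naturalOrder());
        System.out.println(countBeforeViolation(checked)); // prints 3

        SortCheckingIterator<Integer> unchecked =
                new SortCheckingIterator<Integer>(Arrays.asList(1, 2, 5, 3).iterator());
        System.out.println(countBeforeViolation(unchecked)); // prints 4
    }
}
```

With validation enabled, the out-of-order value 3 triggers the exception from next(); without a call to assertSorted, all four values pass through unchecked.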
      • hasNext

        public boolean hasNext()
        Specified by:
        hasNext in interface Iterator<htsjdk.samtools.SAMRecord>
      • remove

        public void remove()
        Specified by:
        remove in interface Iterator<htsjdk.samtools.SAMRecord>
      • close

        public void close()
        Specified by:
        close in interface AutoCloseable
        Specified by:
        close in interface Closeable
        Specified by:
        close in interface htsjdk.samtools.util.CloseableIterator<htsjdk.samtools.SAMRecord>
      • getNumRecordsWithNoMateCigar

        public long getNumRecordsWithNoMateCigar()
        Returns the number of records that had no mate cigar. Useful for statistics after the iterator has been exhausted and closed.
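The "drain, close, then read the counters" pattern can be sketched with standard-library types only. The names MateAwareRead, SkippingIterator, and SkippingDemo are hypothetical, and the sketch makes two assumptions that may differ from Picard's behaviour: a null mateCigar field stands in for a missing MC tag, and such records are counted whether or not they are skipped.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a read; a null mateCigar models a missing MC tag.
final class MateAwareRead {
    final String name;
    final String mateCigar;

    MateAwareRead(String name, String mateCigar) {
        this.name = name;
        this.mateCigar = mateCigar;
    }
}

// Illustrative iterator that counts (and optionally skips) reads with no mate cigar.
final class SkippingIterator implements Iterator<MateAwareRead>, Closeable {
    private final Iterator<MateAwareRead> source;
    private final boolean skipNoMateCigar;
    private long numRecordsWithNoMateCigar;
    private boolean closed;
    private MateAwareRead pending;

    SkippingIterator(Iterator<MateAwareRead> source, boolean skipNoMateCigar) {
        this.source = source;
        this.skipNoMateCigar = skipNoMateCigar;
    }

    @Override
    public boolean hasNext() {
        while (pending == null && source.hasNext()) {
            MateAwareRead candidate = source.next();
            if (candidate.mateCigar == null) {
                numRecordsWithNoMateCigar++; // sketch assumption: count even when not skipping
                if (skipNoMateCigar) {
                    continue;
                }
            }
            pending = candidate;
        }
        return pending != null;
    }

    @Override
    public MateAwareRead next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        MateAwareRead result = pending;
        pending = null;
        return result;
    }

    @Override
    public void close() {
        closed = true; // the counter stays readable after closing
    }

    boolean isClosed() {
        return closed;
    }

    long getNumRecordsWithNoMateCigar() {
        return numRecordsWithNoMateCigar;
    }
}

final class SkippingDemo {
    // Exhausts the iterator inside try-with-resources and returns it, closed, for inspection.
    static SkippingIterator drain(List<MateAwareRead> reads, boolean skipNoMateCigar) {
        SkippingIterator it = new SkippingIterator(reads.iterator(), skipNoMateCigar);
        try (SkippingIterator resource = it) {
            while (resource.hasNext()) {
                resource.next();
            }
        }
        return it;
    }

    public static void main(String[] args) {
        List<MateAwareRead> reads = Arrays.asList(
                new MateAwareRead("a", "50M"),
                new MateAwareRead("b", null),
                new MateAwareRead("c", "50M"),
                new MateAwareRead("d", null));
        SkippingIterator done = drain(reads, true);
        System.out.println(done.getNumRecordsWithNoMateCigar()); // prints 2
    }
}
```

Reading the counter only after the loop finishes matters because the total is not final until every record has been seen.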
      • getNumDuplicates

        public int getNumDuplicates()
      • getOpticalDupesByLibraryId

        public htsjdk.samtools.util.Histogram<Short> getOpticalDupesByLibraryId()