Class UnsortedBasecallsConverter<CLUSTER_OUTPUT_RECORD>


  • public class UnsortedBasecallsConverter<CLUSTER_OUTPUT_RECORD>
    extends BasecallsConverter<CLUSTER_OUTPUT_RECORD>
    UnsortedBasecallsConverter uses an underlying IlluminaDataProvider to convert parsed and decoded sequencing data from standard Illumina formats to specific output records (FASTA records/SAM records). This data is processed on a tile-by-tile basis.

    The underlying IlluminaDataProvider applies several optional transformations that can include EAMSS filtering, non-PF read filtering and quality score recoding using a BclQualityEvaluationStrategy.

    The converter can also limit the scope of data that is converted from the data provider by setting the tile to start on (firstTile) and the total number of tiles to process (tileLimit).
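
    The tile window described above can be sketched as follows. This is a minimal illustration, not Picard's implementation: the TileWindow class and its select method are hypothetical names, and the sketch assumes that firstTile must be one of the available tile numbers and that a null value means "no restriction".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Picard): illustrates firstTile/tileLimit semantics.
final class TileWindow {
    private TileWindow() {}

    // Returns the tiles to process, starting at firstTile (if non-null)
    // and keeping at most tileLimit tiles (if non-null).
    static List<Integer> select(List<Integer> tiles, Integer firstTile, Integer tileLimit) {
        int start = 0;
        if (firstTile != null) {
            start = tiles.indexOf(firstTile);
            if (start < 0) {
                // Assumption: an unknown firstTile is treated as an error.
                throw new IllegalArgumentException("firstTile " + firstTile + " is not an available tile");
            }
        }
        int end = tiles.size();
        if (tileLimit != null) {
            if (tileLimit < 0) {
                throw new IllegalArgumentException("tileLimit must not be negative");
            }
            end = Math.min(end, start + tileLimit);
        }
        return new ArrayList<>(tiles.subList(start, end));
    }
}
```

    For example, with tiles 1101 to 1104, firstTile=1102 and tileLimit=2, the sketch selects tiles 1102 and 1103.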

    Additionally, BasecallsConverter can optionally demultiplex reads by writing barcode-specific reads to their associated writers.

    • Constructor Detail

      • UnsortedBasecallsConverter

        protected UnsortedBasecallsConverter​(File basecallsDir,
                                             File barcodesDir,
                                             int[] lanes,
                                             ReadStructure readStructure,
                                             Map<String,​? extends htsjdk.io.Writer<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap,
                                             boolean demultiplex,
                                             Integer firstTile,
                                             Integer tileLimit,
                                             BclQualityEvaluationStrategy bclQualityEvaluationStrategy,
                                             boolean ignoreUnexpectedBarcodes,
                                             boolean applyEamssFiltering,
                                             boolean includeNonPfReads,
                                             htsjdk.io.AsyncWriterPool writerPool,
                                             BarcodeExtractor barcodeExtractor,
                                             Integer numThreads)
        Constructs a new UnsortedBasecallsConverter object.
        Parameters:
        basecallsDir - Where to read basecalls from.
        barcodesDir - Where to read barcodes from (optional; use basecallsDir if not specified).
        lanes - Which lanes to process.
        readStructure - How to interpret each cluster.
        barcodeRecordWriterMap - Map from barcode to CLUSTER_OUTPUT_RECORD writer. If demultiplex is false, must contain one writer stored with key=null.
        demultiplex - If true, output is split by barcode; otherwise all records are written to the same output stream.
        firstTile - (For debugging) If non-null, start processing at this tile.
        tileLimit - (For debugging) If non-null, process no more than this many tiles.
        bclQualityEvaluationStrategy - The basecall quality evaluation strategy that is applied to decoded base calls.
        ignoreUnexpectedBarcodes - If true, will ignore reads whose called barcode is not found in barcodeRecordWriterMap.
        applyEamssFiltering - If true, apply EAMSS filtering if parsing BCLs for bases and quality scores.
        includeNonPfReads - If true, will include ALL reads (including those which do not have PF set).
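
        How barcodeRecordWriterMap, demultiplex and ignoreUnexpectedBarcodes interact can be sketched as below. This is an illustration under stated assumptions, not the converter's actual code: BarcodeRouter is a hypothetical class, java.util.function.Consumer stands in for htsjdk.io.Writer, and throwing on an unexpected barcode when ignoreUnexpectedBarcodes is false is an assumed behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch (not part of Picard): routes each record to the writer for its barcode.
final class BarcodeRouter<R> {
    private final Map<String, Consumer<R>> writers;
    private final boolean demultiplex;
    private final boolean ignoreUnexpectedBarcodes;

    BarcodeRouter(Map<String, ? extends Consumer<R>> writers,
                  boolean demultiplex,
                  boolean ignoreUnexpectedBarcodes) {
        // Copy into a HashMap, which permits the null key used when not demultiplexing.
        this.writers = new HashMap<>(writers);
        this.demultiplex = demultiplex;
        this.ignoreUnexpectedBarcodes = ignoreUnexpectedBarcodes;
        if (!demultiplex && !(this.writers.size() == 1 && this.writers.containsKey(null))) {
            throw new IllegalArgumentException(
                    "Without demultiplexing, the map must contain exactly one writer under key=null");
        }
    }

    // Returns true if the record was written, false if it was skipped.
    boolean route(String barcode, R record) {
        // Without demultiplexing, every record goes to the single writer stored under null.
        String key = demultiplex ? barcode : null;
        Consumer<R> writer = writers.get(key);
        if (writer == null) {
            if (ignoreUnexpectedBarcodes) {
                return false;
            }
            // Assumption: an unexpected barcode is an error unless explicitly ignored.
            throw new IllegalStateException("No writer registered for barcode " + barcode);
        }
        writer.accept(record);
        return true;
    }
}
```

        Note that immutable maps created with Map.of reject null keys, so a HashMap is a natural choice for the single-writer case.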
    • Method Detail

      • processTilesAndWritePerSampleOutputs

        public void processTilesAndWritePerSampleOutputs​(Set<String> barcodes)
                                                  throws IOException
        Sets up the tile processing and record writing threads for this converter. This creates a tile reading thread pool of size 4. Each tile processing thread notifies the completed work checking thread when it has finished processing a tile. The completed work checking thread then dispatches the record writing for tiles in order.
        Specified by:
        processTilesAndWritePerSampleOutputs in class BasecallsConverter<CLUSTER_OUTPUT_RECORD>
        Parameters:
        barcodes - The barcodes used for demultiplexing. When there is no demultiplexing done this should be a Set containing a single null value.
        Throws:
        IOException
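
        The pattern of processing tiles in parallel while writing them in tile order can be sketched as follows. This is a simplified illustration, not the converter's implementation: OrderedTilePipeline is a hypothetical class, and instead of a dedicated completed work checking thread it keeps the futures in submission order and waits on each in turn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch (not part of Picard): parallel tile processing, in-order writing.
final class OrderedTilePipeline {
    private OrderedTilePipeline() {}

    static <T> void run(List<Integer> tiles,
                        Function<Integer, T> processTile,
                        Consumer<T> writeTile,
                        int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Submit every tile; tiles may finish processing in any order.
            List<Future<T>> pending = new ArrayList<>();
            for (Integer tile : tiles) {
                Callable<T> task = () -> processTile.apply(tile);
                pending.add(pool.submit(task));
            }
            // Wait on the futures in submission order so writes happen in tile order.
            for (Future<T> result : pending) {
                writeTile.accept(result.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a tile", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Tile processing failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

        A pool size of 4 would mirror the tile reading pool mentioned above; the writer callback runs on the calling thread in this sketch.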