Class CollectHiSeqXPfFailMetrics.ReadClassifier

    • Constructor Detail

      • ReadClassifier

        public ReadClassifier(ReadData read)
        Heart of the CLP. This class classifies a ReadData by the reason it failed PF. The classification is based on a small set of titrated flowcells sequenced at the Broad Institute by the Genomics Platform. Three clusters were observed:

        - numNs~24: these were found only near the boundaries of tiles, and their number did not seem to depend on concentration. For this reason they were classified as MISALIGNED.

        - numNs~0 and numQGtTwo<=8: these were found throughout the tiles and _decreased_ in number as the concentration of the library increased. Thus it was concluded that these correspond to EMPTY wells.

        - numNs~0 and numQGtTwo>=12: these were found throughout the tiles and _increased_ in number as the concentration of the library increased. Thus it was concluded that these correspond to POLYCLONAL wells.

        - The remaining reads were few in number and their classification was not clear. Thus they are left as UNKNOWN.

        The length of the read is used as a parameter: the thresholds 8 and 12 are scaled to length/3 and length/2 respectively. In reality, this has only been tested on length=24.
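
        The decision rule described above can be sketched as follows. This is an illustrative sketch only, not the class's actual implementation: the class name PfFailSketch, the method classify, the Reason enum, and the TOLERANCE constant used to interpret the "~" in numNs~24 and numNs~0 are all hypothetical. Only the length/3 and length/2 thresholds and the four category names come from the description.

```java
// Hypothetical sketch of the classification rule described above.
// None of these names are taken from the real class.
final class PfFailSketch {

    enum Reason { MISALIGNED, EMPTY, POLYCLONAL, UNKNOWN }

    // Assumption: how far numNs may deviate from the read length (or from 0)
    // and still count as "~length" (or "~0"). The description does not give
    // a value for this.
    static final int TOLERANCE = 1;

    /**
     * @param numNs     number of no-call (N) bases in the read
     * @param numQGtTwo number of bases with quality greater than 2
     * @param length    read length (the description was only tested with 24)
     */
    static Reason classify(int numNs, int numQGtTwo, int length) {
        // Nearly all bases are N: the cluster seen near tile boundaries.
        if (numNs >= length - TOLERANCE) {
            return Reason.MISALIGNED;
        }
        // Nearly no Ns: split on the count of bases with quality > 2.
        if (numNs <= TOLERANCE) {
            if (numQGtTwo <= length / 3) {   // 8 when length is 24
                return Reason.EMPTY;
            }
            if (numQGtTwo >= length / 2) {   // 12 when length is 24
                return Reason.POLYCLONAL;
            }
        }
        // Everything else did not fall into a clear cluster.
        return Reason.UNKNOWN;
    }
}
```

        With length=24, a read with numNs=0 and numQGtTwo=10 falls between the two thresholds and is left as UNKNOWN under this sketch.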

        Parameters:
        read - The read to classify.