The main results page gives a table with some summary statistics:
Total reads: total number of reads sequenced.
QC reads: the number of reads (randomly) selected for the QC analysis.
Reads w/adaptor: the number of 'QC reads' containing adaptor sequence (for Illumina sequencing).
Reads mapped: the number of 'QC reads' that mapped to the assembly used for the QC analysis. Here 770183/1003486 = approximately 76.8%.
Reads paired: the number of 'QC reads' that mapped as pairs to the assembly.
Reads mapped (rmdup): the number of mapped 'QC reads', after removing duplicate reads.
Total bases: the total number of bases in the reads sequenced (each read is 100-bp in this example).
QC bases: the total number of bases in the 'QC reads'.
Bases postclip: the total number of bases in the 'QC reads' after clipping some (presumably erroneous) bases at the ends of reads [I am guessing this is soft-clipping rather than hard-clipping].
Bases mapped: the total number of bases in the mapped regions of mapped 'QC reads'.
Bases mapped (rmdup): the total number of bases in the mapped regions of mapped 'QC reads', after removing duplicate reads.
Assembly: the assembly that the 'QC reads' were mapped to.
Mapper: the mapping algorithm used.
Cycles: the number of sequencing cycles used. 100 cycles gives 100-bp reads.
NPG QC: says whether the data passed the Sanger 'NPG' group's QC analysis [I'm guessing this].
Error rate: the percent of bases sequenced that are erroneous [I think this is estimated from some spiked DNA from a genome whose sequence is known].
Duplication rate: the percent of reads that are duplicates [I'm not sure how this is estimated].
Genome covered: the bases of the assembly that are covered by mapped 'QC reads'.
Coverage depth: the estimated sequencing coverage of the assembly [I'm not sure exactly how this is estimated. I think it must be estimated from the regions of the assembly where there are 'QC reads' mapped.]
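Several of these statistics are simple ratios of the counts in the table. The sketch below shows the 'Reads mapped' percentage from the example above, plus two guesses at how the duplication rate and coverage depth could be derived from the other rows. The function names and both formulas are my own assumptions, not necessarily what the QC pipeline does.

```python
def percent(part, whole):
    """Return part as a percentage of whole (0.0 if whole is zero)."""
    return 100.0 * part / whole if whole else 0.0


def duplication_rate(reads_mapped, reads_mapped_rmdup):
    """ASSUMPTION: percent of mapped reads removed as duplicates.

    The real pipeline may estimate this differently.
    """
    return percent(reads_mapped - reads_mapped_rmdup, reads_mapped)


def coverage_depth(bases_mapped, genome_covered):
    """ASSUMPTION: mean depth over the covered part of the assembly,
    i.e. mapped bases divided by the number of covered bases.
    """
    return bases_mapped / genome_covered if genome_covered else 0.0


# Numbers taken from the 'Reads mapped' example in the table above.
qc_reads = 1003486
reads_mapped = 770183
pct_mapped = percent(reads_mapped, qc_reads)
print(f"Reads mapped: {pct_mapped:.1f}%")  # prints "Reads mapped: 76.8%"
```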
Histogram of GC content of reads
This plot shows a histogram of the GC content of the first reads in fragments (red), the second reads in fragments (green), and the reference genome. Presumably to calculate the histogram for the reference genome, regions of the same length as the reads (100-bp here) are sampled from the reference genome.
In this example, we see that the modal GC content of the RNA-seq reads is about 42.2%, while the mode for the genomic DNA is about 28%. This is not unusual, because the RNA-seq reads originate from the transcribed (largely coding) regions of the genome, so may have a different GC content from the genome as a whole.
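A GC-content histogram like this can be approximated as follows. This is a minimal sketch, assuming the reference is cut into non-overlapping windows of the read length (my guess at the sampling scheme); the function names are hypothetical.

```python
from collections import Counter


def gc_percent(seq):
    """GC content of a sequence as a percentage of its A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(seqs, bin_width=1):
    """Count sequences per GC bin; keys are the lower edge of each bin."""
    counts = Counter()
    for seq in seqs:
        counts[int(gc_percent(seq) // bin_width) * bin_width] += 1
    return counts


def windows(reference, size):
    """Yield non-overlapping windows of `size` bases from the reference.

    ASSUMPTION: the real plot may sample windows differently.
    """
    for start in range(0, len(reference) - size + 1, size):
        yield reference[start:start + size]


def modal_bin(hist):
    """Return the GC bin with the highest count (the mode)."""
    return max(hist, key=hist.get)
```

Calling modal_bin(gc_histogram(reads)) and modal_bin(gc_histogram(windows(reference, 100))) would give the two modes being compared, for a list of read sequences and a reference string.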
This plot shows the number of bases of each type (A, red; C, black; G, green; T, blue) sequenced during each sequencing cycle (out of 100 sequencing cycles, in this case). In this example, there are slightly more As and Ts sequenced than Gs and Cs, at each point.
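The per-cycle base counts behind a plot like this can be tallied directly from the read sequences. A minimal sketch, assuming the reads are available as plain strings in sequencing order (position i in the read corresponds to cycle i+1); the function name is hypothetical.

```python
from collections import Counter


def base_counts_per_cycle(reads):
    """Return a list with one Counter per cycle, counting each base seen."""
    counts = []
    for read in reads:
        for cycle, base in enumerate(read.upper()):
            if cycle == len(counts):
                # First read long enough to reach this cycle.
                counts.append(Counter())
            counts[cycle][base] += 1
    return counts


# Tiny made-up example, for illustration only.
for cycle, tally in enumerate(base_counts_per_cycle(["ACGT", "AAGT", "TCG"]), 1):
    print(cycle, dict(tally))
```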
This plot shows the median (black line), mean (red or green line) and interquartile range (grey area) of the qualities for forward and reverse reads (y-axes), versus the sequencing cycle (x-axis). I'm not sure whether these are base qualities or read mapping qualities, but since they are plotted per sequencing cycle, I think they must be base qualities.
We see that for the forward reads, the quality is low for the first ten cycles, then increases, and then decreases again slightly for the last 10-15 cycles. This is also seen for the reverse reads.
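If these are indeed base qualities, the per-cycle summary can be sketched as below. This assumes Phred+33 encoded quality strings, as in standard FASTQ files; the function names are hypothetical, and the quartile method is whatever Python's statistics module uses by default, which may differ from the one used for the plot.

```python
import statistics


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (ASSUMES Phred+33)."""
    return [ord(char) - offset for char in quality_string]


def quality_by_cycle(quality_strings, offset=33):
    """Summarise base qualities per cycle: mean, median, and quartiles."""
    per_cycle = []
    for quals in quality_strings:
        for cycle, score in enumerate(phred_scores(quals, offset)):
            if cycle == len(per_cycle):
                per_cycle.append([])
            per_cycle[cycle].append(score)

    summary = []
    for scores in per_cycle:
        if len(scores) >= 2:
            # quantiles() needs at least two data points.
            q1, _, q3 = statistics.quantiles(scores, n=4)
        else:
            q1 = q3 = scores[0]
        summary.append({
            "mean": statistics.mean(scores),
            "median": statistics.median(scores),
            "q1": q1,  # lower edge of the interquartile range
            "q3": q3,  # upper edge of the interquartile range
        })
    return summary
```

Running this separately on the forward and reverse reads would give the two panels described above.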
This plot shows the count of insertions (red line) and deletions (black line) versus the sequencing cycle (on the x-axis). Presumably this was estimated for some spiked DNA, for which the sequence was already known.
We see that in this example the number of insertions and deletions is fairly low for the first ten sequencing cycles, then increases and stays steady until the last ten cycles, when there is a peak in indels.
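I don't know how the pipeline actually counts indels, but one plausible way is to walk the CIGAR string of each mapped read and record the read position at which each insertion or deletion occurs. The sketch below does this for a single CIGAR string. It is an assumption-laden illustration: the function name is hypothetical, and it ignores read orientation (for reads mapped to the reverse strand, the position in the alignment runs opposite to the sequencing cycle).

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_by_cycle(cigar):
    """Return (insertions, deletions) Counters keyed by 0-based read position."""
    insertions, deletions = Counter(), Counter()
    cycle = 0  # 0-based position within the read
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "I":
            insertions[cycle] += 1  # count the event where it starts
            cycle += length         # inserted bases are part of the read
        elif op == "D":
            deletions[cycle] += 1   # deleted bases are not part of the read
        elif op in "MSH=X":
            # Matches, mismatches and clipped bases were all sequenced,
            # so they advance the cycle (H bases are absent from SEQ but
            # were still sequenced).
            cycle += length
        # N and P do not consume read bases.
    return insertions, deletions


ins, dels = indels_by_cycle("5M2I3M1D10M")
print(dict(ins), dict(dels))  # prints "{5: 1} {10: 1}"
```

Summing these Counters over all mapped reads gives the two curves (insertions and deletions) versus cycle.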