RNA-Seq: Interpreting the Alignment Rates in the HISAT2 Summary Output
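The explanation in part (I) is copied from a Biostars post; after it I add examples from my own runs. The two differ slightly, but the overall interpretation holds. Contact me if anything here is inappropriate.
(I) The example explained on Biostars
HISAT2 is an aligner that performs well in speed, sensitivity, and alignment rate, and it is commonly used to align RNA-Seq data from next-generation sequencing. In practice you need to collect the alignment rates that HISAT2 reports. After alignment finishes, HISAT2 prints a summary like this: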
HISAT2 summary stats:
    Total pairs: 11587225
        Aligned concordantly or discordantly 0 time: 4464083 (38.53%)
        Aligned concordantly 1 time: 2195620 (18.95%)
        Aligned concordantly >1 times: 4877336 (42.09%)
        Aligned discordantly 1 time: 50186 (0.43%)
    Total unpaired reads: 8928166
        Aligned 0 time: 8019048 (89.82%)
        Aligned 1 time: 304653 (3.41%)
        Aligned >1 times: 604465 (6.77%)
    Overall alignment rate: 65.40%
Each line is explained below:
1. Total pairs: 11587225
Total reads = 11587225 * 2 = 23174450 (matches total number of reads in the sample)
2. Aligned concordantly or discordantly 0 time: 4464083 (38.53%)
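These pairs did not align as a pair, either concordantly or discordantly. Their mates are then aligned individually and reported under "Total unpaired reads": 4464083 * 2 (paired end) = 8928166
(8928166 / 23174450 (Total reads)) * 100 ~ 38.53%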
3. Aligned concordantly 1 time: 2195620 (18.95%)
These are uniquely mapped reads: 2195620 * 2 (paired end) = 4391240
(4391240 / 23174450 (Total reads)) * 100 ~ 18.95%
4. Aligned concordantly >1 times: 4877336 (42.09%)
These are multi-mapped reads: 4877336 * 2 = 9754672
(9754672 / 23174450 (Total reads)) * 100 ~ 42.09%
5. Aligned discordantly 1 time: 50186 (0.43%)
Discordantly aligned reads: 50186 * 2 = 100372
(100372 / 23174450 (Total reads)) * 100 ~ 0.43%
6. Total unpaired reads: 8928166
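These are the mates of the pairs from item 2 that did not align as a pair (4464083 * 2 = 8928166); each mate is aligned on its own
Aligned 0 time: 8019048 (89.82%)
(8019048 / 8928166) * 100 = 89.82% i.e. 89.82% of the unpaired reads did not align at all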
Aligned 1 time: 304653 (3.41%)
(304653 / 8928166 ) * 100 = 3.41% i.e. 3.41% of the unpaired reads aligned once
Aligned >1 times: 604465 (6.77%)
(604465 / 8928166 ) * 100 = 6.77% i.e. 6.77% of the unpaired reads are multi mapped
7. Overall alignment rate: 65.40%
Calculation as explained below
Paired Reads
Aligned concordantly 1 time: (2195620 * 2 = 4391240)
Aligned concordantly >1 times: (4877336 * 2 = 9754672)
Aligned discordantly 1 time: (50186 * 2 = 100372)
Unpaired Reads
Aligned 1 time: 304653
Aligned >1 times: 604465
Total = 4391240 + 9754672 + 100372 + 304653 + 604465 = 15155402
Overall Alignment Rate = (15155402 / 23174450) * 100 = 65.40%
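As a quick check, the calculation in item 7 can be reproduced in a few lines of Python. This is only a sketch: the function name and argument names are my own and are not part of HISAT2, and the counts are simply copied from the summary above.

```python
def overall_alignment_rate(total_pairs, conc_once, conc_multi, disc_once,
                           unpaired_once, unpaired_multi):
    """Fraction of reads (mates) that aligned at least once."""
    # Pairs that aligned as a pair contribute two reads each.
    aligned_in_pairs = 2 * (conc_once + conc_multi + disc_once)
    # Mates that aligned only on their own contribute one read each.
    aligned_alone = unpaired_once + unpaired_multi
    return (aligned_in_pairs + aligned_alone) / (2 * total_pairs)


rate = overall_alignment_rate(
    total_pairs=11587225,
    conc_once=2195620,
    conc_multi=4877336,
    disc_once=50186,
    unpaired_once=304653,
    unpaired_multi=604465,
)
print(f"{rate:.2%}")  # 65.40%
```

The denominator is the number of reads (two per pair), not the number of pairs, which is why every pair-level count is doubled before it is added to the mate-level counts.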
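(II) Results from my own runs
Building on the explanation above, take the results below as examples. The format differs slightly from part (I), possibly because of the version, or because the summary in part (I) was printed in HISAT2's newer summary style (the --new-summary option).
I am currently using hisat2 version 2.1.0.
(1) Paired-end alignment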
23116294 reads; of these:
  23116294 (100.00%) were paired; of these:
    2117146 (9.16%) aligned concordantly 0 times
    18275745 (79.06%) aligned concordantly exactly 1 time
    2723403 (11.78%) aligned concordantly >1 times
    ----
    2117146 pairs aligned concordantly 0 times; of these:
      449895 (21.25%) aligned discordantly 1 time
    ----
    1667251 pairs aligned 0 times concordantly or discordantly; of these:
      3334502 mates make up the pairs; of these:
        2176277 (65.27%) aligned 0 times
        889188 (26.67%) aligned exactly 1 time
        269037 (8.07%) aligned >1 times
95.29% overall alignment rate
Computing the overall alignment rate by hand gives 0.9529276, which agrees with the 95.29% reported above:
Overall alignment rate:
(18275745 * 2 + 2723403 * 2 + 449895 * 2 + 889188 + 269037 )/(23116294 * 2) = 0.9529276
Unique mapping rate:
(18275745 * 2 + 449895 * 2 + 889188)/(23116294 * 2)
[1] 0.8292953
Multiple mapping rate:
(2723403 * 2 + 269037)/(23116294 * 2)
[1] 0.1236323
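The three rates above can also be pulled straight out of the summary text. The following is a minimal sketch, assuming the summary keeps exactly the wording shown in this post; the helper name count and the variable names are my own, not anything provided by HISAT2.

```python
import re

SUMMARY = """\
23116294 reads; of these:
  23116294 (100.00%) were paired; of these:
    2117146 (9.16%) aligned concordantly 0 times
    18275745 (79.06%) aligned concordantly exactly 1 time
    2723403 (11.78%) aligned concordantly >1 times
    ----
    2117146 pairs aligned concordantly 0 times; of these:
      449895 (21.25%) aligned discordantly 1 time
    ----
    1667251 pairs aligned 0 times concordantly or discordantly; of these:
      3334502 mates make up the pairs; of these:
        2176277 (65.27%) aligned 0 times
        889188 (26.67%) aligned exactly 1 time
        269037 (8.07%) aligned >1 times
95.29% overall alignment rate
"""


def count(label, text):
    """Return the leading integer of the line '<n> (<pct>%) <label>'."""
    pattern = rf"^\s*(\d+) \([\d.]+%\) {re.escape(label)}$"
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise ValueError(f"line not found: {label}")
    return int(match.group(1))


pairs = count("were paired; of these:", SUMMARY)
conc_once = count("aligned concordantly exactly 1 time", SUMMARY)
conc_multi = count("aligned concordantly >1 times", SUMMARY)
disc_once = count("aligned discordantly 1 time", SUMMARY)
mate_once = count("aligned exactly 1 time", SUMMARY)
mate_multi = count("aligned >1 times", SUMMARY)

reads = 2 * pairs  # every pair contributes two reads
unique_rate = (2 * (conc_once + disc_once) + mate_once) / reads
multi_rate = (2 * conc_multi + mate_multi) / reads
overall_rate = unique_rate + multi_rate

print(f"overall {overall_rate:.7f}")  # overall 0.9529276
print(f"unique  {unique_rate:.7f}")   # unique  0.8292953
print(f"multi   {multi_rate:.7f}")    # multi   0.1236323
```

Because the regular expression is anchored at both ends of the line, "aligned exactly 1 time" matches only the mate-level line and not "aligned concordantly exactly 1 time".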
(2) Single-end alignment
14479369 reads; of these:
  14479369 (100.00%) were unpaired; of these:
    4838655 (33.42%) aligned 0 times
    8260600 (57.05%) aligned exactly 1 time
    1380114 (9.53%) aligned >1 times
66.58% overall alignment rate
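For single-end data the arithmetic is simpler, because every read is counted once and nothing is doubled. A small sketch with the counts copied from the summary above (the variable names are my own):

```python
total_reads = 14479369
aligned_zero = 4838655
aligned_once = 8260600
aligned_multi = 1380114

# The three categories should partition the input reads.
assert aligned_zero + aligned_once + aligned_multi == total_reads

unique_rate = aligned_once / total_reads
multi_rate = aligned_multi / total_reads
overall_rate = (aligned_once + aligned_multi) / total_reads

print(f"unique  {unique_rate:.2%}")   # unique  57.05%
print(f"multi   {multi_rate:.2%}")    # multi   9.53%
print(f"overall {overall_rate:.2%}")  # overall 66.58%
```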
The term "concordant" used above is explained as follows (from the Bowtie 2 manual):
Concordant pairs match pair expectations, discordant pairs don’t
A pair that aligns with the expected relative mate orientation and with the expected range of distances between mates is said to align “concordantly”. If both mates have unique alignments, but the alignments do not match paired-end expectations (i.e. the mates aren’t in the expected relative orientation, or aren’t within the expected distance range, or both), the pair is said to align “discordantly”. Discordant alignments may be of particular interest, for instance, when seeking structural variants.
The expected relative orientation of the mates is set using the --ff, --fr, or --rf options. The expected range of inter-mates distances (as measured from the furthest extremes of the mates; also called “outer distance”) is set with the -I and -X options. Note that setting -I and -X far apart makes Bowtie 2 slower. See documentation for -I and -X.
To declare that a pair aligns discordantly, Bowtie 2 requires that both mates align uniquely. This is a conservative threshold, but this is often desirable when seeking structural variants.
By default, Bowtie 2 searches for both concordant and discordant alignments, though searching for discordant alignments can be disabled with the --no-discordant option.
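To make the rule concrete, here is a toy Python sketch of the decision described in the quoted text. It is not Bowtie 2's or HISAT2's actual implementation: the Mate class and classify_pair function are hypothetical, only the --fr orientation is modelled, overlap and containment cases are ignored, and the distance limits of 0 and 500 are assumed to stand in for the default -I and -X values.

```python
from dataclasses import dataclass


@dataclass
class Mate:
    chrom: str     # reference sequence name
    start: int     # leftmost aligned position (1-based)
    end: int       # rightmost aligned position (inclusive)
    reverse: bool  # True if aligned to the reverse strand
    unique: bool   # True if this mate has exactly one alignment


def classify_pair(mate1, mate2, min_outer=0, max_outer=500):
    """Simplified concordant/discordant decision for an --fr library."""
    same_chrom = mate1.chrom == mate2.chrom
    # Outer distance: measured between the furthest extremes of the mates.
    outer = max(mate1.end, mate2.end) - min(mate1.start, mate2.start) + 1
    # --fr: the forward-strand mate lies upstream of the reverse-strand mate.
    fwd, rev = (mate1, mate2) if not mate1.reverse else (mate2, mate1)
    fr_ok = (not fwd.reverse) and rev.reverse and fwd.start <= rev.start
    if same_chrom and fr_ok and min_outer <= outer <= max_outer:
        return "concordant"
    # A discordant call requires both mates to align uniquely.
    if mate1.unique and mate2.unique:
        return "discordant"
    return "neither"


left = Mate("chr1", 100, 199, reverse=False, unique=True)
near = Mate("chr1", 300, 399, reverse=True, unique=True)
far = Mate("chr1", 5000, 5099, reverse=True, unique=True)
repeat = Mate("chr1", 5000, 5099, reverse=True, unique=False)

print(classify_pair(left, near))    # concordant
print(classify_pair(left, far))     # discordant
print(classify_pair(left, repeat))  # neither
```

The third case shows why the discordant category is conservative: a pair that breaks the distance expectation is only called discordant when both mates align uniquely.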
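In addition, you need to consider how to extract the uniquely mapped reads from the SAM/BAM result file for downstream analysis; see the second link in the references for details.
One open question: I have seen papers that use only the concordant unique alignments of paired-end reads for expression analysis. In my dataset such alignments account for 19% of the data. I do not know how well quantification works on this subset alone, and I have not yet compared it with using all of the data. So far I have not filtered by SAM tag when computing expression.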
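One way to do this kind of filtering is sketched below in plain Python. It rests on two assumptions that should be checked against your own files: that the aligner writes the NH:i tag (number of reported alignments) on every aligned record, and that it sets the 0x2 "properly paired" FLAG bit for concordant pairs. The function name keep_unique and the four SAM records are made up for illustration.

```python
def keep_unique(sam_lines, concordant_only=False):
    """Yield header lines plus alignments tagged NH:i:1."""
    for line in sam_lines:
        if line.startswith("@"):         # header lines pass through unchanged
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:                   # 0x4: the read is unmapped
            continue
        if "NH:i:1" not in fields[11:]:  # optional tags start at column 12
            continue
        if concordant_only and not flag & 0x2:  # 0x2: properly paired
            continue
        yield line


# Made-up records: name, FLAG, and the NH tag are what matter here.
records = [
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t500\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r4\t65\tchr1\t100\t60\t50M\tchr2\t900\t0\t*\t*\tNH:i:1",
]

unique_names = [r.split("\t")[0] for r in keep_unique(records)]
concordant_names = [
    r.split("\t")[0] for r in keep_unique(records, concordant_only=True)
]
print(unique_names)      # ['r1', 'r4']
print(concordant_names)  # ['r1']
```

To apply it to a real file, the same generator can be fed the lines of a SAM file opened in text mode, with the kept lines written to a new file.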
References:
- https://www.biostars.org/p/395017/
- https://www.jianshu.com/p/bc3751b72f0c
- http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#concordant-pairs-match-pair-expectations-discordant-pairs-dont
- https://www.cnblogs.com/leezx/p/8540862.html