Wishing little Wu Yiqing happiness every single day.
Projecting the scATAC_MPAL2 cells onto the Healthy-scATAC reference:
> ############ 1. Read the fragment files ###############
> # Input file paths; only each sample's atac_fragments.tsv.gz file is needed
> input.file.list <- c("./Fragments/GSM4138900_scATAC_MPAL2_T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138901_scATAC_MPAL2_T2.fragments.tsv.gz",
+                      "./Fragments/GSM4138888_scATAC_BMMC_D5T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138889_scATAC_BMMC_D6T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138890_scATAC_CD34_D7T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138891_scATAC_CD34_D8T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138892_scATAC_CD34_D9T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138893_scATAC_PBMC_D10T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138894_scATAC_PBMC_D11T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138895_scATAC_PBMC_D12T1.fragments.tsv.gz",
+                      "./Fragments/GSM4138896_scATAC_PBMC_D12T2.fragments.tsv.gz",
+                      "./Fragments/GSM4138897_scATAC_PBMC_D12T3.fragments.tsv.gz")
> # Set the sample names (same order as input.file.list)
> sampleNames <- c("MPAL2_T1", "MPAL2_T2",
+                  "BMMC_D5T1", "BMMC_D6T1", "CD34_D7T1", "CD34_D8T1", "CD34_D9T1",
+                  "PBMC_D10T1", "PBMC_D11T1", "PBMC_D12T1", "PBMC_D12T2", "PBMC_D12T3")
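Because the file paths and the sample names are typed out as two separate vectors, they can easily fall out of step. The following is a minimal base-R sketch of a consistency check, assuming the GEO-style file naming `GSM<digits>_scATAC_<sample>.fragments.tsv.gz` seen above. The helper `derive_sample_name` and the `example_*` variables are hypothetical names introduced for illustration; they are not part of ArchR.

```r
# Hypothetical helper (not part of ArchR): strip the GEO accession prefix and
# the ".fragments.tsv.gz" suffix to recover the sample name from a file path.
derive_sample_name <- function(path) {
  sub("^GSM[0-9]+_scATAC_(.*)\\.fragments\\.tsv\\.gz$", "\\1", basename(path))
}

# Two of the paths listed above, used here as example input.
example_files <- c(
  "./Fragments/GSM4138900_scATAC_MPAL2_T1.fragments.tsv.gz",
  "./Fragments/GSM4138888_scATAC_BMMC_D5T1.fragments.tsv.gz"
)
example_names <- c("MPAL2_T1", "BMMC_D5T1")

derived <- derive_sample_name(example_files)

# Stop early if the two vectors disagree in length or content.
stopifnot(length(example_files) == length(example_names))
stopifnot(identical(derived, example_names))

# Optionally confirm that every file exists before calling createArrowFiles():
# stopifnot(all(file.exists(example_files)))
```

In the real session the same two `stopifnot()` checks could be run on `input.file.list` and `sampleNames` directly.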
> ############ 2. Create the Arrow files ################
> ArrowFiles <- createArrowFiles(
+   inputFiles = input.file.list,
+   sampleNames = sampleNames,
+   minTSS = 9.3,        # no need to set this too high; it can be adjusted later
+   minFrags = 1100,     # flank=2000, norm=100
+   minFragSize = 10,    # default; genomeAnnotation = genomeAnnotation,
+   maxFragSize = 2000,  # default
+   excludeChr = c("chrM", "chrY"),   # exclude interference from mitochondrial DNA and the Y chromosome
+   addTileMat = TRUE, force = TRUE,  # force overwriting of any existing Arrow files
+   addGeneScoreMat = TRUE)           # creates a "QualityControl" directory in the current directory
Using GeneAnnotation set by addArchRGenome(Hg19)!
Using GeneAnnotation set by addArchRGenome(Hg19)!
ArchR logging to : ArchRLogs/ArchR-createArrows-116d63d50ef56-Date-2024-05-22_Time-13-39-06.log
If there is an issue, please report to github with logFile!
2024-05-22 13:39:10 : Batch Execution w/ safelapply!, 0 mins elapsed.
ArchR logging successful to : ArchRLogs/ArchR-createArrows-116d63d50ef56-Date-2024-05-22_Time-13-39-06.log
> ArrowFiles  # check that this is a vector of Arrow file paths
 [1] "MPAL2_T1.arrow"   "BMMC_D6T1.arrow"  "MPAL2_T2.arrow"   "BMMC_D5T1.arrow"  "PBMC_D10T1.arrow"
 [6] "CD34_D7T1.arrow"  "CD34_D8T1.arrow"  "CD34_D9T1.arrow"  "PBMC_D11T1.arrow" "PBMC_D12T1.arrow"
[11] "PBMC_D12T2.arrow" "PBMC_D12T3.arrow"
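The returned vector is not in the same order as the input (BMMC_D6T1 comes second here, for example), so an element-by-element comparison with the sample names would fail even though nothing is missing. A small base-R sketch of an order-independent check follows, with both vectors copied from the transcript; `arrow_files`, `expected`, and the other variable names are illustrative only.

```r
# Arrow file paths as printed above (note: not in input order).
arrow_files <- c(
  "MPAL2_T1.arrow", "BMMC_D6T1.arrow", "MPAL2_T2.arrow", "BMMC_D5T1.arrow",
  "PBMC_D10T1.arrow", "CD34_D7T1.arrow", "CD34_D8T1.arrow", "CD34_D9T1.arrow",
  "PBMC_D11T1.arrow", "PBMC_D12T1.arrow", "PBMC_D12T2.arrow", "PBMC_D12T3.arrow"
)

# Sample names in the order they were supplied.
expected <- c(
  "MPAL2_T1", "MPAL2_T2",
  "BMMC_D5T1", "BMMC_D6T1", "CD34_D7T1", "CD34_D8T1", "CD34_D9T1",
  "PBMC_D10T1", "PBMC_D11T1", "PBMC_D12T1", "PBMC_D12T2", "PBMC_D12T3"
)

# Recover sample names from the Arrow file names.
arrow_samples <- sub("\\.arrow$", "", basename(arrow_files))

# Samples that were requested but produced no Arrow file.
missing_samples <- setdiff(expected, arrow_samples)

# Every sample should appear exactly once, regardless of order.
stopifnot(length(missing_samples) == 0)
stopifnot(anyDuplicated(arrow_samples) == 0)
stopifnot(setequal(arrow_samples, expected))
```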
> # Compute a doublet score for every single cell, sample by sample
> doubScores <- addDoubletScores(input = ArrowFiles,
+   k = 10,              # or 10  # Refers to how many cells near a "pseudo-doublet" to count.
+   knnMethod = "UMAP",  # Refers to the embedding to use for nearest neighbor search with doublet projection.
+   LSIMethod = 1)
ArchR logging to : ArchRLogs/ArchR-addDoubletScores-116d676df5c60-Date-2024-05-22_Time-14-41-13.log
If there is an issue, please report to github with logFile!
2024-05-22 14:41:23 : Batch Execution w/ safelapply!, 0 mins elapsed.
2024-05-22 14:41:23 : MPAL2_T1 (1 of 12) : Computing Doublet Statistics, 0.002 mins elapsed.
Filtering 1 dims correlated > 0.75 to log10(depth + 1)
MPAL2_T1 (1 of 12) : UMAP Projection R^2 = 0.99534
************************************************************
2024-05-22 14:46:53 : ERROR Found in ggplot for MPAL2_T1 (1 of 12) :
LogFile = ArchRLogs/ArchR-addDoubletScores-116d676df5c60-Date-2024-05-22_Time-14-41-13.log
<simpleError in g$grobs[[legend]]: no such index at level 2>
> projHeme1_Healthy_MPAL2 <- ArchRProject(ArrowFiles = ArrowFiles,
+   outputDirectory = "MPAL-Hematopoiesis-MPAL2",
+   copyArrows = TRUE)
Using GeneAnnotation set by addArchRGenome(Hg19)!
Using GeneAnnotation set by addArchRGenome(Hg19)!
Validating Arrows...
Getting SampleNames...
Copying ArrowFiles to Ouptut Directory! If you want to save disk space set copyArrows = FALSE
1 2 3 4 5 6 7 8 9 10 11 12
Getting Cell Metadata...
Merging Cell Metadata...
Initializing ArchRProject...
> projHeme1_Healthy_MPAL2
           ___      .______        ______  __    __  .______
          /   \     |   _  \      /      ||  |  |  | |   _  \
         /  ^  \    |  |_)  |    |  ,----'|  |__|  | |  |_)  |
        /  /_\  \   |      /     |  |     |   __   | |      /
       /  _____  \  |  |\  \\___ |  `----.|  |  |  | |  |\  \\___.
      /__/     \__\ | _|  `._____| \______||__|  |__| | _|  `._____|

class: ArchRProject
outputDirectory: /home/u37189/Project/ATAC/greenleaf-healthy-ATAC/MPAL-Hematopoiesis-MPAL2
samples(12): MPAL2_T1 BMMC_D6T1 ... PBMC_D12T2 PBMC_D12T3
sampleColData names(1): ArrowFiles
cellColData names(15): Sample TSSEnrichment ... DoubletEnrichment BlacklistRatio
numberOfCells(1): 46280
medianTSS(1): 16.243
medianFrags(1): 11895
> # How much memory does the ArchRProject use in R?
> paste0("Memory Size = ", round(object.size(projHeme1_Healthy_MPAL2) / 10^6, 3), " MB")
[1] "Memory Size = 44.446 MB"
> # Filter doublets from the ArchRProject
> projHeme2_Healthy_MPAL2 <- filterDoublets(projHeme1_Healthy_MPAL2, filterRatio = 2.16)  # filterRatio = 1 is the default  # filterRatio = 1.5  # set the Matrix version to 1.6.1.1
Filtering 4424 cells from ArchRProject!
	MPAL2_T1 : 496 of 4794 (10.3%)
	BMMC_D6T1 : 1872 of 12741 (14.7%)
	MPAL2_T2 : 333 of 3930 (8.5%)
	BMMC_D5T1 : 402 of 4317 (9.3%)
	PBMC_D10T1 : 191 of 2978 (6.4%)
	CD34_D7T1 : 226 of 3239 (7%)
	CD34_D8T1 : 275 of 3570 (7.7%)
	CD34_D9T1 : 344 of 3996 (8.6%)
	PBMC_D11T1 : 161 of 2735 (5.9%)
	PBMC_D12T1 : 15 of 846 (1.8%)
	PBMC_D12T2 : 34 of 1266 (2.7%)
	PBMC_D12T3 : 75 of 1868 (4%)
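The per-sample counts in this output can be reproduced by hand. According to the ArchR documentation for `filterDoublets()`, `filterRatio` sets the maximum number of cells removed per sample to `filterRatio * nCells^2 / 100000`. The base-R sketch below recomputes that cap from the cell counts printed above. The variable names are illustrative, and rounding down with `floor()` is an assumption that happens to match the reported numbers.

```r
filter_ratio <- 2.16

# Cells per sample before filtering (copied from the output above).
n_cells <- c(
  MPAL2_T1 = 4794, BMMC_D6T1 = 12741, MPAL2_T2 = 3930, BMMC_D5T1 = 4317,
  PBMC_D10T1 = 2978, CD34_D7T1 = 3239, CD34_D8T1 = 3570, CD34_D9T1 = 3996,
  PBMC_D11T1 = 2735, PBMC_D12T1 = 846, PBMC_D12T2 = 1266, PBMC_D12T3 = 1868
)

# Cells reported as removed per sample (copied from the output above).
removed <- c(
  MPAL2_T1 = 496, BMMC_D6T1 = 1872, MPAL2_T2 = 333, BMMC_D5T1 = 402,
  PBMC_D10T1 = 191, CD34_D7T1 = 226, CD34_D8T1 = 275, CD34_D9T1 = 344,
  PBMC_D11T1 = 161, PBMC_D12T1 = 15, PBMC_D12T2 = 34, PBMC_D12T3 = 75
)

# Upper bound on removed cells per sample; floor() is an assumption.
max_removed <- floor(filter_ratio * n_cells^2 / 100000)

# Side-by-side comparison of the reported counts and the computed cap.
comparison <- data.frame(n_cells, removed, max_removed,
                         at_cap = removed == max_removed)
print(comparison)
```

Eleven of the twelve samples sit exactly at the cap. BMMC_D6T1 is the exception (1872 removed against a cap of 3506), presumably because the other cutoffs of `filterDoublets()`, such as `cutEnrich`, left fewer candidate cells than the cap allows.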
> # Which matrices are stored in the ArchRProject?
> getAvailableMatrices(projHeme2_Healthy_MPAL2)
[1] "GeneScoreMatrix" "TileMatrix"
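> ## 4. Dimensionality reduction in ArchR: addIterativeLSI() performs iterative LSI; the reduction is repeated over several iterations ###
> ## 4.1 Define cell types: unconstrained integration + constrained integration ## i.e. use scRNA-seq to assign cluster identities
> ## Dimensionality reduction and clustering
> projHeme2_Healthy_MPAL2 <- addIterativ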