7. ArchR "onto" integration (3)

May little Wu Yiqing be happy every single day.

Project the scATAC_MPAL2 cells onto the Healthy-scATAC reference.

> ############ 1. Read the fragment files ###############
> # Input file paths: only each sample's fragments.tsv.gz file is needed
> input.file.list <-c("./Fragments/GSM4138900_scATAC_MPAL2_T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138901_scATAC_MPAL2_T2.fragments.tsv.gz",
+                     "./Fragments/GSM4138888_scATAC_BMMC_D5T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138889_scATAC_BMMC_D6T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138890_scATAC_CD34_D7T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138891_scATAC_CD34_D8T1.fragments.tsv.gz", 
+                     "./Fragments/GSM4138892_scATAC_CD34_D9T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138893_scATAC_PBMC_D10T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138894_scATAC_PBMC_D11T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138895_scATAC_PBMC_D12T1.fragments.tsv.gz",
+                     "./Fragments/GSM4138896_scATAC_PBMC_D12T2.fragments.tsv.gz",
+                     "./Fragments/GSM4138897_scATAC_PBMC_D12T3.fragments.tsv.gz")

> # Set the sample names (in the same order as input.file.list)
> sampleNames=c("MPAL2_T1","MPAL2_T2",
+               "BMMC_D5T1","BMMC_D6T1","CD34_D7T1","CD34_D8T1","CD34_D9T1",
+               "PBMC_D10T1","PBMC_D11T1","PBMC_D12T1","PBMC_D12T2","PBMC_D12T3")
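Because sampleNames must match input.file.list one-to-one and in the same order, it can be less error-prone to derive the names from the file paths. The base-R sketch below assumes every file follows the `GSM<accession>_scATAC_<sample>.fragments.tsv.gz` pattern used above; the regular expression is an illustration, not part of ArchR.

```r
# Fragment file paths as listed above
input.file.list <- c("./Fragments/GSM4138900_scATAC_MPAL2_T1.fragments.tsv.gz",
                     "./Fragments/GSM4138901_scATAC_MPAL2_T2.fragments.tsv.gz",
                     "./Fragments/GSM4138888_scATAC_BMMC_D5T1.fragments.tsv.gz",
                     "./Fragments/GSM4138889_scATAC_BMMC_D6T1.fragments.tsv.gz",
                     "./Fragments/GSM4138890_scATAC_CD34_D7T1.fragments.tsv.gz",
                     "./Fragments/GSM4138891_scATAC_CD34_D8T1.fragments.tsv.gz",
                     "./Fragments/GSM4138892_scATAC_CD34_D9T1.fragments.tsv.gz",
                     "./Fragments/GSM4138893_scATAC_PBMC_D10T1.fragments.tsv.gz",
                     "./Fragments/GSM4138894_scATAC_PBMC_D11T1.fragments.tsv.gz",
                     "./Fragments/GSM4138895_scATAC_PBMC_D12T1.fragments.tsv.gz",
                     "./Fragments/GSM4138896_scATAC_PBMC_D12T2.fragments.tsv.gz",
                     "./Fragments/GSM4138897_scATAC_PBMC_D12T3.fragments.tsv.gz")

# Assumed naming pattern: drop the "GSM<digits>_scATAC_" prefix and the
# ".fragments.tsv.gz" suffix, keeping the sample name in between
sampleNames <- sub("^GSM[0-9]+_scATAC_(.*)\\.fragments\\.tsv\\.gz$", "\\1",
                   basename(input.file.list))

# Names must be unique, since each sample becomes "<sampleName>.arrow"
stopifnot(!anyDuplicated(sampleNames))
print(sampleNames)
```

The names come out in exactly the order of the paths, so the two vectors can be passed to createArrowFiles() without any manual bookkeeping.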

> ############ 2. Create the Arrow files ################
> ArrowFiles <- createArrowFiles(
+   inputFiles = input.file.list,
+   sampleNames = sampleNames,
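+   minTSS = 9.3,   # no need to set this too high; cells can be filtered further later
+   minFrags = 1100,  #flank=2000,norm=100
+   minFragSize = 10, # default; genomeAnnotation is left at its default (set by addArchRGenome)
+   maxFragSize = 2000, # default
+   excludeChr = c("chrM","chrY"), # exclude mitochondrial DNA and the Y chromosome
+   addTileMat = TRUE, force = TRUE,   # force = TRUE overwrites any existing Arrow files
+   addGeneScoreMat = TRUE) # also creates a "QualityControl" directory in the current directory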
Using GeneAnnotation set by addArchRGenome(Hg19)!
Using GeneAnnotation set by addArchRGenome(Hg19)!
ArchR logging to : ArchRLogs/ArchR-createArrows-116d63d50ef56-Date-2024-05-22_Time-13-39-06.log
If there is an issue, please report to github with logFile!
2024-05-22 13:39:10 : Batch Execution w/ safelapply!, 0 mins elapsed.
ArchR logging successful to : ArchRLogs/ArchR-createArrows-116d63d50ef56-Date-2024-05-22_Time-13-39-06.log
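minTSS and minFrags act as per-cell hard cutoffs: a cell is written to the Arrow file only if both its TSS enrichment and its number of fragments clear the thresholds. The toy table below is hypothetical and only illustrates that AND logic; ArchR's own filtering (for example its maxFrags upper bound) may apply further conditions.

```r
# Hypothetical per-cell QC values, for illustration only
qc <- data.frame(cell          = c("c1", "c2", "c3", "c4"),
                 TSSEnrichment = c(12.5, 8.0, 15.1, 20.3),
                 nFrags        = c(5000, 9000, 900, 25000),
                 stringsAsFactors = FALSE)

minTSS   <- 9.3
minFrags <- 1100

# A cell is kept only if it passes BOTH thresholds
qc$pass <- qc$TSSEnrichment >= minTSS & qc$nFrags >= minFrags
print(qc)
print(qc$cell[qc$pass])
```

Here c2 fails on TSS enrichment and c3 on fragment count, so only c1 and c4 would survive.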

> ArrowFiles # check that this is a vector of Arrow file paths
 [1] "MPAL2_T1.arrow"   "BMMC_D6T1.arrow"  "MPAL2_T2.arrow"   "BMMC_D5T1.arrow"  "PBMC_D10T1.arrow"
 [6] "CD34_D7T1.arrow"  "CD34_D8T1.arrow"  "CD34_D9T1.arrow"  "PBMC_D11T1.arrow" "PBMC_D12T1.arrow"
[11] "PBMC_D12T2.arrow" "PBMC_D12T3.arrow"
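Note that the returned ArrowFiles vector is not in the order of input.file.list. A small base-R check, using the values printed above, confirms that every sample produced an Arrow file and restores the original sample order by matching names rather than positions:

```r
sampleNames <- c("MPAL2_T1", "MPAL2_T2",
                 "BMMC_D5T1", "BMMC_D6T1", "CD34_D7T1", "CD34_D8T1", "CD34_D9T1",
                 "PBMC_D10T1", "PBMC_D11T1", "PBMC_D12T1", "PBMC_D12T2", "PBMC_D12T3")

# ArrowFiles as returned by createArrowFiles() above
ArrowFiles <- c("MPAL2_T1.arrow", "BMMC_D6T1.arrow", "MPAL2_T2.arrow", "BMMC_D5T1.arrow",
                "PBMC_D10T1.arrow", "CD34_D7T1.arrow", "CD34_D8T1.arrow", "CD34_D9T1.arrow",
                "PBMC_D11T1.arrow", "PBMC_D12T1.arrow", "PBMC_D12T2.arrow", "PBMC_D12T3.arrow")

# Recover the sample name from each Arrow file name
arrowSamples <- sub("\\.arrow$", "", basename(ArrowFiles))

# Every input sample should have exactly one Arrow file
stopifnot(setequal(arrowSamples, sampleNames), !anyDuplicated(arrowSamples))

# Reorder the Arrow paths so they follow sampleNames
ArrowFilesOrdered <- ArrowFiles[match(sampleNames, arrowSamples)]
print(ArrowFilesOrdered)
```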

> # Compute a doublet score for every single cell (each sample/Arrow file is processed separately)
> doubScores <- addDoubletScores(input = ArrowFiles,
+            k = 10, # ArchR's default is also 10; refers to how many cells near a "pseudo-doublet" to count.
+            knnMethod = "UMAP", #Refers to the embedding to use for nearest neighbor search with doublet projection.
+            LSIMethod = 1)
ArchR logging to : ArchRLogs/ArchR-addDoubletScores-116d676df5c60-Date-2024-05-22_Time-14-41-13.log
If there is an issue, please report to github with logFile!
2024-05-22 14:41:23 : Batch Execution w/ safelapply!, 0 mins elapsed.
2024-05-22 14:41:23 : MPAL2_T1 (1 of 12) :  Computing Doublet Statistics, 0.002 mins elapsed.
Filtering 1 dims correlated > 0.75 to log10(depth + 1)
MPAL2_T1 (1 of 12) : UMAP Projection R^2 = 0.99534

************************************************************
2024-05-22 14:46:53 : ERROR Found in ggplot for MPAL2_T1 (1 of 12) :  
LogFile = ArchRLogs/ArchR-addDoubletScores-116d676df5c60-Date-2024-05-22_Time-14-41-13.log

<simpleError in g$grobs[[legend]]: no such index at level 2>
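The k and knnMethod comments above describe the idea behind the doublet score: synthetic "pseudo-doublets" are simulated, and each real cell is scored by how often it falls among the k nearest neighbours of a pseudo-doublet. The base-R sketch below is a simplified illustration on hypothetical 2-D data, not ArchR's implementation: here pseudo-doublets are averaged directly in the embedding, whereas ArchR builds them by combining the data of two cells and projects them into the embedding (hence the "UMAP Projection R^2" line in the log).

```r
set.seed(1)
# Hypothetical 2-D embedding: two well-separated "cell types", 50 cells each
real <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
              matrix(rnorm(100, mean = 5), ncol = 2))

nDoublets <- 30
k <- 10

# Simplified pseudo-doublets: the average of two distinct, randomly chosen cells
pairs <- matrix(sample(nrow(real), 2 * nDoublets), ncol = 2)
doublets <- (real[pairs[, 1], ] + real[pairs[, 2], ]) / 2

# For each pseudo-doublet, add one count to each of its k nearest real cells
counts <- integer(nrow(real))
for (i in seq_len(nDoublets)) {
  d <- sqrt(colSums((t(real) - doublets[i, ])^2))  # Euclidean distance to every real cell
  nn <- order(d)[seq_len(k)]
  counts[nn] <- counts[nn] + 1L
}

# Enrichment: observed counts relative to a uniform spread of all nDoublets * k hits
enrichment <- counts / (nDoublets * k / nrow(real))
print(summary(enrichment))
```

Cells sitting between the two groups collect the most hits, which is the intuition behind ranking cells by DoubletEnrichment before filtering.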

> projHeme1_Healthy_MPAL2 <- ArchRProject(ArrowFiles = ArrowFiles,
+                                         outputDirectory="MPAL-Hematopoiesis-MPAL2",
+                                         copyArrows=TRUE)

Using GeneAnnotation set by addArchRGenome(Hg19)!
Using GeneAnnotation set by addArchRGenome(Hg19)!
Validating Arrows...
Getting SampleNames...

Copying ArrowFiles to Ouptut Directory! If you want to save disk space set copyArrows = FALSE
1 2 3 4 5 6 7 8 9 10 11 12 
Getting Cell Metadata...

Merging Cell Metadata...
Initializing ArchRProject...

> projHeme1_Healthy_MPAL2

           ___      .______        ______  __    __  .______      
          /   \     |   _  \      /      ||  |  |  | |   _  \     
         /  ^  \    |  |_)  |    |  ,----'|  |__|  | |  |_)  |    
        /  /_\  \   |      /     |  |     |   __   | |      /     
       /  _____  \  |  |\  \\___ |  `----.|  |  |  | |  |\  \\___.
      /__/     \__\ | _| `._____| \______||__|  |__| | _| `._____|
    
class: ArchRProject 
outputDirectory: /home/u37189/Project/ATAC/greenleaf-healthy-ATAC/MPAL-Hematopoiesis-MPAL2 
samples(12): MPAL2_T1 BMMC_D6T1 ... PBMC_D12T2 PBMC_D12T3
sampleColData names(1): ArrowFiles
cellColData names(15): Sample TSSEnrichment ... DoubletEnrichment BlacklistRatio
numberOfCells(1): 46280
medianTSS(1): 16.243
medianFrags(1): 11895

> # How much memory does the ArchRProject use in R?
> paste0("Memory Size = ",round(object.size(projHeme1_Healthy_MPAL2) / 10^6, 3)," MB")
[1] "Memory Size = 44.446 MB"
> # Filter doublets from the ArchRProject
> projHeme2_Healthy_MPAL2 <- filterDoublets(projHeme1_Healthy_MPAL2,filterRatio = 2.16)  # default filterRatio = 1; alternative: filterRatio = 1.5; Matrix package set to version 1.6.1.1
Filtering 4424 cells from ArchRProject!
	MPAL2_T1 : 496 of 4794 (10.3%)
	BMMC_D6T1 : 1872 of 12741 (14.7%)
	MPAL2_T2 : 333 of 3930 (8.5%)
	BMMC_D5T1 : 402 of 4317 (9.3%)
	PBMC_D10T1 : 191 of 2978 (6.4%)
	CD34_D7T1 : 226 of 3239 (7%)
	CD34_D8T1 : 275 of 3570 (7.7%)
	CD34_D9T1 : 344 of 3996 (8.6%)
	PBMC_D11T1 : 161 of 2735 (5.9%)
	PBMC_D12T1 : 15 of 846 (1.8%)
	PBMC_D12T2 : 34 of 1266 (2.7%)
	PBMC_D12T3 : 75 of 1868 (4%)
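The per-sample numbers in this log follow ArchR's documented cap for filterDoublets(): at most filterRatio * n^2 / 100000 cells are removed from a sample with n cells, and only among cells whose DoubletEnrichment passes cutEnrich (default 1). The sketch below recomputes that cap from the counts printed above; it is plain arithmetic, not a call into ArchR.

```r
filterRatio <- 2.16

# Cells per sample before filtering, and cells removed, copied from the log above
nCells  <- c(MPAL2_T1 = 4794, BMMC_D6T1 = 12741, MPAL2_T2 = 3930, BMMC_D5T1 = 4317,
             PBMC_D10T1 = 2978, CD34_D7T1 = 3239, CD34_D8T1 = 3570, CD34_D9T1 = 3996,
             PBMC_D11T1 = 2735, PBMC_D12T1 = 846, PBMC_D12T2 = 1266, PBMC_D12T3 = 1868)
removed <- c(496, 1872, 333, 402, 191, 226, 275, 344, 161, 15, 34, 75)

# Maximum number of cells that may be removed per sample (rounded down)
maxRemoved <- floor(filterRatio * nCells^2 / 1e5)
print(maxRemoved)

# Samples where fewer cells than the cap were removed
print(names(nCells)[removed < maxRemoved])
```

Every sample hits the cap exactly except BMMC_D6T1, where the cap is 3506 but only 1872 cells were removed, presumably because fewer cells passed the enrichment cutoff. Because the cap grows with n^2, larger samples lose a larger fraction of cells, which matches the percentages in the log.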

> # Which matrices does the ArchRProject store?
> getAvailableMatrices(projHeme2_Healthy_MPAL2)
[1] "GeneScoreMatrix" "TileMatrix"

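> ## 4. Dimensionality reduction in ArchR: addIterativeLSI() runs iterative LSI, repeating the LSI over several iterations ###
> ## 4.1 Defining cell types: unconstrained + constrained integration, i.e. using scRNA-seq to assign cluster identities ##
> ## Dimensionality reduction and clustering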
> projHeme2_Healthy_MPAL2 <- addIterativ
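For orientation, the core of each LSI round is a TF-IDF normalisation of the cell-by-feature matrix followed by a truncated SVD; addIterativeLSI() repeats this over several iterations, re-selecting features based on the clusters from the previous round. The base-R sketch below shows a single round on a hypothetical binary matrix, using one common TF-(log)IDF variant (ArchR offers several through its LSIMethod argument), and then drops components whose correlation with log10(depth + 1) exceeds 0.75, mirroring the "Filtering 1 dims correlated > 0.75 to log10(depth + 1)" message printed by addDoubletScores above.

```r
set.seed(42)
# Hypothetical binary accessibility matrix: 200 features (rows) x 50 cells (columns)
mat <- matrix(rbinom(200 * 50, size = 1, prob = 0.1), nrow = 200, ncol = 50)
mat <- mat[rowSums(mat) > 0, ]            # drop features open in no cell
stopifnot(all(colSums(mat) > 0))          # every cell needs at least one count

depth <- colSums(mat)
# Term frequency: scale each cell (column) by its total counts
tf <- sweep(mat, 2, depth, "/")
# Inverse document frequency: down-weight features open in many cells
idf <- log(1 + ncol(mat) / rowSums(mat))
tfidf <- tf * idf                         # idf recycles down each column, one value per feature

# Truncated SVD; cell embedding = right singular vectors scaled by singular values
nDims <- 10
s <- svd(tfidf, nu = nDims, nv = nDims)
lsi <- s$v %*% diag(s$d[seq_len(nDims)])

# Drop LSI dimensions strongly correlated with sequencing depth
corDepth <- apply(lsi, 2, function(x) cor(x, log10(depth + 1)))
lsiFiltered <- lsi[, abs(corDepth) <= 0.75, drop = FALSE]
cat("kept", ncol(lsiFiltered), "of", nDims, "dims\n")
```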
