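I. Short-Answer Questions
Use software to complete textbook Exercises 5.6 and 5.8. Paste the program code and results into a Word file and interpret the results.
1. [Exercise 5.6]
[Part 1]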
> library(MASS)
> d5.6=read.csv('D:/个人成长/学业/课程/应用多元统计分析/上机/第五章作业/exec5.6.csv',header=1)
> mod1=lda(d5.6[,'g']~x1+x2+x3+x4+x5+x6,prior=c(0.5,0.5),d5.6)
> mod2=qda(d5.6[,'g']~x1+x2+x3+x4+x5+x6,prior=c(0.5,0.5),d5.6)
> # xnew holds the 14 athletes to be classified; its definition is not shown in this transcript
> pre=predict(mod1,xnew)
> pre2=predict(mod2,xnew)
> cbind(pre$class,pre2$class)
[,1] [,2]
[1,] 1 1
[2,] 1 1
[3,] 1 1
[4,] 1 1
[5,] 1 1
[6,] 1 1
[7,] 1 1
[8,] 2 2
[9,] 2 2
[10,] 1 1
[11,] 2 2
[12,] 2 2
[13,] 1 2
[14,] 2 2
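The results show that under the equal-covariance assumption (lda, first column), athletes 1 to 7, 10, and 13 are classified as first-class and athletes 8, 9, 11, 12, and 14 as master-class. Under the unequal-covariance assumption (qda, second column), athletes 1 to 7 and 10 are classified as first-class and athletes 8, 9, and 11 to 14 as master-class. The two rules disagree only on athlete 13.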
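The comparison above can be reproduced in outline on a built-in dataset. This is only a sketch: exec5.6.csv and the original xnew are not available here, so two iris species stand in for the two athlete grades, and the rows used as xnew are hypothetical.

```r
library(MASS)

# Stand-in data (assumption): two iris species play the role of the two grades.
train <- droplevels(subset(iris, Species != "setosa"))

# Equal priors, as in the exercise. lda() pools the covariance matrices,
# qda() estimates one covariance matrix per group.
fit_lda <- lda(Species ~ ., data = train, prior = c(0.5, 0.5))
fit_qda <- qda(Species ~ ., data = train, prior = c(0.5, 0.5))

# Hypothetical new observations: a few rows without the group column.
xnew <- train[c(1, 25, 60, 90), 1:4]

# data.frame() keeps the factor labels; cbind() on factors would show
# their integer codes instead.
result <- data.frame(lda = predict(fit_lda, xnew)$class,
                     qda = predict(fit_qda, xnew)$class)
print(result)
```

Rows where the two columns differ are the cases on which the pooled-covariance and separate-covariance rules disagree, as with athlete 13 in the exercise.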
[Part 2]
> mod11=lda(d5.6[,'g']~x1+x2+x3+x4+x5+x6,prior=c(0.5,0.5),d5.6,CV=1)
> mod22=qda(d5.6[,'g']~x1+x2+x3+x4+x5+x6,prior=c(0.5,0.5),d5.6,CV=1)
> Z1=predict(mod1)
> Z2=predict(mod2)
> table(d5.6[,'g'],Z1$class)
1 2
1 28 0
2 0 25
> table(d5.6[,'g'],mod11$class)
1 2
1 28 0
2 2 23
> table(d5.6[,'g'],Z2$class)
1 2
1 28 0
2 0 25
> table(d5.6[,'g'],mod22$class)
1 2
1 28 0
2 0 25
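The tables above give the following misclassification rates, where $\hat{P}(2|1)$ is the estimated probability of assigning a group-1 athlete to group 2. By resubstitution: with $\Sigma_1=\Sigma_2$, $\hat{P}(2|1)=0/28=0$ and $\hat{P}(1|2)=0/25=0$; with $\Sigma_1\neq\Sigma_2$, $\hat{P}(2|1)=0$ and $\hat{P}(1|2)=0$. By cross-validation: with $\Sigma_1=\Sigma_2$, $\hat{P}(2|1)=0/28=0$ and $\hat{P}(1|2)=2/25=0.08$; with $\Sigma_1\neq\Sigma_2$, $\hat{P}(2|1)=0$ and $\hat{P}(1|2)=0$. The resubstitution estimate is therefore optimistic: under the equal-covariance assumption it reports no errors, whereas cross-validation misclassifies 2 of the 25 group-2 athletes.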
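The rates can be computed directly from the confusion tables. The sketch below re-enters the two lda tables printed above by hand (rows are the true group, columns the predicted group); the helper name error_rates is introduced here for illustration only.

```r
# Confusion tables copied from the lda output above.
labs <- list(true = c("1", "2"), predicted = c("1", "2"))
resub_lda <- matrix(c(28,  0,
                       0, 25), nrow = 2, byrow = TRUE, dimnames = labs)
cv_lda    <- matrix(c(28,  0,
                       2, 23), nrow = 2, byrow = TRUE, dimnames = labs)

# Misclassification rate per true group: off-diagonal count / row total.
# p21 = P(2|1): true group 1 assigned to group 2; p12 = P(1|2).
error_rates <- function(tab) {
  c(p21 = tab["1", "2"] / sum(tab["1", ]),
    p12 = tab["2", "1"] / sum(tab["2", ]))
}

print(error_rates(resub_lda))  # resubstitution
print(error_rates(cv_lda))     # leave-one-out cross-validation
```

With the real objects, the same function can be applied to table(d5.6[,'g'], Z1$class) and table(d5.6[,'g'], mod11$class), provided the group labels are "1" and "2".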
[Part 3]
> mod111=lda(d5.6[,'g']~x1+x2+x3+x4+x5+x6,prior=c(0.8,0.2),d5.6)
> pre3=predict(mod111,xnew)
> pre3$class
[1] 1 1 1 1 1 1 1 1 2 1 2 2 1 2
Levels: 1 2
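The result shows that athletes 1 to 8, 10, and 13 are classified as first-class, and athletes 9, 11, 12, and 14 as master-class. Part 3 assigns 10 athletes to the first-class group, slightly more than the 9 (equal covariances) and 8 (unequal covariances) in Part 1, which illustrates how the prior probabilities affect the classification.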
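The direction of this shift follows from the standard two-group linear rule with equal misclassification costs. The rule below is a textbook result, not something printed in the R output, and the symbols $W(x)$, $\bar{x}_1$, $\bar{x}_2$, $S_p$ (pooled covariance matrix), $p_1$ and $p_2$ are notation introduced here for illustration:

```latex
W(x) = (\bar{x}_1 - \bar{x}_2)^{\top} S_p^{-1}
       \left( x - \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2) \right),
\qquad
\text{assign } x \text{ to group 1 if } W(x) \ge \ln\frac{p_2}{p_1}.
```

With $p_1 = p_2 = 0.5$ the threshold is $\ln 1 = 0$. With $p_1 = 0.8$ and $p_2 = 0.2$ it drops to $\ln 0.25 \approx -1.386$, so the region assigned to group 1 (first-class) grows, which is consistent with athlete 8 moving from group 2 to group 1.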
2. [Exercise 5.8]
[Code]
> d5.8=read.csv('D:/个人成长/学业/课程/应用多元统计分析/上机/第五章作业/exec5.8.csv',header=1)
> ldf=lda(d5.8[,'g']~x1+x2+x3+x4+x5+x6+x7+x8,d5.8,prior=c(1,1,1)/3)
> ldf
Call:
lda(d5.8[, "g"] ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8, data = d5.8,
prior = c(1, 1, 1)/3)
Prior probabilities of groups:
1 2 3
0.3333333 0.3333333 0.3333333
Group means:
x1 x2 x3
1 110.5882 2.352941 1.235294
2 111.0000 2.600000 0.650000
3 90.0000 2.333333 1.333333
x4 x5 x6
1 203.52941 1.294118 14.58824
2 185.50000 2.250000 15.25000
3 98.33333 1.116667 10.00000
x7 x8
1 8.117647 85.00000
2 7.950000 91.75000
3 5.000000 58.33333
Coefficients of linear discriminants:
LD1 LD2
x1 -0.008046317 -0.049972739
x2 -0.450346119 0.209579174
x3 0.687547192 0.615110916
x4 -0.001034259 0.005976727
x5 -1.052964843 -1.410665769
x6 -0.253084783 0.135622604
x7 -0.255654826 0.167722187
x8 0.021432623 0.034778776
Proportion of trace:
LD1 LD2
0.8106 0.1894
> Z=predict(ldf)
> newg=Z$class
> table(d5.8[,'g'],newg)
newg
1 2 3
1 15 2 0
2 3 16 1
3 3 0 3
> plot(Z$x)
> text(Z$x[,1],Z$x[,2],d5.8[,'g'],adj=-0.8,cex=0.8)
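[Figure: scatter plot of the discriminant scores Z$x (LD1 against LD2), with each point labeled by its group g]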
> install.packages('MorphoTools2')
> library('MorphoTools2')
> dd5.8=read.morphodata('D:/个人成长/学业/课程/应用多元统计分析/上机/第五章作业/exec5.8.txt')
> dd5.8
ID Population Taxon x1 x2 x3 x4 x5 x6 x7 x8
1 1 1 1 110 2 2 180 1.5 10.5 10 70
2 2 2 1 110 6 2 290 2.0 17.0 1 105
3 3 3 1 110 1 1 180 0.0 12.0 13 55
4 4 4 1 110 1 1 180 0.0 12.0 13 65
5 5 5 1 110 1 1 280 0.0 15.0 9 45
6 6 6 1 110 3 1 250 1.5 11.5 10 90
7 7 7 1 110 2 1 260 0.0 21.0 3 40
8 8 8 1 110 2 1 180 0.0 12.0 12 55
9 9 9 1 100 2 1 220 2.0 15.0 6 90
10 10 10 1 130 3 2 170 1.5 13.5 10 120
11 11 11 1 100 3 2 140 2.5 10.5 8 140
12 12 12 1 110 2 1 200 0.0 21.0 3 35
13 13 13 1 140 3 1 190 4.0 15.0 14 230
14 14 14 1 100 3 1 200 3.0 16.0 3 110
15 15 15 1 110 1 1 140 0.0 13.0 12 25
16 16 16 1 100 3 1 200 3.0 17.0 3 110
17 17 17 1 110 2 1 200 1.0 16.0 8 60
18 18 18 2 70 4 1 260 9.0 7.0 5 320
19 19 19 2 110 2 0 125 1.0 11.0 14 30
20 20 20 2 100 2 0 290 1.0 21.0 2 35
21 21 21 2 110 1 0 90 1.0 13.0 12 20
22 22 22 2 110 3 3 140 4.0 10.0 7 160
23 23 23 2 110 2 0 220 1.0 21.0 3 30
24 24 24 2 110 2 1 125 1.0 11.0 13 30
25 25 25 2 110 1 0 200 1.0 14.0 11 25
26 26 26 2 100 3 0 0 3.0 14.0 7 100
27 27 27 2 120 3 0 240 5.0 14.0 12 190
28 28 28 2 110 2 1 170 1.0 17.0 6 60
29 29 29 2 160 3 2 150 3.0 17.0 13 160
30 30 30 2 120 2 1 190 0.0 15.0 9 40
31 31 31 2 140 3 2 220 3.0 21.0 7 130
32 32 32 2 90 3 0 170 3.0 18.0 2 90
33 33 33 2 100 3 0 320 1.0 20.0 3 45
34 34 34 2 120 3 1 210 5.0 14.0 12 240
35 35 35 2 110 2 0 290 0.0 22.0 3 35
36 36 36 2 110 2 1 70 1.0 9.0 15 40
37 37 37 2 110 6 0 230 1.0 16.0 3 55
38 38 38 3 120 1 2 220 0.0 12.0 12 35
39 39 39 3 120 1 2 220 1.0 12.0 11 45
40 40 40 3 100 4 2 150 2.0 12.0 6 95
41 41 41 3 50 1 0 0 0.0 13.0 0 15
42 42 42 3 50 2 0 0 1.0 10.0 0 50
43 43 43 3 100 5 2 0 2.7 1.0 1 110
> can5.8=cda.calc(dd5.8)
> round(can5.8$coeffs.raw,3)
Can1 Can2
x1 0.022 0.045
x2 0.369 -0.332
x3 -0.838 -0.386
x4 -0.001 -0.006
x5 1.420 1.040
x6 0.202 -0.204
x7 0.195 -0.235
x8 -0.031 -0.027
> can5.8
$objects
$objects$ID
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
[29] 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
43 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 ... 43
$objects$Population
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
[29] 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
43 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 ... 43
$objects$Taxon
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3
[43] 3
Levels: 1 2 3
$objects$scores
Can1 Can2
1 -1.134861088 0.22533079
2 -0.549240640 -1.39740521
3 -1.447367516 -1.22291692
4 -1.754242201 -1.49258336
5 -1.391225868 -1.22534778
6 -0.393069476 -0.88498333
7 -0.811684454 -1.11391857
8 -1.273503886 -1.32001555
9 -0.325858125 -0.07855418
10 -1.239007535 -1.09847314
11 -2.076985740 -0.69765224
12 -0.612437530 -0.61804666
13 1.066206669 -1.99160156
14 0.481515914 0.71206730
15 -0.489249647 -0.14178266
16 0.683716023 0.50820335
17 0.005887126 -0.40946725
18 0.782506571 0.59701008
19 1.981924440 0.84481125
20 1.158118623 0.04798314
21 1.960319922 1.72037360
22 -1.470893125 0.72786655
23 1.783687528 0.82289780
24 0.949002686 0.69361808
25 1.729852441 0.95507859
26 2.155348784 2.03819784
27 3.473914121 -0.02121919
28 -0.159500004 0.03780099
29 1.642947880 -0.55377085
30 -0.576523471 -0.42718554
31 1.700569697 -0.47795269
32 1.941558980 1.19182159
33 1.190494699 -0.76605018
34 1.124769751 -1.57553166
35 0.358723945 -0.97696906
36 0.670211930 0.69201883
37 1.474303923 -0.22174188
38 -1.673638586 -0.62129360
39 -0.755477448 0.38430425
40 -1.131907730 -0.23189344
41 -2.921408598 1.45542780
42 -2.812678838 1.83073993
43 -3.314820150 4.08080476
$eigenvalues
Can1 Can2
1.5613432 0.4546081
$eigenvaluesAsPercentages
Can1 Can2
0.7744945 0.2255055
$cumulativePercentageOfEigenvalues
[1] 0.7744945 1.0000000
$groupMeans
Taxon Can1 Can2
1 1 -0.6624358 -0.7204204
2 2 1.1935670 0.2674529
3 3 -2.1016552 1.1496816
$rank
[1] 2
$coeffs.std
Can1 Can2
x1 0.40085903 0.8148037
x2 0.45959593 -0.4138933
x3 -0.63439984 -0.2927091
x4 -0.05578159 -0.4396310
x5 2.51160836 1.8390483
x6 0.80300920 -0.8096169
x7 0.88224127 -1.0632588
x8 -2.04939155 -1.8009049
$coeffs.raw
Can1 Can2
x1 0.022344104 0.045417606
x2 0.369109646 -0.332405063
x3 -0.837675738 -0.386499597
x4 -0.000763493 -0.006017311
x5 1.420281838 1.039957871
x6 0.202200109 -0.203863959
x7 0.195246015 -0.235306430
x8 -0.030687468 -0.026966644
$totalCanonicalStructure
Can1 Can2
x1 0.3552393 -0.47710068
x2 0.1257746 0.06284426
x3 -0.4745516 -0.19823124
x4 0.2829126 -0.67174219
x5 0.3482493 0.13990051
x6 0.4326728 -0.42739067
x7 0.1946673 -0.31965137
x8 0.1881562 -0.14474942
$canrsq
[1] 0.6095799 0.3125296
attr(,"class")
[1] "cdadata"
> data5.8=read.csv('D:/个人成长/学业/课程/应用多元统计分析/上机/第五章作业/exec5.8.csv',header=1)
> colMeans(data5.8)
ID Population Taxon x1 x2 x3 x4
22.0000000 22.0000000 1.7441860 107.9069767 2.4651163 0.9767442 180.4651163
x5 x6 x7 x8
1.7139535 14.2558140 7.6046512 84.4186047
From the results above, the two Fisher discriminant functions are:
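Written out from the rounded raw coefficients (round(can5.8$coeffs.raw,3)) and the overall means from colMeans(data5.8), the functions take the form below. This is a reconstruction under the assumption that the canonical scores are centered at the overall means. That assumption is consistent with the printed scores: plugging observation 1 into the unrounded coefficients gives Can1 of about -1.135 and Can2 of about 0.225. The names $y_1$ and $y_2$ are introduced here.

```latex
\begin{aligned}
y_1 ={}& 0.022(x_1 - 107.907) + 0.369(x_2 - 2.465) - 0.838(x_3 - 0.977) - 0.001(x_4 - 180.465) \\
       & + 1.420(x_5 - 1.714) + 0.202(x_6 - 14.256) + 0.195(x_7 - 7.605) - 0.031(x_8 - 84.419), \\
y_2 ={}& 0.045(x_1 - 107.907) - 0.332(x_2 - 2.465) - 0.386(x_3 - 0.977) - 0.006(x_4 - 180.465) \\
       & + 1.040(x_5 - 1.714) - 0.204(x_6 - 14.256) - 0.235(x_7 - 7.605) - 0.027(x_8 - 84.419).
\end{aligned}
```

According to $eigenvaluesAsPercentages, the first function accounts for about 77.4% of the discriminating power and the second for about 22.6%.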