
【Applied Multivariate Statistical Analysis】CH5 Discriminant Analysis Homework

Contents

I. Short-Answer Questions

1.【Exercise 5.6】

【Part 1】

【Part 2】

【Part 3】

2.【Exercise 5.8】

【Code】

【Figure】

II. Calculation Problems

1.

2.

3.


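I. Short-Answer Questions

Use software to complete textbook Exercises 5.6 and 5.8. Paste the program code and its output into a Word file, and interpret the results.

1.【Exercise 5.6】

【Part 1】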

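> library(MASS)
> d5.6=read.csv('D:/个人成长/学业/课程/应用多元统计分析/上机/第五章作业/exec5.6.csv',header=TRUE)
> # Equal covariance matrices: linear discriminant analysis with equal priors
> mod1=lda(d5.6[,'g']~x1+x2+x3+x4+x5+x6,prior=c(0.5,0.5),d5.6)
> # Unequal covariance matrices: quadratic discriminant analysis with equal priors
> mod2=qda(d5.6[,'g']~x1+x2+x3+x4+x5+x6,prior=c(0.5,0.5),d5.6)
> # xnew is not defined in this transcript; it is assumed to hold the 14 athletes to be classified
> pre=predict(mod1,xnew)
> pre2=predict(mod2,xnew)
> # Column 1: lda classes; column 2: qda classes
> cbind(pre$class,pre2$class)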
      [,1] [,2]
 [1,]    1    1
 [2,]    1    1
 [3,]    1    1
 [4,]    1    1
 [5,]    1    1
 [6,]    1    1
 [7,]    1    1
 [8,]    2    2
 [9,]    2    2
[10,]    1    1
[11,]    2    2
[12,]    2    2
[13,]    1    2
[14,]    2    2

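        From the output above: when the covariance matrices are assumed equal (lda), athletes 1~7, 10 and 13 are classified as first-grade, and athletes 8, 9, 11, 12 and 14 as master-level. When the covariance matrices are assumed unequal (qda), athletes 1~7 and 10 are classified as first-grade, and athletes 8, 9 and 11~14 as master-level. The two rules disagree only on athlete 13.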

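【Part 2】

> # CV=TRUE returns leave-one-out cross-validated classifications
> mod11=lda(d5.6[,'g']~x1+x2+x3+x4+x5+x6,prior=c(0.5,0.5),d5.6,CV=TRUE)
> mod22=qda(d5.6[,'g']~x1+x2+x3+x4+x5+x6,prior=c(0.5,0.5),d5.6,CV=TRUE)
> # Resubstitution: classify the training data with the fitted rules
> Z1=predict(mod1)
> Z2=predict(mod2)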
> table(d5.6[,'g'],Z1$class)
   
     1  2
  1 28  0
  2  0 25
> table(d5.6[,'g'],mod11$class)
   
     1  2
  1 28  0
  2  2 23
> table(d5.6[,'g'],Z2$class)
   
     1  2
  1 28  0
  2  0 25
> table(d5.6[,'g'],mod22$class)
   
     1  2
  1 28  0
  2  0 25

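        From the tables above, the resubstitution estimates are: when $\Sigma_{1}=\Sigma_{2}$, $\hat{P}(2|1)=0$ and $\hat{P}(1|2)=0$; when $\Sigma_{1}\neq\Sigma_{2}$, $\hat{P}(2|1)=0$ and $\hat{P}(1|2)=0$. The cross-validation estimates are: when $\Sigma_{1}=\Sigma_{2}$, $\hat{P}(2|1)=0$ and $\hat{P}(1|2)=2/25=0.08$; when $\Sigma_{1}\neq\Sigma_{2}$, $\hat{P}(2|1)=0$ and $\hat{P}(1|2)=0$. The equal-covariance case shows that resubstitution is optimistic: it reports no misclassification, whereas cross-validation gives $\hat{P}(1|2)=0.08$.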
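The error rates quoted above are row-wise ratios: the off-diagonal count in a row of the confusion table divided by that group's size. The sketch below first applies this to the cross-validated lda table shown earlier, then repeats the resubstitution versus leave-one-out comparison on simulated data. The simulated data frame `sim`, its variables, and the helper `err_rates` are hypothetical stand-ins, since exec5.6.csv is not reproduced here.

```r
library(MASS)

# Row-wise misclassification rates: 1 - (correct count / group size)
err_rates <- function(tab) 1 - diag(tab) / rowSums(tab)

# Cross-validated lda table from the transcript (rows = actual, cols = predicted)
doc_tab <- matrix(c(28, 2, 0, 23), nrow = 2)
print(err_rates(doc_tab))

# Simulated stand-in data (an assumption, not the exercise data)
set.seed(1)
n <- 30
sim <- data.frame(
  g  = factor(rep(1:2, each = n)),
  x1 = c(rnorm(n, 0), rnorm(n, 1.5)),
  x2 = c(rnorm(n, 0), rnorm(n, 1.5))
)

# Resubstitution: classify the same observations that were used to fit the rule
fit <- lda(g ~ x1 + x2, data = sim, prior = c(0.5, 0.5))
resub <- table(actual = sim$g, predicted = predict(fit)$class)

# Leave-one-out cross-validation: each case is classified by a rule fitted without it
fit_cv <- lda(g ~ x1 + x2, data = sim, prior = c(0.5, 0.5), CV = TRUE)
loo <- table(actual = sim$g, predicted = fit_cv$class)

print(err_rates(resub))
print(err_rates(loo))
```

On any particular sample the two estimates may coincide, as they do for qda in this exercise; the cross-validated figure is generally the more honest of the two because no case is classified by a rule it helped to fit.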

【Part 3】

> mod111=lda(d5.6[,'g']~x1+x2+x3+x4+x5+x6,prior=c(0.8,0.2),d5.6)
> pre3=predict(mod111,xnew)
> pre3$class
 [1] 1 1 1 1 1 1 1 1 2 1 2 2 1 2
Levels: 1 2

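        With prior probabilities 0.8 (first-grade) and 0.2 (master-level), athletes 1~8, 10 and 13 are classified as first-grade, and athletes 9, 11, 12 and 14 as master-level. Ten athletes are now classified as first-grade, compared with nine (lda) and eight (qda) in Part 1, which shows how the prior probabilities influence the classification: a larger prior for a group enlarges the region assigned to it.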
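The effect of the prior can be reproduced on simulated data. The following is a minimal sketch, assuming two bivariate normal groups; `train` and `new_cases` are hypothetical and unrelated to exec5.6.csv. It fits lda twice, once with equal priors and once with priors 0.8 and 0.2, and counts how many new cases each rule assigns to group 1.

```r
library(MASS)

# Simulated training data (an assumption, not the exercise data)
set.seed(2)
n <- 40
train <- data.frame(
  g  = factor(rep(1:2, each = n)),
  x1 = c(rnorm(n, 0), rnorm(n, 2)),
  x2 = c(rnorm(n, 0), rnorm(n, 2))
)

# Hypothetical new cases spread along the line between the two group centres
new_cases <- data.frame(
  x1 = seq(-1, 3, length.out = 15),
  x2 = seq(-1, 3, length.out = 15)
)

# Same data, two different prior specifications
fit_equal  <- lda(g ~ x1 + x2, data = train, prior = c(0.5, 0.5))
fit_skewed <- lda(g ~ x1 + x2, data = train, prior = c(0.8, 0.2))

# Number of new cases assigned to group 1 under each prior
n1_equal  <- sum(predict(fit_equal,  new_cases)$class == "1")
n1_skewed <- sum(predict(fit_skewed, new_cases)$class == "1")
print(c(equal = n1_equal, skewed = n1_skewed))
```

Because the prior enters the linear rule only through a shift of the cut-off, raising the prior of group 1 can move borderline cases into group 1 but never out of it.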

2.【Exercise 5.8】

【Code】

> d5.8=read.csv('D:/个人成长/学业/课程/应用多元统计分析/上机/第五章作业/exec5.8.csv',header=TRUE)
> ldf=lda(d5.8[,'g']~x1+x2+x3+x4+x5+x6+x7+x8,d5.8,prior=c(1,1,1)/3)
> ldf
Call:
lda(d5.8[, "g"] ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8, data = d5.8, 
    prior = c(1, 1, 1)/3)

Prior probabilities of groups:
        1         2         3 
0.3333333 0.3333333 0.3333333 

Group means:
        x1       x2       x3
1 110.5882 2.352941 1.235294
2 111.0000 2.600000 0.650000
3  90.0000 2.333333 1.333333
         x4       x5       x6
1 203.52941 1.294118 14.58824
2 185.50000 2.250000 15.25000
3  98.33333 1.116667 10.00000
        x7       x8
1 8.117647 85.00000
2 7.950000 91.75000
3 5.000000 58.33333

Coefficients of linear discriminants:
            LD1          LD2
x1 -0.008046317 -0.049972739
x2 -0.450346119  0.209579174
x3  0.687547192  0.615110916
x4 -0.001034259  0.005976727
x5 -1.052964843 -1.410665769
x6 -0.253084783  0.135622604
x7 -0.255654826  0.167722187
x8  0.021432623  0.034778776

Proportion of trace:
   LD1    LD2 
0.8106 0.1894 
> Z=predict(ldf)
> newg=Z$class
> table(d5.8[,'g'],newg)
   newg
     1  2  3
  1 15  2  0
  2  3 16  1
  3  3  0  3
> plot(Z$x)
> text(Z$x[,1],Z$x[,2],d5.8[,'g'],adj=-0.8,cex=0.8)

【Figure】

> install.packages('MorphoTools2')
> library('MorphoTools2')
> dd5.8=read.morphodata('D:/个人成长/学业/课程/应用多元统计分析/上机/第五章作业/exec5.8.txt')
> dd5.8
   ID Population Taxon  x1 x2 x3  x4  x5   x6 x7  x8
1   1          1     1 110  2  2 180 1.5 10.5 10  70
2   2          2     1 110  6  2 290 2.0 17.0  1 105
3   3          3     1 110  1  1 180 0.0 12.0 13  55
4   4          4     1 110  1  1 180 0.0 12.0 13  65
5   5          5     1 110  1  1 280 0.0 15.0  9  45
6   6          6     1 110  3  1 250 1.5 11.5 10  90
7   7          7     1 110  2  1 260 0.0 21.0  3  40
8   8          8     1 110  2  1 180 0.0 12.0 12  55
9   9          9     1 100  2  1 220 2.0 15.0  6  90
10 10         10     1 130  3  2 170 1.5 13.5 10 120
11 11         11     1 100  3  2 140 2.5 10.5  8 140
12 12         12     1 110  2  1 200 0.0 21.0  3  35
13 13         13     1 140  3  1 190 4.0 15.0 14 230
14 14         14     1 100  3  1 200 3.0 16.0  3 110
15 15         15     1 110  1  1 140 0.0 13.0 12  25
16 16         16     1 100  3  1 200 3.0 17.0  3 110
17 17         17     1 110  2  1 200 1.0 16.0  8  60
18 18         18     2  70  4  1 260 9.0  7.0  5 320
19 19         19     2 110  2  0 125 1.0 11.0 14  30
20 20         20     2 100  2  0 290 1.0 21.0  2  35
21 21         21     2 110  1  0  90 1.0 13.0 12  20
22 22         22     2 110  3  3 140 4.0 10.0  7 160
23 23         23     2 110  2  0 220 1.0 21.0  3  30
24 24         24     2 110  2  1 125 1.0 11.0 13  30
25 25         25     2 110  1  0 200 1.0 14.0 11  25
26 26         26     2 100  3  0   0 3.0 14.0  7 100
27 27         27     2 120  3  0 240 5.0 14.0 12 190
28 28         28     2 110  2  1 170 1.0 17.0  6  60
29 29         29     2 160  3  2 150 3.0 17.0 13 160
30 30         30     2 120  2  1 190 0.0 15.0  9  40
31 31         31     2 140  3  2 220 3.0 21.0  7 130
32 32         32     2  90  3  0 170 3.0 18.0  2  90
33 33         33     2 100  3  0 320 1.0 20.0  3  45
34 34         34     2 120  3  1 210 5.0 14.0 12 240
35 35         35     2 110  2  0 290 0.0 22.0  3  35
36 36         36     2 110  2  1  70 1.0  9.0 15  40
37 37         37     2 110  6  0 230 1.0 16.0  3  55
38 38         38     3 120  1  2 220 0.0 12.0 12  35
39 39         39     3 120  1  2 220 1.0 12.0 11  45
40 40         40     3 100  4  2 150 2.0 12.0  6  95
41 41         41     3  50  1  0   0 0.0 13.0  0  15
42 42         42     3  50  2  0   0 1.0 10.0  0  50
43 43         43     3 100  5  2   0 2.7  1.0  1 110
> # Canonical discriminant analysis on the morphodata object
> can5.8=cda.calc(dd5.8)
> round(can5.8$coeffs.raw,3)
     Can1   Can2
x1  0.022  0.045
x2  0.369 -0.332
x3 -0.838 -0.386
x4 -0.001 -0.006
x5  1.420  1.040
x6  0.202 -0.204
x7  0.195 -0.235
x8 -0.031 -0.027
> can5.8
$objects
$objects$ID
 [1] 1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
[29] 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
43 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 ... 43

$objects$Population
 [1] 1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
[29] 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
43 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 ... 43

$objects$Taxon
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3
[43] 3
Levels: 1 2 3

$objects$scores
           Can1        Can2
1  -1.134861088  0.22533079
2  -0.549240640 -1.39740521
3  -1.447367516 -1.22291692
4  -1.754242201 -1.49258336
5  -1.391225868 -1.22534778
6  -0.393069476 -0.88498333
7  -0.811684454 -1.11391857
8  -1.273503886 -1.32001555
9  -0.325858125 -0.07855418
10 -1.239007535 -1.09847314
11 -2.076985740 -0.69765224
12 -0.612437530 -0.61804666
13  1.066206669 -1.99160156
14  0.481515914  0.71206730
15 -0.489249647 -0.14178266
16  0.683716023  0.50820335
17  0.005887126 -0.40946725
18  0.782506571  0.59701008
19  1.981924440  0.84481125
20  1.158118623  0.04798314
21  1.960319922  1.72037360
22 -1.470893125  0.72786655
23  1.783687528  0.82289780
24  0.949002686  0.69361808
25  1.729852441  0.95507859
26  2.155348784  2.03819784
27  3.473914121 -0.02121919
28 -0.159500004  0.03780099
29  1.642947880 -0.55377085
30 -0.576523471 -0.42718554
31  1.700569697 -0.47795269
32  1.941558980  1.19182159
33  1.190494699 -0.76605018
34  1.124769751 -1.57553166
35  0.358723945 -0.97696906
36  0.670211930  0.69201883
37  1.474303923 -0.22174188
38 -1.673638586 -0.62129360
39 -0.755477448  0.38430425
40 -1.131907730 -0.23189344
41 -2.921408598  1.45542780
42 -2.812678838  1.83073993
43 -3.314820150  4.08080476


$eigenvalues
     Can1      Can2 
1.5613432 0.4546081 

$eigenvaluesAsPercentages
     Can1      Can2 
0.7744945 0.2255055 

$cumulativePercentageOfEigenvalues
[1] 0.7744945 1.0000000

$groupMeans
  Taxon       Can1       Can2
1     1 -0.6624358 -0.7204204
2     2  1.1935670  0.2674529
3     3 -2.1016552  1.1496816

$rank
[1] 2

$coeffs.std
          Can1       Can2
x1  0.40085903  0.8148037
x2  0.45959593 -0.4138933
x3 -0.63439984 -0.2927091
x4 -0.05578159 -0.4396310
x5  2.51160836  1.8390483
x6  0.80300920 -0.8096169
x7  0.88224127 -1.0632588
x8 -2.04939155 -1.8009049

$coeffs.raw
           Can1         Can2
x1  0.022344104  0.045417606
x2  0.369109646 -0.332405063
x3 -0.837675738 -0.386499597
x4 -0.000763493 -0.006017311
x5  1.420281838  1.039957871
x6  0.202200109 -0.203863959
x7  0.195246015 -0.235306430
x8 -0.030687468 -0.026966644

$totalCanonicalStructure
         Can1        Can2
x1  0.3552393 -0.47710068
x2  0.1257746  0.06284426
x3 -0.4745516 -0.19823124
x4  0.2829126 -0.67174219
x5  0.3482493  0.13990051
x6  0.4326728 -0.42739067
x7  0.1946673 -0.31965137
x8  0.1881562 -0.14474942

$canrsq
[1] 0.6095799 0.3125296

attr(,"class")
[1] "cdadata"
> data5.8=read.csv('D:/个人成长/学业/课程/应用多元统计分析/上机/第五章作业/exec5.8.csv',header=TRUE)
> colMeans(data5.8)
         ID  Population       Taxon          x1          x2          x3          x4 
 22.0000000  22.0000000   1.7441860 107.9069767   2.4651163   0.9767442 180.4651163 
         x5          x6          x7          x8 
  1.7139535  14.2558140   7.6046512  84.4186047 

    From the raw coefficients and the column means above, the two Fisher discriminant functions, centred at the overall means, are:

$$y_{1}=0.022(x_{1}-107.907)+0.369(x_{2}-2.465)-0.838(x_{3}-0.977)-0.001(x_{4}-180.465)+1.420(x_{5}-1.714)+0.202(x_{6}-14.256)+0.195(x_{7}-7.605)-0.031(x_{8}-84.419)$$

$$y_{2}=0.045(x_{1}-107.907)-0.332(x_{2}-2.465)-0.386(x_{3}-0.977)-0.006(x_{4}-180.465)+1.040(x_{5}-1.714)-0.204(x_{6}-14.256)-0.235(x_{7}-7.605)-0.027(x_{8}-84.419)$$
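As a check on the two functions, the sketch below evaluates them for the first observation in the data listing, using the unrounded values printed in `$coeffs.raw` and in the `colMeans` output. The object names `xbar`, `coef_raw` and `obs1` are introduced here for illustration only. The resulting scores should agree with the first row of `$objects$scores` (about -1.135 and 0.225).

```r
# Overall means of x1..x8, copied from the colMeans output
xbar <- c(x1 = 107.9069767, x2 = 2.4651163, x3 = 0.9767442, x4 = 180.4651163,
          x5 = 1.7139535, x6 = 14.2558140, x7 = 7.6046512, x8 = 84.4186047)

# Raw canonical coefficients, copied from can5.8$coeffs.raw
coef_raw <- cbind(
  Can1 = c(0.022344104, 0.369109646, -0.837675738, -0.000763493,
           1.420281838, 0.202200109, 0.195246015, -0.030687468),
  Can2 = c(0.045417606, -0.332405063, -0.386499597, -0.006017311,
           1.039957871, -0.203863959, -0.235306430, -0.026966644)
)
rownames(coef_raw) <- names(xbar)

# Observation 1 from the data listing (x1..x8)
obs1 <- c(110, 2, 2, 180, 1.5, 10.5, 10, 70)

# Discriminant scores: centre at the overall means, then apply the coefficients
scores <- drop((obs1 - xbar) %*% coef_raw)
print(round(scores, 4))
```

The same two lines of arithmetic apply to any new observation: subtract the overall means, then take the inner product with each coefficient column.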

II. Calculation Problems

1.

2.

3.

