✅ 操作成功!

eof分析

发布时间:2023-06-05 作者:admin 来源:文学

eof分析

eof分析

skdj指标-厉行节约

2023年2月18日发(作者:气瓶充装许可证)

5.4主成分聚类与主成分回归

5.4.1变量聚类与样品分类

主成分分析可用于聚类:变量聚类与样品聚类。

变量聚类:由主成分系数的差异,可将变量聚类。例如例5.5中第2主成分中murder,rape,

assult系数为负的,burglary,larceny,auto系数是正的。按系数正负可把7个变量分为两

类:murder,rape,assult属于暴力程度严重的一类;burglary,larceny,auto属于暴力程

度较轻的一类。按照这种方法,根据主成分系数的正负可以将变量聚类。

样品聚类:如果2个主成分能很好的概括随机向量的信息,计算每个样品的这两个主成

分得分,把他们的散点图画出来,就能从图上将样品分类。

例5.5(续2)按照第一、第二主成分得分,画出散点图

datacrime;/*建立数据集crime*/

inputstate$1-15murderraperobberyassultburglarylarcenyauto;

/*建立变量statemurderraperobberyassultburglarylarceny

auto。state$1-15表示前15列存州名。murderraperobberyassultburglarylarcenyauto

表7种罪的犯罪率*/

cards;/*以下为数据体*/

Albama14.225.296.8278.31135.51881.9280.7

Alaska10.851.696.8284.01331.73369.8753.3

Arirona9.534.2138.2312.32346.14467.4439.5

Arkansas8.834.2138.2312.32346.14467.4439.5

Califonia11.549.4287.0358.02139.43499.8663.5

Colorado6.342.0170.7292.91935.23903.2477.1

Conecticat4.216.8129.5131.81346.02620.7593.2

Delaware6.024.9157.0194.21682.63678.4467.0

Florida10.239.6187.9449.11859.93840.5351.4

Geogia11.731.1140.5256.51351.12170.2297.9

Hawaii7.225.5128.064.11911.53920.4489.4

Idaho5.519.439.6172.51050.82599.6237.6

Illinois9.921.8211.3209.01085.02828.5528.6

Indiana7.426.5123.2153.51086.22498.7377.4

Iowa2.310.641.289.8812.52685.1219.9

Kansas6.622.0100.7180.51270.42739.3244.3

Kentaky10.119.181.1123.3872.21662.1245.4

Loisana15.530.9142.9335.51165.52469.9337.7

Maine2.413.538.7170.01253.12350.7246.9

Maryland8.034.8292.1358.91400.03177.7428.5

Masschusetts3.120.8169.1231.61532.22311.31140.1

Michigan9.338.9261.9274.61522.73159.0545.5

Minnesota2.719.585.985.81134.72559.3343.1

Mississippi14.319.665.7189.1915.61239.9144.4

Missouri9.628.3189.0233.51318.32424.2378.4

Montana5.416.739.2156.8804.92773.2309.3

Nebraska3.918.164.7112.7760.02316.1249.1

Nevada15.849.1323.1355.02453.14212.6559.2

MewHampashare3.210.723.276.01041.72343.9293.4

NewJersey5.621.0180.4185.11435.82774.5511.5

NewMaxico8.839.1109.6343.41418.73008.6259.5

NewYork10.729.4472.6319.11728.02782.0745.8

NorthCarolina10.617.061.3318.31154.12037.8192.1

NorthDakoda100.99.013.343.8446.11843.0144.7

Ohio7.827.3190.5181.11216.02696.8400.4

Oklahoma8.629.273.8205.01288.22228.1326.8

Oregan4.939.9124.1286.91636.43506.1388.9

Pennsyvania5.619.0130.3128.0877.51624.1333.2

RhodeIsland3.610.586.5201.01849.52844.1791.4

SouthCarolina11.933.0105.9485.31613.62342.4245.1

SouthDakoda2.013.517.9155.7570.51704.4147.5

Tennessee10.129.7145.8203.91259.71776.5314.0

Texas13.333.8152.4208.21603.12988.7397.6

Utah3.520.368.8147.31171.63004.6334.5

Vermont1.415.930.8101.21348.22201.0265.2

Virginia9.023.392.1165.7986.22521.2226.7

Wasinton4.339.6106.2224.81605.63386.9360.3

WestViginia6.013.242.290.9597.41341.7163.3

Wiskonsin2.812.952.263.7846.92614.2220.7

Wyoming5.421.939.7173.9811.62772.2282.0

;

procprincompout=crimprinn=2;

varmurderraperobberyassultburglarylarcenyauto;

run;

PROCPLOTdata=crimprin;

PLOTPRIN2*PRIN1=STATE/VPOS=31;

TITLE2‘PLOTOFTHEFIRSTTWOPRINCIPALCOMPONENTS’;

RUN;

例5.7(气温分析)本例的输入资料文件(TEMPERAT)是美国六十四个城市一月与七

月的平均日温。

DATATEMPERAT;

TITLE2'MEANTEMPERATUREINJANUARYANDJULYFORSELECTEDCITIES';

INPUTCITY$1-15JANUARYJULY;

CARDS;

MOBILE51.281.6

PHOENIX51.291.2

LITTLEROCK39.581.4

SACRAMENTO45.175.2

DENVER29.973.0

HARTFORD24.872.7

WILMINGTON32.075.8

WASHINGTONDC35.678.7

JACKSONVILLE54.681.0

MIAMI67.282.3

ATLANTA42.478.0

BOISE29.074.5

CHICAGO22.971.9

PEORIA23.875.1

INDIANAPOLIS27.975.0

DESMOINES19.475.1

WICHITA31.380.7

LOUISVILLE33.376.9

NEWORLEANS52.981.9

PORTLAND,MAINE21.568.0

BALTIMORE33.476.6

BOSTON29.273.3

DETROIT25.573.3

SAULTSTEMARIE14.263.8

DULUTH8.565.6

MINNEAPOLIS12.271.9

JACKSON47.181.7

KANSASCITY27.878.8

STLOUIS31.378.6

GREATFALLS20.569.3

OMAHA22.677.2

RENO31.969.3

CONCORD20.669.7

ATLANTICCITY32.775.1

ALBUQUERQUE35.278.7

ALBANY21.572.0

BUFFALO23.770.1

NEWYORK32.276.6

CHARLOTTE42.178.5

RALEIGH40.577.5

BISMARCK8.270.8

CINCINNATI31.175.6

CLEVELAND26.971.4

COLUMBUS28.473.6

OKLAHOMACITY36.881.5

PORTLAND,OREG38.167.1

PHILADELPHIA32.376.8

PITTSBURGH28.171.9

PROVIDENCE28.472.1

COLUMBIA45.481.2

SIOUXFALLS14.273.3

MEMPHIS40.579.6

NASHVILLE38.379.6

DALLAS44.884.8

ELPASO43.682.3

HOUSTON52.183.3

SALTLAKECITY28.076.7

BURLINGTON16.869.8

NORFOLK40.578.3

RICHMOND37.577.9

SPOKANE25.469.7

CHARLESTON,WV34.575.0

MILWAUKEE19.469.9

CHEYENNE26.669.1

;

PROCPLOT;

PLOTJULY*JANUARY=CITY/VPOS=36;

PROCPRINCOMPCOVOUT=PRIN;

VARJULYJANUARY;

PROCPLOT;

PLOTPRIN2*PRIN1=CITY/VPOS=26;

TITLE3'PLOTOFPRINCIPALCOMPONENTS';

Run;

例5.8美国大学生篮球队排名

databballm;

labelcsn='CommunitySportsNews(ChapelHillNC)'

dursun='DurhamSun'

durher='DurhamMorningHerald'

waspost='WashingtonPost'

usatoda='USAToday'

spormag='SportMagazine'

insport='InsideSports'

upi='UnitedPressInternational'

ap='AssociatedPress'

sporill='SportsIllustrated'

;

title1'Pre-Season1985CollegeBasketballRankings';

inputschool$sundurherwaspostusatoda

spormaginsportupiapsporill;

formatcsn--sporill5.1;

cards;

Louisville

GeorgiaTech2243111211

Kansas34515118457

Michigan4594253132

Duke56754104565

UNC6122342323

Syracuse71

NotreDame881312.

Kentucky91121113

LSU10913.1315169148

DePaul11.211520.19..19

Georgetown

Navy132.20.

Illinois6

Iowa1516..23..14.20

Arkansas16...25....16

MemphisState17.11.16820.1512

Washington18......17..

UAB191310.1217.161615

UNLV2018181922.141818.

NCState2117141615.12151718

Maryland22...19...1914

Pittsburg23.........

Oklahoma247.1317

Indiana2512201821.....

Virginia26.22..18....

OldDominion27.........

Auburn281011

29....14....

UCLA30......19..

's..19.......

Tennessee..24..16....

Montana...20......

Houston....24.....

VirginiaTech......13...

;

procprincompdata=bballn=1out=pcbballstandard;

varcsn--sporill;

weightweight;

procsortdata=pcbball;

byprin1;

procprint;

varschoolprin1;

title2'CollegeTeamsasOrderedbyPRINCOMP';

run;

例5.955个地区或国家的赛跑纪录如表5-7,试作主成分分析,并将55个国家或地区按赛

跑成绩分类。

表5-755个地区或国家的赛跑纪录

序号国家或

地区

100m

(秒)

200m

(秒)

400m

(秒)

800m

(分)

1500m

(分)

5000m

(分)

10000m

(分)

马拉松

(分)

1argentin10.3920.8146.841.813.7014.0429.36137.72

2australi10.3120.0644.841.743.5713.2827.66128.30

3austria10.4420.8146.821.793.6013.2627.72135.90

4belgium10.3420.6845.041.733.6013.2227.45129.95

5bermuda10.2820.5845.911.803.7514.6830.55146.62

6brazil10.2220.4345.211.733.6613.6228.62133.13

7burma10.6421.5248.301.803.8514.4530.28139.95

8canada10.1720.2245.681.763.6313.5528.09130.15

9chile10.3420.8046.201.793.7113.6129.30134.03

10china10.5121.0447.301.813.7313.9029.13133.53

11columbia10.4321.0546.101.823.7413.4927.88131.35

12cookis12.1823.2052.942.024.2416.7035.38164.70

13costa10.9421.9048.661.873.8414.0328.81136.58

14czech10.3520.6545.641.763.5813.4228.19134.32

15denmark10.5620.5245.891.783.6113.5028.11130.78

16domrep10.1420.6546.801.823.8214.9131.45154.12

17finland10.4320.6945.491.743.6113.2727.52130.87

18france10.1120.3845.281.733.5713.3427.97132.30

19gdr10.1220.3344.871.733.5613.1727.42129.92

20frg10.1620.3744.501.733.5313.2127.61132.23

21gbni10.1120.2144.931.703.5113.0127.51129.13

22greece10.2220.7146.561.783.6414.5928.45134.60

23guatemal10.9821.8248.401.893.8014.1630.11139.33

24hungary10.2620.6246.021.773.6213.4928.44132.58

25india10.6021.4245.731.763.7313.7728.81131.98

26indonesi10.5921.4947.801.843.9214.7330.79148.83

27ireland10.6120.9646.301.793.5613.3227.81132.35

28israel10.7121.0047.801.773.7213.6628.93137.55

29italy10.0119.7245.261.733.6013.2327.52131.08

30japan10.3420.8145.861.793.6413.4127.72128.63

31kenya10.4620.6644.921.733.5513.1027.38129.75

32korea10.3420.8946.901.793.7713.9629.23136.25

33dprkorea10.9121.9447.301.853.7714.1329.67130.87

34luxembou10.3520.7747.401.823.6713.6429.08141.27

35malaysia10.4020.9246.301.823.8014.6431.01154.10

36mauritiu11.1922.4547.701.883.8315.0631.77152.23

37mexico10.4221.3046.101.803.6513.4627.95129.20

38netherla10.5220.9545.101.743.6213.3627.61129.02

39nz10.5120.8846.101.743.5413.2127.70128.98

40norway10.5521.1646.711.763.6213.3427.69131.48

41png10.9621.7847.901.904.0114.7231.36148.22

42philippi10.7821.6446.241.813.8314.7430.64145.27

43poland10.1620.2445.361.763.6013.2927.89131.58

44portugal10.5321.1746.701.793.6213.1327.38128.65

45rumania10.4120.9845.871.763.6413.2527.67132.50

46singapor10.3821.2847.401.883.8915.1131.32157.77

47spain10.4220.7745.981.763.5513.3127.73131.57

48sweden10.2520.6145.631.773.6113.2927.94130.63

49switzerl10.3720.4645.781.783.5513.2227.91131.20

50taipei10.5921.2946.801.793.7714.0730.07139.27

51thailand10.3921.0947.911.833.8415.2332.56149.90

52turkey10.7121.4347.601.793.6713.5628.58131.50

53usa9.9319.7543.861.733.5313.2027.43128.22

54ussr10.0720.0044.601.753.5913.2027.53130.55

55wsamoa10.8221.8649.002.024.2416.2834.71161.83

可用下列SAS程序作主成分分析,并将第1,2主成分画散点图

datarunrecod;

inputcountry$x1-x8;

cards;

argentin10.3920.8146.841.813.7014.0429.36137.72

australi10.3120.0644.841.743.5713.2827.66128.30

austria10.4420.8146.821.793.6013.2627.72135.90

belgium10.3420.6845.041.733.6013.2227.45129.95

bermuda10.2820.5845.911.803.7514.6830.55146.62

brazil10.2220.4345.211.733.6613.6228.62133.13

burma10.6421.5248.301.803.8514.4530.28139.95

canada10.1720.2245.681.763.6313.5528.09130.15

chile10.3420.8046.201.793.7113.6129.30134.03

china10.5121.0447.301.813.7313.9029.13133.53

columbia10.4321.0546.101.823.7413.4927.88131.35

cookis12.1823.2052.942.024.2416.7035.38164.70

costa10.9421.9048.661.873.8414.0328.81136.58

czech10.3520.6545.641.763.5813.4228.19134.32

denmark10.5620.5245.891.783.6113.5028.11130.78

domrep10.1420.6546.801.823.8214.9131.45154.12

finland10.4320.6945.491.743.6113.2727.52130.87

france10.1120.3845.281.733.5713.3427.97132.30

gdr10.1220.3344.871.733.5613.1727.42129.92

frg10.1620.3744.501.733.5313.2127.61132.23

gbni10.1120.2144.931.703.5113.0127.51129.13

greece10.2220.7146.561.783.6414.5928.45134.60

guatemal10.9821.8248.401.893.8014.1630.11139.33

hungary10.2620.6246.021.773.6213.4928.44132.58

india10.6021.4245.731.763.7313.7728.81131.98

indonesi10.5921.4947.801.843.9214.7330.79148.83

ireland10.6120.9646.301.793.5613.3227.81132.35

israel10.7121.0047.801.773.7213.6628.93137.55

italy10.0119.7245.261.733.6013.2327.52131.08

japan10.3420.8145.861.793.6413.4127.72128.63

kenya10.4620.6644.921.733.5513.1027.38129.75

korea10.3420.8946.901.793.7713.9629.23136.25

dprkorea10.9121.9447.301.853.7714.1329.67130.87

luxembou10.3520.7747.401.823.6713.6429.08141.27

malaysia10.4020.9246.301.823.8014.6431.01154.10

mauritiu11.1922.4547.701.883.8315.0631.77152.23

mexico10.4221.3046.101.803.6513.4627.95129.20

netherla10.5220.9545.101.743.6213.3627.61129.02

nz10.5120.8846.101.743.5413.2127.70128.98

norway10.5521.1646.711.763.6213.3427.69131.48

png10.9621.7847.901.904.0114.7231.36148.22

philippi10.7821.6446.241.813.8314.7430.64145.27

poland10.1620.2445.361.763.6013.2927.89131.58

portugal10.5321.1746.701.793.6213.1327.38128.65

rumania10.4120.9845.871.763.6413.2527.67132.50

singapor10.3821.2847.401.883.8915.1131.32157.77

spain10.4220.7745.981.763.5513.3127.73131.57

sweden10.2520.6145.631.773.6113.2927.94130.63

switzerl10.3720.4645.781.783.5513.2227.91131.20

taipei10.5921.2946.801.793.7714.0730.07139.27

thailand10.3921.0947.911.833.8415.2332.56149.90

turkey10.7121.4347.601.793.6713.5628.58131.50

usa9.9319.7543.861.733.5313.2027.43128.22

ussr10.0720.0044.601.753.5913.2027.53130.55

wsamoa10.8221.8649.002.024.2416.2834.71161.83

procprincompout=w;

varx1-x8;

run;

由执行程序后得到的表可作如下分析。以下是特征值表

Eigenvalues

EigenvalueDifferenceProportionCumulative

PRIN16.622155.744530.8277680.82777

PRIN20.877620.718300.1097020.93747

PRIN30.159320.035270.0199150.95739

PRIN40.124050.044170.0155060.97289

PRIN50.079880.011920.0099850.98288

PRIN60.067970.021550.0084960.99137

PRIN70.046420.023820.0058020.99717

PRIN80.02260.0.0028251.00000

两个主成分的累计方差贡献率为0.9375,可见只要两个主成分就够了。以下是由特征向量表

Eigenvectors

PRIN1PRIN2PRIN3PRIN4PRIN5PRIN6PRIN7PRIN8

X10.3175560.5668780.3322620.1276280.262555-.5937040.1362410.105542

X20.3369790.4616260.360657-.259116-.1539570.656137-.112640-.096054

X30.3556450.248273-.5604670.652341-.2183230.156625-.002854-.000127

X40.3686840.012430-.532482-.4799990.540053-.014692-.238016-.038165

X50.372810-.139797-.153443-.404510-.487715-.1578430.6100110.139291

X60.364374-.3120300.1897640.029588-.253979-.141299-.5912990.546697

X70.366773-.3068600.1817520.080069-.133176-.219017-.176871-.796795

X80.341926-.4389630.2632090.2995120.4979280.3152850.3988220.158164

第1主成分个分量系数是大体相同的正数,第1主成分反映各国或地区跑步实力,它越小

速度快,实力越强。第2主成分在100米,200米,400米的系数为正,1500米,5000米,10000

米和马拉松系数为负,且距离越长,负的越多,反映短距离跑速度与长距离跑速度差异,即

长距离跑相对短距离跑的实力,它越小短距离相对长距离实力越强。

为了将第1主成分作为横轴,第2主成分作为立轴画出散点图,可用下列程序

procgplotdata=w;

plotprin2*prin1=country;

run;

得图5.1

图5.1径塞记录第1,2主成分散点图

COUNTRYargentinaustraliaustriabelgiumbermuda

brazilburmacanadachilechina

columbiacookiscostaczechdenmark

domrepdprkoreafinlandfrancefrg

gbnigdrgreeceguatemalhungary

indiaindonesiirelandisraelitaly

japankenyakorealuxemboumalaysia

mauritiumexiconetherlanorwaynz

philippipngpolandportugalrumania

singaporspainswedenswitzerltaipei

thailandturkeyusaussrwsamoa

PRIN2

-3

-2

-1

0

1

2

PRIN1

-4-3-2-111

从图5.1可见这些国家或地区可分为4类:第1类只有库克群岛,第1,2主成分都最大,说明

跑步速度慢,长距离跑相对短距离跑的实力大(短距离跑速度特别慢);第2类包括百慕大,

多米尼加,马来西亚,新加坡,泰国第1主成分较大,第2主成分为负值,说明总体来说,跑

步速度不快,短距离跑相对长距离跑的实力强,即短距离跑得快而长距离跑得慢;第3类包

括中国台北,朝鲜,哥斯达尼加,巴布亚新几内亚,危地马拉,毛里求斯,缅甸,菲律宾,

印度尼西亚,第1主成分较大,说明总体来说,跑步速度不快,第2主成分为正值,说明短距

离跑相对长距离跑的实力小(长距离跑速度较快);其他国家或地区为一类;第1主成分较

小,说明跑步速度总的较快,第2主成分为正值,说明短距离跑得快而长距离跑得慢。

5.4.2主成分回归

在回归分析中,常遇到自变量存在多重共线性问题,即自变量的观测值存在线性相关,

或近似线性相关。这时设计矩阵满足0'XX,第4章中已指出,用公式

YXXX')'(1

估计参数会造成较大方差。选取自变量的主成分,它们是彼此正交的。

用少量主成分作回归,再将主成分化为原始变量,这样得到的回归方程就不存在较大方差了。

与逐步回归相比,这样做的好处是,所有变量都被计算,而逐步回归略去一些自变量,这些

自变量不被计算。主成分回归原理见例5.10。

例5.10(主成分回归的原理)设自变量x1,x2,x3和因变量y的观测值如表5-8,以x1,x2,x3

为自变量y为因变量作回归。由于x3的值近似为x1与x2的和,存在多重共线性,直接以

x1,x2,x3为自变量不好,作主成分回归。

首先将x1,x2,x3零均值化,得到x1*,x2*,x3*

x1*=(x1-0.22)/0.49193,

x2*=(x2-0.24)/0.53666,

x3*=(x3-0.46)/0.50794

用相关阵计算主成分(计算所得表略去),主成分得分见表5-9,因为前一、两个主成分的

累计百分比分别为0.576和0.992,取前两个主成分为自变量(略去1个主成分)。由表5-9

数据作以y为因变量,Prin1和Prin2为自变量的回归方程

y=1.48+1.40128Prin1+0.31623Prin2

有主成分分析算得结果,Prin1,Prin2可用x1*,x2*,x3*表示:

prin1=0.461761x1*+0.461761x2*+0.757333x3*

prin2=0.707107x1*-0.707107x2*

通过变量替换,变成以x1,x2,x3为自变量的回归方程

y=-0.0598224+1.7699x1+0.789043x2+2.08929x3

这就是主成分回归所得方程。

表5-8变量观测值

X1x2x3y

1.1014

01.213

000.30.3

0000.1

0000

表5-9变量主成分得分(用协差阵计算)

YPrin1Prin2Prin3

4.00.551320.905250.01335

3.00.88824-0.687390.01118

0.3-0.33232-0.04123-0.13951

0.1-0.55362-0.088320.05749

0.0-0.55362-0.088320.05749

上述步骤很麻烦,可用SAS-REG过程实现:在procreg语句中,加选项outest=文件名(存

储主成分回归的输出数据集)在model语句自变量后加“/”号,再加选项pcomit=k(k为略去

的主成分个数)。执行后看最后一张表最后一行即可。

例5.111978-1993我国民航客运量y(万人),国民收入x1(亿元),消费额x2(亿元),

铁路客运量x3(万人),民航航线里程数x4(万公里),来华旅游人数x5(万人)如表5-10。

试建立以民航客运量为因变量的回归方程。

表5-10我国民航数据表

yearyx1x2x3x4x5

1978231314.89180.92

1979298335.00420.39

1989220419.53570.25

1981430021.82776.71

3.27792.43

335810604422.91947.70

39.021285.22

.721783.30

555210857932.432281.95

8.912690.23

388.383169.48

769.192450.14

0.681746.20

571.913335.65

83.663311.50

8296.084152.70

解直接建立回归方程发现,回归方程消费额x2的系数是负的,这与实际情况不符:消费额

越多,应当乘飞机人数越多;而回归方程x2的系数是负的,导出错误结论:消费额越多,则

乘飞机人数越少。通过检验发现,x2的系数是负的原因是:自变量出现多重共线性。本例中

自变量有5个,主成分也有5个。通过主成分分析,前3个主成分累计百分比为99.61,可以选

用前3个主成分,即去掉2个主成分,于是可用如下SAS程序计算

dataairpline;

doyear=1978to1993;

inputyx1-x5;

output;

end;

cards;

231314.89180.92

298335.00420.39

343368825319220419.53570.25

430021.82776.71

44542583.27792.43

39604422.91947.70

554565239.021285.22

7447027.721783.30

9977859555210857932.432281.95

13142938.912690.23

1442117388.383169.48

1283131769.192450.14

16671250.681746.20

2178165571.913335.65

288629383.663311.50

33832488296.084152.70

;

procreg;

modely=x1-x5/selection=stepwise;

modely=x1-x5/selection=stepwise;

run;

procregoutest=w;/*主成分回归计算结果存数据集wu*/

modely=x1-x5/pcomit=2;/*实行主成分分析,去掉2个主成分*/

procprintdata=w;

run;

执行后输出的最后一个表为

__I

_D_PN

M_ERC_T

OTPIORE

DYVDMMR

OEPAGISC

BLERETEEXXXXX

S______P12345

Y

1MODEL1PARMSY..75.9488622.3270.13433-0.15715-0.00973818.44350.29278-1

2MODEL1IPCY.296.1468702.5730.038780.05967-0.0106489.90570.21851-1

这表倒数第2行和最后一行分别是:不减少主成分和减少两个主成分所得参数估计。由这表

最后一行可见主成分方法得到的回归方程就是

y=702.573+0.03878x1+0.05967x2-.010648x3+9.9057x4+0.21851x5

其中消费额x2的系数就是正的了。

dataUSairpollution;

inputcity$SO2tempmanupopulwindprecippredays;

cards;

Albany4647.6441168.833.36135

Albuquerque1156.8462448.97.7758

Atlanta2461.53684979.148.34115

Baltimore4755.06259059.641.31111

Buffalo1147.139146312.436.11166

Charleston3155.235716.540.75148

Chicago11050.63344336910.434.44122

Cincinnati2354.04624537.139.04132

Cleveland6549.7100775110.934.99155

Columbus2651.52665408.637.01134

Dallas966.264184410.935.9478

Denver1751.94545159.012.9586

DesMoines1749.010420111.230.85103

Detroit3549.91064151310.130.96129

Hartford5649.14121589.043.37127

Houston1068.9721123310.848.19103

Indianapolis2852.33617469.738.74121

Jacksonville1468.41365298.854.47116

KansasCity1454.538150710.037.0099

LittleRock1361.0911328.248.52100

Louisville3055.62915938.343.11123

Memphis1061.63376249.249.10105

Miami1075.52073359.059.80128

Milwaukee1645.756971711.829.07123

Minneapolis2943.569974410.625.94137

Nashville1859.42754487.946.00119

NewOrleans968.32043618.456.77113

Norfolk3159.39630810.644.68116

Omaha1451.518134710.930.1898

Philadelphia6954.6169219509.639.93115

Phoenix1070.32135826.07.0536

Pittsburgh6150.43475209.436.22147

Providence9450.034317910.642.75125

Richmond2657.81972997.642.59115

SaltLakeCity2851.01371768.715.1789

SanFrancisco1256.74537168.720.6667

Seattle2951.13795319.438.79164

5655.97756229.535.89105

Washington2957.34347579.338.89111

Wichita856.612527712.730.5882

Wilmington3654.080809.040.25114

;

procreg;

modelSO2=tempmanupopulwindprecippredays/selection=stepwise;

run;

procregoutest=w;/*主成分回归计算结果存数据集wu*/

modelSO2=tempmanupopulwindprecippredays/pcomit=3;

procprintdata=w;

run;

Model:MODEL1

DependentVariable:SO2

NumberofObservationsRead41

NumberofObservationsUsed36

NumberofObservationswithMissingValues5

AnalysisofVariance

SumofMean

SourceDFSquaresSquareFValuePr>F

Model6137862297.6195010.46<.0001

Error296369.92191219.65248

CorrectedTotal3520156

RootMSE14.82068R-Square0.6840

DependentMean31.19444AdjR-Sq0.6186

CoeffVar47.51063

ParameterEstimates

ParameterStandard

VariableDFEstimateErrortValuePr>|t|

Intercept1133.6761650.055242.670.0123

temp1-1.529920.65951-2.320.0276

manu10.066920.017173.900.0005

popul1-0.042360.01649-2.570.0156

wind1-3.383461.88276-1.800.0827

precip10.804160.397602.020.0524

predays1-0.172060.17864-0.960.3434

5.5经验正交函数分解

经验正交函数分解也称为经验正交分解或自然正交分解,其算法类似主成分分析,但

含义不同,分析方法不同。主成分分析是对随机向量作分析,而经验正交函数分解是对确定

性变量进行分析。由于经验正交函数分解在气象上的用途较大,在气象上称为EOF方法,见

吴洪宝(2005)。

对气象要素场的分解,目前已有多种方法,如谐波分析、球函数分解、Chebixief分解。

与它们相比,经验正交函数分解更有其优越性。(1)没有固定的函数形式,因而不需许多

数学假设,更易符合实际。(2)能在不规则分布站点使用。(3)既可以在空间不同点进行

分解,也可在同一站点对不同时间点分解,也可对同一站点不同要素做分解。

为了便于叙述经验正交分解,我们举不同站点,不同时间气温的例5.12和例5.13作为例

子;容易看出气温可换为任何其他指标,站点这一属性变量也可换为别的属性变量,如例5.14

中换为害虫的品种。

例5.12设有p个站点,在i时刻j个站点温度观测值为

ij

t,温度矩阵为

npn

p

p

t......t

............

t......t

t......t

1

221

111

,令

n

1k

kjijij

t

n

1

tf,

ij

f是各站点观测值减去对时间的平均值(距平),

ij

f形成矩阵

npn

p

p

ff

ff

ff

F

......

............

......

......

1

221

111

设样本协差阵为R,则FF

1-n

1

RT,若

j

d为R的第j个彼此正交的单位特征向量,则

pijFdT

jj

,...,

是第j个主成分得分所成向量。用分块矩阵写成

]...[]...,[

11pp

ddFTT

由特征向量正交性,通过分块矩阵乘法易证



T

T

p

1

p1

d

...

d

T...TF,

于是,由分块矩阵乘法

T

j

j

d

p

1

j

TF。称

j

d为第j个空间函数,

j

d彼此正交。每一个

j

d是

一个空间正交

经验函数。和主成分分析一样,通常只取少数前k个

j

T和

j

d,

T

j

k

j

j

dTF

1

,即F可用少

数空间正交经验

函数描述。

我们也可以考虑时间正交经验函数,令

p

1k

ikijij

t

p

1

tg,设

ij

g形成矩阵

np

f......g

............

g......g

g......g

G

p1

2n21

1n11

设GG

1-p

1

HT,若

j

c为H的第j个彼此正交的单位特征向量,则

n,...,2,1GcLj

jj

是第j个主成分得分所成向量。用分块矩阵写成

]c...c[G]L...,L[

n1n1

由特征向量正交性,通过分块矩阵乘法易证



T

T

n

1

n1

c

...

c

L...LG,

于是,由分块矩阵乘法

T

j

j

j

cLGn

1

。称

j

c为第j个时间函数,

j

c彼此正交。每一个

j

c是

一个正交经验函数。和主成分分析一样,通常只取较小的数k,只考虑前k个

j

c和

j

L,

T

j

k

j

j

cLG

1

,

即G可用少数时间正交经验函数描述。

上述分解称为正交经验函数分解,也称为正交经验分解。

容易看出,正交经验函数分解与主成分分析的计算相同:若有q行p列数据矩阵F,将矩

阵q个行变量作为p维随机变量的q次观测值,用协方差阵做主成分分析。各个主成分的系数

构成p个正交经验函数,p个主成分得分就是正交经验函数的系数。

正交经验(函数)分解可以用SAS-PRINCOMP过程完成,例5.13说明其过程。

例5.13设某三气象站(a,b,c)五次气温观测如表4-11,试做经验正交函数分解。

表5-11三气象站五次气温观测值

abc

101214

91112

91213

101214

81113

用空间经验函数分解,可采用下列程序

datatemperat;

inputabc;

cards;

101214

91112

91213

101214

81113

;

procprincompcovout=w1;

varabc;

procprintdata=w1;

run;

执行该程序后,得到的主要输出为

SimpleStatistics

ABC

Mean9.2.6000000013.20000000

StD0.8366600270.547722560.83666003

上表指出三地气温平均值分别是9.2,11.6,13.2;因而距平F应是观测值减去此三数,即{

ij

f}

0.80.40.8

-0.2-0.6-1.2

-0.20.4-0.2

0.80.40.8

-1.2-0.6-0.2

Eigenvectors

PRIN1PRIN2PRIN3

A0.6425420.707107-.295194

B0.4174680.0000000.908692

C0.642542-.707107-.295194

上表给出三个特征向量,它们是彼此正交的经验函数(空间函数)。第一特征向量

)'0.642542,0.417468,0.642542(x

1

中各分量相差不多,反映三地气温偏高的特征;第

二特征向量)'-0.707107,0.00000,((707107x

2

中第1,3个分量绝对值相同,符号相反,

反映a地气温偏高,c地气温偏低的趋势。第三特征值很小,第三特征向量不考虑

OBSABCPRIN1PRIN2PRIN3

11012141.195050.00000-0.10883

291112-1.150040.70711-0.13194

391213-0.09003-0.000000.48155

41012141.195050.00000-0.10883

581113-1.15004-0.70711-0.13194

上表给出'19505.119505.1,09003.0,19505.1,19505.1(

1

),T,

)'70711.0,00000.0,00000.0,70711.0,00000.0(

2

T

说明第一特征在第1,4年较强(1、4年气温偏高),第2,5年反方向较强(2、5年气温偏低),

第三年很弱(气温接近平均值);第二特征第二年较强(a地气温距平比c地高),第五年反

方向较强(c地气温比a地高)。

若想得到时间正交经验分解,则须把5年的观测值作为5个随机变量,a、b、c,3地

代表不同观测次数,就可得另一分解。这时

采用程序

datatemperat;

inputyear1-year5;

cards;

1099108

1211121211

1412131413

;

procprincompcovout=w2;

varyear1-year5;

procprintdata=w2;

run;

执行程序后得到的主要结果是

Eigenvectors

PRIN1PRIN2PRIN3PRIN4PRIN5

YEAR10.436094-.465405-.0371970.3030460.707107

YEAR20.3342450.2403580.9113220.0000000.000000

YEAR30.4504420.713419-.3533700.4040610.000000

YEAR40.436094-.465405-.0371970.303046-.707107

YEAR50.5522920.007656-.204583-.8081220.000000

这5个特征向量就是彼此正交的时间经验函数。由上表可见第1特征向量是

)'552295.0,436094.0,450442.0,334245.0,436094.0(

1

X反映五年气温偏高特

征;第2特征向量是)'007656.0,465405.0,713419.0,240358.0,465405.0(

2

X,反映

第三年偏高,第1,4年偏低特征。

OBSYEAR1YEAR2YEAR3YEAR4YEAR5PRIN1PRIN2PRIN3PRIN4PRIN5

11099108-4.82526-0.22404000

2.595810.55828000

3.22945-0.33425000

上表表明)22945.4,59581.0,82546.4(L

1

,中-4.82546和4.22945绝对值相对很

大符号一正一负,0.59581绝对值相对偏小,说明第一特征在a,c地突出,a地5年温度偏

低,c地5年温度偏高;)'33425.0,55828.0,22404.0(L

2

。0.55828绝对值相对

偏大,这说明第二特征在b地突出,b地第三年偏高,第1,4年偏低。

例5.14已知山楂园昆虫群落调查表如表5-12,其中x1-x16表示不同时刻昆虫的群

落x1:桃蚜,x2:山楂木虱,x3:草履蚧,x4:山楂叶螨,x5:梨网蝽,x6:黑绒金龟子,x7:

苹毛金龟子,x8:顶梢叶蛾,x9:苹小卷叶蛾,x10:金纹细蛾,x11:舟形毛虫,x12:山楂粉

蝶,x13:桃小食心虫,x14:梨小食心虫,x15:白小食心虫,x16:桑天蛾。T表示时段。做经

验正交函数分解。

表5-12山楂园昆虫群落调查表

tx1x2x3x4x5x6x7x8x9x10x11x12x13x14x15x16

100090

224187000

33295313220050

46758622100340070

52665330

6382640

72130

81176

98

3726196

89815

127

93230

0

1500000

0000

解设变量x1-x16表示各种昆虫群落,程序为

datainsect;

inputtx1-x16;

cards;

100090

224187000

33295313220050

46758622100340070

52665330

6382640

72130

81176

98

3726196

89815

127

93230

0

1500000

160000

;

procprincompcovn=4out=eo;

varx1-x16;

run;

procprintdata=eo;

varprin1-prin4;

run;

执行后的主要输出为

Eigenvalues

EigenvalueDifferenceProportionCumulative

PRIN184013.743593.60.5386760.538676

PRIN240420.122524.40.2591640.797841

PRIN317895.713220.90.1147430.912584PRIN4

4674.9.0.0299740.942558

以上是协方差阵的前4个特阵值,累积比例0.9426表明只要取4个特阵向量-经验正交函数就

够了。

Eigenvectors

PRIN1PRIN2PRIN3PRIN4

X10.6157500.2680840.105590-.024590

X20.0757760.000319-.1925030.662614

X30.009459-.008403-.0304540.098534

X4-.0651800.1232610.0421560.037380

X5-.3690400.888579-.092851-.056854

X60.3798580.1634050.080058-.139139

X70.5317130.2497750.0989180.059858

X80.0098340.006608-.0461470.139279

X9-.0065960.005590-.0248130.099052

X10-.1200400.0658800.218413-.357796

X11-.0863550.0342700.5243090.237510

X120.0392510.005635-.0181960.074936

X13-.107747-.0445230.7659760.047768

X14-.1086120.1557970.0985430.531365

X15-.0239710.046099-.046134-.138075

X16-.0065500.0142580.0059160.010357

以上是前4个特征向量,即4个主要正交经验函数。第1特征向量中x1、x7系数大,x5负系数

大,表明桃蚜和苹毛金龟子高发而梨网蝽低发;第2特征向量中x5系数大,表明梨网蝽高发,

第3征向量中x11,x13系数大,表明舟形毛虫和桃小食心虫高发;第4特征向量中x2,x14系数

大,表明山楂木虱,梨小食心虫高发。

OBSPRIN1PRIN2PRIN3PRIN4

1-31.461-188.305-73.085-29.684

2-8.182-179.016-70.590-23.294

3369.967-4.305-8.985-31.552

4859.690254.04973.040-13.554

5279.934-0.902-44.50851.884

6-3.851-75.585-112.096114.183

7-79.377-59.232-104.294112.322

8-129.83622.840-79.17344.987

9-203.233184.630-87.230-95.750

10-304.774407.810-62.373-98.047

11-306.690370.82843.951100.632

12-176.478-25.204346.23128.616

13-121.268-154.751278.666-18.080

14-68.164-175.74233.614-58.972

15-40.817-186.957-61.825-49.482

16-35.460-190.160-71.343-34.209

上表为主成分得分。Prin1得分中3,4,5时段得分数大,表明该三个时段桃蚜和苹毛金龟

子高发高发;Prin2得分中4,9,10,11时段得分数大,表明上述时段梨网蝽高发(与实

际情况略有差异,因有第1主成分影响);Prin3得分中第12,13时段得分大,表明这2

时段舟形毛虫和桃小食心虫高发Prin4得分中6,7,10时短得分大,表明这三时段山楂木

虱,梨小食心虫高发。

应用实例

例1利用武汉市2005年五月份的每天的各监测站平均的SO2监测值与每天早上八点钟的风

力、气温、三小时降水作主分量分析。

解:采用以下程序

datatemperat;

inputdateemo_so2dd_08t_08r_08;

cards;

2005050122.43123.0018.00

2005050245.57220.400.00

2005050370.14122.300.00

2005050447.14222.907.00

2005050542.29123.000.10

2005050634.57117.800.00

2005050746.14120.000.00

2005050827.86121.600.00

2005050945.57121.900.00

2005051053.57124.200.00

2005051143.29224.100.00

2005051254.71225.200.00

2005051349.29023.800.70

2005051426.57220.700.00

2005051520.71220.000.00

2005051626.14122.0013.00

2005051718.29220.3018.00

2005051830.29220.200.00

2005051933.00220.400.20

2005052046.14020.900.00

2005052124.14119.205.00

2005052226.14121.100.00

2005052342.71023.300.30

2005052445.00022.300.00

2005052548.29224.100.00

2005052652.86223.800.00

2005052752.29024.300.00

2005052833.43125.300.00

2005052944.14224.300.00

2005053054.86227.600.00

2005053130.71125.900.00

;

procprincompcovout=prin;

vardd_08t_08r_08;

procprintdata=prin;

run;

执行上述程序,输出的结果是:

EigenvaluesoftheCovarianceMatrix

EigenvalueDifferenceProportionCumulative

125.667416820.93505960.82990.8299

24.73235734.20413130.15300.9829

30.52822590.01711.0000

Eigenvectors

Prin1Prin2Prin3

dd_080.0107590.0033340.999937

t_08-.0747310.997201-.002521

r_080.9971460.074699-.010978

上表给出所考察变量样本协差阵的特征值25.6674168、4.7323573、0.5282259和特征向

量(0.010759,-0.074731,0.997146)、(0.003334,0.997201,0.074699)和(0.999937,

-0.002521,-0.010978)。因此第一、二、三主分量分别是

prin1=0.010759*dd_08-0.074731*t_08+0.997146*r_08,

prin2=0.003334*dd_08+0.997201*t_08+0.074699*r_08,

prin3=0.999937*dd_08-0.002521*t_08-0.010978*r_08

由于第1个主分量累计贡献为0.8299,前两个第2个主分量所占比例为0.9829,所以对

三个主分量只需要考虑前两个。

例2

资料:江苏省沿江七市1954—2002年梅雨量。资料说明:NJ-南京,SZ-苏州,WX-无锡,CZ-

常州,ZJ-镇江,YZ-扬州,NT-南通。所用资料省略了小数位,6234表示623.4毫米。

方法:主分量分析。

程序主体如下:

DATAMEIYU;

TITLE'MEIYUINSELECTEDCITIES';

INPUTYEARNJSZWXCZZJYZNT;

CARDS;

827993300

1955212

4254502967

19572611

2628

746969

529931

1231017

933872341

115481470

5881203

423791474988894

61961602

312672287

327061756

19696796

19765934

114111019

1972448

7913851347

530342718

239364951

52533289

3227

415

26492619

1983341469

717932513

134283287

822893374299322082520

723772569

6257822682

024084870

435463905

966588251199511

524381403

18211553

567778339821777047349

62739

525231937

1887

429582493222314282919

447803060

006884

921731501

40

251552

2399272272242334

228222236873821905

;

procprincompcovoutstat=prinout=temp1;

VARNJSZWXCZZJYZNT;

procprintdata=temp1;

run;

结果分析:

第一主分量解释了总方差的80.7%,第二主分量解释了12.7%,二者共解释了总方差的

93.3%。故只需分析第一和第二主分量就可概括江苏沿江地区梅雨的时空演变情况。

从第一主分量可以看出,第一特点是量值均为正数,表明江苏沿江地区梅雨量异常具有

同步性,第二特点是宁、镇、扬地区为高值区,苏、锡、常、通为低值区。结合第一主分量

时间系数曲线(图1)中存在的年际振荡的趋势,江苏沿江地区梅雨似乎存在一个振幅周期

约3~4年的振荡。图中黑粗线为三阶多项式回归趋势线。可以看出,上世纪50年代末至

70年代初为梅雨量较少的时期,70年代中后期至90年代初梅雨量偏多,1993年以来,梅

雨量偏少,1999年以后似乎有回升的趋势。由于资料只到2002年,但我们知道2003年为

我省历史罕见的梅雨量异常多年,因此也印证了这种变化趋势。

从第二主分量可以看出,宁、镇、扬地区为负值,苏、锡、常、通地区正值,表明这两

个地区梅雨季节旱涝异常呈反位相。即当宁、镇、扬地区偏旱时,苏、锡、常、通地区偏涝;

当宁、镇、扬地区偏涝时,苏、锡、常、通地区偏旱。也就是东旱西涝或东涝西旱。结合第

二主分量时间系数曲线(图2)可以看出上世纪80年代末以前,以负系数为主(即东旱西涝),

而90年代至今以正系数为主(既东涝西旱)。

综上可将江苏省沿江地区梅雨降水分成四种型:旱型、涝型、东旱西涝型、东涝西旱型。

例3全国14个城市的春夏秋冬四季的平均季雷暴日如下表,试用主分量分析方法分析。

SpringSummerAutumnWinter

Shanghai6.919.75.40.3

Ganyu5.129.73.40.1

Xuzhou5.526.12.90.1

Sheyang5.727.85.30.2

Qingjiang6.029.54.10.4

Xuyi6.333.04.10.2

Dongtai5.429.54.80.4

Gaoyou5.730.15.80.4

Qidong6.423.45.80.2

Nantong6.425.75.30.3

Nanjing7.425.93.20.4

Wujin7.726.64.20.3

Liyang9.131.75.40.5

Wuxian10.029.67.20.4

Sas程序如下:

datatemperat;

inputcitySpringSummerAutumnWinter;

cards;

Shanghai6.919.75.40.3

Ganyu5.129.73.40.1

-10000

-5000

0

5000

10000

15000

20000

4941999

图1江苏省沿江地区梅雨降水第一主分量的时间系数

-5000

-4000

-3000

-2000

-1000

0

1000

2000

3000

4000

4941999

图2江苏省沿江地区梅雨降水第二主分量的时间系数

Xuzhou5.526.12.90.1

Sheyang5.727.85.30.2

Qingjiang6.029.54.10.4

Xuyi6.333.04.10.2

Dongtai5.429.54.80.4

Gaoyou5.730.15.80.4

Qidong6.423.45.80.2

Nantong6.425.75.30.3

Nanjing7.425.93.20.4

Wujin7.726.64.20.3

Liyang9.131.75.40.5

Wuxian10.029.67.20.4

;

procprincompdata=temperatcovoutstat=prin;

varSpringSummerAutumnWinter;

procprintdata=prin;

run;

运行结果如下:

EigenvaluesoftheCovarianceMatrix

EigenvalueDifferenceProportionCumulative

112.12309309.47492280.77720.7772

22.64817021.83097510.16980.9470

30.81719510.80774130.05240.9994

40.00945380.00061.0000

Eigenvectors

Prin1Prin2Prin3Prin4

Spring0.0320460.818308-.572537-.039334

Summer0.999396-.0203880.027273-.006921

Autumn-.0108230.5727200.819409-.021051

Winter0.0079570.044148-.0050870.998980

结果分析:

上表给出协差阵的四个特征值分别为:

λ1=12.1230930

λ2=2.6481702

λ3=0.8171951

λ4=0.0094538

对应得特征向量分别为:

(0.032046,0.999396,-.010823,0.007957)

(0.818308,-.020388,0.572720,0.044148)

(-.572537,0.027273,0.819409,-.005087)

(-.039334,-.006921,-.021051,0.998980)

由此可得:

Prin1=0.032046Spring+0.818308Summer-.572537Autumn-.039334Winter

Prin2=0.999396Spring-.020388+Summer+0.027273Autumn-.006921Winter

Prin3=-.010823Spring+0.572720Summer+0.819409Autumn-.021051Winter

Prin4=0.007957Spring+0.044148Summer-.005087Autumn+0.998980Winter

Spring和Summer两个主分量解释了总方差的94.7%,可见只要用Spring和Summer两个主

分量就能很好的概括这组数据。

例4对徐州,丰县和沛县三个地区从1961年到2000年逐年降水资料进行主分量分析。单

位:mm

程序如下:

datarain;

inputyearc1c2c3;

cards;

1961727.8586.8591.5

19621013.11285.41049.6

19631213.4855.91367.4

1964969.51103.31109.9

19651633.91158.61144.5

1966611.7462.7570.7

1967756.3842.7729.9

1968638.9486.7550.0

1969969.6779.6744.1

1970793.6792.9721.7

1971887.6801.31178.9

1972873.0673.0671.4

1973808.5899.7865.9

19741021.3848.2897.9

19751065.1639.21450.0

1976784.81335.4630.2

1977695.3747.4742.8

19782123.21610.3628.0

1979946.4922.3959.2

1980776.51260.0646.9

1981612.2418.6492.4

19821017.3558.9546.3

1983950.3563.7556.5

1984952.3818.1657.3

1985966.61098.3992.1

1986681.6431.2630.7

1987620.5694.5686.6

1988500.6352.0425.9

1989692.7541.6632.6

19901088.8803.1833.4

1991781.9664.4649.9

1992888.6875.5899.2

1993836.4800.2745.8

1994821.11533.9609.9

1995825.3651.3719.7

1996965.8673.0659.7

1997794.3427.9591.0

19981128.9725.6819.9

1999641.9462.4571.6

2000979.6630.4865.4

procprincompdata=raincovoutstat=prin;

varc1c2c3;

procprintdata=prin;

例5回归分析

一资料选取及分析方法简介

资料选取的是1951---1980年(计29年)的气象资料。该回归方程的目的是研究长江

中下游六月份降水量Y与六个预报因子X1---X6之间的相关关系。其中,X1:12月份北半

球极涡强度。X2:9月份40°N,90°W、100°W两点500hPa高度平均。X3:10月份65°

N,150°E-170°W五点500hPa高度平均。X4:12月份180°E-160°W三点500hPa高度平

均。X5:2月份80°N、150°W-130°W;70°N、150°W-130°W六点500hPa高度平均。X6:

9月份月平均温度减20度。以上因子X1—X6中凡是高度平均均以减去5000位势米。(注:

该资料中的数据已标准化)

分析方法是正交组合的多元回归。该方法是将M个因子正交组合成M个新的因子(即相

互独立的主成份),然后利用正交组合因子建立多元回归方程。具体做法:用SAS的PROC

PRINCOMP过程来分析组合因子,这里组合因子为主分量分析中的主成份,用主成份来建立

回归的好处:一、由于组合因子彼此正交,建立回归方程时特别简单,回归系数可以独立计

算。而且由于信息的重新组合分配,使预报因子减少成为可能。二、因为组合因子和预报对

象的相关系数与原变量与预报对象的相关系数有很大的不同。三、组合因子能利用到各个原

始变量的信息,这与逐步回归有很大不同。把PROCPRINCOMP得到的主成份与预报对象组成

新的数据来用正交回归PROCORTHOREG过程建立回归方程,同时根据组合因子与预报对象Y

的相关系数来对预报因子进行取舍。

二程序

(一)组合因子的产生-----PROCPRINCOMP过程

1程序PROCPRINCOMP

DATAchang;

TITLE'长江中下游6月份降水量的Y与X1,X2,X3,X4,X5,X6的关系之组合因子分析';

INPUTx1x2x3x4x5x6;

cards;

-0.362-0.684-0.6721.3050.529-0.548

1.8341.0042.328-1.685-0.4400.284

-2.0690.1601.256-1.570-2.217-0.363

-1.0941.8482.1130.385-0.1861.208

0.8581.004-0.2441.880-1.5712.040

-0.1180.582-1.1010.2700.529-0.271

0.126-1.106-0.030-0.4201.337-1.842

1.3460.582-1.101-0.075-1.086-0.641

-0.362-1.1061.256-0.6502.3060.191

-0.6061.004-0.244-0.535-1.0860.746

0.126-1.1060.1850.6151.3370.284

0.370-1.106-0.0300.6150.8520.284

-0.3622.270-0.672-0.880-1.2470.653

0.6140.160-0.4581.5350.3681.670

0.370-1.106-0.8870.960-0.117-0.918

0.126-1.106-0.887-0.190-0.440-1.295

2.078-0.684-1.5300.615-0.2780.099

-0.118-1.528-0.2441.075-0.1170.469

1.3460.1600.613-1.6851.6601.116

0.6140.1600.1850.500-0.601-0.271

-1.582-0.262-1.3151.075-0.117-0.548

-0.3620.160-0.6720.6150.045-1.288

-0.3620.160-0.458-1.110-0.601-1.658

-1.094-1.1061.042-1.5700.206-0.179

-1.826-1.106-1.315-0.650-0.2782.040

1.346-0.2620.613-1.340-0.117-1.011

0.1260.5821.2560.6151.014-0.086

-0.3621.4260.1850.1550.2060.653

-0.6061.0040.8280.1551.014-0.918

;

PROCPRINCOMPdata=changoutstat=prin;

varx1x2x3x4x5x6;

procprintdata=prin;

run;

2PROCPRINCOMP过程产生的新的组合因子

-1.4925-0.6176-0.29760.37460.68700.153979.5

2.18921.84331.6778-1.1641-0.37730.6343293.7

2.29390.7754-2.49810.2298-0.45341.0176389.2

2.9734-0.36860.19471.10601.04230.7352298.9

1.1684-2.84551.37770.0285-0.0918-0.7724274.3

-0.4992-0.5474-0.1514-.21400.4159-1.0957172.7

-1.78371.6799-0.4490-0.40520.4572-.138055.5

0.0030-0.83120.0536-2.0415-0.2565-0.1761221.8

-0.85532.14600.75101.6607-0.0109-0.2100211.9

1.5047-0.7266-0.56840.0152-0.4221-0.4042191.9

-1.32590.37840.75881.00340.02420.1837156.5

-1.23780.07010.69340.6076-0.20170.343086.3

2.2302-1.0085-.5056-0.7373-.1537-1.2968305.5

-0.2875-1.66291.54550.8072-0.0486-0.0495137.4

-1.6780-0.5551-0.3623-0.56750.11800.5770137.5

-1.22630.1116-1.0241-0.9073-0.43900.3318130.4

-1.4191-1.16841.0846-1.4798-0.91610.078771.6

-1.1812-0.72490.02850.7725-0.45921.0031139.4

0.33571.63121.94450.1894-1.0951-1.1240197.2

0.1120-0.37020.2667-0.66900.29330.6062218.0

-0.9717-1.1936-1.59040.60610.6363-0.3143182.1

-0.7718-0.2807-0.8398-0.53740.9663-0.2113159.8

0.03240.7587-1.5532-1.26580.0750-0.3341155.3

0.25751.8429-0.99760.8823-0.80080.3803268.4

-0.0028-1.0067-1.15701.9727-2.0726-0.6394243.3

0.05481.55140.3334-1.5011-0.46950.3055143.4

0.21390.63850.97570.55811.26990.1265115.1

1.0972-0.42690.37750.42170.6268-0.7700238.2

0.27260.9013-0.07230.28441.6478-0.5158219.1

(二)组合因子建立回归方程----PROCORTHOREG

datapreci;

inputz1-z6y;

cards;

-1.4925-0.6176-0.29760.37460.68700.153979.5

2.18921.84331.6778-1.1641-0.37730.6343293.7

2.29390.7754-2.49810.2298-0.45341.0176389.2

2.9734-0.36860.19471.10601.04230.7352298.9

1.1684-2.84551.37770.0285-0.0918-0.7724274.3

-0.4992-0.5474-0.1514-.21400.4159-1.0957172.7

-1.78371.6799-0.4490-0.40520.4572-.138055.5

0.0030-0.83120.0536-2.0415-0.2565-0.1761221.8

-0.85532.14600.75101.6607-0.0109-0.2100211.9

1.5047-0.7266-0.56840.0152-0.4221-0.4042191.9

-1.32590.37840.75881.00340.02420.1837156.5

-1.23780.07010.69340.6076-0.20170.343086.3

2.2302-1.0085-.5056-0.7373-.1537-1.2968305.5

-0.2875-1.66291.54550.8072-0.0486-0.0495137.4

-1.6780-0.5551-0.3623-0.56750.11800.5770137.5

-1.22630.1116-1.0241-0.9073-0.43900.3318130.4

-1.4191-1.16841.0846-1.4798-0.91610.078771.6

-1.1812-0.72490.02850.7725-0.45921.0031139.4

0.33571.63121.94450.1894-1.0951-1.1240197.2

0.1120-0.37020.2667-0.66900.29330.6062218.0

-0.9717-1.1936-1.59040.60610.6363-0.3143182.1

-0.7718-0.2807-0.8398-0.53740.9663-0.2113159.8

0.03240.7587-1.5532-1.26580.0750-0.3341155.3

0.25751.8429-0.99760.8823-0.80080.3803268.4

-0.0028-1.0067-1.15701.9727-2.0726-0.6394243.3

0.05481.55140.3334-1.5011-0.46950.3055143.4

0.21390.63850.97570.55811.26990.1265115.1

1.0972-0.42690.37750.42170.6268-0.7700238.2

0.27260.9013-0.07230.28441.6478-0.5158219.1

;

procorthoregdata=preci;

modely=z1-z6;

run;

三回归结果及分析

(一)PROCPRINCOMP过程结果及分析:

1结果

长江中下游6月份降水量的Y与X1,X2,X3,X4,X5,X6的关系之组合因子分析13

20:41Sunday,May3,1998

PrincipalComponentAnalysis

29Observations

6Variables

SimpleStatistics

X1X2X3

Mean-0.000172414-0..000000000

StD1...017603979

X4X5X6

Mean0..-0.003448276

StD1..99704825321.022061348

长江中下游6月份降水量的Y与X1,X2,X3,X4,X5,X6的关系之组合因子分析14

20:41Sunday,May3,1998

PrincipalComponentAnalysis

CorrelationMatrix

X1X2X3X4X5X6

X11.00000.0118-.02700.06360.1135-.0144

X20.01181.00000.2637-.0867-.37340.2639

X3-.02700.26371.0000-.41870.18880.1039

X40.0636-.0867-.41871.00000.05680.1658

X50.1135-.37340.18880.05681.0000-.0929

X6-.01440.26390.10390.1658-.09291.0000

长江中下游6月份降水量的Y与X1,X2,X3,X4,X5,X6的关系之组合因子分析15

20:41Sunday,May3,1998

PrincipalComponentAnalysis

EigenvaluesoftheCorrelationMatrix

EigenvalueDifferenceProportionCumulative

PRIN11.600370.1915480.2667280.26673

PRIN21.408820.3076110.2348040.50153

PRIN31.101210.1480790.1835350.68507

PRIN40.953130.3736490.1588550.84392

PRIN50.579480.2225020.0965810.94050

PRIN60.35698.0.0594971.00000

长江中下游6月份降水量的Y与X1,X2,X3,X4,X5,X6的关系之组合因子分析16

20:41Sunday,May3,1998

PrincipalComponentAnalysis

Eigenvectors

PRIN1PRIN2PRIN3PRIN4PRIN5PRIN6

X1-.122551-.0313110.585940-.779020-.1643880.082339

X20.6127440.2785840.101450-.1933060.538883-.457036

X30.500793-.4925720.2625760.1588300.2080340.607560

X4-.3991540.5260260.3024650.1990080.5286080.391735

X5-.330872-.5047860.4700050.3226400.236233-.504883

X60.2998870.3837580.5148270.432242-.551426-.082743

长江中下游6月份降水量的Y与X1,X2,X3,X4,X5,X6的关系之组合因子分析17

20:41Sunday,May3,1998

OBS_TYPE__NAME_X1X2X3X4X5X6

1MEAN-0.0002-0.00010.00000.00030.0312-0.0034

2STD1.01781.01761.01761.01760.99701.0221

3N29.000029.000029.000029.000029.000029.0000

4CORRX11.00000.0118-0.02700.06360.1135-0.0144

5CORRX20.01181.00000.2637-0.0867-0.37340.2639

6CORRX3-0.02700.26371.0000-0.41870.18880.1039

7CORRX40.0636-0.0867-0.41871.00000.05680.1658

8CORRX50.1135-0.37340.18880.05681.0000-0.0929

9CORRX6-0.01440.26390.10390.1658-0.09291.0000

10EIGENVAL1.60041.40881.10120.95310.57950.3570

11SCOREPRIN1-0.12260.61270.5008-0.3992-0.33090.2999

12SCOREPRIN2-0.03130.2786-0.49260.5260-0.50480.3838

13SCOREPRIN30.58590.10150.26260.30250.47000.5148

14SCOREPRIN4-0.7790-0.19330.15880.19900.32260.4322

15SCOREPRIN5-0.16440.53890.20800.52860.2362-0.5514

16SCOREPRIN60.0823-0.45700.60760.3917-0.5049-0.0827

2结果分析

在表EIGENVALUESOFTHECOVARIANCEMATRIX中给出协方差阵的六个特征值分别为:

λ1=1.6004,λ2=1.4088,λ3=1.1012,λ4=0.9531,λ5=0.5795,λ6=0.3570,

对应的特征向量分别为:

t1=(-0.1226,0.6127,0.5008,-0.3992,-0.3309,0.2999)

t2=(-0.0313,0.2786,-0.4926,0.5260,-0.5048,0.3838)

t3=(0.5859,0.1015,0.2626,0.3025,0.4700,0.5148)

t4=(-0.7790,-0.1933,0.1588,0.1990,0.3226,0.4322)

t5=(-0.1644,0.5389,0.2080,0.5286,0.2362,-0.5514)

t6=(0.0823,-0.4570,0.6076,0.3917,-0.5049,-0.0827)

由此可以得到第一主分量

PRIN1=-0.1226X1+0.6127X2+0.5008X3-0.3992X4-0.3309X5+0.2999X6

第二、三个等主分量类似可得。再将观测值代入主分量即得到新的组合因子(组合因

子即主成份)。

(二)PROCORTHOREG过程结果:

1结果

TheSASSystem21:49Sunday,May3,19984

ORTHOREGRegressionProcedure

DependentVariableY

SumofSquaredErrors38506.064031

DegreesofFreedom22

MeanSquaredError1750.2756378

RootMeanSqrError41.836295699

R-square0.7803532718

VariableDFParameterEstimateStdErrorT-RatioProb>|t|

INTERCEP1189.4.8.280.0001

z1150.68782492178346..370.0001

z211.374.7049073770.200.8395

z31-14.687.5323693735-1.950.0641

z4111.219.1.380.1802

z51-13.798983212231910.340164965-1.330.1957

z61-0.27313.242646135-0.020.9871

2SAS结果分析

由上述程序的结果我们可以看出正交回归统计分析ORTHOREG过程给出两个表:方差

分析表、参数估计表。

方差分析表:残差平方和SE=38506.064031也给出自由度为22,则平均平方

和MSE=SE/22=1750.2756378。表中还给出复相关系数平方R2=0.7803532718。

参数估计表:它给出常数项和回归系数分别是189.4

50.68782492178341.374-14.6811.219

-13.7989832122319-0.273;它们的标准误差是7.8.

6.7049073777.53236937358.1.34016496513.242646135。t值为:

24.288.370.20-1.951.38-1.33-0.02并给出t分布随机变量大于它们的概率

是0.00010.00010.83950.06410.18020.19570.9871。前两个概率小于0.05,

说明截距和Z1的作用明显,不能舍去。

由参数估计表的参数估计值结果可以得到回归方程:

Y=189.41+50.69Z1+1.38Z2-14.69Z3+11.22Z4-13.80Z5-0.22Z6

3物理意义分析

原预报因子与预报对象的相关系数Riy为-.36410.54920.4458-.4197-.5036

0.3761(Riy的求法:将原预报因子X1-X6与预报对象Y建立一个数据集从而可用PROC

PRINCOMP过程来计算,该过程的其中一个结果为各个因子之间的相关系数,这样可从中提

取出X1-X6与Y的相关系数。)可见,预报因子与预报对象的关系为:X1:12月份北半球

极涡强度,X4:12月份180°E-160°W三点500hPa高度平均及X5:2月份80°N、150°W-130°

W;70°N、150°W-130°W六点500hPa高度平均这三个因子与预报对象Y是反相关。而X2:

9月份40°N,90°W、100°W两点500hPa高度平均,X3:10月份65°N,150°E-170°W

五点500hPa高度平均,X6:9月份月平均温度减20度这三个因子与预报对象正相关。组合

因子与预报对象的相关系数R*

iy为0.83940.0203-.19610.1393-.1332-.0422

(R*

iy的求法与Riy的求法相同,不同的是原预报因子变为组合因子Z1-Z6来建立数据集)组

合因子Z1、Z2、Z4与预报因子呈正相关,组合因子Z3、Z5及Z6与预报对象呈负相关。(注:

由于篇幅有限这里不打印出这两个相应的相关矩阵)。

由上述的结果可以看出:原预报因子与预报对象的关系都非常密切,相关系数相差不

多,这是因为原预报因子之间有正交、彼此不相互独立。而组合因子与预报对象的关系则大

不相同,第一个组合因子(即第一主分量)明显与预报因子的相关系数达到0.839,所以用

组合因子,只要选取一个组合因子就可以预报六月份长江中下游的降水。可见用组合因子建

立回归方程,预报方程的显著性大大提高了。

根据参数估计的t检验及气象中应用的相关系数R*

iy可以看出组合因子中的Z1对预报

方程的贡献最大,故方程可简化为:

Y=189.41+50.69Z1

根据预报对象Y及预报因子Z1画图,则图象如下:

如上图,Y与Z1近似呈线性且斜率为正值。故可以看出用一个组合因子Z1就可以很好

的建立这个线性回归方程。

四总结

回归分析是气象研究和预报中经常使用的相关方法。我们可以用回归方法来建立相关预

报方程(降水、气温、体感温度、湿度等预报方程)。方法可采用逐步回归法、选择最优的

变量子集法、正交组合的多元回归等,本文采用正交组合的多元回归。该方法简化了方程、

提高了预报的显著性。

本文采用了1951---1980年(计29年)的气象资料(五个高度长资料一个温度长资料)

及正交组合的多元回归方法调用SAS中的PROCPRINCOMP和PROCORTHOREG来建立相应的

回归方程。从而经分析简化最终得到方程:

Y=189.41+50.69Z1,

其中,Z1用原始变量表示为:Z1=-0.1226X1+0.6127X2+0.5008X3-0.3992X4-0.33

09X5+0.2999X6

👁️ 阅读量:0