Yishuo School District (33) | SPSS Statistical Analysis (43): Binary Logistic Regression Analysis

LearningYard学苑

"Share interests, spread happiness, broaden your knowledge, and leave something good behind!" Hello everyone, this is the editor. Welcome back to LearningYard; we will do our best to keep bringing you more and better content.

All the regression analyses covered so far used quantitative variables only. In practice, however, the dependent variable can be either quantitative or qualitative. Examples of qualitative dependent variables include negative versus positive test results in medicine, survival versus death, whether or not a purchase takes place in consumer research, and whether or not an IPO is approved in finance.

Many statistical methods can handle a qualitative dependent variable, such as discriminant analysis, Probit analysis, Logistic regression analysis and log-linear analysis. In the social sciences, Logistic regression is the most widely used. Depending on the number of categories the dependent variable takes, Logistic regression is divided into binary Logistic regression and multinomial Logistic regression. In a binary Logistic regression model the dependent variable can take only two values, 1 and 0 (a dummy variable). We use an example to give a brief introduction to the binary Logistic regression model.
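For reference, the binary Logistic model is usually written in the following form. The symbols p, \beta_0 and \beta_j are notation introduced here rather than taken from the article, and k is the number of independent variables.

$$
p = P(y = 1 \mid x_1, \dots, x_k) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}},
\qquad
\ln\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
$$

The second expression is the log-odds (logit), which is linear in the independent variables even though p itself is not.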

Identifying financial companies that are operating poorly is an important function of audit review, and misclassification in an audit can have disastrous consequences. The table below lists selected operating financial ratios for 66 companies, 33 of which went bankrupt two years later (y=0), while the other 33 remained solvent over the same period (y=1). Fit a Logistic regression model using the variables x1 (retained earnings/total assets), x2 (pre-tax profit/total assets) and x3 (sales/total assets).

Step 1

Analyze and organize the data. There are three independent variables, all quantitative. The dependent variable is qualitative and takes two states (0 and 1), which makes this a typical problem for binary Logistic regression. Define the three independent variables x1, x2 and x3, then the dependent variable y, enter the data and save the file.
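Outside SPSS, the same data layout can be sketched in a few lines of Python. The rows below are hypothetical placeholders (the 66 real records are not reproduced in this article), and the helper name check_binary_outcome is made up for illustration. The point is only that each record holds x1, x2, x3 and a y that must be 0 or 1.

```python
from collections import Counter

# HYPOTHETICAL rows for illustration only: (x1, x2, x3, y).
# The real example uses 66 companies whose values are not reproduced here.
rows = [
    (-0.30, -0.10, 1.2, 0),
    (0.25, 0.12, 1.8, 1),
    (-0.05, 0.02, 0.9, 0),
    (0.40, 0.20, 2.1, 1),
]

def check_binary_outcome(rows):
    """Return class counts after verifying that y only takes the values 0 and 1."""
    counts = Counter(y for *_, y in rows)
    unexpected = set(counts) - {0, 1}
    if unexpected:
        raise ValueError(f"y must be 0 or 1, found: {sorted(unexpected)}")
    return counts

counts = check_binary_outcome(rows)
print(dict(counts))
```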

Step 2

Set up the binary Logistic regression analysis. Choose the menu "Analyze -> Regression -> Binary Logistic" to open the binary Logistic regression dialog box, and configure it as shown in the figure below.
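SPSS does the fitting from the dialog box, so no code is needed to follow this tutorial. For readers curious about what such a fit involves, here is a minimal sketch of maximum-likelihood estimation by Newton-Raphson with a single predictor. The toy data and the function name fit_logistic_1d are hypothetical, and this is not a description of the exact algorithm SPSS uses.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_1d(xs, ys, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (score) and the entries of the 2x2 information matrix.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient,
        # with the 2x2 inverse written out by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# HYPOTHETICAL toy data (not the 66-company data set); the classes overlap,
# so the maximum-likelihood estimate is finite.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic_1d(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

With three predictors, as in the example, the same idea applies but the information matrix is 4x4, so a linear-algebra routine would replace the hand-written inverse.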

Step 3

Main results and their analysis.

The figure below is the case processing summary, which gives the number of records entered into the model.

The figure below shows the coding of the dependent variable. By default, SPSS assigns the value 1 to the category of a binary variable that occurs more often. This example is a special case because both categories occur equally often. As the table shows, "bankrupt after two years" is coded 0 and "still solvent after two years" is coded 1.

The figure below is the initial classification table. At this stage the model contains no independent variables, only the constant term. The left side of the table shows the observed values, and the right side shows the model's predicted values and the percentage correct. Here the model predicts that every company will still be solvent after two years, so the prediction accuracy is 50%.
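The 50% figure follows directly from the class counts. A small sketch, using only the 33/33 split stated in the example (the variable names are chosen here for illustration):

```python
import math

n_solvent, n_bankrupt = 33, 33  # class counts stated in the example
n = n_solvent + n_bankrupt

# Constant-only model: the fitted probability of y=1 is the sample proportion.
p_hat = n_solvent / n
b0 = math.log(p_hat / (1 - p_hat))  # constant term on the log-odds scale

# Every case receives the same prediction, so the accuracy equals
# the share of whichever class is predicted.
baseline_accuracy = max(n_solvent, n_bankrupt) / n

print(b0, baseline_accuracy)
```

With equal class sizes the constant term is exactly zero, which matches the coefficient of 0.000 reported for the constant at this stage.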

The next two tables give the test results at this stage. The constant term has a coefficient of 0.000 and a significance (p-value) of 1, so the constant term is not significant. The p-values for x1, x2 and x3 are 0.000, 0.000 and 0.094 respectively. At the 5% significance level, x1 and x2 are significant while x3 is not.
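The p-value of 1 for the constant can be checked by hand. The sketch below assumes the usual Wald statistic (B / S.E.)^2 with one degree of freedom, and uses the fact that in a constant-only model the standard error of the constant is sqrt(1/n1 + 1/n0). The function name wald_test is made up for illustration. The p-values for x1, x2 and x3 are simply copied from the text and compared with 0.05.

```python
import math

def wald_test(b, se):
    """Return the Wald chi-square (1 df) and its upper-tail p-value."""
    wald = (b / se) ** 2
    # For 1 degree of freedom, P(chi2 > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

# Constant-only model with 33 cases in each class.
b0 = 0.0
se0 = math.sqrt(1 / 33 + 1 / 33)
wald0, p0 = wald_test(b0, se0)
print(f"S.E. = {se0:.3f}, Wald = {wald0:.3f}, p = {p0:.3f}")

# p-values reported in the text, compared with the 5% significance level.
alpha = 0.05
reported_p = {"x1": 0.000, "x2": 0.000, "x3": 0.094}
significant = {name: p < alpha for name, p in reported_p.items()}
print(significant)
```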

The figure below shows the Omnibus Tests of Model Coefficients. Three likelihood-ratio tests are reported: between steps (Step), between blocks (Block) and between models (Model). Because this example has only one block of independent variables and all of them are entered with the forced-entry (Enter) method, the three tests give identical results, and the model is statistically significant.
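A likelihood-ratio test of this kind compares the -2 log-likelihood of the constant-only model with that of the fitted model. The sketch below computes the constant-only value from the 33/33 split. The fitted model's -2 log-likelihood is not given in the text, so the value used here is a hypothetical placeholder, and the function name chi2_sf_df3 is made up. With three independent variables the statistic has 3 degrees of freedom.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

n = 66
# Constant-only model: every case has fitted probability 0.5.
null_deviance = -2.0 * n * math.log(0.5)

# HYPOTHETICAL placeholder: the article does not state the fitted model's -2LL.
model_deviance = 10.0

lr_chi2 = null_deviance - model_deviance
p_value = chi2_sf_df3(lr_chi2)
print(f"-2LL (constant only) = {null_deviance:.3f}")
print(f"LR chi-square = {lr_chi2:.3f}, p = {p_value:.3g}")
```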

The figure below is the model summary table, which mainly reports the -2 log-likelihood and two coefficients of determination (pseudo-R² measures). Judging from these values, the model fits reasonably well.
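In SPSS output the two coefficients in this table are the Cox & Snell R² and the Nagelkerke R². Writing D_0 for the -2 log-likelihood of the constant-only model, D_1 for that of the fitted model and n for the number of cases (notation introduced here, not in the original), they are defined as:

$$
R^2_{\mathrm{CS}} = 1 - \exp\!\left(-\frac{D_0 - D_1}{n}\right),
\qquad
R^2_{\mathrm{N}} = \frac{R^2_{\mathrm{CS}}}{1 - \exp\!\left(-D_0 / n\right)}
$$

The Cox & Snell value cannot reach 1, so the Nagelkerke version rescales it by its maximum possible value.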

The figure below is the classification table for the fitted model. The prediction accuracy has now reached 97%.
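The classification table itself is not reproduced in the text, but with 66 cases the reported 97% narrows down the number of correct predictions. A quick check, assuming the percentage is rounded to a whole number:

```python
n_cases = 66  # number of companies in the example

# Which counts of correct predictions display as 97% after rounding?
consistent = [k for k in range(n_cases + 1) if round(100 * k / n_cases) == 97]
print(consistent)
for k in consistent:
    print(f"{k}/{n_cases} = {100 * k / n_cases:.2f}%")
```

Under that rounding assumption, only 64 correct out of 66 (two misclassified companies) is consistent with the stated accuracy.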

Below are the estimation results of the Logistic model. From left to right, the table lists the coefficient (B) for each variable and the constant term, the standard error (S.E.), the Wald chi-square statistic, the degrees of freedom (df), the significance (p-value) and Exp(B). Because all the regression coefficients are positive, their exponentials are greater than 1, which means that the larger x1, x2 and x3 are, the higher the odds of "solvent after two years" relative to "bankrupt after two years".
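The link between the sign of B and Exp(B) can be seen with a few numbers. The coefficients below are hypothetical placeholders, chosen only to be positive like those in the example; they are not the fitted values from the SPSS output.

```python
import math

# HYPOTHETICAL coefficients, positive like those in the example.
coefficients = {"x1": 0.5, "x2": 1.0, "x3": 2.0}

# Exp(B) is the factor by which the odds of y=1 are multiplied
# when the variable increases by one unit, other variables held fixed.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: B = {coefficients[name]:.2f}, Exp(B) = {ratio:.3f}")
```

A positive B always gives Exp(B) above 1, a negative B gives a value between 0 and 1, and B = 0 gives exactly 1 (no effect on the odds).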

If the predicted probability p is less than 0.5, the case is assigned to the "bankrupt after two years" group; otherwise it is assigned to the "still solvent after two years" group. The predictions are shown in the figure below, where PRE_1 is the predicted probability and PGR_1 is the predicted group.
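The two saved columns can be mimicked with a short sketch: one function plays the role of PRE_1 (predicted probability) and the other of PGR_1 (predicted group). The coefficients, the two example companies and the function names are hypothetical, used only to show the 0.5 cutoff rule.

```python
import math

def predict_probability(b0, coefs, xs):
    """Predicted probability of y=1 (the role of PRE_1)."""
    z = b0 + sum(b * x for b, x in zip(coefs, xs))
    return 1.0 / (1.0 + math.exp(-z))

def classify(p, cutoff=0.5):
    """Predicted group (the role of PGR_1): 0 if p < cutoff, otherwise 1."""
    return 1 if p >= cutoff else 0

# HYPOTHETICAL constant and coefficients, not the fitted SPSS values.
b0, coefs = -1.0, (2.0, 3.0, 1.0)

# HYPOTHETICAL companies: (x1, x2, x3).
company_a = (0.3, 0.2, 0.5)
company_b = (-0.4, -0.1, 0.3)

for name, xs in (("A", company_a), ("B", company_b)):
    p = predict_probability(b0, coefs, xs)
    print(f"company {name}: PRE_1 = {p:.3f}, PGR_1 = {classify(p)}")
```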

Preview of the next issue: in this issue we learned the hands-on procedure for nonlinear regression. In the next issue we will learn about cluster analysis and discriminant analysis.

That's all for today. If you have thoughts of your own about today's article, feel free to leave us a message. See you tomorrow, and have a happy day!

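References: Baidu Baike (百度百科); 《SPSS 23 统计分析实用教程》

Translation: Baidu Translate (百度翻译)

This article is original content by LearningYard新学苑. Some text and images come from other sources; if there is any infringement, please contact us for removal.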
