当自由度趋于无穷大时,图形趋于正态分布
# 产生服从卡方分布的观测数为1000的样本
df_n <- seq(1, 5)
chisq_data <- function(n){
x <- seq(-0.2, 16, length=300)
prob <- dchisq(x, df=n[1, 1])
return(data.frame(x=x, prob=prob, df=n[1,1]))
}
require(plyr, quietly=TRUE)
require(ggplot2, quietly=TRUE)
data <- ddply(data.frame(n=df_n), .(n), chisq_data)
head(data)
xxxxxxxxxx
## n x prob df
## 1 1 -0.20000000 0.000000 1
## 2 1 -0.14581940 0.000000 1
## 3 1 -0.09163880 0.000000 1
## 4 1 -0.03745819 0.000000 1
## 5 1 0.01672241 3.059352 1
## 6 1 0.07090301 1.446043 1
xxxxxxxxxx
tail(data)
xxxxxxxxxx
## n x prob df
## 1495 5 15.72910 0.003186504 5
## 1496 5 15.78328 0.003117378 5
## 1497 5 15.83746 0.003049697 5
## 1498 5 15.89164 0.002983433 5
## 1499 5 15.94582 0.002918558 5
## 1500 5 16.00000 0.002855045 5
xxxxxxxxxx
ggplot(data, aes(x=x, y=prob, color=factor(df), group=df))+geom_line(lwd=1)+scale_y_continuous(limits=c(0, 0.7))
书上P98例7-2:表格为
组别 | 有效 | 无效 | 合计 |
---|---|---|---|
胞磷胆碱组 | 46 | 6 | 52 |
神经节苷酯组 | 18 | 8 | 26 |
合计 | 64 | 14 | 78 |
H0:两种药物疗效相同 H1:有效率不等
xxxxxxxxxx
table7_2 <- matrix(c(46, 18, 6, 8), nrow=2, ncol=2)
chisq.test(table7_2)
xxxxxxxxxx
## Warning in chisq.test(table7_2): Chi-squared approximation may be incorrect
xxxxxxxxxx
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table7_2
## X-squared = 3.1448, df = 1, p-value = 0.07617
得到warning "Chi-squared approximation may be incorrect" 因为表格中有T<5, 此时可以采用校正【自动校正】或者fisher.test() 可以用以下代码查看理论值
xxxxxxxxxx
chisq.test(table7_2)$expected
xxxxxxxxxx
## Warning in chisq.test(table7_2): Chi-squared approximation may be incorrect
xxxxxxxxxx
## [,1] [,2]
## [1,] 42.66667 9.333333
## [2,] 21.33333 4.666667
超几何分布 从一个有限总体中进行不放回抽样,设N件产品,有M件不合格品,若从中不放回地随机抽取n件,则其中含有的不合格品件数X服从超几何分布,记为X~h(n, N, M)
P100 例7-4
组别 | 感染 | 未感染 | 合计 |
---|---|---|---|
预防组 | 4 | 18 | 22 |
非预防组 | 5 | 6 | 11 |
合计 | 9 | 24 | 33 |
假设两组(预防组和非预防组)的感染率都是9:33【零假设】,则边缘值固定的情况下,相当于在总数33的所有个体中【有9个感染的,24个未感染的】,取22个值作为有效组,在这22个值中,记感染的人数为X,则X~h(22, 9, 24)。 H0: 两个组无查别
xxxxxxxxxx
# x代表取出来的白球数, k代表取的次数,m代表总白球数,n代表总黑球数
sum(dhyper(x=0:9, k=22, m=9, n=24))
xxxxxxxxxx
## [1] 1
xxxxxxxxxx
p_values <- dhyper(x=0:9, k=22, m=9, n=24)
p <- sum(p_values[p_values<=dhyper(x=4, k=22, m=9, n=24)])
print(p)
xxxxxxxxxx
## [1] 0.1210448
xxxxxxxxxx
fisher.test(matrix(c(4, 5, 18, 6), nrow=2, ncol=2))
xxxxxxxxxx
##
## Fisher's Exact Test for Count Data
##
## data: matrix(c(4, 5, 18, 6), nrow = 2, ncol = 2)
## p-value = 0.121
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.03974151 1.76726409
## sample estimates:
## odds ratio
## 0.2791061
xxxxxxxxxx
mat <- matrix(c(11, 2, 12, 33), nrow=2)
# 方法1:
if (mat[1,2]+mat[2,1]>=40) {
x_sq <- (mat[1,2]-mat[2,1])^2/(mat[1,2]+mat[2,1])
}else { x_sq <- (abs(mat[1,2]-mat[2,1])-1)^2/(mat[1,2]+mat[2,1]) }
cat("chi-squareed = ", x_sq,
" p-value = ",
pchisq(x_sq, df=1, lower.tail=FALSE),
sep="")
xxxxxxxxxx
## chi-squareed = 5.785714 p-value = 0.01615693
xxxxxxxxxx
# 方法2:
mcnemar.test(mat)
xxxxxxxxxx
##
## McNemar's Chi-squared test with continuity correction
##
## data: mat
## McNemar's chi-squared = 5.7857, df = 1, p-value = 0.01616
参考: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/mcnemar.test.html
xmat <- matrix(
c(431, 388, 495, 137, 490, 410, 587, 179, 902, 800, 950, 32),
nrow=4)
chisq.test(mat)
xxxxxxxxxx
##
## Pearson's Chi-squared test
##
## data: mat
## X-squared = 213.16, df = 6, p-value < 2.2e-16
xxxxxxxxxx
x_sq <- chisq.test(mat)$statistic[[1]]
ContCoef <- sqrt(x_sq/(x_sq+sum(mat)))
print(ContCoef)
xxxxxxxxxx
## [1] 0.1882638
方法2:
xxxxxxxxxx
# install.packages("DescTools")
mat <- matrix(
c(431, 388, 495, 137, 490, 410, 587, 179, 902, 800, 950, 32),
nrow=4)
DescTools::ContCoef(mat)
xxxxxxxxxx
## [1] 0.1882638
参考: https://www.rdocumentation.org/packages/DescTools/versions/0.99.19/topics/Association%20measures
双向有序分组变量的线性趋势检验
双向有序属性相同
单向有序