
Data Science LAB

[Python] One-Way Analysis of Variance (ANOVA)

ㅅ ㅜ ㅔ ㅇ 2022. 3. 16. 13:07
Analysis of Variance (ANOVA)
  • A statistical method for two or more groups that compares the variation between the group means with the variation within the groups
  • Tests whether the differences among the means of two or more groups are statistically significant

One-Way ANOVA
  • Used to examine the effect of a single categorical variable (factor) on the response variable
  • There is no limit on the number of populations (groups), and the groups do not need equal sample sizes
  • Uses the F test statistic
  • Observations must be independent, and the measurements in each group should be normally distributed
  • The groups are assumed to have equal variances (homoscedasticity)
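Independence comes from how the data were collected, but normality and equal variances can be checked from the data before running the ANOVA. A minimal sketch, assuming three simulated groups and a significance level of 0.05; Shapiro-Wilk (`scipy.stats.shapiro`) and Levene's test (`scipy.stats.levene`) are one common choice of checks, not the only one.

```python
import numpy as np
from scipy import stats

# Hypothetical example data: three groups of 10 draws with means 0, 1, 2 and sd 1
rng = np.random.default_rng(0)
groups = {name: rng.normal(loc=mean, scale=1, size=10)
          for name, mean in [('a', 0), ('b', 1), ('c', 2)]}

alpha = 0.05  # assumed significance level

# Normality: Shapiro-Wilk within each group (H0: the group comes from a normal distribution)
shapiro_p = {name: stats.shapiro(values).pvalue for name, values in groups.items()}
normal_ok = all(p > alpha for p in shapiro_p.values())

# Equal variances: Levene's test across all groups (H0: all group variances are equal)
levene_p = stats.levene(*groups.values()).pvalue
equal_var_ok = levene_p > alpha

print('Shapiro-Wilk p-values:', shapiro_p)
print('Levene p-value:', levene_p)
print('normality OK:', normal_ok, '| equal variances OK:', equal_var_ok)
```

A large p-value only means the test found no evidence against the assumption; with 10 observations per group these tests have limited power.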

 

์š”์ธ ์ œ๊ณฑํ•ฉ(SS) ์ž์œ ๋„(df) ํ‰๊ท ์ œ๊ณฑ(MS) ๋ถ„์‚ฐ๋น„(F)
์ฒ˜๋ฆฌ SSA k-1 MSA F = MSA/MSE
์˜ค์ฐจ SSE N-k MSE  
์ „์ฒด SST N-1    
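To make the table concrete, this sketch computes each entry directly from the definitions (SSA adds up n_i times the squared distance of each group mean from the grand mean; SSE adds up squared deviations from each group's own mean) and cross-checks F against `scipy.stats.f_oneway`. The three groups are made-up numbers with deliberately unequal sizes.

```python
import numpy as np
from scipy import stats

# Made-up example data; group sizes differ on purpose (3, 2, 3)
groups = [np.array([1.0, 2.0, 3.0]),
          np.array([4.0, 6.0]),
          np.array([7.0, 8.0, 9.0])]

k = len(groups)                      # number of groups
N = sum(len(g) for g in groups)      # total number of observations
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group (treatment) and within-group (error) sums of squares
SSA = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
SSE = sum(((g - g.mean()) ** 2).sum() for g in groups)
SST = ((all_values - grand_mean) ** 2).sum()   # equals SSA + SSE

MSA = SSA / (k - 1)   # treatment mean square
MSE = SSE / (N - k)   # error mean square
F = MSA / MSE

print(f'SSA={SSA:.2f}, SSE={SSE:.2f}, SST={SST:.2f}, F={F:.2f}')
# prints SSA=54.00, SSE=6.00, SST=60.00, F=22.50
print('scipy F:', stats.f_oneway(*groups).statistic)
```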

 

 

Null hypothesis (H0): The population means of the k groups are all equal (there is no difference among them).

Alternative hypothesis (H1): Not all of the k population means are equal.
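Under H0, F = MSA/MSE follows an F distribution with (k-1, N-k) degrees of freedom, so H0 is rejected when F exceeds the upper-α critical value, or equivalently when the p-value is below α. A minimal sketch using `scipy.stats.f`; `anova_decision` is a hypothetical helper name, and the inputs (F = 22.5, k = 3, N = 8, α = 0.05) are illustrative values, not results from this post.

```python
from scipy import stats

def anova_decision(F, k, N, alpha=0.05):
    """Hypothetical helper: (critical value, p-value, reject H0?) for a one-way ANOVA F."""
    dfn, dfd = k - 1, N - k                      # numerator / denominator degrees of freedom
    critical = stats.f.ppf(1 - alpha, dfn, dfd)  # upper-alpha quantile of F(dfn, dfd)
    p_value = stats.f.sf(F, dfn, dfd)            # P(F(dfn, dfd) >= observed F)
    return critical, p_value, p_value < alpha

critical, p_value, reject = anova_decision(F=22.5, k=3, N=8)
print(f'critical F = {critical:.3f}, p = {p_value:.4f}, reject H0: {reject}')
```

For these inputs the critical value is about 5.79 and the p-value is about 0.003, so H0 would be rejected at α = 0.05.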

 

Post-hoc tests

When the ANOVA rejects the null hypothesis, that is, when there is statistically significant evidence that at least one group mean differs from the others, a post-hoc analysis is run to determine which specific groups differ.

- Duncan's MRT (multiple range test), Fisher's least significant difference (LSD), Tukey's HSD (honestly significant difference), Scheffé's method, etc.
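As one example, Tukey's HSD is available in statsmodels as `pairwise_tukeyhsd`. A sketch, assuming simulated data in the same long format as the `df_1way` frame built below (one row per observation, columns `group` and `my_value`) and α = 0.05:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Simulated data in long format: 10 observations per group, group means 0, 1, 2
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'group': np.repeat(['a', 'b', 'c'], 10),
    'my_value': np.concatenate([rng.normal(m, 1, 10) for m in (0, 1, 2)]),
})

# Compare every pair of group means while controlling the family-wise error rate
tukey = pairwise_tukeyhsd(endog=df['my_value'], groups=df['group'], alpha=0.05)
print(tukey.summary())
```

The summary lists each pair of groups with its mean difference, adjusted p-value, confidence interval, and whether equality of that pair's means is rejected.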

 

 

 

import pandas as pd
import numpy as np
import statsmodels.formula.api as smf   # formula interface (smf.ols)
import statsmodels.api as sm            # sm.stats.anova_lm
from scipy import stats                 # f_oneway, kruskal

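# Simulate 10 observations for each of 3 groups: a ~ N(0, 1), b ~ N(1, 1), c ~ N(2, 1)
np.random.seed(0)  # fixed seed so the results are reproducible
group_list = ['a', 'b', 'c']
subs_list = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10']

rows = []
for ind_g, group in enumerate(group_list):
    for sub in subs_list:
        my_val = np.random.normal(ind_g, 1)  # one draw with mean ind_g, sd 1
        rows.append({'group': group, 'my_value': my_val})

# Build the frame in one step so my_value is float64; filling an empty
# DataFrame row by row with .loc can leave the column as object dtype
df_1way = pd.DataFrame(rows)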

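# Fit a linear model with group treated as a categorical predictor
my_model = smf.ols(formula='my_value ~ group', data=df_1way)
my_model_fit = my_model.fit()

# ANOVA table (type II sums of squares): sum_sq, df, F and PR(>F) for group and Residual
anova = sm.stats.anova_lm(my_model_fit, typ=2)
print(anova)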

 

 

Using scipy.stats (f_oneway)

# f_oneway takes one sample per group and returns the F statistic and p-value
stats.f_oneway(df_1way[df_1way['group'] == 'a'].my_value,
               df_1way[df_1way['group'] == 'b'].my_value,
               df_1way[df_1way['group'] == 'c'].my_value)
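Listing each group by hand gets verbose as the number of groups grows. One alternative, sketched here with simulated data in the same shape as `df_1way`, is to split the column with `groupby` and unpack the pieces into `f_oneway`, which accepts any number of sample arrays.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Simulated data shaped like df_1way
rng = np.random.default_rng(2)
df = pd.DataFrame({
    'group': np.repeat(['a', 'b', 'c'], 10),
    'my_value': np.concatenate([rng.normal(m, 1, 10) for m in (0, 1, 2)]),
})

# One array of my_value per group label (groupby iterates in sorted label order)
samples = [g['my_value'].to_numpy() for _, g in df.groupby('group')]
result = stats.f_oneway(*samples)
print(f'F = {result.statistic:.3f}, p = {result.pvalue:.4g}')

# Same test with each group selected explicitly, for comparison
explicit = stats.f_oneway(df[df['group'] == 'a'].my_value,
                          df[df['group'] == 'b'].my_value,
                          df[df['group'] == 'c'].my_value)
```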

 

 

If the normality test fails -> use the Kruskal-Wallis H test (a rank-based, nonparametric alternative to one-way ANOVA)

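from scipy import stats

# Kruskal-Wallis H test on two toy samples; it compares ranks and does not assume normality
x = [1, 3, 5, 7, 9]
y = [2, 4, 6, 8, 10]
stats.kruskal(x, y)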
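The rule above can be wrapped in a small helper that runs Shapiro-Wilk on each group and falls back to Kruskal-Wallis when any group looks non-normal. `choose_and_test` is a hypothetical name, and the 0.05 cut-off and the choice of Shapiro-Wilk are assumptions; the `skewed` sample is made up to trigger the fallback.

```python
from scipy import stats

def choose_and_test(*samples, alpha=0.05):
    """Hypothetical helper: one-way ANOVA if every sample passes Shapiro-Wilk, else Kruskal-Wallis."""
    all_normal = all(stats.shapiro(s).pvalue > alpha for s in samples)
    if all_normal:
        return 'f_oneway', stats.f_oneway(*samples)
    return 'kruskal', stats.kruskal(*samples)

x = [1, 3, 5, 7, 9]
y = [2, 4, 6, 8, 10]
skewed = [1, 1, 1, 1, 2, 2, 3, 50]  # made-up, strongly right-skewed sample

print(choose_and_test(x, y))        # evenly spaced samples pass the normality check
print(choose_and_test(x, skewed))   # the outlier fails it, so Kruskal-Wallis is used
```

Pre-testing for normality this way is a convenience, not a guarantee: with small groups Shapiro-Wilk can miss real departures from normality.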
