
Data Science LAB

[Python] Sentiment Analysis - Unsupervised Learning

ㅅ ㅜ ㅔ ㅇ 2022. 2. 20. 12:45

Following on from the previous post (supervised learning), this post works through sentiment analysis with unsupervised learning.

https://suhye.tistory.com/entry/mn?category=1040378 

 

[Python] Sentiment Analysis - Supervised Learning (suhye.tistory.com)

 

 

Unsupervised sentiment analysis is built on a lexicon (a sentiment dictionary).

Most sentiment analysis datasets come without labels, which is why a lexicon is so useful.

A lexicon holds a numeric value for each entry that expresses how positive or negative it is. This value is called the "polarity score".

The polarity score is determined mainly from the word itself, its position, and its context.
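As a minimal sketch of how a lexicon-based score works, the toy dictionary below maps words to polarity scores and sums them over a sentence. The words, the values, and the helper name `polarity_score` are all made up for illustration and do not come from any real lexicon.

```python
# Toy lexicon: the words and scores are made-up values for illustration only.
TOY_LEXICON = {"good": 0.7, "great": 0.9, "bad": -0.6, "terrible": -0.9}

def polarity_score(sentence, lexicon=TOY_LEXICON):
    """Sum the polarity scores of the words that appear in the lexicon."""
    words = sentence.lower().split()
    # Words missing from the lexicon contribute nothing to the total.
    return sum(lexicon.get(word, 0.0) for word in words)

score = polarity_score("The acting was great but the plot was bad")
print(round(score, 2))
```

A positive total suggests positive sentiment and a negative total suggests negative sentiment. Real lexicons also take context into account, which this sketch ignores.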

 

 

WordNet, available through NLP packages such as NLTK, is easiest to think of as an English lexical dictionary. Rather than simply explaining what words mean, it is a lexical database that provides semantic information.

A word is interpreted differently depending on the context of the sentence, so it matters that it is read in line with that context.

WordNet therefore provides semantic information about how the same word is used differently in different situations. To do this, it represents each individual word sense, organized by part of speech, through a concept called a Synset.

Synset is short for "set of cognitive synonyms". It is the core concept of WordNet: not just a single word, but the context and semantic information that the word carries.

 

 

 

 

Representative sentiment lexicons

1. SentiWordNet : A WordNet dedicated to sentiment words, built along the same lines as NLTK's WordNet. It assigns sentiment scores to each WordNet Synset: a positive score, a negative score, and an objectivity score. The scores of the words in each sentence are summed into a final sentiment score, which then decides whether the sentiment is positive or negative.

2. VADER : A package aimed mainly at sentiment analysis of social media text. It produces good results relatively quickly.

3. Pattern : Often singled out as the strongest package in terms of performance, but it long lacked Python 3 support (Python 3 compatibility arrived only with later releases).
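The decision rule described for SentiWordNet can be sketched without the library. The per-word (positive, negative) pairs below are stand-ins (the pair for 'slow' is hypothetical), and comparing the two totals is just one simple rule assumed here, not necessarily the one a given tutorial uses.

```python
# Stand-in (positive, negative) scores per word. A real run would look these
# up in SentiWordNet; the pair for "slow" is a hypothetical value.
WORD_SCORES = {
    "fabulous": (0.875, 0.125),
    "slow": (0.0, 0.5),
    "father": (0.0, 0.0),
}

def classify(words, scores=WORD_SCORES):
    """Sum positive and negative scores, then compare the two totals."""
    pos_total = sum(scores.get(w, (0.0, 0.0))[0] for w in words)
    neg_total = sum(scores.get(w, (0.0, 0.0))[1] for w in words)
    if pos_total > neg_total:
        return "positive"
    if neg_total > pos_total:
        return "negative"
    return "neutral"

print(classify(["fabulous", "father"]))
print(classify(["slow", "father"]))
```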

 

 

 

 

Sentiment analysis with SentiWordNet

1. Loading the required libraries

import nltk
nltk.download('all')

This downloads all of the NLTK data packages (corpora and models).

(It takes a while.)

 

 

 

2. Extracting the Synsets for the word 'present'

from nltk.corpus import wordnet as wn

term = 'present'

# Look up the WordNet synsets for the word 'present'
synsets = wn.synsets(term)
print("synsets() type : ", type(synsets))
print("synsets() number of returned values : ", len(synsets))
print("synsets() returned values : ", synsets)

synsets() returns every Synset object registered in WordNet for the word passed as its argument.

For present, the result is a list of Synset objects, each with a different semantic meaning.

In 'present.n.01', present is the word (its meaning), n is the part of speech (noun), and 01 is an index that distinguishes the several meanings present has as a noun.
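The naming scheme can be taken apart with plain string handling. This is only a sketch, and the helper name `parse_synset_name` is hypothetical:

```python
def parse_synset_name(name):
    """Split 'lemma.pos.index' from the right so dots in the lemma survive."""
    lemma, pos, index = name.rsplit(".", 2)
    return lemma, pos, int(index)

print(parse_synset_name("present.n.01"))  # ('present', 'n', 1)
```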

 

 

 

Attributes of a Synset object

for synset in synsets:
    print("-----Synset name : ",synset.name(),'-----')
    print("POS : ",synset.lexname())
    print("Definition : ",synset.definition())
    print("Lemmas : ",synset.lemma_names())

 

 

Even as a noun, present has several meanings.

'present.n.01' is the temporal sense: the present time.

'present.n.02' means a gift.

In this way, each Synset represents one of the several semantic meanings a word can have as a separate object.

 

 

 

 

Similarity between words

import pandas as pd

# Create one Synset object per word
tree = wn.synset('tree.n.01')
lion = wn.synset('lion.n.01')
tiger = wn.synset('tiger.n.02')
cat = wn.synset('cat.n.01')
dog = wn.synset('dog.n.01')

entities = [tree,lion,tiger,cat,dog]
similarities = []
entity_names = [entity.name().split('.')[0] for entity in entities]

# For each word's synset, measure its similarity to every other word's synset
for entity in entities:
    similarity = [round(entity.path_similarity(compared_entity),2) for compared_entity in entities]
    similarities.append(similarity)

# Store the pairwise similarities in a DataFrame
similarity_df = pd.DataFrame(similarities,columns=entity_names,index = entity_names)
similarity_df

WordNet can express the relationship between words as a similarity value.

For this purpose, the Synset object provides the path_similarity() method.

The pairwise result for 'tree', 'lion', 'tiger', 'cat', and 'dog' expresses how similar each pair of objects is as a number.
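To give a feel for what path_similarity() measures: the score is 1 / (1 + length of the shortest path between the two synsets in the is-a hierarchy). The sketch below applies that formula to a hypothetical, heavily simplified tree, so its numbers will not match the real WordNet values.

```python
# Hypothetical, heavily simplified is-a tree (child -> parent).
PARENT = {
    "lion": "big_cat", "tiger": "big_cat",
    "big_cat": "feline", "cat": "feline",
    "feline": "carnivore", "dog": "carnivore",
    "carnivore": "organism", "tree": "organism",
}

def ancestors(node):
    """Return {ancestor: number of edges from node}, including node itself."""
    dist, steps = {}, 0
    while node is not None:
        dist[node] = steps
        node, steps = PARENT.get(node), steps + 1
    return dist

def path_similarity(a, b):
    """1 / (1 + shortest path length between a and b via a common ancestor)."""
    dist_a, dist_b = ancestors(a), ancestors(b)
    shortest = min(dist_a[n] + dist_b[n] for n in dist_a if n in dist_b)
    return 1 / (1 + shortest)

print(round(path_similarity("lion", "tiger"), 2))
print(round(path_similarity("lion", "tree"), 2))
```

Words that sit close together in the hierarchy (lion and tiger) score higher than words that only meet near the root (lion and tree).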

 

 

 

 

Senti_Synset

SentiWordNet has a Senti_Synset class (SentiSynset in NLTK) that plays the same role as WordNet's Synset. senti_synsets() returns the Senti_Synset objects for a word, which are collected into a list here.

import nltk
from nltk.corpus import sentiwordnet as swn

senti_synsets = list(swn.senti_synsets('slow'))
print('senti_synsets() return type :',type(senti_synsets))
print('senti_synsets() number of returned values :',len(senti_synsets))
print('senti_synsets() returned values :',senti_synsets)

 

 

 

 

 

Sentiment scores and objectivity score of a word

father = swn.senti_synset('father.n.01')
print('father positive score : ',father.pos_score())
print('father negative score : ',father.neg_score())
print('father objectivity score : ',father.obj_score())
print("\n---------------------")

fabulous = swn.senti_synset('fabulous.a.01')
print('fabulous positive score : ',fabulous.pos_score())
print('fabulous negative score : ',fabulous.neg_score())
print('fabulous objectivity score : ',fabulous.obj_score())

For 'father', the objectivity score is 1 and both the positive and negative scores are 0.

For 'fabulous', the positive score is 0.875, the negative score is 0.125, and the objectivity score is 0.
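The three scores are linked: in SentiWordNet the positive, negative, and objectivity scores of a synset sum to 1. A small sketch (the helper name `objectivity` is hypothetical) reproduces the two results above from that relationship:

```python
def objectivity(pos_score, neg_score):
    """Objectivity is whatever remains after the positive and negative parts."""
    return 1.0 - (pos_score + neg_score)

print(objectivity(0.0, 0.0))      # father.n.01 -> 1.0
print(objectivity(0.875, 0.125))  # fabulous.a.01 -> 0.0
```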

 
