
[Python] Cluster Evaluation (Silhouette Coefficient)

ㅅ ㅜ ㅔ ㅇ · 2022. 3. 1. 21:10

Clustering Evaluation

The iris dataset has a target label indicating the species, so it was possible to judge how well the clustering turned out. Most datasets used for clustering, however, have no target label. Instead, the goal may be to find hidden groups in the data and give them meaning, to build finer-grained clusters within a single class, or to form broader cluster levels that span different class values. Because of this unsupervised nature, clustering performance is hard to evaluate precisely, but the most widely used method for doing so is silhouette analysis.

Silhouette analysis

Silhouette analysis indicates how efficiently the clusters are separated from one another. Efficient separation means that each cluster is far from the other clusters while the data within the same cluster are tightly grouped.

Silhouette analysis is based on the silhouette coefficient. The silhouette coefficient measures, for each individual data point, how closely it is clustered with the data in its own cluster and how far it is separated from the data in other clusters.

a(i) : the average distance from the data point to the other data points in the same cluster

b(i) : the average distance from the data point to the points of the nearest cluster it does not belong to

The quantity b(i) - a(i) measures how far apart the two clusters are, and it is normalized by dividing by max(a(i), b(i)).

The silhouette coefficient of the i-th data point is therefore defined as s(i) = (b(i) - a(i)) / max(a(i), b(i)).

The silhouette coefficient takes values between -1 and 1. A value close to 1 means the point is well separated from the neighboring clusters, a value close to 0 means it lies close to a neighboring cluster, and a negative value means the point has been assigned to the wrong cluster altogether.

์ข‹์€ ๊ตฐ์ง‘ํ™” ์กฐ๊ฑด

  • The mean of all silhouette coefficients, i.e., the value returned by scikit-learn's silhouette_score(), should lie between 0 and 1, and the closer to 1 the better.
  • The deviation between the overall mean silhouette coefficient and the per-cluster means should not be large. That is, no individual cluster's mean silhouette coefficient should stray far from the overall mean.

Cluster evaluation using the iris dataset

Using silhouette_samples() and silhouette_score() from the sklearn.metrics module

from sklearn.preprocessing import scale
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline

iris = load_iris()
feature_names = ['sepal_length','sepal_width','petal_length','petal_width']
irisdf = pd.DataFrame(data = iris.data,columns = feature_names)
kmeans = KMeans(n_clusters=3,init='k-means++',max_iter=300,random_state=0).fit(irisdf)
irisdf['cluster'] = kmeans.labels_

# Compute the silhouette coefficient for every individual sample in iris
score_samples = silhouette_samples(iris.data,irisdf['cluster'])
print('Shape of silhouette_samples() return value :', score_samples.shape)

# Add the silhouette coefficient column to irisdf
irisdf['silhouette'] = score_samples

# Mean silhouette coefficient over all samples
average_score = silhouette_score(iris.data,irisdf['cluster'])
print('Iris Dataset Silhouette Analysis Score : {:.3f}'.format(average_score))
irisdf.head()

The mean silhouette coefficient is 0.553, but the silhouette coefficients of cluster 1 come out noticeably higher, above 0.8.

irisdf.groupby('cluster')['silhouette'].mean()

Grouping by the cluster column and printing the mean silhouette coefficient per cluster shows that cluster 1 averages 0.79, while clusters 0 and 2 are around 0.4, noticeably lower than cluster 1.

Optimizing the number of clusters using per-cluster mean silhouette coefficients and visualization

A high mean silhouette coefficient over the whole dataset does not by itself mean the clustering was performed with the optimal number of clusters.

The number of clusters for KMeans can be considered appropriate when the individual clusters keep a reasonable distance from one another while the data within each cluster stay tightly grouped.
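
Before the visualization function, a minimal numeric sketch (my addition) of the same idea: scan a few candidate values of k and print the average silhouette score for each. The data below match the make_blobs dataset generated later in the post; the plots that follow show why the average alone is not enough.

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, n_features=2, centers=4, cluster_std=1,
                  center_box=(-10.0, 10.0), shuffle=True, random_state=1)

for k in [2, 3, 4, 5]:
    labels = KMeans(n_clusters=k, max_iter=500, random_state=0).fit_predict(X)
    print('k =', k, '| average silhouette :', round(silhouette_score(X, labels), 3))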

Silhouette coefficient visualization function

def visualize_silhouette(cluster_lists, X_features): 
    
    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_samples, silhouette_score

    import matplotlib.pyplot as plt
    import matplotlib.cm as cm
    import numpy as np
    import math
    
    # Take a list of cluster counts as input, run clustering for each count, and compute the silhouette coefficients
    n_cols = len(cluster_lists)
    
    # Create axs with plt.subplots(), with one subplot per cluster count in the list
    fig, axs = plt.subplots(figsize=(4*n_cols, 4), nrows=1, ncols=n_cols)
    
    # Iterate over the cluster counts in the list and visualize the silhouette coefficients for each
    for ind, n_cluster in enumerate(cluster_lists):
        
        # Run KMeans clustering, then compute the average silhouette score and the per-sample silhouette values.
        clusterer = KMeans(n_clusters = n_cluster, max_iter=500, random_state=0)
        cluster_labels = clusterer.fit_predict(X_features)
        
        sil_avg = silhouette_score(X_features, cluster_labels)
        sil_values = silhouette_samples(X_features, cluster_labels)
        
        y_lower = 10
        axs[ind].set_title('Number of Cluster : '+ str(n_cluster)+'\n' \
                          'Silhouette Score :' + str(round(sil_avg,3)) )
        axs[ind].set_xlabel("The silhouette coefficient values")
        axs[ind].set_ylabel("Cluster label")
        axs[ind].set_xlim([-0.1, 1])
        axs[ind].set_ylim([0, len(X_features) + (n_cluster + 1) * 10])
        axs[ind].set_yticks([])  # Clear the yaxis labels / ticks
        axs[ind].set_xticks([0, 0.2, 0.4, 0.6, 0.8, 1])
        
        # Draw a fill_betweenx()-style bar for each cluster.
        for i in range(n_cluster):
            ith_cluster_sil_values = sil_values[cluster_labels==i]
            ith_cluster_sil_values.sort()
            
            size_cluster_i = ith_cluster_sil_values.shape[0]
            y_upper = y_lower + size_cluster_i
            
            color = cm.nipy_spectral(float(i) / n_cluster)
            axs[ind].fill_betweenx(np.arange(y_lower, y_upper), 0, ith_cluster_sil_values, \
                                facecolor=color, edgecolor=color, alpha=0.7)
            axs[ind].text(-0.05, y_lower + 0.5 * size_cluster_i, str(i))
            y_lower = y_upper + 10
            
        axs[ind].axvline(x=sil_avg, color="red", linestyle="--")

# Create a 2-D dataset of 500 points around 4 cluster centers with make_blobs for clustering
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=500, n_features=2, centers=4, cluster_std=1, \
                  center_box=(-10.0, 10.0), shuffle=True, random_state=1)  

# Visualize the per-cluster silhouette coefficients when the number of clusters is 2, 3, 4, and 5
visualize_silhouette([ 2, 3, 4, 5], X)

๊ตฐ์ง‘์ด ๊ฐ๊ฐ 2,3,4,5๊ฐœ์ผ ๋•Œ์˜ ์‹ค๋ฃจ์—ฃ ๊ณ„์ˆ˜๋ฅผ ์‹œ๊ฐํ™” ํ•ด๋ณธ ๊ฒฐ๊ณผ 4๊ฐœ๋กœ ๊ตฐ์ง‘ํ™”ํ•˜์˜€์„ ๋•Œ ๊ฐ€์žฅ ์ด์ƒ์ ์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค. 

1) ๊ตฐ์ง‘์ด 2๊ฐœ์ผ ๋•Œ

Cluster 1 is far from cluster 0 and its internal data are tightly grouped, but the data inside cluster 0 are spread far apart from one another.

2) With 3 clusters

Clusters 1 and 2 have silhouette coefficients above the mean and their internal data are well grouped, but cluster 0's silhouette coefficients fall below the mean and its data are widely scattered.

3) With 4 clusters

All values in cluster 1 are above the mean, and most of the data in the remaining clusters are above the mean as well, so 4 can be judged the most suitable number of clusters.

Visualizing the clustering results for the iris data

from sklearn.datasets import load_iris

iris=load_iris()
visualize_silhouette([ 2, 3, 4,5 ], iris.data)

Judging from the clustering results on the iris data, dividing it into 2 clusters looks most desirable.
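
As a numeric follow-up (my own sketch, not in the original post), the average and per-cluster silhouette values for k = 2 and k = 3 on the iris data can be compared directly to back up this visual impression:

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

iris = load_iris()
for k in [2, 3]:
    labels = KMeans(n_clusters=k, max_iter=500, random_state=0).fit_predict(iris.data)
    per_cluster = pd.Series(silhouette_samples(iris.data, labels)).groupby(labels).mean()
    print('k =', k,
          '| overall :', round(silhouette_score(iris.data, labels), 3),
          '| per-cluster :', per_cluster.round(3).tolist())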

## Reference

https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html

Selecting the number of clusters with silhouette analysis on KMeans clustering (scikit-learn.org)
