Data Science LAB

[Python] SVD(Singular Value Decomposition) ๋ณธ๋ฌธ

๐Ÿ›  Machine Learning/์ฐจ์› ์ถ•์†Œ

[Python] SVD(Singular Value Decomposition)

ใ…… ใ…œ ใ…” ใ…‡ 2022. 3. 7. 21:46

SVD ๊ฐœ์š”

SVD๋Š” PCA์™€ ๋น„์Šทํ•˜๊ฒŒ ํ–‰๋ ฌ ๋ถ„ํ•ด ๊ธฐ๋ฒ•์„ ์ด์šฉํ•˜์ง€๋งŒ, PCA๋Š” ์ •๋ฐฉํ–‰๋ ฌ๋งŒ์„ ๊ณ ์œ  ๋ฒกํ„ฐ๋กœ ๋ถ„ํ•ดํ•˜๋Š” ๋ฐ˜๋ฉด, SVD๋Š” ํ–‰๊ณผ ์—ด์ด ๋‹ค๋ฅธ ๋ชจ๋“  ํ–‰๋ ฌ์— ์ ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. 

์ผ๋ฐ˜์ ์œผ๋กœ, SVD๋Š” m×n ํฌ๊ธฐ์˜ ํ–‰๋ ฌ A๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋ถ„ํ•ดํ•˜๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•œ๋‹ค. 

 

์ถœ์ฒ˜ : https://www.fun-coding.org/recommend_basic6.html

 

SVD๋Š” ํŠน์ด๊ฐ’ ๋ถ„ํ•ด๋ผ๊ณ ๋„ ๋ถˆ๋ฆฌ๋ฉฐ, ํ–‰๋ ฌ U์™€ V์— ์†ํ•˜๋Š” ๋ฒกํ„ฐ๋Š” ํŠน์ด๋ฒกํ„ฐ์ด๋‹ค. ๋ชจ๋“  ํŠน์ด๋ฒกํ„ฐ๋Š” ์„œ๋กœํ•˜๋Š” ์„ฑ์งˆ์„ ๊ฐ€์ง„๋‹ค. 

U : m×m matrix; its inverse is its own transpose

∑ : m×n matrix; all off-diagonal entries are 0 (the diagonal holds the singular values)

V : n×n matrix; its inverse is its own transpose

U and V are both orthogonal matrices


๋žœ๋คํ–‰๋ ฌ ์ƒ์„ฑ

import numpy as np
from numpy.linalg import svd

np.random.seed(121)
a = np.random.randn(4,4)
print(np.round(a,3))

๊ฐœ๋ณ„ ๋กœ์šฐ์˜ ์˜์กด์„ฑ์„ ์—†์• ๊ธฐ ์œ„ํ•ด ๋žœ๋ค์œผ๋กœ 4×4์˜ ๋žœ๋คํ–‰๋ ฌ ์ƒ์„ฑ

 

 

 

U,Sigma,Vt ๋„์ถœ

U,Sigma,Vt = svd(a)
print(U.shape,Sigma.shape,Vt.shape)
print("U : \n",np.round(U,3))
print("Sigma Value \n",np.round(Sigma,3))
print("V transpose matrix :\n",np.round(Vt,3))

svd์— ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ์›๋ณธ ํ–‰๋ ฌ์„ ์ž…๋ ฅํ•˜๋ฉด, U,Sigma, V์ „์น˜ ํ–‰๋ ฌ์„ ๋ฐ˜ํ™˜ํ•œ๋‹ค.

∑(Sigma)ํ–‰๋ ฌ์€ ๋Œ€๊ฐ์— ์œ„์น˜ํ•œ ๊ฐ’๋งŒ 0์ด ์•„๋‹ˆ๊ณ , ๊ทธ๋ ‡์ง€ ์•Š์€ ๊ฒฝ์šฐ๋Š” ๋ชจ๋‘ 0์ด๋ฏ€๋กœ 0์ด ์•„๋‹Œ ๊ฒฝ์šฐ๋งŒ 1์ฐจ์› ํ–‰๋ ฌ๋กœ ํ‘œํ˜„ํ•œ๋‹ค. 

 

 

 

Sigma๋ฅผ ๋‹ค์‹œ 0์„ ํฌํ•จํ•œ ๋Œ€์นญํ–‰๋ ฌ๋กœ ๋ฐ˜ํ™˜

sigma_mat = np.diag(Sigma)
a_ = np.dot(np.dot(U,sigma_mat),Vt)
print(np.round(a_,3))

Sigma ํ–‰๋ ฌ์˜ ๊ฒฝ์šฐ 0์ด์•„๋‹Œ ๊ฐ’๋งŒ 1์ฐจ์›์œผ๋กœ ์ถ”์ถœํ•˜์˜€์œผ๋ฏ€๋กœ 0์„ ํฌํ•จํ•œ ๋Œ€์นญํ–‰๋ ฌ๋กœ ๋ณ€ํ™˜ํ•œ ํ›„ ๋‚ด์ ์„ ์ˆ˜ํ–‰ํ•ด์•ผํ•œ๋‹ค.

U, Sigma, Vt ๋ฅผ ๋‚ด์ ํ•˜์—ฌ ์›๋ณธํ–‰๋ ฌ์„ ๋ณต์›ํ•œ ๊ฒฐ๊ณผ, ์›๋ณธ๊ณผ ๋™์ผํ•˜๊ฒŒ ๋ณต์›๋œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. 


๋ฐ์ดํ„ฐ ๋กœ์šฐ ๊ฐ„ ์˜์กด์„ฑ์ด ์žˆ์„ ๊ฒฝ์šฐ

a[2] = a[0]+a[1]
a[3] = a[0]
print(np.round(a,3))

๋ฐ์ดํ„ฐ ๋กœ์šฐ๊ฐ„์— ์˜์กด์„ฑ์ด ์žˆ์„ ๊ฒฝ์šฐ ์–ด๋–ป๊ฒŒ Sigma ๊ฐ’์ด ๋ณ€ํ•˜๊ณ , ์ด์— ๋”ฐ๋ฅธ ์ฐจ์›์ถ•์†Œ๊ฐ€ ์ง„ํ–‰๋˜๋Š” ์ง€ ์•Œ์•„๋ณด๊ธฐ ์œ„ํ•ด aํ–‰๋ ฌ์˜ 3๋ฒˆ์งธ ๋กœ์šฐ๋ฅผ (์ฒซ๋ฒˆ์งธ ๋กœ์šฐ + ๋‘๋ฒˆ์งธ ๋กœ์šฐ)๋กœ ์—…๋ฐ์ดํŠธํ•˜๊ณ , 4๋ฒˆ์งธ๋Š” ์ฒซ๋ฒˆ์งธ ๋กœ์šฐ์™€ ๊ฐ™๋„๋ก ์—…๋ฐ์ดํŠธํ•˜์˜€๋‹ค. 

 

 

 

 

๋‹ค์‹œ SVD ์ˆ˜ํ–‰ํ•ด Sigma ๊ฐ’ ํ™•์ธ

U,Sigma,Vt = svd(a)
print(U.shape,Sigma.shape,Vt.shape)
print("Sigma Value : \n",np.round(Sigma,3))

๋‹ค์‹œ SVD๋กœ ๋ถ„ํ•ดํ•œ ๊ฒฐ๊ณผ, ์ด์ „๊ณผ ์ฐจ์›์€ ๋™์ผํ•˜์ง€๋งŒ, Sigma ๊ฐ’ ์ค‘ 2๊ฐœ๊ฐ€ 0์œผ๋กœ ๋ณ€ํ–ˆ๋‹ค. ์ฆ‰, ์„ ํ˜• ๋…๋ฆฝ์ธ ๋กœ์šฐ ๋ฒกํ„ฐ์˜ ๊ฐœ์ˆ˜๊ฐ€ 2๊ฐœ๋ผ๋Š” ์˜๋ฏธ์ด๋‹ค. 

 

#U ํ–‰๋ ฌ์€ Sigma์™€ ๋‚ด์ ์„ ์ˆ˜ํ–‰ํ•˜๋ฏ€๋กœ Sigma์˜ ์•ž 2ํ–‰์— ๋Œ€์‘๋˜๋Š” ์—ด๋งŒ ์ถ”์ถœ
U_=U[:,:2]
Sigma_ = np.diag(Sigma[:2])

#V์ „์น˜ ํ–‰๋ ฌ์€ ์•ž2ํ–‰๋งŒ ์ถ”์ถœ
Vt_ = Vt[:2]
print(U_.shape,Sigma_.shape,Vt_.shape)

#U,Sigma,Vt์˜ ๋‚ด์ ์„ ์ˆ˜ํ–‰ํ•˜๋ฉฐ ๋‹ค์‹œ ์›๋ณธ ํ–‰๋ ฌ ๋ณต์›
a_ = np.dot(np.dot(U_,Sigma_),Vt_)
print(np.round(a_,3))

๋‹ค์‹œ ๋‚ด์ ํ•˜์—ฌ ๋ณต์›ํ•œ ๊ฒฐ๊ณผ, ์ •ํ™•ํ•˜๊ฒŒ ๋ณต์›๋˜์ง€๋Š” ์•Š์•˜์ง€๋งŒ, ์›๋ณธํ–‰๋ ฌ์— ๊ฐ€๊น๊ฒŒ ๋ณต์›๋œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. 

 

 

 

 

Truncated SVD

import numpy as np
from scipy.sparse.linalg import svds
from scipy.linalg import svd

#์›๋ณธ ํ–‰๋ ฌ ์ถœ๋ ฅํ•˜๊ณ  SVD ์ ์šฉํ•  ๊ฒฝ์šฐ U, Sigma, Vt์˜ ์ฐจ์›ํ™•์ธ
np.random.seed(121)
matrix = np.random.random((6,6))
print("์›๋ณธ ํ–‰๋ ฌ : \n",matrix)
U,Sigma,Vt = svd(matrix,full_matrices=False)
print("Sigma ๊ฐ’ ํ–‰๋ ฌ : \n",Sigma)

#Truncated SVD๋กœ Sigma ํ–‰๋ ฌ์˜ ํŠน์ด๊ฐ’์„ 4๊ฐœ๋กœ ํ•˜์—ฌ Truncated SVD ์ˆ˜ํ–‰
num_components = 4
U_tr,Sigma_tr, Vt_tr = svds(matrix,k=num_components)
print("Truncated SVD ๋ถ„ํ•ด ํ–‰๋ ฌ ์ฐจ์›: ",U_tr.shape,Sigma_tr.shape,Vt_tr.shape)
print("Truncated SVD Sigma๊ฐ’ ํ–‰๋ ฌ : ",Sigma_tr)
matrix_tr = np.dot(np.dot(U_tr,np.diag(Sigma_tr)),Vt_tr)

print("Truncated SVD๋กœ ๋ถ„ํ•ด ํ›„ ๋ณต์› ํ–‰๋ ฌ : \n",matrix_tr)

6×6 ํ–‰๋ ฌ์„ SVD ๋ถ„ํ•ดํ•˜๋ฉด U,Sigma,Vt๊ฐ€ ๊ฐ๊ฐ (6,6),(6,),(6,6)์ฐจ์›์ด์ง€๋งŒ,

TruncatedSVD์˜ n_components๋ฅผ 4๋กœ ์„ค์ •ํ•˜๋ฉด U,Sigma,Vt๊ฐ€ ๊ฐ๊ฐ (6,4),(4,),(4,6)์ฐจ์›์œผ๋กœ ๋ถ„ํ•ด๋˜๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. (์™„๋ฒฝํ•˜๊ฒŒ ๋ณต์›๋˜์ง€ ์•Š์Œ)

 

์‚ฌ์ดํ‚ท๋Ÿฐ TruncatedSVD ํด๋ž˜์Šค๋ฅผ ์ด์šฉํ•œ ๋ณ€ํ™˜

from sklearn.decomposition import TruncatedSVD,PCA
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
%matplotlib inline

iris = load_iris()
iris_ftrs = iris.data

#2๊ฐœ์˜ ์ฃผ์š” ์ปดํฌ๋„ŒํŠธ๋กœ TruncatedSVD ๋ณ€ํ™˜
tsvd = TruncatedSVD(n_components=2)
tsvd.fit(iris_ftrs)
iris_tsvd = tsvd.transform(iris_ftrs)

#์‚ฐ์ ๋„ 2์ฐจ์›์œผ๋กœ TruncatedSVD ๋ณ€ํ™˜๋œ ๋ฐ์ดํ„ฐ ํ‘œํ˜„, ํ’ˆ์ข…์€ ์ƒ‰๊น”๋กœ ๊ตฌ๋ถ„
plt.scatter(x=iris_tsvd[:,0],y=iris_tsvd[:,1],c=iris.target)
plt.xlabel('TruncatedSVD Component 1')
plt.ylabel('TruncatedSVD Component 2')

 

ํ’ˆ์ข…๋ณ„๋กœ ์–ด๋Š ์ •๋„ ํด๋Ÿฌ์Šคํ„ฐ๋ง์ด ๊ฐ€๋Šฅํ•  ์ •๋„๋กœ ๊ฐ ๋ณ€ํ™˜ ์†์„ฑ์œผ๋กœ ๋›ฐ์–ด๋‚œ ๊ณ ์œ ์„ฑ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. 

 

 

 

 

iris data ์Šค์ผ€์ผ๋ง ๋ณ€ํ™˜ ํ›„ TruncatedSVD ์™€ PCA ํด๋ž˜์Šค ๋ณ€ํ™˜

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
iris_scaled = scaler.fit_transform(iris_ftrs)

#์Šค์ผ€์ผ๋ง๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ TruncatedSVD ๋ณ€ํ™˜
tsvd = TruncatedSVD(n_components = 2)
tsvd.fit(iris_scaled)
iris_tsvd = tsvd.transform(iris_scaled)

#์Šค์ผ€์ผ๋ง๋œ ๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜์œผ๋กœ PCA ๋ณ€ํ™˜ ์ˆ˜ํ–‰
pca = PCA(n_components=2)
pca.fit(iris_scaled)
iris_pca = pca.transform(iris_scaled)

#TruncatedSVD ๋ณ€ํ™˜ ๋ฐ์ดํ„ฐ๋ฅผ ์™ผ์ชฝ, PCA ๋ณ€ํ™˜๋ฐ์ดํ„ฐ๋ฅผ ์˜ค๋ฅธ์ชฝ
fig,(ax1,ax2) = plt.subplots(figsize=(9,4), ncols=2)
ax1.scatter(x=iris_tsvd[:,0],y=iris_tsvd[:,1],c=iris.target)
ax2.scatter(x=iris_pca[:,0],y=iris_pca[:,1],c=iris.target)
ax1.set_title('Truncated SVD Transformed')
ax2.set_title('PCA Transformed')

print((iris_pca-iris_tsvd).mean())
print((pca.components_-tsvd.components_).mean())

๋ชจ๋‘ 0์— ๊ฐ€๊นŒ์šด ๊ฐ’์œผ๋กœ 2๊ฐœ์˜ ๋ณ€ํ™˜์ด ๊ฑฐ์˜ ๋™์ผํ•จ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. ์ฆ‰ ๋ฐ์ดํ„ฐ์…‹์ด ์Šค์ผ€์ผ๋ง์œผ๋กœ ๋ฐ์ดํ„ฐ ์ค‘์‹ฌ์ด ๋™์ผํ•ด์ง€๋ฉด ์‚ฌ์ดํ‚ท๋Ÿฐ์˜ SVD์™€ PCA๋Š” ์„œ๋กœ ๋™์ผํ•œ ๋ณ€ํ™˜์„ ์ˆ˜ํ–‰ํ•œ๋‹ค. 

ํ•˜์ง€๋งŒ PCA๋Š” ๋ฐ€์ง‘ํ–‰๋ ฌ์— ๋Œ€ํ•œ ๋ณ€ํ™˜๋งŒ ๊ฐ€๋Šฅํ•˜๋ฉฐ, SVD๋Š” ํฌ์†Œ ํ–‰๋ ฌ์— ๋Œ€ํ•œ ๋ณ€ํ™˜๋„ ๊ฐ€๋Šฅํ•˜๋‹ค. 

SVD๋Š” PCA์™€ ์œ ์‚ฌํ•˜๊ฒŒ ์ปดํ“จํ„ฐ ๋น„์ „ ์˜์—ญ์—์„œ ์ด๋ฏธ์ง€ ์••์ถ•์„ ํ†ตํ•œ ํŒจํ„ด ์ธ์‹๊ณผ ์‹ ํ˜ธ ์ฒ˜๋ฆฌ ๋ถ„์•ผ์— ์‚ฌ์šฉ๋œ๋‹ค. ๋˜ํ•œ, ํ…์ŠคํŠธ์˜ ํ† ํ”ฝ ๋ชจ๋ธ๋ง ๊ธฐ๋ฒ•์ธ LSA์˜ ๊ธฐ๋ฐ˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋‹ค. 
