[PYTHON/SCIKIT-LEARN] Using k-means clustering

■ Shows how to cluster the scikit-learn digits dataset with k-means clustering and print a classification report for the resulting cluster indices.

▶ Example code (PY)


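import sklearn.cluster as cluster
import sklearn.datasets as datasets
import sklearn.metrics as metrics

# Load the scikit-learn digits dataset (1,797 grayscale images of 8x8 pixels).
mnistBunch   = datasets.load_digits()
imageNDArray = mnistBunch.images
imageCount   = len(imageNDArray)
# Flatten each 8x8 image into a 64-element feature vector.
imageNDArray = imageNDArray.reshape(imageCount, -1)
labelNDArray = mnistBunch.target

trainCount = int((imageCount / 4) * 3)  # 1347 samples used for fitting
testCount  = int((imageCount / 4))      # 449

# 10 clusters, one per digit; k-means++ seeding, best of 10 initializations.
kmeans = cluster.KMeans(n_clusters = 10, init = "k-means++", n_init = 10)

kmeans.fit(imageNDArray[:trainCount])

# labels_ holds the cluster index assigned to each training sample.
print(kmeans.labels_)
print(type(kmeans.labels_))
print(kmeans.labels_.shape)
print()

# Note: slicing from testCount (449) keeps 1348 samples, most of which were
# also used for fitting. Slice from trainCount for a held-out evaluation set.
testLabelList = labelNDArray[testCount:]

predictionLabelList = kmeans.predict(imageNDArray[testCount:])

# Cluster indices are arbitrary and are not aligned with digit labels, so the
# per-class scores are high only where an index happens to match the digit.
print("Performance report : \n %s \n" % (metrics.classification_report(testLabelList, predictionLabelList)))

# Sample output (cluster indices vary between runs because random_state is not set).
"""
[6 4 4 ... 3 1 3]
<class 'numpy.ndarray'>
(1347,)

Performance report :
               precision    recall  f1-score   support

           0       0.00      0.00      0.00       131
           1       0.01      0.01      0.01       137
           2       0.00      0.00      0.00       131
           3       0.85      0.82      0.83       136
           4       0.00      0.00      0.00       139
           5       0.91      0.72      0.80       136
           6       0.01      0.01      0.01       138
           7       0.85      0.98      0.91       134
           8       0.02      0.02      0.02       130
           9       0.00      0.00      0.00       136

    accuracy                           0.26      1348
   macro avg       0.27      0.26      0.26      1348
weighted avg       0.27      0.26      0.26      1348
"""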

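The low scores in the report above come mostly from comparing arbitrary cluster indices directly with digit labels. The following is a minimal sketch of one common way to get a more meaningful evaluation, not part of the original example: it holds out the last quarter of the data, maps each cluster to the most frequent digit among its training members (a majority vote), and also computes the adjusted Rand index, which does not depend on how clusters are numbered. The variable names (cluster_to_digit, mapped_predictions, and so on) and random_state = 0 are assumptions made here for illustration and reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, adjusted_rand_score

digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)  # flatten 8x8 images to 64 features
y = digits.target

# Hold out the last quarter so evaluation samples are never used for fitting.
train_count = int(len(X) / 4 * 3)
X_train, y_train = X[:train_count], y[:train_count]
X_test, y_test = X[train_count:], y[train_count:]

# random_state is an assumption added here so repeated runs give the same clusters.
kmeans = KMeans(n_clusters=10, init="k-means++", n_init=10, random_state=0)
kmeans.fit(X_train)

# Majority vote: map each cluster index to the most common digit among the
# training samples assigned to that cluster.
cluster_to_digit = np.zeros(10, dtype=int)
for cluster_id in range(10):
    members = y_train[kmeans.labels_ == cluster_id]
    cluster_to_digit[cluster_id] = np.bincount(members, minlength=10).argmax()

test_clusters = kmeans.predict(X_test)
mapped_predictions = cluster_to_digit[test_clusters]

# Accuracy is meaningful only after mapping cluster indices to digits.
raw_accuracy = accuracy_score(y_test, test_clusters)
mapped_accuracy = accuracy_score(y_test, mapped_predictions)

# The adjusted Rand index compares groupings and ignores cluster numbering.
ari = adjusted_rand_score(y_test, test_clusters)

print("raw accuracy    :", raw_accuracy)
print("mapped accuracy :", mapped_accuracy)
print("adjusted Rand   :", ari)
```

The majority vote uses the true labels of the training samples, so the mapped accuracy describes how well the clusters line up with digits rather than the performance of a purely unsupervised classifier. Two clusters can also map to the same digit, in which case some digits are never predicted.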


icodebroker
