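Remember picking out shapes in ink blots? K-means is somewhat similar to that activity: you look at the shape and spread of the data to work out how many different clusters (populations) are present.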
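In K-means, we have clusters, and each cluster has its own centroid. The sum of the squared differences between a centroid and the data points in its cluster is the within-cluster sum of squares for that cluster. Adding these values over all clusters gives the total within-cluster sum of squares for the clustering solution.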
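The two quantities described above can be computed by hand. The following is a minimal sketch in plain Python; the helper names `centroid` and `within_cluster_ss` and the four toy points are assumptions made for illustration and do not come from the article's dataset.

```python
from collections import defaultdict


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def within_cluster_ss(points, labels):
    """Return ({label: sum of squares}, total) for a given cluster assignment."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    per_cluster = {}
    for label, members in clusters.items():
        c = centroid(members)
        # squared Euclidean distance of every member to its own centroid
        per_cluster[label] = sum(
            sum((x - m) ** 2 for x, m in zip(p, c)) for p in members
        )
    return per_cluster, sum(per_cluster.values())


# Toy data (hypothetical): two obvious groups in 2-D
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
labels = [0, 0, 1, 1]

per_cluster, total = within_cluster_ss(points, labels)
print(per_cluster)  # {0: 2.0, 1: 2.0}
print(total)        # 4.0
```

Cluster 0 has its centroid at (1, 2) and cluster 1 at (10, 9); each point lies one unit from its centroid, so each cluster contributes 2.0 and the total is 4.0.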
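We know that this value keeps decreasing as the number of clusters increases. If you plot the result, however, you may see that the sum of squared distances drops sharply up to some value of k and only slowly after that. This point is where we can find the optimum number of clusters.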
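A rough sketch of this check (often called the elbow method) is shown below. It assumes a tiny pure-Python k-means, the hypothetical helper `kmeans_wcss`, and made-up toy points rather than the article's CSV data. It stands in for a real library: with scikit-learn, the same total is exposed as the fitted model's `inertia_` attribute.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans_wcss(points, k, n_iter=25, n_init=10, seed=0):
    """Lowest total within-cluster sum of squares over n_init random starts."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_init):
        centroids = rng.sample(points, k)  # start from k distinct data points
        for _ in range(n_iter):
            # assignment step: each point joins its nearest centroid
            groups = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                groups[nearest].append(p)
            # update step: an empty cluster keeps its previous centroid
            centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
        wcss = sum(min(sq_dist(p, c) for c in centroids) for p in points)
        best = min(best, wcss)
    return best


# Toy data (hypothetical): three loose groups of three points each
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]

wcss_by_k = {k: kmeans_wcss(points, k) for k in range(1, 7)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 2))
```

Plotting `wcss_by_k` against k and looking for the point where the curve flattens gives a candidate number of clusters. The choice remains a judgment call, since real data rarely shows as clean a bend as toy data does.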
The following code applies K-Means with scikit-learn:
# importing required libraries
import pandas as pd
from sklearn.cluster import KMeans
# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')
# shape of the dataset
print('Shape of training data :',train_data.shape)
print('Shape of testing data :',test_data.shape)
# Now, we need to divide the training data into different clusters
# and predict in which cluster a particular data point belongs.
# Create the object of the K-Means model
# You can also add other parameters and test your code here
# Some parameters are : n_clusters and max_iter
# Documentation of sklearn KMeans:
model = KMeans()
# fit the model with the training data
model.fit(train_data)
# Number of Clusters
print('\nDefault number of Clusters : ',model.n_clusters)
# predict the clusters on the train dataset
predict_train = model.predict(train_data)
print('\nCLusters on train data',predict_train)
# predict the clusters on the test dataset
predict_test = model.predict(test_data)
print('Clusters on test data',predict_test)
# Now, we will train a model with n_cluster = 3
model_n3 = KMeans(n_clusters=3)
# fit the model with the training data
model_n3.fit(train_data)
# Number of Clusters
print('\nNumber of Clusters : ',model_n3.n_clusters)
# predict the clusters on the train dataset
predict_train_3 = model_n3.predict(train_data)
print('\nCLusters on train data',predict_train_3)
# predict the clusters on the test dataset
predict_test_3 = model_n3.predict(test_data)
print('Clusters on test data',predict_test_3)
Shape of training data : (100, 5)
Shape of testing data : (100, 5)
Default number of Clusters : 8
CLusters on train data [6 7 0 7 6 5 5 7 7 3 1 1 3 0 7 1 0 4 5 6 4 3 3 0 4 0 1 1 0 3 4 3 3 0 0 1 2
1 4 3 0 2 1 1 0 3 3 0 7 1 3 0 5 1 0 1 5 4 6 4 3 6 5 0 3 0 4 3 3 1 5 1 6 5
7 7 6 3 5 3 5 3 1 5 2 5 0 3 2 3 4 7 1 0 1 5 3 6 1 6]
Clusters on test data [3 6 2 0 5 6 0 3 5 2 3 4 5 5 5 3 3 5 5 7 0 0 5 5 3 5 0 6 5 0 1 6 3 5 6 0 1
7 3 0 0 6 2 0 5 3 5 7 3 3 4 6 3 1 6 3 1 3 3 2 3 3 5 1 7 5 1 5 3 3 5 2 0 1
5 0 3 0 3 6 3 5 4 0 2 6 3 5 6 0 6 4 3 5 0 6 6 6 1 0]
Number of Clusters : 3
CLusters on train data [2 0 1 0 2 1 2 0 0 2 0 0 2 1 0 0 1 2 2 2 2 2 2 1 2 1 0 0 1 2 2 2 2 1 1 0 2
0 2 2 1 2 0 0 1 2 2 1 0 0 2 1 2 0 1 0 2 2 2 2 2 2 2 1 2 1 2 2 2 0 1 0 2 2
0 0 0 2 0 2 2 2 0 2 2 2 1 2 2 2 2 0 0 1 0 2 2 2 0 2]
Clusters on test data [2 2 2 1 2 2 1 2 2 2 2 2 2 1 1 2 2 2 2 0 1 1 2 2 2 2 1 2 2 1 0 2 2 2 2 1 0
0 2 1 1 2 2 1 2 2 2 0 2 2 2 2 2 0 2 2 0 2 2 2 2 2 2 0 0 2 0 2 2 2 0 2 1 0
2 1 2 1 2 0 2 2 2 1 2 2 2 2 2 1 2 2 2 2 1 2 2 2 0 1]
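Long label arrays like these are easier to read as counts per cluster. The snippet below is a small sketch that tallies labels with `collections.Counter`. For brevity it uses only the first ten training labels from the 3-cluster run above; the variable names are chosen here for illustration.

```python
from collections import Counter

# First ten labels of the 3-cluster training output shown above
first_ten_labels = [2, 0, 1, 0, 2, 1, 2, 0, 0, 2]

# Count how many points landed in each cluster
cluster_sizes = Counter(first_ten_labels)
for cluster_id, size in sorted(cluster_sizes.items()):
    print(f"cluster {cluster_id}: {size} points")
```

The same call works on a full prediction array, for example `Counter(predict_train_3.tolist())`, and makes it quick to spot clusters that are very small or that hold most of the data.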