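This article shows how to implement nearest-neighbor classification in Python with scikit-learn. It first introduces the library's basic building blocks and then works through a complete example on the Ionosphere dataset.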
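The scikit-learn library
scikit-learn already packages many data mining algorithms.
A data mining workflow in scikit-learn is built from three kinds of components:
1. Transformer: used for data preprocessing and data transformation.
2. Pipeline: chains the steps of a data mining workflow into one object so it can be reused (encapsulation).
3. Estimator: used for classification, clustering, and regression (the algorithm objects).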
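The three building blocks above can be combined in a few lines. The following is a minimal sketch and is not taken from the original article: it assumes a synthetic dataset from `make_classification`, and `MinMaxScaler` is chosen as the transformer purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data stands in for a real dataset (an assumption for this sketch).
X, y = make_classification(n_samples=200, n_features=10, random_state=14)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)

# Transformer and estimator chained into one reusable Pipeline object.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),        # transformer: rescales features using the training min/max
    ("knn", KNeighborsClassifier()),  # estimator: nearest-neighbor classifier
])

pipeline.fit(X_train, y_train)          # fits the scaler, then the classifier
y_predicted = pipeline.predict(X_test)  # applies the fitted scaler, then predicts
print(y_predicted.shape)
```

Because the pipeline itself exposes `fit()` and `predict()`, it can be used anywhere a single estimator is expected.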
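Estimators used for prediction share the following two methods.
fit(): training
Usage: estimator.fit(X_train, y_train)
estimator = KNeighborsClassifier() is a scikit-learn algorithm object
X_train = dataset.data is a NumPy array (the training samples)
y_train = dataset.target is a NumPy array (the training labels)
predict(): prediction
Usage: estimator.predict(X_test)
estimator = KNeighborsClassifier() is a scikit-learn algorithm object
X_test = dataset.data is a NumPy array (the samples to predict)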
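To make the fit/predict contract concrete, here is a minimal pure-Python sketch of a nearest-neighbor classifier with the same two-method interface. `TinyKNN` is a hypothetical class written for illustration only; it is not how scikit-learn implements `KNeighborsClassifier`, and the toy data is made up.

```python
from collections import Counter
import math


class TinyKNN:
    """Hypothetical toy classifier mirroring the fit/predict interface."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X_train, y_train):
        # Nearest-neighbor "training" just stores the labeled samples.
        self.X_ = [list(row) for row in X_train]
        self.y_ = list(y_train)
        return self

    def predict(self, X_test):
        predictions = []
        for sample in X_test:
            # Sort training indices by Euclidean distance to the sample.
            order = sorted(
                range(len(self.X_)),
                key=lambda i: math.dist(sample, self.X_[i]),
            )
            nearest_labels = [self.y_[i] for i in order[:self.n_neighbors]]
            # Majority vote among the k nearest labels.
            predictions.append(Counter(nearest_labels).most_common(1)[0][0])
        return predictions


# Made-up toy data: three points near (0, 0) and three near (1, 1).
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
y_train = ["b", "b", "b", "g", "g", "g"]

estimator = TinyKNN(n_neighbors=3).fit(X_train, y_train)
print(estimator.predict([[0.05, 0.05], [1.0, 0.95]]))  # ['b', 'g']
```

The sketch shows why `fit()` is cheap for this family of algorithms: the work of comparing distances happens in `predict()`.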
Example
%matplotlib inline
# The line above is a Jupyter/IPython magic; remove it when running as a plain script.
# Ionosphere dataset
# https://archive.ics.uci.edu/ml/machine-learning-databases/ionosphere/
# Download ionosphere.data and ionosphere.names and place them in the ./data/ directory
import csv
import os

import numpy as np

home_folder = os.path.expanduser("~")
print(home_folder)  # home directory
# Change this to the location of your dataset
home_folder = "."  # use the current directory instead
data_folder = os.path.join(home_folder, "data")
print(data_folder)
data_filename = os.path.join(data_folder, "ionosphere.data")
print(data_filename)

# The dataset shape is known in advance: 351 samples with 34 features each
X = np.zeros((351, 34), dtype='float')
y = np.zeros((351,), dtype='bool')
with open(data_filename, 'r') as input_file:
    reader = csv.reader(input_file)
    for i, row in enumerate(reader):
        # Get the data, converting each item to a float
        data = [float(datum) for datum in row[:-1]]
        # Overwrite the zero-initialized row with the real data
        X[i] = data
        # True if the class is 'g', False otherwise
        y[i] = row[-1] == 'g'

# Data preprocessing: split into training and test sets
# (sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)
print("Training set has {} samples".format(X_train.shape[0]))
print("Test set has {} samples".format(X_test.shape[0]))
print("Each sample has {} features".format(X_train.shape[1]))
Output:
Training set has 263 samples
Test set has 88 samples
Each sample has 34 features
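The 263/88 split follows from the default test fraction of 0.25 that `train_test_split` uses when neither `test_size` nor `train_size` is given. A short check of the arithmetic, assuming the test count is rounded up (which is consistent with the output above):

```python
import math

n_samples = 351
test_fraction = 0.25  # train_test_split's default when no size is given

n_test = math.ceil(n_samples * test_fraction)  # 87.75 rounds up to 88
n_train = n_samples - n_test                   # the remaining samples

print(n_train, n_test)  # 263 88
```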
# Instantiate the estimator -> train -> predict -> evaluate
from sklearn.neighbors import KNeighborsClassifier
estimator = KNeighborsClassifier()
estimator.fit(X_train, y_train)
y_predicted = estimator.predict(X_test)
accuracy = np.mean(y_test == y_predicted) * 100
print("Accuracy {0:.1f}%".format(accuracy))

# Another way to evaluate: cross-validation
# (sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection)
from sklearn.model_selection import cross_val_score
# cv=3 keeps the 3-fold default of the removed sklearn.cross_validation module
scores = cross_val_score(estimator, X, y, scoring='accuracy', cv=3)
average_accuracy = np.mean(scores) * 100
print("Average accuracy {0:.1f}%".format(average_accuracy))

# Evaluate n_neighbors from 1 to 20 (inclusive)
avg_scores = []
all_scores = []
parameter_values = list(range(1, 21))
for n_neighbors in parameter_values:
    estimator = KNeighborsClassifier(n_neighbors=n_neighbors)
    scores = cross_val_score(estimator, X, y, scoring='accuracy', cv=3)
    avg_scores.append(np.mean(scores))
    all_scores.append(scores)
Output:
Accuracy 86.4%
Average accuracy 82.3%
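Cross-validation scores the estimator on several train/test splits and averages the results, which is why its figure can differ from a single hold-out accuracy. Below is a simplified sketch of how k contiguous folds could be formed. `k_fold_indices` is a hypothetical helper written for this illustration; scikit-learn's own splitter for classifiers additionally stratifies the folds by class, which this sketch does not do.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Each sample lands in exactly one test fold; the rest form the training set.
folds = list(k_fold_indices(n_samples=351, n_folds=3))
for train, test in folds:
    print(len(train), len(test))
```

With 351 samples and 3 folds, every fold tests on 117 samples and trains on the other 234.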
from matplotlib import pyplot as plt
plt.figure(figsize=(32, 20))
# Average accuracy for each value of n_neighbors
plt.plot(parameter_values, avg_scores, '-o', linewidth=5, markersize=24)
# plt.axis([0, max(parameter_values), 0, 1.0])

# Per-fold scores for each value of n_neighbors
for parameter, scores in zip(parameter_values, all_scores):
    n_scores = len(scores)
    plt.plot([parameter] * n_scores, scores, '-o')

plt.plot(parameter_values, all_scores, 'bx')

# Repeat 10-fold cross-validation 100 times for each value of n_neighbors.
# Note: with an integer cv the folds are not shuffled, so the repeated runs
# are expected to return identical scores.
from collections import defaultdict
all_scores = defaultdict(list)
parameter_values = list(range(1, 21))  # 1 to 20 inclusive
for n_neighbors in parameter_values:
    for i in range(100):
        estimator = KNeighborsClassifier(n_neighbors=n_neighbors)
        scores = cross_val_score(estimator, X, y, scoring='accuracy', cv=10)
        all_scores[n_neighbors].append(scores)
for parameter in parameter_values:
    scores = all_scores[parameter]
    n_scores = len(scores)
    plt.plot([parameter] * n_scores, scores, '-o')

plt.plot(parameter_values, avg_scores, '-o')
That covers how to implement nearest-neighbor classification in Python with scikit-learn. Thanks for reading.