How to implement nearest-neighbor classification in Python with scikit-learn

Published: 2023-02-28 11:08:56  Source: 億速云  Views: 209  Author: iii  Column: Development Technology

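This article explains how to implement nearest-neighbor classification in Python with scikit-learn. Many readers are probably not yet familiar with the topic, so the steps are laid out in order, from the library's building blocks through loading data, training, and evaluation.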

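The scikit-learn library

scikit-learn ships ready-made implementations of many data mining algorithms.

A data mining workflow in scikit-learn is built from three kinds of components:

1. Transformer: preprocesses and transforms data.

2. Pipeline: chains the steps of a data mining workflow so that it can be reused (encapsulation).

3. Estimator: performs classification, clustering, or regression (the algorithm objects).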
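As a rough illustration of how these three components fit together, the sketch below chains a transformer and an estimator inside a Pipeline. The toy arrays, the step names "scale" and "knn", and the choice of MinMaxScaler are assumptions made for this example; they do not come from the article.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Toy data, made up for illustration: two features on very different scales.
X_toy = np.array([[0.1, 1000.0], [0.2, 1100.0], [0.9, 100.0], [1.0, 50.0]])
y_toy = np.array([0, 0, 1, 1])

# The pipeline applies the transformer first, then hands the result to the estimator.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),                     # transformer: rescale each feature to [0, 1]
    ("knn", KNeighborsClassifier(n_neighbors=1)),  # estimator: nearest-neighbor classifier
])

pipeline.fit(X_toy, y_toy)
pipeline_pred = pipeline.predict(np.array([[0.15, 1050.0], [0.95, 80.0]]))
print(pipeline_pred)
```

Because the pipeline itself exposes fit() and predict(), it can be used anywhere a single estimator is expected.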

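The estimators used here all provide the following two methods.

fit(): training

Usage: estimator.fit(X_train, y_train)

estimator = KNeighborsClassifier() is a scikit-learn algorithm object

X_train = dataset.data is a NumPy array

y_train = dataset.target is a NumPy array

predict(): prediction

Usage: estimator.predict(X_test)

estimator = KNeighborsClassifier() is a scikit-learn algorithm object that has already been fitted

X_test = dataset.data is a NumPy array (in practice, rows held out from training)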
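The fit() and predict() calls described above can be tried on a tiny hand-made dataset. This is only a sketch: the one-dimensional points and the variable names ending in _demo are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up 1-D training data: small values are class False, large values are class True.
X_train_demo = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_train_demo = np.array([False, False, False, True, True, True])
X_test_demo = np.array([[1.5], [10.5]])

estimator_demo = KNeighborsClassifier(n_neighbors=3)
estimator_demo.fit(X_train_demo, y_train_demo)        # learn from the labeled rows
y_pred_demo = estimator_demo.predict(X_test_demo)     # one predicted label per test row
print(y_pred_demo)
```

predict() returns a NumPy array with one entry per row of the test matrix, which is what makes the element-wise comparison against y_test possible later on.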

Example

%matplotlib inline
# Ionosphere dataset
# https://archive.ics.uci.edu/ml/machine-learning-databases/ionosphere/
# Download ionosphere.data and ionosphere.names and place them in the ./data/ directory
import os
home_folder = os.path.expanduser("~")
print(home_folder)  # the user's home directory
# Change this to the location of your dataset
home_folder = "."  # use the current directory instead
data_folder = os.path.join(home_folder, "data")
print(data_folder)
data_filename = os.path.join(data_folder, "ionosphere.data")
print(data_filename)
import csv
import numpy as np

# The shape of the dataset is known in advance: 351 samples, 34 features
X = np.zeros((351, 34), dtype='float')
y = np.zeros((351,), dtype='bool')


with open(data_filename, 'r', newline='') as input_file:
    reader = csv.reader(input_file)
    for i, row in enumerate(reader):
        # Get the data, converting each item to a float
        data = [float(datum) for datum in row[:-1]]
        # Overwrite the zero-initialized row with the real values
        X[i] = data
        # True if the class is 'g' (good), False otherwise
        y[i] = row[-1] == 'g'
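The row-parsing step above can be tried without downloading the file. The sketch below reads two made-up rows from an in-memory string; the rows are shortened to three features and are not real Ionosphere records. It collects rows in lists instead of preallocating, which is one possible way to avoid hard-coding the dataset size.

```python
import csv
import io
import numpy as np

# Two invented rows in the same layout as ionosphere.data (features, then 'g' or 'b').
sample_file = io.StringIO("1,0,0.99539,g\n1,0,-0.18829,b\n")

feature_rows, label_rows = [], []
for row in csv.reader(sample_file):
    if not row:          # skip blank lines, if any
        continue
    feature_rows.append([float(value) for value in row[:-1]])
    label_rows.append(row[-1] == "g")   # True for 'g', False otherwise

X_sample = np.array(feature_rows, dtype=float)
y_sample = np.array(label_rows, dtype=bool)
print(X_sample.shape, y_sample)
```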

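# Data preprocessing: split into a training set and a test set
# (sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)
print("Training set has {} samples".format(X_train.shape[0]))
print("Test set has {} samples".format(X_test.shape[0]))
print("Each sample has {} features".format(X_train.shape[1]))

Output:

Training set has 263 samples
Test set has 88 samples
Each sample has 34 features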
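The 263/88 split follows from the default test_size of 0.25: a quarter of 351 is 87.75, which is rounded up to 88 test rows, leaving 263 for training. A small sketch with zero-filled placeholder arrays (an assumption standing in for the real data) shows the same shapes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the Ionosphere data; the values are irrelevant here.
X_placeholder = np.zeros((351, 34))
y_placeholder = np.zeros(351, dtype=bool)

X_tr, X_te, y_tr, y_te = train_test_split(X_placeholder, y_placeholder, random_state=14)
print(X_tr.shape, X_te.shape)
```

Passing test_size explicitly, for example test_size=0.2, changes the proportions.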

# Instantiate the algorithm object -> train -> predict -> evaluate
from sklearn.neighbors import KNeighborsClassifier

estimator = KNeighborsClassifier()
estimator.fit(X_train, y_train)
y_predicted = estimator.predict(X_test)
accuracy = np.mean(y_test == y_predicted) * 100
print("Accuracy {0:.1f}%".format(accuracy))

# Another way to evaluate: cross-validation
# (cross_val_score now lives in sklearn.model_selection; its default number of
# folds changed from 3 to 5 in scikit-learn 0.22, so scores can differ by version)
from sklearn.model_selection import cross_val_score
scores = cross_val_score(estimator, X, y, scoring='accuracy')
average_accuracy = np.mean(scores) * 100
print("Average accuracy {0:.1f}%".format(average_accuracy))

# Cross-validate every value of n_neighbors from 1 to 20
avg_scores = []
all_scores = []
parameter_values = list(range(1, 21))  # Including 20
for n_neighbors in parameter_values:
    estimator = KNeighborsClassifier(n_neighbors=n_neighbors)
    scores = cross_val_score(estimator, X, y, scoring='accuracy')
    avg_scores.append(np.mean(scores))
    all_scores.append(scores)

Output:

Accuracy 86.4%
Average accuracy 82.3%
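The manual loop over n_neighbors can also be expressed with scikit-learn's GridSearchCV, which runs the same kind of cross-validated search and records the best setting. The sketch below is not the article's method; it uses synthetic data from make_classification as a stand-in for the Ionosphere arrays, and cv=5 is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): 200 samples, 10 features, two classes.
X_syn, y_syn = make_classification(n_samples=200, n_features=10, random_state=14)

search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 21))},  # same range as the loop: 1 to 20
    scoring="accuracy",
    cv=5,
)
search.fit(X_syn, y_syn)

best_k = search.best_params_["n_neighbors"]
print(best_k, search.best_score_)
```

The per-value averages that the loop stores in avg_scores correspond to search.cv_results_["mean_test_score"] here.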

from matplotlib import pyplot as plt
plt.figure(figsize=(32,20))
plt.plot(parameter_values, avg_scores, '-o', linewidth=5, markersize=24)
#plt.axis([0, max(parameter_values), 0, 1.0])

for parameter, scores in zip(parameter_values, all_scores):
    n_scores = len(scores)
    plt.plot([parameter] * n_scores, scores, '-o')

plt.plot(parameter_values, all_scores, 'bx')

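# Repeat 10-fold cross-validation 100 times for each value of n_neighbors
# Note: with an integer cv the folds are not shuffled, so each of the
# 100 repetitions returns the same scores for a given n_neighbors.
from collections import defaultdict
all_scores = defaultdict(list)
parameter_values = list(range(1, 21))  # Including 20
for n_neighbors in parameter_values:
    for i in range(100):
        estimator = KNeighborsClassifier(n_neighbors=n_neighbors)
        scores = cross_val_score(estimator, X, y, scoring='accuracy', cv=10)
        all_scores[n_neighbors].append(scores)
# Plot every fold score as a point above its n_neighbors value
for parameter in parameter_values:
    scores = all_scores[parameter]
    n_scores = len(scores)
    plt.plot([parameter] * n_scores, scores, '-o')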
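Repeating cross_val_score with an integer cv reuses the same unshuffled folds each time, so the repetitions in the loop above do not add new information. If the goal is to see how scores vary across different splits, one option (an assumption about intent, not something the article states) is RepeatedStratifiedKFold, which reshuffles the folds on every repeat. The sketch uses synthetic data and three repeats to stay quick.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption), not the Ionosphere dataset.
X_rep, y_rep = make_classification(n_samples=200, n_features=10, random_state=14)

# 10 folds, repeated 3 times with a different shuffle each time -> 30 scores.
repeated_cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=14)
rep_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=5), X_rep, y_rep, scoring="accuracy", cv=repeated_cv
)
print(len(rep_scores), rep_scores.mean(), rep_scores.std())
```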

plt.plot(parameter_values, avg_scores, '-o')

That concludes this walkthrough of nearest-neighbor classification in Python with scikit-learn. Thank you for reading.
