Kaggle Case Study (1): Titanic: Machine Learning from Disaster

Published: 2020-07-26 09:55:04  Source: web  Author: up4ever  Category: Programming languages

1. Case Overview

Titanic is Kaggle's entry-level competition: https://www.kaggle.com/c/titanic . The full competition description is available on the official site.

2. Analyzing the Data

2.1 Reading the Data

Load the training data

import pandas as pd
data_train = pd.read_csv("./input/train.csv")

Preview the data

data_train.head()

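Training set fields:
PassengerId: passenger ID
Survived: whether the passenger survived (0 = no, 1 = yes)
Pclass: ticket class (1 = first, 2 = second, 3 = third)
Name: passenger name
Sex: sex
Age: age in years
SibSp: number of siblings/spouses aboard
Parch: number of parents/children aboard
Ticket: ticket number
Fare: passenger fare
Cabin: cabin number
Embarked: port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)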

Inspect the column types and non-null counts

data_train.info()

List the columns that contain missing values

data_train.columns[data_train.isnull().any()].tolist()

Count the missing values

age_null_count = data_train.Age.isnull().sum()
cabin_null_count = data_train.Cabin.isnull().sum()
embarked_null_count = data_train.Embarked.isnull().sum()
print('Missing in Age: %s' % age_null_count)
print('Missing in Cabin: %s' % cabin_null_count)
print('Missing in Embarked: %s' % embarked_null_count)
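The three per-column counts above can also be collected for every column in one pass. A minimal sketch on a small made-up DataFrame (`toy` is hypothetical, not the real train.csv); the same calls should work on `data_train` as-is:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train.csv
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", None, "S"],
    "Fare": [7.25, 71.28, 7.92, 53.1],
})

# Number of missing values in each column
missing_counts = toy.isnull().sum()
# Keep only the columns that actually have gaps
missing_counts = missing_counts[missing_counts > 0]

# Fraction of rows missing per column (mean of a boolean mask)
missing_ratio = toy.isnull().mean()
print(missing_counts)
print(missing_ratio)
```

The ratio view makes it easier to judge whether a column is worth imputing or, like Cabin here, is mostly empty.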

2.2 Processing the Data

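Missing values in Age
Fill the missing ages with the median of the Age column. fillna returns a new Series, so the result must be assigned back:

data_train['Age'] = data_train['Age'].fillna(data_train['Age'].median())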
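To see why the assignment matters, here is a small check on a hypothetical Age column: calling fillna without assigning the result leaves the data untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical toy Age column with two gaps
toy = pd.DataFrame({"Age": [22.0, np.nan, 38.0, np.nan, 30.0]})

# fillna returns a new Series; the result is discarded here, so toy is unchanged
toy.Age.fillna(toy.Age.median())
still_missing = toy.Age.isnull().sum()

# Assigning the result back actually replaces the missing ages with the median (30.0)
toy["Age"] = toy["Age"].fillna(toy["Age"].median())
print(still_missing, toy["Age"].tolist())
```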

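Missing values in Cabin
The Cabin column has a large number of missing entries, so first compare survival between passengers with and without a recorded Cabin: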

Survived_cabin = data_train.Survived[pd.notnull(data_train.Cabin)].value_counts()
print(Survived_cabin)

Survived_nocabin = data_train.Survived[pd.isnull(data_train.Cabin)].value_counts()
print(Survived_nocabin)
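Comparing two sets of raw counts works, but the survival rate of each group can also be read off directly by grouping on whether Cabin is present. A sketch on made-up rows (the `HasCabin` name is a hypothetical label chosen for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical rows: Survived is 0/1, Cabin is NaN when unknown
toy = pd.DataFrame({
    "Survived": [1, 1, 0, 0, 0, 1],
    "Cabin": ["C85", "C123", np.nan, np.nan, "E46", np.nan],
})

# True where a cabin is recorded
has_cabin = toy["Cabin"].notnull().rename("HasCabin")

# The mean of a 0/1 column is the survival rate within each group
survival_rate = toy.groupby(has_cabin)["Survived"].mean()
print(survival_rate)
```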

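Passengers with a recorded Cabin appear more likely to have survived. Instead of filling the gaps, treat the Cabin column as a categorical label.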
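One way to turn Cabin into a categorical label is to keep only whether a cabin was recorded. This is a minimal sketch, assuming a two-level "Yes"/"No" encoding; the helper name `set_cabin_type` and the label values are hypothetical choices, not taken from the original article.

```python
import numpy as np
import pandas as pd

def set_cabin_type(df):
    """Hypothetical helper: replace Cabin with 'Yes' if a cabin is recorded, else 'No'.
    Works on a copy so the input DataFrame is left unchanged."""
    df = df.copy()
    df["Cabin"] = np.where(df["Cabin"].notnull(), "Yes", "No")
    return df

toy = pd.DataFrame({"Cabin": ["C85", np.nan, "E46", np.nan]})
labeled = set_cabin_type(toy)
print(labeled["Cabin"].tolist())
```

Applied to the real data this would read `data_train = set_cabin_type(data_train)`; the resulting column can later be one-hot encoded like any other category.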

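Missing values in Embarked
Fill the missing values with the mode (most frequent value) of the Embarked column. mode() returns a Series, so take its first element and assign the result back:

data_train['Embarked'] = data_train['Embarked'].fillna(data_train['Embarked'].mode()[0])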
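Passing the mode() Series itself to fillna does not behave like a scalar fill: pandas aligns the Series on index labels, so only rows whose labels appear in that Series get filled. A small demonstration on a hypothetical Embarked column:

```python
import pandas as pd

# Hypothetical Embarked column with one missing port (row 3)
toy = pd.DataFrame({"Embarked": ["S", "C", "S", None, "Q", "S"]})

# mode() returns a Series with index 0 only, so the gap at row 3 is not filled
wrong = toy["Embarked"].fillna(toy["Embarked"].mode())

# mode() may return several values on ties; take the first one as a scalar
most_common = toy["Embarked"].mode()[0]
toy["Embarked"] = toy["Embarked"].fillna(most_common)
print(wrong.isnull().sum(), most_common, toy["Embarked"].tolist())
```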

2.3 Data Visualization

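Number of survivors

import matplotlib.pyplot as plt

# Plot how many passengers did and did not survive
# (sort_index keeps 0 before 1 so the tick labels below match the bars)
data_train.Survived.value_counts().sort_index().plot(kind='bar')
plt.title("Survival")
plt.xticks([0,1], ["Not survived","Survived"], rotation=0)
plt.ylabel("Passengers")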
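`value_counts()` orders its result by frequency, not by label, so hard-coded tick labels only line up with the bars when the counts are sorted by index. A quick check on a hypothetical outcome column where survivors happen to be the majority:

```python
import pandas as pd

# Hypothetical outcomes where survivors (1) outnumber non-survivors (0)
survived = pd.Series([1, 1, 1, 0, 0])

by_frequency = survived.value_counts()            # most frequent label first
by_label = survived.value_counts().sort_index()   # label 0 first, then 1

# Share of each outcome, also ordered by label
share = survived.value_counts(normalize=True).sort_index()
print(by_frequency.index.tolist(), by_label.index.tolist(), share.tolist())
```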

Age distribution by passenger class

# kind='kde' requires SciPy to be installed
data_train.Age[data_train.Pclass == 1].plot(kind='kde')
data_train.Age[data_train.Pclass == 2].plot(kind='kde')
data_train.Age[data_train.Pclass == 3].plot(kind='kde')
plt.xlabel("Age")
plt.ylabel("Density")
plt.title("Age distribution by passenger class")
plt.legend(('First class', 'Second class', 'Third class'), loc='best')
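The KDE curves show the shape of each class's age distribution; a numeric summary per class can be a useful companion. A sketch on made-up passengers, using groupby with two aggregations:

```python
import numpy as np
import pandas as pd

# Hypothetical passengers: class and age (NaN = unknown)
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3],
    "Age": [40.0, 50.0, 30.0, np.nan, 20.0, 24.0, 22.0],
})

# Median and number of known ages per class; NaN ages are skipped by both aggregations
age_by_class = toy.groupby("Pclass")["Age"].agg(["median", "count"])
print(age_by_class)
```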

Survival by passenger class

Survived_0 = data_train.Pclass[data_train.Survived == 0].value_counts()
Survived_1 = data_train.Pclass[data_train.Survived == 1].value_counts()
df = pd.DataFrame({'Survived': Survived_1, 'Not survived': Survived_0})
df.plot(kind='bar', stacked=True)
plt.title("Survival by passenger class")
plt.xlabel("Passenger class")
plt.ylabel("Passengers")
plt.xticks(rotation=0)
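The same split-count-combine pattern repeats for Pclass, Embarked, SibSp and Parch. A sketch of an equivalent, more compact table using `pd.crosstab`; the helper name `survival_table` and the toy rows are hypothetical:

```python
import pandas as pd

def survival_table(df, feature):
    """Hypothetical helper: one row per value of `feature`,
    columns count non-survivors (0) and survivors (1)."""
    table = pd.crosstab(df[feature], df["Survived"])
    return table.rename(columns={0: "Not survived", 1: "Survived"})

toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Survived": [1, 1, 0, 0, 0, 1],
})
table = survival_table(toy, "Pclass")
print(table)
# table.plot(kind="bar", stacked=True) would draw a stacked bar chart like the one above
```

Combinations that never occur (for example, no survivors in class 2 here) appear as 0 rather than NaN, which keeps the stacked bars well defined.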

Number of passengers embarking at each port

data_train.Embarked.value_counts().plot(kind='bar')
plt.title("Passengers embarking at each port")
plt.ylabel("Passengers")
plt.xticks(rotation=0)

Survival by port of embarkation

Survived_0 = data_train.Embarked[data_train.Survived == 0].value_counts()
Survived_1 = data_train.Embarked[data_train.Survived == 1].value_counts()
df = pd.DataFrame({'Survived': Survived_1, 'Not survived': Survived_0})
df.plot(kind='bar', stacked=True)
plt.title("Survival by port of embarkation")
plt.xlabel("Port of embarkation")
plt.ylabel("Passengers")
plt.xticks(rotation=0)

Survival by sex

Survived_m = data_train.Survived[data_train.Sex == 'male'].value_counts()
Survived_f = data_train.Survived[data_train.Sex == 'female'].value_counts()
# The bars are the two outcomes (0, 1), each stacked by sex
df = pd.DataFrame({'Male': Survived_m, 'Female': Survived_f}).sort_index()
df.plot(kind='bar', stacked=True)
plt.title("Survival by sex")
plt.xlabel("Outcome")
plt.ylabel("Passengers")
plt.xticks([0,1], ["Not survived","Survived"], rotation=0)
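Stacked counts make it hard to compare groups of different sizes. Normalizing each row of a crosstab gives the share of each outcome within each sex instead. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical passengers
toy = pd.DataFrame({
    "Sex": ["male", "male", "male", "female", "female"],
    "Survived": [0, 0, 1, 1, 1],
})

# normalize="index" divides each row by its total, so each row sums to 1
rates = pd.crosstab(toy["Sex"], toy["Survived"], normalize="index")
print(rates)
```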

Survival by SibSp (number of siblings/spouses aboard)

SibSp_0 = data_train.SibSp[data_train.Survived == 0].value_counts()
SibSp_1 = data_train.SibSp[data_train.Survived == 1].value_counts()
SibSp_df = pd.DataFrame({'Not survived': SibSp_0, 'Survived': SibSp_1})
SibSp_df.plot(kind='bar', stacked=True)
plt.title("Survival by number of siblings/spouses aboard")
plt.xlabel("Number of siblings/spouses aboard")
plt.ylabel("Passengers")
plt.xticks(rotation=0)

Survival by Parch (number of parents/children aboard)

Parch_0 = data_train.Parch[data_train.Survived == 0].value_counts()
Parch_1 = data_train.Parch[data_train.Survived == 1].value_counts()
Parch_df = pd.DataFrame({'Not survived': Parch_0, 'Survived': Parch_1})
Parch_df.plot(kind='bar', stacked=True)
plt.title("Survival by number of parents/children aboard")
plt.xlabel("Number of parents/children aboard")
plt.ylabel("Passengers")
plt.xticks(rotation=0)
