Key pandas Concepts in Python

Published: 2021-11-30 15:30:05 | Source: 億速云 | Views: 149 | Author: 小新 | Category: Big Data

This article introduces the main pandas topics in Python. It may be a useful reference for interested readers, and hopefully you will take something away from it. Let's take a look.

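pandas is a tool built on NumPy, created for data analysis tasks. It incorporates a number of libraries and some standard data models, and provides the tools needed to work efficiently with large datasets. pandas offers a large number of functions and methods for processing data quickly and conveniently. You will soon find that it is one of the main reasons Python is a powerful and productive environment for data analysis.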

1. Introduction to pandas data structures

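Series: a one-dimensional array, similar to a one-dimensional NumPy array. Both are also close to Python's built-in list. A Series can hold different data types: strings, booleans, numbers, and so on.
Time Series: a Series indexed by time.
DataFrame: a two-dimensional tabular data structure. Much of its functionality resembles data.frame in R. A DataFrame can be thought of as a container of Series.
Panel: a three-dimensional array, which can be thought of as a container of DataFrames. Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so it is not available in current releases.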
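As a small sketch of how these structures relate, the snippet below builds a Series, a time-indexed Series, and a DataFrame, and shows that a DataFrame column is itself a Series. The names `temps`, `daily`, and `table` are illustrative and do not come from the article.

```python
import pandas as pd

# A Series: one-dimensional, labelled values (names here are illustrative).
temps = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"])

# A time series is a Series whose index holds timestamps.
daily = pd.Series([1, 2, 3], index=pd.date_range("2021-01-01", periods=3))

# A DataFrame: a two-dimensional table whose columns share one index.
table = pd.DataFrame({"temp": temps, "rain": [0.0, 1.2, 0.4]})

# Selecting a single column gives back a Series.
column = table["temp"]
print(type(column).__name__)       # Series
print(type(daily.index).__name__)  # DatetimeIndex
print(table.shape)                 # (3, 2)
```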

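2. Series operations

2.1 Creating objects
2.1.1 Direct creation
2.1.2 Creation from a dict

    import pandas as pd
    import numpy as np

    # Create directly from an array, with explicit index labels
    s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
    print(s)

    # Create from a dict; the index argument selects and orders the labels,
    # and a label missing from the dict ('d') becomes NaN
    s = pd.Series({'a': 10, 'b': 20, 'c': 30}, index=['b', 'c', 'a', 'd'])
    print(s)

    OUT:
    a   -0.620323
    b   -0.189133
    c    1.677690
    d   -1.480348
    e   -0.539061
    dtype: float64

    OUT:
    b    20.0
    c    30.0
    a    10.0
    d     NaN
    dtype: float64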
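When a dict is passed together with an `index` argument, pandas looks each index label up in the dict rather than taking the dict's own key order. The following minimal sketch (the variable name `prices` is illustrative) makes that alignment visible: a label with no matching key becomes NaN, which also forces the dtype to float.

```python
import pandas as pd

# 'd' is requested in the index but has no key in the dict.
prices = pd.Series({"a": 10, "b": 20, "c": 30}, index=["b", "c", "a", "d"])

print(prices.index.tolist())   # ['b', 'c', 'a', 'd']
print(prices.isna().tolist())  # [False, False, False, True]
print(prices.dtype)            # float64

# Without the index argument, the dict's key order is kept and values stay integers.
plain = pd.Series({"a": 10, "b": 20, "c": 30})
print(plain.index.tolist())    # ['a', 'b', 'c']
```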

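2.2 Viewing data: slicing, indexing, and dict-style access

Because a Series is a one-dimensional array-like structure, it supports array-style operations such as positional indexing and slicing. Because its data can also come from a dict, it supports dict-style lookup by label as well.

    import pandas as pd
    import numpy as np

    s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
    print(s)

    # Positional indexing (.iloc is explicit; plain s[0] on a labelled
    # Series is deprecated in recent pandas releases)
    print('Positional access s.iloc[0] = %s' % s.iloc[0])

    # Dict-style access by label
    print('Dict-style access s[b] = %s' % s['b'])

    # Slicing
    print('Slice s.iloc[2:]:\n%s' % s.iloc[2:])

    # Membership tests check the index labels
    print('a' in s)
    print('k' in s)

    OUT:
    a   -0.799676
    b   -1.581704
    c   -1.240885
    d    0.623757
    e   -0.234417
    dtype: float64

    Positional access s.iloc[0] = -0.799676067487
    Dict-style access s[b] = -1.58170351838
    Slice s.iloc[2:]:
    c   -1.240885
    d    0.623757
    e   -0.234417
    dtype: float64
    True
    False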
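To make the difference between positional and label-based access concrete, here is a small sketch with fixed values instead of random ones. It uses the explicit `.iloc` and `.loc` accessors; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

first = s.iloc[0]       # positional access
by_label = s.loc["b"]   # label access
tail = s.iloc[2:]       # positional slice, end exclusive
span = s.loc["b":"d"]   # label slice, both endpoints included

print(first, by_label)  # 10 20
print(tail.tolist())    # [30, 40, 50]
print(span.tolist())    # [20, 30, 40]

# The `in` operator tests the index labels, not the stored values.
print("a" in s, 30 in s)  # True False
```

Note that a label slice includes its end label, while a positional slice follows the usual Python rule and excludes the end position.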

2.3 Arithmetic on Series

    import pandas as pd
    import numpy as np

    s1 = pd.Series(np.random.randn(3), index=['a', 'b', 'c'])
    s2 = pd.Series(np.random.randn(3), index=['a', 'b', 'c'])

    # Arithmetic is element-wise, matched by index label
    print(s1 + s2)
    print(s1 - s2)
    print(s1 * s2)
    print(s1 / s2)

    OUT:
    a    0.236514
    b   -0.132153
    c    0.203186
    dtype: float64

    a    0.305397
    b   -1.474441
    c   -1.697982
    dtype: float64

    a   -0.009332
    b   -0.539128
    c   -0.710465
    dtype: float64

    a   -7.867120
    b   -1.196907
    c   -0.786252
    dtype: float64
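The example above uses two Series with identical labels. As a hedged illustration of what happens when the labels differ, the sketch below (with made-up values) shows that the operators align on the index and produce NaN wherever one side has no matching label, and that the `add` method with `fill_value` can substitute a default instead.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Labels 'a' and 'd' exist on only one side, so the result is NaN there.
total = s1 + s2
print(total)

# Treat the missing side as 0 instead of propagating NaN.
filled = s1.add(s2, fill_value=0)
print(filled)
```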

3. DataFrame operations

3.1 Creating objects

    # assumes: import pandas as pd
    In [70]: data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        ...:         'year': [2000, 2001, 2002, 2001, 2002],
        ...:         'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

    In [71]: data
    Out[71]:
    {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
     'year': [2000, 2001, 2002, 2001, 2002],
     'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

    # Build the DataFrame object
    In [72]: frame1 = pd.DataFrame(data)

    # The leftmost column (0 to 4) is the automatically generated index
    In [73]: frame1
    Out[73]:
        state  year  pop
    0    Ohio  2000  1.5
    1    Ohio  2001  1.7
    2    Ohio  2002  3.6
    3  Nevada  2001  2.4
    4  Nevada  2002  2.9

    >>> lista = [1, 2, 5, 7]
    >>> listb = ['a', 'b', 'c', 'd']
    >>> df = pd.DataFrame({'col1': lista, 'col2': listb})
    >>> df
       col1 col2
    0     1    a
    1     2    b
    2     5    c
    3     7    d
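Besides a dict of lists, a DataFrame can also be built from a list of row dicts, with the column order and row labels given explicitly. The following is a minimal sketch under that assumption; the row labels `r1` and `r2` are hypothetical.

```python
import pandas as pd

# Each dict is one row; the keys become column names.
records = [
    {"state": "Ohio", "year": 2000, "pop": 1.5},
    {"state": "Nevada", "year": 2001, "pop": 2.4},
]

# `columns` fixes the column order; `index` replaces the default 0..n-1 labels.
frame = pd.DataFrame(records, columns=["year", "state", "pop"], index=["r1", "r2"])
print(frame)

# Use bracket access for the 'pop' column: frame.pop is a DataFrame method.
print(frame["pop"].tolist())
```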

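3.2 Selecting data

    In [1]: import numpy as np
       ...: import pandas as pd
       ...: # values chosen to match the output below
       ...: df = pd.DataFrame(np.arange(0, 60, 2).reshape(10, 3), columns=list('abc'))

    In [2]: df
    Out[2]:
        a   b   c
    0   0   2   4
    1   6   8  10
    2  12  14  16
    3  18  20  22
    4  24  26  28
    5  30  32  34
    6  36  38  40
    7  42  44  46
    8  48  50  52
    9  54  56  58

    # .loc selects by label; the end of a label slice is included
    In [3]: df.loc[0, 'c']
    Out[3]: 4

    In [4]: df.loc[1:4, ['a', 'c']]
    Out[4]:
        a   c
    1   6  10
    2  12  16
    3  18  22
    4  24  28

    # .iloc selects by position; the end of a positional slice is excluded
    In [5]: df.iloc[0, 2]
    Out[5]: 4

    In [6]: df.iloc[1:4, [0, 2]]
    Out[6]:
        a   c
    1   6  10
    2  12  16
    3  18  22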
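The outputs above differ in row count even though both slices read `1:4`. The sketch below reproduces that under the assumption that `df` holds the even numbers 0 to 58 in a 10 x 3 layout (as the printed table suggests), and adds a boolean-mask selection as a third common way to pick rows.

```python
import numpy as np
import pandas as pd

# Assumed construction: even numbers 0..58 laid out in 10 rows and 3 columns.
df = pd.DataFrame(np.arange(0, 60, 2).reshape(10, 3), columns=list("abc"))

by_label = df.loc[1:4, ["a", "c"]]  # label slice: rows 1, 2, 3 and 4
by_pos = df.iloc[1:4, [0, 2]]       # positional slice: rows 1, 2 and 3
masked = df[df["a"] > 40]           # boolean mask: rows where column a exceeds 40

print(len(by_label), len(by_pos), len(masked))  # 4 3 3
```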

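3.3 Applying functions

    frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                         index=['Utah', 'Ohio', 'Texas', 'Oregon'])
    frame

    OUT (values rounded to two decimals):
                b     d      e
    Utah    -0.20  0.48  -0.52
    Ohio    -0.56  1.97   1.39
    Texas    0.09  0.28   0.77
    Oregon   1.25  1.01  -1.30

    # Element-wise NumPy functions work directly on a DataFrame
    np.abs(frame)

    OUT:
                   b         d         e
    Utah    0.204708  0.478943  0.519439
    Ohio    0.555730  1.965781  1.393406
    Texas   0.092908  0.281746  0.769023
    Oregon  1.246435  1.007189  1.296221

    # apply calls the function once per column by default
    f = lambda x: x.max() - x.min()
    frame.apply(f)

    OUT:
    b    1.802165
    d    1.684034
    e    2.689627
    dtype: float64

    # The function may also return a Series holding several values
    def f(x):
        return pd.Series([x.min(), x.max()], index=['min', 'max'])
    frame.apply(f)

    OUT:
                b         d         e
    min -0.555730  0.281746 -1.296221
    max  1.246435  1.965781  1.393406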
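`apply` works column by column unless told otherwise. As a short sketch with fixed, made-up values (and a hypothetical helper named `value_range`), the following contrasts the default with `axis=1`, which passes each row to the function instead.

```python
import pandas as pd

frame = pd.DataFrame(
    {"b": [-0.2, -0.6, 0.1], "d": [0.5, 2.0, 0.3], "e": [-0.5, 1.4, 0.8]},
    index=["Utah", "Ohio", "Texas"],
)

def value_range(x):
    """Spread between the largest and smallest value of a Series."""
    return x.max() - x.min()

per_column = frame.apply(value_range)       # default axis=0: one result per column
per_row = frame.apply(value_range, axis=1)  # axis=1: one result per row

print(per_column.round(2))
print(per_row.round(2))
```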

3.4 Summary statistics and computation

    df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                       [np.nan, np.nan], [0.75, -1.3]],
                      index=['a', 'b', 'c', 'd'],
                      columns=['one', 'two'])
    df

    OUT:
        one  two
    a  1.40  NaN
    b  7.10 -4.5
    c   NaN  NaN
    d  0.75 -1.3

    df.info()
    df.describe()

    <class 'pandas.core.frame.DataFrame'>
    Index: 4 entries, a to d
    Data columns (total 2 columns):
    one    3 non-null float64
    two    2 non-null float64
    dtypes: float64(2)
    memory usage: 256.0+ bytes

    OUT:
                one       two
    count  3.000000  2.000000
    mean   3.083333 -2.900000
    std    3.493685  2.262742
    min    0.750000 -4.500000
    25%    1.075000 -3.700000
    50%    1.400000 -2.900000
    75%    4.250000 -2.100000
    max    7.100000 -1.300000
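The `count` row in the `describe()` output shows that missing values are skipped. As a sketch of how the individual reductions treat NaN, the snippet below rebuilds the same frame and compares the default behaviour with `skipna=False`, along columns and along rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

col_sums = df.sum()                          # per column, NaN ignored
row_means = df.mean(axis=1)                  # per row, NaN ignored
strict_means = df.mean(axis=1, skipna=False) # any NaN in a row makes the result NaN
counts = df.count()                          # number of non-missing values per column

print(col_sums)
print(row_means)
print(strict_means)
print(counts)
```

Row `c` has no valid values at all, so its mean is NaN in both variants; row `a` only becomes NaN once `skipna=False` is set.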

3.5 Reading data

    # Read a CSV file
    data = pd.read_csv('./dataset/HR.csv')
    data.info()

    OUT:
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 14999 entries, 0 to 14998
    Data columns (total 10 columns):
    satisfaction_level       14999 non-null float64
    last_evaluation          14999 non-null float64
    number_project           14999 non-null int64
    average_montly_hours     14999 non-null int64
    time_spend_company       14999 non-null int64
    Work_accident            14999 non-null int64
    left                     14999 non-null int64
    promotion_last_5years    14999 non-null int64
    sales                    14999 non-null object
    salary                   14999 non-null object
    dtypes: float64(2), int64(6), object(2)
    memory usage: 1.1+ MB

    # Read a file with a multi-character separator and no header row
    data = pd.read_csv('./dataset/movielens/movies.dat', header=None,
                       names=['name', 'types'], sep='::', engine='python')
    data.head()

    OUT:
                                     name                         types
    1                    Toy Story (1995)   Animation|Children's|Comedy
    2                      Jumanji (1995)  Adventure|Children's|Fantasy
    3             Grumpier Old Men (1995)                Comedy|Romance
    4            Waiting to Exhale (1995)                  Comedy|Drama
    5  Father of the Bride Part II (1995)                        Comedy

    # Read the second sheet of an Excel workbook (sheet_name is zero-based)
    data = pd.read_excel('./dataset/my_excel.xlsx', sheet_name=1)
    data.head()

    OUT:
             date  H1  H2  H3
    0  2014-06-01   1   2   3
    1  2014-06-02   2   3   4
    2  2014-06-03   3   4   5
    3  2014-06-04   4   5   6
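The files referenced above are not included with the article, so the following sketch reads the same `::`-separated layout from an in-memory string instead. The two sample rows are copied from the output shown above; the column name `movie_id` is an assumption made for illustration. A separator longer than one character is interpreted as a regular expression, which is why `engine='python'` is passed.

```python
import io

import pandas as pd

# Two sample rows in the '::'-separated layout, held in memory instead of a file.
raw = (
    "1::Toy Story (1995)::Animation|Children's|Comedy\n"
    "2::Jumanji (1995)::Adventure|Children's|Fantasy\n"
)

movies = pd.read_csv(
    io.StringIO(raw),
    sep="::",          # multi-character separator, handled by the python engine
    engine="python",
    header=None,       # the data has no header row
    names=["movie_id", "name", "types"],  # 'movie_id' is an illustrative name
)

# Split the pipe-separated genre string into a list per movie.
genres = movies["types"].str.split("|")

print(movies)
print(genres.tolist())
```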

4. Time Series operations

Generating a date range:

    import pandas as pd

    pd.date_range('20190313', periods=10)

    OUT:
    DatetimeIndex(['2019-03-13', '2019-03-14', '2019-03-15', '2019-03-16',
                   '2019-03-17', '2019-03-18', '2019-03-19', '2019-03-20',
                   '2019-03-21', '2019-03-22'],
                  dtype='datetime64[ns]', freq='D')
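A date range becomes useful once it serves as the index of a Series. As a minimal sketch (values and variable names are illustrative), the snippet below builds such a time series over the same ten days, selects by timestamp and by date-string slice, and aggregates to weekly totals with `resample`.

```python
import pandas as pd

days = pd.date_range("2019-03-13", periods=10, freq="D")
ts = pd.Series(range(10), index=days)

# Look up a single day by timestamp.
one_day = ts.loc[pd.Timestamp("2019-03-15")]

# Slice by date strings; both endpoints are included.
window = ts.loc["2019-03-14":"2019-03-16"]

# Aggregate the daily values into weekly sums (weeks ending on Sunday).
weekly = ts.resample("W").sum()

print(one_day)
print(window.tolist())
print(weekly)
```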

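5. Plotting

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Four random walks over 1000 days, one line per column
    ts = pd.DataFrame(np.random.randn(1000, 4),
                      index=pd.date_range('20180101', periods=1000),
                      columns=list('abcd'))
    ts = ts.cumsum()
    ts.plot(figsize=(12, 8))
    plt.show()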
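`plt.show()` needs a display. When the code runs in a script or on a server, one option is to render the figure to an image instead. The sketch below assumes matplotlib is installed and uses its non-interactive `Agg` backend; it draws a shorter series with a seeded generator and writes the PNG into an in-memory buffer rather than a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the figure is reproducible
ts = pd.DataFrame(
    rng.standard_normal((100, 4)),
    index=pd.date_range("2018-01-01", periods=100),
    columns=list("abcd"),
).cumsum()

# DataFrame.plot returns the matplotlib Axes it drew on.
ax = ts.plot(figsize=(12, 8), title="Cumulative sums")

# Save the figure as PNG bytes; pass a file path instead to write to disk.
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
plt.close(ax.figure)

print(len(ax.get_lines()), "lines drawn")
```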

Thank you for reading. I hope this article on the key pandas concepts in Python has been helpful. Please also support 億速云 and follow its industry news channel for more related material.
