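This article compares the performance of multiprocessing, threading, and gevent under Python 3, and likewise the performance of process pools, thread pools, and coroutine pools. Many newcomers are unclear on how these differ, so the walkthrough below covers it in detail.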
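Programs generally deal with two kinds of I/O: disk I/O and network I/O. Here I look only at network I/O and compare the efficiency of processes, threads, and coroutines under Python 3. Processes use the multiprocessing.Pool process pool, threads use a thread pool I wrote myself, and coroutines use the gevent library. Python 3's built-in urllib.request is also compared against the open-source requests. The code is as follows: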
import multiprocessing
import queue
import threading
import time
import urllib.request

import matplotlib.pyplot as plt
import requests


def startTimer():
    return time.time()


def ticT(startTime):
    # Elapsed seconds since startTime, rounded to milliseconds.
    useTime = time.time() - startTime
    return round(useTime, 3)


def download_urllib(url):
    req = urllib.request.Request(url, headers={'user-agent': 'Mozilla/5.0'})
    res = urllib.request.urlopen(req)
    data = res.read()
    try:
        data = data.decode('gbk')
    except UnicodeDecodeError:
        data = data.decode('utf8', 'ignore')
    return res.status, data


def download_requests(url):
    req = requests.get(url, headers={'user-agent': 'Mozilla/5.0'})
    return req.status_code, req.text


class threadPoolManager:
    # workNum is not used; it is kept so the constructor signature is unchanged.
    def __init__(self, urls, workNum=10000, threadNum=20):
        self.workQueue = queue.Queue()
        self.threadPool = []
        self.__initWorkQueue(urls)
        self.__initThreadPool(threadNum)

    def __initWorkQueue(self, urls):
        for i in urls:
            self.workQueue.put((download_requests, i))

    def __initThreadPool(self, threadNum):
        for i in range(threadNum):
            self.threadPool.append(work(self.workQueue))

    def waitAllComplete(self):
        for i in self.threadPool:
            # isAlive() was removed in Python 3.9; is_alive() is the current name.
            if i.is_alive():
                i.join()


class work(threading.Thread):
    def __init__(self, workQueue):
        threading.Thread.__init__(self)
        self.workQueue = workQueue
        self.start()

    def run(self):
        while True:
            # Checking qsize() and then calling get() can race with other
            # workers, so rely on queue.Empty to detect that the work is done.
            try:
                do, args = self.workQueue.get(block=False)
            except queue.Empty:
                break
            do(args)
            self.workQueue.task_done()


if __name__ == '__main__':
    # The guard is needed for multiprocessing on platforms that spawn workers.
    urls = ['http://www.ustchacker.com'] * 10
    urllibL = []
    requestsL = []
    multiPool = []
    threadPool = []
    N = 20
    PoolNum = 100

    for i in range(N):
        print('start %d try' % i)

        # 1. Sequential downloads with urllib.request
        urllibT = startTimer()
        jobs = [download_urllib(url) for url in urls]
        urllibL.append(ticT(urllibT))
        print('1')

        # 2. Sequential downloads with requests
        requestsT = startTimer()
        jobs = [download_requests(url) for url in urls]
        requestsL.append(ticT(requestsT))
        print('2')

        # 3. requests with a multiprocessing pool (pool creation is timed too)
        requestsT = startTimer()
        pool = multiprocessing.Pool(PoolNum)
        data = pool.map(download_requests, urls)
        pool.close()
        pool.join()
        multiPool.append(ticT(requestsT))
        print('3')

        # 4. requests with the hand-written thread pool
        requestsT = startTimer()
        pool = threadPoolManager(urls, threadNum=PoolNum)
        pool.waitAllComplete()
        threadPool.append(ticT(requestsT))
        print('4')

    x = list(range(1, N + 1))
    plt.plot(x, urllibL, label='urllib')
    plt.plot(x, requestsL, label='requests')
    plt.plot(x, multiPool, label='requests MultiPool')
    plt.plot(x, threadPool, label='requests threadPool')
    plt.xlabel('test number')
    plt.ylabel('time(s)')
    plt.legend()
    plt.show()
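The results are as follows:
[Figure: time in seconds per test run for urllib, requests, requests MultiPool, and requests threadPool]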
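The plot shows that Python 3's built-in urllib.request is still less efficient than the open-source requests. The multiprocessing process pool is clearly faster than the sequential runs, but still slower than the hand-written thread pool. Part of the reason is that creating and scheduling processes costs more than creating threads (the test timings include the creation cost).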
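The gap between sequential and pooled downloads can be reproduced without touching the network. The following is a minimal sketch, not the benchmark above: it swaps the hand-written thread pool for the standard library's `concurrent.futures.ThreadPoolExecutor`, and `fake_download` is a hypothetical stand-in that sleeps instead of making a request, so the absolute numbers mean nothing beyond illustrating the effect.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url, delay=0.05):
    """Hypothetical stand-in for a network request: sleep to simulate I/O wait."""
    time.sleep(delay)
    return 200, url


def run_sequential(urls):
    # One request after another, like the plain urllib/requests loops.
    return [fake_download(u) for u in urls]


def run_thread_pool(urls, max_workers=10):
    # Threads overlap their waiting time; map() keeps the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, urls))


def timed(func, *args):
    # Return the function's result together with the elapsed seconds.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    urls = ["http://www.ustchacker.com"] * 10
    _, t_seq = timed(run_sequential, urls)
    _, t_pool = timed(run_thread_pool, urls)
    print("sequential: %.3fs, thread pool: %.3fs" % (t_seq, t_pool))
```

Because the threads spend nearly all their time waiting, the pooled run should take roughly one delay rather than ten, which is the same pattern the plot shows for real requests.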
Below is the test code for gevent:
# Patch the standard library first, before importing modules that use
# sockets or ssl; gevent recommends patching as early as possible.
import gevent.monkey
gevent.monkey.patch_all()

import time
import urllib.request

import gevent
import gevent.pool
import matplotlib.pyplot as plt
import requests


def startTimer():
    return time.time()


def ticT(startTime):
    # Elapsed seconds since startTime, rounded to milliseconds.
    useTime = time.time() - startTime
    return round(useTime, 3)


def download_urllib(url):
    req = urllib.request.Request(url, headers={'user-agent': 'Mozilla/5.0'})
    res = urllib.request.urlopen(req)
    data = res.read()
    try:
        data = data.decode('gbk')
    except UnicodeDecodeError:
        data = data.decode('utf8', 'ignore')
    return res.status, data


def download_requests(url):
    req = requests.get(url, headers={'user-agent': 'Mozilla/5.0'})
    return req.status_code, req.text


urls = ['http://www.ustchacker.com'] * 10
urllibL = []
requestsL = []
reqPool = []
reqSpawn = []
N = 20
PoolNum = 100

for i in range(N):
    print('start %d try' % i)

    # 1. Sequential downloads with urllib.request
    urllibT = startTimer()
    jobs = [download_urllib(url) for url in urls]
    urllibL.append(ticT(urllibT))
    print('1')

    # 2. Sequential downloads with requests
    requestsT = startTimer()
    jobs = [download_requests(url) for url in urls]
    requestsL.append(ticT(requestsT))
    print('2')

    # 3. requests with a gevent pool
    requestsT = startTimer()
    pool = gevent.pool.Pool(PoolNum)
    data = pool.map(download_requests, urls)
    reqPool.append(ticT(requestsT))
    print('3')

    # 4. requests with one spawned greenlet per URL
    requestsT = startTimer()
    jobs = [gevent.spawn(download_requests, url) for url in urls]
    gevent.joinall(jobs)
    reqSpawn.append(ticT(requestsT))
    print('4')

x = list(range(1, N + 1))
plt.plot(x, urllibL, label='urllib')
plt.plot(x, requestsL, label='requests')
plt.plot(x, reqPool, label='requests geventPool')
plt.plot(x, reqSpawn, label='requests Spawn')
plt.xlabel('test number')
plt.ylabel('time(s)')
plt.legend()
plt.show()
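The results are as follows:
[Figure: time in seconds per test run for urllib, requests, requests geventPool, and requests Spawn]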
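The plot shows that for I/O-bound tasks gevent improves performance considerably. Because creating and scheduling coroutines costs far less than creating and scheduling threads, gevent's Spawn mode and Pool mode perform about the same.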
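Why a bounded pool and unbounded spawning end up so close can be sketched with the standard library. Note that this uses asyncio rather than gevent, so it is only an analogue of the test above: `run_unbounded` loosely plays the role of `gevent.spawn` plus `joinall`, `run_bounded` loosely plays the role of `gevent.pool.Pool`, and `fake_download` is a hypothetical stand-in that sleeps instead of making a request.

```python
import asyncio
import time


async def fake_download(url, delay=0.05):
    """Hypothetical stand-in for a network request: a non-blocking sleep."""
    await asyncio.sleep(delay)
    return 200, url


async def run_unbounded(urls):
    # One coroutine per URL, all started at once.
    return await asyncio.gather(*(fake_download(u) for u in urls))


async def run_bounded(urls, limit=100):
    # At most `limit` downloads in flight at any time.
    sem = asyncio.Semaphore(limit)

    async def guarded(u):
        async with sem:
            return await fake_download(u)

    return await asyncio.gather(*(guarded(u) for u in urls))


def timed(coro_func, urls):
    # Run the coroutine function to completion and measure elapsed seconds.
    start = time.perf_counter()
    result = asyncio.run(coro_func(urls))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    urls = ["http://www.ustchacker.com"] * 10
    _, t_spawn = timed(run_unbounded, urls)
    _, t_pool = timed(run_bounded, urls)
    print("unbounded: %.3fs, bounded: %.3fs" % (t_spawn, t_pool))
```

With 10 URLs and a limit of 100, the bound is never reached, so both variants run every download concurrently. The same holds in the gevent test, where PoolNum is 100 and there are only 10 URLs.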
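gevent relies on monkey patching to make the standard library's blocking calls cooperative. The patch is what makes gevent fast, but it also interferes with multiprocessing. To use both in the same program, the patch has to be restricted as follows: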
gevent.monkey.patch_all(thread=False, socket=False, select=False)
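However, restricting the patch this way keeps gevent from showing its full advantage, so the multiprocessing pool, the thread pool, and the gevent pool cannot be compared within a single program. Comparing the two plots still supports a conclusion: the thread pool and gevent perform best, followed by the process pool. As a side result, the requests library performs somewhat better than urllib.request :-)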
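One possible way around the single-program limitation is to run each benchmark in its own interpreter, so that a monkey patch applied in one run cannot affect another. The sketch below shows only the isolation mechanism, under stated assumptions: `run_isolated` and `SNIPPET` are hypothetical names, and the snippet sleeps as a placeholder where a real benchmark body would go.

```python
import json
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a fresh interpreter and parse the JSON it prints."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        check=True,  # raise if the child interpreter exits with an error
    )
    return json.loads(proc.stdout)


# Placeholder benchmark: a real one would patch (or not patch) and download here.
SNIPPET = """
import json, time
start = time.perf_counter()
time.sleep(0.05)
print(json.dumps({"elapsed": round(time.perf_counter() - start, 3)}))
"""

if __name__ == "__main__":
    print(run_isolated(SNIPPET))
```

Each child process starts with an unpatched standard library, so one snippet could call `gevent.monkey.patch_all()` while another uses multiprocessing, and the parent only collects the timings.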