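There are many examples of simulating a 163 login with the cookielib and urllib2 modules. I recently read the article "Simulating a 163 mailbox login in Python and fetching the address book" and, inspired by it, tried analyzing the inbox, outbox and other folders. The result lists every message together with its details: sender, recipient, subject, send time, and read/unread status.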
1. Reference code
#-*- coding:UTF-8 -*-
import urllib, urllib2, cookielib
import xml.etree.ElementTree as etree  # XML parser

class Login163:
    # Pose as a browser
    header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
    username = ''
    passwd = ''
    cookie = None  # cookie jar
    cookiefile = './cookies.dat'  # temporary cookie storage
    user = ''

    def __init__(self, username, passwd):
        self.username = username
        self.passwd = passwd
        # Cookie setup
        self.cookie = cookielib.LWPCookieJar()
        # Install an opener that keeps cookies in the jar
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(self.cookie))
        urllib2.install_opener(opener)

    # Log in
    def login(self):
        # Request parameters
        postdata = {
            'username': self.username,
            'password': self.passwd,
            'type': 1
        }
        postdata = urllib.urlencode(postdata)
        # Send the request
        req = urllib2.Request(
            url='http://reg.163.com/logins.jsp?type=1&product=mail163&url=http://entry.mail.163.com/coremail/fcg/ntesdoor2?lightweight%3D1%26verifycookie%3D1%26language%3D-1%26style%3D1',
            data=postdata,  # request body
            headers=self.header  # request headers
        )
        result = urllib2.urlopen(req).read()
        result = str(result)
        self.user = self.username.split('@')[0]
        self.cookie.save(self.cookiefile)  # save the cookies
        # This marker must match the success text returned by the server exactly
        if '登錄成功,正在跳轉...' in result:
            #print("%s logged in to 163 mail.---------\n" % (self.user))
            flag = True
        else:
            flag = '%s failed to log in to 163 mail.' % (self.user)
        return flag

    # Fetch the address book
    def address_list(self):
        # Obtain the authenticated sid
        auth = urllib2.Request(
            url='http://entry.mail.163.com/coremail/fcg/ntesdoor2?username='+self.user+'&lightweight=1&verifycookie=1&language=-1&style=1',
            headers=self.header
        )
        auth = urllib2.urlopen(auth).read()
        # enumerate() yields the index and the value together as tuples:
        # ((0, test[0]), (1, test[1]), ...), a bit like PHP's foreach
        for i, sid in enumerate(self.cookie):
            sid = str(sid)
            if 'sid' in sid:
                sid = sid.split()[1].split('=')[1]
                break
        self.cookie.save(self.cookiefile)
        # Request URL
        url = 'http://twebmail.mail.163.com/js4/s?sid='+sid+'&func=global:sequential&showAd=false&userType=browser&uid='+self.username
        # Parameters (the var field is required; without it the server only
        # returns something like <code>S_OK</code><messages/>)
        # These parameters were also read off in Firebug.
        postdata = {
            'func': 'global:sequential',
            'showAd': 'false',
            'sid': sid,
            'uid': self.username,
            'userType': 'browser',
            'var': '<?xml version="1.0"?><object><array name="items"><object><string name="func">pab:searchContacts</string><object name="var"><array name="order"><object><string name="field">FN</string><boolean name="desc">false</boolean><boolean name="ignoreCase">true</boolean></object></array></object></object><object><string name="func">pab:getAllGroups</string></object></array></object>'
        }
        postdata = urllib.urlencode(postdata)
        # Build the request
        req = urllib2.Request(
            url=url,
            data=postdata,
            headers=self.header
        )
        res = urllib2.urlopen(req).read()
        # Parse the XML and convert it to JSON-like data.
        # Note: 163 answers this request with XML; converting it makes
        # the returned data easier to use.
        json = []
        tree = etree.fromstring(res)
        obj = None
        for child in tree:
            if child.tag == 'array':
                obj = child
                break
        # See the etree Element attributes and methods: attrib, text, tag, getchildren(), etc.
        obj = obj[0].getchildren().pop()
        for child in obj:
            for x in child:
                attr = x.attrib
                # .get() avoids a KeyError on elements without a name attribute
                if attr.get('name') == 'EMAIL;PREF':
                    value = {'email': x.text}
                    json.append(value)
        return json

# Demo
print("Requesting......\n\n")
login = Login163('xxxx@163.com', 'xxxxx')
flag = login.login()
if type(flag) is bool:
    print("Successful landing,Resolved contacts......\n\n")
    res = login.address_list()
    for x in res:
        print(x['email'])
else:
    print(flag)
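The reference code finds the sid by turning each cookie into a string and splitting it. A minimal sketch of the same lookup that reads the cookie's name and value attributes directly is shown below. It is written for Python 3, where cookielib is called http.cookiejar, and the cookie names and the make_cookie helper are made up for the demonstration; the real name of the 163 session cookie is not stated in this article. Unlike the original, which matches 'sid' anywhere in the cookie's string form, this version only checks the cookie name.

```python
from http.cookiejar import Cookie, CookieJar


def find_sid(jar):
    """Return the value of the first cookie whose name contains 'sid', or None."""
    for cookie in jar:
        if 'sid' in cookie.name:
            return cookie.value
    return None


def make_cookie(name, value, domain='.mail.163.com'):
    """Build a minimal cookie for the demonstration (hypothetical values)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith('.'),
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Fill a jar by hand instead of performing a real login.
jar = CookieJar()
jar.set_cookie(make_cookie('NTES_SESS', 'abc'))           # hypothetical name
jar.set_cookie(make_cookie('Coremail.sid', 'SID_VALUE'))  # hypothetical name
print(find_sid(jar))
```

Returning None when no matching cookie exists makes a failed lookup explicit, whereas the loop in the reference code would leave sid holding the string form of the last cookie.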
2. Analyzing the inbox and outbox URLs
In the reference code, the address book is fetched from this URL:

    url = 'http://twebmail.mail.163.com/js4/s?sid='+sid+'&func=global:sequential&showAd=false&userType=browser&uid='+self.username

Analyzing the mailbox addresses shows that the inbox, outbox and other folders use the following URL, in which func=mbox:listMessages:

    url = 'http://twebmail.mail.163.com/js4/s?sid='+sid+'&func=mbox:listMessages&showAd=false&userType=browser&uid='+self.username

The inbox and outbox are told apart by the postdata, as follows:
(1) Inbox
postdata = {
    'func': 'global:sequential',
    'showAd': 'false',
    'sid': sid,  # session id taken from the cookie, as in the reference code
    'uid': self.username,
    'userType': 'browser',
    'var': '<?xml version="1.0"?><object><int name="fid">1</int><string name="order">date</string><boolean name="desc">true</boolean><boolean name="topFirst">false</boolean><int name="start">0</int><int name="limit">20</int></object>'
}
(2) Outbox
postdata = {
    'func': 'global:sequential',
    'showAd': 'false',
    'sid': sid,  # session id taken from the cookie, as in the reference code
    'uid': self.username,
    'userType': 'browser',
    'var': '<?xml version="1.0"?><object><int name="fid">3</int><string name="order">date</string><boolean name="desc">true</boolean><boolean name="topFirst">false</boolean><int name="start">0</int><int name="limit">20</int></object>'
}
As can be seen, the two snippets differ only in the value of fid: 1 for the inbox, 3 for the outbox, and 2 for the drafts folder.
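Since only fid changes between folders, the payload can be generated rather than copied. The following is a rough sketch for Python 3 (urllib.parse.urlencode replaces the Python 2 urllib.urlencode used in this article). The helper names build_list_var and build_postdata, the FOLDER_IDS table, and the placeholder sid are illustrative, and treating start and limit as a paging offset and page size is an assumption based on their names.

```python
from urllib.parse import urlencode

# Folder ids as reported in the text above.
FOLDER_IDS = {'inbox': 1, 'drafts': 2, 'outbox': 3}


def build_list_var(fid, start=0, limit=20):
    """Return the XML 'var' payload that lists messages in folder fid."""
    return (
        '<?xml version="1.0"?><object>'
        '<int name="fid">%d</int>'
        '<string name="order">date</string>'
        '<boolean name="desc">true</boolean>'
        '<boolean name="topFirst">false</boolean>'
        '<int name="start">%d</int>'
        '<int name="limit">%d</int>'
        '</object>' % (fid, start, limit)
    )


def build_postdata(folder, sid, uid):
    """URL-encode the form fields used in this article for a folder listing."""
    fields = {
        'func': 'global:sequential',
        'showAd': 'false',
        'sid': sid,
        'uid': uid,
        'userType': 'browser',
        'var': build_list_var(FOLDER_IDS[folder]),
    }
    return urlencode(fields)


# The sid here is a placeholder; a real one comes from the login cookies.
body = build_postdata('outbox', sid='SID_PLACEHOLDER', uid='xxxx@163.com')
print(body)
```

Switching from the outbox to the drafts folder then only requires passing 'drafts' instead of 'outbox'.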
3. XML parsing
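The ElementTree class is used to convert XML into Python dictionaries. The address-book example relies mainly on this approach. The example in this article (full code further down) does not use it when retrieving the message list and sticks to plain string processing instead, but it is still worth showing how ElementTree handles the XML. Take a response such as the following (reference: http://hi.baidu.com/fc_lamp/blog/item/8ed2d53ada4586f714cecb3d.html):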
<result>
  <code>S_OK</code>
  <array name="var">
    <object>
      <string name="code">S_OK</string>
      <array name="var">
        <object></object>
        <object></object>
        <object></object>
        <object></object>
        <object></object>
        <object></object>
        <object></object>
        <object></object>
        <object></object>
        <object></object>
        <object></object>
        <object></object>
        <object></object>
        <object></object>
        <object></object>
        <object></object>
      </array>
    </object>
    <object></object>
  </array>
</result>
Solution:
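#-*- coding:UTF-8 -*-
import xml.etree.ElementTree as etree  # XML parser

def xml2json(xml):
    json = []
    tree = etree.fromstring(xml)  # for a file, use parse(source) instead
    obj = None
    # Find the first top-level <array> element
    for child in tree:
        if child.tag == 'array':
            obj = child
            break
    # See the etree Element attributes and methods: attrib, text, tag, getchildren(), etc.
    # Take the last child of the first <object>: the inner <array> of contacts
    obj = obj[0].getchildren().pop()
    for child in obj:
        for x in child:
            attr = x.attrib
            # .get() avoids a KeyError on elements without a name attribute
            if attr.get('name') == 'EMAIL;PREF':
                value = {'email': x.text}
                json.append(value)
    return json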
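The same ElementTree approach could in principle be applied to the message list instead of string processing. The sketch below (Python 3) shows one way this might look. The SAMPLE response is hypothetical: it is shaped only after the field names that appear in this article (id, from, to, subject, sentDate, size, read), and the real 163 response may nest or name things differently. The function name parse_messages is likewise illustrative.

```python
import xml.etree.ElementTree as etree

# Hypothetical response, shaped after the field names used in this article.
SAMPLE = '''<?xml version="1.0"?>
<result>
  <code>S_OK</code>
  <array name="var">
    <object>
      <string name="id">42:abc</string>
      <string name="from">alice@example.com</string>
      <string name="to">xxxx@163.com</string>
      <string name="subject">Hello</string>
      <date name="sentDate">2012-05-21 10:00:00</date>
      <int name="size">2048</int>
      <object name="flags"><boolean name="read">true</boolean></object>
    </object>
  </array>
</result>'''

WANTED = ('id', 'from', 'to', 'subject', 'sentDate', 'size')


def parse_messages(xml_text):
    """Return one dict per message object found under the top-level arrays."""
    root = etree.fromstring(xml_text)
    messages = []
    for array in root.findall('array'):
        for obj in array.findall('object'):
            msg = {}
            # iter() walks the object and all nested elements
            for elem in obj.iter():
                name = elem.get('name')
                if name in WANTED:
                    msg[name] = elem.text
                elif name == 'read':
                    msg['read'] = (elem.text == 'true')
            if 'id' in msg:
                # As in the article, a message without read=true counts as unread
                msg.setdefault('read', False)
                messages.append(msg)
    return messages


for message in parse_messages(SAMPLE):
    print(message)
```

Looking fields up by their name attribute avoids counting character offsets by hand, at the cost of depending on the response being well-formed XML.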
4. Inbox message list
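This example lists only the inbox. If needed, adjust the fid value as described above to list the outbox, the drafts folder, and so on. The program was tested on Windows XP with Python 2.6. After it runs, three files are created in the current directory: inboxlistfile.txt holds the inbox message list, addfile.txt holds the address book, and cookies.dat holds the cookies. The full code follows: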
#-*- coding:UTF-8 -*-
import urllib, urllib2, cookielib
import xml.etree.ElementTree as etree  # XML parser

class Login163:
    # Pose as a browser
    header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
    username = ''
    passwd = ''
    cookie = None  # cookie jar
    cookiefile = './cookies.dat'  # temporary cookie storage
    user = ''

    def __init__(self, username, passwd):
        self.username = username
        self.passwd = passwd
        # Cookie setup
        self.cookie = cookielib.LWPCookieJar()
        # Install an opener that keeps cookies in the jar
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(self.cookie))
        urllib2.install_opener(opener)

    # Log in
    def login(self):
        # Request parameters
        postdata = {
            'username': self.username,
            'password': self.passwd,
            'type': 1
        }
        postdata = urllib.urlencode(postdata)
        # Send the request
        req = urllib2.Request(
            url='http://reg.163.com/logins.jsp?type=1&product=mail163&url=http://entry.mail.163.com/coremail/fcg/ntesdoor2?lightweight%3D1%26verifycookie%3D1%26language%3D-1%26style%3D1',
            data=postdata,  # request body
            headers=self.header  # request headers
        )
        result = urllib2.urlopen(req).read()
        result = str(result)
        #print result
        self.user = self.username.split('@')[0]
        self.cookie.save(self.cookiefile)  # save the cookies
        # This marker must match the success text returned by the server exactly
        if '登錄成功,正在跳轉...' in result:
            #print("%s logged in to 163 mail.---------\n" % (self.user))
            flag = True
        else:
            flag = '%s failed to log in to 163 mail.' % (self.user)
        return flag

    # Fetch the address book
    def address_list(self):
        # Obtain the authenticated sid
        auth = urllib2.Request(
            url='http://entry.mail.163.com/coremail/fcg/ntesdoor2?username='+self.user+'&lightweight=1&verifycookie=1&language=-1&style=1',
            headers=self.header
        )
        auth = urllib2.urlopen(auth).read()
        #authstr=str(auth)
        #print authstr
        for i, sid in enumerate(self.cookie):
            sid = str(sid)
            #print 'sid:%s' %sid
            if 'sid' in sid:
                sid = sid.split()[1].split('=')[1]
                break
        self.cookie.save(self.cookiefile)
        # Request URL
        url = 'http://twebmail.mail.163.com/js4/s?sid='+sid+'&func=global:sequential&showAd=false&userType=browser&uid='+self.username
        # Parameters (the var field is required; without it the server only
        # returns something like <code>S_OK</code><messages/>)
        # These parameters were also read off in Firebug.
        postdata = {
            'func': 'global:sequential',
            'showAd': 'false',
            'sid': sid,  # use the sid extracted above, not a captured one
            'uid': self.username,
            'userType': 'browser',
            'var': '<?xml version="1.0"?><object><array name="items"><object><string name="func">pab:searchContacts</string><object name="var"><array name="order"><object><string name="field">FN</string><boolean name="desc">false</boolean><boolean name="ignoreCase">true</boolean></object></array></object></object><object><string name="func">pab:getAllGroups</string></object></array></object>'
        }
        postdata = urllib.urlencode(postdata)
        # Build the request
        req = urllib2.Request(
            url=url,
            data=postdata,
            headers=self.header
        )
        res = urllib2.urlopen(req).read()
        #print str(res)
        # Parse the XML and convert it to JSON-like data.
        # Note: 163 answers this request with XML; converting it makes
        # the returned data easier to use.
        json = []
        tree = etree.fromstring(res)
        obj = None
        for child in tree:
            if child.tag == 'array':
                obj = child
                break
        # See the etree Element attributes and methods: attrib, text, tag, getchildren(), etc.
        obj = obj[0].getchildren().pop()
        for child in obj:
            for x in child:
                attr = x.attrib
                # .get() avoids a KeyError on elements without a name attribute
                if attr.get('name') == 'EMAIL;PREF':
                    value = {'email': x.text}
                    json.append(value)
        return json

    # Fetch the inbox
    def minbox(self):
        # Obtain the authenticated sid
        auth = urllib2.Request(
            url='http://entry.mail.163.com/coremail/fcg/ntesdoor2?username='+self.user+'&lightweight=1&verifycookie=1&language=-1&style=1',
            headers=self.header
        )
        auth = urllib2.urlopen(auth).read()
        #authstr=str(auth)
        #print authstr
        for i, sid in enumerate(self.cookie):
            sid = str(sid)
            #print 'sid:%s' %sid
            if 'sid' in sid:
                sid = sid.split()[1].split('=')[1]
                break
        self.cookie.save(self.cookiefile)
        url = 'http://twebmail.mail.163.com/js4/s?sid='+sid+'&func=mbox:listMessages&showAd=false&userType=browser&uid='+self.username
        postdata = {
            'func': 'global:sequential',
            'showAd': 'false',
            'sid': sid,  # use the sid extracted above, not a captured one
            'uid': self.username,
            'userType': 'browser',
            'var': '<?xml version="1.0"?><object><int name="fid">1</int><string name="order">date</string><boolean name="desc">true</boolean><boolean name="topFirst">false</boolean><int name="start">0</int><int name="limit">20</int></object>'
        }
        postdata = urllib.urlencode(postdata)
        # Build the request
        req = urllib2.Request(
            url=url,
            data=postdata,
            headers=self.header
        )
        res = urllib2.urlopen(req).read()
        liststr = str(res).split('<object>')  # split the response on <object>
        inboxlistcount = len(liststr) - 1  # number of messages
        inboxlistfile = open('inboxlistfile.txt', 'w')
        t = 0  # index of the current message
        for i in liststr:
            # The first chunk holds the XML declaration: write the header line
            if 'xml' in i and ' version=' in i:
                inboxlistfile.write('inbox: ' + str(inboxlistcount) + ' messages in total')
                inboxlistfile.write('\n')
            if 'name="id"' in i:
                t = t + 1
                inboxlistfile.write('Message ' + str(t) + ':')
                inboxlistfile.write('\n')
                # Write the sender; each offset is the length of name="..." plus the closing >
                beginnum = i.find('name="from"')
                endnum = i.find('</string>', beginnum)
                inboxlistfile.write('From:' + i[beginnum+12:endnum])
                inboxlistfile.write('\n')
                # Write the recipient
                beginnum = i.find('name="to"')
                endnum = i.find('</string>', beginnum)
                inboxlistfile.write('TO:' + i[beginnum+10:endnum])
                inboxlistfile.write('\n')
                # Write the subject
                beginnum = i.find('name="subject"')
                endnum = i.find('</string>', beginnum)
                inboxlistfile.write('Subject:' + i[beginnum+15:endnum])
                inboxlistfile.write('\n')
                # Write the date
                beginnum = i.find('name="sentDate"')
                endnum = i.find('</date>', beginnum)
                inboxlistfile.write('Date:' + i[beginnum+16:endnum])
                inboxlistfile.write('\n')
                # Write the read/unread status
                if 'name="read">true' in i:
                    inboxlistfile.write('Status: read')
                    inboxlistfile.write('\n')
                else:
                    inboxlistfile.write('Status: unread')
                    inboxlistfile.write('\n')
                # Write the message size
                beginnum = i.find('name="size"')
                endnum = i.find('</int>', beginnum)
                inboxlistfile.write('Size:' + i[beginnum+12:endnum])
                inboxlistfile.write('\n')
                # Write the message id, which is needed to download the message
                beginnum = i.find('name="id"')
                endnum = i.find('</string>', beginnum)
                inboxlistfile.write('Message id:' + i[beginnum+10:endnum])
                inboxlistfile.write('\n\n')
        inboxlistfile.close()

if __name__ == '__main__':
    print("Edit @xiaowuyi V1.0 http://www.cnblogs.com/xiaowuyi")
    login = Login163('XXXX@163.com', 'AAAAA')
    flag = login.login()
    if type(flag) is bool:
        #login.letterdown()
        print("Logged in; downloading the message list and address book...")
        login.minbox()
        res = login.address_list()
        addfile = open('addfile.txt', 'w')
        for x in res:
            addfile.write(x['email'] + '\n')  # one address per line
        addfile.close()
        print("Done")
    else:
        print(flag)
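The numeric offsets used when slicing each field (12, 10, 15, 16) are simply the length of the name="..." marker plus the closing >. A small helper can compute that offset instead of hard-coding it, and can report a missing field rather than slicing from a failed find(). The sketch below is an illustration only: extract_field is a hypothetical name and the sample chunk is made up.

```python
def extract_field(chunk, name, close_tag):
    """Return the text between name="<name>"> and close_tag, or None if absent."""
    marker = 'name="%s">' % name
    begin = chunk.find(marker)
    if begin == -1:
        return None  # field not present in this chunk
    begin += len(marker)  # e.g. 12 for name="from">
    end = chunk.find(close_tag, begin)
    if end == -1:
        return None  # closing tag missing
    return chunk[begin:end]


# Made-up chunk, shaped like one piece of the response after splitting on <object>.
chunk = ('<string name="id">42:abc</string>'
         '<string name="from">alice@example.com</string>'
         '<int name="size">2048</int>')

print(extract_field(chunk, 'from', '</string>'))
print(extract_field(chunk, 'size', '</int>'))
print(extract_field(chunk, 'subject', '</string>'))
```

With the made-up chunk, the first two calls return the sender and the size, and the last one returns None because the chunk has no subject field.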
Original article: http://www.cnblogs.com/xiaowuyi/archive/2012/05/21/2511428.html