Python Web Scraping: Basics of the bs4 Data-Parsing Module

Published: 2020-08-02 16:58:49 | Source: Web | Views: 447 | Author: insist_way | Category: Programming Languages

Introduction:

I have recently been learning web scraping with Python, and these are my study notes on the bs4 data-parsing module.

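Purpose:

bs4 (Beautiful Soup 4) parses HTML and XML documents. HTML is not a kind of XML; the two are related but distinct markup languages, and bs4 handles each one through a suitable parser.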

Official bs4 documentation:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Study notes:

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

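# Create a BeautifulSoup object and name the parser explicitly.
# If the parser is omitted, bs4 picks the best one installed and emits a warning,
# so the result can differ from one environment to another.
soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.prettify())    # Pretty-print the parsed document

print(soup.get_text())    # Print only the text of the document, with all tags removed (newlines from the source are kept)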
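As a small aside on get_text(), here is a minimal sketch of its `separator` and `strip` arguments, which help when the raw output runs text nodes together. The `snippet` string and the variable names are made up for this illustration and are not part of the notes above.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet used only for this illustration.
snippet = "<p>Once upon a time <b>three</b> sisters</p><p>lived in a well.</p>"
soup_demo = BeautifulSoup(snippet, "html.parser")

# Default call: text nodes are concatenated exactly as they appear,
# so the two paragraphs run together.
raw_text = soup_demo.get_text()

# separator is placed between text nodes; strip=True trims each node
# and skips nodes that contain only whitespace.
clean_text = soup_demo.get_text(separator=" ", strip=True)

print(repr(raw_text))
print(repr(clean_text))
```

Passing a separator is usually enough when the goal is readable plain text rather than an exact copy of the source layout.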

print('')

print(type(soup.title))

print(dir(soup.title))

print(soup.title)    # Get the <title> tag

# Output: <title>The Dormouse's story</title>

print(soup.title.text)    # Get the text inside the <title> tag

# Output: The Dormouse's story

print(soup.a)    # Get the first <a> tag

# Output: <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

print(soup.a.attrs)    # Get all attributes of the first <a> tag as a dictionary

# Output: {'href': 'http://example.com/elsie', 'class': ['sister'], 'id': 'link1'}

print(soup.a.attrs['href'])    # Get the href attribute of the first <a> tag

# Output: http://example.com/elsie

print(soup.a.has_attr('class'))    # Check whether the class attribute exists

# Output: True
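The attribute lookups above can also be written without going through `.attrs`. The following is a minimal sketch, assuming a one-tag document invented for the purpose (the second class value `big` is hypothetical), that contrasts dict-style access, `.get()`, and `has_attr()`.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document; the extra class value "big" is made up
# to show that class is treated as a multi-valued attribute.
link = BeautifulSoup(
    '<a href="http://example.com/elsie" class="sister big" id="link1">Elsie</a>',
    "html.parser",
).a

href = link["href"]          # dict-style access; raises KeyError if the attribute is missing
missing = link.get("title")  # .get() returns None when the attribute is absent
classes = link["class"]      # class is multi-valued, so bs4 returns a list of strings
has_id = link.has_attr("id")

print(href, missing, classes, has_id)
```

Using `.get()` is the safer choice when scraping pages where some tags may lack the attribute.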

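print(soup.p)    # Get the first <p> tag

# Output: <p class="title"><b>The Dormouse's story</b></p>

print(soup.p.children)    # Get an iterator over the direct children of the first <p> tag

# Output: <list_iterator object at 0x7fe8185261d0>    (the address varies from run to run)

print(list(soup.p.children))

# Output: [<b>The Dormouse's story</b>]

print(list(soup.p.children)[0])

# Output: <b>The Dormouse's story</b>

print(list(soup.p.children)[0].text)

# Output: The Dormouse's story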
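To make the meaning of "children" concrete, here is a minimal sketch comparing `.children` (direct children only) with `.descendants` (every nested node). It rebuilds just the title paragraph from the example; the variable names are chosen for this illustration.

```python
from bs4 import BeautifulSoup

# Only the title paragraph, rebuilt here so the block is self-contained.
snippet = "<p class='title'><b>The Dormouse's story</b></p>"
p = BeautifulSoup(snippet, "html.parser").p

direct = list(p.children)     # direct children only: the <b> tag
nested = list(p.descendants)  # the <b> tag, then the string inside it

print(len(direct), direct[0].name)
print(len(nested), nested[1])

# When a tag contains a single chain of nested tags ending in one string,
# .string reaches that string directly.
title_text = p.string
print(title_text)
```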

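print(soup.find_all('a'))    # Get all <a> tags

# Output: [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]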
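In a scraper, the list returned by find_all() is usually filtered and looped over rather than printed. Below is a minimal sketch of that pattern, assuming a cut-down copy of the three-sisters markup; the href values for the second and third links are assumed here, and the variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Cut-down copy of the example markup; the lacie and tillie URLs are assumptions.
story_html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
</p>
"""
story_soup = BeautifulSoup(story_html, "html.parser")

# Filter by CSS class (class_ has a trailing underscore because
# "class" is a reserved word in Python) and collect (text, href) pairs.
links = [(a.get_text(), a["href"]) for a in story_soup.find_all("a", class_="sister")]

# find() returns the first match only, or None if nothing matches.
third = story_soup.find(id="link3")

# limit stops the search after the given number of matches.
first_two = story_soup.find_all("a", limit=2)

for text, href in links:
    print(text, href)
print(third.get_text(), len(first_two))
```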
