

LogParser: a Python library for periodically and incrementally parsing Scrapy crawler logs

Published: 2020-05-29 16:56:52   Source: Web   Author: my8100   Category: Programming languages

Open source on GitHub

my8100 / logparser

Installation

  • Via pip:

    pip install logparser
  • Via git:
    git clone https://github.com/my8100/logparser.git
    cd logparser
    python setup.py install

Usage

Running as a service

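  1. First make sure Scrapyd is installed and running on the current host
  2. Start LogParser with the command logparser
  3. Visit http://127.0.0.1:6800/logs/stats.json (assuming Scrapyd runs on port 6800)
  4. Visit http://127.0.0.1:6800/logs/projectname/spidername/jobid.json to get the log analysis details of a specific crawl job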
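Both URLs in the steps above follow a fixed pattern, so they are easy to build and query from a script. The sketch below uses only the Python standard library. The helper names (stats_url, job_stats_url, fetch_json) are hypothetical, the default base address assumes Scrapyd on 127.0.0.1:6800 as in step 3, and URL-encoding each path segment is a defensive choice of this sketch rather than documented LogParser behavior.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: Scrapyd (which serves LogParser's JSON output) listens here.
SCRAPYD_BASE = "http://127.0.0.1:6800"


def stats_url(base=SCRAPYD_BASE):
    """URL of the aggregated stats file from step 3."""
    return f"{base.rstrip('/')}/logs/stats.json"


def job_stats_url(project, spider, job, base=SCRAPYD_BASE):
    """URL of one job's analysis from step 4: /logs/<project>/<spider>/<job>.json."""
    # Hypothetical helper: percent-encode each segment so odd names stay valid.
    p, s, j = (quote(part, safe="") for part in (project, spider, job))
    return f"{base.rstrip('/')}/logs/{p}/{s}/{j}.json"


def fetch_json(url, timeout=10):
    """Download a URL and decode its body as JSON (raises on HTTP/network errors)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(stats_url())
    print(job_stats_url("projectname", "spidername", "jobid"))
    # Only works while Scrapyd and LogParser are actually running:
    # print(fetch_json(stats_url()))
```

Keeping URL construction separate from fetching means the paths can be checked without a running Scrapyd instance; fetch_json is only needed once the service is up.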

Visualizing crawl progress with ScrapydWeb

See my8100 / scrapydweb for details.

Using it in Python code

In [1]: from logparser import parse

In [2]: log = """2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)
   ...: 2018-10-23 18:29:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
   ...: {'downloader/exception_count': 3,
   ...:  'downloader/exception_type_count/twisted.internet.error.TCPTimedOutError': 3,
   ...:  'downloader/request_bytes': 1336,
   ...:  'downloader/request_count': 7,
   ...:  'downloader/request_method_count/GET': 7,
   ...:  'downloader/response_bytes': 1669,
   ...:  'downloader/response_count': 4,
   ...:  'downloader/response_status_count/200': 2,
   ...:  'downloader/response_status_count/302': 1,
   ...:  'downloader/response_status_count/404': 1,
   ...:  'dupefilter/filtered': 1,
   ...:  'finish_reason': 'finished',
   ...:  'finish_time': datetime.datetime(2018, 10, 23, 10, 29, 41, 174719),
   ...:  'httperror/response_ignored_count': 1,
   ...:  'httperror/response_ignored_status_count/404': 1,
   ...:  'item_scraped_count': 2,
   ...:  'log_count/CRITICAL': 5,
   ...:  'log_count/DEBUG': 14,
   ...:  'log_count/ERROR': 5,
   ...:  'log_count/INFO': 75,
   ...:  'log_count/WARNING': 3,
   ...:  'offsite/domains': 1,
   ...:  'offsite/filtered': 1,
   ...:  'request_depth_max': 1,
   ...:  'response_received_count': 3,
   ...:  'retry/count': 2,
   ...:  'retry/max_reached': 1,
   ...:  'retry/reason_count/twisted.internet.error.TCPTimedOutError': 2,
   ...:  'scheduler/dequeued': 7,
   ...:  'scheduler/dequeued/memory': 7,
   ...:  'scheduler/enqueued': 7,
   ...:  'scheduler/enqueued/memory': 7,
   ...:  'start_time': datetime.datetime(2018, 10, 23, 10, 28, 35, 70938)}
   ...: 2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)"""

In [3]: d = parse(log, headlines=1, taillines=1)

In [4]: d
Out[4]:
OrderedDict([('head',
              '2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)'),
             ('tail',
              '2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)'),
             ('first_log_time', '2018-10-23 18:28:34'),
             ('latest_log_time', '2018-10-23 18:29:42'),
             ('elapsed', '0:01:08'),
             ('first_log_timestamp', 1540290514),
             ('latest_log_timestamp', 1540290582),
             ('datas', []),
             ('pages', 3),
             ('items', 2),
             ('latest_matches',
              {'resuming_crawl': '',
               'latest_offsite': '',
               'latest_duplicate': '',
               'latest_crawl': '',
               'latest_scrape': '',
               'latest_item': '',
               'latest_stat': ''}),
             ('latest_crawl_timestamp', 0),
             ('latest_scrape_timestamp', 0),
             ('log_categories',
              {'critical_logs': {'count': 5, 'details': []},
               'error_logs': {'count': 5, 'details': []},
               'warning_logs': {'count': 3, 'details': []},
               'redirect_logs': {'count': 1, 'details': []},
               'retry_logs': {'count': 2, 'details': []},
               'ignore_logs': {'count': 1, 'details': []}}),
             ('shutdown_reason', 'N/A'),
             ('finish_reason', 'finished'),
             ('last_update_timestamp', 1547559048),
             ('last_update_time', '2019-01-15 21:30:48')])

In [5]: d['elapsed']
Out[5]: '0:01:08'

In [6]: d['pages']
Out[6]: 3

In [7]: d['items']
Out[7]: 2

In [8]: d['finish_reason']
Out[8]: 'finished'
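The article does not show how parse computes these fields, but several of them can be related directly to the raw log: elapsed is the latest timestamp minus the first (18:29:42 minus 18:28:34 gives 0:01:08), and items and pages match item_scraped_count and response_received_count in the stats dump. The following is a hypothetical, standard-library-only sketch of that relationship, not LogParser's implementation; the function names time_summary and int_stats and both regular expressions are assumptions for illustration.

```python
import re
from datetime import datetime

# Assumption: each Scrapy log line starts with "YYYY-MM-DD HH:MM:SS".
LOG_TIME = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", re.MULTILINE)
# Integer entries of the "Dumping Scrapy stats" dict, e.g. "'item_scraped_count': 2,".
INT_STAT = re.compile(r"'([^']+)': (\d+)[,}]")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def time_summary(log):
    """Return first/latest log time and the elapsed time between them, or None."""
    stamps = LOG_TIME.findall(log)
    if not stamps:
        return None
    first = datetime.strptime(stamps[0], TIME_FORMAT)
    latest = datetime.strptime(stamps[-1], TIME_FORMAT)
    return {
        "first_log_time": stamps[0],
        "latest_log_time": stamps[-1],
        # str(timedelta) renders as H:MM:SS, e.g. "0:01:08".
        "elapsed": str(latest - first),
    }


def int_stats(log):
    """Collect every integer-valued entry of the stats dump into a dict."""
    # Entries such as 'finish_reason': 'finished' or datetime values are skipped.
    return {key: int(value) for key, value in INT_STAT.findall(log)}
```

Applied to the same log text as the session (without the ...: prompts), time_summary(log)["elapsed"] gives '0:01:08', and int_stats(log) contains 'item_scraped_count': 2 and 'response_received_count': 3, consistent with the items and pages values reported by parse.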