中文字幕av专区_日韩电影在线播放_精品国产精品久久一区免费式_av在线免费观看网站

溫馨提示×

溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊×
其他方式登錄
點擊 登錄注冊 即表示同意《億速云用戶服務條款》

Python基于pandas實現json格式轉換成dataframe的方法

發布時間:2020-09-13 21:54:08 來源:腳本之家 閱讀:532 作者:zn505119020 欄目:開發技術

本文實例講述了Python基于pandas實現json格式轉換成dataframe的方法。分享給大家供大家參考,具體如下:

# -*- coding:utf-8 -*-
#!python3
import re
import json
from bs4 import BeautifulSoup
import pandas as pd
import requests
import os
from pandas.io.json import json_normalize
class image_structs():
  def __init__(self):
    self.picture_url = {
      "image_id": '',
      "picture_url": ''
    }
class data_structs():
  def __init__(self):
    # columns=['title', 'item_url', 'id','picture_url','std_desc','description','information','fitment'])
    self.info={
      "title":'',
      "item_url":'',
      "id":0,
      "picture_url":[],
      "std_desc":'',
      "description":'',
      "information":'',
      "fitment":''
    }
# "https://waldoch.com/store/catalogsearch/result/index/?cat=0&limit=200&p=1&q=nerf+bar"
# https://waldoch.com/store/new-oem-ford-f-150-f150-5-running-boards-nerf-bar-crew-cab-2015-w-brackets-fl34-16451-ge5fm6.html
def get_item_list(outfile):
  result = []
  for i in range(6):
    print(i)
    i = str(i+1)
    url = "https://waldoch.com/store/catalogsearch/result/index/?cat=0&limit=200&p="+i+"&q=nerf+bar"
    web = requests.get(url)
    soup = BeautifulSoup(web.text,"html.parser")
    alink = soup.find_all("a",class_="product-image")
    for a in alink:
      title = a["title"]
      item_url = a["href"]
      result.append([title,item_url])
  df = pd.DataFrame(result,columns=["title","item_url"])
  df = df.drop_duplicates()
  df["id"] =df.index
  df.to_excel(outfile,index=False)
def get_item_info(file,outfile):
  DEFAULT_FALSE = ""
  df = pd.read_excel(file)
  for i in df.index:
    id = df.loc[i,"id"]
    if os.path.exists(str(int(id))+".xlsx"):
      continue
    item_url = df.loc[i,"item_url"]
    url = item_url
    web = requests.get(url)
    soup = BeautifulSoup(web.text, "html.parser")
    # 圖片
    imglink = soup.find_all("img", class_=re.compile("^gallery-image"))
    data = data_structs()
    data.info["title"] = df.loc[i,"title"]
    data.info["id"] = id
    data.info["item_url"] = item_url
    for a in imglink:
      image = image_structs()
      image.picture_url["image_id"] = a["id"]
      image.picture_url["picture_url"]=a["src"]
      print(image.picture_url)
      data.info["picture_url"].append(image.picture_url)
    print(data.info)
    # std_desc
    std_desc = soup.find("div", itemprop="description")
    try:
      strings_desc = []
      for ii in std_desc.stripped_strings:
        strings_desc.append(ii)
      strings_desc = "\n".join(strings_desc)
    except:
      strings_desc=DEFAULT_FALSE
    # description
    try:
      desc = soup.find('h3', text="Description")
      desc = desc.find_next()
    except:
      desc=DEFAULT_FALSE
    description=desc
    # information
    try:
      information = soup.find("h3", text='Information')
      desc = information
      desc = desc.find_next()
    except:
      desc=DEFAULT_FALSE
    information = desc
    # fitment
    try:
      fitment = soup.find('h3', text='Fitment')
      desc = fitment
      desc = desc.find_next()
    except:
      desc=DEFAULT_FALSE
    fitment=desc
    data.info["std_desc"] = strings_desc
    data.info["description"] = str(description)
    data.info["information"] = str(information)
    data.info["fitment"] = str(fitment)
    print(data.info.keys())
    singledf = json_normalize(data.info,"picture_url",['title', 'item_url', 'id', 'std_desc', 'description', 'information', 'fitment'])
    singledf.to_excel("test.xlsx",index=False)
    exit()
    # print(df.ix[i])
  df.to_excel(outfile,index=False)
# get_item_list("item_urls.xlsx")
get_item_info("item_urls.xlsx","item_urls_info.xlsx")

這里涉及到的幾個Python模塊都可以使用pip install命令進行安裝,如:

pip install BeautifulSoup4

pip install xlrd

pip install openpyxl

PS:這里再為大家推薦幾款比較實用的json在線工具供大家參考使用:

在線JSON代碼檢驗、檢驗、美化、格式化工具:
http://tools.jb51.net/code/json

JSON在線格式化工具:
http://tools.jb51.net/code/jsonformat

在線XML/JSON互相轉換工具:
http://tools.jb51.net/code/xmljson

json代碼在線格式化/美化/壓縮/編輯/轉換工具:
http://tools.jb51.net/code/jsoncodeformat

在線json壓縮/轉義工具:
http://tools.jb51.net/code/json_yasuo_trans

更多Python相關內容感興趣的讀者可查看本站專題:《Python操作json技巧總結》、《Python編碼操作技巧總結》、《Python數據結構與算法教程》、《Python函數使用技巧總結》、《Python字符串操作技巧匯總》、《Python入門與進階經典教程》及《Python文件與目錄操作技巧匯總》

希望本文所述對大家Python程序設計有所幫助。

向AI問一下細節

免責聲明:本站發布的內容(圖片、視頻和文字)以原創、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯系站長郵箱:is@yisu.com進行舉報,并提供相關證據,一經查實,將立刻刪除涉嫌侵權內容。

AI

沭阳县| 漳浦县| 延川县| 洛川县| 浮山县| 临夏县| 鹿邑县| 平邑县| 宝应县| 弥渡县| 淮北市| 西乡县| 绥中县| 营山县| 军事| 南城县| 澄迈县| 神农架林区| 长垣县| 沧州市| 朝阳县| 合肥市| 利津县| 黔江区| 奇台县| 曲靖市| 宕昌县| 广西| 阜平县| 平谷区| 吐鲁番市| 辽阳市| 广南县| 望江县| 木里| 周口市| 肇东市| 洪雅县| 常德市| 霍州市| 简阳市|