
Collecting merchant and product data with Python~ so you never get lost ordering takeout

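Preface

Hi everyone, Qianqian here~

After a busy week of work that leaves you worn out in body and mind, the weekend arrives, you sleep in, can't be bothered to cook, and order takeout.

The site we're collecting from today is a well-known online food-ordering platform in China~

I'm collecting merchant and product data here, but you can collect other data too~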


鍑嗗宸ヤ綔

涓嬮潰鐨勫敖閲忚窡鎴戜繚鎸佷竴鑷村摝~涓嶇劧鏈夊彲鑳戒細鍙戠敓鎶ラ敊 馃挄

鐜浣跨敤:

  • Python 3.8

  • Pycharm

妯″潡浣跨敤:

  • requests >>> pip install requests

  • csv

濡傛灉瀹夎python绗笁鏂规ā鍧?

  1. win + R 杈撳叆 cmd 鐐瑰嚮纭畾, 杈撳叆瀹夎鍛戒护 pip install 妯″潡鍚?(pip install requests) 鍥炶溅

  2. 鍦╬ycharm涓偣鍑籘erminal(缁堢) 杈撳叆瀹夎鍛戒护
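To confirm that the install worked, a quick check like the one below can help. It is only a minimal sketch using the standard library; the helper name `is_installed` is my own and is not part of the original script.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can find the module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # requests is third-party; csv ships with Python itself
    for name in ("requests", "csv"):
        if is_installed(name):
            print(f"{name}: available")
        else:
            print(f"{name}: missing, try: pip install {name}")
```

If requests shows up as missing inside PyCharm but works in cmd, the project is probably using a different interpreter than the one pip installed into.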

How to configure the Python interpreter in PyCharm?

  1. Choose File >>> Settings >>> Project >>> Python Interpreter

  2. Click the gear icon and choose Add

  3. Add the path of your Python installation

How to install plugins in PyCharm?

  1. Choose File >>> Settings >>> Plugins

  2. Click Marketplace and type the name of the plugin you want, for example translation for a translation plugin or Chinese for the Chinese language pack

  3. Select the plugin and click Install

  4. After installation PyCharm offers to restart; click OK, and the plugin takes effect after the restart

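Basic workflow (general purpose):

I. Data source analysis

Work out exactly what data you want and which request (URL) returns it.

Use the browser's developer tools to capture and inspect the traffic...

  1. Right-click and choose Inspect (or press F12) to open the developer tools, select the Network tab, then click through to the second page of results; the first request that appears is the one carrying the data we want

II. Implementation steps:

  1. Send the request: mimic a browser and send a request to the URL

  2. Get the data: read the response returned by the server ---> the Response tab in the developer tools

  3. Parse the data: extract the fields we want

  4. Save the data: write it to a table (CSV)

Extra: page through the results to collect multiple pages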

Code walkthrough

For review reasons I removed the site address from the URLs in the code; you can add it back yourself.

Send the request

Mimic a browser and send a request to the URL.

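#     - How to batch-replace content
#         Select the text to replace, press Ctrl + R, and enter the regular expression
#         (.*?): (.*)
#         '$1': '$2',
#     - <Response [403]>: the response object came back with status code 403
#         For most requests, adding the anti-hotlinking header (Referer) fixes this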
for page in range(0, 320, 32):
    time.sleep(1)

纭畾璇锋眰url鍦板潃

    url = 'https://apimobile..com/group/v4/poi/pcsearch/70'

Request parameters

A dictionary; build the complete set of key-value pairs.

    data = {
        'uuid': '4b9d79d54b524ab5a319.1656309336.1.0.0',
        'userid': '266252179',
        'limit': '32',
        'offset': page,
        'cateId': '-1',
        'q': '会所',  # search keyword ("club")
    }
    # Request headers: disguise the Python code as a browser sending the request
    headers = {
        # Referer (anti-hotlinking): tells the server which page the request came from
        'Referer': 'https://chs..com/',
        # User-Agent: the browser's basic identity string
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36'
    }
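To see what the paging loop actually requests, it can help to print the query string produced for each offset. The sketch below uses only the standard library; `BASE_URL` is a placeholder rather than the real endpoint, and the `q` value is invented.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.com/search"  # placeholder, not the real endpoint
PAGE_SIZE = 32


def build_page_urls(total: int, page_size: int = PAGE_SIZE) -> list:
    """Return one URL per page, stepping the offset by the page size."""
    urls = []
    for offset in range(0, total, page_size):
        query = urlencode({"limit": page_size, "offset": offset, "q": "demo"})
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = build_page_urls(320)
print(len(urls))   # number of pages requested
print(urls[0])     # first page, offset 0
print(urls[-1])    # last page
```

With `range(0, 320, 32)` the offsets run 0, 32, ..., 288, which gives ten pages of 32 results each. requests builds the same kind of query string when the dictionary is passed as `params`.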

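Send the request

<Response [403]> means the response object came back with status code 403: you do not have permission to access the resource, so the request did not succeed. 200 means the request succeeded.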

    response = requests.get(url=url, params=data, headers=headers)
    print(response)
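Instead of reading the status off the printed response object, the code can branch on the number itself. The sketch below works on plain integers using the standard library, so it runs without a network connection; `describe_status` is a name I made up. With requests, the number is available as `response.status_code`, and `response.raise_for_status()` raises an exception for 4xx and 5xx responses.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Turn a numeric status code into a short human-readable summary."""
    status = HTTPStatus(code)
    outcome = "ok" if 200 <= code < 300 else "failed"
    return f"{code} {status.phrase}: {outcome}"


print(describe_status(200))
print(describe_status(403))
```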

Get the data

Get the response's JSON as a dictionary.

    # print(response.json())
    # pprint.pprint(response.json())

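Parse the data

---> Pick the parsing approach that best suits the data you got back. Here we read values directly by key: take the name to the left of the colon [key] and extract the content to its right [value].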
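Reading by key with square brackets raises a `KeyError` as soon as one shop lacks a field. If that turns out to happen, a more forgiving variant is `dict.get`, sketched below on an invented sample item; `extract_fields` is a hypothetical helper and only three fields are shown.

```python
def extract_fields(item: dict) -> dict:
    """Read values by key; missing keys become None instead of raising KeyError."""
    return {
        "title": item.get("title"),
        "rating": item.get("avgscore"),
        "avg_price": item.get("avgprice"),
    }


# 'avgprice' is deliberately absent from this invented item
sample_item = {"title": "Shop A", "avgscore": 4.5}
print(extract_fields(sample_item))
```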

    # Extract the items in the list one by one ---> iterate with a for loop
    for index in response.json()['data']['searchResult']:
        href = f'https://www..com/xiuxianyule/{index["id"]}/'
        dit = {
            'title': index['title'],
            'business_district': index['areaname'],
            'shop_type': index['backCateName'],
            'rating': index['avgscore'],
            'avg_price': index['avgprice'],
            'comment_count': index['comments'],
            'longitude': index['longitude'],
            'latitude': index['latitude'],
            'detail_url': href,
        }
        # csv_writer is created in the multi-page setup code further down
        csv_writer.writerow(dit)
        print(dit)

# Detail page: pull the shop info out of the page source with a regular expression
import requests
import re
url = 'https://www..com/xiuxianyule/1718272961/'
headers = {
    'Host': 'www..com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
}
response = requests.get(url=url, headers=headers)
# print(response.text)
shop_info = re.findall('"cityName":"(.*?)","cityPy":"chs","brandName":"","shopName":"(.*?)","score":(.*?),"avgPrice":(.*?),"address":"(.*?)","phone":"(.*?)","openTime":"(.*?)"', response.text)
print(shop_info)

Multi-page list data collection:

# Import the request module ---> third-party, install it from cmd with pip install requests
import time
import requests
# Import the pretty-printing module ---> built in
import pprint
# Import the csv module
import csv

# Create the file; the field names must match the keys of the dit dictionary
f = open('1.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=[
    'title',
    'business_district',
    'shop_type',
    'rating',
    'avg_price',
    'comment_count',
    'longitude',
    'latitude',
    'detail_url',
])
csv_writer.writeheader()
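One thing to watch: because the file is opened in append mode, calling `writeheader()` on every run adds another header row to an existing file. A possible way around that, sketched here with a shortened column list and a temporary file, is to write the header only when the file is new or empty. `append_rows` and `FIELDNAMES` are names I made up for the example.

```python
import csv
import os
import tempfile

FIELDNAMES = ["title", "rating"]  # shortened, illustrative column list


def append_rows(path: str, rows: list) -> None:
    """Append rows to a CSV file, writing the header only for a new or empty file."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # The with block closes the file even if writing fails
    with open(path, mode="a", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if need_header:
            writer.writeheader()
        writer.writerows(rows)


# Simulate two separate runs of the script against the same file
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
append_rows(demo_path, [{"title": "Shop A", "rating": 4.5}])
append_rows(demo_path, [{"title": "Shop B", "rating": 3.8}])

with open(demo_path, encoding="utf-8", newline="") as f:
    lines = f.read().splitlines()
print(lines)
```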

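Closing words 💝

Thanks for reading~ that's the end of this flight 🛬

I hope this article was helpful 🎉 and that you learned a little something~

Even the hidden stars 🍥 are working hard to shine, so keep going too (let's work hard together).

Not sure what to comment? Even a quick 6666 is encouragement for the author 💞 Thanks 💐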
