To run the crawler tool, Docker must be installed on your local machine, because the tool is packaged as a Docker image.
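1. Download and run the Docker container, mapping a local folder into it. In "-v your_local_path:/data", the part before the colon is the folder on your machine and /data is the folder inside the container; the crawled data will be written to your local folder. The -d flag runs the container in the background. (The command does not publish any port; if you add a port mapping with -p, any host port that does not conflict with one already in use on your machine will work.)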
docker run --name spidertool -v your_local_path:/data -d windboy/spider_tool_common
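If you prefer to script step 1, the command can be assembled in Python. This is a minimal sketch, assuming you only need the options shown above; the helper name build_run_command is hypothetical and not part of the tool. It also turns a relative or "~" path into an absolute one, since Docker expects an absolute host path on the left side of a -v bind mount.

```python
from __future__ import annotations

import os


def build_run_command(local_path: str,
                      name: str = "spidertool",
                      image: str = "windboy/spider_tool_common") -> list[str]:
    """Build the argument list for the `docker run` command in step 1.

    build_run_command is a hypothetical helper. The host folder goes before
    the colon and the container folder (/data) after it.
    """
    # Expand "~" and make the path absolute for the bind mount.
    host_path = os.path.abspath(os.path.expanduser(local_path))
    return [
        "docker", "run",
        "--name", name,
        "-v", f"{host_path}:/data",  # host folder : container folder
        "-d",                        # run in the background
        image,
    ]
```

You could print the result with shlex.join(build_run_command("~/spider_output")) to check it, or pass it to subprocess.run(..., check=True) to start the container.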
2. Open a shell inside the Docker container:
docker exec -it spidertool bash
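docker exec only works while the container is running. As a sketch, assuming the container name spidertool from step 1, the following hypothetical helper asks Docker for the container's State.Running field and treats any error (Docker not installed, daemon not reachable, no such container) as "not running".

```python
import subprocess


def container_running(name: str = "spidertool") -> bool:
    """Return True if Docker reports the named container as running.

    container_running is a hypothetical helper, not part of the tool.
    """
    try:
        result = subprocess.run(
            ["docker", "inspect", "--format", "{{.State.Running}}", name],
            capture_output=True, text=True, check=False, timeout=30,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # The docker CLI is missing from PATH, or the call hung.
        return False
    # A non-zero exit code means the container (or the daemon) was not found.
    return result.returncode == 0 and result.stdout.strip() == "true"
```

If it returns False for an existing container, docker start spidertool starts it again, and docker logs spidertool shows what it printed before it stopped.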
3. Start python3 and enter the following:
from spider_tool_common import spider_tool_common as spidertool
params_json = {
"spider_name":"zj_jdggzy_bidding",
"loop_num":"10",
"start_index":"1",
"multi_factor":"25",
"pagelist_get_index":"",
"pagelist_groups_resolving":"//table[@class='GridView']//tr[@class='Row']",
"pagelist_url_resolving":".//a/@href",
"detailpage_fields_prvnce_name":"浙江省",
"detailpage_fields_latn_name":"杭州市",
"detailpage_fields_country_name":"建德市",
"detailpage_fields_inter_name":"杭州市公共资源交易中心建德分中心",
"detailpage_fields_table_names":"dict_winbidder_test_01",
"detailpage_fields_inter_type":"2",
}
spidertool.insert_params(params_json)  # write the XPath parameters
spidertool.crawl_data()  # start crawling
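Typing the parameter dictionary by hand in the interpreter is error-prone. One option, sketched here under stated assumptions, is to keep the parameters in a JSON file in your local folder and check them before calling insert_params. The key list is copied from the example above; treating every key as mandatory and every value as a string only mirrors that example and is an assumption, not documented behaviour of spider_tool_common. The helper parse_params is hypothetical.

```python
from __future__ import annotations

import json

# Keys copied from the params_json example. Requiring all of them is an
# assumption; the tool itself may accept fewer.
EXPECTED_KEYS = (
    "spider_name",
    "loop_num",
    "start_index",
    "multi_factor",
    "pagelist_get_index",
    "pagelist_groups_resolving",
    "pagelist_url_resolving",
    "detailpage_fields_prvnce_name",
    "detailpage_fields_latn_name",
    "detailpage_fields_country_name",
    "detailpage_fields_inter_name",
    "detailpage_fields_table_names",
    "detailpage_fields_inter_type",
)


def parse_params(text: str) -> dict[str, str]:
    """Parse JSON parameter text and check it has the shape of the example."""
    params = json.loads(text)
    if not isinstance(params, dict):
        raise ValueError("parameter file must contain a JSON object")
    missing = [key for key in EXPECTED_KEYS if key not in params]
    if missing:
        raise ValueError(f"missing keys: {', '.join(missing)}")
    # The example passes every value, numbers included, as a string.
    not_strings = [key for key, value in params.items() if not isinstance(value, str)]
    if not_strings:
        raise TypeError(f"values must be strings: {', '.join(not_strings)}")
    return params
```

Because your local folder is mounted at /data, a file saved there as params.json could be read inside the container with open("/data/params.json", encoding="utf-8"), passed through parse_params, and then handed to spidertool.insert_params.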
4. Get the crawl results
Open the local folder you specified in step 1; the crawled data is saved there as files ending in .csv.
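To get a quick overview of what was crawled, the CSV files can be summarized with Python's standard csv module. This is a sketch that assumes each file starts with a header row and is encoded as UTF-8 (with or without a BOM, hence "utf-8-sig"); if the files turn out to use another encoding such as GBK, pass it explicitly. Both function names are hypothetical.

```python
from __future__ import annotations

import csv
from pathlib import Path
from typing import Iterable


def count_data_rows(lines: Iterable[str]) -> int:
    """Count CSV records, assuming the first record is a header row."""
    # csv.reader yields [] for blank lines, so those are skipped; a quoted
    # field containing a newline still counts as a single record.
    records = sum(1 for row in csv.reader(lines) if row)
    return max(records - 1, 0)


def summarize_output(folder: str, encoding: str = "utf-8-sig") -> dict[str, int]:
    """Map each *.csv file in folder to its number of data rows."""
    summary = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding=encoding) as f:
            summary[path.name] = count_data_rows(f)
    return summary
```

For example, summarize_output("your_local_path") returns a dictionary from file name to row count, which makes it easy to spot a file that came back empty.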