https://www.danke.com/room/1424290139.html,1424290139,十五中 凯乐花园 5室1厅,距12号线十五中站350米,13㎡,18,5室2卫,朝南,合,1340
https://www.danke.com/room/648955494.html,648955494,光谷广场 洪福添美城市广场 3室1厅,距2号线 11号线光谷广场站950米,15㎡,11,3室1卫,朝南,合,1350
https://www.danke.com/room/151736897.html,151736897,小龟山 群建星城 3室1厅,距2号线 12号线小龟山站250米,13㎡,34,3室1卫,朝南,合,1270
https://www.danke.com/room/227757325.html,227757325,小龟山 洪山花园 3室1厅,距2号线 12号线小龟山站600米,13㎡,2,3室1卫,朝南,合,1220
https://www.danke.com/room/961633801.html,961633801,街道口 未来城 3室1厅,距2号线街道口站100米,11㎡,24,3室1卫,朝南,合,1600
import requests
from bs4 import BeautifulSoup
import pymysql

# Map each supported city name to the code appended to the listing URL.
# 无锡 (Wuxi) was listed in the original menu but had no code mapping,
# so the menu below is built from this table to keep the two consistent.
CITY_CODES = {
    '北京': 'bj',  # Beijing
    '深圳': 'sz',  # Shenzhen
    '上海': 'sh',  # Shanghai
    '杭州': 'hz',  # Hangzhou
    '天津': 'tj',  # Tianjin
    '武汉': 'wh',  # Wuhan
    '南京': 'nj',  # Nanjing
    '广州': 'gz',  # Guangzhou
    '成都': 'cd',  # Chengdu
    '苏州': 'gs',  # Suzhou
    '西安': 'xa',  # Xi'an
    '重庆': 'cq',  # Chongqing
}

# Keep prompting until a supported city is entered.
while True:
    print('*' * 50)
    print('The following cities are available:')
    print(', '.join(CITY_CODES))
    print('*' * 50)
    city = input('Enter a city: ')
    if city in CITY_CODES:
        selection = CITY_CODES[city]
        break
    print()
    print()
    print('Invalid input, please try again!')


# Save the data to a local CSV file.
def save_data_in_iocal(data):
    with open('./data/office/公寓.csv', 'a', encoding='utf-8') as file:
        # Write one row per listing.
        for house in data:
            file.write('%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n' % (
                house['house_URL'], house['house_id'], house['house_name'],
                house['house_far'], house['house_area'], house['house_high'],
                house['house_layout'], house['house_orientation'],
                house['house_share'], house['house_price']))


# Save the data to the MySQL database.
def save_data_in_mysql(data):
    # Connect to the database; keyword arguments work across PyMySQL versions.
    db = pymysql.connect(host='localhost', user='root',
                         password='mysql', database='蛋壳')
    try:
        # Get a cursor object.
        cursor = db.cursor()
        # Let the driver escape the values instead of formatting them into the SQL string.
        sql = 'insert into house values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)'
        # Insert every listing.
        for house in data:
            cursor.execute(sql, (
                house['house_id'], house['house_URL'], house['house_name'],
                house['house_far'], house['house_area'], house['house_high'],
                house['house_layout'], house['house_orientation'],
                house['house_share'], house['house_price']))
        # Commit once all rows have been inserted.
        db.commit()
    finally:
        db.close()


# Scrape one page of Danke Apartment listings.
def get_danke_one(page_num, selection):
    url = 'https://www.danke.com/room/' + selection
    url1 = url + ('?page=' + str(page_num))
    resp = requests.get(url1)
    # Response body.
    html = resp.text
    # Parse it with BeautifulSoup.
    html = BeautifulSoup(html, 'html.parser')
    # Get the list of listings.
    house_list = html.find_all('div', attrs={'class': 'r_ls_box'})[0].find_all('div', attrs={'class': 'r_lbx'})
    # All listings on this page.
    one_page_house = []
    # Process each listing.
    for house in house_list:
        cen = house.find_all('div', attrs={'class': 'r_lbx_cen'})[0]
        cena = cen.find_all('div', attrs={'class': 'r_lbx_cena'})
        # Listing URL
        house_URL = cena[0].a['href']
        # Listing id: the last path segment without the '.html' suffix
        house_id = house_URL.split('/')[-1][:-5]
        # Listing name
        house_name = cena[0].a['title']
        # Distance to nearby stations; commas are replaced so they do not break the CSV row
        house_far = cena[1].text.strip().replace(',', ' ')
        house_infos = cen.find_all('div', attrs={'class': 'r_lbx_cenb'})[0].text.strip()
        info = house_infos.split("|")
        # Area
        house_area = info[0].strip()[5:]
        # Floor
        house_high = info[1].strip()[:-1]
        # Layout
        house_layout = info[2].strip()
        # Orientation
        infos = info[3].strip()
        house_orientation = infos[:-1].strip()
        # Shared or whole-unit rental
        house_share = infos[-1:]
        # Price
        house_price = house.find_all('div', attrs={'class': 'r_lbx_money'})[0].find_all('div', attrs={'class': 'r_lbx_moneya'})[0].find_all('span')[0].text.replace('\n', '')
        # Add each listing once it has been parsed.
        one_page_house.append({
            'house_URL': house_URL, 'house_id': house_id, 'house_name': house_name,
            'house_far': house_far, 'house_area': house_area, 'house_high': house_high,
            'house_layout': house_layout, 'house_orientation': house_orientation,
            'house_share': house_share, 'house_price': house_price,
        })
    # Save the page to the local CSV file once, after the loop, so no row is written twice.
    save_data_in_iocal(one_page_house)
    # save_data_in_mysql(one_page_house)
    print('Finished scraping page {}'.format(page_num))

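
# Scrape pages 1 through 499; report a failed page and move on to the next.
for i in range(1, 500):
    try:
        get_danke_one(i, selection)
    except Exception as e:  # Exception, not BaseException, so Ctrl+C still stops the run
        print(e)
        continue
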
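
The slicing in `get_danke_one` assumes the listing's info line holds four `|`-separated fields: a five-character label before the area, a one-character suffix after the floor number, the layout, and the orientation with the share type as its final character. The sketch below isolates that step so it can be checked without hitting the site. `parse_house_info` and the sample string are hypothetical, chosen only to be consistent with the output rows; the site's real text may differ.

```python
def parse_house_info(house_infos):
    """Split a '|'-separated info line the same way get_danke_one does."""
    info = [part.strip() for part in house_infos.split('|')]
    orientation_and_share = info[3]
    return {
        'house_area': info[0][5:],    # drop the assumed five-character label
        'house_high': info[1][:-1],   # drop the assumed one-character floor suffix
        'house_layout': info[2],
        'house_orientation': orientation_and_share[:-1].strip(),
        'house_share': orientation_and_share[-1:],
    }


# Hypothetical sample: the label and suffix text are assumptions, not scraped data.
sample_info = '建筑面积:13㎡ | 18楼 | 5室2卫 | 朝南 合'
print(parse_house_info(sample_info))
```

If the site changes a label length or adds a field, these fixed offsets will silently produce wrong values, so a quick check like this against a saved page is a cheap safeguard.
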
Danke Apartment (蛋壳公寓) scraper
The scraped output comes first, followed by the code.
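
`save_data_in_iocal` formats each row by hand, which is why commas inside `house_far` have to be replaced before writing. One possible alternative is the standard `csv` module, which quotes such fields automatically. The sketch below is only an illustration under assumptions, not the original method: `FIELDS`, `rows_to_csv`, and the sample record are hypothetical, and it writes to an in-memory buffer rather than to `公寓.csv`.

```python
import csv
import io

# Column order matching the dictionary keys built in get_danke_one.
FIELDS = ['house_URL', 'house_id', 'house_name', 'house_far', 'house_area',
          'house_high', 'house_layout', 'house_orientation', 'house_share',
          'house_price']


def rows_to_csv(rows):
    """Serialize listing dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical record; house_far deliberately contains a comma.
sample = {
    'house_URL': 'https://www.danke.com/room/648955494.html',
    'house_id': '648955494',
    'house_name': '光谷广场 洪福添美城市广场 3室1厅',
    'house_far': '距2号线,11号线光谷广场站950米',
    'house_area': '15㎡',
    'house_high': '11',
    'house_layout': '3室1卫',
    'house_orientation': '朝南',
    'house_share': '合',
    'house_price': '1350',
}

text = rows_to_csv([sample])
print(text)

# Reading the text back recovers the comma-containing field unchanged.
parsed = list(csv.DictReader(io.StringIO(text)))
print(parsed[0]['house_far'])
```

When writing to a real file with this approach, opening it with `newline=''` is what the `csv` documentation recommends, so the writer controls line endings itself.
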