An example of asynchronous data scraping with Python coroutines (asyncio + aiohttp)
Author: YiYa_咿呀
This article walks through a worked example of scraping data asynchronously with Python coroutines (asyncio + aiohttp). Readers who need it are welcome to use it as a reference; hopefully it helps.
Scraping a novel asynchronously with asyncio + aiohttp
Approach
The script uses aiofiles for asynchronous file reads and writes. When sending a request, the parameter dictionary has to be converted into a JSON string, and when receiving a response the returned string has to be converted back into a dictionary, so the json library is involved as well. Fetching the cid and title for each chapter's download link is a single request whose response we simply wait for, so that part is synchronous and uses the requests library. A standalone sketch of the JSON round trip follows.
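As a quick illustration of that round trip (a minimal standalone sketch, separate from the full script below; the book_id value just mirrors the article's example):

import json
import requests

# The API expects its whole `data` query parameter to be JSON text,
# so the request dictionary is serialized with json.dumps()
payload = json.dumps({"book_id": "4306063500"})
catalog_url = f'http://dushu.baidu.com/api/pc/getCatalog?data={payload}'

# resp.json() parses the JSON response body back into a dictionary
resp = requests.get(catalog_url)
dic = resp.json()
print(dic['data']['novel']['items'][0]['title'])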
Code
import requests
import aiohttp
import asyncio
import json
import aiofiles
import os

# url = 'http://dushu.baidu.com/api/pc/getCatalog?data={%22book_id%22:%224306063500%22}'
# bookid = 'http://dushu.baidu.com/api/pc/getChapterContent?data={%22book_id%22:%224306063500%22,%22cid%22:%224306063500|1569782244%22,%22need_bookinfo%22:1}'

async def downloadNovel(cid, title, bid):
    data2 = {
        "book_id": bid,
        "cid": f"{bid}|{cid}",
        "need_bookinfo": 1
    }
    # Convert the dictionary to a JSON string
    data2 = json.dumps(data2)
    # Build the request URL
    bookurl = f'http://dushu.baidu.com/api/pc/getChapterContent?data={data2}'
    # Easy to forget the parentheses here: aiohttp.ClientSession()
    async with aiohttp.ClientSession() as session:
        async with session.get(bookurl) as resp:
            # Await the response and parse it as JSON
            dic = await resp.json()
            # Set the encoding with encoding='utf-8'
            async with aiofiles.open(f'./articles/{title}', mode='w', encoding='utf-8') as f:
                # Write the chapter content to the file asynchronously
                await f.write(dic['data']['novel']['content'])

async def getCataList(url):
    # Fetch all chapter metadata with a single synchronous request
    resp = requests.get(url)
    # Parse the returned JSON string into a dictionary
    dic = resp.json()
    # print(dic)
    # An empty list to collect the async tasks
    tasks = []
    # Create one async task per chapter and append it to tasks
    # (bid is the module-level variable set under __main__)
    for item in dic['data']['novel']['items']:
        title = item['title']
        cid = item['cid']
        tasks.append(asyncio.create_task(downloadNovel(cid, title, bid)))
        print(title, cid)
    # Run all the tasks concurrently
    await asyncio.wait(tasks)

if __name__ == '__main__':
    bid = "4306063500"
    # Make sure the output directory exists before the tasks write to it
    os.makedirs('./articles', exist_ok=True)
    url = 'http://dushu.baidu.com/api/pc/getCatalog?data={"book_id":"' + bid + '"}'
    print(url)
    asyncio.run(getCataList(url))

The result looks like this:

[Screenshot: data scraped successfully]
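One possible refinement, not in the original article: if the server throttles a burst of simultaneous requests, the number of in-flight downloads can be capped with an asyncio.Semaphore. A minimal sketch of the idea, reusing the downloadNovel coroutine above (downloadNovelLimited is a hypothetical wrapper name):

import asyncio

async def downloadNovelLimited(sem, cid, title, bid):
    # The semaphore caps how many chapter downloads run at once
    async with sem:
        await downloadNovel(cid, title, bid)

# Inside getCataList, create the semaphore in the running event loop
# and wrap each task with it:
#   sem = asyncio.Semaphore(10)  # at most 10 requests in flight
#   tasks.append(asyncio.create_task(downloadNovelLimited(sem, cid, title, bid)))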
That concludes this example of asynchronous data scraping with Python coroutines (asyncio + aiohttp). For more on scraping data with Python coroutines, see the other related articles on 脚本之家!
