【Problem】I wrote the following code to fetch a web page's source and write it to text files:
import requests

url = 'http://fund.eastmoney.com/company/default.html'
res = requests.request('GET',url)
if (res.status_code == 200):
    with open('d:/py/content.txt','w') as f:
        f.write(res.content)
    with open('d:/py/text.txt','w') as f:
        f.write(res.text)
Running it raises the following 2 errors:
TypeError: write() argument must be str, not bytes
UnicodeEncodeError: 'gbk' codec can't encode character '\xef' in position 0: illegal multibyte sequence
【Xiaoqian's Answer】Change the code as follows:
import requests

url = 'http://fund.eastmoney.com/company/default.html'
res = requests.request('GET',url)
if (res.status_code == 200):
    with open('d:/py/content.txt','wb') as f:  # changed 'w' to 'wb'
        f.write(res.content)
    with open('d:/py/text.txt','w',encoding=res.encoding) as f:  # set the encoding parameter
        f.write(res.text)
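Note: res.content returns bytes, so the file must be opened in binary mode ('wb'); a file opened in text mode ('w') only accepts str, which is the cause of the TypeError. res.text returns str, which requests decodes from the response body using res.encoding. When open() is called in text mode without an encoding argument, Python uses the platform's default encoding (GBK here, judging by the error message), and that codec cannot represent the character '\xef' in res.text. Passing encoding=res.encoding makes the file use the same encoding that requests used for decoding, so every character in res.text can be written.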
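Both errors can be reproduced without any network access. The following is only a minimal sketch: the sample bytes, the temporary directory, and the assumption that requests fell back to ISO-8859-1 are illustrative stand-ins, not values taken from the real page.

```python
import os
import tempfile

# Hypothetical stand-in for res.content: UTF-8 bytes starting with a byte-order mark.
content = b"\xef\xbb\xbf<html>\xe5\x9f\xba\xe9\x87\x91</html>"
# Hypothetical stand-in for res.text, assuming res.encoding was 'ISO-8859-1'.
text = content.decode("iso-8859-1")

tmp_dir = tempfile.mkdtemp()
bytes_path = os.path.join(tmp_dir, "content.txt")
text_path = os.path.join(tmp_dir, "text.txt")

errors = []

# Error 1: a text-mode file only accepts str, so writing bytes fails.
try:
    with open(bytes_path, "w") as f:
        f.write(content)
except TypeError as exc:
    errors.append(type(exc).__name__)

# Error 2: GBK has no code for '\xef', so encoding the str fails.
try:
    with open(text_path, "w", encoding="gbk") as f:
        f.write(text)
except UnicodeEncodeError as exc:
    errors.append(type(exc).__name__)

# Fix 1: binary mode writes the bytes unchanged.
with open(bytes_path, "wb") as f:
    f.write(content)

# Fix 2: encode with the same codec that produced the str.
with open(text_path, "w", encoding="iso-8859-1") as f:
    f.write(text)

with open(bytes_path, "rb") as f:
    raw_from_bytes = f.read()
with open(text_path, "rb") as f:
    raw_from_text = f.read()

# Decoding the raw bytes with the page's real encoding recovers readable text.
decoded = raw_from_bytes.decode("utf-8-sig")
print(errors, raw_from_bytes == raw_from_text, decoded)
```

In this sketch both files end up with identical bytes, because writing the str with the codec it was decoded with reverses the decoding. If readable Chinese text is the goal, decoding res.content with the page's actual encoding (or setting res.encoding before reading res.text) is probably the more direct route.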
【References】