Scraping The Longest Day in Chang'an (《长安十二时辰》) with Python
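Introduction
A 2019 TV series starring Lei Jiayin (雷佳音) and Jackson Yee (易烊千玺)
The Longest Day in Chang'an (《长安十二时辰》) is a period suspense drama directed by Cao Dun (曹盾), with Lei Jiayin and Jackson Yee in the lead roles.
The series is adapted from Ma Boyong's (马伯庸) novel of the same name. On the eve of the Lantern Festival (上元节) in the Tang dynasty, the city of Chang'an is in grave danger; Zhang Xiaojing, a death-row prisoner, is called on at the critical moment and works with Li Bi to save the city within twelve shichen (one full day).
The series premiered on Youku on June 27, 2019.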
Read the original novel
- Link: https://www.luoxia.com/shiershichen/
Code
# coding: utf-8
# desc: scrape a novel from luoxia.com (落霞小说网)
# https://www.luoxia.com/shiershichen/
import requests
from lxml import etree

HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36'
}

s = requests.Session()
# The original `s.keep_alive = False` has no effect on a requests Session;
# a "Connection: close" header asks the server to close each connection instead.
s.headers['Connection'] = 'close'


def getHtml(url):
    """Fetch the table of contents; return (chapter URLs, chapter titles)."""
    r = s.get(url, headers=HEADERS)
    html = etree.HTML(r.text)
    urls_text = html.xpath("//div[@id='content-list']/div/ul/li/a/text()")
    urls = html.xpath("//div[@id='content-list']/div/ul/li/a/@href")
    return urls, urls_text


def getContent(url, filename):
    """Fetch one chapter and append its paragraphs to filename."""
    r = s.get(url, headers={**HEADERS, 'referer': 'https://www.luoxia.com/shiershichen'})
    html = etree.HTML(r.text)
    contents = html.xpath("//div[@id='nr1']/p/text()")
    with open(filename, 'a', encoding='utf-8') as f:
        for x in contents:
            f.write(x + '\n')


if __name__ == "__main__":
    url = "https://www.luoxia.com/shiershichen/"
    urls, titles = getHtml(url)
    filename = '长安十二时辰.txt'
    for chapter_url, title in zip(urls, titles):
        # Write the chapter title first, then append the chapter body
        with open(filename, 'a', encoding='utf-8') as f:
            f.write('\n' + title + '\n')
        getContent(chapter_url, filename)
        print("Fetched 长安十二时辰 %s, chapter title: %s" % (chapter_url, title))

- Result: each chapter's title and text are appended to 长安十二时辰.txt, and one progress line is printed per chapter.
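The script above sends its requests back to back and stops at the first network error. As a rough sketch of how that could be softened, the helpers below retry a failed fetch with a growing pause and resolve chapter links against the table-of-contents URL. The names `fetch_with_retry` and `resolve_links`, the retry count, and the delay are all assumptions made for illustration; they are not part of the original script, and the chapter paths in the example are made up.

```python
import time
from urllib.parse import urljoin


def resolve_links(base_url, hrefs):
    """Turn possibly relative hrefs into absolute URLs (absolute ones pass through)."""
    return [urljoin(base_url, href) for href in hrefs]


def fetch_with_retry(fetch, url, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url), making up to `retries` attempts before giving up.

    `fetch` is any callable that returns the page text or raises on failure
    (hypothetical interface, chosen so the helper is easy to test offline).
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                # Wait a little longer after each failed attempt
                sleep(delay * attempt)
    raise last_error


# Example with made-up chapter paths: one relative, one already absolute
links = resolve_links(
    "https://www.luoxia.com/shiershichen/",
    ["/shiershichen/1.htm", "https://www.luoxia.com/shiershichen/2.htm"],
)
```

To use this with the session from the script, one option would be a small `fetch` function that calls `s.get(url, headers=..., timeout=10)`, then `r.raise_for_status()`, and returns `r.text`, so that HTTP errors and timeouts both trigger a retry.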
