龙空技术网

Scraping memes from 斗图网 with Python: never fear a meme battle with goofy netizens again

幺猫折耳鹿 108


I was bored today, came across a lot of goofy meme images, and decided to scrape them.

Without further ado, here is the code:

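import os
import threading
import urllib.request

import requests
from bs4 import BeautifulSoup

# URL prefix of the list pages; the page number is appended to it.
# It is blank in the source, so set it before running.
BASE_PAGE_URL = ''
# URLs of all list pages
PAGE_URL_LIST = []
# URLs of all meme images
FACE_URL_LIST = []
# Global lock guarding both lists
glock = threading.Lock()

for x in range(1, 101):
    url = BASE_PAGE_URL + str(x)
    PAGE_URL_LIST.append(url)


def producer():
    while True:
        glock.acquire()
        if len(PAGE_URL_LIST) == 0:
            # No list pages left, so this producer is done
            glock.release()
            break
        else:
            url_page = PAGE_URL_LIST.pop()
            glock.release()
            # Fetch and parse the page outside the lock
            response = requests.get(url_page)
            content = response.content
            soup = BeautifulSoup(content, 'lxml')
            img_list = soup.find_all('img', attrs={'class': 'img-responsive lazy image_dta'})
            glock.acquire()
            for img in img_list:
                url = img['data-original']
                # Protocol-relative URLs need a scheme
                if not url.startswith('http'):
                    url = 'http:' + url
                FACE_URL_LIST.append(url)
            glock.release()


def consumer():
    while True:
        glock.acquire()
        if len(FACE_URL_LIST) == 0:
            # Nothing to download yet: release the lock and poll again.
            # This loop never exits on its own, so stop the script
            # manually once the downloads have finished.
            glock.release()
            continue
        else:
            face_url = FACE_URL_LIST.pop()
            glock.release()
            split_list = face_url.split('/')
            # Last path segment, minus its final four characters
            filename = split_list.pop()[:-4]
            path = os.path.join('biaoqing', filename)
            urllib.request.urlretrieve(face_url, filename=path)


def main():
    # urlretrieve does not create directories, so make sure the target exists
    os.makedirs('biaoqing', exist_ok=True)
    # Start 3 producer threads to scrape the list pages
    for x in range(3):
        th = threading.Thread(target=producer)
        th.start()
    # Start 5 consumer threads to download the images
    for x in range(5):
        th = threading.Thread(target=consumer)
        th.start()


if __name__ == "__main__":
    main()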
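
The download threads in the script above keep polling the shared list even after every page has been scraped, so the program never ends by itself. One possible alternative, shown here only as a rough sketch and not as the approach this post uses, is to replace the lists and the lock with the standard library's queue.Queue and to push a stop marker once the producers have finished. The names run_pipeline, fetch_links, download and SENTINEL are made up for this illustration; fetch_links and download stand in for the page-parsing and image-saving steps so the sketch runs without network access.

```python
import queue
import threading

SENTINEL = object()  # hypothetical marker meaning "no more work"


def run_pipeline(pages, fetch_links, download, n_producers=3, n_consumers=5):
    """Feed links found on `pages` to downloader threads through a queue."""
    page_q = queue.Queue()
    face_q = queue.Queue()
    for page in pages:
        page_q.put(page)

    def producer():
        while True:
            try:
                page = page_q.get_nowait()
            except queue.Empty:
                return  # no pages left
            for link in fetch_links(page):
                face_q.put(link)

    def consumer():
        while True:
            link = face_q.get()  # blocks while the queue is empty, no busy loop
            if link is SENTINEL:
                return
            download(link)

    producers = [threading.Thread(target=producer) for _ in range(n_producers)]
    consumers = [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()
    # Every producer has finished, so hand each consumer one stop marker.
    for _ in consumers:
        face_q.put(SENTINEL)
    for t in consumers:
        t.join()


# Stand-in callables: three fake pages with two fake links each.
saved = []
run_pipeline(
    pages=range(1, 4),
    fetch_links=lambda page: [f"img-{page}-{i}.jpg" for i in range(2)],
    download=saved.append,
)
print(len(saved), "links handled")
```

In a real run, fetch_links would wrap the requests and BeautifulSoup calls from producer(), and download would wrap the urlretrieve call from the consumer function. Because queue.Queue does its own locking, the global lock is no longer needed in this variant.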

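This is the folder after the crawl finished.

I was in a good mood, so I scraped 100 pages, haha. Sitting on more than five thousand memes makes me pretty happy.

If you want memes for your own meme battles, follow the WeChat official account “幺猫折耳鹿” and reply “表情包” to get them.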
