龙空技术网

Handling the slider CAPTCHA when scraping 51job with Python + Selenium

yangyunjpi 123


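1. There is plenty of slider CAPTCHA code online, but after a certain numerically named job site updated its page, much of that code no longer works completely. I use Python + Selenium. Most of the code comes from the web, and the sources are credited in the code. As of 6 August 2023, the code passes the verification automatically with very high probability.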

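2. I did not pass executable_path, which may be down to a difference in operating system. (Selenium 4 also deprecates this argument in favor of a Service object, so leaving it out is the simpler route.)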

# driver = webdriver.Chrome(executable_path=DRIVER_PATH, options=chrome_options)
driver = webdriver.Chrome(options=chrome_options)

3. Some people say you need to patch the signature string inside the chromedriver executable. This job site does not require it.

4. I set the following chrome_options. Other sites may need different ones.

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--start-maximized')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_experimental_option('useAutomationExtension', False)
chrome_options.add_experimental_option("detach", True)

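5. By toggling whether the full background image is shown or hidden, you get one screenshot without the gap and one with it. Comparing the pixel values of the two directly gives the distance the slider has to travel. Some solutions online use OpenCV for this step, but this job site does not need it.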
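The pixel comparison in note 5 can be tried without a browser. The following is a minimal sketch, assuming the two screenshots are represented as plain nested lists of RGB tuples rather than PIL images; `find_gap_x`, `pixels_differ`, and the synthetic 260x160 images are hypothetical names and data made up for illustration. The threshold of 60 and the starting column of 60 mirror the values used in AccessCode.py.

```python
THRESHOLD = 60  # per-channel difference that counts as "different"


def pixels_differ(p1, p2, threshold=THRESHOLD):
    """Return True if any RGB channel differs by at least `threshold`."""
    return any(abs(a - b) >= threshold for a, b in zip(p1[:3], p2[:3]))


def find_gap_x(full, gapped, start_x=60):
    """Scan column by column and return the x of the first differing pixel.

    `full` and `gapped` are lists of rows; each row is a list of RGB tuples.
    Scanning starts at `start_x` to skip the slider piece on the left.
    Returns `start_x` if no difference is found.
    """
    height = len(full)
    width = len(full[0])
    for x in range(start_x, width):
        for y in range(height):
            if pixels_differ(full[y][x], gapped[y][x]):
                return x
    return start_x


# Synthetic data: a uniform grey image, and a copy with a dark square "gap".
WIDTH, HEIGHT = 260, 160
full = [[(200, 200, 200)] * WIDTH for _ in range(HEIGHT)]
gapped = [row[:] for row in full]
for y in range(50, 90):
    for x in range(140, 180):
        gapped[y][x] = (40, 40, 40)

gap_x = find_gap_x(full, gapped)
print(gap_x)  # 140
```

With real screenshots the same loop would read pixels through `Image.getpixel`, as `get_gap` does in the listing further down.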

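6. In my view the default duration=250ms used when dragging the slider should be reduced. Many posts online tell you to edit the library source for this. That is unnecessary and should not be done; the line below is enough: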

self.action_chains = ActionChains(driver=self.driver, duration=50)

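7. Even with the shorter duration and track points generated by various strategies, verification still fails some of the time. I later found an article that generates the track points from the jquery.easing easing functions, and that works very well!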

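8. I would like to implement an interceptor that checks for a slider CAPTCHA whenever a new page finishes loading (I worry that one may pop up mid-crawl once a lot of data has been fetched). I tried many approaches without success. Calling a check function after every simulated click would work, of course, but it feels unprofessional :-( If you know how to do this, please point me in the right direction. Thanks!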
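One possible direction for note 8, offered as a rough sketch rather than a tested solution: wrap the driver in a small proxy that forwards every call and runs a check after selected methods. `CheckedDriver`, `FakeDriver`, and `check_for_slider` are hypothetical names invented here; `FakeDriver` only stands in for a real WebDriver so the sketch runs without a browser.

```python
class CheckedDriver:
    """Proxy that forwards to `driver` and runs `check` after watched methods."""

    def __init__(self, driver, check, watched=("get", "refresh", "back", "forward")):
        self._driver = driver
        self._check = check
        self._watched = set(watched)

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself.
        attr = getattr(self._driver, name)
        if name not in self._watched or not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            result = attr(*args, **kwargs)
            self._check(self._driver)  # e.g. look for the CAPTCHA and solve it
            return result

        return wrapper


class FakeDriver:
    """Stand-in for a real WebDriver (assumption, for demonstration only)."""

    def __init__(self):
        self.current_url = None

    def get(self, url):
        self.current_url = url

    def find_elements(self, by, value):
        return []


seen = []


def check_for_slider(driver):
    # A real check might call driver.find_elements(By.CLASS_NAME, 'geetest_window')
    # and run AccessCode(driver).crack() when the list is not empty.
    seen.append(driver.current_url)


driver = CheckedDriver(FakeDriver(), check_for_slider)
driver.get("https://example.com/page1")
driver.find_elements("class name", "geetest_window")  # not watched, no check
print(seen)  # ['https://example.com/page1']
```

This proxy only sees calls made on the driver itself, so clicks made through a WebElement are not intercepted. Selenium's `selenium.webdriver.support.events` module provides `EventFiringWebDriver` and `AbstractEventListener`, with hooks such as `after_navigate_to` and `after_click`, which may be a better fit for that case.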

My modified source code follows. Discussion is welcome!

1)AccessCode.py:

# Code from: (source URL missing in the original post)
from selenium import webdriver
from selenium.webdriver import ActionChains  # used to drag the slider when solving the CAPTCHA
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from PIL import Image
from io import BytesIO
import time
import easing


class AccessCode(object):
    def __init__(self, web_driver):
        self.driver = web_driver
        # Override the default duration (250 ms)
        self.action_chains = ActionChains(driver=self.driver, duration=50)
        # Use the driver that was passed in, not the module-level global
        self.wait = WebDriverWait(self.driver, 20)
        self.border = 6  # offset correction

    def get_position(self):
        """
        Get the position of the CAPTCHA image.
        :return: tuple (top, bottom, left, right)
        """
        img = self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'geetest_window')))
        time.sleep(2)
        location = img.location
        size = img.size
        top, bottom = location['y'], location['y'] + size['height']
        left, right = location['x'], location['x'] + size['width']
        return (top, bottom, left, right)

    def get_screenshot(self):
        """
        Take a screenshot of the page.
        :return: screenshot as a PIL image
        """
        screenshot = self.driver.get_screenshot_as_png()
        screenshot = Image.open(BytesIO(screenshot))
        return screenshot

    def get_image1(self, filename):
        '''
        Capture the complete CAPTCHA image (no gap).
        :param filename: file name to save the image under
        :return: image object
        '''
        time.sleep(0.2)
        js_code = '''document.getElementsByClassName('geetest_canvas_fullbg')[0].style.display="block";'''
        time.sleep(1)
        self.driver.execute_script(js_code)
        # Crop the CAPTCHA out of the screenshot
        top, bottom, left, right = self.get_position()
        screenshot = self.get_screenshot()
        # captcha = screenshot.crop((2 * left, 2 * top, 2 * right, 2 * bottom))
        # size = 258, 159
        captcha = screenshot.crop((left, top, right, bottom))
        size = 260, 160
        captcha.thumbnail(size)  # shrink in place to at most this size
        captcha.save(filename)
        return captcha

    def get_image2(self, filename):
        '''
        Capture the CAPTCHA image with the gap.
        :param filename: file name to save the image under
        :return: image object with the gap
        '''
        time.sleep(0.2)
        js_code = '''document.getElementsByClassName('geetest_canvas_fullbg')[0].style.display="none";'''
        self.driver.execute_script(js_code)
        time.sleep(1)
        # Crop the CAPTCHA out of the screenshot
        top, bottom, left, right = self.get_position()
        screenshot = self.get_screenshot()
        # captcha = screenshot.crop((2 * left, 2 * top, 2 * right, 2 * bottom))
        # size = 258, 159
        captcha = screenshot.crop((left, top, right, bottom))
        size = 260, 160
        captcha.thumbnail(size)  # shrink in place to at most this size
        captcha.save(filename)
        return captcha

    def get_gap(self, image1, image2):
        """
        Get the x offset of the gap.
        :param image1: image without the gap
        :param image2: image with the gap
        :return: x offset of the gap
        """
        left = 60
        for i in range(left, image1.size[0]):
            for j in range(image1.size[1]):
                if not self.is_pixel_equal(image1, image2, i, j):
                    left = i
                    return left
        return left

    def is_pixel_equal(self, img1, img2, x, y):
        """
        Check whether two pixels are the same.
        :param img1: image 1
        :param img2: image 2
        :param x: x position
        :param y: y position
        :return: whether the pixels are the same
        """
        # Read the pixel from each image
        pixel1 = img1.getpixel((x, y))
        pixel2 = img2.getpixel((x, y))
        for i in range(0, 3):
            if abs(pixel1[i] - pixel2[i]) >= 60:
                return False
        return True

    def get_track(self, distance):
        """
        Build a movement track from the offset.
        :param distance: offset
        :return: movement track
        """
        # Movement track
        track = []
        # Current displacement
        current = 0
        # Threshold at which to start decelerating
        mid = distance * 4 / 5
        # Time step
        t = 0.2
        # Initial velocity
        v = 0
        while current < distance:
            if current < mid:
                # Acceleration of +2
                a = 2
            else:
                # Acceleration of -3
                a = -3
            # Initial velocity v0
            v0 = v
            # Current velocity v = v0 + at
            v = v0 + a * t
            # Distance moved x = v0t + 1/2 * a * t^2
            move = v0 * t + 1 / 2 * a * t * t
            # Current displacement
            current += move
            # Append to the track
            track.append(round(move))
        return track

    def move_to_gap(self, slider, track):
        """
        Drag the slider to the gap.
        :param slider: slider element
        :param track: track
        :return:
        """
        self.action_chains.click_and_hold(slider).perform()
        self.action_chains.pause(0.2)
        for x in track:
            self.action_chains.move_by_offset(xoffset=x, yoffset=0).perform()
        # time.sleep(1)
        self.action_chains.pause(0.6)
        self.action_chains.release().perform()

    def get_slider(self):
        """
        Get the slider.
        :return: slider element
        """
        slider = self.wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'geetest_slider_button')))
        return slider

    def crack(self):
        '''Run the verification.'''
        # 1. Capture the complete image
        image1 = self.get_image1('snap_full.png')
        # 2. Capture the image with the gap
        image2 = self.get_image2('snap.png')
        # 3. Compare the two images to get the slide distance
        distance = self.get_gap(image1, image2)
        # 4. Subtract the offset correction
        distance -= self.border
        print('distance:', distance)
        # 5. Get the slider element
        slider = self.get_slider()
        # 6. Simulate a human-like track
        # track = self.get_track(distance)
        offsets, track = easing.get_tracks(distance, 12, 'ease_out_expo')
        print('len(track):', len(track))
        # 7. Drag the slider
        self.move_to_gap(slider, track)
        time.sleep(5)
        # 8. Retry on failure
        # find_element_by_xpath was removed in Selenium 4.3; use find_element(By.XPATH, ...)
        try:
            geetest_class = self.driver.find_element(
                By.XPATH, "//div[@class='geetest_panel geetest_wind']/div[2]").get_attribute("class")
            if "geetest_panel_box" == geetest_class:
                self.driver.find_element(By.XPATH, "//div[@class='geetest_panel_error_content']").click()
                self.crack()
            elif "geetest_panelshowslide geetest_shake" in geetest_class:
                time.sleep(3)
                self.crack()
        except Exception as e:
            # Any exception, including the panel no longer being found, is treated as success
            print("------------login succeeded--------------")


if __name__ == '__main__':
    # import os
    # BasePath = os.path.dirname(os.path.abspath(__file__))
    # DRIVER_PATH = os.path.join(BasePath, 'conf/chromedriver')
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--start-maximized')  # start with a maximized window
    chrome_options.add_argument('--disable-gpu')  # Google's docs mention this flag as a workaround for a bug
    # Disable the automation extension, and keep the browser open after the script exits
    chrome_options.add_experimental_option('useAutomationExtension', False)
    chrome_options.add_experimental_option("detach", True)
    # # DRIVER_PATH is where chromedriver lives; change as needed
    # # driver = webdriver.Chrome(executable_path=DRIVER_PATH, options=chrome_options)
    driver = webdriver.Chrome(options=chrome_options)
    # Set the wait timeout
    wait = WebDriverWait(driver, 20)
    crack = AccessCode(driver)
    # # 1. Open the page (URL missing in the original post)
    # driver.get("...")
    # driver.maximize_window()  # maximize the window
    # # 2. Enter the user name; fill in username yourself
    # driver.find_element(By.XPATH, "//input[@id='mat-input-0']").send_keys('username')
    # # 3. Enter the password; fill in password yourself
    # driver.find_element(By.XPATH, "//input[@id='mat-input-1']").send_keys('password')
    # # 4. Click log in, which brings up the verification button
    # driver.find_element(By.XPATH,
    #     "//button[@class='mat-focus-indicator action-button ng-tns-c141-2 mat-flat-button mat-button-base mat-primary']").click()
    # # 5. Click the verification button
    # time.sleep(3)
    # # 6. Run the verification
    # crack.crack()
    # The login URL is truncated in the original post; supply the full address
    url = '...;isjump=0&lang=c&from_domain=i&url=http%3A%2F%2F...'
    driver.get(url)
    driver.maximize_window()
    time.sleep(2)
    # Log in
    driver.find_element(By.ID, 'loginname').send_keys('your username')
    driver.find_element(By.ID, 'password').send_keys('your password')
    driver.find_element(By.ID, 'isread_em').click()
    driver.find_element(By.ID, 'login_btn_withPwd').click()
    time.sleep(3)
    # 6. Run the verification
    crack.crack()

2)easing.py:

# Code from: (source URL missing in the original post)
import numpy as np


def ease_out_quad(x):
    return 1 - (1 - x) * (1 - x)


def ease_out_quart(x):
    return 1 - pow(1 - x, 4)


def ease_out_expo(x):
    if x == 1:
        return 1
    else:
        return 1 - pow(2, -10 * x)


def get_tracks(distance, seconds, ease_func):
    # Look up the easing function by name once, outside the loop
    ease = globals()[ease_func]
    tracks = [0]
    offsets = [0]
    for t in np.arange(0.0, seconds, 0.1):
        # Cumulative offset at time t, then the step relative to the previous offset
        offset = round(ease(t / seconds) * distance)
        tracks.append(offset - offsets[-1])
        offsets.append(offset)
    return offsets, tracks
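What matters for the drag is that the per-step offsets add up to the full distance and that the motion starts fast and slows down. The following is a small sketch of that property in plain Python, without numpy; `build_track` is a hypothetical helper, not part of easing.py, and unlike `get_tracks` it includes the endpoint x = 1 so the total is exact by construction.

```python
def ease_out_expo(x):
    """Exponential ease-out: fast at the start, flattening toward 1."""
    return 1.0 if x >= 1 else 1 - 2 ** (-10 * x)


def build_track(distance, steps, ease=ease_out_expo):
    """Return `steps` per-step x offsets that sum exactly to `distance`."""
    # Cumulative positions at evenly spaced points in [0, 1], endpoints included
    offsets = [round(ease(i / steps) * distance) for i in range(steps + 1)]
    # Differences between consecutive positions are the individual moves
    return [b - a for a, b in zip(offsets, offsets[1:])]


track = build_track(distance=120, steps=30)
print(sum(track))   # 120
print(track[:5])    # large steps at the start
print(track[-5:])   # small or zero steps at the end
```

Each value in `track` would be passed to `move_by_offset(xoffset=..., yoffset=0)` in turn, as `move_to_gap` does.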
