[Web擷取功能] CPATCHA 破解基礎方法

6月 23, 2021

##install##

#install tesseract-ocr

#python -m pip install Pillow

#python -m pip install pytesseract

#python -m pip install selenium

##import##

#from PIL import Image

#from pytesseract import image_to_string

#from selenium import webdriver

##create browser driver and save screenshot##
#browser = webdriver.Ie(driver_path)
#browser = webdriver.Chrome(driver_path)
#browser = webdriver.Firefox(driver_path)

#browser.get(url)

#browser.set_window_size(800, 600) # option

driver.save_screenshot(screenshot_filepath)

##get CAPTCHA image element##

#element = browser.find_element_by_xpath('//*[@id="form1"]/img')

location = element.location

size = element.size

##functions##

def get_captcha_text(location, size):

pytesseract.pytesseract.tesseract_cmd = pytesseract_path

img = Image.open(screenshot_filepath)

left = location['x']

top = location['y']

right = location['x'] + size['width']

bottom = location['y'] + size['height']

img = img.crop((left, top, right, bottom))

return image_to_string(img)

##reference##

https://github.com/VineetChaurasiya/scraping_scripts/blob/master/login_captcha_bypass.py

搜尋此網誌

CY's Python DNA

[Web擷取功能] CPATCHA 破解基礎方法

留言

張貼留言

這個網誌中的熱門文章

[環境設定] 應用程式執行別名

[變數] 使用全域變數

[例外處理] 發出例外加訊息