Visual Web Ripper supports both semi-automatic and full-automatic data extraction from websites that use CAPTCHA protection. Full-automatic data extraction requires an account with a third-party CAPTCHA recognition service, which charges a fee for each CAPTCHA image. Semi-automatic data extraction is free, but requires you to manually decode CAPTCHA images while running a data extraction project.
Sometimes the easiest way to handle CAPTCHA-protected websites is to use a list of proxy servers. This is especially true when CAPTCHA pages are displayed randomly after you have browsed the website for a while. Proxy servers will not help if you always need to pass a CAPTCHA page in order to enter a section of a website.
To configure your data extraction project for semi-automatic CAPTCHA processing, you need to do the following:
When Visual Web Ripper encounters a CAPTCHA element, it will display the CAPTCHA image and request the CAPTCHA code.
Full-automatic CAPTCHA processing requires an account with a third-party CAPTCHA recognition service. The recognition service must provide a .NET API, and you must create a Visual Web Ripper script that uses this API to call the service.
Visual Web Ripper includes the API and standard script to call the following CAPTCHA recognition service.
This CAPTCHA recognition service currently charges US$1.39 per 1000 CAPTCHAs. We are not affiliated with this company and therefore don't charge any additional fees for this service.
To configure your data extraction project for full-automatic CAPTCHA processing, you need to do the following:
A decode CAPTCHA script is used to call a CAPTCHA recognition service. The script receives the CAPTCHA image as an input parameter and should return the decoded CAPTCHA value in string format.
You can add a decode CAPTCHA script to a FormField element by clicking the Decode CAPTCHA script option button in Advanced Options.
The script editor opens after you click the Decode CAPTCHA script button.
The default decode CAPTCHA script is designed to work with the www.deathbycaptcha.com service. If you use this service, you only need to add your login name and password.
Visual Web Ripper also has built-in support for bypasscaptcha.com. If you use this CAPTCHA service, you can use the following code.
string captcha = BypassCaptchaService.DecodeCaptcha(args.ImagePath, "key");
A decode CAPTCHA script can be written in C# or VB.NET.
A decode CAPTCHA script must have one method as shown below.
public static string DecodeCaptcha(WrDecodeCaptchaArguments args)
The script method DecodeCaptcha must have this exact name and signature, so change only the method body, not the method signature. The method must return the decoded CAPTCHA value.
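As a rough illustration of how a method body might be filled in, here is a minimal sketch. It assumes the method returns the decoded value as a string, as the text above describes. The `WrDecodeCaptchaArguments` class shown here is only a stand-in so the sketch compiles outside the script editor, and `FakeCaptchaService` is a hypothetical placeholder for whichever recognition service API you call. Returning an empty string on failure is also an assumption, not documented Visual Web Ripper behavior.

```csharp
using System;

// Stand-in for the Visual Web Ripper argument type, so this sketch compiles
// outside the script editor. Only ImagePath is modeled here.
public class WrDecodeCaptchaArguments
{
    public string ImagePath { get; set; }
}

// Hypothetical placeholder for a recognition service client. In a real
// script this would be a call into your service's .NET API.
public static class FakeCaptchaService
{
    public static string Decode(string imagePath, string key)
    {
        return "ABC123"; // canned answer, for illustration only
    }
}

public static class CaptchaScript
{
    // Assumed return type: string (the decoded CAPTCHA value).
    public static string DecodeCaptcha(WrDecodeCaptchaArguments args)
    {
        // Nothing to decode if no image path was supplied.
        if (string.IsNullOrEmpty(args.ImagePath))
        {
            return string.Empty; // assumption: empty string means "not decoded"
        }

        try
        {
            // Pass the image path and an account key to the service.
            string captcha = FakeCaptchaService.Decode(args.ImagePath, "key");
            return captcha ?? string.Empty;
        }
        catch (Exception)
        {
            // A service error should not crash the script; report no result.
            return string.Empty;
        }
    }
}
```

Inside Visual Web Ripper, the stand-in classes would be dropped and the service call replaced with the real API call, leaving only the method itself.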
|Property|Type|Description|
|---|---|---|
|ImagePath|String|The CAPTCHA image path.|
|Project|WrProject|The current Visual Web Ripper project.|
|DestinationDataSource|WrDataSource|Destination data source configuration.|
|InputDataSource|WrInputDataSource|Input data source configuration.|
|StartTemplate|WrTemplate|The first template in the project.|
An open database connection.
Input parameters for the current project.