Software Manual - Data Extraction Software

The topic below is from the Visual Web Ripper manual.

Using Proxy Servers

When you visit a website with a web browser or a web scraping tool, such as Visual Web Ripper, the website owner can record your IP address and may be able to use this information to identify you or block your access to the website.

If you do not want a website owner to be able to identify you while you visit a website, you can use a proxy server to hide your IP address. When you use a proxy server, you do not visit the target website directly, but instead request that the proxy server visit the website for you.

There are many different types of proxy server, but Visual Web Ripper supports only HTTP proxy servers. It does not support other types of proxy servers, such as SOCKS proxies.
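To illustrate what routing a request through an HTTP proxy means in practice, here is a minimal Python sketch using only the standard library. It is not part of Visual Web Ripper; the function name build_proxy_opener and the address 203.0.113.10:8080 are assumptions made for illustration.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_opener(proxy_address, username=None, password=None):
    """Return an opener that sends HTTP and HTTPS requests through an HTTP proxy.

    proxy_address is "host:port". Credentials, if given, are percent-encoded
    and embedded in the proxy URL.
    """
    if username is not None:
        credentials = f"{quote(username, safe='')}:{quote(password or '', safe='')}@"
    else:
        credentials = ""
    proxy_url = f"http://{credentials}{proxy_address}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical proxy address; replace it with a proxy you are allowed to use.
opener = build_proxy_opener("203.0.113.10:8080")
# opener.open("http://example.com/", timeout=30) would then fetch the page
# through the proxy, so the target website sees the proxy's IP address.
```

The target website receives the request from the proxy server, which is why it records the proxy's IP address instead of yours.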

Get a Free Private Proxy Switch Account

Many companies specialize in providing proxy server access for a fee. Proxy servers are also freely available on the web, but they are often slow and unreliable.

We offer a FREE account at Private Proxy Switch for all our customers and trial users. The free account includes 500MB of traffic per month, and you can upgrade your account at any time if you need more traffic.

Private Proxy Switch is a high-performance proxy server with a large pool of IP addresses. An IP address is randomly assigned to you each time you request a new webpage. This hides your computer's IP address from the target website and makes it very difficult to detect and block your web scraping activity.

Private Proxy Switch rotates 20 proxies every 12 hours. If all 20 proxies get blocked, you will need to wait up to 12 hours before trying again. Please treat the proxies with care and set random page load delays whenever possible. Treat them well and they will treat you well.

Follow these steps to set up your free Private Proxy Switch account:

  1. Make sure you are running the latest version of Visual Web Ripper. Maintenance for your license must be active, or you must have an active trial license.
  2. Visit the webpage http://www.privateproxyswitch.com/UserAccount/CreateAccount.aspx
  3. Enter your serial key as the Voucher, and register your account.
  4. In Visual Web Ripper, choose the Proxy Switch option. See the section How to Configure Proxy Servers below.

How to Configure Proxy Servers

After you have purchased proxy server access or found freely available proxy servers on the web, you will receive one or more proxy server IP addresses and possibly a username and password to access the proxies. You need to enter this information into the data extraction project by opening the Project Options screen and selecting the Proxies tab.

This screen allows you to select the Proxy Source and Proxy Rotation settings. You can select one of the following proxy sources:

  • No proxies: The project will not use proxy servers.
  • Default proxies: Default proxies are proxies that apply to all projects on your computer. Click the button Configure Default Proxies to configure the default proxies.
  • Proxy list: A specific list of proxies is used for this project. These proxies are not shared with other projects. Click the button Add/Edit Proxy List to specify the proxy list.
  • Proxy configuration script: The project will use a proxy configuration script to get a proxy. Proxy configuration scripts are sometimes used in large corporations to specify a proxy that allows employees to access the Internet.
  • Proxy Switch: Create an account at www.privateproxyswitch.com to use this option. See the section Get a Free Private Proxy Switch Account above.

The Proxy rotation option specifies how to rotate between multiple proxies while extracting data.

  • Rotate per page load: The Rotation interval is the number of pages Visual Web Ripper will load before switching to the next proxy.
  • Rotate per minute: The Rotation interval is the number of minutes Visual Web Ripper will load pages before switching to the next proxy.
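The two rotation modes can be sketched in Python as follows. This is only an illustration of the idea under stated assumptions, not Visual Web Ripper's actual implementation; the class ProxyRotator, its mode names, and the injectable clock are hypothetical.

```python
import itertools
import time


class ProxyRotator:
    """Cycle through proxies per page load or per elapsed minutes (illustrative)."""

    def __init__(self, proxies, mode="page_load", interval=1, clock=time.monotonic):
        if mode not in ("page_load", "minute"):
            raise ValueError("mode must be 'page_load' or 'minute'")
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._mode = mode
        self._interval = interval
        self._clock = clock  # injectable so the time-based mode can be tested
        self._current = next(self._cycle)
        self._loads = 0
        self._started = clock()

    def proxy_for_next_page(self):
        """Return the proxy to use for the next page load."""
        if self._mode == "page_load":
            # Switch after `interval` pages have been loaded with this proxy.
            if self._loads >= self._interval:
                self._current = next(self._cycle)
                self._loads = 0
        elif self._clock() - self._started >= self._interval * 60:
            # Switch after `interval` minutes have passed with this proxy.
            self._current = next(self._cycle)
            self._started = self._clock()
        self._loads += 1
        return self._current


rotator = ProxyRotator(["proxy-a:8800", "proxy-b:8800"], mode="page_load", interval=2)
sequence = [rotator.proxy_for_next_page() for _ in range(5)]
```

With a rotation interval of 2 page loads and two proxies, the first two pages use the first proxy, the next two use the second, and the fifth page wraps around to the first proxy again.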

Proxy List

The Proxy List screen is used to specify a list of proxies and set the proxy verification options.

The Proxy Address & Port must be specified in the following format:

206.118.215.245:60099

In the example above, 206.118.215.245 is the IP address of the proxy server and 60099 is the port number.
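If you generate or check proxy lists with your own scripts, the address format can be validated along these lines. This is a minimal Python sketch; the function name parse_proxy_address is an assumption and is not part of Visual Web Ripper.

```python
def parse_proxy_address(text):
    """Split "host:port" into (host, port) and validate the port range."""
    host, separator, port_text = text.strip().rpartition(":")
    if not separator or not host:
        raise ValueError(f"expected 'address:port', got {text!r}")
    port = int(port_text)  # raises ValueError if the port is not a number
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port


host, port = parse_proxy_address("206.118.215.245:60099")
```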

Proxy Verification

Visual Web Ripper can automatically verify that a proxy is available before switching to the proxy. This allows a data extraction project to switch to the next proxy if a proxy is not available, and thereby avoid stopping the project prematurely just because a proxy is unavailable.

  • Verify proxy before use: Visual Web Ripper will verify a proxy before switching to the proxy.
  • Verify connectivity only: Visual Web Ripper will only verify that it can connect to the proxy, and will not try to retrieve a webpage with the proxy.
  • Verification timeout: The number of seconds Visual Web Ripper will wait for a successful proxy verification.
  • Skip unavailable proxies: If a proxy fails verification, it will be removed from the proxy list for the rest of a project run.
  • Remove skipped proxies: If a proxy is removed from the proxy list, it will be removed permanently. This option is only available for Default Proxies.
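A connectivity-only check with a timeout, combined with skipping unavailable proxies, could look roughly like the Python sketch below. It only illustrates the concept and assumes nothing about how Visual Web Ripper performs verification internally; the names can_connect and usable_proxies are hypothetical.

```python
import socket


def can_connect(proxy_address, timeout_seconds=5.0):
    """Return True if a TCP connection to "host:port" succeeds within the timeout.

    This checks connectivity only; it does not retrieve a webpage via the proxy.
    """
    host, _, port_text = proxy_address.rpartition(":")
    if not host:
        return False
    try:
        with socket.create_connection((host, int(port_text)), timeout=timeout_seconds):
            return True
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError a bad port.
        return False


def usable_proxies(proxies, verify=can_connect):
    """Return the proxies that pass verification, skipping the unavailable ones."""
    return [proxy for proxy in proxies if verify(proxy)]
```

Passing the verification function as a parameter makes it easy to swap in a stricter check, such as one that actually retrieves a webpage through the proxy.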

Importing Proxy Servers

If you are using a large number of proxy servers, it can be tedious to add them all to each project. You can use the Import Proxies button to import a list of proxies from a CSV file. The CSV file must have the following format:

Proxy Address, Username, Password

Username and Password are optional columns.

CSV Example 1:

proxy
173.244.220.185:8800
50.21.10.78:8800
134.22.166.242:8800

CSV Example 2:

proxy, username, password
173.244.220.185:8800, user1, pass1
50.21.10.78:8800, user1, pass1
134.22.166.242:8800, user1, pass1
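To check a CSV file before importing it, you could read it with a short script such as the Python sketch below. It assumes, as in both examples above, that the first row is a header row and that values may be followed by a space after each comma; the function name read_proxy_csv is hypothetical.

```python
import csv
import io


def read_proxy_csv(text):
    """Parse proxy CSV text into a list of dicts; username/password are optional."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [row for row in reader if row]  # ignore blank lines
    proxies = []
    for row in rows[1:]:  # assumption: the first row is a header
        cells = [cell.strip() for cell in row] + ["", ""]  # pad missing columns
        address, username, password = cells[:3]
        proxies.append({
            "address": address,
            "username": username or None,
            "password": password or None,
        })
    return proxies


example = """proxy, username, password
173.244.220.185:8800, user1, pass1
50.21.10.78:8800, user1, pass1
"""
parsed = read_proxy_csv(example)
```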

Using the Free TOR Proxy Network

TOR is a free network of proxies that you can use to hide your IP address. This network works well with web scraping, because it switches proxies automatically approximately every 15 minutes. A data extraction project can run for hours and use many different proxy servers automatically. Unfortunately, the TOR network is generally very slow and not suitable for professional use.

To use TOR with Visual Web Ripper, you need to install the TOR software bundle. You can find it at www.torproject.org. When you install the software bundle, make sure you choose to install TOR, Polipo and Vidalia.

Now you can connect to the TOR network by running Vidalia. After connecting to the TOR network, you can configure Visual Web Ripper to use the TOR network by using this proxy address:

localhost:8118

You can also configure Internet Explorer to use the TOR network by setting the proxy server to localhost:8118. If you have already configured Internet Explorer to use TOR, Visual Web Ripper will use the TOR network in WebBrowser mode automatically.