Latest Inquiries - Data Extraction Software

Whitepages Demo

Submitted: 4/9/2012

Hello, I would like to extract contacts from this online directory.  The issue I face is that only 100 contacts are displayed per search, so I would like to create a script that can run the searches for me automatically.  I've observed that the URL format for this site is:

http://www.whitepages.com/name/Sma-Sme/11001?page=2

Sma = firstname
Sme = lastname
11001 = zipcode
2 = page number
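
To make the pattern above concrete, here is a minimal Python sketch that assembles one URL from its four parts. The function name `build_url` and the default of `page=1` are assumptions for illustration; the post only shows a URL with `page=2`.

```python
from urllib.parse import quote

# Base of the URL pattern observed in the post.
BASE = "http://www.whitepages.com/name"


def build_url(firstname, lastname, zipcode, page=1):
    """Build a search URL following the firstname-lastname/zipcode?page=N pattern.

    build_url is a hypothetical helper, not part of any tool mentioned here.
    """
    # quote() percent-encodes characters that are not safe in a URL path.
    path = f"{quote(firstname)}-{quote(lastname)}/{quote(str(zipcode))}"
    return f"{BASE}/{path}?page={page}"


print(build_url("Sma", "Sme", "11001", 2))
# http://www.whitepages.com/name/Sma-Sme/11001?page=2
```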

I already have a list of search terms for firstname, lastname and zipcode.  I've imported them into a MySQL database.  However, I cannot figure out how to read the data from the MySQL tables to create the needed start URLs.
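
Outside of any particular scraping tool, the step described here (read rows from a database, emit one start URL per row) could be sketched in Python as below. This is only an illustration under stated assumptions: the table name `search_terms` and its three columns are hypothetical, and an in-memory SQLite database stands in for MySQL so the sketch runs on its own. With MySQL, the connection object would come from a MySQL driver instead; the query and the loop would stay the same.

```python
import sqlite3

URL_TEMPLATE = "http://www.whitepages.com/name/{first}-{last}/{zip}?page={page}"


def start_urls(conn, page=1):
    """Yield one start URL per row of the (hypothetical) search_terms table."""
    cur = conn.cursor()
    cur.execute("SELECT firstname, lastname, zipcode FROM search_terms")
    for first, last, zipcode in cur:
        yield URL_TEMPLATE.format(first=first, last=last, zip=zipcode, page=page)


# Stand-in database: SQLite in memory instead of MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_terms (firstname TEXT, lastname TEXT, zipcode TEXT)"
)
conn.executemany(
    "INSERT INTO search_terms VALUES (?, ?, ?)",
    [("Sma", "Sme", "11001"), ("Smi", "Smo", "11002")],
)

urls = list(start_urls(conn))
for url in urls:
    print(url)
```

The zip code is stored as text in this sketch so that leading zeros are not lost.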

Please help.

Replied: 4/10/2012 11:24:58 PM
Maybe the CSV format is incorrect at the line for link 456976?
Please attach your CSV file and the project file you used, so we can run a test to figure out what happened.

Also, you can cut down the CSV file so that the original line 456976 becomes the first line, and see whether VWR is still stuck at the first line.
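
Cutting the CSV file so that a given line becomes the first line can be done by hand, or with a short script for a large file. The Python sketch below shows one way; the helper name `trim_csv` is hypothetical, and it assumes the file has no header row and no quoted fields that span several lines (otherwise row numbers and line numbers differ).

```python
import csv
import io
from itertools import islice


def trim_csv(src, dst, first_line):
    """Copy rows from src to dst, starting at 1-based row number first_line.

    Returns the number of rows copied. src and dst are open text files
    (or any file-like objects).
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    copied = 0
    # islice skips the first (first_line - 1) rows without loading the file.
    for row in islice(reader, first_line - 1, None):
        writer.writerow(row)
        copied += 1
    return copied


# Small in-memory demo; with real files, pass open("in.csv", newline="")
# and open("out.csv", "w", newline="") instead (file names are placeholders).
sample = io.StringIO("aa,ab\nac,ad\nae,af\n")
out = io.StringIO()
copied = trim_csv(sample, out, first_line=2)
print(copied)          # 2
print(out.getvalue())  # rows "ac,ad" and "ae,af"
```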

Replied: 4/10/2012 8:25:57 PM
Thanks, Simon.  I tested the .rip file you gave and it worked.  Then, I took out the zip code and added all the letter combinations for firstname and lastname in the .csv file.  When I tried to run the campaign, it got stuck, saying:

10:37 Processing 456976 start URLs
10:37 Processing 456976 input data rows

and I left it running overnight, 14:00 hrs.  I think this might be too much for webripper.
Anyway, I created form fields and a form submit instead, and it processed the list.
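
For scale: 456976 equals 26^4, which is consistent with pairing every two-letter prefix for firstname with every two-letter prefix for lastname. That reading is an inference from the number, not something stated in the post. The Python sketch below reproduces the count and splits the combinations into smaller batches. The batch size of 10000 is arbitrary, and whether smaller input files avoid the stall is untested; they might at least make it easier to see where a run stops.

```python
from itertools import islice, product
from string import ascii_lowercase


def prefix_pairs(length=2):
    """Yield every (firstname prefix, lastname prefix) pair of the given length."""
    prefixes = ["".join(p) for p in product(ascii_lowercase, repeat=length)]
    return product(prefixes, prefixes)


def batches(iterable, size):
    """Yield successive lists of at most `size` items from iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


total = sum(1 for _ in prefix_pairs())
print(total)  # 456976, i.e. 26 ** 4

first_batch = next(batches(prefix_pairs(), 10000))
print(len(first_batch), first_batch[0])  # 10000 ('aa', 'aa')
```
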
Replied: 4/9/2012 11:52:54 PM
Please put the attached demo project file and input CSV file in the default Visual Web Ripper projects folder; then you can run this project to extract data.

You will need to change the input CSV file that serves as the input source for this project. Open the CSV file in Excel, and you can add or change each line.
If you prefer to use a MySQL database as the input source, that is also easy; you can refer to our online manual, as below:

Using an input data source

Moreover, the input source fields are used as parameters in the template URL. To see what I mean, go to Project > Options > Advanced > Start Url section > Add condition script (C#); there you will see the code that reads the input fields (firstname, name, zipcode) and then populates the full URL as the starting URL.


Best regards.
Simon


Whitepages.csv
Whitepages.rip