Latest Inquiries - Data Extraction Software

cidadexl

Submitted: 6/10/2012

The landing page has a search link at the top of the screen.

Clicking on this leads to a search page with criteria selection. URL: http://cidadexl.com/en/pesquisa.htm

I want to collect all records, so clicking the search button directly on this screen opens all the results here:

http://cidadexl.com/en/listagem.htm?nat=&typ=&bus=&stt=&ORD=1&ORC=14&twn=&ngh=&zon=&LPR=0&MPR=&LAR=0&MAR=&ref=&enviar=+Search+
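For reference, the query string of that listing URL can be inspected with Python's standard library. This is only a sketch to show which criteria are left blank in the "search everything" request; what each parameter means on the site is not confirmed here, and the names `LISTING_URL` and `query_params` are made up for the example.

```python
from urllib.parse import urlsplit, parse_qsl

# The listing URL quoted above, split over two lines for readability.
LISTING_URL = (
    "http://cidadexl.com/en/listagem.htm?nat=&typ=&bus=&stt=&ORD=1&ORC=14"
    "&twn=&ngh=&zon=&LPR=0&MPR=&LAR=0&MAR=&ref=&enviar=+Search+"
)

def query_params(url):
    """Return the query-string parameters as a dict, keeping blank values."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))

params = query_params(LISTING_URL)

# Parameters sent with no value, i.e. criteria that were not filled in.
empty_filters = sorted(name for name, value in params.items() if value == "")
print(empty_filters)
```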

I want to collect the data for all of these houses, including those on all subsequent pages.

Clicking on the first property title, "monor house", leads me to this page:

http://cidadexl.com/en/detalhe.htm?RID=4842104

I then want to collect all the data on this page:
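Outside the tool, the crawl described here (read one listing page, note its property links, move to the next page, repeat) can be sketched in Python. This is an illustration only, not how Visual Web Ripper works internally: `fetch_listing` is a hypothetical callable that returns the detail links found on a listing page plus the URL of the next page (or `None` on the last page), and the fake site below stands in for real HTTP requests.

```python
def collect_detail_urls(first_url, fetch_listing, max_pages=1000):
    """Follow 'next' links from first_url and gather every detail-page URL.

    fetch_listing(url) must return (detail_urls, next_url_or_None).
    """
    detail_urls = []
    visited = set()
    url = first_url
    # Stop on the last page, on a repeated page (loop guard), or at max_pages.
    while url is not None and url not in visited and len(visited) < max_pages:
        visited.add(url)
        links, next_url = fetch_listing(url)
        detail_urls.extend(links)
        url = next_url
    return detail_urls

# Hypothetical two-page site used instead of real network calls.
FAKE_SITE = {
    "page1": (["detail-a", "detail-b"], "page2"),
    "page2": (["detail-c"], None),
}

result = collect_detail_urls("page1", FAKE_SITE.__getitem__)
print(result)
```

The `visited` set is there so that a "next" link pointing back at an earlier page cannot make the crawl run forever.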

title, reference, county, town, business, status, gross area, area, land area, price, description and the first 10 photos.

I tried the project myself and only managed to get the first property, and then the software seemed to hang. I waited 3 hours just in case it was my internet connection (very slow, and I was running the project with debugging), but it didn't get past property 1. I got no photos, and the town and county fields were blank. The software does not separate the label "county" from the content "viseu", so both are highlighted and nothing is collected. I can send you my rip file so you can see my poor attempt.

If we can resolve the issues I will buy this product. I would also like to know the approximate cost for you to write this project, as we will need many similar projects and much of that data will be presented in a similar way to cidadexl.

Cidadexl.rip

Replied: 6/10/2012 3:43:53 AM
I suggest you submit a project quote request to find out how much the specific project you want would cost.

F.Y.I
After you have installed the Visual Web Ripper software, you can request a quote directly from within the software by selecting the menu Help -> Request Project Quote

Request a Project Quote
Replied: 6/16/2012 2:59:35 AM
In the imagelist PageArea template, the Count option on the List tab is set to 10, so it will iterate over at most 10 image elements.
See the attached project.
Cidadexl.rip
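
If the image list is post-processed outside the software, the same "first N only" limit is easy to apply in Python. This is just a sketch of the idea behind the Count setting, not the tool's own code; `first_images` and the example URLs are made up.

```python
from itertools import islice

def first_images(image_urls, count=10):
    """Return at most `count` image URLs, keeping their original order."""
    return list(islice(image_urls, count))

# Hypothetical property with 25 photos.
all_photos = [f"http://example.com/photo_{n}.jpg" for n in range(1, 26)]
kept = first_images(all_photos)
print(len(kept))
```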

Replied: 6/16/2012 1:48:41 AM
In the rip file you sent me, all the images for the property are collected. I only need the first 10-12. Is it possible to limit the number collected? This would reduce the amount of time harvesting the data, uploading to our web server and also processing on our web server.

Also, please tell me whether it is possible to run this software on a remote server, so that the data can be collected directly onto our web hosting server.

Thank you

Replied: 6/10/2012 3:40:22 AM
I corrected some issues, as follows:

1) This project can be run in WebCrawler agent mode, which will improve performance.
2) A content transformation is now applied to the URL of each image, so you will be able to download the larger image.
3) I added a 'next' page navigation template to iterate over all properties across the paginated results.
4) I corrected the XPath for all of the elements on the property page, and most elements now use a content transformation with Regex, so you can get the correct content for fields such as county, town, etc.
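
To give a rough idea of what a Regex content transformation like the one in point 4 does, here is a small Python sketch that strips a field label from the captured text. It assumes the element text looks something like "County: Viseu" (the exact markup on the site is not shown in this thread), and `strip_label` is a hypothetical helper, not a Visual Web Ripper function.

```python
import re

def strip_label(text, label):
    """Remove a leading field label (and optional colon) from text.

    Returns the remaining value, or None if the text does not start
    with the given label.
    """
    pattern = r"^\s*" + re.escape(label) + r"\s*:?\s*(.*?)\s*$"
    match = re.match(pattern, text, flags=re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None

# Assumed sample text; the real page content may be formatted differently.
print(strip_label("County: Viseu", "county"))
```

Returning `None` when the label is missing makes it obvious that the wrong element was selected, instead of silently saving the wrong text.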

Please see the attached project.
Cidadexl.rip