Latest Inquiries - Data Extraction Software

Flyers scraping

Submitted: 5/3/2016

The ultimate goal is to get each flyer's location, vendor, valid dates, and links to its page images (one link per page image). After landing on the starting page, we should be able to select a city, then a vendor, then a flyer, and fetch the location, vendor, and flyer information as well as the image link for each flyer page (ideally we should also be able to download the images). The fetched data can be stored in a .csv file for now; in the future it is meant to go into MySQL.
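To make the expected output concrete, here is a minimal sketch of one possible .csv layout, with one row per flyer page image. The column names, the `Flyer` class, and the sample dates and URLs are illustrative assumptions, not something taken from the target site.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Flyer:
    # All field names here are assumptions chosen for illustration.
    city: str
    vendor: str
    valid_from: str
    valid_to: str
    page_image_urls: list = field(default_factory=list)


FIELDS = ["city", "vendor", "valid_from", "valid_to", "page_number", "page_image_url"]


def flyers_to_csv(flyers):
    """Return CSV text with a header row and one row per flyer page image."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FIELDS)
    for flyer in flyers:
        for page_number, url in enumerate(flyer.page_image_urls, start=1):
            writer.writerow(
                [flyer.city, flyer.vendor, flyer.valid_from, flyer.valid_to, page_number, url]
            )
    return buf.getvalue()


# Placeholder data only; the dates and URLs are made up.
sample = Flyer(
    city="Hamilton",
    vendor="2001 Audio Video",
    valid_from="2016-05-01",
    valid_to="2016-05-07",
    page_image_urls=["http://example.invalid/p1.jpg", "http://example.invalid/p2.jpg"],
)
csv_text = flyers_to_csv([sample])
```

Keeping one row per page image means the same flat layout could later map onto a single MySQL table, or be split into a flyers table and a pages table.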

Please let me know if you have any questions.


Thank you,

Boris

Replied: 5/4/2016 12:59:19 AM
Could you please give me specific guidance on how to reach the city, vendor, flyer, and location information on the target website? If possible, please attach a few screenshots to explain further. Then I will start preparing the demo project for you. Thanks.
Replied: 5/4/2016 1:48:29 PM

Hello,

1. Go to link http://www.yellowpages.ca/flyers/

2. Click the Show All link on the right side (close to the bottom, Screenshot (1).png). A list of cities grouped by province will appear.

3. Click on any of the city links (e.g. Ontario, Hamilton, Screenshot (2).png). Flyer thumbnails will appear.

4. In the section Most Recent Flyers by Store, click on any thumbnail (e.g. 2001 Audio Video, Screenshot (3).png). The selected flyer's thumbnail will appear. We need to capture the vendor logo above the thumbnail and the valid dates below the thumbnail.

5. Click on the thumbnail. The flyer view will appear (Screenshot (4).png). We need to capture the images for every page.

Please let me know if you need further clarification.
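The steps above amount to a nested walk: city, then flyer, then page images. A rough sketch of that traversal follows. It runs against a hypothetical in-memory stand-in (`FAKE_SITE`) rather than the real pages, and every function name and data value in it is an assumption made for illustration.

```python
# Hypothetical stand-in for the site: city name -> list of flyers.
# The valid dates and image names are placeholders, not real data.
FAKE_SITE = {
    "Hamilton": [
        {
            "vendor": "2001 Audio Video",
            "valid_dates": "placeholder dates",
            "page_images": ["page-1.jpg", "page-2.jpg"],
        },
    ],
}


def list_cities(site):
    """Steps 2-3: the cities offered after clicking Show All."""
    return sorted(site)


def list_flyers(site, city):
    """Steps 3-4: the flyers shown for the chosen city."""
    return site[city]


def list_page_images(flyer):
    """Step 5: the image for every page in the flyer view."""
    return list(flyer["page_images"])


def walk_flyers(site):
    """Yield one record per flyer, visiting city -> flyer -> pages."""
    for city in list_cities(site):
        for flyer in list_flyers(site, city):
            yield {
                "city": city,
                "vendor": flyer["vendor"],
                "valid_dates": flyer["valid_dates"],
                "page_images": list_page_images(flyer),
            }


records = list(walk_flyers(FAKE_SITE))
```

In a real run, the three `list_*` helpers would be replaced by whatever fetches and parses the corresponding pages.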


Screenshot (4).png
Screenshot (1).png
Screenshot (3).png
Screenshot (2).png

Replied: 5/5/2016 2:08:38 AM

Please check the attached demo project and sample data. You will need to put the project file in the default projects folder, then run the rip project in the VWR editor.

I've set up a 'group' template that contains a page navigation template to scroll the page down at most 10 times (i.e., Page count = 10). You can try increasing the value to load as many flyers as possible, but it will eventually hit an 'out of memory' limit caused by an internal bug in the web browser that we cannot fix.
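The idea of capping the number of scrolls can be sketched outside the tool as a bounded loop. This is only an illustration of the technique, not how the 'group' template is implemented; `scroll_and_collect`, `make_fake_loader`, and the batch sizes are hypothetical.

```python
def scroll_and_collect(load_next_batch, page_count=10):
    """Call load_next_batch at most page_count times, stopping early if it returns nothing."""
    collected = []
    for _ in range(page_count):
        batch = load_next_batch()
        if not batch:
            break  # nothing more to load, so further scrolling is pointless
        collected.extend(batch)
    return collected


def make_fake_loader(total_items, batch_size):
    """Build a stand-in for 'scroll down once and return the newly loaded flyers'."""
    state = {"next": 0}

    def load_next_batch():
        start = state["next"]
        end = min(start + batch_size, total_items)
        state["next"] = end
        return [f"flyer-{i}" for i in range(start, end)]

    return load_next_batch


# With 100 flyers available and 5 loaded per scroll, a cap of 10 scrolls yields 50.
capped = scroll_and_collect(make_fake_loader(100, 5), page_count=10)
# With only 12 flyers available, the loop stops early once the loader runs dry.
exhausted = scroll_and_collect(make_fake_loader(12, 5), page_count=10)
```

Raising `page_count` loads more items at the cost of more memory, which mirrors the trade-off described above.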

There is another template, 'group2', that preloads the thumb images and then extracts all of them properly. The 'group2' template also contains a page navigation template; it has condition scripts set up in 'total_img_num' that check whether all thumb images have loaded completely and, if so, cancel page navigation.
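The stop condition described for 'group2' can be illustrated with a loop that keeps loading until the number of loaded thumb images reaches the expected total. This is a sketch of the general technique only; apart from the name `total_img_num`, which is borrowed from the project, the functions and the `max_rounds` safety cap are assumptions.

```python
def preload_thumbnails(load_more, total_img_num, max_rounds=100):
    """Keep loading until total_img_num images are present, then stop navigating."""
    loaded = []
    for _ in range(max_rounds):
        if len(loaded) >= total_img_num:
            break  # all thumb images are loaded: cancel further navigation
        new_images = load_more(len(loaded))
        if not new_images:
            break  # the page has nothing more to offer
        loaded.extend(new_images)
    return loaded[:total_img_num]


def fake_load_more(already_loaded):
    """Stand-in for one navigation step that reveals four more thumb images."""
    return [f"thumb-{i}.jpg" for i in range(already_loaded, already_loaded + 4)]


thumbs = preload_thumbnails(fake_load_more, total_img_num=10)
```

The `max_rounds` cap is a guard against looping forever if the expected total is never reached.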

Yellowpages_flyers.xls
Yellowpages_flyers.rip