Latest Inquiries - Data Extraction Software

Grabbing perfume catalog data with product attributes from Pricegrabber.com

Submitted: 4/4/2013

Hello,

I attach my project.

Basically, I'd like to extract all the perfume data that is available on the target website under "Fragrance", which means at most 12 attribute values per product (they are the child elements of PerfumeDetailTab in my project).

I could rip the data I need, but I had a problem with ripping some attributes and values:

1) When I select the elements ProductMPN and ProductPackage, they turn green and seem to work correctly, but after I save them they turn yellow (and don't contain any data).

2) I can't correctly identify some other attributes, such as ProductModel, ProductGender, ProductSize, ProductDispenser, etc.

Sometimes values are missing, and that can cause shifting: values move into other attributes.
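
The shifting described above usually appears when values are assigned by position. Here is a minimal sketch in Python (not VWR project syntax) of the alternative: key each value by its label, so a missing attribute leaves a gap instead of moving its neighbours. The field list and the sample label/value pairs are hypothetical and are not taken from Pricegrabber.

```python
# Hypothetical attribute names, modelled on the ones listed in this thread.
FIELDS = ["MPN", "Size", "Dispenser", "Model", "Gender", "Package"]


def by_position(values):
    """Naive mapping: the n-th value goes to the n-th field."""
    return dict(zip(FIELDS, values))


def by_label(pairs):
    """Label-keyed mapping: each value goes to the field named by its label."""
    found = dict(pairs)
    return {field: found.get(field) for field in FIELDS}


# Made-up sample page where "Dispenser" and "Model" are absent.
pairs = [("MPN", "X-100"), ("Size", "3.4 oz"), ("Gender", "Women"), ("Package", "Box")]

shifted = by_position([value for _, value in pairs])
aligned = by_label(pairs)

print(shifted["Dispenser"])  # the Gender value has slid into the wrong field
print(aligned["Dispenser"])  # None: the gap stays where it belongs
print(aligned["Gender"])
```

In a scraping tool, the equivalent is to select each value relative to its label text rather than by its row index.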

So please either modify my project or create a new one that extracts the 12 Fragrance attributes correctly.

Thank you,

best regards,

Attila

Pricegrabber Perfumes.rip

Replied: 4/5/2013 6:52:54 PM
Have you tried using filters selection?

F.Y.I:

Replied: 4/4/2013 6:19:25 PM
I noticed that you're using the proxy switch. After I removed it, I could open the start URL properly.

Then I followed your templates, but the template 'PerfumeDetailTab' shows yellow (not found).

I tried setting the template as optional and then opened it; some of its elements are yellow, such as ProductModel, ProductGender, ProductSize, and ProductDispenser.

Please give me guidance on how to reach the 'PerfumeDetailTab' on the page. Where are those yellow elements? Thanks.
Replied: 4/6/2013 3:37:53 AM
Thank you, I've got it.

I was able to test 3 use cases:
multiple suppliers, there is a Product Details tab,
multiple suppliers, there's no PD tab,
1 supplier, there is a PD tab;
all worked properly.

Since I could extract only 100 elements in the trial, I wasn't able to check whether the NextButton Page Navigation element works well.
Have you checked it?

There is still one more problem: when I enabled the Proxy Switch's (free) proxies, much less data was extracted than when proxies were disabled,
which looks really bad to me!
To me that means VWR skips data extraction after a timeout and I lose data...
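
How VWR itself handles a proxy timeout is not documented in this thread, so the following is only a generic Python sketch of one way to avoid losing a page when a proxy stalls: try each proxy in turn, then fall back to a direct connection. The function `fetch_with_fallback`, the injected `fetch` callable, and the proxy addresses are all hypothetical.

```python
def fetch_with_fallback(url, proxies, fetch):
    """Try each proxy in turn, then a direct connection (proxy=None).

    `fetch(url, proxy)` is a caller-supplied function that returns the page
    text or raises TimeoutError / OSError on failure.
    """
    errors = []
    for proxy in list(proxies) + [None]:
        try:
            return fetch(url, proxy), proxy
        except (TimeoutError, OSError) as exc:
            errors.append((proxy, exc))
    raise RuntimeError(f"all attempts failed for {url}: {errors}")


# Stand-in for a real HTTP call: the first proxy times out, anything else works.
def fake_fetch(url, proxy):
    if proxy == "proxy-a:8080":
        raise TimeoutError("no response")
    return f"<html>{url}</html>"


page, used = fetch_with_fallback(
    "http://example.com/item", ["proxy-a:8080", "proxy-b:8080"], fake_fetch
)
print(used)
```

The point of the design is that a timeout triggers another attempt for the same URL instead of silently skipping the record.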

This built-in proxy feature is one of the 2 features that were very attractive to me (the other is scheduling) compared to my existing web data grabber.


Replied: 4/6/2013 2:04:01 AM
Well, VWR offered a free project...

I will buy VWR if it is able to handle and solve data extraction jobs like the one I described to you.
If it is not able to, I will not buy VWR and I will keep using my existing data extraction software...

I sent you my project just to help you and to make completing your project quicker.
So please understand that I will only learn VWR's advanced ripping techniques if I buy it.

I like your idea of providing a free project for trials; your conversion rate can be extremely high if you really complete these free projects...

Replied: 4/6/2013 5:53:03 PM
I understand what you expect.

To get the next-page navigation template working, you can temporarily capture a single link in the 'PerfumeDetailLinks' page area template using the XPath below (I've corrected the original one and made it more accurate, so that it matches the 'more info' link only):

//DIV[@id='product_results']//A[.='more info'][position(0,1,0)]

If you want to capture all the links on each page, just remove [position(0,1,0)].
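
For readers who want to check the selection logic outside VWR, here is a rough Python sketch using the standard library's `xml.etree.ElementTree`. The sample markup is hypothetical and much simpler than the real page. `[position(0,1,0)]` is VWR-specific syntax; the sketch assumes it restricts the match to one link and approximates that by taking the first match.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified results markup; the real page will differ.
SAMPLE = """
<html><body>
  <div id="product_results">
    <div><a href="/p/1">Perfume one</a> <a href="/p/1#details">more info</a></div>
    <div><a href="/p/2">Perfume two</a> <a href="/p/2#details">more info</a></div>
  </div>
  <div id="footer"><a href="/about">more info</a></div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Same idea as the XPath above, in the subset ElementTree supports.
# Tags are lowercase here because ElementTree is case-sensitive, and the
# [.='text'] predicate needs Python 3.7 or later.
XPATH = ".//div[@id='product_results']//a[.='more info']"

all_links = [a.get("href") for a in root.findall(XPATH)]  # every 'more info' link
first_link = root.find(XPATH).get("href")                 # one link only

print(all_links)
print(first_link)
```

Only the 'more info' links inside the results container are matched; the product-title links and the footer link are ignored.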

See the attached new project, in which I changed the XPath as described above.

F.Y.I:

The selection Xpath

We cannot guarantee that the free private proxy switch we provide will work with every website. It also depends on the website you're scraping: some websites do not return a response when accessed through proxies. You can try using other, paid proxies.
Pricegrabber Perfumes.rip

Replied: 4/6/2013 1:57:31 AM
Please check the attached new project.

I've set the template 'PerfumeDetailTab' to optional, then corrected the XPath selection of those elements using the filters technique mentioned before.

Pricegrabber Perfumes.rip

Replied: 4/6/2013 2:06:11 AM
Did you get the last attached project that I just modified?

I'm attaching the same one again.
Pricegrabber Perfumes.rip

Replied: 4/5/2013 2:34:42 AM
Hello,

Unfortunately, not every product page has a Product Details tab.
I attach 2 screenshots: the 1st shows where you find the tab, and the 2nd shows the product attributes I want to extract.
If there's no Product Details tab, we should find an alternative method to grab as many of these 12 attributes as possible:
Product Title
MPN
Description
Size
Dispenser
Model
Gender
Package
Strength
Manufacturer
Lowest Price
Supplier of lowest price
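
As an illustration of the "grab as many as possible" requirement, here is a small Python sketch (not VWR syntax) that always produces a record with the same 12 keys and treats the Product Details tab as optional. The snake_case key names, the `build_record` helper, the split between summary and details data, and the sample values are all assumptions made for the example.

```python
# The 12 attributes requested above; the snake_case keys are illustrative names.
ATTRIBUTES = [
    "title", "mpn", "description", "size", "dispenser", "model",
    "gender", "package", "strength", "manufacturer",
    "lowest_price", "lowest_price_supplier",
]


def build_record(summary, details=None):
    """Merge always-present summary data with the optional details tab.

    `details` is None when the page has no Product Details tab. Missing
    attributes stay None, so every record has the same 12 keys.
    """
    record = dict.fromkeys(ATTRIBUTES)
    for source in (summary, details or {}):
        for key, value in source.items():
            if key in record and value:
                record[key] = value
    return record


# Made-up sample values for a page with and without the details tab.
summary = {"title": "Sample perfume", "lowest_price": "19.99"}
with_tab = build_record(summary, {"size": "3.4 oz", "gender": "Women"})
without_tab = build_record(summary)

print(len(without_tab), without_tab["size"])
```

Keeping the keys fixed means a page without the tab yields empty fields rather than a shorter or misaligned row.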
The product page used:
http://health-beauty.pricegrabber.com/womens-perfume/Clinique-HAPPY-WOMEN-SPRAY-OZ/m4016988.html/no_sps=1#tab=details
Thank you,
Attila

Pricegrabber product details tab page 1.png
Pricegrabber product attributes page 2.png