Latest Inquiries - Data Extraction Software

Images

Submitted: 1/19/2016

Hello

I am unable to scrape the images. I get the first 6, which are visible. The next 94 I get as 64bit code, and for the next 2000 I get nothing.

Also, in the browser the images only appear if I actively scroll down. How can I simulate this in VWR?



Spielzeugauktion.de.64BIT.rip

Replied: 1/20/2016 6:33:12 AM

I think those image URLs all follow the same pattern, like:

http://www3.spielzeugauktion.de/sp4/site/catalogs/10001/img/list/1.jpg

You only need to change the number at the end (e.g. 1.jpg, 2.jpg, etc.), so you can try to generate those image URLs manually with a query or a content transformation.
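As a rough illustration of generating the URLs from the number alone, here is a minimal Python sketch, run outside the scraping project. It assumes the numbering starts at 1 and has no gaps, which the thread does not confirm; the helper name `image_urls` and the count passed to it are made up for the example.

```python
# Base path taken from the sample URL above; only the trailing number changes.
BASE = "http://www3.spielzeugauktion.de/sp4/site/catalogs/10001/img/list/"


def image_urls(count, base=BASE):
    """Return URLs for 1.jpg .. count.jpg (assumes contiguous numbering from 1)."""
    return [f"{base}{n}.jpg" for n in range(1, count + 1)]


# Hypothetical usage: build the first three URLs.
for url in image_urls(3):
    print(url)
```

If the catalog skips numbers, some of the generated URLs will not exist, so it is worth checking a few of them before relying on the list.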

However, the main cause is the XPath of the 'image_url' element: it selects only 99 elements. I've revised the XPath to select all of them, so all image URLs can now be extracted properly.

See the attached new project and all sample data.

Spielzeugauktion.de.64BIT.rip
Spielzeugauktion.de.64BIT.csv

Replied: 1/19/2016 2:52:09 PM

You created a 'scroll' link template, so you seem to have figured out how to simulate the 'scroll' action to load more items/images.

I've found that the real image URL is stored in the 'data-original' attribute of the <img> element. You only need to add content transformation regex scripts to build the full image URL from it.
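To show the idea outside the tool, here is a minimal Python sketch that reads 'data-original' from each <img> tag and resolves it against the site address. It uses the standard library HTML parser instead of a regex, and it is not the content transformation script from the attached project. The class and function names, the sample markup, and the assumption that the attribute may hold a relative path are all illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DataOriginalCollector(HTMLParser):
    """Collect the 'data-original' value of every <img> tag as a full URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        value = dict(attrs).get("data-original")
        if value:
            # urljoin leaves absolute URLs unchanged and resolves relative ones.
            self.urls.append(urljoin(self.base_url, value))


def extract_image_urls(html, base_url):
    parser = DataOriginalCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical markup: a placeholder in 'src', the real path in 'data-original'.
sample = '<img src="data:image/gif;base64,R0lG" data-original="/sp4/site/catalogs/10001/img/list/7.jpg">'
print(extract_image_urls(sample, "http://www3.spielzeugauktion.de/"))
```

Reading 'data-original' rather than 'src' avoids picking up the placeholder that a lazy-loading page shows before the user scrolls.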

See the attached new project.

Spielzeugauktion.de.64BIT.rip

Replied: 1/19/2016 3:37:25 PM

We are a step further: now I get the first 100 images, but not all of the approx. 2300. The image URL stops after 100... All other data I get until the end.

Any idea?