Latest Inquiries - Data Extraction Software


Submitted: 1/19/2016


I am unable to scrape the images. I get the first 6, which are visible. The next 94 I get as 64bit code, and for the next 2000 I get nothing.

Also, in the browser the images only appear if I actively scroll down. How can I simulate this in VWR?

Replied: 1/20/2016 6:33:12 AM

I think those image URLs all follow the same pattern, like:

You only need to change the number at the end, e.g. 1.jpg, 2.jpg, etc., so you can try to generate those image URLs manually with a query or a content transformation.
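As a rough illustration of generating the URLs by number, here is a minimal Python sketch. The base URL, the `.jpg` extension, the count, and the function name `build_image_urls` are all hypothetical placeholders, not values from your project, and this runs outside of VWR.

```python
def build_image_urls(base, count, start=1, ext=".jpg"):
    """Return numbered image URLs such as <base>1.jpg, <base>2.jpg, ...

    `base` is a hypothetical URL prefix; replace it with the real
    pattern taken from one of the extracted image URLs.
    """
    return [f"{base}{n}{ext}" for n in range(start, start + count)]


if __name__ == "__main__":
    # Hypothetical prefix, used only to show the shape of the output.
    for url in build_image_urls("http://example.com/images/", 3):
        print(url)
```

This only works if the numbering really is continuous; if some numbers are missing on the site, the generated list will contain URLs that do not exist.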

However, the main cause is the XPath of the 'image_url' element: it selects only 99 elements. I've revised the XPath to select all of them, so all image URLs can now be extracted properly.

See the attached new project and all sample data.

Replied: 1/19/2016 2:52:09 PM

You created a 'scroll' link template, so you seem to have figured out how to simulate the 'scroll' action to load more items / images.

I've found that the real image URL is stored in the 'data-original' attribute of the <img> element. You only need to write content transformation regex scripts to build the full image URL from it.
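To show the idea outside of VWR, here is a minimal Python sketch that prefers 'data-original' over 'src' and turns relative paths into full URLs. It assumes the placeholder 'src' values are `data:` URIs and that relative paths resolve against the page URL; the class and function names and the example.com address are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LazyImageParser(HTMLParser):
    """Collect one real image URL per <img> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        # Lazy-loaded images keep the real URL in data-original;
        # fall back to src for images that are already loaded.
        raw = attr.get("data-original") or attr.get("src")
        # Skip inline placeholders (assumed here to be data: URIs).
        if raw and not raw.startswith("data:"):
            self.urls.append(urljoin(self.base_url, raw))


def extract_image_urls(html, base_url):
    """Return the full image URLs found in an HTML fragment."""
    parser = LazyImageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    sample = (
        '<img src="data:image/gif;base64,AAAA" data-original="/img/1.jpg">'
        '<img src="/img/2.jpg">'
    )
    print(extract_image_urls(sample, "http://example.com/list"))
```

Images that have neither a 'data-original' value nor a usable 'src' are skipped, so a shorter result list than expected points to items that were never loaded by scrolling.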

See the attached new project.

Replied: 1/19/2016 3:37:25 PM

We are a step further: I now get the first 100 images, but not all of the ca. 2300. The image URL stops after 100. All other data I get until the end.

Any idea?