Latest Inquiries - Data Extraction Software

Category extraction

Submitted: 1/23/2014
Hi,
I would like to request a demo project with the details below.

Target URL: http://buyt.in/search/electronics/mobile%20phones/mobile%20phones

The target URL is a category selection page of an e-commerce site.

I am facing a problem with Ajax. The page uses Ajax and loads tens of thousands of products onto the same page, so we can't rely on repeated Ajax because it will keep loading the page endlessly. I believe we need to either transform the page, use a page navigation template, or use some combination of the two. In either case, the requirement is that every new load happens in a new browser, so that we can create a PageArea template and the subsequent Content Elements can be ripped/downloaded. It should then go back, or load the new content in a new browser, and our PageArea template does the job again. This can be repeated until all the products are covered and there is no new load on the page.

The data required for each product is:
1) Product image URL (not the JPG image file, but the image URL).
2) Description / name of the product
3) Price
4) Product URL/link: This is again tricky and I need your help. When we click on any product (not on "See Detail", but on any other area of the product), it opens a Thank You page and then jumps to the actual product page on the original e-commerce site where the product is listed. The requirement is to capture the URL/link of this original product page. So maybe a Link template that opens each product, waits for the intermediary page to load, and then waits again for the jump to the final page. The Content Element should then capture the URL of this final page.
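
As a rough illustration of point 4), the idea of following the jump from the clicked link through the intermediary page to the final product page can be sketched in Python. This is not how Visual Web Ripper implements it; the function `resolve_final_url`, the `next_hop` callback, and the example URLs are all hypothetical, and the redirect lookup is simulated with a dictionary rather than a real browser.

```python
from typing import Callable, Optional


def resolve_final_url(start_url: str,
                      next_hop: Callable[[str], Optional[str]],
                      max_hops: int = 5) -> str:
    """Follow redirects from start_url until no further hop is reported.

    next_hop(url) returns the URL that `url` redirects to, or None when
    `url` is a final page. max_hops guards against endless redirect chains.
    """
    url = start_url
    seen = {url}
    for _ in range(max_hops):
        target = next_hop(url)
        if target is None or target in seen:  # final page, or a redirect loop
            break
        seen.add(target)
        url = target
    return url


# Hypothetical redirect table: click link -> Thank You page -> shop page.
HOPS = {
    "http://aggregator.example/click/123": "http://aggregator.example/thank-you?id=123",
    "http://aggregator.example/thank-you?id=123": "http://shop.example/product/123",
}

final_url = resolve_final_url("http://aggregator.example/click/123", HOPS.get)
print(final_url)
```

In a real scraper, `next_hop` would have to be backed by whatever the tool or browser reports after the page has finished redirecting; the sketch only shows the "keep following until the URL stops changing" logic.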

I hope I have spelled out the requirements correctly. Basically, I am facing 2 problems:
1) Handling Ajax
2) Capturing the URL or link of the page where the product is actually listed, which comes after 1 jump.

Please help and send a demo project. Let me know if you need more detail on the requirements.

Regards,
Mits




Replied: 1/24/2014 1:38:36 AM
See the attached new project. I've added a 'scroll' link template before the 'detail' link template; it will first keep scrolling the page down for as long as there are more products to load.

F.Y.I:

Content loading on scroll

Regarding the 'timeout' issue: you first need to activate the "Keep loading web page until manual stop" button in the toolbar, then open the 'detail' link template and wait until the link redirects to the final page and the 'productTitle' element, for which I've checked "wait for element", is found. Then you can deactivate the button. By default, the VWR editor stops loading the page without following a redirection.

Note: if the 'productTitle' element can't be found, it will also raise a timeout error at runtime. For this productTitle element, I've enabled a condition wait script that waits 15 seconds; you can increase or reduce the timeout depending on your internet connection. Therefore, even if the element can't be found, it will still resume and process the next products.
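
The "wait up to 15 seconds for the element, then carry on with the next product" behaviour described above can be sketched generically in Python. This is only an illustration of the idea, not the actual VWR condition wait script; `wait_for`, `process_products`, and the `element_found` callback are hypothetical names, and the demo uses a very short timeout so that it finishes quickly.

```python
import time
from typing import Callable, Iterable, List, Tuple


def wait_for(condition: Callable[[], bool],
             timeout: float = 15.0,
             interval: float = 0.5) -> bool:
    """Poll condition() until it is true or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # timed out: the caller decides what to do next
        time.sleep(min(interval, remaining))


def process_products(products: Iterable[str],
                     element_found: Callable[[str], bool],
                     timeout: float = 15.0,
                     interval: float = 0.5) -> Tuple[List[str], List[str]]:
    """Extract products whose element appears in time; skip the others."""
    extracted, skipped = [], []
    for product in products:
        if wait_for(lambda: element_found(product), timeout, interval):
            extracted.append(product)
        else:
            skipped.append(product)  # a timeout does not stop the whole run
    return extracted, skipped


# Demo: the element for product "b" never appears (simulated).
extracted, skipped = process_products(
    ["a", "b", "c"], lambda p: p != "b", timeout=0.05, interval=0.01)
print(extracted, skipped)
```

A longer timeout makes the run more tolerant of slow connections at the cost of a longer delay for every product whose element really is missing.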
Buyt.rip

Replied: 1/23/2014 11:46:05 PM
Please check the attached demo project.

You need to put the project file in the default projects folder, then run the project.
Buyt.rip

Replied: 1/23/2014 10:28:24 PM
Please attach a few screenshots to clarify point 4). I didn't get it: I cannot find the 'Thank You page' you mentioned, or the further jump to the actual product page.

It would be better if you gave me specific guidance on how to reach the fields you need; then I can start preparing the demo project for you. Thanks.

Replied: 1/24/2014 12:56:44 AM
Hi,

Thanks for the demo project.

However, I can see that the Link template goes and collects data, but there is no way to get data for the entire list of products displayed on the main URL. It is only picking up the first 20 products.

Please include some mechanism that takes care of the entire data set. Please note that AJAX is used to load the main URL, which has thousands of products. The current project is only picking up the first 20. We cannot use repeated AJAX because it will keep loading the main page for too long. Please let me know how to handle the AJAX calls that load the entire data set. Attached is a screenshot (AJAX loading more products.docx) showing that more products are being loaded at the bottom of the screen.

Also, it fails for me when I open the Link template. Attached is a screenshot (Link template error.docx).
Link template error.docx
AJAX loading more products.docx

Replied: 1/24/2014 3:36:14 AM
See the attached new project.

I've changed the 'scroll' template to the page navigation type instead of the link type. Therefore, each time the page is scrolled, it will iterate through all matched items via the 'detail' link template. The 'detail' link template has been configured with 'visit each page only once' = true in the Advanced tab > Action section, so it will ignore links that have already been visited.

The 'scroll' page navigation template can be set to a specific page count in the 'Navigation' tab. I hope this is what you needed.
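
To illustrate the combination of "iterate after every scroll" and "visit each page only once", here is a small Python sketch. It is an assumption about the general technique, not the tool's internals: `crawl_with_scroll`, the `snapshots` input (the list of product links visible after each scroll), and the link names are all hypothetical.

```python
from typing import Iterable, List, Optional


def crawl_with_scroll(snapshots: Iterable[List[str]],
                      max_pages: Optional[int] = None) -> List[str]:
    """Return links in the order they would be visited.

    Each snapshot holds every link visible after one scroll, so earlier
    links reappear; a visited set makes sure each one is opened only once.
    max_pages mimics a page count limit on the scroll navigation.
    """
    visited = set()
    visit_order = []
    for page_no, links in enumerate(snapshots, start=1):
        if max_pages is not None and page_no > max_pages:
            break
        for link in links:
            if link in visited:
                continue  # already handled after an earlier scroll
            visited.add(link)
            visit_order.append(link)
    return visit_order


# Simulated page: every scroll adds two more products to the same page.
snapshots = [
    ["p1", "p2"],
    ["p1", "p2", "p3", "p4"],
    ["p1", "p2", "p3", "p4", "p5", "p6"],
]
print(crawl_with_scroll(snapshots))
print(crawl_with_scroll(snapshots, max_pages=2))
```

Because extraction runs after each scroll rather than after the whole page has loaded, the run can be cut short at any page count and still keep everything collected so far.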
Buyt.rip

Replied: 1/23/2014 10:51:15 PM
Please find enclosed the screenshots for point 4).
Just click on any of the products listed on the main URL and you will see what I mean by page jumping. The attached Word file has screenshots of that. I hope this helps in understanding point 4).

I hope all my other points are OK.

Page Jumping screen shot.docx

Replied: 1/24/2014 1:35:45 AM
Forgot to add: along with the project rip file, please also send a sample data extraction in Excel, with some 200 products' data extracted using the project rip file that you send.

Replied: 1/24/2014 3:02:30 AM

Hi Simon,

Thanks for your quick help. 

Sorry to say, but we don’t need a full page scroll to happen first and data extraction afterwards. The reason is that there are thousands of products, and other categories like Women Clothing have some 50,000+ products. So if we use scroll and repeated AJAX first, the page will keep loading forever.

We need some mechanism where the first page loads and data extraction happens. Then the new Ajax data gets loaded in a new browser window, and we extract the data again. It's like pagination. So we would want some mechanism that transforms the Ajax-loaded page into pagination.

This way, data extraction will happen on each page load, and we can run the project for as long as we want. Say we want just 2000 of the 10000 products: we would manually stop the project partway through, at some 100 pages. We can then export the data from these 100 page loads (20 products per page x 100 pages = 2000 products).

I request your help in transforming the Ajax load of the page into a kind of pagination mechanism, where each load happens in a new browser window and we extract data each time before the next load happens.
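
The requested behaviour, treating every Ajax load as one "page" and extracting it before the next load is triggered, can be sketched with a Python generator. This is only a sketch under assumptions: `paginate`, `fake_load`, and the simulated `CATALOGUE` are hypothetical stand-ins for the real site and say nothing about how the scraping tool does it.

```python
from itertools import islice
from typing import Callable, Iterator, List


def paginate(load_batch: Callable[[int], List[str]]) -> Iterator[List[str]]:
    """Yield one batch per Ajax load; the next load is requested lazily.

    load_batch(offset) returns the products starting at `offset`, or an
    empty list when there is nothing more to load.
    """
    offset = 0
    while True:
        batch = load_batch(offset)
        if not batch:
            return  # no new load on the page: stop
        yield batch  # the caller extracts this batch before the next load
        offset += len(batch)


# Simulated catalogue of 10000 products, served 20 at a time.
CATALOGUE = [f"product-{i}" for i in range(10000)]


def fake_load(offset: int, size: int = 20) -> List[str]:
    return CATALOGUE[offset:offset + size]


# Stop after 100 page loads instead of loading the whole category.
extracted = []
for batch in islice(paginate(fake_load), 100):
    extracted.extend(batch)

print(len(extracted))  # 2000
```

Since the generator only asks for the next batch when the loop comes back for it, stopping after 100 loads means the remaining 8000 products are never requested.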

I hope I have made the requirement and the reason for it simple to understand.

Thanks for your help.