Latest Inquiries - Data Extraction Software

Lost access to VWR demo project thread

Submitted: 12/28/2015

Hello, I was going to answer your reply in my demo project thread, but I lost access after buying the software because I didn't back up the free serial key, so I can no longer log in to the demo project thread. The thread was called "MercadoPublico AIO", email address "benjamin.yon@gmail.com".

Regards

Benjamin

Replied: 12/30/2015 11:58:30 AM

I think I understand now. The XPath of the 'Providers area' page area template needs to be revised so that it selects only the items visible on the current page; otherwise it keeps scraping the same first page:

//DIV[starts-with(@id,'pagina') and not(contains(@style, 'display: none'))]/DIV[@class='accordion-group']/DIV[1]
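
To illustrate what this XPath is meant to select, here is a minimal Python sketch that applies the same filter by hand to a tiny, made-up fragment. The markup, the provider names and the function name `visible_provider_headers` are assumptions for illustration only; the real page structure may differ. Python's standard `xml.etree.ElementTree` does not support `starts-with()` or `contains()`, so the two predicates are written as plain string checks.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two "pagina" containers, only one visible.
SAMPLE = """
<DIV id="root">
  <DIV id="pagina1" style="display: none">
    <DIV class="accordion-group"><DIV>Provider A</DIV><DIV>details</DIV></DIV>
  </DIV>
  <DIV id="pagina2" style="">
    <DIV class="accordion-group"><DIV>Provider B</DIV><DIV>details</DIV></DIV>
    <DIV class="accordion-group"><DIV>Provider C</DIV><DIV>details</DIV></DIV>
  </DIV>
</DIV>
"""


def visible_provider_headers(root):
    """Visible 'pagina*' DIVs -> accordion groups -> first child DIV."""
    headers = []
    for page in root.iter("DIV"):
        page_id = page.get("id", "")
        style = page.get("style", "")
        # Same test as the XPath predicate: id starts with 'pagina'
        # and the style does not contain 'display: none'.
        if not page_id.startswith("pagina") or "display: none" in style:
            continue
        for group in page.findall("DIV[@class='accordion-group']"):
            first = group.find("DIV[1]")
            if first is not None:
                headers.append(first.text)
    return headers


root = ET.fromstring(SAMPLE.strip())
print(visible_provider_headers(root))
```

Only the items of the currently visible container are returned, which is why the hidden first page is no longer scraped again.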

On the other hand, the 'New Template' link template seems to open in a new tab properly; I've checked the 'Start new web browser' option in the 'Action' tab. If that does not work for you, please go back to the 'Javascript + Async.' action.

F.Y.I:

http://manual.visualwebripper.com/default.aspx?manual_id=348

Mercadopublico.rip

Replied: 1/10/2016 3:03:58 AM

When the 15th page is reached, only pages 16, 17 and 18 remain and the next button becomes disabled, so the current way of navigating pages no longer works.

See the attached new project. I added a new page navigation template called 'list of pages' that first cycles through the page links currently shown; the 'New template 1' page navigation template then moves on to the next list of page links by clicking the 'next' button icon.
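
The two-level navigation described above can be sketched as a nested loop. This is only a simplified model, not how VWR works internally: the function name `crawl_order` and the window size of 5 visible page links are assumptions, since the thread does not say how many links the pager shows at once.

```python
def crawl_order(total_pages, window_size):
    """Return (pages visited in order, number of 'next' clicks)."""
    visited = []
    next_clicks = 0
    first = 1
    while first <= total_pages:
        last = min(first + window_size - 1, total_pages)
        # Inner loop: 'list of pages' cycles through the visible page links.
        visited.extend(range(first, last + 1))
        first = last + 1
        if first <= total_pages:
            # Outer step: 'New template 1' clicks 'next' to show the next links.
            next_clicks += 1
    return visited, next_clicks


pages, clicks = crawl_order(total_pages=18, window_size=5)
print(pages)
print(clicks)
```

In this model the 'next' button is needed only a few times in total, rather than once per page, which reduces the exposure to the button becoming disabled.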

Mercadopublico.rip

Replied: 1/20/2016 1:38:24 AM

Yes, I can reproduce this page navigation issue now, although I haven't seen exactly what happened in the browser. It retried 3 times, fewer than the 5 retries you set.

Yes, it is disabled at page 15. I may have to click many times before it works, and sometimes it never works; I have even seen it disabled at page 10. It is probably a flaw in the website itself, or perhaps the next button becomes disabled when loading the next page times out. Either way, it is hard to fix this in the general settings.

I've thought again about how to resolve this. I've created a 'go to 15 page' group template containing a page navigation template with a maximum page count of 15, so it first goes through the first 15 pages; afterwards it returns to the optional 'dummy' template and then resumes with the next templates. I hope this helps.

See the attached new project. You can try tweaking the 'next product page' page navigation template inside the 'go to 15 page' group template, for example the 'Delay after completed action' option in the advanced tab, so that it loads as many further pages as possible. When it then returns to the next templates to extract the real data, you should be able to get data beyond page 15.

If the issue persists, I will pass it on to other staff to see whether they can help you extract data beyond page 15 (this would be a paid service), but they may not return a new project if there is no solution in VWR.

MercadoLaptop.rip

Replied: 1/20/2016 2:07:43 AM

Simon, you skipped this part in the post before

"I tried to scrape the "Laptop" category, with the "MercadoLaptop" (NOT THE ONE YOU SENT ME) project and it gives me a different error when reaching product 67 (2nd page 17th product), I attach the log"

I need help with that issue

How much is the charge for the other staff support?

Replied: 1/18/2016 2:56:33 AM

The website depends heavily on AJAX actions. If either a product page or a result page fails to load properly, it affects the next product detail page or the page navigation, so you will not be able to continue to further pages or products.

My guess is that the website has a session limit that is hit when you scrape categories for a long time, so it cannot continue to the next pages; or you may have run into an 'out of memory' issue from loading too many products across many pages, which is an internal browser issue that cannot be fixed.

If it's not too much trouble, could you split those categories across multiple runs? You can also try to isolate the issue to make sure it is not a session issue on the website: have you tried scraping only the specific category that has 20 pages?

You can also attach your log file for further diagnosis and tell me which specific category has 600 products over 20 pages; I will then try to find out exactly why it stopped at page 15.

Regarding extracting all proveedores, can you please send me a screenshot of the specific product that has 67 proveedores but for which only 57 are extracted?

Since the website uses complex AJAX actions to load content in each template, debugging is harder. In my experience this is usually not a problem in VWR. Diagnosing the issues above might take a few days, and since this may not be covered by free support, you may be charged for it.

Replied: 1/14/2016 11:52:45 PM

How can you set a new template to start from page 16? Also, you forgot to attach the project.

Replied: 12/30/2015 9:16:44 PM

Hello, I tested the configuration you sent me and it didn't work, as you can see in the log I've sent you. Note that this time it opened the products in a new browser tab, which in turn allowed the catalog page navigator to work, but the product pages didn't open as they should; this is visible in the log I'm sending.


We urgently need to fix this, as we have 5 servers waiting for this to work and our clients are demanding our scraping.


Thanks again for your time and help.

MercadopublicoAIO_info_15_12_30.log

Replied: 12/31/2015 3:51:31 PM

Now it only opened the first product, then it finished. It also didn't open the product page in a new tab.

MercadopublicoAIO_info_15_12_31.log

Replied: 1/19/2016 12:28:27 AM

Yes, it is possible to divide it into multiple runs, but I don't know how. I haven't tried to scrape the 600-product category before.

I thought it was clear that the reason it kept stopping at page 15 in the All in one category (ordered by 10 products per page) was that the page navigation was not working. I attached the project for this, "Mercado All in one 18 pages", and the attachment "Log of the stop", which shows a page navigation error. The category that has 600 products and 13 pages is "Laptop"; it is under "productos", then "hardware", then "computadoras", then select "Laptop".

The product that has 67 proveedores but only extracts 57 is in the "all in one" category. Ordering with 50 products per page, it is the 14th product from top to bottom; I was using the "mercadoAIO 4 pages catalog" project. The screenshot is "67 proovedores product", product name "ALL IN ONE LENOVO THINKCENTRE E73Z (10DB00N0CB) UNIDAD", code "1179872".

There is no problem with the charge, I will pay.

Thanks for your help

Benjamin

67 proovedores product.jpg
Log of the stop.log
mercadoAIO 4 pages catalog.rip
Mercado All in one 18 pages.rip

Replied: 1/12/2016 4:50:28 AM

Simon, I tested it and it doesn't work as you intended: when it reaches page 15 it finishes. You also forgot about the "proveedores" issue I mentioned in my previous post.

Your help will be appreciated, thanks

Benjamin

Replied: 1/21/2016 12:05:40 PM

Unfortunately, it stopped at the 1st page, 11th product. The log file shows that one element could not be found while 'wait for element' was set. The element has the XPath //H1[@class=' azul'], which is the 'Title' element inside the 'New template' link template. So the issue could be a connection problem, or the website failed to load that detail page in time; the agent then raised the error and stopped the project, as expected.

To resolve this, you should enable 'Wait scripts' for the 'Title' element (i.e. the element wait condition in the Misc tab), or you can still set up the wait for the 'Title' element on the parent template, 'New template', as an AJAX async. action: Title@New Template.

After making this change I reran the new project. It went through the first page and reached the 2nd page, and now I can reproduce the issue: the agent process stopped at the 2nd page, 17th product:

LAPTOP HP 340 G1 ARRIENDO 36 MESES

I manually activated the 'Navigate in browser' button in the toolbar and opened the 17th product on the 2nd page. It shows a page that differs from the usual one, so the main reason is now found; see the attached screenshot. If a detail page is broken, does not display normally and has no 'back' link, the agent certainly cannot continue to the next products or pages.

See the attached new project. For the 'New template' link template I created an XPath transformation script that checks whether the agent is on the 2nd page and, if so, skips the bad 17th product. (I also created a 'go to 2nd page' template for quick testing; you can remove it if the new project works for you.)

using System;
using VisualWebRipper;
public class Script
{
    //See help for a definition of WrXpathTransformationArguments.
    public static string TransformXpath(WrXpathTransformationArguments args)
    {
        try
        {
            //Read the current catalog page number from the internal data row.
            string cur_page = args.InternalDataRow["cur_page"].ToString();
            if (cur_page == "2"){
                //On page 2, select every product row except the 17th (the broken product page).
                return "//DIV[@id='bloq_resultados']/DIV[@class='row dotedborder'][position() !=17]/DIV[2]/DIV[1]/DIV[1]/LABEL[1]";
            }
            //On all other pages, keep the template's original XPath.
            return args.Xpath;
        }
        catch(Exception exp)
        {
            //Log the error to the debug output and return a marker string.
            args.WriteDebug("Custom script error: " + exp.Message);
            return "Custom script error";
        }
    }
}


MercadoLaptopV3.rip

Replied: 1/4/2016 11:35:33 AM

Please check the attached new project.

I've changed the 'New template' link template back to the 'Javascript + Async.' action, so it opens the product page in the same tab, and I've enabled the 'back' template for navigating back to the last result page.

Mercadopublico.rip

Replied: 1/5/2016 8:55:30 PM

The project worked fine extracting the first catalog page (10 products), but the page navigation doesn't work: instead of going to page 2 it repeats page 1, over and over. Could you please fix it so the project navigates through all 18 pages of the catalog?

PS: I'm attaching the log.

MercadopublicoLT_info_16_01_05.log

Replied: 12/31/2015 1:03:01 AM

Have you tried setting the 'Javascript + Async.' action for the 'New template' link template?

Replied: 1/19/2016 1:23:14 PM

See the attached new project. I've fixed the 'proveedores' issue; it can now extract all 67 items for the 14th product (I set the index to start from the 14th product). The root cause was that I had previously set a wrong XPath for the 'next' page navigation template (see below):

//DIV[@id='divPaginador']//A[.!= '1' and .!='<<' and .!= '>>']

When the project runs, the agent already skips the first page link '1'. If the XPath also excludes the page '1' link, the agent process skips one more page, which causes the problem. The XPath should therefore be corrected as below:

//DIV[@id='divPaginador']//A[.!='<<' and .!= '>>']
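
The off-by-one effect described above can be checked with a small model. This is a sketch under stated assumptions, not VWR's actual logic: it assumes the agent starts on page 1 and always skips the first link the XPath matches, that all 7 page links are visible at once, and that each proveedores page holds 10 entries (inferred from the 50-per-5-pages figure mentioned in this thread). The function names are made up for illustration.

```python
def pages_visited(pager_labels, excluded):
    # Links matched by the XPath, in document order.
    matched = [label for label in pager_labels if label not in excluded]
    # Assumed behaviour: the agent is already on page 1 and skips
    # the first matched link, treating it as the current page.
    return ["1"] + matched[1:]


def providers_extracted(total, per_page, pages):
    count = 0
    for label in pages:
        start = (int(label) - 1) * per_page
        # The last page may hold fewer than per_page entries.
        count += max(0, min(per_page, total - start))
    return count


PAGER = ["<<", "1", "2", "3", "4", "5", "6", "7", ">>"]  # hypothetical pager

wrong = pages_visited(PAGER, excluded={"1", "<<", ">>"})
fixed = pages_visited(PAGER, excluded={"<<", ">>"})
print(wrong, providers_extracted(67, 10, wrong))
print(fixed, providers_extracted(67, 10, fixed))
```

Under these assumptions the old XPath skips page 2 and yields 57 of 67 entries, which matches the 10 missing proveedores reported earlier, while the corrected XPath visits all 7 pages.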


For the 'All in one' project, I double-checked the templates step by step in the VWR editor. I found that on reaching page 14 I have to click the '>' button repeatedly, 3 or 4 times, before page 15 opens properly; I guess this is a bug in the website itself. I also noticed that you still have the 'full page load' action set for the 'New template 1' page navigation template. That looks fine to me, but you should try the 'Javascript + Async.' action as mentioned before.

In any case, I will run the project overnight on our U.S. test server and see what happens tomorrow.

In the meantime you can try 'max retries' = 3 (or more if you prefer) for the 'New Template 1' page navigation template (advanced tab > action section) and see whether it helps to get past page 15.
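
As a rough illustration of why a retry setting can help when a button only responds after several clicks, here is a generic retry loop in Python. It is not VWR's implementation: `run_with_retries`, `FlakyNextButton` and the choice to count retries in addition to the first attempt are all assumptions made for this sketch.

```python
import time


def run_with_retries(action, max_retries=3, delay_seconds=0.0, sleep=time.sleep):
    """Call action(); on TimeoutError retry up to max_retries more times."""
    for attempt in range(max_retries + 1):
        try:
            return action()
        except TimeoutError:
            if attempt == max_retries:
                raise  # out of retries: let the caller see the failure
            sleep(delay_seconds)


class FlakyNextButton:
    """Stand-in for a 'next' click that only works on the n-th attempt."""

    def __init__(self, works_on_attempt):
        self.works_on_attempt = works_on_attempt
        self.clicks = 0

    def click(self):
        self.clicks += 1
        if self.clicks < self.works_on_attempt:
            raise TimeoutError("next page did not load")
        return "page 15"


button = FlakyNextButton(works_on_attempt=4)
print(run_with_retries(button.click, max_retries=3), button.clicks)
```

With 3 retries the fourth click succeeds; with fewer retries the same button would surface as a navigation error, which is consistent with needing 3 or 4 manual clicks at page 14.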

Attached are the two new projects reflecting the above.

mercadoAIO 4 pages catalog.rip
Mercado All in one 18 pages.rip

Replied: 1/1/2016 4:00:24 AM

That's correct; a JavaScript link usually cannot be opened in a new tab, so please keep opening it in the same tab.

Replied: 1/1/2016 7:07:49 PM

Simon, the problem here is that your project is still not working. Did you review the functionality? I need to solve this because the only reason we bought this software was to scrape this site, and so far it has not shown that it can do that. Please check that a new solution really works before you send it.

Thanks for your support, regards

Benjamin

Replied: 1/12/2016 5:01:26 AM

I propose that you first extract the first 15 pages and then manually set up a new template that starts from page 16; that would be easier. Otherwise I would have to run the project for days (I'm not sure about this), and even then the issue might not be reproducible, or it might be hard to diagnose the cause.

Regarding 'proveedores', I've added a new page navigation template that should be able to collect more than 5 pages in the left panel of the specific detail page.

Replied: 1/8/2016 2:27:41 AM

Please check the attached new project.

For example, the page for "ALL IN ONE HP PAVILION 18-G1" has more than 5 pages. You can activate the 'Navigate in browser' button in the toolbar, enter the page URL manually in the address bar, and open the page in order to add a new page navigation template (you will need to deactivate 'Navigate in browser' to return to edit mode).

For the other page navigation template, the one through the products pages, you can try the 'Javascript + Async.' action and see whether it can go further than 15 pages. If not, you can try increasing 'Delay after completed action' for the page navigation template. In the attached new project I've also increased the delay for the 'back' template; this may not be required, depending on how fast your internet connection is.

Mercadopublico.rip

Replied: 1/20/2016 6:48:39 PM

Simon, it was a different project, for a different category, and it gives me that error on the same product every time: page 2, 17th product.

Thanks for your help

Benjamin

MercadoLaptopV3.rip
MercadoLaptopV3_info_16_01_20.log

Replied: 12/29/2015 11:09:57 PM

Still can't access. Can we continue the thread here?

This was your last message:

VWR cannot open 2 web browsers. Actually, I'm not sure why you would want 2 or more web browsers at run time; VWR can open multiple tabs in the same browser, which is more efficient.

I don't quite understand what you mean by 'it went back and forth between 1st and 2nd product'. Can you please attach your info.log file to clarify?

My reply:

I'm sorry, I meant 2 tabs in the same browser, one for the catalog and another for the products; that is how it worked in Mozenda.

By back and forth I meant that it opened the 1st product, then the second, then tried to go to the next page but couldn't, and then started at page 1 again. So it got stuck on page 1 and didn't navigate to any other page.

Mercadopublico_info_15_12_29.log

Replied: 1/19/2016 8:01:48 PM

I set Javascript + Async. and max retries to 5, and it didn't work. I think the reason is that once it gets to page 15 the next page button becomes disabled on the page. That also happens when navigating the webpage manually: since it can't show more pages the button becomes disabled, and maybe that is why the error happens, because VWR expects the > button to work. I suggest you go to the page and see what I'm talking about.

The proveedores issue is fixed.

I tried to scrape the "Laptop" category, with the "MercadoLaptop" project and it gives me a different error when reaching product 67 (2nd page 17th product), I attach the log

Thanks again

Benjamin


MercadoLaptop_info_16_01_19.log
MercadoLaptop.rip

Replied: 12/29/2015 2:24:03 AM

I believe I have found the related inquiry for you:

http://support.visualwebripper.com/Display.aspx?si=dbdea811-b6d1-402d-8373-500c15ab56e8

I attach the demo project and sample data again.

Mercadopublico.xls
Mercadopublico.rip

Replied: 1/7/2016 4:20:08 AM

Hello Simon,

Now the page navigation works up to page 15, not for all 18 pages. It raises an error that you can see in the log.

Another thing that happens is that the project extracts a maximum of 50 "proveedores"; if a product has more, it ignores them. I'm sending you a picture where you can see that when a product has more than 50 "proveedores", a next page button appears that is not displayed for products with fewer than 50.


Thanks for your help

providers.jpg
MercadopublicoLT2_info_16_01_06.log

Replied: 1/8/2016 6:28:58 AM

Remember to enable the group template in order to extract the proveedores. What I meant about the "proveedores" issue was that the "proveedores" page navigation needs a fix: when a product has more than 5 pages of "proveedores", a next page button appears (shown in the picture I attached before) that the project ignores, so the project only extracts the first five pages. I need help fixing that page navigation.

The page navigation template for the products page didn't work. I think it fails because when you reach page 15 the next page button stops working (it becomes just an icon), so the project won't go on to pages 16, 17 and 18. Can you please fix that?

Replied: 1/15/2016 6:01:57 AM

Sorry, I think it is hard to reach page 16 directly.

But there is a way to load 50 products per page, so you may not need to iterate through more than 15 pages; in fact there are only 4 pages if you load 50 products per page.
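
The page counts mentioned here follow from simple arithmetic. The sketch below assumes a catalog of 180 products, a made-up figure chosen only because it is consistent with 18 pages at 10 products per page; the exact product count is not given in this thread.

```python
import math


def catalog_pages(product_count, per_page):
    """Number of catalog pages needed to list product_count products."""
    return math.ceil(product_count / per_page)


PAGE_LIMIT = 15      # page at which the 'next' button was seen to become disabled
product_count = 180  # assumed; consistent with 18 pages of 10 products

for per_page in (10, 50):
    pages = catalog_pages(product_count, per_page)
    print(per_page, pages, pages <= PAGE_LIMIT)
```

At 10 products per page the catalog needs 18 pages and runs into the 15-page problem; at 50 per page it needs only 4 pages and stays well below it.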

The attached new project has a new template, 'New Template 2', that first selects 50 products per page. It uses a long delay of 15 seconds in the advanced options, and it works fine for me at run time. I hope it's helpful.

Mercadopublico.rip

Replied: 1/20/2016 6:59:17 AM

16-01-19 15:48:50 Processing Back template back. HTML: <a class="btn btn-primary" id="ctrl_FichaProducto_LinkButton1" onclick="javascript:history.go(-1);return false;" href="javascript:__doPostBack('ctrl_FichaProducto$LinkButton1','')">&lt; Volver</a>
16-01-19 15:48:55 Processing Link template New Template (17 of 50). Text: LAPTOP HP 340 G1 ARRIENDO 36 MESES. URL: javascript:verFicha('1075816',5800179); return false;
16-01-19 15:49:26 Navigation error (JavaScript Click). Timeout waiting for JavaScript call to complete. Wait element path did not exist (//H1[@class=' azul']. Action:javascript:__doPostBack('ctrl_FichaProducto$LinkButton1','')
16-01-19 15:49:26 Error performing AJAX call: New Template. Timeout waiting for JavaScript call to complete. Wait element path did not exist (//H1[@class=' azul'])
16-01-19 15:49:26 Processing single link PageNavigation template New Template 1
16-01-19 15:49:26 Updating error reporting...
16-01-19 15:49:27 Starting export...
16-01-19 15:49:27 Generating export data...
16-01-19 15:49:27 Data extraction completed

The last log lines show that 'New template' failed to open the next page through its AJAX action. I previously ran the 'All_in_one_18_pages' project on the U.S. test server, and it stopped at page 18, not at the 2nd page, 17th product. My guess is that you hit an internet connection problem or the website took longer than usual to respond. You can try setting the 'max retries' option for the 'New template' link template and see whether it works.

AJAX sites involve more complicated actions, and it is difficult to extract the full data in a case like yours. I will pass your issues on and see how much it will take.
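
When a log is long, the failing product can be located mechanically instead of by reading it line by line. The following Python sketch pairs each error line with the last link template processed before it. The two regular expressions are inferred only from the log excerpt quoted in this post, so other VWR log messages may not match; `find_failures` is a made-up helper name.

```python
import re

# Patterns inferred from the excerpt in this thread (an assumption,
# not a documented log format).
LINK_RE = re.compile(
    r"Processing Link template (?P<template>.+?) "
    r"\((?P<index>\d+) of (?P<total>\d+)\)\. Text: (?P<text>.*?)\. URL:"
)
ERROR_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<msg>(?:Navigation error|Error performing AJAX call).*)$"
)

SAMPLE_LOG = r"""
16-01-19 15:48:55 Processing Link template New Template (17 of 50). Text: LAPTOP HP 340 G1 ARRIENDO 36 MESES. URL: javascript:verFicha('1075816',5800179); return false;
16-01-19 15:49:26 Navigation error (JavaScript Click). Timeout waiting for JavaScript call to complete. Wait element path did not exist (//H1[@class=' azul']. Action:javascript:__doPostBack('ctrl_FichaProducto$LinkButton1','')
16-01-19 15:49:26 Error performing AJAX call: New Template. Timeout waiting for JavaScript call to complete. Wait element path did not exist (//H1[@class=' azul'])
16-01-19 15:49:27 Data extraction completed
"""


def find_failures(log_text):
    """Return each error line together with the last link processed before it."""
    failures = []
    last_link = None
    for line in log_text.splitlines():
        link = LINK_RE.search(line)
        if link:
            last_link = (int(link["index"]), int(link["total"]), link["text"])
            continue
        error = ERROR_RE.match(line)
        if error:
            failures.append(
                {"time": error["ts"], "link": last_link, "message": error["msg"]}
            )
    return failures


for failure in find_failures(SAMPLE_LOG):
    print(failure["time"], failure["link"])
```

For the excerpt above, both error lines point back to link 17 of 50, the "LAPTOP HP 340 G1 ARRIENDO 36 MESES" product.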

Replied: 1/15/2016 9:15:18 PM

The 50 products per page setting is only a temporary fix, because I have other categories with 600 products that will have around 20 catalog pages. So please help with that page navigation; even if it takes time, I can wait.

I tried 50 products per page and it works great, as it has 4 pages. I'm attaching the project because there are still problems when a product has more than 50 proveedores: this time it goes to the extra page, but the exported data has 10 fewer than it should. For example, one product has 67 proveedores, and the project goes into all the proveedores pages (using view browser), but it still extracts only 57. The project is attached below.

mercadoAIO 4 pages catalog.rip

Replied: 1/21/2016 5:03:58 AM

OK, I will run the project on the U.S. server and see whether the issue can be reproduced for further diagnosis.

Replied: 1/6/2016 8:27:17 AM

Which page navigation template always repeats page 1?

I assume you mean the 'New template 1' page navigation template, which loads the product list by pagination. I temporarily disabled the 'group' template and ran the project as a test; it broke at the 2nd page, and the first product title on the 2nd page appeared to be the same as on the 1st page.

I've changed the page navigation template to the 'Full page load' action and set 'Delay after completed action' = 5000 (5 seconds) in the advanced tab. Page navigation now seems to work properly, and the next page no longer generates the same product records.

See the attached new project and sample data. If you see no duplicates, re-enable the 'group' template by unchecking the 'Disable template' option and check that everything is OK.

Mercadopublico.xls
Mercadopublico.rip