Latest Inquiries - Data Extraction Software

Program stops working after some time

Submitted: 2/25/2016

Hi,

My program extracts data from several pages of a website, but it stops working on page 2, even though I have 32 pages to scrape. I don't understand why it stops, almost always at the same point, when it works perfectly for the first page and the beginning of the second. I tried starting the scrape at page 3 and at page 4, but it also stops in the middle of the page.

Could you please tell me what's wrong with my program? You will find it attached.

Also, I would like the URL of each page I scrape data from to be written in the final sheet that contains all my data. How can I do that?

Thank you very much.

Audrey

Timeshighereducation.rip

Replied: 2/26/2016 2:58:50 AM

See the attached new project.

For the page navigation template, you only need to click the '>' link to iterate through all pages, and you should set the action to AJAX instead of auto-detect. Moreover, this site seems to expire the current session if you don't interact with any page for a long time, and after that the project can't resume on the next page either. To avoid session expiration, I optimized the project by leaving only the 'Execute javascripts' option enabled in Project > Project options > Browser tab. This speeds up scraping on the detail pages so the session doesn't time out.

For the 'Rank' link template, I've also revised the XPath so that it selects the <a> link, which opens the detail page in a new tab.

'URL' is a new element of type 'PageAttribute' inside the 'Rank' link template. It uses the 'URL' option in the 'Page attribute' dropdown to capture the URL of the current detail page, so each row in the output includes the page it was scraped from.
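The 'URL' element is configured in the tool's GUI, so there is no project code to show here. As a rough, tool-agnostic illustration of the same idea (every extracted record carries the address of the page it came from, so the URL becomes its own column in the output sheet), here is a minimal Python sketch. The page data, the field names and the example.com URLs are hypothetical placeholders, and `extract_fields` only stands in for what the 'Rank' link template does in the real project.

```python
import csv
import io


def extract_fields(page: dict) -> dict:
    # Hypothetical stand-in for a detail-page template: here the "page"
    # is just a dict, so the sketch runs without a browser.
    return {"Rank": page["rank"], "Name": page["name"]}


def scrape_with_urls(pages: list) -> list:
    rows = []
    for page in pages:
        row = extract_fields(page)
        # Counterpart of the 'URL' page-attribute element: store the
        # address of the page the data came from next to the data.
        row["URL"] = page["url"]
        rows.append(row)
    return rows


def to_csv(rows: list) -> str:
    # Write the rows as a sheet-like CSV with URL as the last column.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["Rank", "Name", "URL"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    pages = [
        {"url": "https://example.com/a", "rank": "1", "name": "A"},
        {"url": "https://example.com/b", "rank": "2", "name": "B"},
    ]
    print(to_csv(scrape_with_urls(pages)))
```

The design point is simply that the URL is captured at extraction time, while the scraper is still on that page, rather than reconstructed afterwards.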

I tested the project on 5 more pages and it works fine. Occasionally it may run into an 'Out of memory' issue, but after the browser restarts automatically it seems to resume from the last interrupted page.

Timeshighereducation.rip