Latest Inquiries - Data Extraction Software

Cochrane library

Submitted: 7/9/2012

Hi

I have institutional access to the Cochrane Collaboration website and wish to download all the data files for research purposes.
However, for each meta-analysis I need to tick a terms and conditions box before downloading the specific dataset.
Considering there are hundreds or thousands of these, I don't want to have to do this manually.
If your software can download the datasets for me, I'll be very happy to buy a full version.

Thank you

Untitled.jpg

Replied: 7/10/2012 3:02:07 AM
http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD007277.pub2/full 

My page differs from your screenshot: it asks me for an email and password to log in.
Is this the reason I cannot see the download link?
Replied: 7/10/2012 7:02:12 PM
Unfortunately, we cannot make the demo project for you unless you can tell us how to get through the specific university gateway and then access the target site.
Replied: 7/13/2012 12:37:06 AM
Thank you Soeren
That's perfectly understandable (you are limited in what you can do in a demo project).
It seems to be doing almost everything I requested.
Will it also be possible to extract the directory tree names and the study names from that page (please see attached) into an XML file?
Then I can easily combine it with the other XML you sent. You don't have to do it now; if you know it can be done, you can help me with it after I purchase the software. That's the last item on my agenda.

Untitled.jpg

Replied: 7/11/2012 1:38:57 AM
I think I can narrow it down to this question:
Is there a way to use the custom JavaScript override under project options to automatically "tick" the terms and conditions box (see attached) and click the download button?

<input type="checkbox" id="tAndCs" name="tAndCs" class="cb" /> <label for="tAndCs">I agree to these terms and conditions</label> </fieldset><input value="Download Data" id="accept" type="submit" class="submit" /></form> </div>
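For reference, here is a minimal Python sketch (not the scraping software's own mechanism) of what "ticking" that box amounts to at the form level. It assumes the download form is submitted as an ordinary HTML form, in which case a ticked checkbox with no `value` attribute is sent as `tAndCs=on` and the unnamed submit button is not sent at all. The names `FormControls` and `ticked_payload` are hypothetical helpers, and the form's target URL and method are not shown in the snippet, so they are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# The markup quoted above, trimmed to the two form controls.
SNIPPET = (
    '<input type="checkbox" id="tAndCs" name="tAndCs" class="cb" /> '
    '<label for="tAndCs">I agree to these terms and conditions</label>'
    '<input value="Download Data" id="accept" type="submit" class="submit" />'
)


class FormControls(HTMLParser):
    """Collect the attributes of every <input> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.inputs.append(dict(attrs))


def ticked_payload(html):
    """Return the name/value pairs sent when every checkbox is ticked."""
    parser = FormControls()
    parser.feed(html)
    payload = {}
    for control in parser.inputs:
        name = control.get("name")
        if not name:
            continue  # unnamed controls (the submit button here) are not sent
        if control.get("type") == "checkbox":
            value = control.get("value")
            # A ticked checkbox without a value attribute is submitted as "on".
            payload[name] = "on" if value is None else value
    return payload


payload = ticked_payload(SNIPPET)
print(urlencode(payload))
```

Whether the site accepts such a submission outside a logged-in browser session is something only a test against the real page can confirm.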
Desktop.zip

Replied: 7/10/2012 11:29:47 PM
Unfortunately I can't replicate what you see, since the whole of England seems to have access:
that's not the case for the US.

Can you tell me whether what I ask can be done, using the screenshots and the information I provided? The terms and conditions box needs to be ticked, and the link to the file to be downloaded is encrypted.
Replied: 7/13/2012 2:03:14 AM
Or a TOC of this page if it's simpler:
thank you
Replied: 7/12/2012 7:49:36 PM
I've attached a demo project and a sample data extract (Excel XML). I've used the alphabetical list of articles, because the topic list is difficult to navigate and beyond what we can do in a free demo project.

The software cannot download a web page as an HTML file, but it can extract the HTML to the data file (an Excel file in this case), which is what the project is doing.

You are attempting to extract a huge amount of data and that requires special considerations. You can read more about that here:

Wiley.xml
Wiley.rip
Files.zip

Replied: 7/10/2012 12:46:57 AM
That's fine, please see attached.
no3.jpg
no2.jpg
no1.jpg

Replied: 7/9/2012 3:55:07 AM

I'm attaching the discussion with another web ripper developer:

>> Thank you for your email.

>> Unfortunately yes. Each full article webpage contains a link to a dataset webpage. On that page there is an encrypted link to the dataset to download which you need to click the terms and conditions box.

>> So there isn't anything you can suggest?

 


>> Hi Evan,


>> Thank you for your interest in Web2Disk. I took a quick look at this site and I don't think it will be possible to copy it (if it is possible, it won't be easy).

>> Each time you download an article, do you need to agree to the terms and conditions? If so, then Web2Disk won't be able to copy the site. You can configure it to click a specific textbox (say, if there was only one prompt for the whole site), but there's no way to configure it to automatically tick the other check boxes.

>> The second hurdle with this site is that some of the content is hosted on a different domain (onlinelibrary.wiley.com). You can get around this using Domain Aliases (or Additional Root URLs), but these sites are probably absolutely massive and not practical to copy.

 

>>> Hello


>>> I'm interested in accessing information on a website for which my institution has purchased a licence.

>>> The website is :

>>> http://www.thecochranelibrary.com/

>>> I want to download everything, in particular the datasets which are hundreds if not thousands. Then I plan to analyse using statistical software.

>>> The problem is that the links to the datasets are encrypted and in order to download even one you need to tick the terms and conditions box (see attached picture).

>>> Hence I'm not sure a web downloader can download these. If you can provide a solution to this, I will be happy to purchase a licence through my institution.

>>> Best wishes

>>> Evan

Replied: 7/9/2012 7:57:12 AM

To access the website you might need higher education institutional access - from a uni for example. So some of the below links might not work for you.

The starting location is: http://www.thecochranelibrary.com/

In there, there are numerous systematic reviews and they can all be found here (alphabetically):

http://onlinelibrary.wiley.com/book/10.1002/14651858/titles?searchKey=1cdbc5fc-7282-4f0c-8282-8fbe0681d0b3&uuid=1cdbc5fc-7282-4f0c-8282-8fbe0681d0b3

 

Clicking on a review (protocols shouldn't have links to data attached to them) takes us here, for example:

http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD007277.pub2/full

 

Under 'data and analysis' there's a link to the statistical data ('Download statistical data').

Once you follow this link, the terms and conditions web page comes up, and after ticking to accept you can click on the encrypted link to download.
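The check described here (does a review's /full page carry a 'Download statistical data' link?) can be sketched in Python as below. This is only an illustration: the sample HTML and the `/hypothetical/data-page` address are made up, since the real page markup and link targets are not shown in this thread, and `LinkFinder` and `find_data_link` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

LINK_TEXT = "Download statistical data"


class LinkFinder(HTMLParser):
    """Record the href of each <a> element together with its visible text."""

    def __init__(self):
        super().__init__()
        self.links = []    # (href, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # normalise whitespace
            self.links.append((self._href, text))
            self._href = None


def find_data_link(page_url, html):
    """Return the absolute URL of the data link, or None (e.g. for protocols)."""
    parser = LinkFinder()
    parser.feed(html)
    for href, text in parser.links:
        if text.lower() == LINK_TEXT.lower():
            return urljoin(page_url, href)
    return None


# Made-up markup standing in for a review page and a protocol page.
page_url = "http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD007277.pub2/full"
review_html = '<h3>Data and analyses</h3><a href="/hypothetical/data-page">Download statistical data</a>'
protocol_html = '<a href="/hypothetical/abstract">Abstract</a>'

print(find_data_link(page_url, review_html))
print(find_data_link(page_url, protocol_html))
```

Returning `None` when the link is absent gives a simple way to skip protocols, which should have no data attached.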

 

Obviously I want to repeat the process for all the reviews that have data attached to them. And then repeat every other month or so to update my data collection.

 

Best Wishes

 

Evan

Replied: 7/10/2012 3:24:56 AM
Unfortunately, yes.
Like I said before, I have institutional login access through the University of Manchester.
Replied: 7/11/2012 2:32:19 AM
Please check the attached demo project; it should give you a better idea of how to download the file.
You will need to change the start URL. Currently I'm using the local HTML that you sent.

The first link template simulates a click on the 'agree' check box; the form submit template that follows it, 'download', then starts the file download. When you edit the 'download' form submit template, take a look at the 'file download on form submit' config section in the Advanced tab. I assumed that the fixed extension is pdf.
form_download_demo.rip

Replied: 7/13/2012 11:35:15 PM
It's very difficult to process AJAX trees and it's beyond what we can help you with as part of standard support. The cost of building such a project would be at least twice the cost of the software itself.

It would be much easier to extract data from http://onlinelibrary.wiley.com/book/10.1002/14651858/homepage/crglist.html, but I'm not sure how that would give you the same data as the topic tree.
Replied: 7/9/2012 10:37:20 PM
I cannot find the 'Download statistical data' link in http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD007277.pub2/full

Can you please attach a few screenshots to indicate how to reach each page?
And which content are you interested in collecting?

Thanks.
Replied: 7/9/2012 5:49:22 AM
Please show me how to reach that download page, so I can make the demo project to download that dataset file. Thanks.
Replied: 7/11/2012 3:54:52 AM
Thank you for that. It actually worked, and it seems the software can do what I am after.
But I will need to do that for hundreds of webpages. The starting location is: 

I want the information on all directories and subdirectories to be stored (e.g. in an Excel file) and linked with the names of the files I will download in the final step you explained in your previous reply.

For example, please see the screenshot and navigate to that full article (although you won't be able to see the full details, hence the other attachment).
I want the full PDF (http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD004401.pub2/pdf) 
and of course the data file (the extension is rm5), downloaded in the manner you explained below, if there's a "Download statistical data" link on the /full page for that article.
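To make the linking idea concrete, here is a rough Python sketch of the kind of index I have in mind: one row per review, tying the directory path to the /full and /pdf addresses and to a data file name. It writes CSV, which Excel can open. The topic names are placeholders, the `.rm5` file naming scheme is my own assumption (the real downloaded files may be named differently), and `manifest_row` and `write_manifest` are hypothetical helpers.

```python
import csv
import io

BASE = "http://onlinelibrary.wiley.com/doi/"
FIELDS = ["topic_path", "doi", "full_url", "pdf_url", "data_file"]


def manifest_row(doi, topic_path):
    """Build one index row linking a review's directory path to its files."""
    review_id = doi.rsplit("/", 1)[-1]  # part of the DOI after the last slash
    return {
        "topic_path": " > ".join(topic_path),
        "doi": doi,
        "full_url": f"{BASE}{doi}/full",
        "pdf_url": f"{BASE}{doi}/pdf",
        # Assumed naming scheme: the review id plus the rm5 extension.
        "data_file": f"{review_id}.rm5",
    }


def write_manifest(rows, handle):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder directory names; the real topic tree is not reproduced here.
row = manifest_row(
    "10.1002/14651858.CD004401.pub2",
    ["Hypothetical topic", "Hypothetical subtopic"],
)
buffer = io.StringIO()
write_manifest([row], buffer)
print(buffer.getvalue())
```

Keeping the DOI as the key in every row should also make it easy to match the index against the other XML extract later.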

I really appreciate your help with this and once I manage to get the template working with the trial version I'll put in a purchase order with my institution.
Desktop.zip
Untitled.jpg