Latest Inquiries - Data Extraction Software

Automate download of documents from website (list of gov purchases)

Submitted: 7/10/2013
The problem: I have to compile a list (EXCEL or CSV) of all public purchases of the Brazilian Government from July 2012 to July 2013. This is public info but the process to get all the data manually might take me months. I need to automate it.

The base URL is:
https://www.comprasnet.gov.br/ConsultaLicitacoes/conslicitacao_relacao.asp?numprp=&dt_publ_ini=01/07/2012&dt_publ_fim=15/07/2012&chkModalidade=1,2,3,5&chk_concor=31&chk_pregao=1,2,3,4&chk_rdc=&optTpPesqMat=M&optTpPesqServ=S&chkTodos=&chk_concorTodos=&chk_pregaoTodos=-1&txtlstUf=&txtlstMunicipio=&txtlstUasg=&txtlstGrpMaterial=&txtlstClasMaterial=&txtlstMaterial=5521,13811,14176,13167,9726,9725&txtlstGrpServico=&txtlstServico=&txtObjeto=&numpag=



- First step: This URL is already filtering the type of docs I need. However, I can only list docs published 15 days at a time (see the start and end dates in the URL above). So Visual Web Ripper will have to query 15 days at a time, automatically;

- Second step: Each 15-day listing spans an unknown number of pages, with 10 items on each page. Visual Web Ripper must click each button labeled "Items and Download", for all 10 items on all pages. There's a button named "Avançar" (which means "Next") at the end of each page;

- Third step: Each "Items and Download" button clicked in the previous step takes us to another page, where we must click the button named "Download" (it's at the end of the page).

- Fourth step: And once we do click this "download" button, a popup asking for a captcha appears. I'd like to send that to DeathByCaptcha.com. 

- Last step: Finally, once the captcha is solved, the download of the doc I need starts.

We must do this over and over to get all files, 15 days at a time, from July 2012 to July 2013. 

I've been told Visual Web Ripper can do that in a snap, and I am willing to buy it right away (I need this data ASAP, my job is on the line).

Can you help me create this project?


Replied: 7/10/2013 10:02:32 PM
Please check the attached demo project. You need to place the project file in the default Projects folder, then run it in VWR.

For this demo project, you can manually add more start URLs with different date parameter values (e.g., dt_publ_ini=01/07/2012&dt_publ_fim=15/07/2012) so that it fetches the results for the other date ranges. You can also use a script to generate the start URLs with different date parameter values, but that is a little complex for a novice.
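As a rough illustration of generating those start URLs outside VWR, here is a minimal Python sketch that prints one URL per 15-day window. It is not a VWR script and not part of the demo project. The function names (date_windows, start_urls) are hypothetical, and the end date of 31/07/2013 is an assumption, since the inquiry only says "July 2013". The printed URLs could then be pasted into the project as start URLs.

```python
from datetime import date, timedelta

# Base URL copied from the inquiry; only the two date parameters are swapped out.
BASE_URL = (
    "https://www.comprasnet.gov.br/ConsultaLicitacoes/conslicitacao_relacao.asp"
    "?numprp=&dt_publ_ini=01/07/2012&dt_publ_fim=15/07/2012"
    "&chkModalidade=1,2,3,5&chk_concor=31&chk_pregao=1,2,3,4&chk_rdc="
    "&optTpPesqMat=M&optTpPesqServ=S&chkTodos=&chk_concorTodos=&chk_pregaoTodos=-1"
    "&txtlstUf=&txtlstMunicipio=&txtlstUasg=&txtlstGrpMaterial=&txtlstClasMaterial="
    "&txtlstMaterial=5521,13811,14176,13167,9726,9725"
    "&txtlstGrpServico=&txtlstServico=&txtObjeto=&numpag="
)


def date_windows(start, end, days=15):
    """Yield (first_day, last_day) pairs covering start..end, both inclusive."""
    current = start
    while current <= end:
        # Each window holds at most `days` calendar days; the last one may be shorter.
        last = min(current + timedelta(days=days - 1), end)
        yield current, last
        current = last + timedelta(days=1)


def start_urls(start, end, days=15):
    """Yield the base URL with dt_publ_ini / dt_publ_fim set for each window."""
    fmt = "%d/%m/%Y"  # the site uses day/month/year
    for first, last in date_windows(start, end, days):
        yield (
            BASE_URL
            .replace("dt_publ_ini=01/07/2012", "dt_publ_ini=" + first.strftime(fmt))
            .replace("dt_publ_fim=15/07/2012", "dt_publ_fim=" + last.strftime(fmt))
        )


if __name__ == "__main__":
    # Assumed range: 1 July 2012 through 31 July 2013.
    for url in start_urls(date(2012, 7, 1), date(2013, 7, 31)):
        print(url)
```

Plain string replacement is used instead of re-encoding the query string so that the commas and slashes stay exactly as they appear in the original URL. With the assumed bounds this yields 27 windows, the last one being shorter than 15 days.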

As for the captcha that appears when downloading the zip archive files: currently you have to type in the captcha manually, and then VWR will proceed with the download. To fully automate the downloads, you need to purchase a 3rd party service and then pass its user/password in the decaptcha script.

F.Y.I:

Feeding start urls

CAPTCHA Protection
Comprasnet.rip