Latest Inquiries - Data Extraction Software

FEMA Change Detection

Submitted: 8/8/2013

My company is looking to utilize a web scraping service for a project we’re working on, and we’re exploring possible vendors for this service. We were wondering if your service might be able to meet our project’s requirements.

 

What we are looking to do is web-scrape the FEMA.gov website, the Disaster Declarations section of the website in particular (http://www.fema.gov/disasters).

 

We want to follow every hyperlink to a disaster declaration, then follow a 2nd hyperlink on each declaration page, titled “Designated Counties”, and scrape the following disaster information: name, dates, counties designated for Public Assistance, and counties designated for Individual Assistance.

 

The primary challenge we are trying to work through is that we don’t want to pull a report of all that information every day.

 

What we want to pull every day is only the disasters where a change has occurred on the page, primarily new counties being added/designated for FEMA Assistance on the disaster declaration page. We don’t want to pull disasters where there has been no change in the designated counties between yesterday and today.

 

If possible, we’d like the scrape to produce the change results in Excel format, if you have that capability. If Excel format isn’t possible, we might be able to use other formats.

Replied: 8/19/2013 10:38:40 AM

I finally had some time to test this agent out. Would I be able to have columns for all the individual PA_Titles (such as PA, PA-A, PA-B, etc.)?

Currently there are columns for Individual Assistance, but the various Public Assistance subcategories are actually pulling by rows. I'd like to pull the PA subcategories by columns if possible.

Also, how could I use some sort of change detection? If a particular record in my export data has changed since the last time I scraped the website, could I pull a report where only the records with changes are exported?

Thanks,

Replied: 8/19/2013 6:40:37 PM
Please check the attached new project.

PA, PA-A, PA-B, etc. are now placed as columns. This is done by setting AddColumnsInParentTable for the PublicAssistance page area template and specifying MultipleColumns export for the "PA_Content" element.

F.Y.I:

Controlling export data structures
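
For anyone who needs the same reshaping outside the tool, here is a minimal Python sketch of the row-to-column pivot described above. It is not how the product implements AddColumnsInParentTable; the field names `disaster`, `pa_title`, and `pa_content` and the county values are hypothetical placeholders.

```python
from collections import defaultdict


def pivot_pa_rows(rows):
    """Collapse one-row-per-(disaster, PA title) records into one row per disaster.

    `rows` is an iterable of dicts with the hypothetical keys
    'disaster', 'pa_title', and 'pa_content'.
    """
    by_disaster = defaultdict(dict)
    for row in rows:
        by_disaster[row["disaster"]][row["pa_title"]] = row["pa_content"]

    # Collect every PA title seen so that all output rows share the same columns.
    titles = sorted({title for cols in by_disaster.values() for title in cols})

    return [
        {"disaster": name, **{title: cols.get(title, "") for title in titles}}
        for name, cols in by_disaster.items()
    ]


# Placeholder data, not real FEMA designations.
rows = [
    {"disaster": "DR-4141", "pa_title": "PA-A", "pa_content": "County A"},
    {"disaster": "DR-4141", "pa_title": "PA-B", "pa_content": "County A; County B"},
    {"disaster": "DR-4139", "pa_title": "PA", "pa_content": "County X"},
]

for record in pivot_pa_rows(rows):
    print(record)
```

Subcategories that a disaster does not have are left as empty strings, so every exported row has the same set of columns.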

To detect changed records, you need to set "Add to existing data" in Project > Project options > Project data tab and specify which element serves as the key for detecting changes. You can then choose to export the new data only.

F.Y.I:

Incremental web scraping
Fema.rip
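
The idea behind key-based change detection can be sketched in a few lines of Python. This is a generic illustration, not the product's "Add to existing data" feature: it assumes yesterday's and today's scrapes are each held as a dict keyed by disaster, with hypothetical `pa` and `ia` sets of county names, and it reports only new disasters or disasters whose county sets differ. The county values are placeholders.

```python
import csv
import io


def detect_changes(previous, current):
    """Return one report row per disaster that is new or whose counties changed."""
    changes = []
    for key, now in current.items():
        is_new = key not in previous
        before = previous.get(key, {"pa": set(), "ia": set()})
        if is_new or now["pa"] != before["pa"] or now["ia"] != before["ia"]:
            changes.append({
                "disaster": key,
                "status": "new" if is_new else "changed",
                # Counties present today that were absent in the previous run.
                "added_pa": "; ".join(sorted(now["pa"] - before["pa"])),
                "added_ia": "; ".join(sorted(now["ia"] - before["ia"])),
            })
    return changes


def to_csv(changes):
    """Serialize the report as CSV text, which Excel can open directly."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["disaster", "status", "added_pa", "added_ia"]
    )
    writer.writeheader()
    writer.writerows(changes)
    return buf.getvalue()


# Placeholder snapshots, not real FEMA designations.
yesterday = {
    "DR-4141": {"pa": {"County A"}, "ia": set()},
    "DR-4139": {"pa": {"County X"}, "ia": set()},
}
today = {
    "DR-4141": {"pa": {"County A", "County B"}, "ia": set()},
    "DR-4139": {"pa": {"County X"}, "ia": set()},
    "DR-4138": {"pa": {"County M"}, "ia": {"County M"}},
}

report = detect_changes(yesterday, today)
print(to_csv(report))
```

Unchanged disasters (DR-4139 in the sample) are left out of the report, which is the behavior asked for in the original inquiry.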

Replied: 8/9/2013 6:08:28 AM

Attached are the screenshots per request. I would use the name of the 1st hyperlink you click for each disaster, which is circled in Red on the screenshot "Untitled2". The first primary key for each record would be "Wisconsin Severe Storms, Flooding, and Mudslides (DR-4141)", the second would be "New Hampshire Severe Storms, Flooding, and Landslides (DR-4139)", third would be "Florida Severe Storms and Flooding (DR-4138)", and so on.
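
If the full title ever gets reworded, the declaration number in parentheses might make a more stable key. A small Python sketch, assuming every title ends with a code of the form "(DR-4141)" as in the examples above (the function name `declaration_id` is made up for illustration):

```python
import re

# Matches a trailing code such as "(DR-4141)" at the end of a title.
_DECLARATION = re.compile(r"\(([A-Z]+-\d+)\)\s*$")


def declaration_id(title):
    """Return the declaration code from a disaster title, or None if absent."""
    match = _DECLARATION.search(title)
    return match.group(1) if match else None


titles = [
    "Wisconsin Severe Storms, Flooding, and Mudslides (DR-4141)",
    "New Hampshire Severe Storms, Flooding, and Landslides (DR-4139)",
    "Florida Severe Storms and Flooding (DR-4138)",
]

for title in titles:
    print(declaration_id(title), "|", title)
```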

 

Please let me know if you need anything else from me! Thanks!

Untitled3.jpg
Untitled4.jpg
Untitled2.jpg

Replied: 8/8/2013 9:08:47 PM
Please attach a few screenshots to clarify how to reach the 1st link, the 2nd hyperlink, and the fields you mentioned (name, dates, counties, etc.). I followed your instructions but was unable to find exactly what you need on the website. Thanks.

Which field(s) can serve as the primary key for checking whether an old record from the last run has changed?
Replied: 8/9/2013 7:22:50 PM
Please check the attached demo project.

You need to place the project file in the default projects folder, then run the project in the VWR program.
Fema.rip