Latest Inquiries - Data Extraction Software

Selecting between HTML tags

Submitted: 7/4/2017

Hi,

Is there a way to select all data between HTML tags?

In the attached project, can I select all the data between the bold names as individual records?

Example (this would be one record, with multiple addresses for that record):


Al-Atrash, Dr. Wafa'a - 50245
120 Susie Lake Cr. #21-22
Bayers Lake Business Park
Halifax, NS  B3S 1C7
Tel: 902-450-5701  Fax: 902-450-5702
 
745 Sackville Drive
Lower Sackville, NS  B4E 2R2
 
3045 Robie Street
Halifax, NS  B3K 4P6
Tel: 902-454-2043  Fax:
 
6132 Quinpool Rd
Halifax, NS  B3L 1A3
Tel: 902-422-7835  Fax:
 
10-2 Cumberland Drive
Dartmouth, NS  B2V 2T6
Tel: 902-462-3847  Fax:
 
210 Chain Lake Drive
Halifax, NS  B4V 1B3
Tel: 902-450-5317  Fax:

Thanks.



OONS.rip

Replied: 7/12/2017 4:11:51 AM

Hi,

It is all done in the transformation scripts and the XPath. The first transformation removes all elements that come before the data, so that the second transformation can work on the actual data.

The second transformation does most of the work. It marks the names and the address lines before removing all HTML tags, then wraps each name and each address in a new HTML tag with attributes. Names and addresses get different attributes, which makes the data easy to capture with XPath.
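To illustrate the idea, here is a rough sketch in Python rather than in the agent's own transformation script (the actual script is in the attached agent and is not reproduced here). It assumes the source page puts each name in a <b> tag, separates lines with <br>, and separates addresses with a blank line. The function name retag, the @@NAME@@ marker, and the " | " joiner are made up for this example.

```python
import re
from html import escape, unescape

NAME_MARK = "@@NAME@@"  # hypothetical placeholder, not taken from the agent


def retag(fragment: str) -> str:
    """Mark names and lines, strip all tags, then re-wrap in DIVs."""
    # Step 1: mark bold names and turn <br> into real line breaks.
    marked = re.sub(r"(?is)<b>(.*?)</b>",
                    lambda m: "\n" + NAME_MARK + m.group(1) + "\n",
                    fragment)
    marked = re.sub(r"(?i)<br\s*/?>", "\n", marked)

    # Step 2: remove every remaining HTML tag and decode entities.
    text = unescape(re.sub(r"<[^>]+>", "", marked))

    # Step 3: rebuild the page with one DIV per name and one per address.
    out, block = [], []

    def flush():
        # Close the address collected so far, if there is one.
        if block:
            joined = escape(" | ".join(block), quote=False)
            out.append('<DIV id="eachline">' + joined + "</DIV>")
            block.clear()

    for raw in text.split("\n"):
        line = raw.replace("\xa0", " ").strip()
        if line.startswith(NAME_MARK):
            flush()
            name = escape(line[len(NAME_MARK):].strip(), quote=False)
            out.append('<DIV id="name">' + name + "</DIV>")
        elif line:
            block.append(line)
        else:
            flush()  # a blank line ends the current address
    flush()
    return "<BODY>" + "".join(out) + "</BODY>"


SAMPLE = ("<p><b>Al-Atrash, Dr. Wafa'a - 50245</b><br>"
          "120 Susie Lake Cr. #21-22<br>Halifax, NS  B3S 1C7<br>&nbsp;<br>"
          "745 Sackville Drive<br>Lower Sackville, NS  B4E 2R2</p>")

if __name__ == "__main__":
    print(retag(SAMPLE))
```

The point is only that, after this step, every name and every address sits in its own DIV with a distinguishing id, so a simple XPath can pick them apart.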

After the transformation, we can select the data properly. First, we select all addresses in the page area (XPath: //BODY/DIV[@id="eachline"]), capture them, and refine them using content transformation.

Since the name is not part of the address page area, we need a custom XPath for it:

preceding-sibling::DIV[@id="name"][1]

This says: look backward (preceding-sibling::), find a DIV with id "name" (DIV[@id="name"]), and capture only the first one encountered ([1]), which is the name nearest to the address. If you remove the "[1]", it will capture multiple names, because it looks backward and captures every name that comes before the address.
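If it helps to see the effect of "[1]" outside the tool, here is a small Python sketch. Python's standard xml.etree.ElementTree does not implement the preceding-sibling axis, so the backward walk is written out by hand. The second name and its address are invented sample data, and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Transformed page; the second record is hypothetical sample data.
TRANSFORMED = """<BODY>
<DIV id="name">Al-Atrash, Dr. Wafa'a - 50245</DIV>
<DIV id="eachline">120 Susie Lake Cr. #21-22</DIV>
<DIV id="eachline">745 Sackville Drive</DIV>
<DIV id="name">Hypothetical, Dr. Second - 00000</DIV>
<DIV id="eachline">1 Example Street</DIV>
</BODY>"""


def preceding_names(siblings, index, first_only=True):
    """Emulate preceding-sibling::DIV[@id="name"], with or without [1]."""
    found = []
    # Walk backward from the address, nearest sibling first.
    for sib in reversed(siblings[:index]):
        if sib.tag == "DIV" and sib.get("id") == "name":
            found.append(sib.text)
            if first_only:  # this is what "[1]" does
                break
    return found


def records(xml_text, first_only=True):
    """Pair every address DIV with the name(s) found before it."""
    siblings = list(ET.fromstring(xml_text))
    rows = []
    for i, el in enumerate(siblings):
        if el.tag == "DIV" and el.get("id") == "eachline":
            rows.append((preceding_names(siblings, i, first_only), el.text))
    return rows


if __name__ == "__main__":
    for names, address in records(TRANSFORMED):
        print(names, "->", address)
```

With first_only=True, each address is paired with exactly one name, the nearest one above it. With first_only=False, the last address picks up both names, which is the "multiple names" behaviour described above.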


Best regards,



Replied: 7/11/2017 5:21:04 PM

Thanks for your help. It works. Can you explain how you made it work?

Replied: 7/6/2017 6:17:05 AM

Hi,

Try the attached agent.

Best regards,


OONS.rip