It's all in the transformation script and the XPath. The first transformation removes every element before the data, which lets the second transformation work on the actual data.
The second transformation does most of the work. It marks the names and the address lines before removing all HTML tags, then wraps each name and each address line in a new HTML tag with its own attribute. Names and address lines get different attributes, which makes the data easy to capture with XPath.
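To make the mark, strip, and wrap sequence concrete, here is a minimal Python sketch. It is not the actual transformation script: the input markup (names in `<b>`, lines separated by `<br>`), the placeholder tokens, and the function name `mark_and_wrap` are all assumptions made for illustration. Only the resulting `id` values ("name" and "eachline") come from the description above.

```python
import re

# Hypothetical input: each name sits in <b>, address lines are separated by <br>.
RAW = ("<p><b>Jane Doe</b><br>12 High St<br>Springfield</p>"
       "<p><b>John Roe</b><br>5 Low Rd</p>")

def mark_and_wrap(html):
    # 1. Mark names and line ends with tokens that survive tag stripping.
    marked = re.sub(r"<b>(.*?)</b>", r"[[NAME]]\1[[EOL]]", html, flags=re.I | re.S)
    marked = re.sub(r"<br\s*/?>|</p>", "[[EOL]]", marked, flags=re.I)
    # 2. Remove every remaining HTML tag.
    text = re.sub(r"<[^>]+>", "", marked)
    # 3. Wrap each marked piece in a DIV; names and lines get different ids.
    out = []
    for chunk in text.split("[[EOL]]"):
        chunk = chunk.strip()
        if not chunk:
            continue
        if chunk.startswith("[[NAME]]"):
            out.append('<DIV id="name">%s</DIV>' % chunk[len("[[NAME]]"):].strip())
        else:
            out.append('<DIV id="eachline">%s</DIV>' % chunk)
    return "<BODY>" + "".join(out) + "</BODY>"

print(mark_and_wrap(RAW))
```

The output is a flat list of sibling DIVs, which is what makes the simple XPath selections possible afterwards.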
After the transformation, we can select properly. First, we select all address lines (XPath: //BODY/DIV[@id="eachline"]) in the pagearea, capture the address, and refine it using content transformation.
Since the name is not part of the address pagearea, we need a custom XPath for this:
which says: look backward (preceding-sibling::), find a div with id "name" (DIV[@id="name"]), and capture only the first one encountered (). Try removing the "" and it will capture multiple names, because it then looks backward and captures every name. So the "" makes it capture only the first name it encounters.
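The backward lookup can be illustrated with a short Python sketch. Python's standard `xml.etree.ElementTree` supports only a subset of XPath and has no `preceding-sibling` axis, so the walk is emulated by hand here. The sample page, the helper `names_before`, and its `first_only` flag are hypothetical. The flag stands in for a trailing positional predicate such as `[1]`, which is an assumption, since the exact expression is not shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical page as it might look after the second transformation.
PAGE = ('<BODY><DIV id="name">Jane Doe</DIV><DIV id="eachline">12 High St</DIV>'
        '<DIV id="eachline">Springfield</DIV><DIV id="name">John Roe</DIV>'
        '<DIV id="eachline">5 Low Rd</DIV></BODY>')

def names_before(body, node, first_only=True):
    """Emulate preceding-sibling::DIV[@id="name"], nearest sibling first."""
    siblings = list(body)
    idx = siblings.index(node)
    # Walk backward from the node, keeping only the name DIVs.
    found = [s.text for s in reversed(siblings[:idx])
             if s.tag == "DIV" and s.get("id") == "name"]
    # first_only plays the role of the assumed positional predicate.
    return found[:1] if first_only else found

body = ET.fromstring(PAGE)
lines = body.findall("./DIV[@id='eachline']")  # all address lines
last_line = lines[-1]

print(names_before(body, last_line))                    # nearest name only
print(names_before(body, last_line, first_only=False))  # every earlier name
```

With the restriction in place the last address line is paired with its own name only. Without it, every name that appears earlier on the page is returned as well.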