Latest Inquiries - Data Extraction Software

Data Extraction (Email and Website) from Reach Local

Submitted: 10/14/2013

Hello, 

I am trying to capture the email ID and website link from the link below:

Link: http://business.reachlocal.com/search/weddings-personal-services/wedding-planning-wedding-services/CA/Encino/%2522Best_Universal_Transportation__Inc.%2522/1ee2d62/

I have tried to capture them, but I am getting the inner HTML code below:

<A onclick="print_popup('mailto:reservations@butlimo.com',1,1)" href="/search/weddings-personal-services/wedding-planning-wedding-services/CA/Encino/%2522Best_Universal_Transportation__Inc.%2522/1ee2d62/?evt=2"><IMG src="/images/sendmail.gif"></A>

For the email, I just want reservations@butlimo.com, without any extra fields.

I want the website link in the same way.

Please suggest what I should do.

Thanks,

D Son


Replied: 10/16/2013 5:42:38 PM
Basically, both elements need to be converted to HTML text, and then the correct value is parsed out with a Regex script. Alternatively, you can choose the 'Html' option in the Misc tab instead of checking the "Use Html as input string" option in the script editor.
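To illustrate the "parse out the correct value using Regex" step, here is a minimal sketch in Python rather than in the VWR script editor (whose script syntax is not shown in this thread). It runs against the inner HTML quoted in the question; the function name `extract_email` and the exact pattern are assumptions chosen for illustration, not the script from the attached project.

```python
import re

# Inner HTML from the original question (the long href is shortened here).
html = (
    '<A onclick="print_popup(\'mailto:reservations@butlimo.com\',1,1)" '
    'href="/search/...?evt=2"><IMG src="/images/sendmail.gif"></A>'
)

# Capture everything after "mailto:" up to the next quote or "?".
EMAIL_RE = re.compile(r"mailto:([^'\"?]+)", re.IGNORECASE)


def extract_email(inner_html):
    """Return the address that follows 'mailto:', or None if there is none."""
    match = EMAIL_RE.search(inner_html)
    return match.group(1) if match else None


print(extract_email(html))  # reservations@butlimo.com
```

The pattern itself should carry over to other regex engines with little or no change, since it only uses a literal prefix, a negated character class, and one capture group.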

F.Y.I:

Replied: 10/15/2013 5:57:52 PM
Please check the attached demo project.

Place the project file in the default projects folder, then run the project in the VWR program.

For both email and website, you need to use a Content Transformation Regex script to extract the value.
Reachlocal.rip

Replied: 10/16/2013 12:22:25 PM
Hello,

Is there any other way to get the HTML as the input string for the transformation script in Content Transformation without selecting the "Use HTML as input string" option?

Is there any other way to do that with regex? I am trying to understand how the regex functionality works.

Thanks

Replied: 10/16/2013 1:29:15 AM
Hello,

Thanks for your support. After checking your project, I have tried to create my project. I have created 4 elements to capture data.

1. Title
2. Phone
3. Email
4. Website

When I run the project, I get the Title and Phone, but not the email and website link.

Please check my project and suggest the changes required.

Thanks.

Regards,
Dhaval Soni

ReachlocalTRIAL-Dhaval.rip

Replied: 10/14/2013 5:24:15 PM
I'm unable to access the target URL that you gave; it gives me an error page:
Error
The page you have attempted to visit is temporarily unavailable. Please try again in a couple of minutes.

Please contact our support team at 1-877-318-2180 or email secops@reachlocal.com if you are still unable to access the page after a second attempt. You will be asked to provide the Event ID and Session ID below. We apologise for the inconvenience. 

The Event ID is: 8502016439936728680.
The Session ID is: N/A.

Maybe I need to sign in to the website first?

Replied: 10/16/2013 12:25:28 PM
When I select the link, I get the transformation script, but when I select the text, it is blank.

Please advise.

Replied: 10/15/2013 11:43:58 AM
Ok,

You can open that URL as follows:

1st: Go to http://business.reachlocal.com/
2nd: Click on "Personal Services (Weddings, Cleaners, etc."
3rd: Click on the 1st or 3rd record, which is "Best Universal Transportation, Inc.".
4th: On the detail page of "Best Universal Transportation, Inc.", you will see the "Send Email" and "View Website" buttons.



Replied: 10/16/2013 5:46:29 AM
See the attached new project.

I've checked "Use HTML as regex input" in the Content Transformation script editor for both email and website, so the script takes the HTML text as its input value and can produce the correct output value.
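Why the HTML has to be the input can be seen from the snippet in the question: the link wraps only an image, so its visible text is empty, while the address lives in the onclick attribute. The following Python sketch shows that difference using the standard-library HTML parser; the names `TextCollector` and `visible_text` are made up for this illustration and have nothing to do with VWR's internals.

```python
from html.parser import HTMLParser

# Inner HTML from the original question (the long href is shortened here).
html = (
    '<A onclick="print_popup(\'mailto:reservations@butlimo.com\',1,1)" '
    'href="/search/...?evt=2"><IMG src="/images/sendmail.gif"></A>'
)


class TextCollector(HTMLParser):
    """Collects only the text nodes, ignoring tags and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(inner_html):
    """Return the text content of an HTML fragment, stripped of whitespace."""
    parser = TextCollector()
    parser.feed(inner_html)
    parser.close()
    return "".join(parser.chunks).strip()


print(repr(visible_text(html)))  # '' (nothing for a regex to match)
print("mailto:" in html)         # True (the address is only in the markup)
```

A regex applied to the text value therefore has nothing to work with, which would match the blank result reported when selecting text instead of the link.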

For the website element, I've also revised the Regex script.
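The website link's markup is not quoted anywhere in this thread, so the following Python sketch uses a made-up stand-in: it assumes the "View Website" button calls the same print_popup function with an http URL as its first argument. Both the sample HTML (with the placeholder www.example.com) and the name `extract_website` are hypothetical; the real pattern has to be adjusted to whatever the page actually contains.

```python
import re

# HYPOTHETICAL markup: assumes the website button mirrors the email button.
sample_html = (
    '<A onclick="print_popup(\'http://www.example.com\',1,0)" '
    'href="/search/...?evt=3"><IMG src="/images/website.gif"></A>'
)

# Capture the first quoted http(s) URL passed to print_popup(...).
WEBSITE_RE = re.compile(r"print_popup\('(https?://[^']+)'", re.IGNORECASE)


def extract_website(inner_html):
    """Return the URL passed to print_popup, or None if the pattern is absent."""
    match = WEBSITE_RE.search(inner_html)
    return match.group(1) if match else None


print(extract_website(sample_html))  # http://www.example.com
```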

If you still cannot get the values, check the log file in the default log folder to see which specific URLs returned no email or website even though both exist on the page. Then activate the "Navigating in Browser mode" button in the toolbar of the VWR editor and adjust the Regex script until it extracts the email and website values.
ReachlocalTRIAL-Dhaval.rip