Latest Inquiries - Data Extraction Software

Problem with extracting data from google translate

Submitted: 11/15/2018
I need help.
I am having a problem with extracted data.
My project is to extract data from Google Translate results.
I have a web page in Chinese, which I translated into English using Google Translate.

I want to extract the data from the Google Translate URL of that page.
For example, I want to scrape data from https://www.piaotian.com/html/1/1705/index.html
and this is the translation URL:

https://translate.google.com/translate?hl=id&sl=auto&tl=en&u=https%3A%2F%2Fwww.piaotian.com%2Fhtml%2F1%2F1705%2Findex.html

So I use the URL above to extract the data.

The problem is with the scrape results.
The scraped data contains two languages:
one is Chinese (the original source) and the other is English (the translation).
How can I extract only the translated data (English) without the original data (Chinese)?
This problem appears on every website translated by Google Translate.

So, how do I extract the English data only?
Please modify my file...

Thank you


2.png
3.png
1.png
Translated site.rip

Replied: 11/16/2018 6:30:08 AM

Hi,

Please find attached an agent that extracts the translated content alone.

Regarding your regular expression, take for example:

<span><span>Author: Park Saenal (??? )</span> Penulis: Park Saenal (?? ?)</span>

If you are extracting data with a regular expression, make sure the pattern closes every HTML tag it opens. For example, when you extract data between the <span> and </span> tags, the pattern must end with the closing tag if it starts with the opening tag, i.e. use "<span>(.*)</span>".
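
To see what such a pattern captures on nested tags, here is a minimal Python sketch. It is an illustration only, not the contents of the attached agent: it assumes the sample line exactly as quoted in this thread, and the variable names are made up for the example.

```python
import re

# Sample line from this thread (tag attributes already stripped).
line = "<span><span>Author: Park Saenal (??? )</span> Penulis: Park Saenal (?? ?)</span>"

# Greedy: matches from the first <span> to the LAST </span> on the line,
# so the captured group still contains the inner <span>...</span> pair.
greedy = re.search(r"<span>(.*)</span>", line).group(1)

# No '<' allowed inside the group: matches only the innermost pair.
inner = re.search(r"<span>([^<]*)</span>", line).group(1)

print(greedy)
print(inner)
```

The greedy form returns the whole content of the outer span, both languages included, so a second step is still needed to drop the inner span.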

You can check the attached Transformation code snippet for your reference.


Best Regards,


RE.PNG
Translated site.rip

Replied: 11/16/2018 12:19:48 AM
Hi, how do I extract the text between the inner </span> and the outer </span>?

<span><span>Title: Overgeared (??)</span> Title: Overgeared (??)</span>
<span><span>Author: Park Saenal (??? )</span> Penulis: Park Saenal (?? ?)</span>
<span><span>Status: 911 chapters (Ongoing)</span> Status: 911 bab (Sedang berlangsung)</span>
<span><span>Translator: Rainbow Turtle</span> Penerjemah: Rainbow Turtle</span>
<span><span>Editors: LD and Superposhposh</span> Editor: LD dan Superposhposh</span>
<span><span>Schedule: 10 chapters a week</span> Jadwal: 10 bab seminggu</span>

That is, how do I extract the data that sits between the first </span> and the final </span> on each line?
For example:
<span><span>Author: Park Saenal (??? )</span> Penulis: Park Saenal (?? ?)</span>
<span><span>Status: 911 chapters (Ongoing)</span> Status: 911 bab (Sedang berlangsung)</span>
<span><span>Translator: Rainbow Turtle</span> Penerjemah: Rainbow Turtle</span>
<span><span>Editors: LD and Superposhposh</span> Editor: LD dan Superposhposh</span>
<span><span>Schedule: 10 chapters a week</span> Jadwal: 10 bab seminggu</span>

So the result should be:
Penulis: Park Saenal (?? ?)
Status: 911 bab (Sedang berlangsung)
Penerjemah: Rainbow Turtle
Editor: LD dan Superposhposh
Jadwal: 10 bab seminggu
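
For reference, this extraction can be sketched outside the tool in plain Python. This is only a rough sketch under assumptions: the markup is exactly the nested <span> structure shown above, and the names OuterSpanText and extract_outer_text are hypothetical helpers invented for the example. It uses the standard library HTML parser instead of a regular expression and keeps only text that sits directly inside the outer span.

```python
from html.parser import HTMLParser


class OuterSpanText(HTMLParser):
    """Collects text directly inside an outer <span>, skipping nested spans."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # current <span> nesting level
        self.parts = []     # text pieces seen at depth 1
        self.results = []   # one entry per outer span

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag != "span" or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:  # the outer span just closed
            text = "".join(self.parts).strip()
            if text:
                self.results.append(text)
            self.parts = []

    def handle_data(self, data):
        if self.depth == 1:  # inside the outer span, outside the inner one
            self.parts.append(data)


def extract_outer_text(html):
    parser = OuterSpanText()
    parser.feed(html)
    parser.close()
    return parser.results


sample = """<span><span>Author: Park Saenal (??? )</span> Penulis: Park Saenal (?? ?)</span>
<span><span>Status: 911 chapters (Ongoing)</span> Status: 911 bab (Sedang berlangsung)</span>
<span><span>Translator: Rainbow Turtle</span> Penerjemah: Rainbow Turtle</span>"""

for text in extract_outer_text(sample):
    print(text)
```

If the real page carries attributes on the spans, the depth counting still applies, since only the tag name is checked.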

thanks...
Replied: 11/15/2018 11:58:32 PM
Use this file if the above doesn't work.
Thanks
Translated site.rip