Improving a translator's performance by using Jsoup

in #utopian-io, 7 years ago (edited)

What Will I Learn?

Greetings. The purpose of this tutorial is to show how to gather data from Turkish, English, Polish and German dictionary sites by using the Jsoup library in your Java project. By doing so you can build your own data pool and present a unique output to your users. You will be able to index a website's data and analyze it. For example, you can take a table from Wikipedia, let the user run a search, then pull the matching data and display it in a user-friendly output panel.

Requirements

  • An IDE to test the code (preferably Eclipse IDE for Java Developers).
  • Basic knowledge of Java.
  • Basic knowledge of the Jsoup library.

Difficulty

This tutorial is prepared for individuals who have prior knowledge of Java classes, libraries and programming languages.

  • Intermediate

Tutorial Contents

In this tutorial we will pull our data from tureng and pl.pons and then process it according to our needs. There are many ways to index a web page in Java, but the fastest and most accurate one is to use the API of the desired page, if one is available. First we go to the page that we want to get data from. Then we find the div class that we want to pull, and after processing the data we get the output below.

43.png

Before starting, it is good to recall what we did in the previous tutorial, since this tutorial focuses on improving it and adding more features to it. By the end of this tutorial you will have learned to get data from several different websites, compare their results and limit the output according to the input (creating a feedback loop).

First, before writing our method, we must import the libraries that we are going to use in the project.

The first import we need is java.io.IOException, the exception that signals a failed input/output (I/O) operation. Jsoup throws it, for example, when the connection to a site fails, so our code has to declare or handle it,

import java.io.IOException;

We can then proceed to add the Jsoup classes, which fetch and parse the HTML of the desired sites and let us select elements from it,

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;


Now we add one last import, which helps us read the values entered by the user,

import java.util.Scanner;

Then we can declare our class

public class curric

You may name the class as you wish, but the name must match the file name. We then define our main method as public static void: public means the method is visible from anywhere, static means it belongs to the class itself rather than to an object, and void means it returns no value.

public static void main(String[] args) throws IOException

Now we point the program at the page that we want to index. Jsoup offers a call for each job: Jsoup.connect to connect to a site and fetch it, and Jsoup.parse to parse HTML that we already have.

Jsoup.connect(yourwebsite).get();

Now that we have declared everything and briefly summarized what we did in the previous tutorial, we can add a Polish dictionary to our translator. To add support for another language, or to pull data from any other website, the very first thing to do is check whether the site has an API; if it does not, we can use the URL to gather the results for the user's search. On the site we are going to use, the URL changes according to the user's input. To illustrate,

When there is no search the home page of the site is,

https://pl.pons.com/t%C5%82umaczenie

When a Polish word is searched, the URL becomes (for the sample Polish word 'dzień'),

https://pl.pons.com/t%C5%82umaczenie?q=dzien&l=enpl&in=pl&lf=pl

As can be seen, the query string 'q=dzien&l=enpl&in=pl&lf=pl' is appended to the URL after the user makes a search. So in our application we can ask the user to enter a word and then change the URL into,

 https://pl.pons.com/t%C5%82umaczenie?q='user-entered-word'&l=enpl&in=pl&lf=pl 
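The URL pattern above can be assembled in code. What follows is a minimal sketch rather than the original tutorial code: the class name PonsUrl and the method build are hypothetical, and it assumes Java 10 or later for the URLEncoder.encode(String, Charset) overload. Encoding the word is a precaution, since Polish letters such as ń and spaces are not safe in a raw query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PonsUrl {
    // Base address copied from the URL shown above.
    private static final String BASE = "https://pl.pons.com/t%C5%82umaczenie";

    // Hypothetical helper: builds the search URL for a user-entered word.
    public static String build(String word) {
        // Percent-encode the word so non-ASCII letters and spaces are URL-safe.
        String encoded = URLEncoder.encode(word, StandardCharsets.UTF_8);
        return BASE + "?q=" + encoded + "&l=enpl&in=pl&lf=pl";
    }

    public static void main(String[] args) {
        // "dzie\u0144" is the word 'dzień' written with a Unicode escape.
        System.out.println(build("dzie\u0144"));
    }
}
```

The resulting string can be passed directly to Jsoup.connect in place of a hand-concatenated link.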

Now that we know which address to request, we should declare the div class that our application should focus on and read the value from. Here we have a few options: we can pick the div-target, div-inner or temple classes. Since the other two contain values other than the desired word, the most suitable choice is the div-target class.

23.png

The picture above shows how to pick the desired tag. By picking the div-target tag, it is easy to get and display the results for the word the user entered. In your own design you can right-click the object you want to use and then inspect it to find the appropriate div tag.

Elements initialtable = doc.select("div.target");

This div.target selector returns every element on the page that matches it. Keep in mind that the output is sometimes unreadable because of the text encoding; in Eclipse you can change the text encoding of the project from its properties. Now that we have the elements, we have to arrange them into a user-friendly output. There are several ways to design it. You may want to add more text or data, so feel free to adjust the code according to your needs.
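To see what the selector does without touching the network, the same call can be tried on a small HTML string. This is only a sketch: the HTML below is made up for illustration and the real page layout may differ, and the class name TargetSelectDemo and method targetTexts are hypothetical. It uses Jsoup.parse, which parses HTML that is already in memory.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TargetSelectDemo {
    // Made-up HTML standing in for a results page (an assumption, not the real site).
    public static final String SAMPLE_HTML =
          "<div class=\"target\">first match</div>"
        + "<div class=\"target\">day</div>"
        + "<div class=\"inner\">ignored</div>"
        + "<div class=\"target\">daytime</div>";

    // Returns the text of every div.target element except the first one.
    public static List<String> targetTexts(String html) {
        Document doc = Jsoup.parse(html);
        Elements initialtable = doc.select("div.target");
        // Guard the removal so an empty result does not throw an exception.
        if (!initialtable.isEmpty()) {
            initialtable.remove(0);
        }
        List<String> texts = new ArrayList<>();
        for (Element d : initialtable) {
            texts.add(d.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        for (String text : targetTexts(SAMPLE_HTML)) {
            System.out.println(text);
        }
    }
}
```

Only elements whose class is target are returned, so the div with class inner never reaches the output.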

        initialtable.remove(0);
        String dr = "";
        System.out.println("");
        System.out.println("Polski lub Angielski ");
        System.out.println("---------------------------------------");

The code above prints a few header lines for a better user experience. Now we can proceed to get the elements and show them according to the user's search. To do that we read each element, convert it to a string and print it on its own line. A for loop converts the tag elements into text.

        int i = 0; // counts the results printed so far
        for (Element d : initialtable) {
            dr = d.text();
            i++;
            System.out.println(i + ". " + dr);
            if (i >= 5) {
                break; // stop after five results
            }
        }

The code above prints the first five Polish matches for the English word the user entered. The same procedure is repeated for German using another translation site: the URL http://termbank.com/tr/almanca-ingilizce/'your-searched-word' and the td.en.tm selector are used. Below are the overall code and sample outputs for the improved language support of the translator project.
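The output limit described above can be isolated from the scraping and checked on its own. The following is a small sketch using plain strings instead of Jsoup elements; the class name LimitDemo and the method numbered are hypothetical. Where the limit check sits relative to the print decides whether five or six lines appear, so here the count is checked before each line is added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LimitDemo {
    // Returns at most 'limit' entries, each prefixed with its 1-based position.
    public static List<String> numbered(List<String> results, int limit) {
        List<String> lines = new ArrayList<>();
        for (String r : results) {
            if (lines.size() >= limit) {
                break; // limit reached, ignore the remaining results
            }
            lines.add((lines.size() + 1) + ". " + r);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        for (String line : numbered(sample, 5)) {
            System.out.println(line);
        }
    }
}
```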

Overall code,

import java.io.IOException;
import java.util.Scanner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class curric {

    public static void main(String[] args) throws IOException {
        // A single Scanner reads both the language code and the word.
        Scanner keyboard = new Scanner(System.in);
        System.out.println("EN / TR / PL / DE ? ");
        String kang2 = keyboard.next();

        if (kang2.equals("EN")) {
            System.out.println("Which word do you want to translate?");
            String string = keyboard.next();
            String link = "http://tureng.com/tr/turkce-ingilizce/" + string;
            System.out.println("");
            printResults(link, "td.tr.ts", true, 10);
        } else if (kang2.equals("TR")) {
            System.out.println("Hangi türkçe kelimeyi çevirmek istiyorsunuz?");
            String string = keyboard.next();
            String link = "http://tureng.com/tr/turkce-ingilizce/" + string;
            System.out.println("");
            System.out.println("Aradığınız kelimenin ingilizce karşılığı");
            System.out.println("---------------------------------------");
            printResults(link, "td.en.tm", false, 10);
        } else if (kang2.equals("PL")) {
            System.out.println("Polski lub angielski?");
            String string = keyboard.next();
            String link = "https://pl.pons.com/t%C5%82umaczenie?q=" + string + "&l=enpl&in=&lf=pl";
            System.out.println("");
            System.out.println("Polski lub Angielski ");
            System.out.println("---------------------------------------");
            printResults(link, "div.target", true, 5);
        } else if (kang2.equals("DE")) {
            System.out.println("Welches Wort möchtest du übersetzen?");
            String string = keyboard.next();
            String link = "http://termbank.com/tr/almanca-ingilizce/" + string;
            System.out.println("");
            System.out.println(string + " nach Englisch ");
            System.out.println("---------------------------------------");
            printResults(link, "td.en.tm", true, 5);
        } else {
            System.out.println("EN - English, TR - Türkçe , PL - Polski , DE - Deutsch");
        }
    }

    // Fetches the page, selects the matching elements and prints at most 'limit' of them.
    static void printResults(String link, String selector, boolean skipFirst, int limit) throws IOException {
        Document doc = Jsoup.connect(link).get();
        Elements initialtable = doc.select(selector);
        // Skip the first match where requested; the guard avoids an exception on an empty result.
        if (skipFirst && !initialtable.isEmpty()) {
            initialtable.remove(0);
        }
        int i = 0;
        for (Element d : initialtable) {
            if (i >= limit) {
                break;
            }
            String dr = d.text();
            i++;
            System.out.println(i + ". " + dr);
        }
    }
}
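As more languages are added, the chain of else-if branches grows with each one. One possible alternative, sketched here under assumptions and not part of the original project, is to keep each source's URL prefix and selector in a lookup table. The class name SourceTable, the nested Source class and the method linkFor are hypothetical; the URLs and selectors are the ones used in the code above. The Polish source is left out because its URL also needs a suffix after the word.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SourceTable {
    // One dictionary source: the URL prefix and the CSS selector for its results.
    public static final class Source {
        public final String urlPrefix;
        public final String selector;

        Source(String urlPrefix, String selector) {
            this.urlPrefix = urlPrefix;
            this.selector = selector;
        }
    }

    // Language code mapped to its source, in the order the codes are offered.
    public static final Map<String, Source> SOURCES = new LinkedHashMap<>();
    static {
        SOURCES.put("EN", new Source("http://tureng.com/tr/turkce-ingilizce/", "td.tr.ts"));
        SOURCES.put("TR", new Source("http://tureng.com/tr/turkce-ingilizce/", "td.en.tm"));
        SOURCES.put("DE", new Source("http://termbank.com/tr/almanca-ingilizce/", "td.en.tm"));
    }

    // Returns the full search URL, or null when the language code is unknown.
    public static String linkFor(String code, String word) {
        Source s = SOURCES.get(code);
        return s == null ? null : s.urlPrefix + word;
    }

    public static void main(String[] args) {
        System.out.println(linkFor("DE", "Tag"));
        System.out.println(SOURCES.get("DE").selector);
    }
}
```

With a table like this, supporting a new site becomes one more SOURCES.put line instead of a new branch.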

For Polish word search,

1.png


2.png


For German word search,

23.png

For improved Turkish word search,

1.png

Curriculum



Posted on Utopian.io - Rewarding Open Source Contributors

