Getting Started with JSOUP in Android

Damilola Omoyiwola
3 min read · Nov 22, 2017

Jsoup is a Java library for parsing HTML documents. It provides an API for extracting and manipulating data from a URL or an HTML document, using DOM traversal, CSS selectors and jQuery-like methods.

In this tutorial, you will learn the few steps needed to start parsing an HTML document in an Android application using Jsoup.

Update

Kindly note that this implementation might not be useful anymore. You can look for articles that explain this with the latest technologies. Thanks! :)

Example

This simple Android application shows details of Firebase, using Jsoup to parse the logo and title from the web page.

Let’s get started. Create a new Android project with an Empty Activity.

Step 1:

Add the Jsoup dependency (Maven coordinates org.jsoup:jsoup) to your app-level build.gradle file, since Jsoup is an external library.

Step 2:

Add the INTERNET permission (android.permission.INTERNET) to the AndroidManifest.xml file so the app can access the network.

Step 3:

Prepare a layout to display the data that will be fetched from the web page, for example an ImageView for the logo and a TextView for the title.

Step 4:

Go to your MainActivity.java class and, in the onCreate() method, initialize your views. Create an AsyncTask subclass that fetches the data in the background before it is displayed on the main thread.

Let me explain some lines of code and elements of Jsoup before calling them in the AsyncTask class.

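// The Firebase URL that the data will be fetched from
String url = "https://firebase.google.com/";
// Connect to the website and parse the response into a Document
// (get() throws IOException if the request fails)
Document document = Jsoup.connect(url).get();

// Get the first <img> element on the page, assumed here to be the logo
Element img = document.select("img").first();
if (img != null) {
    // Resolve the src attribute to an absolute URL
    String imgSrc = img.absUrl("src");
    // Download the image and decode it into a Bitmap; the stream is closed afterwards
    try (InputStream input = new java.net.URL(imgSrc).openStream()) {
        // bitmap is a field declared elsewhere in the class
        bitmap = BitmapFactory.decodeStream(input);
    }
}

// Get the title of the page (title is also a field declared elsewhere)
title = document.title();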

In the code above:

Document document = Jsoup.connect(url).get(): Jsoup.connect(url) creates a connection to the website, and get() fetches the page and parses it into a Document, the Jsoup object that represents the whole HTML page.

Element img = document.select("img").first(): selects every <img> element on the page and takes the first one, on the assumption that the logo is usually placed near the top of the markup. first() returns null if the page has no <img> at all.

String imgSrc = img.absUrl("src"): reads the src attribute of the <img> and returns it as an absolute URL, resolving a relative path against the address of the page.

InputStream input = new java.net.URL(imgSrc).openStream(): opens a stream to that URL, which downloads the logo.

bitmap = BitmapFactory.decodeStream(input): decodes the downloaded stream into the logo Bitmap.

title = document.title(): returns the text of the <title> element of the page.
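To make the absUrl("src") step more concrete: an absolute URL is the address of the page combined with whatever the src attribute holds. The sketch below illustrates that resolution with java.net.URI from the Java standard library rather than with Jsoup's own implementation. The class name, the resolve helper and the image paths are hypothetical and chosen only for illustration; they are not taken from the Firebase page.

```java
import java.net.URI;

class AbsUrlSketch {
    // Resolve a possibly relative src value against the URL of the page.
    // An src that is already absolute is returned unchanged.
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        String page = "https://firebase.google.com/";
        // Root-relative path (hypothetical): joined onto the host of the page
        System.out.println(resolve(page, "/images/logo.png"));
        // prints https://firebase.google.com/images/logo.png

        // Already absolute (hypothetical): left as it is
        System.out.println(resolve(page, "https://www.gstatic.com/logo.png"));
        // prints https://www.gstatic.com/logo.png
    }
}
```

In the app itself you would keep calling img.absUrl("src"), since Jsoup already knows the address the document was loaded from.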

These lines of code run in the background inside the doInBackground() method, and the results are displayed in the onPostExecute() method. The AsyncTask is then executed from the onCreate() method.
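The two-phase shape described here (do the slow work off the calling thread, then hand the result to a callback) can also be sketched in plain Java. The example below is an analogy built on CompletableFuture from java.util.concurrent, not AsyncTask itself, which was deprecated in Android API level 30. The names BackgroundFetchSketch, PageInfo and fetchAsync are hypothetical, and the Jsoup calls are replaced by a stand-in so that the sketch runs without a network or the Android SDK.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

class BackgroundFetchSketch {
    // Hypothetical holder for what the background step produces.
    static final class PageInfo {
        final String title;
        PageInfo(String title) { this.title = title; }
    }

    // Runs 'work' on a background thread (the doInBackground() role) and
    // returns a future that a callback can consume (the onPostExecute() role).
    static CompletableFuture<PageInfo> fetchAsync(Supplier<PageInfo> work) {
        return CompletableFuture.supplyAsync(work);
    }

    public static void main(String[] args) {
        // Stand-in for the Jsoup calls; a real app would fetch and parse the page here.
        Supplier<PageInfo> work = () -> new PageInfo("Firebase");

        fetchAsync(work)
            // On Android this callback would need to be posted to the main thread
            // before touching any views; that step is left out of this sketch.
            .thenAccept(info -> System.out.println("Title: " + info.title))
            .join(); // wait so the demo prints before the JVM exits
    }
}
```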

Step 5:

Running the application displays the Firebase logo and title fetched from the web page.

This brings us to the end of the tutorial. As shown, Jsoup provides a very convenient API for extracting and manipulating data.
