XML and Java

XML is a simple markup language, but learning how to interface with an XML file in Java is difficult (everything in Java is difficult).

First let’s look at the structure of an xml file. A file named myxml.xml could contain only a single line like this and be a valid xml file.

<title>mytitle</title>

Each name in angle brackets is a tag. An opening tag, its matching closing tag, and everything between them make up an element. Elements can be nested like…

<book><title>mytitle</title></book>

The nesting can be on separate lines for legibility; it doesn’t matter.

Within Java we need to read this file into memory. We start with a class called DocumentBuilderFactory. It is a factory: we don’t construct it directly, but ask it for an instance, set options on that instance, and then use it to make the parser that reads the file.

A line like…

DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();

will serve us well. We would have an object called f that we can use to set some parameters like whether we want to ignore comments within the file, but we’ll ignore those for now. Once the parameters are set we ask the factory for a DocumentBuilder, which is the parser that makes the actual DOM object we need. A line like…

DocumentBuilder b = f.newDocumentBuilder();

will give us the document builder called b. newDocumentBuilder() can throw a ParserConfigurationException, so it requires a try/catch statement; don’t forget that.

Now you would have an object called b. We use the parse method within DocumentBuilder to read the file, which gives us a Document, an actual thing we can make use of. Since b is an object it already contains that method and we can call it with a line like…

return b.parse(new InputSource(filename));

where filename would be the name of the xml file. Pretty complicated huh?

In code it would all look like this…

import javax.xml.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.text.*;
import javax.xml.parsers.*;

public class XmlReader {

public static void main(String[] args) {

Document d = getDocument("myxml.xml");

}

private static Document getDocument(String filename)
{
try{
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
f.setIgnoringComments(true);
f.setIgnoringElementContentWhitespace(true);
DocumentBuilder b = f.newDocumentBuilder();
return b.parse(new InputSource(filename));

}
catch (Exception e)
{

}
return null;
}
}
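One thing to watch in the listing above: the empty catch block hides any failure. If the file is missing or malformed, getDocument quietly returns null and the next call on d throws a NullPointerException. Here is a minimal sketch of a variant that reports what went wrong. It assumes we parse from an in-memory string so it runs without any file on disk; the names ParseSketch and parseXml are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseSketch {

    // Parses XML text into a DOM Document, or returns null after reporting the error.
    static Document parseXml(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            DocumentBuilder b = f.newDocumentBuilder();
            return b.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Say what failed instead of swallowing the exception.
            System.err.println("Could not parse XML: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        Document d = parseXml("<book><title>mytitle</title></book>");
        if (d != null) {
            System.out.println(d.getDocumentElement().getNodeName());
        }
    }
}
```

Checking for null before using the result keeps a parse failure from turning into a confusing crash a few lines later.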

Now we have a tree of nodes we can work with in our program. So how do we access this data? Well, we created an object called d, and it contains all the nodes made from the elements in the file. We have data types called Element and Node, so we can grab data out of our Document called d and assign it to our own variables like…

Element e = d.getDocumentElement();

Then to work with a node…

Node n = e.getFirstChild();

And we can display that first child node like…

System.out.println(n);

and advance through them in sibling mode like…

Node n2 = n.getNextSibling();

And so in code it appears like…
import javax.xml.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.text.*;
import javax.xml.parsers.*;

public class XmlReader {

public static void main(String[] args) {

Document d = getDocument("bible.xml");
Element e = d.getDocumentElement();
Node n = e.getFirstChild();
//e.getNextSibling();
System.out.println(n);
}

private static Document getDocument(String filename)
{
try{
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
f.setIgnoringComments(true);
f.setIgnoringElementContentWhitespace(true);
DocumentBuilder b = f.newDocumentBuilder();
return b.parse(new InputSource(filename));

}
catch (Exception e)
{

}
return null;
}
}

or more properly

import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class XmlReader {

    public static void main(String[] args) {
        
    Document d = getDocument("bible.xml");
    Element e = d.getDocumentElement();
    Node n = e.getFirstChild();
    n = n.getNextSibling();
    n = n.getLastChild();
    System.out.println(n);
    }

    private static Document getDocument(String filename)
    {
        try{
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setIgnoringComments(true);
            f.setIgnoringElementContentWhitespace(true);
            DocumentBuilder b = f.newDocumentBuilder();
            return b.parse(new InputSource(filename));
            
        }
        catch (Exception e)
        {
            
        }
        return null;
    }
}

At this point a little review would be good. Since we actually have to navigate the nodes, we’ll need to know how they work.

Let’s define some terms

XML – Extensible Markup Language and the format of the file we want to read

DTD – Document Type Definition – Info, within the XML file or in a separate file it points to, that defines its type. Unnecessary, but good coding practice. The XML declaration comes first, on the first line, in the format <?xml version="1.0" encoding="UTF-8"?>, and the DTD follows as a series of element declarations like <!ELEMENT Movies (Movie*)> etc. This gives you an idea of how to properly write the file, and tells the parser which elements are allowed and what each one may contain. pg 767 of Java for Dummies book.

DOM – Document Object Model – One way to process XML files. Reads the whole file into memory and stores it as a node tree allowing the nodes to be manipulated and the file rewritten if desired.

SAX – Simple API for XML – Another way to process XML files. Reads elements of an XML document from the file and reacts to them as they come. Less memory consumption.

DocumentBuilderFactory – an abstract Java class. You can’t create one with new; you call its static newInstance() method instead.

Document – The whole file held in memory as a tree of nodes. It contains methods like getDocumentElement, which returns an Element object that represents the document’s root element.

Node – Any item in the Document tree: an element, the text inside an element, a comment, and so on. By using methods like Node getFirstChild(); or String getNodeValue() you can access stuff inside the nodes. The syntax would look like… Node n = e.getFirstChild();
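For contrast with the DOM entries above, here is a minimal sketch of the SAX style, where the parser calls us back as each start tag is read instead of building a tree. The class name SaxSketch and the method elementNames are made up for illustration, and the XML is parsed from a string so the example stands alone.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {

    // Collects the name of every element in the order its start tag is read.
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    names.add(qName); // react to the element as it comes
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<book><title>mytitle</title></book>"));
    }
}
```

Nothing is kept in memory except what the handler chooses to store, which is where the lower memory use comes from.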

Using DocumentBuilderFactory… To make the object from it use….

DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();

Now you have a factory instance for setting parameters, and it contains a method called newDocumentBuilder to use as the final step after the parameters are set. These are common parameter lines…

f.setIgnoringComments(true);

f.setIgnoringElementContentWhitespace(true);

f.setValidating(true);

After doing that you can ask the factory for the builder like this…

DocumentBuilder b = f.newDocumentBuilder();

And now you have b as your actual Document Builder which can read XML files. Pain in the ass.
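One detail about the parameter lines above: according to the Javadoc, setIgnoringElementContentWhitespace only takes effect when the parser is in validating mode, because it needs a DTD to know which elements contain only other elements. The sketch below illustrates this under stated assumptions: a tiny made-up document, an internal DTD written for this example, and hypothetical names WhitespaceSketch and rootChildCount.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WhitespaceSketch {

    // The same two-verse document, without and with an internal DTD.
    static final String PLAIN =
        "<Bible>\n<v1>first</v1>\n<v2>second</v2>\n</Bible>";
    static final String WITH_DTD =
        "<!DOCTYPE Bible [\n"
        + "<!ELEMENT Bible (v1, v2)>\n"
        + "<!ELEMENT v1 (#PCDATA)>\n"
        + "<!ELEMENT v2 (#PCDATA)>\n"
        + "]>\n" + PLAIN;

    // Counts the child nodes of the root element, asking the parser to drop
    // ignorable whitespace and optionally turning validation on.
    static int rootChildCount(String xml, boolean validate) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(validate);
            f.setIgnoringElementContentWhitespace(true);
            DocumentBuilder b = f.newDocumentBuilder();
            Document d = b.parse(new InputSource(new StringReader(xml)));
            return d.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // No DTD, no validation: the line breaks stay in the tree as text nodes.
        System.out.println(rootChildCount(PLAIN, false));
        // DTD plus validation: only the two verse elements remain.
        System.out.println(rootChildCount(WITH_DTD, true));
    }
}
```

Without a DTD, the whitespace between tags stays in the tree as text nodes, which matters a great deal once we start stepping through siblings.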

To read an XML file with the builder use the line…

return b.parse(new InputSource(name));

where name is the filename in quotes in the proper directory structure like…

return b.parse(new InputSource("myxml.xml"));

or whatever. Or you can make a method to read files at will with lines like…

Document d = getDocument("myxml.xml");

But you have to use import javax.xml.parsers.*; import org.w3c.dom.*; import org.xml.sax.*; at the top of the file if you want access to the Document, DocumentBuilder, and InputSource data types, since they are not part of the core language the way primitive data types are.

Now in our samples to this point we read the node directly to the output screen and that produces a format like [#text: King James Bible] or whatever and that’s clunky. We want to read the actual string within the node and not display the data type most of the time. So we create a String data type variable and read the data and store it there and then display that like this…

String s = n.getNodeValue();
System.out.println(s);

The whole code would look something like…

import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class XmlReader {

    public static void main(String[] args) {
        
    Document d = getDocument("bible.xml");
    Element e = d.getDocumentElement();
    Node n = e.getFirstChild();
    String s = n.getNodeValue();
    System.out.println(s);
    //NodeList nl = n.getChildNodes();
    //System.out.println(nl);

    
    
    }

    private static Document getDocument(String filename)
    {
        try{
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setIgnoringComments(true);
            f.setIgnoringElementContentWhitespace(true);
            DocumentBuilder b = f.newDocumentBuilder();
            return b.parse(new InputSource(filename));
            
        }
        catch (Exception e)
        {
            
        }
        return null;
    }
}

And we are golden.
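A related detail is worth seeing in isolation: getNodeValue returns the string only when called on the text node itself. Called on an element it returns null, while getTextContent (also part of the standard Node interface) returns the text of everything inside the element. A small sketch, with the hypothetical names TextSketch and root, parsing from a string:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TextSketch {

    // Parses the XML text and returns its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element title = root("<title>mytitle</title>");
        // On the element itself the node value is null.
        System.out.println(title.getNodeValue());
        // On the text node inside it, the node value is the string.
        System.out.println(title.getFirstChild().getNodeValue());
        // getTextContent gathers the text without stepping into the child.
        System.out.println(title.getTextContent());
    }
}
```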

The problem is navigation. We can’t move around inside nodes all day. Since each node is basically a tree we can use the nodes to fill in a custom class. Let’s create a class called BibleVerse to make a Bible reading program. It is nested inside XmlReader, so we declare it private static, which lets the static main method use it. It holds four pieces of data: the book of the Bible in which the verse is found, the chapter, the verse, and the text itself. Those are declared as public variables within the private class. We could add get and set methods, or we can just make a constructor that fills them in. The class will look like this…

private static class BibleVerse
{
public String book;
public int chapter,verse;
public String text;

public BibleVerse(String book, int chapter, int verse, String text){
this.book = book;
this.chapter = chapter;
this.verse = verse;
this.text = text;
}
}

Our xml file is going to have one verse at this point. We could do it like this…

<Bible>
<vs>
<book>Genesis</book>
<chapter>1</chapter>
<verse>1</verse>
<text>In the beginning God created the heaven and the earth.</text>
</vs>
</Bible>
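As a rough sketch of how this verbose layout could feed the BibleVerse class, the example below looks up each child tag by name with getElementsByTagName and builds one object per <vs> element. The names VerseLoaderSketch, childText, and load are assumptions made for illustration, and the sample XML is embedded as a string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class VerseLoaderSketch {

    static final String SAMPLE =
        "<Bible>\n<vs>\n<book>Genesis</book>\n<chapter>1</chapter>\n<verse>1</verse>\n"
        + "<text>In the beginning God created the heaven and the earth.</text>\n</vs>\n</Bible>";

    static class BibleVerse {
        final String book;
        final int chapter, verse;
        final String text;

        BibleVerse(String book, int chapter, int verse, String text) {
            this.book = book;
            this.chapter = chapter;
            this.verse = verse;
            this.text = text;
        }
    }

    // Returns the trimmed text of the first descendant of parent with the given tag.
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Builds one BibleVerse per <vs> element in the document.
    static List<BibleVerse> load(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList all = d.getElementsByTagName("vs");
            List<BibleVerse> verses = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                Element vs = (Element) all.item(i);
                verses.add(new BibleVerse(
                        childText(vs, "book"),
                        Integer.parseInt(childText(vs, "chapter")),
                        Integer.parseInt(childText(vs, "verse")),
                        childText(vs, "text")));
            }
            return verses;
        } catch (Exception e) {
            throw new IllegalStateException("Could not load verses", e);
        }
    }

    public static void main(String[] args) {
        BibleVerse v = load(SAMPLE).get(0);
        System.out.println(v.book + " " + v.chapter + ":" + v.verse + " " + v.text);
    }
}
```

Looking tags up by name sidesteps the whitespace nodes between them entirely.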

But that’s rather verbose and the Bible is a long book. That would take forever. Since the order in which the verses appear never changes we can truncate it using a simple numerical tag like…

<Bible>
<vs number = "1">
<text>In the beginning God created the heaven and the earth.</text>
</vs>
</Bible>
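The value in a tag like number = "1" is an attribute, and the Element interface reads it with getAttribute. A minimal sketch, assuming the one-verse layout above and the made-up names AttributeSketch and firstVerseNumber:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeSketch {

    static final String SAMPLE =
        "<Bible>\n<vs number = \"1\">\n"
        + "<text>In the beginning God created the heaven and the earth.</text>\n"
        + "</vs>\n</Bible>";

    // Reads the number attribute of the first <vs> element as an int.
    static int firstVerseNumber(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element vs = (Element) d.getElementsByTagName("vs").item(0);
            // getAttribute returns the value as a String, so convert it.
            return Integer.parseInt(vs.getAttribute("number"));
        } catch (Exception e) {
            throw new IllegalStateException("Could not read attribute", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstVerseNumber(SAMPLE));
    }
}
```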

We can also put the chapter and verse number, separated by a colon, before the text so that it always displays, which makes future programming easier. That adds length to our file, but it’s an easy solution so we’ll do that. We could also include the book title, but that’s probably too much for our purposes so far.

<Bible>
<vs number = "1">
<text>1:1 In the beginning God created the heaven and the earth.</text>
</vs>
</Bible>

And now we can add other verses and in fact eliminate the verse number element as it will be unnecessary because of programming we’ll use. We’ll have a pretty efficient xml file after all…

<Bible>
<v1>1:1 In the beginning God created the heaven and the earth.</v1>
<v2>1:2 And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.</v2>
</Bible>

Now our program can read the first verse if we write it like this…

import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class XmlReader {

public static void main(String[] args) {

Document d = getDocument("bible.xml");
Element e = d.getDocumentElement();
//Node n = e.getFirstChild();
//String s = n.getNodeValue();

Node n = e.getFirstChild();
n = n.getNextSibling();
n = n.getFirstChild();
String s = n.getNodeValue();
System.out.println(s);

//n = n.getNextSibling();

//s = n.getNodeValue();
//System.out.println(s);
//NodeList nl = n.getChildNodes();
//System.out.println(nl);

}

private static Document getDocument(String filename)
{
try{
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
f.setIgnoringComments(true);
f.setIgnoringElementContentWhitespace(true);
DocumentBuilder b = f.newDocumentBuilder();
return b.parse(new InputSource(filename));

}
catch (Exception e)
{

}
return null;
}

private static class BibleVerse
{
public String book;
public int chapter,verse;
public String text;

public BibleVerse(String book, int chapter, int verse, String text){
this.book = book;
this.chapter = chapter;
this.verse = verse;
this.text = text;
}
}
}

The problem again is navigation. How do we read the next verse as well? The way it has to be done is a pain, as usual. DOM has no notion of a “next verse”; we can only step between parents, children, and siblings (getParentNode, getFirstChild, getNextSibling, getPreviousSibling). Each horizontal movement through the verses has to be accomplished by advancing through siblings, and the siblings are not only the verses. A closing tag like </v1> is not a node, but the blank space between </v1> and <v2> is a text node, so the children of <Bible> are: blank space, <v1>, blank space, <v2>, blank space. (setIgnoringElementContentWhitespace only strips that blank space when the parser validates against a DTD, and our file has none.) The text of a verse is a child of its verse element, so from <v1> we advance two siblings to reach <v2> and then drop into its child to get the text. If we keep advancing through siblings we eventually run off the end, getNextSibling returns null, and the next call on it throws a NullPointerException. In this case pretty soon since there are only two verses.
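To see exactly which siblings are in the way, a small sketch can walk the children of the root and label each one by its node type. The names ChildrenSketch and describeChildren are made up for this example, and the two-verse file is embedded as a string with shortened verse text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ChildrenSketch {

    static final String SAMPLE =
        "<Bible>\n<v1>1:1 first verse</v1>\n<v2>1:2 second verse</v2>\n</Bible>";

    // Labels each child of the root as an element (with its name) or a text node.
    static List<String> describeChildren(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            for (Node n = d.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    out.add("element " + n.getNodeName());
                } else if (n.getNodeType() == Node.TEXT_NODE) {
                    out.add("text"); // here: the line breaks between tags
                } else {
                    out.add("other");
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints [text, element v1, text, element v2, text]
        System.out.println(describeChildren(SAMPLE));
    }
}
```

The loop condition n != null is also the usual way to stop before running off the end of the siblings.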

Obviously with that much coding you’ll want to write a method to advance one or retreat one verse at a time and a method simply for displaying the current node’s text where you pass the node into the method and use it to display the text. Something like…

private static void showtext(Node n){
System.out.println("This is the current node's text. " + n);
}

Then you could call it at any time after the Node was declared and defined with showtext(n);

Starting again from the blank area before <v1> (the first child of <Bible>), we can reach the second verse like this…

n = n.getNextSibling();//gets to the v1 tag
n = n.getNextSibling();//gets to empty space between v1 and v2 tag sets
n = n.getNextSibling();//gets to the v2 tag
n = n.getFirstChild();//drops into the text inside the v2 tag set

Our code is cleaning up nicely…

import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class XmlReader {

public static void main(String[] args) {

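Document d = getDocument("bible.xml"); //the document itself
Element e = d.getDocumentElement();//the root element <Bible>

Node n = e.getFirstChild();//this is the blank area before v1

startatfirstverse(n);//the returned node is ignored, so n stays on the blank area
advanceoneverse(n);//starts from the blank area again and lands on the second verse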

}

private static void showtext(Node n){
System.out.println(n.getNodeValue());
}

private static Node startatfirstverse(Node n){
n = n.getNextSibling();//moves to the v1 tag
n = n.getFirstChild();
showtext(n);
return n;
}

private static Node advanceoneverse(Node n){
//n = n.getParentNode();
n = n.getNextSibling();
n = n.getNextSibling();
n = n.getNextSibling();
n = n.getFirstChild();
System.out.println(n.getNodeValue());
return n;
}
private static Document getDocument(String filename)
{
try{
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
f.setIgnoringComments(true);
f.setIgnoringElementContentWhitespace(true);
DocumentBuilder b = f.newDocumentBuilder();
return b.parse(new InputSource(filename));

}
catch (Exception e)
{

}
return null;
}

private static class BibleVerse
{
public String book;
public int chapter,verse;
public String text;

public BibleVerse(String book, int chapter, int verse, String text){
this.book = book;
this.chapter = chapter;
this.verse = verse;
this.text = text;
}
}
}

Now we need a method to back up one verse. DOM can step backward with getPreviousSibling, but we would still have to hop over the blank areas between verses every time. So storing the verses’ texts in an array would make it a lot easier to move back and forth, wouldn’t it? Of course, and that’s how it would be done normally when reading a text file. XML is really not all that great after all. It’s just a text file, right? It has lots of tags that increase file size, but the text is tagged and so easier to organize. And it is good to learn XML since Android and web applications rely on it, so let’s continue…
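For what it is worth, the Node interface also offers getPreviousSibling, so backing up one verse can be sketched directly on the tree by skipping anything that is not an element. The names BackSketch, previousElement, and textBefore are hypothetical, and the sample uses shortened verse text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class BackSketch {

    static final String SAMPLE =
        "<Bible>\n<v1>1:1 first</v1>\n<v2>1:2 second</v2>\n</Bible>";

    // Walks backward over siblings until it finds an element; null if there is none.
    static Element previousElement(Node n) {
        Node p = n.getPreviousSibling();
        while (p != null && p.getNodeType() != Node.ELEMENT_NODE) {
            p = p.getPreviousSibling(); // skip the whitespace text nodes
        }
        return (Element) p;
    }

    // Returns the text of the verse just before the element named tag, or null.
    static String textBefore(String xml, String tag) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node verse = d.getElementsByTagName(tag).item(0);
            Element prev = previousElement(verse);
            return prev == null ? null : prev.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textBefore(SAMPLE, "v2")); // the first verse
        System.out.println(textBefore(SAMPLE, "v1")); // null, nothing before it
    }
}
```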

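Now again the problem is navigation. While we can use our routine to advance once, it really doesn’t work more than once. Java hands the method its own copy of the node reference, and main ignores the node the method returns, so n in main never moves: every time we advance through the file we are starting over from the blank area before v1.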

The easiest solution is to control how many siblings we advance through at will. That means we’ll have to create a variable to pass into the advance routine and write the advance routine to use it to control how many times we advance through siblings. Each target sibling is 2 away from the last one so our method can be rewritten like this…

    private static Node advancexsiblings(Node n, int x){
        //the loop runs x-1 times, moving one sibling each time
        for(int i = 1; i < x; i++){
            n = n.getNextSibling();
        }
        n = n.getFirstChild();//drops into the text of the verse
        showtext(n);
        return n;
    }

and then we can call it like this to show the first, second, and third verses properly…

startatfirstverse(n);
advancexsiblings(n,4);
advancexsiblings(n,6);

It’s a bit of a clunky solution, but it works for now just as long as we use an even number and don’t extend past the end of the siblings. But it helps us understand what we need to do. We need to write a routine that keeps track of where we are in our reading, holds a number for our place, adds or subtracts by even numbers, and checks to make sure we don’t go past the end of the siblings available.
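One way to get place-keeping and the end-of-list check almost for free is to copy the verse texts out of the tree once and index into a list. This is a sketch of that alternative, not the sibling-counting routine described here; VerseListSketch, verses, and verseAt are names invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class VerseListSketch {

    static final String SAMPLE =
        "<Bible>\n<v1>1:1 first</v1>\n<v2>1:2 second</v2>\n</Bible>";

    // Copies the text of every element child of the root into a list, in order.
    static List<String> verses(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            for (Node n = d.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    out.add(n.getTextContent());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the verse at a zero-based place, or null when the place is out of range.
    static String verseAt(List<String> verses, int place) {
        return (place >= 0 && place < verses.size()) ? verses.get(place) : null;
    }

    public static void main(String[] args) {
        List<String> all = verses(SAMPLE);
        int ourplace = 0;
        System.out.println(verseAt(all, ourplace));     // first verse
        System.out.println(verseAt(all, ourplace + 1)); // forward one
        System.out.println(verseAt(all, ourplace + 2)); // null, past the end
    }
}
```

With a list, moving forward or backward is just adding or subtracting one, and the bounds check replaces the worry about running out of siblings.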

We’ll need a variable to store our place as a number. int ourplace = 0; should do it. That’s our starting number. We need our method to advance this number every time a new verse is read. This variable must exist outside any method so every method can see it, and it must be declared static. This code will now display four verses, provided bible.xml contains at least four.

import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class XmlReader {
public static int ourplace = 0;

    public static void main(String[] args) {
        
    Document d = getDocument("bible.xml"); //the document itself
    Element e = d.getDocumentElement();//the root element <Bible>
    
    Node n = e.getFirstChild();//this is the blank area before v1
    
    startatfirstverse(n);
        
    advancexsiblings(n);
    advancexsiblings(n);
    advancexsiblings(n);
                
    }
    
    private static void showtext(Node n){
        System.out.println(n.getNodeValue());
    }

    private static Node startatfirstverse(Node n){
        n = n.getNextSibling();//moves to the v1 tag
        n = n.getFirstChild();
        showtext(n);
        addtoourplace();
        return n;
    }
    private static void addtoourplace(){
        ourplace = ourplace + 2;
    }
    
    private static Node advancexsiblings(Node n){
        int x = ourplace + 2;
        for(int i = 1; i <x; i++){
        n = n.getNextSibling();}
        
        n = n.getFirstChild();
        showtext(n);
        addtoourplace();
        return n;
    }
    private static Document getDocument(String filename)
    {
        try{
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setIgnoringComments(true);
            f.setIgnoringElementContentWhitespace(true);
            DocumentBuilder b = f.newDocumentBuilder();
            return b.parse(new InputSource(filename));
            
        }
        catch (Exception e)
        {
            
        }
        return null;
    }
    
    private static class BibleVerse
    {
        public String book;
        public int chapter,verse;
        public String text;
        
        public BibleVerse(String book, int chapter, int verse, String text){
            this.book = book;
            this.chapter = chapter;
            this.verse = verse;
            this.text = text;
        }
    }
}

But it is still clunky. We need to display maybe 10 verses at all times and we have to check to make sure we don’t overrun the end of the chapter. There are a lot of ways to do this, but rewriting our xml file to include book and chapter divisions will go a long way in helping us out.

A rewritten bible.xml file now named bible2.xml would look like this…

<Bible>
<Book title="Genesis">
<C1>
<v1>1:1 In the beginning God created the heaven and the earth.</v1>
<v2>1:2 And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.</v2>
<v3>1:3 And God said, Let there be light: and there was light.</v3>
<v4>1:4 And God saw the light, that it was good: and God divided the light from the darkness.</v4>
</C1>
</Book>

<Book title="Exodus">
<C1>
</C1>
</Book>

</Bible>

And we can advance through it manually like this…

Document d = getDocument("bible2.xml"); //the document itself
Element e = d.getDocumentElement();//the root element <Bible>
System.out.println(e.getNodeName());//shows Bible tag

Node n = e.getFirstChild();//the blank area inside Bible, before the first Book tag
//System.out.println(n.getNodeName());//no reason to show comment out

n = n.getNextSibling();//first Book tag
System.out.println(n.getNodeName());//shows the Book tag

n = n.getFirstChild();//advances to the text area in the Book tag which is blank
//System.out.println(n.getNodeName());//no reason to show blank area comment out

n = n.getNextSibling();//advances to the tag of the first chapter
System.out.println(n.getNodeName());//shows chapter tag

n = n.getFirstChild();//advances to blank area of text in chapter tag
//System.out.println(n.getNodeName());//no reason to show comment out

n = n.getNextSibling();//advances to the first verse tag
System.out.println(n.getNodeName());//shows the verse tag

n = n.getFirstChild();//advances to the text of the verse
System.out.println(n.getNodeValue());//shows the text of the verse
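Every second step in that walk exists only to hop over a blank text node. A small helper that returns the first child that is actually an element removes those hops. The sketch below assumes the Bible, Book, chapter, verse nesting of bible2.xml; ElementNavSketch, firstElementChild, and firstVerse are names made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ElementNavSketch {

    static final String SAMPLE =
        "<Bible>\n<Book title=\"Genesis\">\n<C1>\n"
        + "<v1>1:1 In the beginning God created the heaven and the earth.</v1>\n"
        + "</C1>\n</Book>\n</Bible>";

    // Returns the first child of parent that is an element, or null if there is none.
    static Element firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    // Descends Bible -> first Book -> first chapter -> first verse and returns its text.
    static String firstVerse(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element book = firstElementChild(d.getDocumentElement());
            Element chapter = firstElementChild(book);
            Element verse = firstElementChild(chapter);
            return verse.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstVerse(SAMPLE));
    }
}
```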

Now we are getting somewhere. We can refine the code to…

import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class XmlReader {
public static int ourplace = 0;

    public static void main(String[] args) {
        
    Document d = getDocument("bible2.xml"); //the document itself
    Element e = d.getDocumentElement();//the root element <Bible>
    //System.out.println(e.getNodeName());//shows Bible tag
    
    Node n = e.getFirstChild();//the blank area inside Bible, before the first Book tag
    //System.out.println(n.getNodeName());//no reason to show comment out
    
    advancetofirstverse(n);
    displaynextverse(n);
    displaynextverse(n);
    displaynextverse(n);
    
    
                
    }
    private static void displaynextverse(Node n) {
        
        
        n = n.getNextSibling();//first Book tag
        //System.out.println(n.getNodeName());//shows the Book tag
        
        n = n.getFirstChild();//advances to the text area in the Book tag which is blank
        //System.out.println(n.getNodeName());//no reason to show blank area comment out
        
        n = n.getNextSibling();//advances to the tag of the first chapter
        //System.out.println(n.getNodeName());//shows chapter tag
        
        n = n.getFirstChild();//advances to blank area of text in chapter tag
        //System.out.println(n.getNodeName());//no reason to show comment out
        
        n = n.getNextSibling();//advances to the first verse tag
        //System.out.println(n.getNodeName());//shows the verse tag
        
        int x = ourplace + 2; //this advances by 2 per count at where our place is
        for(int i = 1; i <x; i++){
        n = n.getNextSibling();}
                
        n = n.getFirstChild();//advances to the text of the verse
        //n = n.getFirstChild();
        showtext(n);
        advanceourplace();
        
    }
    
    private static void advanceourplace(){
         ourplace = ourplace + 2;
    }
    
    
    private static void advancetofirstverse(Node n){
        n = n.getNextSibling();//first Book tag
        //System.out.println(n.getNodeName());//shows the Book tag
        
        n = n.getFirstChild();//advances to the text area in the Book tag which is blank
        //System.out.println(n.getNodeName());//no reason to show blank area comment out
        
        n = n.getNextSibling();//advances to the tag of the first chapter
        //System.out.println(n.getNodeName());//shows chapter tag
        
        n = n.getFirstChild();//advances to blank area of text in chapter tag
        //System.out.println(n.getNodeName());//no reason to show comment out
        
        n = n.getNextSibling();//advances to the first verse tag
        //System.out.println(n.getNodeName());//shows the verse tag
        
        n = n.getFirstChild();//advances to the text of the verse
        showtext(n);//shows the text of the verse
        ourplace = ourplace +1;
    }
    
    private static void showtext(Node n){
        System.out.println(n.getNodeValue());
    }

    
    
    private static Document getDocument(String filename)
    {
        try{
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setIgnoringComments(true);
            f.setIgnoringElementContentWhitespace(true);
            DocumentBuilder b = f.newDocumentBuilder();
            return b.parse(new InputSource(filename));
            
        }
        catch (Exception e)
        {
            
        }
        return null;
    }
        
}

Much more elegant and less cluttered. But let’s write it so you can specify which verse you want to display, by way of a method that checks whether the first verse is being asked for and whether the verse number is even or odd, and advances to the proper node for each case. (As it turns out, the even and odd cases take exactly the same steps: two siblings per verse, x - 1 times.)

import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class XmlReader {
public static int ourplace = 0;

    public static void main(String[] args) {
        
    Document d = getDocument("bible2.xml"); //the document itself
    Element e = d.getDocumentElement();//the root element <Bible>
    //System.out.println(e.getNodeName());//shows Bible tag
    
    Node n = e.getFirstChild();//the blank area inside Bible, before the first Book tag
    //System.out.println(n.getNodeName());//no reason to show comment out
    
    
    displayversebynumber(n,4);//n passes the current node in, the integer is the
    //verse you want to display
    
    
                
    }
    
    
    
    private static void displayversebynumber(Node n, int x) {
        n = n.getNextSibling();//first Book tag
        //System.out.println(n.getNodeName());//shows the Book tag
        
        n = n.getFirstChild();//advances to the text area in the Book tag which is blank
        //System.out.println(n.getNodeName());//no reason to show blank area comment out
        
        n = n.getNextSibling();//advances to the tag of the first chapter
        //System.out.println(n.getNodeName());//shows chapter tag
        
        n = n.getFirstChild();//advances to blank area of text in chapter tag
        //System.out.println(n.getNodeName());//no reason to show comment out
        
        n = n.getNextSibling();//advances to the first verse tag
        //System.out.println(n.getNodeName());//shows the verse tag
        
        //each verse tag is 2 siblings past the previous one (a blank area sits between them)
        
        if (x == 1) {
            //already on the first verse tag, nothing to advance
            n = n.getFirstChild();//advances to the text of the verse
            showtext(n);
        } else if ((x % 2) == 0) {
            for (int i = 1; i < x; i++) {
                n = n.getNextSibling();
                n = n.getNextSibling();
            }
            n = n.getFirstChild();//advances to the text of the verse
            showtext(n);
        } else {
            //odd verse numbers other than 1 take the same steps as the even ones
            for (int i = 1; i < x; i++) {
                n = n.getNextSibling();
                n = n.getNextSibling();
            }
            n = n.getFirstChild();//advances to the text of the verse
            showtext(n);
        }
    }
        
        
    

    private static void showtext(Node n){
        System.out.println(n.getNodeValue());
    }

    
    
    private static Document getDocument(String filename)
    {
        try{
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setIgnoringComments(true);
            f.setIgnoringElementContentWhitespace(true);
            DocumentBuilder b = f.newDocumentBuilder();
            return b.parse(new InputSource(filename));
            
        }
        catch (Exception e)
        {
            
        }
        return null;
    }
        
}
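As an alternative to counting siblings at all, the standard javax.xml.xpath package can address a verse by book, chapter, and verse number in one expression. This is only a sketch under assumptions: it relies on the Book/C1/v1 naming used in bible2.xml, the names XPathSketch and verse are invented here, and it assumes the book title contains no apostrophe.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathSketch {

    static final String SAMPLE =
        "<Bible>\n<Book title=\"Genesis\">\n<C1>\n"
        + "<v1>1:1 In the beginning God created the heaven and the earth.</v1>\n"
        + "<v3>1:3 And God said, Let there be light: and there was light.</v3>\n"
        + "</C1>\n</Book>\n<Book title=\"Exodus\">\n<C1>\n</C1>\n</Book>\n</Bible>";

    // Returns the text of the requested verse, or null if no such verse exists.
    static String verse(String xml, String book, int chapter, int verse) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // Build a path such as /Bible/Book[@title='Genesis']/C1/v3
            String path = "/Bible/Book[@title='" + book + "']/C" + chapter + "/v" + verse;
            String text = xp.evaluate(path, d); // empty string when nothing matches
            return text.isEmpty() ? null : text;
        } catch (Exception e) {
            throw new IllegalStateException("Could not look up verse", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(verse(SAMPLE, "Genesis", 1, 3));
        System.out.println(verse(SAMPLE, "Exodus", 1, 1)); // null, that chapter is empty
    }
}
```

Because the expression names the tags it wants, the blank areas between them never come into play and there is no end of the siblings to run past.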