RSS Reader Tutorial

XTOM Basics

First, let's introduce the main components of the XTOM Parser. There are only three classes that you, as a programmer, will have to work with:

  • Parser
  • XMLTree
  • Element

Parser - this class is responsible for parsing the XML input stream or string into an XMLTree.

XMLTree - this class holds the reference to the root element. It does not do much else right now, but this might change in the future.

Element - this is the most important class of XTOM. An Element represents a node in an XML document and does all of the work behind navigating the tree of elements. A programmer will do most of his or her work using objects of this class.

RSS Reader Implementation

This example RSS Reader consists of four files. Three of them, namely Feed, Channel, and Item, are just data holders. RSSReader performs all of the parsing and data retrieval work, and it is the class we will examine in this tutorial. For future reference, review the structure of Feed, Channel, and Item before proceeding further. The RSSReader class can be found here.
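The source of the three data holders is not reproduced in this tutorial. As a rough orientation, the sketch below shows what they might look like. Only the class names and the constructor and setter names are taken from the RSSReader code discussed later; the field types, the getters, the reduced set of properties, and the DataHolderSketch class are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical data holders, reduced to a few properties each.
class Item {
    private final String title;
    private String description;
    private boolean permaLink;

    Item(String title) { this.title = title; }

    String getTitle() { return title; }
    String getDescription() { return description; }
    void setDescription(String description) { this.description = description; }
    boolean isPermaLink() { return permaLink; }
    void setPermaLink(boolean permaLink) { this.permaLink = permaLink; }
}

class Channel {
    private final String title;
    private String link;
    private List<Item> items = new ArrayList<>();

    Channel(String title) { this.title = title; }

    String getTitle() { return title; }
    String getLink() { return link; }
    void setLink(String link) { this.link = link; }
    List<Item> getItems() { return items; }
    void setItems(List<Item> items) { this.items = items; }
}

class Feed {
    private String version;
    private Channel channel;

    String getVersion() { return version; }
    void setVersion(String version) { this.version = version; }
    Channel getChannel() { return channel; }
    void setChannel(Channel channel) { this.channel = channel; }
}

class DataHolderSketch {
    public static void main(String[] args) {
        // Build the object graph by hand: Feed -> Channel -> Items.
        Item item = new Item("First post");
        item.setDescription("Hello");
        item.setPermaLink(true);

        List<Item> items = new ArrayList<>();
        items.add(item);

        Channel channel = new Channel("Example channel");
        channel.setItems(items);

        Feed feed = new Feed();
        feed.setVersion("2.0");
        feed.setChannel(channel);

        System.out.println(feed.getChannel().getTitle() + ": "
                + feed.getChannel().getItems().size() + " item(s)");
    }
}
```

The point to take away is the nesting: a Feed holds one Channel, and a Channel holds a list of Items. RSSReader fills this structure from the XML tree.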

The RSSReader has two parts: the first reads the RSS feed from the web, and the second parses the feed. Reading the RSS feed is simple:

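// imports needed by this fragment
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

String xml = "";

/**
 * Reads the RSS feed from the given URL.
 * @param url The URL of the RSS feed.
 */
public RSSReader(URL url) {
  if (url == null) throw new IllegalArgumentException("URL cannot be NULL");

  StringBuilder buffer = new StringBuilder();
  try {
    URLConnection conn = url.openConnection();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
      String inputLine;

      // readLine() drops the line terminator, so put one back to keep
      // tokens on adjacent lines from running together
      while ((inputLine = in.readLine()) != null)
        buffer.append(inputLine).append('\n');
    }
    xml = buffer.toString();
  } catch (Exception e) {
    e.printStackTrace();
    // keep the original exception as the cause
    throw new IllegalStateException(e.getMessage(), e);
  }
}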

The above segment of code opens a URL connection and reads the data into xml, a String that is a class-level variable.
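The reading loop itself does not depend on the network, so it can be tried out offline. The sketch below pulls it into a helper that accepts any Reader; the names ReadAllSketch and readAll are hypothetical and not part of the RSSReader source. Because BufferedReader.readLine() returns each line without its line terminator, the sketch appends a newline after every line so that text on adjacent lines stays separated.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadAllSketch {
    // Hypothetical helper: drains a Reader into a String, one line at a time.
    static String readAll(Reader source) {
        StringBuilder buffer = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // readLine() strips the line terminator, so add one back.
                buffer.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // A StringReader stands in for the stream returned by the URL connection.
        String xml = readAll(new StringReader("<rss\n version=\"2.0\">\n</rss>"));
        System.out.print(xml);
    }
}
```

In this input the attribute sits on its own line. Without the re-added newline, the tag name and the attribute would be joined with only the attribute's leading space between them, and with no separator at all if that space were missing.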

Next, the RSSReader implements the parsing of the feed in the parse() method:

// create new XTOM Parser
Parser p = new Parser(xml);

// parse the XML and create an XMLTree.
XMLTree tree = p.parse();

// now check if the root element is <rss>
Element root = tree.getRootElement();

// equals() compares the text; != would only compare object references
if (!"rss".equals(root.getName()))
  throw new RuntimeException("The root element was not <rss>. Can't parse this RSS Feed");

// now we are ready to parse the rss feed by calling the parseFeed method.
return parseFeed(root);


The parse() method does three things. First, it creates a new Parser for the XML data. Then the actual parsing of the XML is done by calling the parse() method of the Parser object, which returns an instance of the XMLTree class. Finally, it checks that the XML data is actually an RSS feed.
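A note on the root-element check: in Java, the == and != operators compare object references when applied to Strings, while equals() compares the characters. A name produced by a parser at run time is generally a different object from a string literal, even when the text is identical. The sketch below demonstrates the difference; RootNameCheckSketch and isRssRoot are hypothetical names used only for this illustration.

```java
class RootNameCheckSketch {
    // Hypothetical helper mirroring the root-element check in parse().
    static boolean isRssRoot(String rootName) {
        // Calling equals() on the literal also copes with a null name.
        return "rss".equals(rootName);
    }

    public static void main(String[] args) {
        // A name built at run time is a distinct object from the literal "rss".
        String parsedName = new String("rss");

        System.out.println(parsedName == "rss");    // false: different objects
        System.out.println(isRssRoot(parsedName));  // true: same characters
    }
}
```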

The last statement calls the parseFeed method, passing it the reference to the root element. The important thing to keep in mind here is that the XMLTree only holds the reference to the root element of the XML tree.

Next, we navigate the XML element tree using the methods 'getElementByPath' and 'getElementsByPath', populating the Feed, Channel, and Item instances along the way. This is done in the parseFeed(rootElement) method:

// create a new Feed instance
Feed feed = new Feed();

if (root.getAttribute("version") != null)
  feed.setVersion(root.getAttribute("version").getValue());

// get the channel element
Element ch = root.getElementByPath("channel");

// get the title element of the Channel,
// get the value and set the title of the Channel.
Channel channel = new Channel(root.getElementByPath("channel/title").getValue());

First, a new instance of Feed is created. Then, using the getAttribute(name) method of the Element class, we try to extract the version attribute from the root Element. If the attribute is not present, the version is simply left unset.

Next, we get the Element corresponding to the 'channel' node of the RSS feed by calling the getElementByPath("path/to/element") method of the root Element. In this case we only need to specify the name 'channel'. Remember: the path is always relative to the element you are calling the getElementByPath method from.

Once we have the Element for the 'channel' node, we can start filling in the Channel and Item instances for the feed. First we create the Channel instance and give it a title, extracted from the 'title' node inside the 'channel' node. We do this by calling getElementByPath("channel/title") on the root element. The path specified is relative to the root element.
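To make the idea of relative paths concrete, here is a minimal sketch of how a slash-separated path can be resolved against a tree. The Node class and its findByPath method are stand-ins written for this illustration; they are not XTOM's Element class, and returning null for a missing path is a choice made in this sketch, not documented XTOM behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for illustration only; this is not XTOM's Element class.
class Node {
    final String name;
    final String value;
    final List<Node> children = new ArrayList<>();

    Node(String name, String value) {
        this.name = name;
        this.value = value;
    }

    // Adds a child and returns this node so calls can be chained.
    Node add(Node child) {
        children.add(child);
        return this;
    }

    // Resolves a slash-separated path, starting from this node's children.
    Node findByPath(String path) {
        Node current = this;
        for (String step : path.split("/")) {
            Node next = null;
            for (Node child : current.children) {
                if (child.name.equals(step)) {
                    next = child;
                    break;
                }
            }
            if (next == null) return null; // no child with that name
            current = next;
        }
        return current;
    }
}

class RelativePathSketch {
    public static void main(String[] args) {
        Node channel = new Node("channel", null).add(new Node("title", "Example feed"));
        Node rss = new Node("rss", null).add(channel);

        // The same node, reached from two different starting points.
        System.out.println(rss.findByPath("channel/title").value);
        System.out.println(channel.findByPath("title").value);

        // From the channel node, "channel/title" does not resolve.
        System.out.println(channel.findByPath("channel/title"));
    }
}
```

The first two lookups reach the same title node, which is why the tutorial can use "channel/title" on the root element and plain names such as "copyright" on the channel element.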

Next we extract the data for the Channel instance:

// populate the RSS Channel
channel.setCopyright(ch.getElementByPath("copyright").getValue());
channel.setDescription(ch.getElementByPath("description").getValue());
channel.setGenerator(ch.getElementByPath("generator").getValue());
channel.setLastBuildDate(ch.getElementByPath("lastBuildDate").getValue());
channel.setLink(ch.getElementByPath("link").getValue());
channel.setManagingEditor(ch.getElementByPath("managingEditor").getValue());
channel.setWebMaster(ch.getElementByPath("webMaster").getValue());

Here the data for the Channel instance is extracted through the Element representing the 'channel' node in the XML. For each property of the Channel class, the Element representing that property is looked up through the channel element and its value is stored in the corresponding property.

The second-to-last thing that needs to be done is the extraction of the Item nodes:

// get the item elements so they can be added to the Channel object.
Element[] items = ch.getElementsByPath("item");
// a collection of items
LinkedList itemsList = new LinkedList();

for (int i = 0; i < items.length; i++){
  Item item = new Item(items[i].getElementByPath("title").getValue());
  item.setCategory(items[i].getElementByPath("category").getValue());
  item.setDescription(items[i].getElementByPath("description").getValue());
  item.setGuid(items[i].getElementByPath("guid").getValue());
  item.setPermaLink(items[i].getElementByPath("guid").getAttribute("isPermaLink").getValueAsBoolean());
  item.setPubDate(items[i].getElementByPath("pubDate").getValue());

  // add this item to the list.
  itemsList.add(item);
}

Here we retrieve the Elements corresponding to the 'item' nodes in the RSS feed by calling getElementsByPath("item") on the channel Element instance. This returns an array of Elements representing the items. Then we go through each element and extract the information, which is stored as properties of an Item object.

One thing to notice here is the use of the getValueAsBoolean method. The Element class can extract the data and parse it as any of the primitive types or as a String. Helper methods are provided by default.
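How XTOM converts text to primitive values internally is not covered in this tutorial. As a rough idea of what such helpers involve, the sketch below wraps the standard Java conversions; ValueConversionSketch, asBoolean, and asInt are hypothetical names, and trimming the text first is an assumption of this sketch.

```java
class ValueConversionSketch {
    // Hypothetical helper: interprets element or attribute text as a boolean.
    static boolean asBoolean(String text) {
        // Boolean.parseBoolean is true only for "true", ignoring case.
        return Boolean.parseBoolean(text.trim());
    }

    // Hypothetical helper: interprets element or attribute text as an int.
    static int asInt(String text) {
        // Throws NumberFormatException if the text is not a decimal integer.
        return Integer.parseInt(text.trim());
    }

    public static void main(String[] args) {
        System.out.println(asBoolean("true"));
        System.out.println(asBoolean(" False "));
        System.out.println(asInt(" 42 "));
    }
}
```

Note that with the standard conversion any text other than "true" (in any letter case) yields false, so a value such as "yes" would not be treated as true.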

Once the data extraction is complete, all that is left to do is add the list of items to the Channel instance, add the Channel instance to the Feed instance, and return the Feed instance.

// add the list of items to the Channel, and add the Channel to the feed.
channel.setItems(itemsList);
feed.setChannel(channel);

// we are done, return the feed.
return feed;

That's it, we're done. Check out more tutorials for advanced topics such as how XTOM Exceptions are thrown and when to handle them.

copyright © 2003-2004. Taras Danylak