Thursday, March 10, 2011

Crazy date conversion in Java for RSS feeds

Here's a solution I hated at the time--and still really hate.

This was a "solution" from a few years back, written while processing the date/time formats of RSS feeds in Java. The goal was to convert the various time formats found in random RSS feeds into a common epoch format.



Dates and times delivered via RSS feeds can be in pretty much any format. And if the date needs to be converted to a common form (i.e. seconds since the epoch, so that different RSS feeds can be compared by their published date), something like the code below was the necessary evil...

try {
  pub_date = sb.substring(start, match.start());
  pub_date = clean_cdata(pub_date);
  pub_date = pub_date.trim();
  SimpleDateFormat RFC822 = 
    new SimpleDateFormat("EEE, d MMM yyyy HH:mm:ss z");
  time = new Long(RFC822.parse(pub_date).getTime());
}
catch (ParseException pe) {
  //attempting second format parse:
  try {
    //Fri Mar 2 12:00 EST 2007
    SimpleDateFormat secondtry = 
      new SimpleDateFormat("EEE MMM d HH:mm z yyyy");
    //pub_date was already cleaned and trimmed above, so reuse it
    time = new Long(secondtry.parse(pub_date).getTime());
  }
  catch (ParseException pe2) {
    try {
      //try different format: 15 aug 2008 21:14:03 pdt
      SimpleDateFormat thirdtry = 
        new SimpleDateFormat("dd MMM yyyy HH:mm:ss z");
      //pub_date was already cleaned and trimmed above, so reuse it
      time = new Long(thirdtry.parse(pub_date).getTime());
    }
    catch (ParseException pe3) {
      try {
        //try different format: 2008-08-16T12:52:21-04:00
        //(the Z pattern letter wants the offset without a colon, e.g. -0400)
        SimpleDateFormat fourthtry = 
          new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
        //pub_date was already cleaned and trimmed above, so reuse it
        time = new Long(fourthtry.parse(pub_date).getTime());
      }
      catch (ParseException pe4) {
        if (dbgout != null) {
          dbgout.println(Common.getDateTime() + 
            "ParseException thrown: " + pe4);
        }
      }
    }
  }
}

Nice. Cascading exception statements. And it could potentially cascade on and on with each format variation...

I can almost see how this happened with this library. The expectation was that by using the SimpleDateFormat object you would know the format of the date string you are trying to convert. And in this case that's not going to work.

It's these cascading exceptions that I'm not fond of--a conditional test is better, but in this case kinda pointless. Throwing exceptions is expensive too. I suppose I could test the format before trying to instantiate the object, but then you really are just duplicating code outside of the SimpleDateFormat object--and what's the point of that? But really, this format check should be encapsulated in the SimpleDateFormat object. If it were, it might look something like:


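try {
  //hypothetical: no format argument, so parse() tries the known formats itself
  //e.g. Fri Mar 2 12:00 EST 2007
  SimpleDateFormat anyformat = 
    new SimpleDateFormat();
  pub_date = sb.substring(start, match.start());
  time = new Long(anyformat.parse(pub_date).getTime());
}
catch (ParseException pe) {
  //no known format matched
}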

Here the constructor takes no format argument, and the conversion would be attempted against each known format in turn. An exception would be thrown only on a failed conversion. The constructor with the format would still be supported for cases where the exact format is known beforehand. Obviously, to support the current empty default constructor, the locale's default format would be attempted first.

And hopefully the formats would be tried in order from most common to least common in occurrence. Alas, in the end software engineering is often about getting it to work in a reasonable fashion.
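For what it's worth, here is a rough sketch of how the cascade could be flattened into a loop without changing the library. It leans on SimpleDateFormat.parse(String, ParsePosition), which returns null on a failed parse instead of throwing, so no exceptions are involved. The class name FeedDateParser and the method toEpochMillis are made up for illustration, and the pattern order is only my guess at most-common-first. Treat it as a sketch, not the way this was actually done.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, not part of any library.
class FeedDateParser {
    // Assumed ordering: most common feed format first.
    private static final String[] PATTERNS = {
        "EEE, d MMM yyyy HH:mm:ss z",  // Fri, 2 Mar 2007 12:00:00 EST
        "EEE MMM d HH:mm z yyyy",      // Fri Mar 2 12:00 EST 2007
        "dd MMM yyyy HH:mm:ss z",      // 15 aug 2008 21:14:03 pdt
        "yyyy-MM-dd'T'HH:mm:ssZ"       // 2008-08-16T12:52:21-0400 (no colon in offset)
    };

    /** Returns milliseconds since the epoch, or null if no pattern matches. */
    static Long toEpochMillis(String raw) {
        String text = raw.trim();
        for (String pattern : PATTERNS) {
            // SimpleDateFormat is not thread-safe, so build one per attempt.
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
            ParsePosition pos = new ParsePosition(0);
            Date parsed = format.parse(text, pos);  // null on failure, no exception
            // Accept the result only if the whole string was consumed.
            if (parsed != null && pos.getIndex() == text.length()) {
                return Long.valueOf(parsed.getTime());
            }
        }
        return null;
    }
}
```

Adding a new format variation is then one more line in the array rather than one more nested try/catch, and the caller gets a single null check, e.g. Long time = FeedDateParser.toEpochMillis(pub_date);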

NOTE: Good news--since this work was done, there is now a Joda-Time library which looks to encapsulate some of the ugliness above. The downside is that it's not part of the standard distribution.
