Kirk Allen Evans' XML Blog

.NET From a Markup Perspective

Well-Formed XML in .NET, and Postel's Rebuttal

When I first read Mark Pilgrim's arguments about Postel's law and consuming malformed RSS feeds, I disagreed. XML has so few basic rules that it should not be a stretch for RSS feeds to be well-formed. After some convincing arguments from Mark, I came to the conclusion that SharpReader et al. should be liberal in what they consume, at least in the limited context of the feeds that I consume today, including weblogs.asp.net and similar development-focused feeds.

Tim Bray then countered the argument, showing that Postel's law does have exceptions and citing an RSS feed from a financial institution. What a great idea: get a daily RSS feed from your bank that shows your financial transactions. Of course, there are many other issues that would have to be resolved, such as authentication and not caching sensitive content from secure feeds. But it is quite feasible that a financial institution will offer this soon.

At any rate, Tim successfully convinced me that aggregators should not have the dubious task of “correcting” feeds or displaying feeds that are not well-formed.

Yet I still have a concern about Tim's post, specifically about XmlWriter and well-formedness:

PostScript: I just did the first proof on the first draft of this article. It had a mismatched tag and wasn’t well-formed. The publication script runs an XML parser over the draft and it told me the problem and I fixed it. It took less time than writing this postscript.

PPS: Putting My Money Where My Mouth Is - If you’re programming in .NET, there’s a decent-looking XmlWriter class.

The problem is that it is quite possible to emit content using the XmlWriter that is not well-formed. From MSDN online's “Customized XML Writer Creation” topic:

  • The XmlTextWriter does not verify that element or attribute names are valid.
  • The XmlTextWriter writes Unicode characters in the range 0x0 to 0x20, and the characters 0xFFFE and 0xFFFF, which are not XML characters.
  • The XmlTextWriter does not detect duplicate attributes. It will write duplicate attributes without throwing an exception.
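
As a rough illustration of the first bullet, the sketch below hands XmlTextWriter an element name containing spaces. The class name XmlTextWriterNameSketch and the helper WriteElement are made up for this example, and the behavior described in the comments is what the MSDN text above leads one to expect, not something checked against every .NET version.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlTextWriterNameSketch
{
    // Writes one empty element with the given name and returns the raw text.
    // (Hypothetical helper, for illustration only.)
    public static string WriteElement(string name)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);
        writer.WriteStartElement(name);
        writer.WriteEndElement();
        writer.Flush();
        return sw.ToString();
    }

    public static void Main()
    {
        // A space is not legal in an XML name, yet per the MSDN note
        // the writer is expected to emit the name without complaint.
        Console.WriteLine(WriteElement("not a valid name"));
    }
}
```

If the name comes back unchanged in the output, any XML parser reading it will treat "a", "valid", and "name" as broken attributes and reject the document.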

Even using the custom XmlWriter implementation that is mentioned in the MSDN article does not remove the possibility of a developer circumventing the writing process:

using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Collections;

namespace MyWriter.ConformWriter
{
  class Class1
  {
    [STAThread]
    static void Main(string[] args)
    {
      System.IO.StringWriter sw = new System.IO.StringWriter();
      ConformWriter writer = new ConformWriter(sw);
      writer.WriteStartDocument();
      writer.WriteStartElement("rss");
      writer.WriteAttributeString("version","2.0");
      writer.WriteStartElement("channel");
      writer.Flush();
      Console.WriteLine(sw.ToString());
    }
  }

  public class ConformWriter : XmlTextWriter
  {
    // Rejects characters that are not legal in XML 1.0 content.
    internal void CheckUnicodeString(String value)
    {
      for (int i=0; i < value.Length; ++i)
      {
        if (value[i] > 0xFFFD)
        {
          throw new Exception("Invalid Unicode");
        }
        // Below 0x20, only tab, line feed, and carriage return are allowed.
        else if (value[i] < 0x20 && value[i] != '\t' && value[i] != '\n' && value[i] != '\r')
        {
          throw new Exception("Invalid Xml Characters");
        }
      }
    }

    public ConformWriter(System.IO.TextWriter writer):base(writer){}

    public ConformWriter(String fileName, Encoding encoding):base(fileName, encoding) {}

    public override void WriteString(String value)
    {
      CheckUnicodeString(value);
      base.WriteString(value);
    }

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
      base.WriteStartElement(prefix, XmlConvert.EncodeLocalName(localName), ns);
    }
  }
}

Main never calls WriteEndElement or WriteEndDocument, so neither element is ever closed. The result is:

<?xml version="1.0" encoding="utf-16"?>
<rss version="2.0">
<channel>

While using the XmlWriter (or, more likely, the ConformWriter shown here) can go a long way toward helping the developer create well-formed XML, it does not eliminate the possibility of bugs in the creation process (or, less PC... the “Bozo Factor”).
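
One way to catch this kind of bug before publishing, in the spirit of the publication script Tim describes, is to run the writer's output back through a parser. This is only a minimal sketch: the class WellFormedCheck and the method IsWellFormed are names invented here, and using XmlDocument.LoadXml as the parser is an assumption of this example, not something Tim's script is known to do.

```csharp
using System;
using System.Xml;

public static class WellFormedCheck
{
    // Returns true when the text parses as well-formed XML.
    // (Hypothetical helper, for illustration only.)
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            // The parser reports unclosed elements, bad names, and similar errors here.
            return false;
        }
    }

    public static void Main()
    {
        // Unclosed elements, like the writer output shown earlier.
        Console.WriteLine(IsWellFormed("<rss version=\"2.0\"><channel>"));          // False
        // The same elements, properly closed.
        Console.WriteLine(IsWellFormed("<rss version=\"2.0\"><channel /></rss>"));  // True
    }
}
```

A check like this does not prevent the bug, but it turns a silent malformed feed into a failure the publisher sees first.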

The System.Xml.XmlDocument type in .NET goes further toward ensuring well-formed XML creation. In fact, I will put a stake in the ground and say that you cannot create a document using XmlDocument in .NET that is not well-formed.
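
To make the contrast concrete, here is a small sketch that builds the same rss and channel elements with XmlDocument. The names XmlDocumentSketch, BuildFeed, and NameIsAccepted are hypothetical, and the sketch covers only two cases (element closing and name checking), so it illustrates the claim without proving it.

```csharp
using System;
using System.Xml;

public static class XmlDocumentSketch
{
    // Builds rss/channel as a tree and returns its serialized form.
    // There is no separate "end element" call that could be forgotten.
    public static string BuildFeed()
    {
        XmlDocument doc = new XmlDocument();
        XmlElement rss = doc.CreateElement("rss");
        rss.SetAttribute("version", "2.0");
        doc.AppendChild(rss);
        rss.AppendChild(doc.CreateElement("channel"));
        return doc.OuterXml;
    }

    // Reports whether XmlDocument agrees to create an element with this name.
    public static bool NameIsAccepted(string name)
    {
        try
        {
            new XmlDocument().CreateElement(name);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(BuildFeed());
        Console.WriteLine(NameIsAccepted("channel"));   // True
        Console.WriteLine(NameIsAccepted("bad name"));  // False
    }
}
```

Because the document is a tree that is serialized as a whole, closing tags come from the serializer and not from the developer remembering to write them.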

Published Tuesday, January 13, 2004 4:50 PM by kaevans

Comments

 

kaevans said:

Dang it, Kirk, you've got me all twitterpatted about XPathDocument2 with this one.
January 13, 2004 9:25 PM
 
