< Reading & Writing XML 2 | Main | Reading & Writing XML 4 >


 

 

Reading and Writing XML 3

 

 

What do we have on this page?

 

  1. Verifying Well-Formed XML

  2. Handling Attributes

  3. Parsing XML with Validation

 

 

Verifying Well-Formed XML

 

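XML that is correctly constructed is called well-formed XML. This means that elements are correctly nested, that every start tag has a matching end tag (or the element uses the empty-element form, such as <height value="3794" unit="m"/>), and that there is a single root element. If the XmlTextReader encounters badly formed XML, it throws an XmlException that describes what it thinks is wrong. The exception's LineNumber and LinePosition properties tell you where the problem was detected. As with all parsing errors, the place where the error is reported might be some distance from the real site of the error.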
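To make the nesting rule concrete, here is a deliberately simplified sketch in portable ISO C++ of the stack-based check that underlies well-formedness. It does not use the .NET XmlTextReader API. Each start tag is pushed onto a stack, each end tag must match the most recently opened element, and only one root element is allowed. The function name IsWellNested is hypothetical. The sketch ignores attribute syntax, entity references, CDATA sections, and comments that contain '>'.

```cpp
#include <cstddef>
#include <stack>
#include <string>

// Hypothetical helper (not part of System::Xml): returns true if every start
// tag is closed in the right order and there is exactly one root element.
// Simplified: <?...?> and <!...> constructs are skipped, and attribute values
// or comments containing '>' are not handled.
bool IsWellNested(const std::string& xml)
{
    std::stack<std::string> open;   // names of currently open elements
    bool sawRoot = false;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos)
    {
        std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) return false;       // unterminated tag
        std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty()) return false;
        if (tag[0] == '?' || tag[0] == '!') continue;      // declaration, comment, DOCTYPE
        if (tag[0] == '/')                                  // end tag must close the innermost element
        {
            if (open.empty() || open.top() != tag.substr(1)) return false;
            open.pop();
            continue;
        }
        bool selfClosing = tag.back() == '/';               // e.g. <height .../>
        std::string name = tag.substr(0, tag.find_first_of(" \t\r\n/"));
        if (open.empty())
        {
            if (sawRoot) return false;                      // a second top-level element
            sawRoot = true;
        }
        if (!selfClosing) open.push(name);
    }
    return sawRoot && open.empty();                         // everything opened was closed
}
```

For example, IsWellNested("<geology><volcano></geology></volcano>") returns false because the end tags close the elements in the wrong order.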

 

Handling Attributes

 

XML elements can include attributes, which consist of name/value pairs and are always string data. In the sample XML file, the volcano element has a name attribute, and the height element has value and unit attributes.

 

10.     To process the attributes on an element, add code to the Element case in the switch statement so that it looks like this:

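case XmlNodeType::Element:
   Console::WriteLine(L"-> Element node, name={0}", rdr->Name);
   if (rdr->AttributeCount > 0)
   {
      Console::Write(L"  ");
      // Visit each attribute in turn; Name and Value now refer to the attribute
      while (rdr->MoveToNextAttribute())
      {
         Console::Write(L" {0}={1}", rdr->Name, rdr->Value);
      }
      Console::WriteLine();
      // Return the reader to the element that owns the attributes
      rdr->MoveToElement();
   }
   break;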

The AttributeCount property tells you how many attributes an element has. The MoveToNextAttribute method lets you iterate over the collection of attributes, each of which has a name and a value. Alternatively, you can use the MoveToAttribute method to position the reader on a particular attribute by specifying either its name or a zero-based index. Attributes are read along with the element node that they belong to, so they don't show up as separate nodes in the Read loop. While reading attributes, you can use the MoveToElement method to move the reader back to the parent element.
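As a rough illustration of the point that attributes are simply name/value string pairs attached to an element, the following portable ISO C++ sketch splits the attribute part of a start tag. It then prints the attributes the same way as the loop in step 10. ParseAttributes and FormatAttributes are hypothetical helpers and are not part of System::Xml. The parser assumes double-quoted values, no whitespace around '=', and no entity references.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits a start tag such as
//   height value="3794" unit="m"/
// into name/value pairs. Values remain strings, just as XML attributes are.
std::vector<std::pair<std::string, std::string>> ParseAttributes(const std::string& tag)
{
    std::vector<std::pair<std::string, std::string>> attrs;
    std::size_t pos = tag.find_first_of(" \t\r\n");     // skip past the element name
    while (pos != std::string::npos)
    {
        std::size_t eq = tag.find('=', pos);
        if (eq == std::string::npos) break;               // no more attributes
        std::size_t nameStart = tag.find_first_not_of(" \t\r\n", pos);
        std::string name = tag.substr(nameStart, eq - nameStart);
        std::size_t open = tag.find('"', eq);
        std::size_t close = tag.find('"', open + 1);
        if (open == std::string::npos || close == std::string::npos) break;
        attrs.emplace_back(name, tag.substr(open + 1, close - open - 1));
        pos = close + 1;                                   // continue after the closing quote
    }
    return attrs;
}

// Mirrors step 10's output: two leading spaces, then " name=value" per attribute.
std::string FormatAttributes(const std::vector<std::pair<std::string, std::string>>& attrs)
{
    std::string line = "  ";
    for (const auto& a : attrs)
        line += " " + a.first + "=" + a.second;
    return line;
}
```

Applied to the Erebus height element, the two helpers produce the same "   value=3794 unit=m" line shown in step 11.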

 

11.     When you run the code, you should see output similar to this for nodes that have attributes:

-> Element node, name=height

   value=3794 unit=m

 

 

XmlTextReader program output example with name and value attributes

 

Parsing XML with Validation

 

XML documents can be checked for validity in a number of ways, and the XmlValidatingReader lets you validate XML using the three most common standards:

  1. DTDs (document type definitions).

  2. W3C XML Schema (XSD) schemas.

  3. XDR (XML-Data Reduced) schemas.

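XmlValidatingReader has the same set of methods and properties as XmlTextReader, plus a few additional properties that support validation. These are listed in the following table. Note, however, that this class is obsolete as of .NET Framework 2.0, so the compiler issues a warning when you use it. The recommended replacement is an XmlReader created by the XmlReader::Create() method and configured through an XmlReaderSettings object.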

 

| Property | Description |
|---|---|
| CanResolveEntity | Returns a value indicating whether this reader can resolve entities. XmlValidatingReader always returns true. |
| Depth | Gets the depth of the current node in the XML document. |
| EntityHandling | Specifies the type of entity handling: whether to expand all entities (the default) or to expand character entities and return general entities as nodes. |
| Reader | A handle to the underlying XmlReader. |
| Schemas | Returns the collection of schemas used for validation. |
| SchemaType | Gets a schema type object for the element currently being read. This property returns a null reference if it's called when validation is performed using a DTD. |
| ValidationType | Specifies the type of validation to perform: None, DTD, Schema, XDR, or Auto. The default is Auto, which determines the type of validation required from data in the file. |

 

Table 9.

 

XmlValidatingReader also adds one method to those supported by XmlTextReader. This method, ReadTypedValue, returns the value of the current node as a .NET common language runtime (CLR) object whose type corresponds to the node's type in the validated XML. You can create an XmlValidatingReader that parses XML document fragments from a string or a stream, but it's most common to base the validating reader on an underlying XmlTextReader object.

The following exercise modifies the XmlTextReader program so that it validates the XML as it's parsed. To perform validation, you need a DTD or a schema to validate against. Here's a DTD for the volcano XML data. Save it in a file named mydtd.dtd in the project's Debug directory, alongside myxml.xml. The relative SYSTEM identifier in the DOCTYPE is resolved against the location of the XML file.

<!ELEMENT geology (volcano)+>
<!ELEMENT volcano (location,height,type,eruption+,magma,comment?)>
<!ATTLIST volcano name CDATA #IMPLIED>
<!ELEMENT location (#PCDATA)>
<!ELEMENT height EMPTY>
<!ATTLIST height value CDATA #IMPLIED unit CDATA #IMPLIED>
<!ELEMENT type (#PCDATA)>
<!ELEMENT eruption (#PCDATA)>
<!ELEMENT magma (#PCDATA)>
<!ELEMENT comment (#PCDATA)>

mydtd.dtd: the document type definition (DTD) file content

 

 

 

 

DTD file put under the project debug directory

 

We have used a DTD for simplicity, but a schema can be used in exactly the same way.

 

12.     Edit the myxml.xml file to add a DOCTYPE declaration immediately after the XML declaration at the top of the file:

<?xml version="1.0" ?>
<!DOCTYPE geology SYSTEM "mydtd.dtd">
<!-- Volcano data -->

Adding the DOCTYPE to the XML file

 

If you check the sample XML document against the DTD, you'll notice a problem. The child elements of the second volcano, Hekla, appear in the order location-type-height rather than the location-height-type order that the DTD demands. So when you parse this XML with validation, you'd expect the parser to report a validation error. As the program output later shows, the third volcano, Mauna Loa, has the same problem.
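One way to see why Hekla fails is to treat the DTD content model (location,height,type,eruption+,magma,comment?) as a pattern over the sequence of child element names. The portable ISO C++ sketch below does this with std::regex. MatchesVolcanoModel is a hypothetical helper, and the sketch only illustrates the rule. It does not show how the .NET DTD validator is implemented.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: checks a volcano element's child sequence against the
// DTD content model (location,height,type,eruption+,magma,comment?).
// Each child name gets a trailing comma so the DTD's ',' (sequence),
// '+' (one or more) and '?' (optional) map directly onto regex syntax.
bool MatchesVolcanoModel(const std::vector<std::string>& children)
{
    std::string seq;
    for (const auto& name : children)
        seq += name + ",";
    static const std::regex model("location,height,type,(eruption,)+magma,(comment,)?");
    return std::regex_match(seq, model);   // the whole sequence must fit the model
}
```

With Hekla's ordering (location, type, height, ...) the match fails. This mirrors the parser's complaint that 'height' was expected where 'type' appeared.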

 

13.     Add a using declaration to the top of CppXmlTextReader.cpp, as shown here:

using namespace System::Xml::Schema;

 

Adding a using declaration to the top of the CppXmlTextReader.cpp

 

Several of the types used for validation, such as ValidationEventArgs and ValidationEventHandler, belong to the System::Xml::Schema namespace. The using declaration lets you refer to them in code without qualifying their names.

 

14.     Create an XmlValidatingReader based on the existing XmlTextReader by adding the following code to the try block:

// Create the validating reader and set the validation type
XmlValidatingReader^ xvr = gcnew XmlValidatingReader(rdr);
xvr->ValidationType = ValidationType::Auto;

Creating an XmlValidatingReader based on the existing XmlTextReader.

 

The constructor for the XmlValidatingReader takes a handle to the XmlTextReader, which it uses to perform the basic parsing tasks. The last line sets the validation type to Auto. This means that the XmlValidatingReader decides for itself what type of validation to use, based on the references to DTDs or schemas that it finds in the XML document. It isn't really necessary to set ValidationType here, because Auto is the default, but the line is included to show how you control the validation.

When you compile at this stage, expect warnings like the following, because XmlValidatingReader is marked as obsolete:

1>------ Build started: Project: CppXmlTextReader, Configuration: Debug Win32 ------

1>Compiling...

1>CppXmlTextReader.cpp

1>.\CppXmlTextReader.cpp(31) : warning C4947: 'System::Xml::XmlValidatingReader' : marked as obsolete

1>        Message: 'Use XmlReader created by XmlReader.Create() method using appropriate XmlReaderSettings instead. http://go.microsoft.com/fwlink/?linkid=14202'

1>.\CppXmlTextReader.cpp(31) : warning C4947: 'System::Xml::XmlValidatingReader' : marked as obsolete

1>        Message: 'Use XmlReader created by XmlReader.Create() method using appropriate XmlReaderSettings instead. http://go.microsoft.com/fwlink/?linkid=14202'

1>.\CppXmlTextReader.cpp(31) : warning C4996: 'System::Xml::XmlValidatingReader::XmlValidatingReader' was declared deprecated

1>        c:\windows\microsoft.net\framework\v2.0.50727\system.xml.dll : see declaration of 'System::Xml::XmlValidatingReader::XmlValidatingReader'

1>Linking...

1>Embedding manifest...

1>Build log was saved at "file://i:\vc2005project\CppXmlTextReader\CppXmlTextReader\Debug\BuildLog.htm"

1>CppXmlTextReader - 0 error(s), 3 warning(s)

========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

15.     Edit all the code that parses the XML so that it uses the XmlValidatingReader xvr rather than the XmlTextReader rdr, as follows (inside the try block):

// Read nodes from the XmlValidatingReader
while (xvr->Read())
{
      switch (xvr->NodeType)
      {
            case XmlNodeType::XmlDeclaration:
                  Console::WriteLine(L"-> XML declaration");
                  break;
            case XmlNodeType::Document:
                  Console::WriteLine(L"-> Document node");
                  break;
            case XmlNodeType::Element:
                  Console::WriteLine(L"-> Element node, name={0}", xvr->Name);
                  break;
            case XmlNodeType::EndElement:
                  Console::WriteLine(L"-> End element node, name={0}", xvr->Name);
                  break;
            case XmlNodeType::Text:
                  Console::WriteLine(L"-> Text node, value={0}", xvr->Value);
                  break;
            case XmlNodeType::Comment:
                  Console::WriteLine(L"-> Comment node, name={0}, value={1}", xvr->Name, xvr->Value);
                  break;
            case XmlNodeType::Whitespace:
                  break;
            default:
                  Console::WriteLine(L"** Unknown node type");
                  break;
      }
}

Because XmlValidatingReader provides a superset of the XmlTextReader functionality, it’s a simple matter to swap between the two.

 

16.     If you now build and run the program, it should throw an exception when it reaches the invalid element ordering in the document. You'll see the exception message followed by a stack trace similar to this:

System.Xml.Schema.XmlSchemaValidationException: The element 'volcano' has invalid child element 'type'. List of possible elements expected: 'height'.

   at System.Xml.XmlValidatingReaderImpl.InternalValidationCallback(Object sender, ValidationEventArgs e)

   at System.Xml.Schema.XmlSchemaValidator.SendValidationEvent(ValidationEventHandler eventHandler, Object sender, XmlSchemaValidationException e, XmlSeverityType severity)

   at System.Xml.Schema.XmlSchemaValidator.ElementValidationError(XmlQualifiedName name, ValidationState context, ValidationEventHandler eventHandler, Object sender, String sourceUri, Int32 lineNo, Int32 linePos, Boolean getParticles)

   at System.Xml.Schema.DtdValidator.ValidateChildElement()

   at System.Xml.Schema.DtdValidator.ValidateElement()

   at System.Xml.Schema.DtdValidator.Validate()

   at System.Xml.XmlValidatingReaderImpl.ProcessCoreReaderEvent()

   at System.Xml.XmlValidatingReaderImpl.Read()

   at System.Xml.XmlValidatingReader.Read()

   at main() in i:\vc2005project\cppxmltextreader\cppxmltextreader\cppxmltextreader.cpp:line 35

Note that if you typed in the XML file using different formatting, any line and character positions reported for the error might differ. By default, the reader throws an exception when it finds a validation error, and if you don't handle it, the program terminates. You can improve on this error handling by installing an event handler. The reader raises its ValidationEventHandler event whenever it finds something to report. If you attach a handler to that event, the exception is no longer thrown, and you can deal with validation errors yourself and take appropriate action.
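The default behavior described above can be modeled in a few lines of portable ISO C++: if nobody is listening, the problem is thrown as an exception; otherwise it is reported to the listener. ValidationReporter is a hypothetical class that stands in for the reader's reporting logic. In the sketch, std::function plays the role of the .NET delegate, and std::runtime_error stands in for the validation exception.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical model: with no handler installed, a validation problem becomes
// an exception; with a handler installed, the handler receives the message
// and the caller can carry on parsing.
class ValidationReporter
{
public:
    using Handler = std::function<void(const std::string&)>;

    void SetHandler(Handler h) { handler_ = std::move(h); }

    void Report(const std::string& message)
    {
        if (handler_)
            handler_(message);                    // like invoking an attached event handler
        else
            throw std::runtime_error(message);    // like the default validation exception
    }

private:
    Handler handler_;   // empty until SetHandler is called
};
```

This is also why the handler must be attached before parsing starts. Any problem reported before that point takes the exception path.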

 

17.     Event handler functions must be members of a managed class, so create a new class to host a static handler function. Add this code before the main() function:

// Validation handler class
ref class ValHandler
{
public:
   // Called by the reader for each validation problem it finds
   static void ValidationHandler(Object^ sender, ValidationEventArgs^ e)
   {
      Console::WriteLine(L"Validation Event: {0}", e->Message);
   }
};

 

Creating a new class to host a static handler function

 

The ValHandler class contains one static member function, which is the handler for validation events. As usual, the handler has two arguments: a handle to the object that raised the event, and an event argument object. In this case, the handler is passed a ValidationEventArgs object that contains details about the validation error. This sample code does nothing except print the error message. In practice, you'd decide what action to take based on the Severity property of the ValidationEventArgs object, which is either XmlSeverityType::Error or XmlSeverityType::Warning.
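In the real handler, you would compare the ValidationEventArgs object's Severity property against XmlSeverityType::Error and XmlSeverityType::Warning. The portable ISO C++ sketch below shows one possible policy, under the assumption that warnings can be tolerated. Warnings are recorded and otherwise ignored, while errors are counted so that the caller can reject the document once parsing finishes. Severity, ValidationIssue, and SeverityPolicy are hypothetical stand-ins, not .NET types.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for XmlSeverityType, which has Error and Warning values.
enum class Severity { Error, Warning };

// Hypothetical stand-in for the data a ValidationEventArgs object carries.
struct ValidationIssue
{
    Severity severity;
    std::string message;
};

// One possible handler policy: keep warnings for later inspection, count errors.
class SeverityPolicy
{
public:
    void Handle(const ValidationIssue& issue)
    {
        if (issue.severity == Severity::Warning)
            warnings_.push_back(issue.message);   // tolerated, but remembered
        else
            ++errorCount_;                         // makes the document invalid
    }

    int ErrorCount() const { return errorCount_; }
    const std::vector<std::string>& Warnings() const { return warnings_; }
    bool DocumentIsValid() const { return errorCount_ == 0; }

private:
    int errorCount_ = 0;
    std::vector<std::string> warnings_;
};
```

Deferring the accept/reject decision until the end lets the program report every problem in one run, as the output in step 19 does, instead of stopping at the first one.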

 

18.     Link up the handler to the XmlValidatingReader in the usual way:

// Set the handler

xvr->ValidationEventHandler += gcnew ValidationEventHandler(&ValHandler::ValidationHandler);

 

Linking the validation event handler to the XmlValidatingReader

 

Make sure that you set up the handler before you call Read to start parsing the XML.

 

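19.     Build and run the program. This time, you won't get the exception message and stack trace. Instead, you'll see the messages printed by the event handler as it finds validation problems, and parsing continues to the end of the document. The "** Unknown node type" line is the DOCTYPE declaration. The reader reports it as an XmlNodeType::DocumentType node, which the switch statement doesn't handle.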

I:\vc2005project\CppXmlTextReader\debug>cppxmltextreader myxml.xml

Xml reader is created...

-> XML declaration

** Unknown node type

-> Comment node, name=, value= Volcano data

-> Element node, name=geology

-> Element node, name=volcano

[Trimmed]

-> End element node, name=volcano

-> Element node, name=volcano

-> Element node, name=location

-> Text node, value=Iceland

-> End element node, name=location

Validation Event: The element 'volcano' has invalid child element 'type'. List of possible elements expected: 'height'.

-> Element node, name=type

-> Text node, value=stratovolcano

-> End element node, name=type

[Trimmed]

-> Text node, value=calcalkaline

-> End element node, name=magma

-> Element node, name=comment

-> Text node, value=The type is actually intermediate between crater row and stratovolcano types

-> End element node, name=comment

-> End element node, name=volcano

-> Element node, name=volcano

-> Element node, name=location

-> Text node, value=Hawaii

-> End element node, name=location

Validation Event: The element 'volcano' has invalid child element 'type'. List of possible elements expected: 'height'.

-> Element node, name=type

-> Text node, value=shield

-> End element node, name=type

-> Element node, name=height

-> Element node, name=eruption

-> Text node, value=1984

-> End element node, name=eruption

-> Element node, name=magma

-> Text node, value=basaltic

-> End element node, name=magma

-> End element node, name=volcano

-> End element node, name=geology

 

I:\vc2005project\CppXmlTextReader\debug>

 

 

 

 

20.     Correct the ordering of the elements in the XML file, and run the program again. You shouldn’t see any validation messages this time through. The corrected myxml.xml is shown below.

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE geology SYSTEM "mydtd.dtd">
<!-- Volcano data -->
<geology>
    <volcano name="Erebus">
        <location>Ross Island, Antarctica</location>
        <height value="3794" unit="m"/>
        <type>stratovolcano</type>
        <eruption>constant activity</eruption>
        <magma>basanite to trachyte</magma>
    </volcano>
    <volcano name="Hekla">
        <location>Iceland</location>
        <height value="1491" unit="m"/>
        <type>stratovolcano</type>
        <eruption>1970</eruption>
        <eruption>1980</eruption>
        <eruption>1991</eruption>
        <magma>calcalkaline</magma>
        <comment>The type is actually intermediate between crater row and stratovolcano types</comment>
    </volcano>
    <volcano name="Mauna Loa">
        <location>Hawaii</location>
        <height value="13677" unit="ft"/>
        <type>shield</type>
        <eruption>1984</eruption>
        <magma>basaltic</magma>
    </volcano>
</geology>

 

I:\vc2005project\CppXmlTextReader\debug>cppxmltextreader myxml.xml

Xml reader is created...

-> XML declaration

** Unknown node type

-> Comment node, name=, value= Volcano data

-> Element node, name=geology

-> Element node, name=volcano

-> Element node, name=location

-> Text node, value=Ross Island, Antarctica

-> End element node, name=location

-> Element node, name=height

-> Element node, name=type

-> Text node, value=stratovolcano

-> End element node, name=type

-> Element node, name=eruption

-> Text node, value=constant activity

-> End element node, name=eruption

-> Element node, name=magma

-> Text node, value=basanite to trachyte

-> End element node, name=magma

-> End element node, name=volcano

-> Element node, name=volcano

-> Element node, name=location

-> Text node, value=Iceland

-> End element node, name=location

-> Element node, name=height

-> Element node, name=type

-> Text node, value=stratovolcano

-> End element node, name=type

-> Element node, name=eruption

-> Text node, value=1970

-> End element node, name=eruption

-> Element node, name=eruption

-> Text node, value=1980

-> End element node, name=eruption

-> Element node, name=eruption

-> Text node, value=1991

-> End element node, name=eruption

-> Element node, name=magma

-> Text node, value=calcalkaline

-> End element node, name=magma

-> Element node, name=comment

-> Text node, value=The type is actually intermediate between crater row and stratovolcano types

-> End element node, name=comment

-> End element node, name=volcano

-> Element node, name=volcano

-> Element node, name=location

-> Text node, value=Hawaii

-> End element node, name=location

-> Element node, name=height

-> Element node, name=type

-> Text node, value=shield

-> End element node, name=type

-> Element node, name=eruption

-> Text node, value=1984

-> End element node, name=eruption

-> Element node, name=magma

-> Text node, value=basaltic

-> End element node, name=magma

-> End element node, name=volcano

-> End element node, name=geology

 

I:\vc2005project\CppXmlTextReader\debug>

 

 


 

 

