

 

 

Early Stages of the C++ .Net 16

(Managed Extensions for C++)

 

 

 

The following topics are covered on this page.

  1. Metadata Directory

  2. Reading Metadata

 

Metadata Directory

 

One field in the CLI header (documented as the IMAGE_COR20_HEADER structure in corhdr.h) is the RVA for the metadata directory, which gives access to all the metadata used by the assembly. The metadata directory starts with the string BSJB and has information about the version of the metadata and the version of the .NET Framework (as a string) that was used to create the assembly. After the header, the directory has information about the metadata streams that are used in the assembly. A metadata stream is a block of data holding information used by your code. The Microsoft intermediate language (MSIL) code in your types uses metadata tokens to identify elements that are held in metadata (such as type names, member names, and user strings). A metadata token identifies which stream the metadata is held in and the location of the metadata in the stream. The various metadata streams that can be generated by the C++ compiler are given in Table 1.

 

Stream      Description
#~          Optimized stream of the metadata tables. .NET also defines a stream called #-, which is a non-optimized stream of metadata tables. Current tools only generate the optimized stream.
#Blob       Holds internal metadata binary objects
#Guid       Holds GUIDs
#Strings    Holds the names of metadata items
#US         User Strings, holds user-defined strings

Table 1:   Metadata Streams

 

When your code is compiled, the compiler will generate a metadata directory in the .obj file, and when the assembly is created, the linker will amalgamate the metadata directories from the various .obj files in your project. The compiler will add entries to the #~ stream for metadata items such as class definitions, class members, and references to externally defined classes. The actual names of these items are stored in the #Strings stream. You can see the entries that are stored in these streams by turning on the display of tokens in ILDASM. You turn on the display of tokens using the /tokens switch, or using the /advanced switch and selecting Show Token Values from the View menu.

 

 

 

Figure 7:   Invoking ILDASM

Figure 8:   ILDASM View menu

 

Metadata tokens identify both the stream and the location of the item in the stream. The top byte identifies the metadata table (one of the values of the CorTokenType enumeration documented in corhdr.h). All of these tables except mdtString can be found in the #~ stream; the mdtString items are located in the #US stream. The lower three bytes of tokens for items in the #~ stream give the record ID (RID) of the item in its table. In contrast, the lower three bytes of tokens for items in the #US stream give the offset of the item from the beginning of the stream. For example, the code:

Test* t = new Test;

String* str1 = S"Test1";

String* str2 = S"Test2";

will generate the following MSIL:

newobj instance void Test/* 02000003 */::.ctor() /* 06000008 */

stloc.2

ldstr "Test1" /* 70000001 */

stloc.1

ldstr "Test2" /* 7000000D */

stloc.0

The string Test1 is stored as the first item in the #US stream. (All streams are indexed from 1.) The string is stored as a Unicode string (0xa bytes long) followed by one additional byte, and the whole entry is prefixed with its length. Metadata uses a compressed format for the length so that short entries use a single byte for the length, which is the case for the Test1 string: it has a length of 0x0b (0xa + 1). The entry starts at offset 1 and occupies one length byte plus 0xb bytes of data, so the second string in the #US stream will be at location 0xd (1 + 1 + 0xb), which is the reason that the string Test2 has the token 0x7000000d (a top byte of 0x70 is a user string). Here is the actual data held in the #US stream:

71c8 00 00 00 00 00 0b 54 00  ......T.

71d0 65 00 73 00 74 00 31 00  e.s.t.1.

71d8 00 0b 54 00 65 00 73 00  ..T.e.s.

71e0 74 00 32 00 00 00 00 00  t.2.....

The class Test is defined in this assembly, and it is the third type definition (token 0x02000003), whereas the constructor is the eighth method defined in the assembly (token 0x06000008). The definition of a type will be accessed through the #~ stream. We won’t go into the fine details of how to obtain the definition of the type because the table format is documented in the ECMA specification, which you can find in the \Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\Tool Developers Guide\docs\Partition II Metadata.doc file for Visual C++ .Net 2003.
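The token layout and the #US offset arithmetic described above can be checked with a short sketch in portable C++. The function names (TokenTable, TokenValue, LengthPrefixSize, NextUserStringOffset) are illustrative and are not part of any SDK header. The one-, two-, and four-byte thresholds for the compressed length are taken from the ECMA specification rather than from this page, and the single trailing byte after the characters is the one visible in the dump above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Top byte of a metadata token: identifies the table (token type).
inline std::uint8_t TokenTable(std::uint32_t token)
{
    return static_cast<std::uint8_t>(token >> 24);
}

// Lower three bytes: a RID for items in the #~ stream,
// or a byte offset for items in the #US stream.
inline std::uint32_t TokenValue(std::uint32_t token)
{
    return token & 0x00FFFFFFu;
}

// Number of bytes used by the compressed length prefix
// (assumed thresholds: 1 byte below 0x80, 2 bytes below 0x4000, else 4).
inline std::size_t LengthPrefixSize(std::size_t length)
{
    assert(length <= 0x1FFFFFFF); // largest value the compressed form can hold
    if (length < 0x80) return 1;
    if (length < 0x4000) return 2;
    return 4;
}

// Offset of the entry that follows a user string starting at 'offset'.
inline std::uint32_t NextUserStringOffset(std::uint32_t offset, const std::u16string& text)
{
    // Two bytes per UTF-16 character plus the one trailing byte.
    std::size_t payload = text.size() * 2 + 1;
    return static_cast<std::uint32_t>(offset + LengthPrefixSize(payload) + payload);
}
```

Calling TokenTable(0x7000000D) yields 0x70 and TokenValue(0x06000008) yields 8, while NextUserStringOffset(1, u"Test1") yields 0xd, which matches the token generated for Test2.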

 

Figure 9:   .NET CLI documentation

 

The various definitions that you can have in an assembly are accessed through tables. The current specification defines 35 different tables and the format of those tables. The header to the #~ stream contains a 64-bit bit mask where each bit specifies whether the corresponding table is present in the assembly. The header is followed by an array of 32-bit integers giving the number of entries in each of the tables that are present. This data is then followed by the actual tables. The record layout is different for each table; for example, each entry in the Module table has 10 bytes, and each entry in the TypeDef table (used to give information about type definitions) has 20 bytes. The ECMA specification contains the schema for each table. A type definition contains an index into the #Strings stream for the name of the type and for its namespace. The definition also gives an index into the Field and Method tables for the fields and methods implemented by the type. Each entry in the Method table has the RVA of the implementation of the method (the MSIL for the method).
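As a rough illustration of how the bit mask and the row-count array fit together, the following portable C++ sketch pairs each set bit with the next 32-bit count. RowCountsByTable is a hypothetical helper, not an SDK function, and it assumes the counts are stored in ascending bit order with one entry per table that is present. The table numbers line up with the top byte of the tokens shown earlier (0x02 for type definitions, 0x06 for methods).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Pair each table that is present with its row count.
// 'valid' is the 64-bit mask from the #~ header; 'rows' holds one
// 32-bit count per set bit, lowest table number first (assumption).
inline std::map<int, std::uint32_t> RowCountsByTable(
    std::uint64_t valid, const std::vector<std::uint32_t>& rows)
{
    std::map<int, std::uint32_t> result;
    std::size_t next = 0;
    for (int table = 0; table < 64; ++table)
    {
        if ((valid >> table) & 1u)
        {
            // A well-formed header supplies a count for every set bit.
            assert(next < rows.size());
            result[table] = rows[next++];
        }
    }
    return result;
}
```

With an illustrative mask of 0x45 (bits 0, 2, and 6 set) and the counts 1, 3, and 8, the result maps table 0 to 1 row, table 2 to 3 rows, and table 6 to 8 rows. These numbers are made up for the example and are not read from a real assembly.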

 

Reading Metadata

 

The physical layout of assemblies and metadata is documented in the ECMA specification. The ECMA specification also documents the format of each IL opcode, so if you choose, you can write unmanaged (or managed) code to read an assembly, get information about the types implemented in the assembly and the types that the assembly uses, and dump the IL of those types. Of course, this process would be rather tedious, so Microsoft has provided two APIs to get access to metadata: reflection and the unmanaged metadata API. The reflection API is a high-level managed API. It presents a logical view of metadata and is accessible from any .NET language. Reflection is concerned with metadata, the description of types, so it does not give access to MSIL. However, the API does allow you to invoke a method of a type, as shown in the following code:

// reflinvoke.cpp

String* str = "Hello";

Type* t = str->GetType();

Type* params[] = new Type*[0];

 

// Get the overload of ToUpper that has no parameters.

MethodInfo* mi = t->GetMethod("ToUpper", params);

// Invoke the method. We know that the return value is a String*.

String* str2 = static_cast<String*>(mi->Invoke(str, 0));

Console::WriteLine(str2);

The unmanaged API is far closer to the physical layout of metadata in the PE file. This API is documented in the Tool Developers Guide supplied with the .NET Framework SDK (the Metadata Unmanaged API.doc and the Assembly Metadata Unmanaged API.doc files in the \Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\Tool Developers Guide\docs directory for Visual C++ .Net 2003) and is provided through COM objects. The interfaces and CLSIDs for these objects are declared in cor.h, and the types and enumerations used to describe metadata are declared in corhdr.h. The .NET Framework SDK comes with an example named metainfo that shows how to use these interfaces.

 

Figure 10:   Tool Developers Guide and .NET sample programs

 

This tool is also useful for probing into how metadata is stored in assemblies. The /heaps switch for metainfo dumps the entries in the #Strings, #US, and #Blob streams, and it will give information about the metadata tables that are present in the #~ stream. The /raw switch will dump the entries in each table in the #~ stream and the schema of each table. Some of the entries in a metadata table will be an index into one of the other streams or into another table, but the information provided by this tool gives you enough information to determine the items in the assembly. The metadata API is straightforward to use. The first stage is to access the metadata dispenser object, as shown here:

// dumptypes.cpp

IMetaDataDispenserEx* pDispenser;

CoCreateInstance(CLSID_CorMetaDataDispenser, NULL, CLSCTX_INPROC_SERVER, IID_IMetaDataDispenserEx, (void**)&pDispenser);

This is the gateway to the other metadata APIs. There are three metadata interfaces: IMetaDataImport, IMetaDataAssemblyImport, and the lower-level interface IMetaDataTables. These are implemented by a separate object named the scope object, as the following code shows:

// dumptypes.cpp

IMetaDataImport* pImport;

pDispenser->OpenScope(strFile, 0, IID_IMetaDataImport, (LPUNKNOWN*)&pImport);

The OpenScope method returns an interface for an assembly in a file, and there is a version that returns the interface for an in-memory assembly. You can request IMetaDataImport, IMetaDataAssemblyImport, or IMetaDataTables from this method because they are all implemented on the scope object. You can get information about an individual item through its metadata token. The methods on IMetaDataImport will use the token to locate the item in the appropriate table in the #~ stream. You can get a token for an item either by requesting the item by name or by enumerating the items of a particular type. If you use an enumerator, you must free it once the enumeration has completed. When you have a token, you can call a method to get information about the specified object. Here is an example:

 

// dumptypes.cpp

HRESULT hr;

HCORENUM hEnum = 0;

mdTypeDef typeDefs[5];

ULONG count = 0;

 

do

{

   hr = pImport->EnumTypeDefs(&hEnum, typeDefs, sizeof(typeDefs)/sizeof(mdTypeDef), &count);

   for (ULONG idx = 0; idx < count; idx++)

   {

      ULONG size = 0;

      // Get the size of the name.

      pImport->GetTypeDefProps(typeDefs[idx], 0, 0, &size, 0, 0);

      LPWSTR strName = new WCHAR[size];

      DWORD flags;

      mdToken baseClass = 0;

      pImport->GetTypeDefProps(typeDefs[idx], strName, size, 0, &flags, &baseClass);

      LPCWSTR strType = TypeOfType(pImport, flags, baseClass);

      // If the class is nested, get the full name by

      // repeatedly accessing the name of the encloser class.

      if (IsTdNested(flags))

      {

         LPWSTR strEncloser = 0;

         mdTypeDef nestedType = typeDefs[idx];

         while (true)

         {

            // Get the token of the enclosing class.

            mdTypeDef encloser;

            pImport->GetNestedClassProps(nestedType, &encloser);

            LPWSTR str = GetTypeName(pImport, encloser);

            if (strEncloser == 0)

               strEncloser = str;

            else

            {

               // Prefix the name with the enclosing class.

               LPWSTR strTemp;

               strTemp = new WCHAR[wcslen(strEncloser) + wcslen(str) + 3];

               wcscpy(strTemp, str);

               wcscat(strTemp, L"::");

               wcscat(strTemp, strEncloser);

               delete [] strEncloser;

               delete [] str;

               strEncloser = strTemp;

            }

            // See if the encloser class is a nested class.

            pImport->GetTypeDefProps(encloser, 0, 0, 0, &flags, 0);

            if (!IsTdNested(flags)) break;

            nestedType = encloser;

         }

         wprintf(L"%s %s::%s;\n", strType, strEncloser, strName);

         delete [] strEncloser;

      }

      else

         wprintf(L"%s %s;\n", strType, strName);

      delete [] strName;

   }

} while (count > 0);

if (hEnum) pImport->CloseEnum(hEnum);

EnumTypeDefs is called repeatedly until the enumeration is exhausted. The first time that the method is called, the handle passed through the first parameter is zero. The method returns a handle to the enumeration through this parameter, and the same handle is passed to EnumTypeDefs on subsequent calls. The method will attempt to fill the array with tokens and return the count of tokens that were returned. After the enumeration has completed, CloseEnum is called to clean up resources allocated for the enumeration.
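The listing above builds the name of a nested type with manually managed WCHAR buffers. As a sketch of the same prefixing loop without the buffer handling, the following portable C++ uses std::wstring. The EncloserMap type is a hypothetical stand-in for the metadata scope: it maps a type name to the name of its enclosing type, which is the information the listing obtains through GetNestedClassProps and GetTypeName.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the metadata scope:
// type name -> name of the enclosing type. Top-level types have no entry.
typedef std::map<std::wstring, std::wstring> EncloserMap;

// Build "Outer::Middle::Inner" by repeatedly prefixing the encloser's name.
inline std::wstring QualifiedName(const EncloserMap& enclosers, const std::wstring& name)
{
    std::wstring full = name;
    std::wstring current = name;
    std::size_t steps = 0;
    for (;;)
    {
        EncloserMap::const_iterator it = enclosers.find(current);
        if (it == enclosers.end()) break; // reached a top-level type
        ++steps;
        assert(steps <= enclosers.size()); // a longer chain would mean a cycle
        full = it->second + L"::" + full;  // prefix the enclosing type
        current = it->second;
    }
    return full;
}
```

If Inner is nested in Middle and Middle is nested in Outer, QualifiedName returns Outer::Middle::Inner for Inner and returns Outer unchanged for the top-level type.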

For each token, we call GetTypeDefProps to get information about the type. This method can return the name, a token for the base class of the type, and a flags value that holds a combination of values from the CorTypeAttr enumeration. The corhdr.h file defines macros to test the flags in this enumeration, and we will concentrate on just two flags (tdInterface and tdClass) and three macros (IsTdInterface, IsTdClass, and IsTdNested). If the class is nested within another class, the IsTdNested macro will return true, and to get the token of the enclosing class, you can call GetNestedClassProps. Because classes can be nested to multiple levels, we loop until we get to the top-level class.

EnumTypeDefs will return tokens of __value and __gc types; __value types can be classes or enums, and __gc types can be classes or interfaces, so the code needs to determine which of these four kinds of type the token refers to. The tdInterface flag is a nonzero flag that makes the positive assertion that the type is an interface. The tdClass flag is zero, so it cannot be tested for directly: you check whether the type is an interface, and if it is not, it is a noninterface type. However, there is no flag for value types or enumerations. The only way to check for these is to test the base class of the type. This test is the purpose of the TypeOfType function that we have defined, as shown here:

// dumptypes.cpp

LPCWSTR TypeOfType(IMetaDataImport* pImport, DWORD flags, mdToken baseClass)

{

   static LPCWSTR types[] =

   {

      L"__gc __interface",

      L"__gc class",

      L"__value class",

      L"__value enum"

   };

   int type = 0;

   if (!IsTdInterface(flags))

   {

      LPWSTR name = GetTypeName(pImport, baseClass);

      if (name != 0)

      {

         if (wcscmp(name, L"System.ValueType") == 0) type = 2;

         else if (wcscmp(name, L"System.Enum") == 0) type = 3;

         else type = 1;

         delete [] name;

      }

   }

   return types[type];

}

Here we use a static array of the types that you can have in .NET. If IsTdInterface returns true, the type is an interface. Otherwise, the method obtains the base class name and uses this to determine whether the type is an enum, a __value type, or a __gc class type. The GetTypeName method returns the name of the base type. This process is a little more complex than calling GetTypeDefProps because GetTypeDefProps returns the properties of a type defined in the current scope, but the base class type might have been defined in another assembly, in which case you do not want to get the properties of a type definition, but of a type reference. Here is our implementation of GetTypeName:

// dumptypes.cpp

// This method returns a string allocated with the C++ new

// operator, so you must delete the value when you have finished

// with it.

LPWSTR GetTypeName(IMetaDataImport* pImport, mdToken baseClass)

{

   ULONG size = 0;

   LPWSTR name = 0;

   pImport->GetTypeDefProps(baseClass, 0, 0, &size, 0, 0);

   if (size == 0)

   {

      // Since the size is zero, we attempt to see if the token is a type reference.

      pImport->GetTypeRefProps(baseClass, 0, 0, 0, &size);

      // Interfaces return a NUL character as the base class name

      // when they have no base interface.

      if (size > 1)

      {

         name = new WCHAR[size];

         pImport->GetTypeRefProps(baseClass, 0, name, size, 0);

         return name;

      }

      else

      {

         // There is no name.

         return 0;

      }

   }

   else

   {

      // The token is a type definition.

      name = new WCHAR[size];

      pImport->GetTypeDefProps(baseClass, name, size, 0, 0, 0);

      return name;

   }

   return 0;

}
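The flag tests and the base-class check used by TypeOfType can be tried out without COM. The sketch below is a portable C++ approximation: the constants are intended to mirror the CorTypeAttr values in corhdr.h (check them against your copy of the header), and IsNested, IsInterface, and KindOfType are illustrative names rather than SDK macros. Unlike the listing, a noninterface type with an empty base name is reported here as a __gc class.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed to match CorTypeAttr in corhdr.h.
const std::uint32_t kVisibilityMask     = 0x00000007; // tdVisibilityMask
const std::uint32_t kNestedPublic       = 0x00000002; // tdNestedPublic
const std::uint32_t kClassSemanticsMask = 0x00000020; // tdClassSemanticsMask
const std::uint32_t kInterface          = 0x00000020; // tdInterface (tdClass is 0)

// Every nested visibility value is at or above tdNestedPublic.
inline bool IsNested(std::uint32_t flags)
{
    return (flags & kVisibilityMask) >= kNestedPublic;
}

// Only the interface case makes a positive assertion in the flags.
inline bool IsInterface(std::uint32_t flags)
{
    return (flags & kClassSemanticsMask) == kInterface;
}

// Classify a type from its flags and the name of its base class.
inline const wchar_t* KindOfType(std::uint32_t flags, const std::wstring& baseName)
{
    // Interfaces are expected to report an empty base class name.
    assert(!IsInterface(flags) || baseName.empty());
    if (IsInterface(flags)) return L"__gc __interface";
    if (baseName == L"System.ValueType") return L"__value class";
    if (baseName == L"System.Enum") return L"__value enum";
    return L"__gc class";
}
```

For example, flags of 0x20 with an empty base name give __gc __interface, and flags of 0 with a base name of System.Enum give __value enum.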

Once you have a token to a type definition, you can get access to the members of the type. EnumMembers will return the methods and fields, or you can call EnumMethods, EnumFields, EnumProperties, or EnumEvents to get the specific members of the type. If the type implements interfaces, you can call EnumInterfaceImpls to get the tokens of these interfaces. Interfaces are the only types in .NET that do not have to have a base class. However, GetTypeDefProps will return a non-NULL token for the base class, and the name from this base class will be the single NUL character. This is why we test for this situation in GetTypeName.

 
