< C++ .Net Early Stages 7 | Main | C++ .Net Early Stages 9 >

Early Stages of the C++ .Net 8

(Managed Extensions for C++)

Note: The code snippets used in this module are for Visual C++ .Net 2003; if you compile them using Visual C++ .Net 2005, you need to use the /clr:oldSyntax option. Here you will learn the keywords used in the C++ .Net, though in VC++ 2005 these keywords are already deprecated. However, we just want to see how the whole thing changed from the native, unmanaged C++ used in COM, MFC etc. to the managed C++ .Net.

Managed Strings

In .NET, strings are managed objects. The System::String class encapsulates most of the actions that you will want to do on a string: compare strings; test for substrings and individual characters; create new strings by concatenating strings; split up existing strings; add padding spaces or trim them; and insert, replace, or remove substrings. However, it is important to realize that a System::String is immutable. If you call any of its methods that change a string, you do not get back the original string modified; instead, you get a completely new string. For example, if you call ToLower on a string, you do not affect the string that you are calling. Instead, you get a new string that has the lowercase characters.
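The "new string back, original untouched" behaviour can be pictured with a small native C++ analogy. This is only a sketch: to_lower_copy is a hypothetical helper written for illustration, not a .NET API, and std::string itself is mutable; the point is the calling pattern, which resembles what String::ToLower gives you.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not a .NET API): returns a lowercase copy and leaves
// the argument untouched, much as String::ToLower hands back a new string.
std::string to_lower_copy(const std::string& text)
{
    std::string result;
    result.reserve(text.size());
    for (unsigned char ch : text)
        result.push_back(static_cast<char>(std::tolower(ch)));
    return result;
}
```

Calling to_lower_copy on a variable holding "Hello" yields "hello", while the variable itself still holds "Hello".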

If you want to create a string buffer that can be modified, you should use a StringBuilder object (in the System::Text namespace), which has methods to insert, remove, and replace substrings in a buffer and add the string representations of various primitive types to the end of the buffer. The String class holds data as Unicode characters. Each one is a Char data type. You can access each character through the Chars indexed property, as shown here:

String* str = S"Hello";

// Get the fourth character.

Char c = str->Chars[3];

The String class implements the Chars property so that the first character in the string is at index zero. Also notice the syntax for declaring a literal string. The S prefix indicates that the string is a managed string. The String class has constructors that take an unmanaged pointer to a char buffer (String(SByte*)) and an unmanaged pointer to a wide char buffer (String(Char*)), which convert the native string to a managed string. However, doing so requires the compiler to generate extra code, so whenever possible you should use managed string literals. For example, this code, which initializes the string from a wide char literal:

String* str = L"Hello";

generates this IL:

ldsflda valuetype $ArrayType$0xe68a7113 '?A0xcfbb78fe.unnamed-global-0'

newobj instance void [mscorlib]System.String::.ctor(char*)

stloc.0

The first line loads the address of a static, global field named ?A0xcfbb78fe.unnamed-global-0. This address is passed to the String constructor that takes an unmanaged pointer to a wide char buffer. The constructed string (in this case) is stored as the local variable 0. The static field looks like this:

.field public static valuetype $ArrayType$0xe68a7113 '?A0xcfbb78fe.unnamed-global-0' at D_00008030

The type of the field is $ArrayType$0xe68a7113, another compiler-generated name that looks like this:

.class private explicit ansi sealed $ArrayType$0xe68a7113 extends [mscorlib]System.ValueType

{

   .pack 1

   .size 12

   // Other items omitted

}

The type has no code and no members. It merely indicates that the type takes up 12 bytes, the size of the literal string in Unicode characters (five characters plus the terminating null, at two bytes each). The static field ?A0xcfbb78fe.unnamed-global-0 is stored in the initialized data section of the PE file at location 0x8030:

.data D_00008030 = bytearray (48 00 65 00 6C 00 6C 00 6F 00 00 00) // H.e.l.l.o...
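The 12-byte size and the byte pattern in the .data directive can be checked with a short native C++ sketch. It assumes a platform where char16_t occupies two bytes; kHello and utf16le_hex are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// "Hello" as UTF-16 code units; the array includes the terminating null.
static const char16_t kHello[] = u"Hello";

// Hypothetical helper: formats code units as little-endian byte pairs,
// the byte order that appears in the .data directive.
std::string utf16le_hex(const char16_t* units, std::size_t count)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint16_t unit = static_cast<std::uint16_t>(units[i]);
        const unsigned bytes[2] = { unit & 0xFFu, (unit >> 8) & 0xFFu };
        for (unsigned b : bytes) {
            if (!out.empty()) out += ' ';
            out += digits[b >> 4];   // high nibble
            out += digits[b & 0xF];  // low nibble
        }
    }
    return out;
}
```

Passing kHello with its six code units to utf16le_hex reproduces the sequence 48 00 65 00 6C 00 6C 00 6F 00 00 00, and sizeof(kHello) is 12 under the stated assumption.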

A similar data item and field will be created if you initialize the string with an ANSI string. So in both of these cases, you will have a static field initialized with data in the initialized data section of the PE (Portable Executable) file and this field is then passed to the constructor of System::String. Compare this to the situation when the literal is a managed string. The IL generated is this:

ldstr "Hello" /* 70000001 */

stloc.0

In other words, the string is stored in the user strings section (the #US stream) of the metadata section of the PE file (this is part of the PE .text section), and it is loaded as a managed string and pushed onto the stack all in one IL statement. The #US stream is the ‘user string’ metadata stream held within the PE file; items in this stream are identified by metadata tokens, and MSIL is composed of opcodes and metadata tokens. (The various metadata streams are documented in the ECMA spec, “Partition II Metadata,” Chapter 23, “Metadata Physical Layout.”) The value in comments after the string literal is the metadata token for the literal (which you can view by using the /token switch of ILDASM). The top byte (0x70) identifies the token as a user-string token, and the remaining three bytes locate the literal within the #US stream. Consider this code:

String* s1 = S"Hello";

String* s2 = S"Hello";

The two references are initialized with the same literal string. The IL looks like this:

ldstr "Hello" /* 70000001 */

stloc.1

ldstr "Hello" /* 70000001 */

stloc.0
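Both ldstr instructions carry the token 70000001. As a rough sketch of how such a token splits into its top byte and its remaining three bytes, the following native C++ helpers may be useful; token_type and token_value are hypothetical names, not part of any metadata API.

```cpp
#include <cassert>
#include <cstdint>

// Top byte of a metadata token: identifies the kind of item referred to.
inline std::uint32_t token_type(std::uint32_t token)
{
    return token >> 24;
}

// Low three bytes: locate the item (for ldstr, the literal in the #US stream).
inline std::uint32_t token_value(std::uint32_t token)
{
    return token & 0x00FFFFFFu;
}
```

For 0x70000001 the first helper gives 0x70 and the second gives 1, and the listing uses that same token for both loads.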

As you can see, the compiler has noticed that the same literal is used for both, so the metadata #US stream has only one copy. Furthermore, the two object references will be the same (Object::ReferenceEquals will return true) because when the first string is created, the runtime interns the managed string. The next time a string with the same metadata token is loaded, the runtime returns the same interned object.

You have to be wary about comparing strings, especially if, like me, you write some code in C#. There are several ways to compare strings: some compare string references, some compare the actual strings, and some compare both. The C++ == operator, when used with String* pointers, tests whether they are the same reference; that is, the operator behaves as it does in unmanaged C++. You get a comparison of the references and not a comparison of what the objects contain. Be careful here, because in C# the == operator for System.String first checks the references for equality and, if they are not the same object, compares the values of the objects. Thus, the code

// Managed C++

String* s1 = S"X";

String* s2 = S"XX";

// Get substring so we do not get the interned string

String* s3 = s2->Substring(0, 1);

if (s1 == s3) Console::WriteLine(S"same");

else Console::WriteLine(S"not the same");

will indicate that the strings are not the same because the comparison of the string references fails. The C# code:

// C#

string s1 = "X";

string s2 = "XX";

// get substring so we do not get the interned string

string s3 = s2.Substring(0, 1);

if (s1 == s3)

Console.WriteLine("same");

else

Console.WriteLine("not the same");

indicates that the strings are the same. The reason is that == in C# actually calls the String::op_Equality method, which calls the static String::Equals(String*, String*) method. You can call this method in C++ explicitly:

if (String::Equals(s1, s3))

Console::WriteLine(S"same");

else

Console::WriteLine(S"not the same");
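Because == on string pointers behaves as it does in unmanaged C++, the difference between comparing references and comparing values can be sketched in plain native terms. This is an analogy under stated assumptions, not the runtime's implementation: same_object and same_value are hypothetical names, and std::string stands in for the managed type.

```cpp
#include <cassert>
#include <string>

// Compares identity: true only when both pointers refer to the same object.
bool same_object(const std::string* a, const std::string* b)
{
    return a == b;
}

// Compares contents, trying the cheap identity and null checks first.
bool same_value(const std::string* a, const std::string* b)
{
    if (a == b) return true;                        // same object (or both null)
    if (a == nullptr || b == nullptr) return false; // only one is null
    return *a == *b;                                // character-by-character
}
```

With s1 holding "X" and s3 produced by taking the first character of "XX", same_object reports false while same_value reports true, which parallels the two outcomes shown in the managed C++ and C# listings.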

The static Equals method (and therefore the C# == operator) first checks whether the references are the same, which is a quick check for equality. If they are not, it checks whether either is a null pointer. If neither is null, it calls the instance method String::Equals(String*), which does the more costly comparison of the values of the strings. It is better to call the static Equals rather than the instance Equals because the former is potentially faster if you are likely to compare strings that could be the same object reference.

String::Equals does a case-sensitive comparison. If you want to do a case-insensitive comparison, call the static String::Compare overload that takes two strings and a Boolean. A value of true for the Boolean does a case-insensitive comparison. However, be wary of this method (and the associated CompareOrdinal and CompareTo) because the return value is not a Boolean; it is an integer with a similar meaning to the integer returned from the CRT strcmp. So if the strings are the same, CompareTo will return zero, which, of course, C++ will treat as a Boolean false. Always test the result against zero explicitly.
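The pitfall with strcmp-style return values is easy to reproduce with the CRT function itself. The sketch below uses native C strings; looks_equal_buggy and is_equal are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstring>

// Wrong: treats the three-way result as a Boolean, so equal strings
// (result 0) are reported as different and unequal ones as equal.
bool looks_equal_buggy(const char* a, const char* b)
{
    if (std::strcmp(a, b))   // zero means "equal", but zero is also false
        return true;
    return false;
}

// Right: compare the result against zero explicitly.
bool is_equal(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;
}
```

The same discipline applies to String::Compare, CompareOrdinal, and CompareTo: write the comparison against zero rather than using the integer as a condition.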

You cannot pass a managed string directly to a C++ standard library or CRT function. Instead, you can use the Marshal::StringToHGlobalUni method in the System::Runtime::InteropServices namespace to convert a managed string to a Unicode native string allocated on the LocalAlloc heap. After using the string, you must free it with a call to Marshal::FreeHGlobal. For better performance, Visual C++ provides the following function to give access to the internal buffer of wchar_t characters in a managed string:

// From vcclr.h

inline const System::Char * PtrToStringChars(const System::String *s)

{

   // Pin to avoid one-instruction GC hole in reinterpret_cast.

   const System::String __pin*pps = s;

   const System::Byte __pin*bp = reinterpret_cast<const System::Byte*>(s);

   if (bp != 0)

   {

      unsigned offset = System::Runtime::CompilerServices::RuntimeHelpers::OffsetToStringData;

      bp += offset;

   }

   return reinterpret_cast<const System::Char*>(bp);

}

Each managed string has a character buffer at a fixed offset from the beginning of the object. The OffsetToStringData property holds this offset, so the function pins the object and obtains a pointer to the first byte of the string object. The function then advances this byte pointer by the offset, which gives access to the character buffer. The function returns a Char __gc* pointer because the pinning lasts only as long as the scope of the function. The code that calls PtrToStringChars has to pin the return value before passing it to an unmanaged function, as shown here:

String* s = S"hello";

const Char __pin* p = PtrToStringChars(s);

_putws(p);
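The offset arithmetic inside PtrToStringChars can be mimicked in native C++ to show the idea of a character buffer at a fixed offset from the start of an object. This is a sketch only: FakeString is a hypothetical layout and is not the CLR's real String layout, ptr_to_chars is a hypothetical name, and nothing here models pinning or the garbage collector.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout for illustration only; NOT the CLR's String layout.
// The point is that the characters sit at a fixed offset in the object.
struct FakeString
{
    int length;
    char16_t chars[6];
};

const char16_t* ptr_to_chars(const FakeString* s)
{
    // View the object as raw bytes, as the vcclr.h function does.
    const unsigned char* bp = reinterpret_cast<const unsigned char*>(s);
    if (bp != nullptr)
        bp += offsetof(FakeString, chars);   // step over the header fields
    return reinterpret_cast<const char16_t*>(bp);
}
```

As in the original function, a null input is passed through unchanged, and a non-null input yields a pointer to the first character of the buffer.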
