Prompted by a conversation with some colleagues in which we speculated about the implementation details of a generic class and what impact – if any – this might have on performance versus a “traditional” polymorphic equivalent, I threw together a quick performance test case in my Smoketest framework. As a result I discovered a couple of significant changes in Delphi 2009 that created some unexpected problems.

1. FastStrings

As I feared, the immensely useful FastStrings library is broken by the switch to Unicode.  Bye-bye FastCharPos, FastReplace etc etc.

This is going to be a major headache for me in my day-job as I work on a 2 million line code base that makes extensive use of FastStrings; string performance is critical to the overall performance of the application as a whole (just one reason we are somewhat wary about the universal nature of the Unicode change in Delphi 2009).

In the lead up to the Delphi 2009 release I repeatedly asked for some indication of what impact the Unicode transition would have on FastStrings and none was ever forthcoming.

I guess I have my answer now.

As I understand it, work on the library stopped some time ago.  I will need to try to get in touch with Peter Morris to see if he has any plans at all to return to it and update it for Delphi 2009 or if he is aware of anyone that is.

I don’t know at this stage what – if anything – can be done to make FastStrings usable with Delphi 2009, other than changing the declarations in the library to AnsiString so that the compiler won’t let you call them with UnicodeString (currently they are declared as just String, which of course now is UnicodeString).

I’ve done some initial testing of string performance in Delphi 2009 and have actually been pleasantly surprised so far.  Simple string concatenation of UnicodeString seems for some reason to be actually faster than ANSIStrings, and faster even than ANSIString handling in Delphi 7 (with FastMM).

I find this puzzling given that I was sure I had seen the new TStringBuilder class being recommended for performance reasons.  For simple string concatenation at least, the exact opposite appears to be the case.

ANSIString performance on the other hand I think will suffer quite badly in Delphi 2009, most – if not all – of which I think is the result of an assumption of Unicode in the RTL.

There is no ANSIString version of IntToStr() for example – there is only one IntToStr() and it returns String, i.e. a UnicodeString.  This means that use of IntToStr() (or any of a number of common RTL string routines) when working with ANSIString variables will inevitably incur ANSI/Unicode conversions that previously it did not.

With this assumption propagated throughout the RTL, using ANSIString in a Delphi 2009 application could actually prove costly if string performance is a concern.

I don’t quite understand why CodeGear could not have provided ANSI versions of these RTL routines (IntToStrA, for example) alongside the newly Unicode capable “default” implementations.
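To make the idea concrete, here is a rough sketch of what an ANSI-only routine might look like. IntToStrA is a hypothetical name of my own (nothing by that name exists in the Delphi 2009 RTL), and the sketch assumes a 32-bit Integer. It builds the digits directly in an AnsiChar buffer, so no UnicodeString is ever created:

```pascal
program IntToStrASketch;

{$APPTYPE CONSOLE}

// Hypothetical helper: IntToStrA is NOT part of the RTL.
// The digits are written straight into an AnsiChar buffer, so no
// UnicodeString is created and no ANSI/Unicode conversion takes place.
function IntToStrA(aValue: Integer): AnsiString;
var
  buf: array[0..11] of AnsiChar;  // 10 digits + sign, with room to spare
  i: Integer;
  v: Cardinal;
begin
  // Work on the magnitude; going via Int64 keeps Low(Integer) safe.
  if aValue < 0 then
    v := Cardinal(-Int64(aValue))
  else
    v := Cardinal(aValue);

  // Fill the buffer from the right, least significant digit first.
  i := High(buf);
  repeat
    buf[i] := AnsiChar(Ord('0') + Integer(v mod 10));
    v := v div 10;
    Dec(i);
  until v = 0;

  if aValue < 0 then
  begin
    buf[i] := '-';
    Dec(i);
  end;

  // The characters occupy buf[i + 1] .. buf[High(buf)].
  SetString(Result, PAnsiChar(@buf[i + 1]), High(buf) - i);
end;

var
  a: AnsiString;
begin
  a := IntToStrA(-2009);
  Assert(a = '-2009');
  Assert(IntToStrA(0) = '0');
  Writeln(a);
end.
```

Whether a hand-rolled routine like this actually beats IntToStr() plus a conversion is something I would want to measure rather than assume.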

2. News FLASH!  The VMT Has Changed

After fixing compilation problems with FastStrings (i.e. removing FastStrings from the code!) upon attempting to run my performance test case I immediately ran into an access violation in the code that identifies the published methods in my test case classes.

The reason appears to be that the layout of the VMT (Virtual Method Table) has changed.

The help files proved to be rather begrudging in giving up their secrets in this area.  The Class Types topic itself that describes the VMT layout clearly has not been updated (and took some finding, being now a section buried in an Internal Data Formats topic) as it still describes the previous VMT layout.

Ironically the deprecated vmtXXX constants do seem to have been updated – these deprecated constants seem to include some all new, instantly deprecated ones.

So for example I can see that vmtEquals (a new entry at -44) occupies the position in the VMT that previously contained a pointer to the class name, and the entry that used to contain the instance size is now occupied by a pointer to the GetHashCode method (vmtGetHashCode).

As far as I can tell from the various new vmtXXX constants the changes are three additional method pointers inserted between the vmtParent and vmtSafecallException entries:

    Equals            : Pointer;                // -44
    GetHashCode       : Pointer;                // -40
    ToString          : Pointer;                // -36

So every entry previously at offset -36 or lower (that is, vmtParent and everything at a more negative offset) is affected.  There are also some new initial entries in the “user” virtual methods area, i.e. at positive offsets:

    QueryInterface    : Pointer;                // 0
    AddRef            : Pointer;                // +4
    Release           : Pointer;                // +8
    CreateObject      : Pointer;                // +12

Now obviously the VMT is an internal implementation detail of Delphi, and everyone knows that relying on its layout is dangerous – I even have a warning to myself to this effect in my own code!

So don’t mistake this for a complaint at the layout having changed – it’s just a heads-up to anyone who themselves may have used the previous documentation and/or other works describing the previous VMT layout (which up to now has proven remarkably stable) and a caution that the Delphi 2009 documentation itself does not appear to have been completely updated.
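One way to reduce the exposure is to take offsets from the RTL's own vmtXXX constants rather than hard-coding them, since the constants move with the compiler. The following is only a minimal sketch: InstanceSizeFromVmt is a hypothetical helper written for illustration, and TObject.InstanceSize remains the supported way to get this value.

```pascal
program VmtOffsetSketch;

{$APPTYPE CONSOLE}
{$WARN SYMBOL_DEPRECATED OFF}  // the vmtXXX constants are flagged deprecated in Delphi 2009

// Hypothetical helper: reads the instance size field out of the VMT using
// the RTL's own offset constant instead of a hard-coded number, so it
// follows whatever layout the current compiler uses.
function InstanceSizeFromVmt(aClass: TClass): Integer;
begin
  Result := PInteger(PAnsiChar(aClass) + vmtInstanceSize)^;
end;

begin
  // Cross-check against the supported API.
  Assert(InstanceSizeFromVmt(TObject) = TObject.InstanceSize);
  Writeln(InstanceSizeFromVmt(TObject));
end.
```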

For my own routines using the VMT, that wasn’t the end of the story….

3. PointerMath

Delphi 2009 now supports pointer math on all pointer types! YAY!  🙂

Seriously – this is HUGE!

Previously pointer math was only supported on PChar pointers. You could of course Inc() any typed pointer by the size of the type pointed to, but actual addition of an arbitrary offset (among other things) simply wasn’t allowed.  So to increment a pointer you would have to write something like, e.g:

  ptr := Pointer(Cardinal(ptr) + offset);

Which is cumbersome to say the least. So instead I was naively exploiting the fact that PChar was a pointer to a single byte-size data type:

  Inc(PChar(ptr), offset);

Can you see the problem?

PChar in Delphi 2009 is no longer a pointer to a single byte-sized Char, it’s a pointer to a WideChar, which of course is TWO bytes. Oops.

“Aha!” I thought “Pointer math to the rescue?”

The new pointer math capability is subject to a compiler switch (is OFF by default) so I could now write:

  {$pointermath ON}
  ptr := ptr + offset;
  {$pointermath OFF}

But there are two problems with this.

First, if ptr is a typed pointer then the resulting increment of ptr is offset * type size. In other words, this is exactly equivalent to a call that you could previously have made using Inc().  This is perhaps to be expected, but it might catch you out – as it initially did me.
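A small sketch of that scaling behaviour, assuming Delphi 2009 or later (the program and variable names are mine, purely for illustration). The byte distance is measured with PAnsiChar subtraction, which works in byte-sized units:

```pascal
program PointerScalingSketch;

{$APPTYPE CONSOLE}
{$POINTERMATH ON}  // Delphi 2009 and later only

var
  values: array[0..3] of Integer;
  p: PInteger;
begin
  p := @values[0];

  // Adding 1 to a typed pointer moves it by one element, not one byte.
  p := p + 1;
  Assert(PAnsiChar(p) - PAnsiChar(@values[0]) = SizeOf(Integer));

  // Inc() has always scaled in exactly the same way.
  p := @values[0];
  Inc(p);
  Assert(PAnsiChar(p) - PAnsiChar(@values[0]) = SizeOf(Integer));

  Writeln(PAnsiChar(p) - PAnsiChar(@values[0]));
end.
```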

The second problem is this of course won’t compile in older versions of Delphi; wrapping all the necessary conditionals and compiler options to enable compatibility with those older compilers whilst preserving the state of the switch in any current project quickly gets out of hand, and in any case I will still need old-style pointer adjustment code for those older compilers.

So in the end I just changed my PChar to a PByte and all was again right with the world.

  Inc(PByte(ptr), offset);

Just another example of “say what you mean if you really mean it”.

In this case, I want to increment in byte-size chunks, so use a PByte already!  Not some other pointer that happened – at the time – to also point to a one-byte size data type.

Pointer math is undeniably a useful addition to the language, but when it comes to advancing typed pointers I’d suggest sticking to Inc() calls.  Inc() is portable between Delphi versions and directly equivalent to the new alternative, and it will be familiar to the widest audience: its behaviour is likely to be understood already, whereas plain addition of ordinal values to pointer types is new.

It’s also far cleaner to coerce byte offsets from an Inc() call (where required) than it is using pointer math:

  Inc(PByte(ptr), offset);

vs

  ptr := PValueType(PByte(ptr) + offset);

4. The Frustration of the New

Pointer Math does again highlight an inevitable frustration with new language features.

Apart from the simple job of learning how the new features behave in code, the ability to use them in code destined (or at least intended) to be shared with the wider Delphi community is severely limited by the uptake of the latest version of Delphi.  At the very least it means having to wait some time before incorporating them in code until the latest version has gained traction in the community, and even then you either have to duplicate code subject to compilation directives or close the door on users of older versions of Delphi.

This has always been, and will always be, a problem.  There is no easy answer.

But in the case of Delphi 2009 that problem is exacerbated in my view by the fact that Unicode is so tightly coupled to Delphi 2009.  The likelihood that Delphi 2009 will become the ubiquitous Delphi version any time soon is – imho – going to be limited by the fact that anyone wishing or needing to maintain ANSI applications is likely to be running Delphi 2009 alongside their older Delphi compilers.

5. Generics are FAST!

I eventually fixed my problems and got my performance test case running.

It was not an exhaustive test.  I simply drafted two routines that each added the integers 1 through 1000 to an integer list and then iterated over the list, summing the items in the list.

Two different integer list classes were used and the performance of the code in each case compared (for brevity a lot of implementation code is not shown):

    TGenericIntegerList = Generics.Collections.TList<Integer>;

    TDerivedIntegerList = class(Classes.TList)
    private
      function get_Item(const aIndex: Integer): Integer;
    public
      procedure Add(const aInteger: Integer);
      property Items[const aIndex: Integer]: Integer read get_Item; default;
    end;

    TTestGenerics = class(TPerformanceCase)
      procedure GenericList;
      procedure DerivedList;
      procedure EnumGenericList;
      procedure EnumDerivedList;
    end;

begin
  TestSuite.Initialize;
  TTestGenerics.Create(2, pmSeconds);
  TestSuite.Ready;
end.

The test case is configured to run repeatedly for 2 seconds – Smoketest then reports the number of complete executions of each method in that period.

Each of the test case methods follows the same basic pattern. The GenericList method is shown here:

var
  i: Integer;
  r: Integer;
  list: TGenericIntegerList;
begin
  r := 0;

  list := TGenericIntegerList.Create;
  try
    list.Capacity := 1000;

    for i := 1 to 1000 do
      list.Add(i);

    for i := 0 to Pred(list.Count) do
      r := r + list.Items[i];
  finally
    list.Free;
  end;

The EnumGenericList method replaced the explicit iteration of the list with an enumerator based version, i.e:

    for i in list do
      r := r + i;

This had nothing to do with the performance of generics as such but was more out of curiosity, as I had a suspicion that I would see a difference.  The enumerator test for the derived list implementation was implemented for completeness, but frankly you wouldn’t enumerate over such an implementation since – out of the box at least – the enumerator for that class yields Pointers, not Integer, so some real nasty type-casting is needed.
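For anyone who wants to try the two iteration styles outside Smoketest, here is a self-contained sketch of the generic list case. It is not my actual test code: the program and routine names are illustrative, and it only checks that both loops produce the same sum rather than timing anything.

```pascal
program GenericListSketch;

{$APPTYPE CONSOLE}

uses
  Generics.Collections;

// Fills a TList<Integer> with 1..1000 and sums it twice: once with an
// explicit index loop and once with the for-in enumerator.
procedure SumBothWays(out aViaIndex, aViaEnum: Integer);
var
  list: TList<Integer>;
  i: Integer;
begin
  aViaIndex := 0;
  aViaEnum := 0;

  list := TList<Integer>.Create;
  try
    list.Capacity := 1000;

    for i := 1 to 1000 do
      list.Add(i);

    // Explicit, index based iteration.
    for i := 0 to Pred(list.Count) do
      aViaIndex := aViaIndex + list[i];

    // Enumerator based iteration.
    for i in list do
      aViaEnum := aViaEnum + i;
  finally
    list.Free;
  end;
end;

var
  viaIndex, viaEnum: Integer;
begin
  SumBothWays(viaIndex, viaEnum);
  Assert(viaIndex = 500500);  // 1 + 2 + ... + 1000
  Assert(viaEnum = viaIndex);
  Writeln(viaIndex);
end.
```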

The results were good news for generics, not so good for enumerators.

The generics based implementation of a TList, in this performance test case, was consistently at least 50% faster than the implementation derived from a regular TList.

A 50% improvement is something absolutely not to be sniffed at.

Unfortunately the enumerator implementations aren’t quite so rewarding. Admittedly the test cases don’t completely isolate the list iteration code for a true test of manual iterations vs enumerators, but nevertheless, in this case the enumerator based test methods were at least 25% slower than their manual iteration counterparts.

For completeness I should mention that the performance impact of an enumerator was actually slightly worse for the derived class implementation.

On the face of it, this would seem to confirm something that I have always suspected with enumerators – the syntactic sugar they provide comes at a not insignificant performance cost and I’ve personally never considered the syntactic sugar as earth-shatteringly significant as some.

It should be said that in a very large proportion of applications the performance impact is likely to be insignificant, but it is something that developers should perhaps be aware of when working in performance critical areas.

At the same time, it does seem that generics deliver more than just syntactic sugar.  I am frankly stunned by the performance improvement that they seem to bring.

It makes it even more of a shame that the implementation in Delphi 2009, as it stands currently, lacks operator constraints.  This demands a great deal of hoop jumping and circumlocutory tricks to create generic classes that are anything more than simple containers.

This problem is not confined to Delphi though – .NET and C# generics have similar problems.

Hopefully these shortcomings will be addressed in the future (and perhaps an opportunity for Delphi to steal a march on C#).  But for now, the prospect of containers that are not only type-safe but also more efficient is enough to get me a lot more excited about generics than I – for one – previously was.

15 thoughts on “Delphi 2009 – A Heads-Up for Low-Level Coders”

  1. Hmm, I wonder why the VMT had to be changed?
    Couldn’t the new methods be added at the beginning (most negative offset) and thus retain compatibility?

  2. FastStrings:
    Looking at the FastStrings license, it prevents anybody except Peter Morris from making the FastStrings unit Delphi 2009 compatible.

    Excerpt from the license:
    //No copying, alteration, or use is permitted without
    //prior permission from myself.

    But I don’t think that you need FastStrings anymore. The last change happened in 2003 and after that CodeGear started to include faster string handling functions from the FastCode project into the RTL.

    The problem with Delphi 2009 is that any assembler code that operates on strings will be broken. But that was to be expected when they told us (in their blogs) that Char will be a WideChar.

    VMT change:
    It is really a shame that they marked the vmtXXX constants deprecated. It makes the usage of low level code a lot harder (unless you ignore the deprecated mark). They must have thought that if somebody wants to read/alter the VMT, he will use assembler code where he can use VMTOFFSET.
    But if you use the vmtXXX constants your code should still work. This must mean that you (or the author of the unit) have used a hard-coded offset (which I certainly can’t prove).

    The Frustration of the New:
    As one of the JCL/JVCL developers I see your point with backward compatibility. The new language features are only of interest to application developers without the need to look back after the project has been migrated. I’m glad to be both so I will be able to use generics, anonymous methods and the new pointer arithmetic (if I ever need them in application code, which I can’t think of at the moment).

    Enumerators:
    I haven’t used for-in in any of my projects because it doesn’t give me any advantage over a for-loop. I still have to declare a local variable and the overhead for the enumerator kills it in my eyes. I’m known to use very efficient code constructs while writing new code because I know the assembler code that the compiler generates very well. A simple “const” for strings that costs you only 5 key presses can make a huge difference without thinking about more efficient algorithms (a task that has to be done when you do the optimization turnaround).
    Because of this I will never ask myself the question for-loop or for-in during development.
    BTW: The “const S: string” has lost its efficiency in Delphi 2009, which I really regret.

    You should sit down before you take a look at the assembler code that the D2009 compiler generates for this:

    function GetLen(const S: string): Integer;
    begin
      Result := Length(S);
    end;

  3. @Ajasja – that certainly is an interesting question and did occur to me also.

    @Andreas – thanks for the very detailed comments.

    That’s an interesting thought w.r.t not needing FastStrings any more. I sense I shall be spending some more quality time with Smoketest in the very near future.

    🙂

    The breakage in my VMT code arose from the fact that I have an implementation that uses a regular Pascal packed record recreating the layout of the VMT, rather than using ASM.

    Thinking about it, it’s actually rather odd that such a record isn’t declared in the RTL/VCL itself. The structure is documented after all, so it’s not exactly a secret.

    I too am a habitual const declarer and your comment w.r.t “const param: String” in Delphi 2009 has definitely gotten me curious, although what I’ll be able to deduce from the ASM remains to be seen.

    🙂

  4. @Andreas: The “const S: string” has lost its efficiency in Delphi 2009.

    I think it’s an additional overhead related to the new codepage feature.
    http://blogs.codegear.com/abauer/2008/07/16/38864

    And I believe it’s a counter-productive feature that induces an additional expense of memory and processor time. (See my comments in Allen’s blog).

  5. I’ve just compared the integer lists myself, and no, I don’t see such a difference in speed. Sometimes the generic list is faster than the derived one, sometimes the other way round. It depends on operation types, count of items, compiler options…

    At least the generic list isn’t worse than the derived list, it is enough for me to use it in my code.

    My test project is here:
    http://rapidshare.com/files/145179944/ListSpeed.zip.html

  6. Hi Kryvich – interesting. I know test methodology can be a contentious subject, but I do note that your test only times one iteration of the test for each class.

    Running your test repeatedly does give a slight variation in results – reducing the “max” to only 10000 allows the test to be repeated more often and results in greater variation. This variation sometimes favors the derived list and other times favors the generic list.

    Smoketest runs each method in a performance test case repeatedly for a specified duration or a specified number of executions. In this case I used a time limited run of 2 seconds (that’s per method). Performance test case results are then expressed as “executions per second” and “average execution time” for each method.

    However, I have now realised that my test – and yours in fact – included construction and destruction time, which if nothing else should be measured separately.

  7. Hi Andreas,

    I’m currently evaluating Delphi 2009 for our company. So far the GUI is far more stable than the one of Delphi 2006 we currently use.

    I’m shocked about the lost string efficiency for const parameters you mentioned!
    What bit Borland?

    Widestrings, unicode and the like are not only fine but have long been *really* needed and awaited
    …but…
    the fact that string and char can not be made compatible to former versions of Delphi (8 bit chars) by a compiler switch is a show stopper for us.
    We simply will not have the time to adapt and test some million source code lines.

    Most new language features are not implemented as I had expected.

    Generics are a great innovation
    …but…
    produce a massive code bloat since all methods are duplicated for *every* distinct type parameter, e.g.

    type
      t_1 = class end;
      t_2 = class end;
      t_generic<T> = class
        v: T;
        function get: T; inline;
      end;

    function t_generic<T>.get: T;
    begin
      exit(v);
    end;

    procedure test;
    var
      g1: t_generic<t_1>;
      g2: t_generic<t_2>;
    begin
      g1 := t_generic<t_1>.create;
      g2 := t_generic<t_2>.create;
      if g1.get = nil then;
      if g2.get = nil then;
    end;

    The 2 invocations of get not only do not get inlined – they are calls to 2 distinct methods with the same contents (ignoring the “inline”)!
    I do not want to think what happens to code size if we convert all our containers to generics!

    The anonymous methods are also nice but looking at the implementation makes me shudder. Whenever anonymous methods are used, object interfaces and the corresponding stubs are created; the interface is created, used and disposed (in an implicit finally).
    Just look at the disassembled code sample of
    http://blogs.codegear.com/davidi/
    The anonymous method can also be inlined (with the same effect).
    At least for this example there is no need for any variable to be captured. All that is needed for correctly calling the method is to provide the correct link, be it the static link, self, or whatever. A uniform closure would help.
    For anonymous methods as parameters restoring EBP would be OK.
    We usually do *not* need variable capture.
    The problem as I see it is the missing distinction between proc variables (Delphi) and proc parameters (ISO Pascal).
    Using proc parameters the need for variable capture is simply not there.

    The produced code is almost the same as the code of the first version of Delphi-32.

    Many known and reported compiler errors are still present:
    sqr(x) does not deliver the same result as x*x for byte, word, smallint, int64, uint64. Test with int64_var:=sqr(test_var).
    swap does not zero out the high word as it did for all 16 bit versions (TP/BP/Delphi-1).
    Code like “i:=i div 2” produces a division instruction instead of a shift as it does for “a:=b div 2”.
    In the unassembly movsx/movzx still show no operand size.
    Temp vars are not reused. Look at the disassembly of “if copy(s,1,1)='' then if copy(s,1,1)='' then if copy(s,1,1)='' then …”.

    Many opportunities for optimization are still left out:
    There is still no real optimizer.
    Value propagation is still missing!
    Inline often results in more code than expected, especially when string parameters are used.
    Code like x+2+3 results in 2 additions. 1 addition is sufficient if overflow check is off.
    Code like 'x'+s+''+'' or s+'a'+'b'+'c' causes 4 strings to be concatenated where 2 would be sufficient.

    The long wanted case for strings is still not there.

    All in all I am sort of disappointed and currently I would not opt to switch to Delphi 2009.
    We definitively need speed – at least for integer (possibly float/string) performance.
    Unfortunately we have no real alternative than to stick with D6 / D2006 or switch to another language (I hate C,C#).

  8. Regarding “const-string”, I’ve tried your example, but I see nothing strange :

    GetLen:
    004B4644 85C0 test eax,eax
    004B4646 7405 jz $004b464d
    004B4648 83E804 sub eax,$04
    004B464B 8B00 mov eax,[eax]
    004B464D C3 ret

    So I think this was resolved somewhere, somehow.

    Please check your results (and publish them if you please) so we can all lay this issue to rest before it starts living a life of its own 😉

    Cheers,
    Patrick

  9. Addendum:

    With the above mentioned non-needed capture I mean that for us normally an auxiliary object (interface) is not needed in order to enlarge the lifetime of otherwise temporary local variables of the defining/calling scope.
    It would be sufficient to have a mechanism to indirectly call a method declared on the fly to get the effect of iterators and/or call by name.

  10. @Patrick van Logchem: Looks like I simplified my example too much (I should have checked that after I changed it). Here is the original example code:

    procedure Test(const S: string);
    begin
      if Length(S) > 0 then ;
    end;

    Delphi 2007:
    begin
    if Length(S) > 0 then
    test eax,eax
    jz $004611f5
    sub eax,$04
    mov eax,[eax]
    test eax,eax
    end;
    ret

    Delphi 2009:
    begin
    push ebp
    mov ebp,esp
    push ecx
    mov [ebp-$04],eax
    mov eax,[ebp-$04]
    call @UStrAddRef // => CPU LOCK
    xor eax,eax
    push ebp
    push $004611df
    push dword ptr fs:[eax]
    mov fs:[eax],esp
    if Length(S) > 0 then
    mov eax,[ebp-$04]
    test eax,eax
    jz $004611be
    mov edx,eax
    sub edx,$0a
    cmp word ptr [edx],$02
    jz $004611be
    lea eax,[ebp-$04]
    mov edx,[ebp-$04]
    call @InternalUStrFromLStr
    test eax,eax
    jz $004611c7
    sub eax,$04
    mov eax,[eax]
    test eax,eax
    end;
    xor eax,eax
    pop edx
    pop ecx
    pop ecx
    mov fs:[eax],edx
    push $004611e6
    lea eax,[ebp-$04]
    call @UStrClr // => CPU LOCK
    ret
    jmp @HandleFinally
    jmp $004611d6
    pop ecx
    pop ebp
    ret

    If you use the {$STRINGCHECKS OFF} compiler option (also available in the project options) you can force Delphi 2009 to omit all the extra code that is only required to catch the missing C++Builder AnsiString to UnicodeString migration.

  11. You’re right. Switching off string checks helps.
    The help system is still nearly as bad as the one in D2006.
    I could not find anything about this switch and I wonder about the default setting ON.
    If the compiler were smart enough all the code of your test proc would have been reduced to a simple ret or even eliminated altogether, since calling an empty proc does not always make sense…
    See my comments on optimization.

  12. Uh, oh, the web interface swallowed the template brackets when I posted my generic example above. Of course g1 should be parameterized with t_1 and g2 with t_2.
