Be careful what you wish for. A lot of people were overjoyed to hear that Unicode support was coming to Delphi. Some were skeptical of the chosen implementation approach, however; it all seemed just a little bit too easy. I was one, and sadly it seems I was right.
I’ve just started updating a whole host of code to Delphi 2009. Since Unicode is what my code has to speak whether I like it or not (if I want to use the latest/greatest Delphi compilers) then I may as well get with the program and drag my ANSI code kicking and screaming into the UTF-16 world.
“Kicking and Screaming” is certainly involved. But mostly it’s me doing it.
In utter frustration.
There are currently two head-shaped dents in the wall. Allow me to share my pain.
Dent #1: UTF8ToANSI()
The help says:
Call Utf8ToAnsi to convert a UTF-8 string to Ansi. S is a string encoded in UTF-8. Utf8ToAnsi returns the corresponding string that uses the Ansi character set.
Well if you do call UTF8ToAnsi and expect it to do what that says it will do then you are in for a disappointment. Because it actually returns a UnicodeString.
Not “what it says on the tin” and sure as eggs is eggs not what the caller wants or expects, no matter how quickly and dirtily they want to hack their old ANSI code into a warning-free ANSI-on-WideAPI “pseudo-Unicode” application.
Dumb. Dumb. Dumb.
But it gets even dumber. There is now also a function UTF8ToString()! And if we inspect the source for that we find the following puzzling implementation:
function UTF8ToString(const S: RawByteString): string;
begin
{$IFDEF UNICODE}
  Result := UTF8ToUnicodeString(S);
{$ELSE}
  Result := UTf8ToAnsi(S);
{$ENDIF}
end;
You may be thinking that this isn’t puzzling at all. That it is a perfectly sensible implementation – if we’re compiling with UNICODE defined then defer to the UnicodeString implementation. Otherwise defer to the ANSI implementation.
But hang on, since when was UNICODE an option in Delphi 2009? CodeGear decided that it was to be “Unicode or the Hi-rode”.
But it gets better, because in deferring to the ANSI implementation – even assuming this is ever compiled with UNICODE not defined – they are of course calling the so-called-ANSI implementation that – UNICODE or not – calls the Unicode implementation!
You can’t help but think that CodeGear managed to confuse even themselves with their approach to Unicode.
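For anyone who really does need a genuine AnsiString out of UTF-8 data in Delphi 2009, here is a minimal sketch of one way to get it. Utf8ToAnsiString is a hypothetical helper name of my own, not an RTL routine, and the sketch assumes the default system ANSI code page is the one you want:

```pascal
program Utf8ToAnsiDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not part of the RTL): decode UTF-8 and hand back
// a real AnsiString in the system's default ANSI code page.
function Utf8ToAnsiString(const S: UTF8String): AnsiString;
begin
  // UTF8ToString yields a UnicodeString; the explicit cast then converts
  // it to ANSI. Characters with no ANSI equivalent are lost in that step.
  Result := AnsiString(UTF8ToString(S));
end;

var
  Utf8: UTF8String;
  Ansi: AnsiString;
begin
  Utf8 := 'Bonjour';               // pure ASCII survives any code page
  Ansi := Utf8ToAnsiString(Utf8);
  Assert(Ansi = 'Bonjour');
  Writeln(Length(Ansi));
end.
```

The explicit AnsiString cast at least makes the lossy step visible at the call site, which is more than the name UTF8ToAnsi() now does.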
OK, But What’s this Wide-ANSI Nonsense?
Ok, so UTF8ToANSI() is not something that I’m guessing many people are actually using (my code is working with the Apple Bonjour SDK which makes extensive use of UTF-8, and the code originated in Delphi 7, hence the ANSI/UTF8 conversions).
So I ran into an edge-case. No big deal, right? Granted. But sadly there are other, wider [sic] issues which people will undoubtedly run into.
Let me give you an example that I’m certain will be encountered by more people than that UTF8/ANSI scenario.
Dent #2: Uppercase()
var ws: String; // UnicodeString in Delphi 2009/10
ws := 'aà';
ws := Uppercase(ws);
What does ws contain after the call to Uppercase()? If you said AÀ then congratulations – you are 100% wrong. It actually contains Aà.
Sadly, that is NOT the uppercase version of the input string according to the Unicode specification.
Now, to be fair, the documentation in this case is quite correct and clear that Uppercase() does not handle Unicode Strings as Unicode. Which makes an absolute mockery of a strongly typed language such as Delphi.
Uppercase() accepts a String parameter. In Delphi 2009 String is a Unicode data type.
In Delphi 2009+ Uppercase() is – by contract – a Unicode function and should jolly well behave as such.
Frankly, this decision by CodeGear speaks to me of a cavalier disregard for what makes Delphi, well, Delphi.
Anyway… Now, the hard part. Can you guess what function you should use to properly convert a UnicodeString to uppercase?
Well, you actually have a choice of two:
WideUppercase( );
ANSIUppercase( );
No, wait. That can’t be right surely? ANSIUppercase?
Yes. Of course the alarm bells should be ringing as soon as you realise that ANSIUppercase accepts not an ANSIString parameter but a plain String parameter, which of course made perfect sense pre-Delphi 2009 when String meant ANSIString, but in Delphi 2009+ CodeGear found themselves painted into a corner.
Having decided that they, and only they, would get to choose what String meant to the compiler (although I think that the $ifdef UNICODE in System.pas is evidence that they weren’t – at one time at least – as convinced that this was the right and obvious thing to do as they seemed to suggest), if they properly modified these ANSI RTL functions to accept ANSI strings, then a lot of code currently calling them with String parameters would start throwing up warnings when String transformed from ANSIString to UnicodeString.
But at the same time the vast, and I mean VAST majority of code would simply be calling Uppercase(), and they didn’t want any warnings coming from that either.
What a pickle.
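To make the difference between the three routines concrete, here is a small console sketch. It assumes Delphi 2009 or later on Windows, with assertions enabled, and writes the accented character as a four-digit character code so that the result does not depend on how the source file is encoded:

```pascal
program UpperCaseDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  S: string;  // UnicodeString in Delphi 2009 and later
begin
  // 'a' followed by U+00E0 (a with grave accent)
  S := 'a'#$00E0;

  // UpperCase only touches 'a'..'z', so U+00E0 is left alone.
  Assert(UpperCase(S) = 'A'#$00E0);

  // AnsiUpperCase and WideUpperCase both map U+00E0 to U+00C0.
  Assert(AnsiUpperCase(S) = 'A'#$00C0);
  Assert(WideUpperCase(S) = 'A'#$00C0);
end.
```

If all three assertions pass, the routine with “ANSI” in its name is doing the Unicode work and the plain one is not.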
Fight Fire With Fire
It seems they chose to resolve this dilemma by adding a little more confusion into the mix, creating a situation where someone explicitly calling an ANSI routine will obtain a Wide operation and someone calling the “default” implementation, presumably expecting it to yield a Wide operation (because Delphi 2009 is Unicode through and through, right?) is rewarded with an ANSI operation.
Perhaps they thought that if they made our heads spin enough we wouldn’t notice what a pig’s ear they’d made of the whole thing?
But what does all this mean for someone with some old apps that they can’t wait to get updated to the latest Unicode compiler so that they can start selling to their customers who have been demanding Unicode support?
Well having compiled your pre-Unicode source with the Unicode compiler you may have been very happy to find that you had only a handful of warnings to deal with and then BINGO, you had a Unicode application.
Sorry.
I hate to break it to you, but you may not have finished yet.
What you have at the moment is an application that in all likelihood is still behaving very much like an ANSI application; it’s just that it’s now sitting atop the Windows Wide APIs, pretending to be a Unicode application.
In many respects it may fool you, perhaps for a long time, because anyone who hadn’t previously tackled Unicode head-on almost certainly doesn’t have any actual need for Unicode and their application will not be stressing those parts of Unicode support that actually separate a “Unicode” application from a “non-Unicode” application.
A Thought Experiment
This is a “Thought Experiment” in the sense of “You know what Thought did?” One answer to which is “He thought he did, but he didn’t.”
Let’s take a simple and I think highly common situation. A text entry field for some user specified code that is required to accept only alpha-numeric characters and enforce uppercase.
Pre-Delphi 2009 such a field may well have had a key filter installed in an event handler:
procedure TForm1.Edit1KeyPress(Sender: TObject; var Key: Char);
begin
  if (Key in ['a'..'z']) then
    Key := UpCase(Key)
  // Let existing uppercase letters and digits through; reject the rest
  else if NOT (Key in ['A'..'Z', '0'..'9']) then
    Key := #0;
end;
In Delphi 2009 this throws up the “WideChar reduced to byte char” warning that everyone “doing Unicode” in Delphi has been talking about, and like a good little Delphi developer they do what CodeGear tell them and change this code. But let us imagine that we know a little bit about Unicode and are actually migrating to Delphi 2009 because we want to be able to market our product as a Unicode application.
So rather than simply calling CharInSet, because that still only deals with ASCII characters, we adjust the routine to call the Windows Unicode support routine IsCharAlphaNumeric instead:
procedure TForm1.Edit1KeyPress(Sender: TObject; var Key: Char);
begin
  if IsCharAlphaNumeric(Key) then
    Key := UpCase(Key)
  else
    Key := #0;
end;
With this simple change the code now compiles without any warnings.
YAY! We didn’t fall into the trap laid for us by CharInSet! We dealt with this properly and now we have a Unicode application!
Don’t we?
No.
Can you guess what that WideChar version of UpCase() does? Yep. Behaves exactly the same as the ANSI version.
Dumb. Dumb. Dumb.
There is not even the excuse of needing to maintain backward compatibility in this case – there was no WideChar version of UpCase() prior to Delphi 2009.
Now, anyone who understands Unicode and specifically the properties and characteristics of UTF16 will be able to tell you why UpCase( WideChar ) does not perform a Unicode case conversion – it simply cannot. A single WideChar may not represent an entire character – it may be part of a surrogate pair and whilst I don’t think that there are any case convertible characters that require surrogate pairs currently, that could change (and I may be wrong on that anyhow).
So what else could it do but echo the ANSI implementation?
Well one option would have been to reflect the true nature of Unicode and simply not try to create the illusion of supporting something that cannot be supported.
But more acceptable I think would have been to perform a Unicode conversion on those chars that it could (non-surrogates) and if a surrogate was supplied as input, simply return it unmodified.
As it is, the previous (and current) “ANSI” implementation is incomplete w.r.t ANSI, providing only ASCII case conversion. It would have been easy to argue that a Unicode implementation that only operated on characters in the BMP (Basic Multilingual Plane) was the natural and obvious behaviour for a Wide UpCase implementation.
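For what it’s worth, the behaviour I’m describing can be sketched in a few lines using the Character unit that arrived with Delphi 2009. UnicodeUpCase is a hypothetical name of my own, and the sketch assumes the unit’s global IsSurrogate and ToUpper routines (later compilers name the unit System.Character and prefer the equivalent helper methods on Char):

```pascal
program BmpUpCaseDemo;

{$APPTYPE CONSOLE}

uses
  Character;  // unit added in Delphi 2009

// Hypothetical helper: a Unicode-aware UpCase for single UTF-16 code units.
function UnicodeUpCase(C: Char): Char;
begin
  if IsSurrogate(C) then
    Result := C            // half of a surrogate pair: return it unmodified
  else
    Result := ToUpper(C);  // BMP character: Unicode-aware case conversion
end;

begin
  Assert(UnicodeUpCase('a') = 'A');
  Assert(UnicodeUpCase(#$00E0) = #$00C0);  // a-grave becomes A-grave
  Assert(UnicodeUpCase(#$D835) = #$D835);  // high surrogate passes through
end.
```

Dropping something like this into the KeyPress handler in place of UpCase() would give the uppercase-enforcing field the behaviour a user typing accented letters would expect, at least within the BMP.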
So I’m afraid if you do want and/or need proper Unicode support, you’ve still got some work to do before you can get there, and unfortunately the compiler is not going to help you from this point on. Furthermore, the VCL now seems to go out of its way to make it harder by in some cases completely breaking the type-safety that you can normally expect when working in Delphi and which would have guided you toward the answers to the numerous questions that arise when contemplating proper Unicode support.
The compiler assistance in “helping you find the things you need to change for Unicode support” only really works if you don’t actually need proper Unicode support and just want to get your ANSI code running over the Wide API in Windows.
Once you’ve reached that point – or even before that – if you then decide you want to do Unicode properly, I fear you will find that the design decisions made to facilitate the migration of Wide-ANSI applications will frustrate you and complicate your job no-end.
Really, I have to wonder if it was worth getting so excited about Unicode support if its main audience is people not actually supporting Unicode properly?
I can only hope that the 64-bit support that it seems people increasingly need is not being delayed in order to make way for a cross-platform implementation that will suffer the same identity crisis as the Unicode implementation is littered with.
Getting it done quick sometimes is not as important as doing it right.
In the case of Unicode I’m afraid I’ve not come across anything to make me think it was “done right” at all.
-
The Unicode change has been a disaster in our shop. We have stuck with Delphi for a very long time because they have never really burned a bridge behind them like Microsoft (ex: VB). This change has done just that to us. Previous versions of Delphi prided themselves on backwards compatibility and compiling. The new versions, while possessing some impressive features, are ultimately what is going to drive us away from the language. We use several third-party units with source, and of course some date back quite a while. Several are a nightmare to untangle in the new compiler and the original authors are nowhere to be found. If we have to spend several weeks patching, why not put that time into moving forward on something more industry standard, that being C#, is the question management has put to us. My last really big holdout, which was that Delphi has always supported older code, is no longer a viable excuse. As much as I love Delphi, in our shop we are beginning to let go. Even harder to fathom is that Unicode does absolutely nothing for us, so it is an even more bitter pill to swallow.
-
“Getting it done quick sometimes is not as important as doing it right.”
Unicode support was neither done quickly nor done right. So how in the name of (fill in whatever you prefer here) would you expect 64-bit support to be any different … come on, get real.
-
Sorry, but your complaints are a bit lame IMO…
1. Presumably that conditional define just dates from when the RTL, VCL and IDE were in the process of being ‘unicodified’.
2. Yes, updating old UTF8 conversion code is a sore point, but to be honest, the new UTF8String type (and the much simpler conversion code it brings) is worth the pain for me. At least we know that your beef is with the odd break in source code backwards compatibility. No, wait…
3. That UpperCase and LowerCase in D2009/10 only converts characters in the ASCII range is because they only ever did, a fact that was and is documented in the help (didn’t you ever wonder why parallel AnsiXXX functions were added to the RTL back in D3…?). Basically, what you were assuming here was a *break* in backwards compatibility.
4. Your yabbering on about the AnsiXXX functions is once again inconsistently demanding a break in backwards compatibility – in D3-2007, people usually used AnsiXXX on variables and properties declared as string, not AnsiString, and so would require the D2009/10 behaviour when upgrading. Moreover, do a bit of research before you vent – check out the Character unit and its ToLower/ToUpper functions, new to D2009.
5. All those words about UpCase miss a simple fact: the function is merely a holdover from the Turbo Pascal RTL.
6. ‘if you then decide you want to do Unicode properly, I fear you will find that the design decisions made to facilitate the migration of Wide-ANSI applications will frustrate you and complicate your job no-end’. Design decisions like what? Like adding all those compiler warnings for implicit conversions and suspicious casts, warnings with descriptions I personally found very thorough without being overlong? Like adding the Character unit and TEncoding class? Like imitating the .Net BCL’s unicode support, following well-recognised conventions as a result? Like not going for the tempting-yet-subtle-bug-introducing method of making the default string type UTF-8? Like not wasting resources on supporting parallel ANSI and unicode VCLs?
-
While I normally avoid your comments as you are in the same ranting angry place I spent so many years in, in this case instead of just ranting angry, you have something of a valid point.
Many of the functions SHOULD act unicode if the switch to unicode is meant to be an honest one. And to that end, I put to you the old chestnut: Put each case in the QC database.
Except perhaps the UpCase one – you might want to be more careful there. Part of the problem there is pretending utf-16 solves all our unicode problems. I’ve been starting to move my code forward, and I have seen the pitfalls with utf-8 mbcs. Clearly the solution is meant to be move everything over to a ‘unicodestring’, work with it ‘natively’ and then only use utf-8 as a transport container.
Problem is that utf-16 is also a transport container because it ALSO is a mbcs. In order to avoid all of it, we would need honest to goodness utf-32 (at least this week – one wonders when they will decide that only utf-64 will do… ). Oh wait, the win32 and linux worlds don’t work with utf-32, so everything would have to go through conversions again all the time. So UTF-16 is what we get, we plug our ears and pretend the multi-byte sequences aren’t there.
Probably going to bite us in the ass at some point.
Oh, I do not know why the help even talks about UTF8ToAnsi. It’s absurd. Since a simple
MyAnsiString := AnsiString(MyUnicodeString);
or
MyUTF8String := UTF8String(MyUnicodeString);
already do all the correct encoding conversions under the covers.
If you think that you’ve found problems? Wait until you have to interface unicode strings with older 8bit string libraries.
You end up with a lot of PAnsiChar(AnsiString(aUnicodeString)) code.
Why? Because the same magic does not apparently happen with PAnsiChar(UnicodeString) – and since typecasting a string with PAnsiChar or PChar causes compiler magic to call a function in the first place, it pretty much could (I guess that is my bit to toss at QC).
-
I agree about the confusion of names and type safety. It’s very confusing and is very likely to cause buggy programs. I cannot count the number of websites and other programs that already mess up non US-ASCII characters on a regular basis, showing UTF-8 as Latin-1 or vice versa, losing some data in the conversion, etc. It’s extremely annoying and it wasn’t necessary to increase the number of problems.
They should have taken the time to clean up the mixture of ASCII and Ansi-functions. Having functions that have ANSI in the name return or expect Unicode in the params is a real mess.
To properly handle KeyPress Delphi should monitor WM_UNICHAR and pass a UTF-32 character instead of a UTF-16 character as in
procedure KeyPress(Sender: TObject; var Key: UCS4Char);
BTW UTF32Char would be a far better name since UCS encodings are deprecated and misleading. UCS-2 is a subset of UTF-16 and UCS-4 is a subset of UTF-32. As such it is misleading to call the characters of a UTF-X string UCS-Y characters. The term UCS shouldn’t be used at all (this was already the case years before Delphi 2009 was released).
Similar neglect of details can be seen when looking at the formatting functions. ListSeparator, ThousandSeparator, DateSeparator, etc.
MSDN states:
“The maximum number of characters allowed for this string is four, including a terminating null character. ”
But Delphi uses only one Char. Obviously with Unicode this is even worse if the separator character is not in the BMP.
The result is that it works in most cases, but strange errors occur in special cases. The kind of bugs you “love” and for which your customers will hate you.
Now, I don’t want to even think about how many library functions properly handle Unicode. Making sure it works would have required thoroughly reviewing all the code. Considering how often it was mentioned that switching to Unicode was very fast and easy, I highly doubt Unicode handling in Delphi is reliable.
Unicode is a difficult subject, therefore clear concepts and names are mandatory.
-
Delphi 2009 delivers all the functionality you need in order to do unicode right. The main problems for conversion arise if you tried to do unicode in Delphi 2007 or earlier, supporting utf-8 and doing other tricks. Many different methods were used, but fortunately, many of these problems can be solved using search&replace in the units that are not easily upgraded.
The upgrade process is tricky, but instead of running your head into a wall, try to use the help from other programmers. Many of us have succeeded in upgrading large and complex applications without big problems.
The good news is, that once you have upgraded, you will spend a LOT less time on doing character set handling, and there is so much code that is easier to write and much easier to read.
Many of your complaints are not really meaningful, IMO. For instance, uppercase() is meant to affect only ASCII, and leave all characters >#128 as they are, and since upcase() does not make sense in UTF-16, you will need to rewrite that code if you want to support UTF-16 (instead of just UCS-2). However, if you did well with ansi before, UCS-2 (16 bit per char) is a huge improvement, and seriously, when do you need upcase() on a character that is >=#65536?
The main problem with Unicode is actually not Delphi-related. The main problem is that if the user can specify a unicode character, how compatible is that with your “export to CSV file functionality”? Remember that Windows still uses ANSI 8-bit characters for a lot of I/O, so you may lose a lot. Also, what if your database app wants to export data to another application that doesn’t do unicode well? Should you restrict your TEdit input to ansi-only?
However, with the introduction of the euro-symbol €, and the general use of other non-ansi characters in communication, unicode is a step that most need to implement one way or the other. Delphi 2009 makes it possible to stop focusing on character sets, and to start focusing on solving customer problems. And the upgrade process couldn’t have been made much better.
So, try not to focus on your frustrations, and try to focus on how to solve your problems. Afterwards, your life will be easier.
-
1. You are right. They (Embt) sacrificed function naming for backward compatibility. But what should they have done? Upset the 90% of their user base who want a “load, fix and recompile” or upset the 10% who want to do more migration by hand (and have the skills to do that)? I don’t like all those “Ansi*” calls in my code but if you want to compile the code in Delphi 2007 and 2009/2010 you must use them because new units and functions don’t exist in Delphi 2007.
2.
> > That UpperCase and LowerCase in D2009/10 only converts
> > characters in the ASCII range is because they only ever did
> Because “String” was only ever ANSIString
Your code was wrong in the beginning and stays wrong in Delphi 2009+. Let’s assume you have your application in an ANSI Delphi (D7, D2007) and you use UpCase/UpperCase in the KeyDown event. If a user now types an ‘a’ he gets an ‘A’. But what if he types a German umlaut? Does he get the uppercase representation of it? No, because ‘ä’ is not in the US-ASCII range. This UpCase/UpperCase behavior is still valid in Delphi 2009+. And I would throw things at Embt if they had changed that.
Those who wrongly used UpperCase where they should have used AnsiUpperCase are still wrong in Delphi 2009+. And those who intentionally used UpperCase in their code still have the intended behavior.
UTF8ToString implementation:
> You can’t help but think that CodeGear managed to confuse
> even themselves with their approach to Unicode.
Maybe they had an ANSI and Unicode version in their mind when they started to work in Delphi 2009 (and I guess they had). But as we all know they dropped the idea. So this code only shows that they forgot to clean up the source code. And during the time of development the implementations of the ANSI-Unicode functions changed so the “unused” code became broken.
> No, wait. That can’t be right surely? ANSIUppercase?
On the one side I think they should have introduced another set of functions like “UniUpperCase”. But on the other side I wouldn’t like the idea of another set of string functions. Ansi*, Wide* and Uni* filling up the global namespace. It would also be fine for me if they replaced the Wide* functions with the Uni* function because I don’t use them. But there are others that might use them. So they can’t remove them.
-
Jolyon: I don’t want to engage in a discussion of the names of functions. The prefix “Ansi” didn’t make sense with Windows 3.1, it didn’t make sense with Windows 95, and it still doesn’t make sense, simply because “Ansi” is not the 8-bit character set on most Windows computers on this planet. However, it makes a lot of sense to make upcase() work with widechar, because widechar is the type that is used for characters, and upcase() takes a character as parameter. With Delphi 2009, you would not use ansichar, unless it’s a hack or you are doing binary data which is not upcased. In short, your criticism on these points comes 13 years too late and is not related to Unicode. It is even possible to argue that Delphi 2009 improves on this topic, since Unicode contains Ansi, whereas AnsiUppercase is not about Ansi.
There are a LOT of cases, where you do not want to treat non-ascii letters as being letters. There can be several reasons, including predictability, reversibility and performance. If you prefer fast software to slow software, it makes a lot of sense to make the normal uppercase operate only on ascii, and to make unicode uppercase a function with a different, more complex name. However, this is a choice where programmer’s taste may differ, of course, but I like when the default method performs best.
I have posted an answer to your question on stackoverflow.com – your code uses a string type that has been significantly changed for Delphi 2009, but you are using it when interfacing with a DLL, that did not change. I assume that the external DLL was not written using Delphi, and was not recompiled for Delphi 2009, and in that case, your source code simply has a bug: It uses PUtf8String. My best guess is that you should use PAnsiChar() for that DLL.
-
Jolyon, I’m surprised that you dislike the name “AnsiUppercase” and then introduce “type utf8char=ansichar”. To me, it would make much more sense to say “type utf8char=array[0..5] of char”, since utf-8 encoding allows up to 6 bytes per code point
With regard to your stackoverflow.com question, you indicate that the reason for the problem is related to unicode and a new string type, even though you did not find the actual cause, yet. How do you deduce that the problem is caused by the new string types? I ask that question here, because your stackoverflow.com question does not indicate that unicode is the problem.
-
Btw, widestrings are MUCH slower than Unicodestrings, but still supported. I would expect a wide* function to operate on Widestrings, and not on unicodestrings. Anyway, it’s always easy to find things that could be improved after the release of a product – the trick is to design things properly before the release.
-
Unicode in the latest Delphi is a disaster.
It was introduced the lazy way.
The natural way was to introduce new API without compiler change:
- “string” as UTF-8 / ANSI / ASCII / my-hacked-code-page
- “WideString” – binary unicode
- AnsiString – ANSI string
In fact, VCL & Win32 API should be changed (for consistency: in all cases UTF-8), not compiler.
It is far easier to prepare string for proper display (using something like AnsiToUtf8) than change all the string parsing code your application contains.
See Lazarus UI implementation.
-
@PiotrL: I agree that utf-8 would be the best path towards unicode – it would have unicode-enabled much more source code. I had some huge discussions about that on Borland’s servers once, but the opinion was against it, mostly because the standard way to do strings in Win32, Win64, .net and Java is to use UTF-16.
One way to make a unit unicode-compatible is to do a search & replace of all strings and replace them with utf8string. However, since the RTL, VCL and others use unicodestring, this can potentially introduce a LOT of character set conversions, making your application extremely slow because each conversion involves the Windows API. A bad design, in my opinion, but I guess no one saw that problem before it was too late. The basic problem is that CodeGear followed the Microsoft/Java crowd instead of following the Linux crowd which solved this much more elegantly.
However, widestring is seriously awful but required for COM, and unicodestring is a huge improvement – and once you start programming with it, it solves your character set problems just as easily as .net and Java. You simply stop thinking much about it, and in a Windows world, where applications need to handle many different character sets (utf-8, oem, “ansi”, UCS-2, UTF-16 and sometimes other stuff), Delphi 2009 is a great tool that does not restrict you to utf-16.
When I thought about how to convert my PChar-intensive XML parser to Delphi 2009, I decided to rename string to rawbytestring, instead of going unicode, and only changed the API so that I/O was unicode-enabled. This was a great choice – it is now faster than before, the source code was virtually unchanged, and it is 100% unicode enabled.
-
@Jolyon: For us, Search & Replace solved many of the problems that you describe. However, we had different approaches for different units – some bit-fuddling algorithms were handled by search&replace to utf8string, but most GUI stuff and business logic was converted to unicodestring (i.e. no rename). I agree, that a compiler switch would have helped doing that, but it would have led many developers into the trap that I just described above: Lots of character-set conversions that bring your application performance down to bottom.
I don’t think that there is a solution, that could have given us good performance and backwards compatibility without using utf-8 as the default character set, and that would have given us conversions on all interfaces to the exterior, which is a lot, which would also have hurt performance. The Windows API simply forces us to use utf-16, like it or not, and that causes trouble.
We have made a quite large and complex upgrade to Delphi 2009, and my experience with that has taught me that the upgrade process is complicated, mainly because we now have to separate unicode-text, other character sets and binary stuff, often in I/O contexts. We cannot use the same TStrings object for all three types of data any more, and that’s not CodeGear’s fault.
In other words, I don’t believe that CodeGear could have made a solution, that would perform well and be easy. I haven’t yet seen a solution that would have been better in all regards – including utf-8 and a compiler switch.
-
Let’s not get too detailed into the definition of words, but UTF8, UTF16 and UCS2 are encoding systems, that cover more than, all of, and parts of Unicode. It is incorrect to call them character sets, of course. I disregard UCS4/UTF32, because nobody uses them, except in a few very rare cases. Those that favor it because it means “one string position is one character”, are wrong – because with Unicode, you can combine multiple characters. There must have been some extremely important reasons to invent this stuff since it really makes the work of programmers more difficult.
I think I now understand one of the reasons why you are arguing about uppercase(). Since Delphi 1, uppercase() has been completely useless for doing uppercase on human text in my part of the world, because uppercase() does not support the ANSI letters of my language. However, uppercase() supports English… so while I could never write code like s:=uppercase(Edit1.Text), you probably could. Since Delphi 1, I had to use ansiuppercase() for all data that the user had to see or enter. In that regard, nothing has changed.
-
You may be interested in my article concerning uppercase conversion in Delphi:
http://sergworks.wordpress.com/2009/11/15/how-to-uppecase-lowercase-strings-in-delphi-2009/
-
My project includes about 450 Delphi units and about 185,000 lines of code. I would categorize the effort required in the upgrade to Delphi 2009 as negligible. Unicode was a big requirement as my system spits out HTML web pages and emails from SQL server data, which needed to support other languages. I pretty much did a search on all instances of “PChar” and “Char” and then decided whether they should a) stay like that (now as Unicode data), b) be converted to PAnsiChar/AnsiChar or c) change the methodology to handle unicode bytes. All the “string” types stayed as “string”. It’s worth noting that my system does a lot of raw byte handling (RS-232&TCP/IP communications) and character indexing string manipulations. All text and HTML output is generated as UTF8 via a simple helper to convert the internal UCS2 strings to UTF bytes.
So far the only significant inconvenience I have encountered is this non-conforming UpperCase() function, which I have fixed after googling onto this article and illogically replacing it with “AnsiUppercase”, which is described in the help file as supporting MBCS. Thanks.


