Making a case for Strings, the sane way

Lars Fosdal responded to my previous post suggesting a way of implementing string support in a case-like construct (but not actually a case statement) using generics and anonymous methods.

All very clever, but way, way too complicated and – if you don’t mind me saying so – as ugly as sin into the bargain (imho – ymmv).

For simple cases [sic], it is actually relatively straightforward to use strings in a case statement.

Just add this function to a convenient unit:


  interface

    function StringIndex(const aString: String;
                         const aCases: array of String;
                         const aCaseSensitive: Boolean = TRUE): Integer;

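  implementation

    uses
      SysUtils;  // AnsiSameStr, AnsiSameText (omit if the unit already uses it)

    // Returns the index of aString within aCases, or -1 if it is not present.
    function StringIndex(const aString: String;
                         const aCases: array of String;
                         const aCaseSensitive: Boolean): Integer;
    begin
      if aCaseSensitive then
      begin
        // AnsiSameStr compares case-sensitively
        for result := 0 to Pred(Length(aCases)) do
          if AnsiSameStr(aString, aCases[result]) then
            EXIT;
      end
      else
      begin
        // AnsiSameText ignores case
        for result := 0 to Pred(Length(aCases)) do
          if AnsiSameText(aString, aCases[result]) then
            EXIT;
      end;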

      result := -1;
    end;

And now you can use this in an ordinary case statement:

  case StringIndex(SomeString, [SelectorA,
                                SelectorB,
                                SelectorC]) of
    0: // for SelectorA
    1: // for SelectorB
    2: // for SelectorC
  else
    // Some other SomeString
  end;

I added a case-sensitivity switch, but more sophisticated options could be provided such as partial matching. Similarly, there could possibly be some optimisation of the string matching within the function (perhaps triggered by a number of cases above a certain threshold), but this is the bare bones of what you need.
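
As a rough illustration of the partial matching idea, here is a minimal sketch of a prefix-matching variant. The name StringPrefixIndex and the sample strings are made up for this example, and it uses a local loop variable rather than result purely to keep it portable between compilers. Note that the first matching entry wins, so the order of the array matters.

```pascal
program PrefixCaseDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils;

// Hypothetical variant of StringIndex: returns the index of the first
// entry in aCases that aString starts with (case-sensitive), or -1.
function StringPrefixIndex(const aString: String;
                           const aCases: array of String): Integer;
var
  i: Integer;
begin
  for i := 0 to High(aCases) do
    if AnsiStartsStr(aCases[i], aString) then
    begin
      Result := i;
      EXIT;
    end;

  Result := -1;
end;

begin
  // Sample data only; any set of prefixes would do
  case StringPrefixIndex('GET /index.html', ['GET ', 'POST ', 'PUT ']) of
    0: WriteLn('read request');
    1: WriteLn('create request');
    2: WriteLn('update request');
  else
    WriteLn('unrecognised request');
  end;
end.
```
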

In fact, I am pretty sure I’ve seen something similar to this in the past, but I can’t put my finger on when or where.

In any event, build on, improve, embellish to your heart’s content, and all with perfectly clear, concise, Pascal-like syntax in all versions of Delphi.

32 thoughts on “Making a case for Strings, the sane way”

Mason Wheeler says:

02 Dec 2010 at 13:13

Yeah, I’ve built something like this a few times. It’s probably the best solution until we get an actual CASE on any type.
oxffff says:

02 Dec 2010 at 17:49

it is very bad to steal my idea!!!

April 2010

See
http://santonov.blogspot.com/2010/04/just-any-type-delphi-case-statement.html

You may find my smarter method, which supports subtyping on case elements, here:

https://forums.embarcadero.com/message.jspa?messageID=229522
Joe Meyer says:

02 Dec 2010 at 17:53

you can also use IndexText() and IndexStr() from StrUtils, can’t you?
François says:

02 Dec 2010 at 18:49

I also sometimes used GetEnumValue to do a case on strings, having declared an enum with all the strings I intended to allow… (not very clean either).

PS: Maybe you saw something similar to your function here: http://delphi.about.com/cs/adptips2002/a/bltip0202_5.htm
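
For readers unfamiliar with the enum trick, a minimal sketch of how it might look follows. The TCommand type, its cmd prefix and the CommandFromString helper are invented for the example; GetEnumValue itself lives in the TypInfo unit and returns -1 when the name is not a member of the enum.

```pascal
program EnumCaseDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, TypInfo;

type
  // Hypothetical enum: one member per string we want to recognise
  TCommand = (cmdOpen, cmdSave, cmdClose);

// Maps e.g. 'Save' to cmdSave. Returns FALSE if there is no such member.
function CommandFromString(const aName: String;
                           out aCommand: TCommand): Boolean;
var
  value: Integer;
begin
  value  := GetEnumValue(TypeInfo(TCommand), 'cmd' + aName);
  Result := value >= 0;
  if Result then
    aCommand := TCommand(value);
end;

var
  cmd: TCommand;
begin
  if CommandFromString('Save', cmd) then
  begin
    case cmd of
      cmdOpen:  WriteLn('opening');
      cmdSave:  WriteLn('saving');
      cmdClose: WriteLn('closing');
    end;
  end
  else
    WriteLn('unknown command');
end.
```

The case labels here are enum members rather than bare integers, which is the main attraction; the cost is that every allowed string has to be a valid identifier.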
duriel says:

02 Dec 2010 at 19:12

Some time ago:
http://forum.delphi-treff.de/showthread.php?7486-Case-STRING-of&p=48030&viewfull=1#post48030

25.10.2003 and is still working 🙂
Anders Isaksson says:

02 Dec 2010 at 19:53

This is a recurring question – I wrote an article for UNDU (now long dead) in February, 2000, as a reply to an earlier article there.

My preferred method is using sorted stringlists with the ‘case index’ stored in .Objects[]. By doing it this way I can have more than one case label going to the same case, and adding labels will never invalidate existing code.

See it all in its archived glory: http://web.archive.org/web/20000303161411/http://www.undu.com/Articles/000229c.html
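
A minimal sketch of the sorted string list idea, not taken from the archived article: the constants, the label strings and the CaseOf helper are all assumptions made for illustration, and the NativeInt casts assume a reasonably recent compiler.

```pascal
program StringListCaseDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Hypothetical case identifiers, independent of the order of the labels
  caseYes = 1;
  caseNo  = 2;

// Looks aString up in a sorted list and returns the case identifier
// stored in Objects[], or -1 if the string is not a known label.
function CaseOf(const aLabels: TStringList; const aString: String): Integer;
var
  idx: Integer;
begin
  if aLabels.Find(aString, idx) then
    Result := Integer(NativeInt(aLabels.Objects[idx]))
  else
    Result := -1;
end;

var
  caseLabels: TStringList;
begin
  caseLabels := TStringList.Create;
  try
    caseLabels.Sorted := True;  // Find() needs a sorted list (binary search)

    // Several labels may share one case identifier
    caseLabels.AddObject('yes', TObject(NativeInt(caseYes)));
    caseLabels.AddObject('y',   TObject(NativeInt(caseYes)));
    caseLabels.AddObject('no',  TObject(NativeInt(caseNo)));
    caseLabels.AddObject('n',   TObject(NativeInt(caseNo)));

    case CaseOf(caseLabels, 'y') of
      caseYes: WriteLn('affirmative');
      caseNo:  WriteLn('negative');
    else
      WriteLn('unrecognised');
    end;
  finally
    caseLabels.Free;
  end;
end.
```
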
Marco Erdmann says:

02 Dec 2010 at 20:30

Why not just use StrUtils.IndexStr() or IndexText() which does exactly what you describe? Sure, if you need a faster/better implementation you have to do it yourself, but why reinvent the wheel?
1. Jolyon Smith says:
  
  03 Dec 2010 at 08:04
  
  @Michael and Marco – see other replies to similar comments.
Michael says:

02 Dec 2010 at 20:44

Hi,

Did you know there’s a function called “AnsiIndexStr”? It’s part of the StrUtils unit.

uses
StrUtils;

case AnsiIndexStr('ABC',['XYZ','ABC','OPQ']) of
0: ;
1: ;
2: ;
end;

The only disadvantage compared to your function: AnsiIndexStr does not support case-insensitive comparisons.
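
For completeness, StrUtils also has a case-insensitive sibling, AnsiIndexText, as other commenters point out. A small sketch using the same sample strings (the program name and the output text are just for illustration):

```pascal
program IndexTextDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils;

begin
  // AnsiIndexText ignores case, so 'abc' matches the entry 'ABC'
  case AnsiIndexText('abc', ['XYZ', 'ABC', 'OPQ']) of
    0: WriteLn('matched XYZ');
    1: WriteLn('matched ABC');
    2: WriteLn('matched OPQ');
  else
    WriteLn('no match');
  end;

  // AnsiIndexStr is case-sensitive, so the same lookup finds nothing
  if AnsiIndexStr('abc', ['XYZ', 'ABC', 'OPQ']) = -1 then
    WriteLn('AnsiIndexStr: no match');
end.
```
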
Dorin Duminica says:

02 Dec 2010 at 20:55

I know dynamic arrays start at 0 but I hate having magic numbers in code where you can have a constant or in this case use “Low” method “for result := Low(aCases) to High(aCases) do” 😉
1. Jolyon Smith says:
  
  03 Dec 2010 at 08:04
  
  @Dorin – see my reply to LDS. In this case it’s not a “magic” number, it’s a reliable, fixed constant. The array param as declared cannot be indexed any other way.
  
  Using Low() and High() might itself confuse someone into thinking that the bounds of the array may under some circumstances be something other than 0 and Length – 1, given that the function allows for such, which might then lead them to try to pass such an array and wonder why they can’t, after inspecting the source.
  
  Ok, it’s a bit of an extreme thought experiment, but I do see that sort of thinking going on. 🙂
Stefan says:

02 Dec 2010 at 23:29

Maybe not reinventing the wheel and using existing stuff instead? 🙂

http://docwiki.embarcadero.com/VCL/en/StrUtils.AnsiIndexText
1. Jolyon Smith says:
  
  03 Dec 2010 at 08:01
  
  @Stefan – see my reply to Anthony. Yes it exists, but the fact that I (for one) didn’t know it exists after 15 years of Delphing says something. It is also ever so slightly less efficient, and slightly more cumbersome to use (should you ever change your mind about case sensitivity, requiring you to call a different function, rather than having that behaviour conveniently parameterised).
LDS says:

02 Dec 2010 at 23:47

Maybe

for result := Low(aCases) to High(aCases) do

is clearer?
1. Jolyon Smith says:
  
  03 Dec 2010 at 07:59
  
  @LDS – possibly, but I’d say this is an issue of preference. As declared, the array passed in must be indexed 0 to Length – 1. If there was the potential for different array types to be passed then some allowance for varying index bounds would have to be made, but not in this case.
  
  At least, I don’t think so.
Jeroen Pluimers says:

03 Dec 2010 at 00:31

Nice!
–jeroen
Kibab says:

03 Dec 2010 at 00:41

Why don’t I see a warning for this code, such as:
[DCC Warning] Unit5.pas(30): W1037 FOR-Loop variable 'Result' may be undefined after loop.

The warning does appear if I add code like this at the end of StringIndex:
if Result = 1 then
Result := 1;

Result is used as the loop variable here, yet in most cases using a (local?) variable in a for-loop like this raises the warning above.
I know that Result is in most cases (Integer/Boolean/?) stored in a register (EAX?), so is it 100% certain that using Result in a for-loop will be safe in 100% of cases?
Does the compiler handle a for-loop on Result differently from one on a regular variable, or is it just a coincidence that it works in this case (the same register being used for the loop counter and for Result, which could change in future compilers)?
1. Jolyon Smith says:
  
  03 Dec 2010 at 07:58
  
  @Kibab – the caution about using loop variables after a loop applies when the loop variable is not used within the loop itself. This is because in those circumstances the compiler is able to (and often will) do tricks to optimise the loop (e.g. the loop counter may run in reverse):
  
  for i := 0 to 1000 do
  // stuff that doesn’t involve “i”
  
  May actually generate:
  
  for i := 1000 downto 0 do
  
  The compiler does this because at machine code level, it is far more efficient to test for non-zero than it is to test for equality to some non-zero ordinal, and since this test must be performed for each iteration of the loop, the accumulated efficiency gain can be significant.
  
  But when the loop variable is used in the body of the loop, it is reliable afaik.
Anthony Frazier says:

03 Dec 2010 at 03:19

“In fact, I am pretty sure I’ve seen something similar to this in the past, but I can’t put my finger on when or where.”

Maybe in the RTL? 🙂 IndexText/IndexStr, which only existed as AnsiIndexText/AnsiIndexStr in D7 (and maybe D2005, but I don’t have it installed right now to check its VCL).

I’ve made extensive use of them for doing exactly this sort of thing. It’s not perfect, but it’s better than nothing.
1. Jolyon Smith says:
  
  03 Dec 2010 at 07:55
  
  @Anthony – aha! Yes, that does the job, tho ever so slightly less efficiently. Another of those cases of a badly named RTL function – verb-noun implies some modification (the “doing” verb) will be applied (to the subject noun).
  
  If I were looking to index a string, then I might look at/for IndexStr/IndexText, but if I need to find the index OF a string, then I’m going to look for a noun-noun (subject-property) name (and when I’m reading my code in the future, I would like the functions I am using to reflect this as far as possible too, so that the code reads/says what it does).
  
  +0.02, ymmv 🙂
Lars Fosdal says:

03 Dec 2010 at 04:16

I don’t deny that it is pretty nasty, but – it actually carries one significant benefit over your StringIndex. There is no intermediate index that can be messed up if you add an item in the middle, or reorder the entries.
1. Jolyon Smith says:
  
  03 Dec 2010 at 07:50
  
  @Lars – Yep, that is true. Sometimes developers have to be responsible for getting things right – no avoiding that unfortunately. 🙂
  
  But for small string sets that is unlikely to occur and would be easily spotted, and for small string sets performance is likely to be acceptable. I have already thought how it would be trivial to extend the approach to register strings with some ordinal identifier, to make the case value independent of the string array entry order, but I don’t need such a refinement yet (in one of those serendipitous moments, the subject cropped up in my localised blogosphere at exactly the same time that I found myself needing – or at least wanting – something, and StringIndex() just happened to fit! :))
Fabricio says:

03 Dec 2010 at 05:46

If the arrays get bigger (from a delimited string from an outside file, for example), it would be interesting to use a sorted TStringList (internal to the unit) to get binary string searches instead (with its lifetime managed in the initialization/finalization sections).
1. Jolyon Smith says:
  
  03 Dec 2010 at 07:47
  
  @Fabricio – As I say, this can be refined and extended to suit. The overhead of setting up and tearing down a TStringList is likely to offset any benefits from binary searching for smaller string sets.
  
  Similarly, for very large string sets there are additional optimisations you can perform, especially if the strings you are casing on are ASCII – you can build an array of stringlists, one for each initial letter. Depending on how many entries are in your string set for a given initial letter you can then use an array, string list scan or string list binary search accordingly.
  
  That of course requires that you do some work to build the necessary indexing and meta-data about your strings, but it may be worth it – it all depends on your specific needs.
  
  If you have large string sets that you repeatedly use, then a mechanism to initialise a string set once and re-use it in those cases where you need to branch based on its content would be trivial to provide, with a couple of overloads to StringIndex():
  
  function StringIndex(const aValue: String; const aCases: array of String): Integer; overload;
  function StringIndex(const aValue: String; const aCases: TStrings): Integer; overload;
  function StringIndex(const aValue: String; const aCases: TStringTable): Integer; overload;
  
  Having said that, in the latter two cases, StringIndex() is likely to be a redundant wrapper around an “IndexOf()” function that TStrings or TStringTable is already providing – StringIndex() is useful for arrays because arrays have no such in-built facility.
Stefan says:

03 Dec 2010 at 09:10

No offense but not knowing it exists after 15 years using Delphi does not prove anything (well maybe the lack of documentation or the lack of using it). Less efficient? Have you looked at the code? It is exactly the same as your code with the exception of having 2 methods instead of 1. You should know that this method is not the only one having 2 different versions (…Str/String for case sensitivity and …Text for case insensitivity).
And btw if there were some
case SomeString of
SelectorA: // for SelectorA
SelectorB: // for SelectorB
SelectorC: // for SelectorC
else
// Some other SomeString
end;

in Delphi, what would you expect? Case sensitivity or not? Well since I expect case sensitivity when doing “if s1 = s2 then” I would expect it here as well. If I want it without case sensitivity I would upper or lowercase the strings.
Stefan says:

03 Dec 2010 at 09:13

P.S.: Using Exit to break loops is evil 😛
Jolyon Smith says:

03 Dec 2010 at 09:57

@Stefan: Yes I looked at the code, and yes the use of two functions is one source of inefficiency.

This is especially lazy and inexcusable given that the “non-ANSI” function simply calls the ANSI version, rather than providing a true “non-ANSI” – (i.e. in the New World Order actually “NON-UNICODE”) – alternative. The use of a local var and assignment to result is another.

I was careful to qualify the observed “inefficiency” as “ever so slight”. 🙂

Re: case sensitivity. Yep, if the case statement supported strings I would expect it to be case sensitive, which is why my StringIndex() defaults to this behaviour.

But case doesn’t support strings, so a convenient alternative mechanism has to be provided. The fact that an inconvenience (the alternate mechanism) is required doesn’t excuse making that alternative more inconvenient than it needs to be. 🙂

As regards using EXIT to break a loop. I didn’t use EXIT to break the loop.

I used EXIT to indicate the terminal condition of my function. It *happened* to occur in a loop. 🙂

The alternative was to BREAK the loop then allow execution to drop out of the loop and then immediately fall out of the function. i.e. more reading to understand that the code would be doing…. exactly the same thing.

You may have differing preferences, but for myself I perhaps employ far more prag than dog in my ma.

🙂
Stefan says:

03 Dec 2010 at 11:07

I think we agree to disagree at least concerning coding style. 🙂
I try to avoid Exit whenever possible due to numerous reasons. One of them is if you tend to use Exit to “jump over” code that is not executed in this particular case like in your example that pretty much cries for putting these parts into separate routines.
But I guess this is some topic for someone else to blog about 😛
1. Jolyon Smith says:
  
  03 Dec 2010 at 12:21
  
  @Stefan: Not sure what you mean by “jumping over code… like in your example”. In this case there is no code being jumped over. Once the exit condition has been reached, it exits.
  
  BREAK in this case would be the construct that jumps over code (loop constructs) to reach other code – the other code being the *implicit* EXIT point.
  
  As I say, I apply pragmatism in my code, rather than dogma. I’ve even been known to use “with” [GASP] on very rare occasion. 🙂
  
  I know, I know – I should be stripped of all Delphi credibility and credentials (did I have but any) and cast out into the wilderness for such sins. LOL
Lars Fosdal says:

04 Dec 2010 at 03:24

I prefer to think it is only those that know what they are doing, that can truly abuse all the keywords and constructs of the Pascal language 😉
Rudy Velthuis says:

04 Dec 2010 at 09:06

As someone in the public Embarcadero groups remarked: Besides the fact that Delphi already has several such functions, this and those functions all have the severe disadvantage that the indices are all wrong as soon as you add a string in the middle of the array. IOW, there is no direct relationship between the index returned and the string passed, only an indirect one.

FWIW, I personally don’t mind using Exit in such loops. Break would be too cumbersome, since you’d need some signal value to check if the second loop should be entered, etc.
Ancient_Hacker says:

16 Dec 2010 at 09:50

I’ve used this solution for a couple decades:

function Grab( const What, Pat: String ): Char;
var
p: integer;
begin
// Find '|What|' in the pattern; the character just before it is the tag
p := pos( '|' + What + '|', Pat );
if p = 0 then Grab := '?' else Grab := Pat[ p - 1 ];
end;

Which := 'Three';

case Grab( Which, '1|One|2|Two|3|Three|…' ) of
'1': …;
'2': …;
end;

Comments are closed.
