From: herbs@cntc.com (Herb Sutter)
Subject: Guru of the Week #29: Solution
Date: 22 Jan 1998 00:00:00 GMT
Message-ID: <6a8q26$9qa@netlab.cs.rpi.edu>
Newsgroups: comp.lang.c++.moderated


 .--------------------------------------------------------------------.
 |  Guru of the Week problems and solutions are posted regularly on   |
 |   news:comp.lang.c++.moderated. For past problems and solutions    |
 |            see the GotW archive at http://www.cntc.com.            |
 | Is there a topic you'd like to see covered? mailto:herbs@cntc.com  |
 `--------------------------------------------------------------------'
_______________________________________________________

GotW #29:   Strings

Difficulty: 7 / 10
_______________________________________________________


>Write a ci_string class which is identical to the
>standard 'string' class, but is case-insensitive in the
>same way as the C function stricmp():

The "how can I make a case-insensitive string?"
question is so common that it probably deserves its own
FAQ -- hence this issue of GotW.

Note 1:  The stricmp() case-insensitive string
comparison function is not part of the C standard, but
it is a common extension on many C compilers.

Note 2:  What "case insensitive" actually means depends
entirely on your application and language.  For
example, many languages do not have "cases" at all, and
for languages that do, you have to decide whether you
want accented characters to compare equal to unaccented
characters, and so on.  This GotW provides guidance on
how to implement case-insensitivity for standard
strings in whatever sense applies to your particular
situation.


Here's what we want to achieve:

>    ci_string s( "AbCdE" );
>
>    // case insensitive
>    assert( s == "abcde" );
>    assert( s == "ABCDE" );
>
>    // still case-preserving, of course
>    assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
>    assert( strcmp( s.c_str(), "abcde" ) != 0 );

The key here is to understand what a "string" actually
is in standard C++.  If you look in your trusty string
header, you'll see something like this:

  typedef basic_string<char> string;

So string isn't really a class... it's a typedef of a
template.  In turn, the basic_string<> template is
declared as follows, in all its glory:

  template<class charT,
           class traits = char_traits<charT>,
           class Allocator = allocator<charT> >
      class basic_string;

So "string" really means "basic_string<char,
char_traits<char>, allocator<char> >".  We don't need
to worry about the allocator part, but the key here is
the char_traits part because char_traits defines how
characters interact and compare(!).
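
To make that expansion concrete, here is a small
compile-time check.  It is only a sketch, and it assumes
a C++11 or later compiler, since static_assert and the
<type_traits> header postdate this article.  The function
name string_uses_default_traits is made up for this
illustration.

```cpp
#include <memory>
#include <string>
#include <type_traits>

// std::string is just a name for one particular
// instantiation of the basic_string template.
static_assert(
    std::is_same<std::string,
                 std::basic_string<char,
                                   std::char_traits<char>,
                                   std::allocator<char> > >::value,
    "string is basic_string<char> with default traits and allocator");

// Hypothetical helper: the traits class is reachable through
// the string type itself, as the nested typedef traits_type.
bool string_uses_default_traits() {
    return std::is_same<std::string::traits_type,
                        std::char_traits<char> >::value;
}
```

If either check failed, the first would stop the build
and the second would return false; with a conforming
library neither happens.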

basic_string supplies useful comparison functions that
let you compare whether a string is equal to another,
less than another, and so on.  These string comparison
functions are built on top of character comparison
functions supplied in the char_traits template.  In
particular, the char_traits template supplies character
comparison functions named eq() and lt() for equality
and less-than comparisons (ne(), for inequality, comes
from pre-standard libraries and is not among the traits
requirements in the final C++ standard), and compare()
and find() functions to compare and search sequences of
characters.
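
As a rough illustration of those building blocks, the
sketch below calls the default char_traits<char>
functions directly.  The names traits, offset_of and
default_traits_demo are invented for this example, and
the lt() check assumes an ASCII-compatible character set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Shorthand for the default traits that std::string uses.
typedef std::char_traits<char> traits;

// Hypothetical helper: offset of the first occurrence of c in
// the first n characters of s, or n if c is absent.
// traits::find() returns a pointer to the match, or a null
// pointer when nothing matches.
std::size_t offset_of(const char* s, std::size_t n, char c) {
    const char* hit = traits::find(s, n, c);
    return hit ? static_cast<std::size_t>(hit - s) : n;
}

// Exercises the default, case-sensitive character rules.
void default_traits_demo() {
    assert(traits::eq('a', 'a'));    // same character
    assert(!traits::eq('a', 'A'));   // case matters by default
    assert(traits::lt('A', 'a'));    // assumes ASCII ordering
    assert(traits::compare("abc", "abd", 2) == 0);  // "ab" vs "ab"
    assert(traits::compare("abc", "abd", 3) < 0);   // 'c' before 'd'
    assert(offset_of("hello", 5, 'l') == 2);        // first 'l'
    assert(offset_of("hello", 5, 'L') == 5);        // no 'L' at all
}
```

Every one of these answers is case-sensitive, which is
exactly the behaviour a case-insensitive string has to
change.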

If we want these to behave differently, all we have to
do is provide a different traits class!  Here's the
easiest way:

  // Assumes <string>, <cctype> and <cstddef> are included and
  // that the std names are visible without qualification.
  struct ci_char_traits : public char_traits<char>
                // just inherit all the other functions
                //  that we don't need to override
  {
    // Note: tolower() requires a value representable as
    // unsigned char, hence the casts.
    static bool eq( char c1, char c2 ) {
      return tolower( (unsigned char)c1 )
          == tolower( (unsigned char)c2 );
    }

    // ne() is not one of the final standard's traits
    // requirements; it is harmless to keep.
    static bool ne( char c1, char c2 ) {
      return !eq( c1, c2 );
    }

    static bool lt( char c1, char c2 ) {
      return tolower( (unsigned char)c1 )
           < tolower( (unsigned char)c2 );
    }

    static int compare( const char* s1,
                        const char* s2,
                        size_t n ) {
      return strnicmp( s1, s2, n );
             // if available on your compiler,
             //  otherwise you can roll your own
    }

    static const char*
    find( const char* s, size_t n, char a ) {
      for( ; n > 0; ++s, --n ) {
        if( eq( *s, a ) ) {
          return s;
        }
      }
      return 0;  // no match in the first n characters
    }
  };

And finally, the key that brings it all together:

  typedef basic_string<char, ci_char_traits> ci_string;

All we've done is create a typedef named "ci_string"
which operates exactly like the standard "string",
except that it uses ci_char_traits instead of
char_traits<char> to get its character comparison
rules.  Since we've handily made the ci_char_traits
rules case-insensitive, we've made ci_string itself
case-insensitive without any further surgery -- that
is, we have a case-insensitive string without having
touched basic_string at all!
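
For readers who want to try this on a compiler without
strnicmp(), here is one possible self-contained variant.
It is a sketch rather than the definitive version: the
hand-written loop in compare() stands in for the
nonstandard strnicmp(), fold() is a helper invented for
this example, and the names are std::-qualified on the
assumption of a standard-conforming library.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

struct ci_char_traits : public std::char_traits<char> {
    // Hypothetical helper. tolower() needs a value that is
    // representable as unsigned char, hence the cast.
    static int fold(char c) {
        return std::tolower(static_cast<unsigned char>(c));
    }

    static bool eq(char c1, char c2) { return fold(c1) == fold(c2); }
    static bool lt(char c1, char c2) { return fold(c1) <  fold(c2); }

    // Hand-rolled stand-in for the nonstandard strnicmp().
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (; n > 0; ++s1, ++s2, --n) {
            if (fold(*s1) != fold(*s2)) {
                return fold(*s1) < fold(*s2) ? -1 : 1;
            }
        }
        return 0;
    }

    // Returns a null pointer when a is absent from the first n chars.
    static const char* find(const char* s, std::size_t n, char a) {
        for (; n > 0; ++s, --n) {
            if (fold(*s) == fold(a)) return s;
        }
        return 0;
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;

// Replays the goals stated at the top of the article.
void ci_string_demo() {
    ci_string s("AbCdE");

    // case insensitive
    assert(s == "abcde");
    assert(s == "ABCDE");
    assert(s != "abcdf");
    assert(s.find('c') == 2);   // searching goes through the traits too

    // still case-preserving
    assert(std::strcmp(s.c_str(), "AbCdE") == 0);
    assert(std::strcmp(s.c_str(), "abcde") != 0);
}
```

Calling ci_string_demo() from main() should run to
completion without tripping any assertion.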

This GotW should give you a flavour for how the
basic_string template works and how flexible it is in
practice.  If you want different comparisons than the
ones stricmp() and tolower() give you, just replace the
five functions shown above with your own code that
performs character comparisons the way that's
appropriate in your particular application.



Exercise for the reader:

Is it safe to inherit ci_char_traits from
char_traits<char> this way?  Why or why not?