The getField C utility

In modern programming languages, splitting a string into parts is a simple operation. In the C world, however, the standard ANSI C library offers no convenient way to do it (strtok comes closest, but it modifies its input and cannot return empty fields). If you use C++, a string class is generally available that helps in this regard, but even it may vary in operation from compiler to compiler.
To that end, many years ago I wrote a function that I've used on
practically every C programming job I've ever had.
Initially, this was simply used to extract a field from a string and place
it into a buffer provided by the caller. That is what the function

This worked fine as long as all I had to deal with was English ASCII text, defined as a sequence of single-byte characters terminated by a NUL character (i.e., a C string). My first job dealing with internationalization changed this drastically.

ASCII was defined many decades ago by monolingual English speakers, and it didn't consider other languages. It defines 128 characters, half of what a byte can hold, and that seemed plenty: 26 for lowercase letters, 26 for uppercase letters, 10 for the digits, a few dozen for special characters (percent, ampersand, caret, etc.), some hidden characters for computer talk (NUL, SOH, EOT, etc.), and you still have some left over.

But then the Europeans complained, "What about our umlauts and cedillas and other special characters?" So the powers that be decided to define "extended ASCII", which uses the other 128 values in a byte for all those extra characters that European languages use. Officially called ISO 8859-1, this is frequently called Latin1, and it is often the default character set on computers sold in western countries. Many western web sites also use this encoding.

This website, however, uses what is fast becoming the best encoding for internationalization: UTF-8. UTF-8 is a Unicode encoding in which every character of every language is uniquely identified, so Japanese, Arabic, Hebrew, Mandarin, and any other language can be displayed properly.
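
To make the original single-byte behavior concrete (extract one field from a string into a buffer supplied by the caller), here is a minimal sketch. The name copy_field, its parameters, and its return convention are assumptions made up for this illustration; this is not the actual function from this suite, whose signature is not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the author's function): copy the index-th
 * (0-based) field of src, split on the single byte delim, into buf.
 * bufsize counts the terminating NUL. Returns the field length in
 * bytes, or -1 if the field does not exist or does not fit. */
static long copy_field(const char *src, char delim, size_t index,
                       char *buf, size_t bufsize)
{
    const char *start = src;
    const char *end;
    size_t len;

    if (delim == '\0')
        return -1;              /* NUL cannot act as a delimiter here */

    /* Skip forward past `index` delimiters. */
    while (index > 0) {
        start = strchr(start, delim);
        if (start == NULL)
            return -1;          /* fewer fields than requested */
        start++;                /* step over the delimiter */
        index--;
    }

    /* The field runs to the next delimiter or to the end of the string. */
    end = strchr(start, delim);
    len = end ? (size_t)(end - start) : strlen(start);

    if (len + 1 > bufsize)
        return -1;              /* caller's buffer is too small */

    memcpy(buf, start, len);
    buf[len] = '\0';
    return (long)len;
}
```

Because UTF-8 never reuses ASCII byte values inside a multi-byte character, a byte-wise split on an ASCII delimiter like this one also leaves UTF-8 fields intact. That does not hold for every multi-byte encoding, and the returned length is a count of bytes, not of characters.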
I wanted my
This solved all my internationalization problems, but then I ran into one
regarding efficiency. I was designing a startup process whereby a large
configuration file (with multi-lingual data) would be loaded into memory,
an algorithm would be used to call
I determined that if I could convert my linked lists so that they held
pointers to the data in the large config file (which was held in memory),
that would
save me a memory allocation operation for every piece of data, as well
as the CPU cycles required to copy the data into a second buffer. Thus was
born my third function in this suite,
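
The pointer-based idea described above can be sketched as follows: instead of copying a field into a second buffer, return a pointer into the original data plus a length. The struct field_view type and the find_field name are hypothetical, chosen only for this example; the real function in this suite may look quite different.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view type: points into the source buffer, owns nothing. */
struct field_view {
    const char *ptr;   /* first byte of the field inside the source */
    size_t      len;   /* length in bytes; NOT NUL-terminated */
};

/* Hypothetical helper: locate the index-th (0-based) field of src
 * without allocating or copying. Returns 1 and fills *out on success,
 * 0 if there is no such field. */
static int find_field(const char *src, char delim, size_t index,
                      struct field_view *out)
{
    const char *start = src;
    const char *end;

    if (delim == '\0')
        return 0;               /* NUL cannot act as a delimiter here */

    /* Skip forward past `index` delimiters. */
    while (index > 0) {
        start = strchr(start, delim);
        if (start == NULL)
            return 0;           /* fewer fields than requested */
        start++;                /* step over the delimiter */
        index--;
    }

    end = strchr(start, delim);
    out->ptr = start;
    out->len = end ? (size_t)(end - start) : strlen(start);
    return 1;
}
```

A view like this is valid only as long as the underlying buffer (here, the config file held in memory) stays allocated and unmodified. Since the field is not NUL-terminated, it has to be used with length-aware calls, for example printf("%.*s", (int)v.len, v.ptr).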
If you have some very low-level, multi-lingual data manipulation
to do, these functions may be of some use.
There is one very large assumption throughout this code that you must
know about: I assume that the size of a