= String API =

Recently HelenOS switched to a new string API. It is non-standard, but in many ways similar to the ANSI C string API.

 * [#CharacterRepertoire Character Repertoire and Encoding]
 * [#StringMetrics String Metrics]
 * [#PrefixFunctions Functions Operating on Prefixes]
 * [#EncodingDecoding Encoding and Decoding a Character]
 * [#OutputBuffers Output Buffers]
 * [#FunctionReference Function Reference]

== Character Repertoire and Encoding == #CharacterRepertoire

HelenOS uses the Universal Character Set or UCS (as defined by ISO/IEC 10646) for representing characters throughout the system. A single ''character'' is represented as `wchar_t` (32-bit).

Normally all ''strings'' are represented in UTF-8 and null-terminated. A string is usually declared as `char *`. The API also has limited support for strings that are not null-terminated (or sub-strings).

There is also limited support for ''wide strings''. These are encoded in UTF-32 and null-terminated. Wide strings can represent exactly the same characters as normal strings. The difference lies in the encoding: in UTF-8 each character is encoded as one or more bytes, whereas in UTF-32, which is used for wide strings, each character is encoded as exactly four bytes.

== Character and String Literals ==

In source code, non-ASCII characters should only be used in character and string literals. Keep in mind that HelenOS source files are encoded in UTF-8, too. Non-ASCII character literals need to be written as `L'x'`. String literals are written the usual way (`"string"`) and wide-string literals are written as `L"wide string"`.

== String Metrics == #StringMetrics

Unlike with an 8-bit encoding, there is no 1:1:1 mapping between bytes in memory, characters and display cells on a monospace display. Therefore, three different metrics are needed:

 * ''Size'' is the number of bytes the encoded string occupies, ''excluding'' the null terminator.
 * ''Length'' is the number of ''characters'' in the string (i.e. the number of times we need to call `str_decode()` or `chr_encode()`). Again, the null terminator is not counted.
 * ''Width'' is the number of display cells the string occupies when rendered on a monospace display.

== Encoding and Decoding a Character == #EncodingDecoding

 * str_decode()
 * chr_encode()

== Well-formed Strings ==

A string is considered ''well-formed'' if and only if it is null-terminated and consists only of complete and valid UTF-8-encoded characters (i.e. it can be decoded with `str_decode()` without error).

Unless stated otherwise, all strings passed to functions must be well-formed and all string functions produce well-formed strings.

== Output Buffers == #OutputBuffers

Whenever the user supplies an output buffer to a string function, they must also pass the size of this buffer to the function (it is always passed in the argument immediately following the buffer). The buffer size ''must be greater than zero''. The function will always fill the buffer with a well-formed string. If the string produced does not fit in the buffer, the function will store only as many (complete) characters as fit and then add the null terminator.

== Function Reference == #FunctionReference

Some functions operate on string prefixes. These have a name like `str_[n|l|w]op()`. Such a function only uses a prefix of the string limited by a metric: `n` for size, `l` for length, `w` for width.

 * str_size()
 * wstr_size()
 * str_lsize()
 * wstr_lsize()
 * str_length()
 * wstr_length()
 * str_nlength()
 * wstr_nlength()
 * str_cpy()
 * str_ncpy()
 * str_append()
 * str_dup()
 * wstr_nstr()
 * str_chr()
 * str_rchr()
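
The difference between the ''size'' and ''length'' metrics, and the output-buffer rule of storing only complete characters, can be illustrated with a minimal sketch in standard C. This is not the HelenOS implementation: the names `utf8_seq_len()`, `demo_size()`, `demo_length()` and `demo_cpy()` are hypothetical helpers invented for this illustration, and the sketch assumes its input is a well-formed UTF-8 string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the HelenOS API): number of bytes in
 * the UTF-8 sequence that starts with lead byte b, or 0 if b cannot
 * start a sequence.
 */
static size_t utf8_seq_len(unsigned char b)
{
	if (b < 0x80)
		return 1;
	if ((b & 0xE0) == 0xC0)
		return 2;
	if ((b & 0xF0) == 0xE0)
		return 3;
	if ((b & 0xF8) == 0xF0)
		return 4;
	return 0;
}

/* Size metric: number of bytes, excluding the null terminator. */
size_t demo_size(const char *s)
{
	return strlen(s);
}

/*
 * Length metric: number of characters, excluding the null terminator.
 * In well-formed UTF-8 every character has exactly one byte that is
 * not a continuation byte (10xxxxxx), so counting those bytes is enough.
 */
size_t demo_length(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++) {
		if (((unsigned char) *s & 0xC0) != 0x80)
			n++;
	}
	return n;
}

/*
 * Copy src into dest, whose capacity is size bytes (must be > 0).
 * Only complete characters are stored and the result is always
 * null-terminated. src is assumed to be well-formed.
 */
void demo_cpy(char *dest, size_t size, const char *src)
{
	size_t off = 0;

	assert(size > 0);

	while (src[off] != '\0') {
		size_t clen = utf8_seq_len((unsigned char) src[off]);

		/* Stop if the character plus the terminator would not fit. */
		if (clen == 0 || off + clen + 1 > size)
			break;

		memcpy(dest + off, src + off, clen);
		off += clen;
	}
	dest[off] = '\0';
}
```

For the four-character string "žluť" (bytes `C5 BE 6C 75 C5 A5`), `demo_size()` returns 6 while `demo_length()` returns 4. Copying it with `demo_cpy()` into a 6-byte buffer stores only "žlu" followed by the null terminator, because the two-byte "ť" would not fit together with the terminator.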