String API
Recently HelenOS switched to a new string API. It is non-standard, but in many ways similar to the ANSI C string API.
- Character Repertoire and Encoding
- String Metrics
- Encoding and Decoding a Character
- Output Buffers
- Function Reference
Character Repertoire and Encoding
HelenOS uses the Universal Character Set, or UCS (as defined by ISO/IEC 10646), to represent characters throughout the system. A single character is represented as wchar_t (32-bit). Strings are normally encoded in UTF-8 and null-terminated. A string is usually declared as char *. The API also has limited support for strings that are not null-terminated (or sub-strings).
There is also limited support for wide strings. These are encoded in UTF-32 and null-terminated. Wide strings can represent exactly the same characters as normal strings. The difference lies in the encoding: with UTF-8, each character is encoded as one or more bytes, while with UTF-32, which is used for wide strings, each character is encoded as exactly four bytes.
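To make the size difference concrete, here is a minimal sketch in standard C that computes how many bytes a single code point occupies in UTF-8. The function name utf8_encoded_size() is hypothetical and is not part of the HelenOS API; the ranges follow the usual UTF-8 definition, and the HelenOS encoder may treat edge cases (such as surrogates) differently.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a HelenOS function): number of bytes needed
 * to encode one code point in UTF-8, or 0 if it cannot be encoded.
 */
size_t utf8_encoded_size(uint32_t cp)
{
	if (cp < 0x80)
		return 1;	/* ASCII range */
	if (cp < 0x800)
		return 2;
	if (cp < 0x10000) {
		/* Surrogate code points are not valid characters. */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;
		return 3;
	}
	if (cp <= 0x10FFFF)
		return 4;
	return 0;		/* beyond the Unicode range */
}
```

In UTF-32 the same question has a constant answer: every character takes four bytes, regardless of its value.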
Character and String Literals
In source code, non-ASCII characters should only be used in character and string literals. Keep in mind that HelenOS source files are encoded in UTF-8, too. Non-ASCII character literals need to be written as L'x'. String literals are written the usual way ("string") and wide-string literals are written as L"wide string".
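The three literal forms can be sketched as follows. This is an illustration only: it assumes a compiler that reads the source file as UTF-8 and keeps UTF-8 as the execution character set (the default for GCC and Clang), and it uses the standard C functions strlen() and wcslen() rather than the HelenOS functions, whose signatures are not given here. The sample_* names are made up for this example.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Non-ASCII character literal: written with the L prefix. */
static const wchar_t sample_char = L'é';

/* Ordinary string literal: stored as UTF-8 bytes. */
static const char *sample_str = "café";

/* Wide-string literal: one wchar_t per character. */
static const wchar_t *sample_wstr = L"café";

/* Value of the non-ASCII character literal. */
wchar_t sample_char_value(void)
{
	return sample_char;
}

/* Number of bytes in the UTF-8 string (terminator not counted). */
size_t sample_str_bytes(void)
{
	return strlen(sample_str);
}

/* Number of characters in the wide string (terminator not counted). */
size_t sample_wstr_chars(void)
{
	return wcslen(sample_wstr);
}
```

Under the stated assumptions the UTF-8 string occupies more bytes than it has characters, because é is encoded as two bytes, while the wide string holds exactly one element per character.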
String Metrics
- Size
- Length
- Width
Encoding and Decoding a Character
- str_decode()
- chr_encode()
Well-formed Strings
A string is considered well-formed if and only if it is null-terminated and consists only of complete and valid UTF-8-encoded characters (i.e. it can be decoded with str_decode() without error). Unless stated otherwise, all strings passed to functions must be well-formed and all string functions produce well-formed strings.
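The well-formedness condition can be illustrated with a small stand-alone checker. This is a sketch in standard C, not the HelenOS implementation: utf8_is_well_formed() is a hypothetical name, and it applies the usual UTF-8 rules (no overlong forms, no surrogates, nothing above U+10FFFF), which may be stricter or looser than what str_decode() accepts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a HelenOS function): returns true if buf
 * holds a null terminator within its first 'size' bytes and every
 * byte before it belongs to a complete, valid UTF-8 character.
 */
bool utf8_is_well_formed(const char *buf, size_t size)
{
	const unsigned char *p = (const unsigned char *) buf;
	size_t i = 0;

	while (i < size) {
		unsigned char b = p[i];
		size_t cont;	/* continuation bytes expected */
		uint32_t cp;	/* decoded code point */
		uint32_t min;	/* smallest value allowed for this length */

		if (b == 0)
			return true;	/* terminator reached */

		if (b < 0x80) {
			cont = 0; cp = b; min = 0;
		} else if ((b & 0xE0) == 0xC0) {
			cont = 1; cp = b & 0x1F; min = 0x80;
		} else if ((b & 0xF0) == 0xE0) {
			cont = 2; cp = b & 0x0F; min = 0x800;
		} else if ((b & 0xF8) == 0xF0) {
			cont = 3; cp = b & 0x07; min = 0x10000;
		} else {
			return false;	/* stray continuation or invalid lead byte */
		}

		i++;
		for (size_t k = 0; k < cont; k++, i++) {
			/* Every continuation byte must look like 10xxxxxx. */
			if (i >= size || (p[i] & 0xC0) != 0x80)
				return false;
			cp = (cp << 6) | (p[i] & 0x3F);
		}

		/* Reject overlong forms, surrogates and out-of-range values. */
		if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return false;
	}

	return false;	/* no terminator within the buffer */
}
```

Note that a terminator in the middle of a multi-byte sequence makes the string ill-formed, since the last character is then incomplete.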
Output Buffers
Whenever the user supplies an output buffer to a string function, they must also pass the size of this buffer to the function (the size is always the argument immediately following the buffer). The buffer size must be greater than zero. The function will always fill the buffer with a well-formed string. If the string produced does not fit in the buffer, the function will only store as many (complete) characters as possible and add the null terminator.
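The truncation rule can be sketched as follows. This is an illustration of the behaviour described above, not the HelenOS implementation: utf8_copy_truncated() is a hypothetical name, and the sketch assumes that src is already well-formed, so the length of each character can be read from its lead byte.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a HelenOS function): copy src into dst,
 * which is 'size' bytes long. Only complete characters are stored
 * and the result is always null-terminated.
 * Assumes src is a well-formed UTF-8 string.
 */
void utf8_copy_truncated(char *dst, size_t size, const char *src)
{
	size_t i = 0;

	/* The buffer must have room for at least the terminator. */
	assert(size > 0);

	while (src[i] != '\0') {
		unsigned char b = (unsigned char) src[i];
		/* Character length as determined by the lead byte. */
		size_t len = (b < 0x80) ? 1 : (b < 0xE0) ? 2 : (b < 0xF0) ? 3 : 4;

		/* Stop unless the whole character and the terminator fit. */
		if (i + len >= size)
			break;

		for (size_t k = 0; k < len; k++)
			dst[i + k] = src[i + k];
		i += len;
	}

	dst[i] = '\0';
}
```

For example, copying the five-byte string "café" into a five-byte buffer stores only "caf": the two-byte é would leave no room for the terminator, so it is dropped whole rather than split.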
Function Reference
- str_size()
- wstr_size()
- str_lsize()
- wstr_lsize()
- str_length()
- wstr_length()
- str_nlength()
- wstr_nlength()
- str_cpy()
- str_ncpy()
- str_append()
- str_dup()
- wstr_nstr()
- str_chr()
- str_rchr()