
Changes between Initial Version and Version 1 of StringAPI

Timestamp:: 2009-05-01T21:50:06Z
Author:: Jiri Svoboda
Comment:: The new and shiny API (t.m.)


StringAPI

= String API =
HelenOS recently switched to a new string API. It is non-standard, but in many ways similar to the ANSI C string API.
 * [#CharacterRepertoire Character Repertoire and Encoding]
 * [#StringMetrics String Metrics]
 * [#EncodingDecoding Encoding and Decoding a Character]
 * [#OutputBuffers Output Buffers]
 * [#FunctionReference Function Reference]
== Character Repertoire and Encoding == #CharacterRepertoire
HelenOS uses the Universal Character Set (UCS), as defined by ISO/IEC 10646, to represent characters throughout the system. A single ''character'' is represented as a 32-bit `wchar_t`. Normally, all ''strings'' are encoded in UTF-8 and null-terminated; a string is usually declared as `char *`. The API also has limited support for strings that are not null-terminated (sub-strings).

There is also limited support for ''wide strings'', which are encoded in UTF-32 and null-terminated. Wide strings can represent exactly the same characters as normal strings; only the encoding differs. With UTF-8, each character is encoded as one or more bytes. With UTF-32, which is used for wide strings, each character is encoded as exactly four bytes.
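To make the size difference concrete, here is a small sketch that computes how many bytes a single UCS character occupies in UTF-8; in UTF-32 the answer is always four. The function name `sketch_utf8_char_size` is hypothetical and not part of the HelenOS API, and the sketch assumes the RFC 3629 upper limit of U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the HelenOS API): the number of bytes
 * a UCS character occupies when encoded in UTF-8. In UTF-32 the answer
 * would always be 4. Returns 0 for values above U+10FFFF, the upper
 * limit assumed here (RFC 3629).
 */
size_t sketch_utf8_char_size(uint32_t ch)
{
	if (ch < 0x80)
		return 1;   /* U+0000..U+007F (ASCII) */
	if (ch < 0x800)
		return 2;   /* U+0080..U+07FF */
	if (ch < 0x10000)
		return 3;   /* U+0800..U+FFFF */
	if (ch <= 0x10FFFF)
		return 4;   /* U+10000..U+10FFFF */
	return 0;       /* outside the assumed code space */
}
```

For example, `A` (U+0041) takes 1 byte, `ž` (U+017E) 2 bytes, `€` (U+20AC) 3 bytes and U+1F600 4 bytes in UTF-8, while each of them takes 4 bytes in UTF-32.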
== Character and String Literals ==
In source code, non-ASCII characters should only appear in character and string literals. Keep in mind that HelenOS source files are encoded in UTF-8 as well. Because a non-ASCII character takes more than one byte in UTF-8, non-ASCII character literals must be written as wide-character literals (`L'x'`). String literals are written the usual way (`"string"`), and wide-string literals are written as `L"wide string"`.
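The following sketch shows the three kinds of literals side by side. It assumes the file is saved as UTF-8 and that the compiler uses UTF-8 as its narrow execution character set (the GCC default); the variable names are only illustrative.

```c
#include <string.h>
#include <wchar.h>

/* Assumes UTF-8 source and a UTF-8 narrow execution character set. */
static const wchar_t example_char = L'ž';               /* non-ASCII character: wide-character literal */
static const char example_str[] = "žluťoučký kůň";       /* ordinary string literal, stored as UTF-8 */
static const wchar_t example_wstr[] = L"žluťoučký kůň";  /* wide-string literal */
```

Here `strlen(example_str)` reports 19 bytes for the 13-character string, because six of its characters need two bytes each in UTF-8, while `wcslen(example_wstr)` reports 13 characters.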
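== String Metrics == #StringMetrics
 * Size: the number of bytes the string occupies (excluding the null terminator)
 * Length: the number of characters in the string
 * Width: the number of character cells the string occupies when displayed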
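A minimal sketch of the first two metrics for a well-formed UTF-8 string, assuming size excludes the null terminator. The `sketch_` functions are hypothetical stand-ins, not the HelenOS `str_size()` and `str_length()`. Width is left out because it depends on how the display renders each character.

```c
#include <stddef.h>

/* Size: number of bytes before the null terminator. */
size_t sketch_str_size(const char *str)
{
	size_t size = 0;

	while (str[size] != '\0')
		size++;
	return size;
}

/*
 * Length: number of characters. Every UTF-8 character starts with a byte
 * that is not a continuation byte (10xxxxxx), so counting those bytes
 * counts the characters of a well-formed string.
 */
size_t sketch_str_length(const char *str)
{
	size_t len = 0;

	for (const unsigned char *p = (const unsigned char *) str; *p != '\0'; p++) {
		if ((*p & 0xC0) != 0x80)
			len++;
	}
	return len;
}
```

For "žluťoučký" (9 characters, 4 of which need two bytes in UTF-8), the sketch returns a size of 13 and a length of 9.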
== Encoding and Decoding a Character == #EncodingDecoding
 * str_decode(): decode a single character from a string
 * chr_encode(): encode a single character into a string
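The decode/encode pair can be sketched as two functions that move a byte offset through a buffer. The names `sketch_decode` and `sketch_encode`, their parameter lists and the `SKETCH_INVALID` marker are assumptions for illustration only; they are not the signatures of the real `str_decode()` and `chr_encode()`. The sketch follows the strict UTF-8 rules of RFC 3629, except that it does not reject UTF-16 surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker returned for malformed input. */
#define SKETCH_INVALID UINT32_C(0xFFFFFFFF)

/* Smallest code point that needs the given number of continuation bytes. */
static const uint32_t min_for_cont[] = { 0x0, 0x80, 0x800, 0x10000 };

/*
 * Decode the character starting at str[*offset] and advance *offset past it.
 * Returns 0 at the null terminator (offset unchanged) and SKETCH_INVALID
 * for truncated, overlong or out-of-range sequences.
 */
uint32_t sketch_decode(const char *str, size_t *offset)
{
	const unsigned char *s = (const unsigned char *) str + *offset;
	uint32_t ch;
	size_t cont;  /* number of continuation bytes */

	if (s[0] == 0)
		return 0;
	if (s[0] < 0x80) {
		*offset += 1;
		return s[0];
	} else if ((s[0] & 0xE0) == 0xC0) {
		ch = s[0] & 0x1F;
		cont = 1;
	} else if ((s[0] & 0xF0) == 0xE0) {
		ch = s[0] & 0x0F;
		cont = 2;
	} else if ((s[0] & 0xF8) == 0xF0) {
		ch = s[0] & 0x07;
		cont = 3;
	} else {
		*offset += 1;  /* stray continuation byte or invalid lead byte */
		return SKETCH_INVALID;
	}

	for (size_t i = 1; i <= cont; i++) {
		if ((s[i] & 0xC0) != 0x80) {  /* also stops at a premature terminator */
			*offset += i;
			return SKETCH_INVALID;
		}
		ch = (ch << 6) | (s[i] & 0x3F);
	}
	*offset += cont + 1;
	if (ch < min_for_cont[cont] || ch > 0x10FFFF)
		return SKETCH_INVALID;
	return ch;
}

/*
 * Encode ch into buf starting at buf[*offset] and advance *offset.
 * Never writes at or beyond buf[size]. Returns 0 on success, or -1 if ch
 * is out of range or does not fit (offset unchanged).
 */
int sketch_encode(uint32_t ch, char *buf, size_t *offset, size_t size)
{
	static const unsigned char lead_bits[] = { 0x00, 0xC0, 0xE0, 0xF0 };
	size_t cont;

	if (ch < 0x80)
		cont = 0;
	else if (ch < 0x800)
		cont = 1;
	else if (ch < 0x10000)
		cont = 2;
	else if (ch <= 0x10FFFF)
		cont = 3;
	else
		return -1;

	if (*offset + cont + 1 > size)
		return -1;

	unsigned char *out = (unsigned char *) buf + *offset;
	/* Continuation bytes carry 6 bits each, filled from the last byte backwards. */
	for (size_t i = cont; i > 0; i--) {
		out[i] = (unsigned char) (0x80 | (ch & 0x3F));
		ch >>= 6;
	}
	out[0] = (unsigned char) (lead_bits[cont] | ch);
	*offset += cont + 1;
	return 0;
}
```

Keeping the offset in the caller lets a loop such as `while ((ch = sketch_decode(s, &off)) != 0)` walk a string one character at a time.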
== Well-formed Strings ==
A string is considered ''well-formed'' if and only if it is null-terminated and consists only of complete, valid UTF-8-encoded characters (i.e. it can be decoded with `str_decode()` without error). Unless stated otherwise, all strings passed to string functions must be well-formed, and all string functions produce well-formed strings.
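A sketch of a well-formedness check, assuming the strict UTF-8 rules of RFC 3629 as a stand-in for "decodable by `str_decode()` without error". `sketch_str_is_well_formed` is a hypothetical name, not a HelenOS function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True iff the null-terminated string consists only of complete, valid
 * UTF-8 sequences. Rejects truncated sequences, stray continuation bytes,
 * overlong forms and code points above U+10FFFF; for brevity it does not
 * reject UTF-16 surrogates (U+D800..U+DFFF).
 */
bool sketch_str_is_well_formed(const char *str)
{
	static const uint32_t min_for_cont[] = { 0x0, 0x80, 0x800, 0x10000 };
	const unsigned char *p = (const unsigned char *) str;

	while (*p != '\0') {
		uint32_t ch;
		size_t cont;

		if (*p < 0x80) {
			p++;
			continue;
		}
		if ((*p & 0xE0) == 0xC0) {
			ch = *p & 0x1F;
			cont = 1;
		} else if ((*p & 0xF0) == 0xE0) {
			ch = *p & 0x0F;
			cont = 2;
		} else if ((*p & 0xF8) == 0xF0) {
			ch = *p & 0x07;
			cont = 3;
		} else {
			return false;  /* continuation byte or invalid lead byte */
		}

		p++;
		for (size_t i = 0; i < cont; i++, p++) {
			if ((*p & 0xC0) != 0x80)  /* also stops at the terminator */
				return false;
			ch = (ch << 6) | (*p & 0x3F);
		}
		if (ch < min_for_cont[cont] || ch > 0x10FFFF)
			return false;  /* overlong or out of range */
	}
	return true;
}
```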
== Output Buffers == #OutputBuffers
Whenever the user supplies an output buffer to a string function, they must also pass the size of that buffer in bytes; the size is always the argument immediately following the buffer. The buffer size ''must be greater than zero''. The function always fills the buffer with a well-formed string: if the resulting string does not fit, the function stores as many complete characters as possible and then adds the null terminator.
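The output-buffer rules can be sketched with a copy function that truncates only at character boundaries. `sketch_str_cpy` is a hypothetical name; its argument order simply follows the convention described above (buffer, then its size), and it assumes `src` is well-formed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy the well-formed UTF-8 string src into dest, whose size in bytes is
 * passed in the following argument. Only complete characters are copied,
 * and the result is always null-terminated.
 */
void sketch_str_cpy(char *dest, size_t size, const char *src)
{
	assert(size > 0);  /* the convention requires a non-empty buffer */

	const unsigned char *s = (const unsigned char *) src;
	size_t pos = 0;  /* bytes copied so far */

	while (s[pos] != '\0') {
		/* Byte length of the character starting at s[pos], from its lead byte. */
		size_t n = 1;
		if ((s[pos] & 0xE0) == 0xC0)
			n = 2;
		else if ((s[pos] & 0xF0) == 0xE0)
			n = 3;
		else if ((s[pos] & 0xF8) == 0xF0)
			n = 4;

		/* Stop if the whole character plus the terminator would not fit. */
		if (pos + n + 1 > size)
			break;

		for (size_t i = 0; i < n; i++)
			dest[pos + i] = src[pos + i];
		pos += n;
	}
	dest[pos] = '\0';
}
```

With a 4-byte buffer and the two-character string "žť" (4 bytes of UTF-8), only "ž" is stored: copying the second character as well would leave no room for the terminator.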
== Function Reference == #FunctionReference
 * str_size()
 * wstr_size()
 * str_lsize()
 * wstr_lsize()
 * str_length()
 * wstr_length()
 * str_nlength()
 * wstr_nlength()
 * str_cpy()
 * str_ncpy()
 * str_append()
 * str_dup()
 * wstr_nstr()
 * str_chr()
 * str_rchr()