Changes between Initial Version and Version 1 of StringAPI


Timestamp:
2009-05-01T21:50:06Z
Author:
Jiri Svoboda
Comment:

The new and shiny API (t.m.)

     1= String API =
     2
     3Recently HelenOS switched to a new string API. It is non-standard, but in many ways similar to the ANSI C string API.
     4
     5 * [#CharacterRepertoire Character Repertoire and Encoding]
     6 * [#StringMetrics String Metrics]
     7 * [#EncodingDecoding Encoding and Decoding a Character]
     8 * [#OutputBuffers Output Buffers]
     9 * [#FunctionReference Function Reference]
     10
     11== Character Repertoire and Encoding == #CharacterRepertoire
     12
     13HelenOS uses the Universal Character Set or UCS (as defined by ISO/IEC 10646) for representing characters throughout the system. A single ''character'' is represented as `wchar_t` (32-bit). Normally all ''strings'' are represented in UTF-8 and null-terminated. A string is usually declared as `char *`. The API also has limited support for strings that are not null-terminated (such as sub-strings).
     14
     15There is also limited support for ''wide strings''. These are encoded in UTF-32 and null-terminated. Wide strings can represent exactly the same characters as normal strings. The difference is in the encoding: in UTF-8 each character is encoded as one to four bytes, whereas in UTF-32, which is used for wide strings, each character is encoded as exactly four bytes.
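
The size difference between the two encodings can be illustrated with a small sketch. The helpers `utf8_bytes_for()` and `utf32_bytes_for()` are hypothetical names invented for this example and are not part of the HelenOS API. The sketch assumes the usual UTF-8 rules (surrogates and values above U+10FFFF are not characters).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a HelenOS function): number of bytes needed
 * to encode one code point in UTF-8, or 0 if the value is not a valid
 * character (a surrogate or a value above U+10FFFF). */
size_t utf8_bytes_for(uint32_t cp)
{
	if (cp < 0x80)
		return 1;
	if (cp < 0x800)
		return 2;
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;
	if (cp < 0x10000)
		return 3;
	if (cp <= 0x10FFFF)
		return 4;
	return 0;
}

/* Hypothetical helper: in UTF-32 every character takes one 32-bit unit. */
size_t utf32_bytes_for(uint32_t cp)
{
	(void) cp;
	return sizeof(uint32_t);
}

/* Usage: the same characters cost 1 to 4 bytes in UTF-8, always 4 in UTF-32. */
void encoding_size_examples(void)
{
	assert(utf8_bytes_for(0x41) == 1);     /* U+0041 LATIN CAPITAL LETTER A */
	assert(utf8_bytes_for(0x159) == 2);    /* U+0159 LATIN SMALL LETTER R WITH CARON */
	assert(utf8_bytes_for(0x20AC) == 3);   /* U+20AC EURO SIGN */
	assert(utf8_bytes_for(0x1F600) == 4);  /* U+1F600, outside the BMP */
	assert(utf32_bytes_for(0x41) == 4);
}
```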
     16
     17== Character and String Literals ==
     18
     19In source code non-ASCII characters should only be used in character and string literals. Keep in mind that HelenOS source files are encoded in UTF-8, too. Non-ASCII character literals need to be written as `L'x'`. String literals are written the usual way (`"string"`) and wide-string literals are written as `L"wide string"`.
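
A short sketch of the three literal forms follows. It assumes that the compiler reads the source file as UTF-8 and stores narrow string literals as UTF-8, as described above. The variable and function names are made up for this example, and the standard C functions `strlen()` and `wcslen()` are used only to keep the sketch self-contained. They are not the HelenOS string functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* A non-ASCII character literal needs the L prefix. */
const wchar_t sample_char = L'ř';

/* An ordinary string literal is stored as UTF-8 bytes. */
const char *sample_str = "Jiří";

/* A wide-string literal holds one wchar_t per character. */
const wchar_t *sample_wstr = L"Jiří";

/* Usage: the same text has different storage sizes in the two forms. */
void literal_examples(void)
{
	assert(sample_char == 0x159);      /* U+0159 */
	assert(strlen(sample_str) == 6);   /* J, i: 1 byte each; ř, í: 2 bytes each */
	assert(wcslen(sample_wstr) == 4);  /* four characters */
}
```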
     20
     21== String Metrics == #StringMetrics
     22
     23 * Size
     24 * Length
     25 * Width
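
The list above only names the metrics. As a rough illustration, the sketch below assumes that ''size'' means the number of bytes (not counting the null terminator) and ''length'' means the number of characters. Both definitions are assumptions for this example, and `utf8_size()` and `utf8_length()` are hypothetical helpers, not the `str_size()` and `str_length()` functions of the API. Width is left out because this section does not define it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size in bytes, not counting the null terminator. */
size_t utf8_size(const char *s)
{
	size_t n = 0;

	while (s[n] != '\0')
		n++;
	return n;
}

/* Hypothetical helper: length in characters of a well-formed UTF-8
 * string. Every byte that is not a continuation byte (10xxxxxx)
 * starts a new character. */
size_t utf8_length(const char *s)
{
	size_t len = 0;

	for (; *s != '\0'; s++) {
		if (((unsigned char) *s & 0xC0) != 0x80)
			len++;
	}
	return len;
}

/* Usage: size and length agree for ASCII and differ otherwise. */
void metric_examples(void)
{
	assert(utf8_size("abc") == 3);
	assert(utf8_length("abc") == 3);
	assert(utf8_size("\xC5\x99") == 2);    /* U+0159 encoded in UTF-8 */
	assert(utf8_length("\xC5\x99") == 1);
}
```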
     26
     27== Encoding and Decoding a Character == #EncodingDecoding
     28
     29 * str_decode()
     30 * chr_encode()
     31
     32== Well-formed Strings ==
     33
     34A string is considered ''well formed'' if and only if it is null-terminated and consists only of complete and valid UTF-8-encoded characters (i.e. it can be decoded with `str_decode()` without error). Unless stated otherwise, all strings passed to functions must be well-formed and all string functions produce well-formed strings.
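
What "complete and valid" means can be sketched with a small checker. This is not the HelenOS implementation and it does not call `str_decode()`. The function `utf8_is_well_formed()` is a hypothetical helper that applies the usual UTF-8 rules (no truncated sequences, no overlong encodings, no surrogates, nothing above U+10FFFF). Whether `str_decode()` rejects exactly the same inputs is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical checker, not part of the HelenOS API: returns true if the
 * null-terminated string consists only of complete, valid UTF-8 sequences. */
bool utf8_is_well_formed(const char *str)
{
	const unsigned char *s = (const unsigned char *) str;

	while (*s != 0) {
		uint32_t cp;   /* code point being decoded */
		uint32_t min;  /* smallest code point allowed for this sequence length */
		int cont;      /* number of continuation bytes expected */

		if (*s < 0x80) {
			cp = *s; cont = 0; min = 0;
		} else if ((*s & 0xE0) == 0xC0) {
			cp = *s & 0x1F; cont = 1; min = 0x80;
		} else if ((*s & 0xF0) == 0xE0) {
			cp = *s & 0x0F; cont = 2; min = 0x800;
		} else if ((*s & 0xF8) == 0xF0) {
			cp = *s & 0x07; cont = 3; min = 0x10000;
		} else {
			return false;  /* stray continuation byte or invalid lead byte */
		}
		s++;

		for (int i = 0; i < cont; i++, s++) {
			/* The terminator fails this test too, so we never read past it. */
			if ((*s & 0xC0) != 0x80)
				return false;  /* truncated or malformed sequence */
			cp = (cp << 6) | (*s & 0x3F);
		}

		if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return false;  /* overlong, out of range, or surrogate */
	}
	return true;
}

/* Usage: a complete two-byte character passes, a lone lead byte does not. */
void well_formed_examples(void)
{
	assert(utf8_is_well_formed("\xC5\x99"));
	assert(!utf8_is_well_formed("\xC5"));
}
```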
     35
     36== Output Buffers == #OutputBuffers
     37
     38Whenever the user supplies an output buffer to a string function, they must also pass the size of this buffer to the function. The size is always the argument immediately following the buffer. The buffer size ''must be greater than zero''. The function will always fill the buffer with a well-formed string. If the string produced does not fit in the buffer, the function will store only as many complete characters as fit and then add the null terminator.
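
The truncation rule can be sketched as follows. The function `utf8_copy_truncated()` is a hypothetical stand-in written for this example and is not the real `str_cpy()`. It assumes a well-formed UTF-8 source and takes the buffer size as the argument right after the buffer, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the HelenOS str_cpy(): copy the well-formed
 * UTF-8 string src into dest, which holds dest_size bytes. Only complete
 * characters are stored, and the result is always null-terminated. */
void utf8_copy_truncated(char *dest, size_t dest_size, const char *src)
{
	/* The buffer must have room for at least the null terminator. */
	assert(dest_size > 0);

	size_t di = 0;  /* write position in dest */
	size_t si = 0;  /* read position in src */

	while (src[si] != '\0') {
		/* Length of the current character: lead byte plus continuation bytes. */
		size_t clen = 1;
		while (((unsigned char) src[si + clen] & 0xC0) == 0x80)
			clen++;

		/* Stop if the whole character plus the terminator would not fit. */
		if (di + clen + 1 > dest_size)
			break;

		for (size_t i = 0; i < clen; i++)
			dest[di++] = src[si++];
	}
	dest[di] = '\0';
}
```

With a 3-byte buffer and the source `"a\xC5\x99"` (one ASCII character followed by one two-byte character), only `"a"` is stored: the second character would need bytes 2 and 3 and leave no room for the terminator, so it is dropped whole instead of being cut in half.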
     39
     40== Function Reference == #FunctionReference
     41
     42 * str_size()
     43 * wstr_size()
     44 * str_lsize()
     45 * wstr_lsize()
     46
     47 * str_length()
     48 * wstr_length()
     49 * str_nlength()
     50 * wstr_nlength()
     51
     52 * str_cpy()
     53 * str_ncpy()
     54 * str_append()
     55 * str_dup()
     56
     57 * wstr_nstr()
     58
     59 * str_chr()
     60 * str_rchr()