Changeset 1d2f85e in mainline for kernel/generic/src/lib/str.c
- Timestamp: 2019-02-05T18:26:54Z
- Parents: 08e103d4
- Files: 1 edited
--- kernel/generic/src/lib/str.c (r08e103d4)
+++ kernel/generic/src/lib/str.c (r1d2f85e)
@@ -41,5 +41,5 @@
  * Strings and characters use the Universal Character Set (UCS). The standard
  * strings, called just strings are encoded in UTF-8. Wide strings (encoded
- * in UTF-32) are supported to a limited degree. A single character is
+ * in UTF-32) are supported to a limited degree. A single code point is
  * represented as wchar_t.@n
  *
@@ -50,5 +50,5 @@
  *  byte                  8 bits stored in uint8_t (unsigned 8 bit integer)
  *
- *  character             UTF-32 encoded Unicode character, stored in wchar_t
+ *  character             UTF-32 encoded Unicode code point, stored in wchar_t
  *                        (signed 32 bit integer), code points 0 .. 1114111
  *                        are valid
@@ -66,5 +66,5 @@
  *                        the NULL-terminator), size_t
  *
- *  [wide] string length  number of CHARACTERS in a [wide] string (excluding
+ *  [wide] string length  number of CODE POINTS in a [wide] string (excluding
  *                        the NULL-terminator), size_t
  *
@@ -80,5 +80,5 @@
  *                           NULL-terminator)
  *
- *  length    l     size_t   number of CHARACTERS in a string (excluding the
+ *  length    l     size_t   number of CODE POINTS in a string (excluding the
  *                           null terminator)
  *
@@ -89,5 +89,5 @@
  * Function naming prefixes:@n
  *
- *  chr_           operate on characters
+ *  chr_           operate on code points
  *  ascii_         operate on ASCII characters
  *  str_           operate on strings
@@ -102,5 +102,5 @@
  *  pointer (char *, wchar_t *)
  *  byte offset (size_t)
- *  character index (size_t)
+ *  code point index (size_t)
  *
  */
@@ -137,9 +137,9 @@
 #define CONT_BITS  6
 
-/** Decode a single character from a string.
- *
- * Decode a single character from a string of size @a size. Decoding starts
+/** Decode a single code point from an UTF-8 encoded string.
+ *
+ * Decode a single code point from a string of size @a size. Decoding starts
  * at @a offset and this offset is moved to the beginning of the next
- * character. In case of decoding error, offset generally advances at least
+ * code point. In case of decoding error, offset generally advances at least
  * by one. However, offset is never moved beyond size.
  *
@@ -148,5 +148,5 @@
  * @param size Size of the string (in bytes).
  *
- * @return Value of decoded character, U_SPECIAL on decoding error or
+ * @return Value of decoded code point, U_SPECIAL on decoding error or
  *         NULL if attempt to decode beyond @a size.
  *
@@ -207,17 +207,17 @@
 }
 
-/** Encode a single character to string representation.
- *
- * Encode a single character to string representation (i.e. UTF-8) and store
+/** Encode a single code point to a UTF-8 string representation.
+ *
+ * Encode a single code point to a UTF-8 string representation and store
  * it into a buffer at @a offset. Encoding starts at @a offset and this offset
- * is moved to the position where the next character can be written to.
- *
- * @param ch Input character.
+ * is moved to the position where the next code point can be written to.
+ *
+ * @param ch Input code point.
  * @param str Output buffer.
  * @param offset Byte offset where to start writing.
  * @param size Size of the output buffer (in bytes).
  *
- * @return EOK if the character was encoded successfully, EOVERFLOW if there
- *         was not enough space in the output buffer or EINVAL if the character
+ * @return EOK if the code point was encoded successfully, EOVERFLOW if there
+ *         was not enough space in the output buffer or EINVAL if the code point
  *         code was invalid.
  */
@@ -313,15 +313,15 @@
 }
 
-/** Get size of string with length limit.
+/** Get size of string with code point count limit.
  *
  * Get the number of bytes which are used by up to @a max_len first
- * characters in the string @a str. If @a max_len is greater than
- * the length of @a str, the entire string is measured (excluding the
- * NULL-terminator).
+ * code points in the string @a str. If @a max_len is greater than
+ * the number of code points in @a str, the entire string is measured
+ * (excluding the NULL-terminator).
  *
  * @param str String to consider.
- * @param max_len Maximum number of characters to measure.
- *
- * @return Number of bytes used by the characters.
+ * @param max_len Maximum number of code points to measure.
+ *
+ * @return Number of bytes used by the code points.
  *
  */
@@ -344,12 +344,12 @@
  *
  * Get the number of bytes which are used by up to @a max_len first
- * wide characters in the wide string @a str. If @a max_len is greater than
+ * code points in the wide string @a str. If @a max_len is greater than
  * the length of @a str, the entire wide string is measured (excluding the
  * NULL-terminator).
  *
  * @param str Wide string to consider.
- * @param max_len Maximum number of wide characters to measure.
- *
- * @return Number of bytes used by the wide characters.
+ * @param max_len Maximum number of code points to measure.
+ *
+ * @return Number of bytes used by the code points.
  *
  */
@@ -359,9 +359,9 @@
 }
 
-/** Get number of characters in a string.
- *
- * @param str NULL-terminated string.
- *
- * @return Number of characters in string.
+/** Get number of unicode code points in a UTF-8 encoded string.
+ *
+ * @param str NULL-terminated UTF-8 string.
+ *
+ * @return Number of code points in the string.
  *
  */
@@ -377,9 +377,9 @@
 }
 
-/** Get number of characters in a wide string.
+/** Get number of code points in a wide string.
  *
  * @param str NULL-terminated wide string.
  *
- * @return Number of characters in @a str.
+ * @return Number of code points in @a str.
  *
  */
@@ -394,10 +394,10 @@
 }
 
-/** Get number of characters in a string with size limit.
+/** Get number of code points in a string with size limit.
  *
  * @param str NULL-terminated string.
  * @param size Maximum number of bytes to consider.
  *
- * @return Number of characters in string.
+ * @return Number of code points in string.
  *
  */
@@ -413,10 +413,10 @@
 }
 
-/** Get number of characters in a string with size limit.
+/** Get number of code points in a string with size limit.
  *
  * @param str NULL-terminated string.
  * @param size Maximum number of bytes to consider.
  *
- * @return Number of characters in string.
+ * @return Number of code points in string.
  *
  */
@@ -435,7 +435,7 @@
 }
 
-/** Check whether character is plain ASCII.
- *
- * @return True if character is plain ASCII.
+/** Check whether code point is plain ASCII.
+ *
+ * @return True if code point is plain ASCII.
  *
  */
@@ -448,7 +448,7 @@
 }
 
-/** Check whether character is valid
- *
- * @return True if character is a valid Unicode code point.
+/** Check whether code point is valid
+ *
+ * @return True if code point is a valid Unicode code point.
  *
  */
@@ -465,10 +465,10 @@
  * Do a char-by-char comparison of two NULL-terminated strings.
  * The strings are considered equal iff their length is equal
- * and both strings consist of the same sequence of characters.
- *
- * A string S1 is less than another string S2 if it has a character with
- * lower value at the first character position where the strings differ.
+ * and both strings consist of the same sequence of code points.
+ *
+ * A string S1 is less than another string S2 if it has a code point with
+ * lower value at the first code point position where the strings differ.
  * If the strings differ in length, the shorter one is treated as if
- * padded by characters with a value of zero.
+ * padded by code points with a value of zero.
  *
  * @param s1 First string to compare.
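Reviewer's note: the hunks above rename "characters" to "code points" because a string's size (bytes) and its length (code points) differ as soon as the text leaves ASCII. The following is a minimal sketch of that distinction, not code from this changeset. The function name utf8_code_points_sketch is hypothetical, and the input is assumed to be well-formed UTF-8; the kernel's own routines decode each sequence and handle malformed input, which this sketch does not.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of str.c: count code points in a
 * NUL-terminated string that is assumed to be well-formed UTF-8.
 * Every code point contributes exactly one byte that is not a
 * continuation byte (bit pattern 10xxxxxx), so counting those
 * bytes gives the number of code points.
 */
size_t utf8_code_points_sketch(const char *str)
{
	assert(str != NULL);

	size_t count = 0;
	for (const unsigned char *p = (const unsigned char *) str; *p != 0; p++) {
		/* Skip continuation bytes, count everything else. */
		if ((*p & 0xC0) != 0x80)
			count++;
	}
	return count;
}
```

For the string "h\xc3\xa9llo" ("héllo" in UTF-8) the size is 6 bytes while the sketch reports 5 code points, which is the gap the new wording in the comments makes explicit.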
@@ -509,16 +509,16 @@
  * The strings are considered equal iff
  * min(str_code_points(s1), max_len) == min(str_code_points(s2), max_len)
- * and both strings consist of the same sequence of characters,
- * up to max_len characters.
- *
- * A string S1 is less than another string S2 if it has a character with
- * lower value at the first character position where the strings differ.
+ * and both strings consist of the same sequence of code points,
+ * up to max_len code points.
+ *
+ * A string S1 is less than another string S2 if it has a code point with
+ * lower value at the first code point position where the strings differ.
  * If the strings differ in length, the shorter one is treated as if
- * padded by characters with a value of zero. Only the first max_len
- * characters are considered.
+ * padded by code points with a value of zero. Only the first max_len
+ * code points are considered.
  *
  * @param s1 First string to compare.
  * @param s2 Second string to compare.
- * @param max_len Maximum number of characters to consider.
+ * @param max_len Maximum number of code points to consider.
  *
  * @return 0 if the strings are equal, -1 if the first is less than the second,
@@ -564,5 +564,5 @@
  * No more than @a size bytes are written. If the size of the output buffer
  * is at least one byte, the output string will always be well-formed, i.e.
- * null-terminated and containing only complete characters.
+ * null-terminated and containing only complete code points.
  *
  * @param dest Destination buffer.
@@ -594,5 +594,5 @@
  * @a dest. No more than @a size bytes are written. The output string will
  * always be well-formed, i.e. null-terminated and containing only complete
- * characters.
+ * code points.
  *
  * No more than @a n bytes are read from the input string, so it does not
@@ -652,10 +652,10 @@
 }
 
-/** Find first occurence of character in string.
+/** Find first occurence of code point in string.
  *
  * @param str String to search.
- * @param ch Character to look for.
- *
- * @return Pointer to character in @a str or NULL if not found.
+ * @param ch code point to look for.
+ *
+ * @return Pointer to code point in @a str or NULL if not found.
  */
 char *str_chr(const char *str, wchar_t ch)
@@ -674,13 +674,13 @@
 }
 
-/** Insert a wide character into a wide string.
- *
- * Insert a wide character into a wide string at position
- * @a pos. The characters after the position are shifted.
+/** Insert a code point into a wide string.
+ *
+ * Insert a code point into a wide string at position
+ * @a pos. The code points after the position are shifted.
  *
  * @param str String to insert to.
- * @param ch Character to insert to.
- * @param pos Character index where to insert.
- * @param max_pos Characters in the buffer.
+ * @param ch Code point to insert.
+ * @param pos Code point index where to insert.
+ * @param max_pos Number of code points that fit in the buffer.
  *
  * @return True if the insertion was sucessful, false if the position
@@ -704,11 +704,11 @@
 }
 
-/** Remove a wide character from a wide string.
- *
- * Remove a wide character from a wide string at position
- * @a pos. The characters after the position are shifted.
+/** Remove a code point from a wide string.
+ *
+ * Remove a code point from a wide string at position
+ * @a pos. The code points after the position are shifted.
  *
  * @param str String to remove from.
- * @param pos Character index to remove.
+ * @param pos Code point index to remove.
  *
  * @return True if the removal was sucessful, false if the position
@@ -732,10 +732,8 @@
 /** Duplicate string.
  *
- * Allocate a new string and copy characters from the source
- * string into it. The duplicate string is allocated via sleeping
- * malloc(), thus this function can sleep in no memory conditions.
- *
- * The allocation cannot fail and the return value is always
- * a valid pointer. The duplicate string is always a well-formed
+ * Allocate a new string and copy the contents of the source string into it.
+ * The duplicate string is allocated as if by malloc().
+ *
+ * If successful, the duplicate string is always a well-formed
  * null-terminated UTF-8 string, but it can differ from the source
  * string on the byte level.
@@ -743,5 +741,5 @@
  * @param src Source string.
  *
- * @return Duplicate string.
+ * @return Duplicate string, or NULL if allocation failed.
  *
  */
@@ -760,12 +758,10 @@
  *
  * Allocate a new string and copy up to @max_size bytes from the source
- * string into it. The duplicate string is allocated via sleeping
- * malloc(), thus this function can sleep in no memory conditions.
+ * string into it. The duplicate string is allocated as if by malloc().
  * No more than @max_size + 1 bytes is allocated, but if the size
  * occupied by the source string is smaller than @max_size + 1,
  * less is allocated.
  *
- * The allocation cannot fail and the return value is always
- * a valid pointer. The duplicate string is always a well-formed
+ * If successful, the duplicate string is always a well-formed
  * null-terminated UTF-8 string, but it can differ from the source
  * string on the byte level.
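Reviewer's note: the last hunks change the documented contract of the duplicating functions, which may now return NULL, while still promising a well-formed result that holds only complete code points within the byte limit. The sketch below illustrates one way such a contract can be met in standalone C. It is not the kernel implementation: utf8_ndup_sketch and is_continuation are hypothetical names, the source is assumed to be valid UTF-8, and plain malloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true for UTF-8 continuation bytes (10xxxxxx). */
static int is_continuation(unsigned char b)
{
	return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical sketch, not part of str.c: duplicate at most max_size
 * bytes of src, assuming src is valid UTF-8. The cut point is moved
 * back so that it never lands inside a multi-byte sequence.
 * Returns NULL if malloc() fails; the caller frees the result.
 */
char *utf8_ndup_sketch(const char *src, size_t max_size)
{
	assert(src != NULL);

	size_t size = strlen(src);
	if (size > max_size) {
		size = max_size;
		/* Back up while the first dropped byte is a continuation byte. */
		while (size > 0 && is_continuation((unsigned char) src[size]))
			size--;
	}

	char *dest = malloc(size + 1);
	if (dest == NULL)
		return NULL;

	memcpy(dest, src, size);
	dest[size] = '\0';
	return dest;
}
```

Callers written against the old comment ("the allocation cannot fail") would need a NULL check after this change; the sketch makes that obligation visible at the call site.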