STANDARD LIBRARY extensions for PHP core functionality

PHP Unicode/multibyte string manipulation

Unicode Support in PHP

PHP’s lack of native Unicode/multibyte support means that the standard string handling functions treat strings as sequences of single-byte characters. In fact, the official manual defines a string in PHP as a “series of characters, where a character is the same as a byte.” PHP's string type supports only 8-bit characters, while Unicode (and many other character sets) may require more than one byte to represent a character. This limitation affects almost all aspects of string manipulation, including (but not limited to) substring extraction, determining string lengths, string splitting and shuffling (source: www.sitepoint.com).
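
The byte-versus-character mismatch described above can be seen with PHP core functions alone. The following is a minimal sketch that assumes the mbstring extension is loaded and that the source file is saved as UTF-8; the variable names are illustrative only.

```php
<?php
// Assumption: the mbstring extension is available and this file is UTF-8 encoded.
$word = 'шыә'; // three Cyrillic letters, two bytes each in UTF-8

$byteLength = strlen($word);             // counts bytes
$charLength = mb_strlen($word, 'UTF-8'); // counts characters

$byteSlice = substr($word, 0, 3);             // cuts through the second letter
$charSlice = mb_substr($word, 0, 2, 'UTF-8'); // keeps whole characters

echo "bytes: $byteLength, characters: $charLength\n"; // bytes: 6, characters: 3
echo 'byte slice is valid UTF-8: ',
    var_export(mb_check_encoding($byteSlice, 'UTF-8'), true), "\n"; // false
echo "character slice: $charSlice\n"; // шы
```

The byte-oriented slice ends in the middle of a character, which is the kind of damage a multibyte-aware API is meant to prevent.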

The Strings and MbString classes support the UTF-8 multibyte character encoding scheme. The difference between them is that MbString is an object-oriented representation of a string, whereas Strings is a static utility class containing methods for string manipulation.

  • Multibyte string example PHP code
    <?php

    namespace Sphp\Stdlib;

    use Sphp\Config\PHPConfig;

    (new PHPConfig())->setCharacterEncoding("UTF-8");

    $string = 'лдэфвәәуүйәуйүәу034928348539857әшаыдларорашһһрлоавы';
    // strlen() counts bytes, so each two-byte Cyrillic letter is counted twice
    echo 'strlen: ' . strlen($string);
    // "\S" is not an escape sequence, so the backslash is printed literally
    echo "\n\StringObject length: " . (new MbString($string))->length();
    echo "\n\Strings length: " . Strings::length($string);
  • Multibyte string example results
    strlen: 87
    \StringObject length: 51
    \Strings length: 51

The MbString class

The MbString class is a wrapper for a PHP string with any character encoding. Therefore it can deal with the issues concerning multibyte encodings in PHP.

  • PHP code
    <?php

    namespace Sphp\Stdlib;

    $str = new MbString("Hello! I am a string.\n");

    // Judging by the results, these calls return new values and leave $str unchanged
    echo $str->convertCase(MB_CASE_LOWER);
    echo $str->convertCase(MB_CASE_UPPER);
    echo $str->convertCase(MB_CASE_TITLE);
    echo $str->reverse()->trim();
    // Array access reads individual characters
    echo "\n" . $str[0] . $str[1] . $str[2] . $str[3] . $str[4];
  • Execution result as highlighted code
    hello! i am a string.
    HELLO! I AM A STRING.
    Hello! I Am A String.
    .gnirts a ma I !olleH
    Hello
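
To illustrate how an object-oriented wrapper of this kind can work, here is a minimal sketch built on the mbstring and PCRE functions. The class name `Utf8Text` and all of its methods are hypothetical and are not the actual MbString implementation; the sketch assumes PHP 8.0 or later, a loaded mbstring extension, and UTF-8 input.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration only; NOT the Sphp\Stdlib\MbString source.
 * An immutable UTF-8 string wrapper with character-based array access.
 */
final class Utf8Text implements ArrayAccess
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function length(): int
    {
        return mb_strlen($this->value, 'UTF-8');
    }

    public function convertCase(int $mode): self
    {
        return new self(mb_convert_case($this->value, $mode, 'UTF-8'));
    }

    public function reverse(): self
    {
        // Split into characters (not bytes) before reversing.
        $chars = preg_split('//u', $this->value, -1, PREG_SPLIT_NO_EMPTY);
        return new self(implode('', array_reverse($chars)));
    }

    public function offsetExists(mixed $offset): bool
    {
        return is_int($offset) && $offset >= 0 && $offset < $this->length();
    }

    public function offsetGet(mixed $offset): mixed
    {
        // Returns the character (not the byte) at the given index.
        return mb_substr($this->value, (int) $offset, 1, 'UTF-8');
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('Utf8Text is immutable');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('Utf8Text is immutable');
    }

    public function __toString(): string
    {
        return $this->value;
    }
}

$text = new Utf8Text('Әшы');
echo $text->length(), "\n";                     // 3
echo $text->reverse(), "\n";                    // ышӘ
echo $text[0], "\n";                            // Ә
echo $text->convertCase(MB_CASE_UPPER), "\n";   // ӘШЫ
```

Returning a new object from every operation keeps the wrapped value unchanged, which matches the behaviour visible in the execution result above.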

The Strings class

The Strings class is a static utility class providing multibyte string manipulation, comparison and matching functions.

  • Multibyte String utility example
    <?php

    namespace Sphp\Stdlib;

    echo "empty:\n";
    var_dump(
        Strings::isEmpty(''),
        Strings::length(''));
    echo "matching:\n";
    var_dump(
        Strings::match("0 1 2", '/^[0-9]+$/'),
        Strings::match("123abc", "/^([0-9a-zA-ZäöåÄÖÅ])*$/"));
    echo "start & end:\n";
    var_dump(
        Strings::startsWith("0 1", "0 1 2"),
        Strings::endsWith("bcd", "123abc"));
  • String utility results
    empty:
    bool(true)
    int(0)
    matching:
    bool(false)
    bool(true)
    start & end:
    bool(false)
    bool(false)
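
The results above suggest that the first argument of `startsWith` and `endsWith` is the string being tested and the second is the prefix or suffix looked for, since "0 1" does not start with "0 1 2". The sketch below shows how comparable static helpers could be written with PHP core functions. The class `TextUtil` and its method names are hypothetical stand-ins, not the Strings source, and the argument order is an assumption drawn from the results.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in; NOT the Sphp\Stdlib\Strings source. */
final class TextUtil
{
    public static function startsWith(string $haystack, string $needle): bool
    {
        // A byte-wise prefix comparison is safe for valid UTF-8 input.
        return strncmp($haystack, $needle, strlen($needle)) === 0;
    }

    public static function endsWith(string $haystack, string $needle): bool
    {
        if ($needle === '') {
            return true;
        }
        return substr($haystack, -strlen($needle)) === $needle;
    }

    public static function matches(string $subject, string $pattern): bool
    {
        // The caller supplies the "u" modifier when the pattern contains
        // non-ASCII characters, so that PCRE treats it as UTF-8.
        return preg_match($pattern, $subject) === 1;
    }
}

var_dump(
    TextUtil::startsWith('0 1', '0 1 2'),                // bool(false)
    TextUtil::endsWith('bcd', '123abc'),                 // bool(false)
    TextUtil::matches('0 1 2', '/^[0-9]+$/'),            // bool(false)
    TextUtil::matches('123abc', '/^[0-9a-zäöå]*$/iu')    // bool(true)
);
```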

The Arrays class for PHP's array manipulation

PHP's array type can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and much more. As values can be other arrays, trees and multidimensional arrays are also possible.

Arrays is a static utility class that extends the array-related operations of the PHP core. It introduces methods for:

  • testing array properties.
  • manipulating array properties and creating new arrays.
  • 'cloning' multidimensional PHP arrays.
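
As a rough illustration of the first and last items, the sketch below tests whether an array is a plain list and makes a deep copy of a nested array. The function names `isIndexed` and `deepCopy` are hypothetical and are not taken from the Arrays class; that 'cloning' means copying contained objects as well is an assumption.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers; NOT the Sphp\Stdlib\Arrays API.

/** True when the keys are exactly 0..n-1 in order (also true for []). */
function isIndexed(array $array): bool
{
    return array_values($array) === $array;
}

/**
 * Copies nested arrays and clones any objects found inside them.
 * Assumption: every contained object is cloneable.
 */
function deepCopy(array $array): array
{
    $copy = [];
    foreach ($array as $key => $value) {
        if (is_array($value)) {
            $copy[$key] = deepCopy($value);
        } elseif (is_object($value)) {
            $copy[$key] = clone $value;
        } else {
            $copy[$key] = $value;
        }
    }
    return $copy;
}

$point = new stdClass();
$point->x = 1;

$original = ['point' => $point, 'tags' => ['a', 'b']];
$shallow  = $original;            // copies the array, but shares the object
$deep     = deepCopy($original);  // copies the array and clones the object

$original['point']->x = 99;

echo $shallow['point']->x, "\n"; // 99
echo $deep['point']->x, "\n";    // 1
```

Plain assignment already copies a PHP array by value, so a dedicated deep copy only matters when the array holds objects, which are shared by handle.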

Bit manipulation by using BitMask objects

A BitMask implements a collection of bits that grows as needed. The bits of a BitMask are indexed by nonnegative integers. Individual indexed bits can be examined, set, or cleared. Additionally, one BitMask may be used to modify the contents of another BitMask through the implemented logical AND, logical inclusive OR, and logical exclusive OR operations.

  • PHP code
    <?php

    namespace Sphp\Stdlib;

    $mask1 = new BitMask(0b10);
    echo "bin: $mask1\n";
    echo "bit at 0: " . $mask1->getBit(0) . "\n";
    echo "bit at 1: " . $mask1->getBit(1) . "\n";
    echo "OR 0b101: {$mask1->binOR(0b101)}\n";
    echo "set and unset: {$mask1->setBit(4)->unsetBit(0)->unsetBit(1)->unsetBit(2)}\n";
  • Execution result as highlighted code
    bin: 10
    bit at 0: 0
    bit at 1: 1
    OR 0b101: 111
    set and unset: 10000
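
For comparison, the same operations can be sketched with PHP's native integer operators. This is not how BitMask is implemented; it only shows which bitwise operation each step corresponds to. A native integer is limited to the platform's word size (64 bits on most builds), and the variable names are illustrative.

```php
<?php
// Plain-integer sketch of the operations shown above; NOT the BitMask source.
$mask = 0b10;

$bit0 = ($mask >> 0) & 1;  // examine bit 0 -> 0
$bit1 = ($mask >> 1) & 1;  // examine bit 1 -> 1
$or   = $mask | 0b101;     // inclusive OR  -> 0b111

$result = $mask | (1 << 4); // set bit 4    -> 0b10010
$result &= ~(1 << 0);       // clear bit 0  -> 0b10010
$result &= ~(1 << 1);       // clear bit 1  -> 0b10000
$result &= ~(1 << 2);       // clear bit 2  -> 0b10000

echo 'bin: ', decbin($mask), "\n";             // bin: 10
echo "bit at 0: $bit0\n";                      // bit at 0: 0
echo "bit at 1: $bit1\n";                      // bit at 1: 1
echo 'OR 0b101: ', decbin($or), "\n";          // OR 0b101: 111
echo 'set and unset: ', decbin($result), "\n"; // set and unset: 10000
```

Starting the last step from `$or` instead of `$mask` gives the same final value, so the sketch makes no assumption about whether `binOR` modifies the mask it is called on.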