File: class-wp-token-map.php
<?php

/**
 * Class for efficiently looking up and mapping string keys to string values, with limits.
 *
 * @package WordPress
 * @since 6.6.0
 */

/**
 * WP_Token_Map class.
 *
 * Use this class in specific circumstances with a static set of lookup keys which map to
 * a static set of transformed values. For example, this class is used to map HTML named
 * character references to their equivalent UTF-8 values.
 *
 * This class works differently than code calling `in_array()` and other methods. It
 * internalizes lookup logic and provides helper interfaces to optimize lookup and
 * transformation. It provides a method for precomputing the lookup tables and storing
 * them as PHP source code.
 *
 * All tokens and substitutions must be shorter than 256 bytes.
 *
 * Example:
 *
 *     $smilies = WP_Token_Map::from_array( array(
 *         '8O' => '😯',
 *         ':(' => '🙁',
 *         ':)' => '🙂',
 *         ':?' => '😕',
 *     ) );
 *
 *     true  === $smilies->contains( ':)' );
 *     false === $smilies->contains( 'simile' );
 *
 *     '😕' === $smilies->read_token( 'Not sure :?.', 9, $length_of_smily_syntax );
 *     2    === $length_of_smily_syntax;
 *
 * ## Precomputing the Token Map.
 *
 * Creating the class involves some work sorting and organizing the tokens and their
 * replacement values. In order to skip this, it's possible for the class to export
 * its state and be used as actual PHP source code.
 *
 * Example:
 *
 *     // Export with four spaces as the indent, only for the sake of this docblock.
 *     // The default indent is a tab character.
 *     $indent = '    ';
 *     echo $smilies->precomputed_php_source_table( $indent );
 *
 *     // Output, to be pasted into a PHP source file:
 *     WP_Token_Map::from_precomputed_table(
 *         array(
 *             "storage_version" => "6.6.0-trunk",
 *             "key_length" => 2,
 *             "groups" => "",
 *             "large_words" => array(),
 *             "small_words" => "8O\x00:)\x00:(\x00:?\x00",
 *             "small_mappings" => array( "😯", "🙂", "🙁", "😕" )
 *         )
 *     );
 *
 * ## Large vs. small words.
 *
 * This class uses a short prefix called the "key" to optimize lookup of its tokens.
 * This means that some tokens may be shorter than or equal in length to that key.
 * Those words that are longer than the key are called "large" while those shorter
 * than or equal to the key length are called "small."
 *
 * This separation of large and small words is incidental to the way this class
 * optimizes lookup, and should be considered an internal implementation detail
 * of the class. It may still be important to be aware of it, however.
 *
 * ## Determining Key Length.
 *
 * The choice of the size of the key length should be based on the data being stored in
 * the token map. It should divide the data as evenly as possible, but should not create
 * so many groups that a large fraction of the groups only contain a single token.
 *
 * For the HTML5 named character references, a key length of 2 was found to provide a
 * sufficient spread and should be a good default for relatively large sets of tokens.
 *
 * However, for some data sets this might be too long. For example, a list of smilies
 * may be too small for a key length of 2. Perhaps 1 would be more appropriate. It's
 * best to experiment and determine empirically which values are appropriate.
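 *
 * As a minimal sketch of such an experiment (the two-entry array is illustrative only,
 * not a recommendation), a shorter key can be chosen through the optional second
 * argument to `from_array()`:
 *
 *     // With a key length of 1, these two-byte smilies are longer than the key,
 *     // so they are stored as large words in a single group keyed by `:`.
 *     $smilies = WP_Token_Map::from_array(
 *         array(
 *             ':(' => '🙁',
 *             ':)' => '🙂',
 *         ),
 *         1
 *     );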
 *
 * ## Generate Pre-Computed Source Code.
 *
 * Since the `WP_Token_Map` is designed for relatively static lookups, it can be
 * advantageous to precompute the values and instantiate a table that has already
 * sorted and grouped the tokens and built the lookup strings.
 *
 * This can be done with `WP_Token_Map::precomputed_php_source_table()`.
 *
 * Note that if there is a leading character that all tokens need, such as `&` for
 * HTML named character references, it can be beneficial to exclude this from the
 * token map. Instead, find occurrences of the leading character and then use the
 * token map to see if the following characters complete the token.
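 *
 * A minimal sketch of that approach, assuming a hypothetical `$entities` map whose
 * keys omit the leading `&` (for example `'amp;' => '&'`):
 *
 *     $text = 'Fish &amp; chips';
 *     $at   = strpos( $text, '&' );
 *     if ( false !== $at ) {
 *         // Look for a token starting just after the ampersand.
 *         $replacement = $entities->read_token( $text, $at + 1, $token_length );
 *         if ( is_string( $replacement ) ) {
 *             // The full reference spans the `&` plus `$token_length` bytes.
 *             $text = substr_replace( $text, $replacement, $at, 1 + $token_length );
 *         }
 *     }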
 *
 * Example:
 *
 *     $map = WP_Token_Map::from_array( array( 'simple_smile:' => '🙂', 'sob:' => '😭', 'soba:' => '🍜' ) );
 *     echo $map->precomputed_php_source_table();
 *     // Output
 *     WP_Token_Map::from_precomputed_table(
 *         array(
 *             "storage_version" => "6.6.0-trunk",
 *             "key_length" => 2,
 *             "groups" => "si\x00so\x00",
 *             "large_words" => array(
 *                 // simple_smile:[🙂].
 *                 "\x0bmple_smile:\x04🙂",
 *                 // soba:[🍜] sob:[😭].
 *                 "\x03ba:\x04🍜\x02b:\x04😭",
 *             ),
 *             "small_words" => "",
 *             "small_mappings" => array()
 *         )
 *     );
 *
 * This precomputed value can be stored directly in source code and will skip the
 * startup cost of generating the lookup strings. See `$html5_named_character_entities`.
 *
 * Note that any updates to the precomputed format should update the storage version
 * constant. It would also be best to provide an update function to take older known
 * versions and upgrade them in place when loading into `from_precomputed_table()`.
 *
 * ## Future Direction.
 *
 * It may be viable to dynamically increase the length limits such that there's no need to impose them.
 * The limit appears because of the packing structure, which indicates how many bytes each segment of
 * text in the lookup tables spans. If, however, care were taken to track the longest word length, then
 * the packing structure could change its representation to allow for that. Each additional byte storing
 * length, however, increases the memory overhead and lookup runtime.
 *
 * An alternative approach could be to borrow the UTF-8 variable-length encoding and store lengths
 * below 128 as a single byte with the high bit unset, storing longer lengths as the combination of
 * continuation bytes.
 *
 * Since it has not been shown during the development of this class that longer strings are required, this
 * update is deferred until such a need is clear.
 *
 * @since 6.6.0
 */
class WP_Token_Map {
	/**
	 * Denotes the version of the code which produces pre-computed source tables.
	 *
	 * This version will be used not only to verify pre-computed data, but also
	 * to upgrade pre-computed data from older versions. Choosing a name that
	 * corresponds to the WordPress release will help people identify where an
	 * old copy of data came from.
	 */
	const STORAGE_VERSION = '6.6.0-trunk';

	/**
	 * Exclusive upper bound on the byte length of each key and each transformed
	 * value in the table. Every key and value must be strictly shorter than this,
	 * because lengths are stored in a single byte.
	 *
	 * @since 6.6.0
	 */
	const MAX_LENGTH = 256;

	/**
	 * How many bytes of each key are used to form a group key for lookup.
	 * This also determines whether a word is considered short or long.
	 *
	 * @since 6.6.0
	 *
	 * @var int
	 */
	private $key_length = 2;

	/**
	 * Stores an optimized form of the word set, where words are grouped
	 * by a prefix of the `$key_length` and then collapsed into a string.
	 *
	 * In each group, the keys and lookups form a packed data structure.
	 * The keys in the string are stripped of their "group key," which is
	 * the prefix of length `$this->key_length` shared by all of the items
	 * in the group. Each word in the string is prefixed by a single byte
	 * whose raw unsigned integer value represents how many bytes follow.
	 *
	 *     ┌────────────────┬───────────────┬─────────────────┬────────┐
	 *     │ Length of rest │ Rest of key   │ Length of value │ Value  │
	 *     │ of key (bytes) │               │ (bytes)         │        │
	 *     ├────────────────┼───────────────┼─────────────────┼────────┤
	 *     │ 0x08           │ nterDot;      │ 0x02            │ ·      │
	 *     └────────────────┴───────────────┴─────────────────┴────────┘
	 *
	 * In this example, the key `CenterDot;` has a group key `Ce`, leaving
	 * eight bytes for the rest of the key, `nterDot;`, and two bytes for
	 * the transformed value `·` (or U+00B7 or "\xC2\xB7").
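	 *
	 * As a sketch, the packed entry above can be assembled the same way
	 * `from_array()` assembles each group string, with `pack( 'C', ... )`
	 * producing the single length bytes:
	 *
	 *     $entry = pack( 'C', strlen( 'nterDot;' ) ) . 'nterDot;' . pack( 'C', strlen( '·' ) ) . '·';
	 *     // $entry is now "\x08nterDot;\x02·" (assuming UTF-8 source encoding).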
	 *
	 * Example:
	 *
	 *     // Stores array( 'CenterDot;' => '·', 'Cedilla;' => '¸' ).
	 *     $groups      = "Ce\x00";
	 *     $large_words = array( "\x08nterDot;\x02·\x06dilla;\x02¸" );
	 *
	 * The prefixes appear in the `$groups` string, each followed by a null
	 * byte. This makes for quick lookup of where in the group string the key
	 * is found, and then a simple division converts that offset into the index
	 * in the `$large_words` array where the group string is to be found.
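	 *
	 * A worked sketch of that division, using a hypothetical `$groups` string
	 * and a key length of 2:
	 *
	 *     $groups   = "Ce\x00si\x00so\x00";
	 *     $group_at = strpos( $groups, 'so' ); // 6
	 *     $index    = $group_at / ( 2 + 1 );   // 2, so the group string is `$large_words[2]`.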
	 *
	 * This lookup data structure is designed to optimize cache locality and
	 * minimize indirect memory reads when matching strings in the set.
	 *
	 * @since 6.6.0
	 *
	 * @var array
	 */
	private $large_words = array();

	/**
	 * Stores the group keys for sequential string lookup.
	 *
	 * The offset into this string where the group key appears corresponds with the index
	 * into the group array where the rest of the group string appears. This is an optimization
	 * to improve cache locality while searching and minimize indirect memory accesses.
	 *
	 * @since 6.6.0
	 *
	 * @var string
	 */
	private $groups = '';

	/**
	 * Stores an optimized row of small words, where every entry is
	 * `$this->key_length + 1` bytes long and zero-extended.
	 *
	 * This packing allows for direct lookup of a short word followed
	 * by the null byte, if extended to `$this->key_length + 1`.
	 *
	 * Example:
	 *
	 *     // Stores array( 'GT', 'LT', 'gt', 'lt' ).
	 *     "GT\x00LT\x00gt\x00lt\x00"
	 *
	 * @since 6.6.0
	 *
	 * @var string
	 */
	private $small_words = '';

	/**
	 * Replacements for the small words, in the same order they appear.
	 *
	 * With the position of a small word it's possible to index the translation
	 * directly, as its position in the `$small_words` string corresponds to
	 * the index of the replacement in the `$small_mappings` array.
	 *
	 * Example:
	 *
	 *     array( '>', '<', '>', '<' )
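	 *
	 * A sketch of how a position maps to a replacement, assuming a key length
	 * of 2 and the `$small_words` string "GT\x00LT\x00gt\x00lt\x00":
	 *
	 *     $term    = str_pad( 'gt', 3, "\x00", STR_PAD_RIGHT );
	 *     $word_at = strpos( "GT\x00LT\x00gt\x00lt\x00", $term ); // 6
	 *     $index   = $word_at / 3;                                // 2, the third replacement above.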
	 *
	 * @since 6.6.0
	 *
	 * @var string[]
	 */
	private $small_mappings = array();

	/**
	 * Create a token map using an associative array of key/value pairs as the input.
	 *
	 * Example:
	 *
	 *     $smilies = WP_Token_Map::from_array( array(
	 *         '8O' => '😯',
	 *         ':(' => '🙁',
	 *         ':)' => '🙂',
	 *         ':?' => '😕',
	 *     ) );
	 *
	 * @since 6.6.0
	 *
	 * @param array $mappings   The keys transform into the values, both are strings.
	 * @param int   $key_length Determines the group key length. Leave at the default value
	 *                          of 2 unless there's an empirical reason to change it.
	 *
	 * @return WP_Token_Map|null Token map, unless unable to create it.
	 */
	public static function from_array( $mappings, $key_length = 2 ) {
		$map             = new WP_Token_Map();
		$map->key_length = $key_length;

		// Start by grouping words.

		$groups = array();
		$shorts = array();
		foreach ( $mappings as $word => $mapping ) {
			if (
				self::MAX_LENGTH <= strlen( $word ) ||
				self::MAX_LENGTH <= strlen( $mapping )
			) {
				_doing_it_wrong(
					__METHOD__,
					sprintf(
						/* translators: 1: maximum byte length (a count) */
						__( 'Token Map tokens and substitutions must all be shorter than %1$d bytes.' ),
						self::MAX_LENGTH
					),
					'6.6.0'
				);
				return null;
			}

			$length = strlen( $word );

			if ( $key_length >= $length ) {
				$shorts[] = $word;
			} else {
				$group = substr( $word, 0, $key_length );

				if ( ! isset( $groups[ $group ] ) ) {
					$groups[ $group ] = array();
				}

				$groups[ $group ][] = array( substr( $word, $key_length ), $mapping );
			}
		}

		/*
		 * Sort the words to ensure that no smaller substring of a match masks the full match.
		 * For example, `Cap` should not match before `CapitalDifferentialD`.
		 */
		usort( $shorts, 'WP_Token_Map::longest_first_then_alphabetical' );
		foreach ( $groups as $group_key => $group ) {
			usort(
				$groups[ $group_key ],
				static function ( $a, $b ) {
					return self::longest_first_then_alphabetical( $a[0], $b[0] );
				}
			);
		}

		// Finally construct the optimized lookups.

		foreach ( $shorts as $word ) {
			$map->small_words     .= str_pad( $word, $key_length + 1, "\x00", STR_PAD_RIGHT );
			$map->small_mappings[] = $mappings[ $word ];
		}

		$group_keys = array_keys( $groups );
		sort( $group_keys );

		foreach ( $group_keys as $group ) {
			$map->groups .= "{$group}\x00";

			$group_string = '';

			foreach ( $groups[ $group ] as $group_word ) {
				list( $word, $mapping ) = $group_word;

				$word_length    = pack( 'C', strlen( $word ) );
				$mapping_length = pack( 'C', strlen( $mapping ) );
				$group_string  .= "{$word_length}{$word}{$mapping_length}{$mapping}";
			}

			$map->large_words[] = $group_string;
		}

		return $map;
	}

	/**
	 * Creates a token map from a pre-computed table.
	 * This skips the initialization cost of generating the table.
	 *
	 * This function should only be used to load data created with
	 * WP_Token_Map::precomputed_php_source_table().
	 *
	 * @since 6.6.0
	 *
	 * @param array $state {
	 *     Stores pre-computed state for directly loading into a Token Map.
	 *
	 *     @type string $storage_version Which version of the code produced this state.
	 *     @type int    $key_length      Group key length.
	 *     @type string $groups          Group lookup index.
	 *     @type array  $large_words     Large word groups and packed strings.
	 *     @type string $small_words     Small words packed string.
	 *     @type array  $small_mappings  Small word mappings.
	 * }
	 *
	 * @return WP_Token_Map|null Map with precomputed data loaded, or null if the
	 *                           state is incomplete or from an incompatible version.
	 */
	public static function from_precomputed_table( $state ) {
		$has_necessary_state = isset(
			$state['storage_version'],
			$state['key_length'],
			$state['groups'],
			$state['large_words'],
			$state['small_words'],
			$state['small_mappings']
		);

		if ( ! $has_necessary_state ) {
			_doing_it_wrong(
				__METHOD__,
				__( 'Missing required inputs to pre-computed WP_Token_Map.' ),
				'6.6.0'
			);
			return null;
		}

		if ( self::STORAGE_VERSION !== $state['storage_version'] ) {
			_doing_it_wrong(
				__METHOD__,
				/* translators: 1: version string, 2: version string. */
				sprintf( __( 'Loaded version \'%1$s\' incompatible with expected version \'%2$s\'.' ), $state['storage_version'], self::STORAGE_VERSION ),
				'6.6.0'
			);
			return null;
		}

		$map = new WP_Token_Map();

		$map->key_length     = $state['key_length'];
		$map->groups         = $state['groups'];
		$map->large_words    = $state['large_words'];
		$map->small_words    = $state['small_words'];
		$map->small_mappings = $state['small_mappings'];

		return $map;
	}

	/**
	 * Indicates if a given word is a lookup key in the map.
	 *
	 * Example:
	 *
	 *     true  === $smilies->contains( ':)' );
	 *     false === $smilies->contains( 'simile' );
	 *
	 * @since 6.6.0
	 *
	 * @param string $word             Determine if this word is a lookup key in the map.
	 * @param string $case_sensitivity Optional. Pass 'ascii-case-insensitive' to ignore ASCII case when matching. Default 'case-sensitive'.
	 * @return bool Whether there's an entry for the given word in the map.
	 */
	public function contains( $word, $case_sensitivity = 'case-sensitive' ) {
		$ignore_case = 'ascii-case-insensitive' === $case_sensitivity;

		if ( $this->key_length >= strlen( $word ) ) {
			if ( 0 === strlen( $this->small_words ) ) {
				return false;
			}

			$term    = str_pad( $word, $this->key_length + 1, "\x00", STR_PAD_RIGHT );
			$word_at = $ignore_case ? stripos( $this->small_words, $term ) : strpos( $this->small_words, $term );
			if ( false === $word_at ) {
				return false;
			}

			return true;
		}

		$group_key = substr( $word, 0, $this->key_length );
		$group_at  = $ignore_case ? stripos( $this->groups, $group_key ) : strpos( $this->groups, $group_key );
		if ( false === $group_at ) {
			return false;
		}
		$group        = $this->large_words[ $group_at / ( $this->key_length + 1 ) ];
		$group_length = strlen( $group );
		$slug         = substr( $word, $this->key_length );
		$length       = strlen( $slug );
		$at           = 0;

		while ( $at < $group_length ) {
			$token_length   = unpack( 'C', $group[ $at++ ] )[1];
			$token_at       = $at;
			$at            += $token_length;
			$mapping_length = unpack( 'C', $group[ $at++ ] )[1];
			$mapping_at     = $at;

			if ( $token_length === $length && 0 === substr_compare( $group, $slug, $token_at, $token_length, $ignore_case ) ) {
				return true;
			}

			$at = $mapping_at + $mapping_length;
		}

		return false;
	}

	/**
	 * If the text starting at a given offset is a lookup key in the map,
	 * return the corresponding transformation from the map, else `false`.
	 *
	 * This function returns the translated string, but accepts an optional
	 * parameter `$matched_token_byte_length`, which communicates how many
	 * bytes long the lookup key was, if it found one. This can be used to
	 * advance a cursor in calling code if a lookup key was found.
	 *
	 * Example:
	 *
	 *     false === $smilies->read_token( 'Not sure :?.', 0, $token_byte_length );
	 *     '😕'  === $smilies->read_token( 'Not sure :?.', 9, $token_byte_length );
	 *     2     === $token_byte_length;