Text.WordBreak Class
<p> Provides utility methods for splitting strings on word breaks and determining whether a character index represents a word boundary, using the generic word breaking algorithm defined in the Unicode Text Segmentation guidelines (<a href="http://unicode.org/reports/tr29/#Word_Boundaries">Unicode Standard Annex #29</a>). </p>
<p> This algorithm provides a reasonable default for many languages. However, it does not cover language-specific or context-specific requirements, and it does not produce meaningful results for languages that don't use spaces between words, such as Chinese, Japanese, Thai, Lao, and Khmer. Server-based word breaking services usually provide significantly better results with better performance. </p>
Index
Methods
- _classify static
- _isWordBoundary static
- getUniqueWords static
- getWords static
- isWordBoundary static
Methods
_classify(string)
Returns a character classification map for the specified string.
Parameters:
- string (String): String to classify.
Returns:
Classification map.
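To illustrate what a classification map could look like, here is a minimal sketch. It is not the library's implementation: the function name <code>classifySketch</code>, the <code>CLASS</code> codes, and the four-way classification are all assumptions made for illustration, and the Unicode guidelines define many more categories than this.

```javascript
// Simplified, hypothetical class codes; UAX #29 defines many more categories.
const CLASS = { LETTER: 0, NUMERIC: 1, WHITESPACE: 2, OTHER: 3 };

// Returns an array holding one class code per UTF-16 code unit of `string`.
// Lone surrogate halves fall through to CLASS.OTHER in this sketch.
function classifySketch(string) {
  const map = [];
  for (let i = 0; i < string.length; i++) {
    const chr = string.charAt(i);
    if (/\p{L}/u.test(chr)) {
      map.push(CLASS.LETTER);
    } else if (/\p{Nd}/u.test(chr)) {
      map.push(CLASS.NUMERIC);
    } else if (/\s/.test(chr)) {
      map.push(CLASS.WHITESPACE);
    } else {
      map.push(CLASS.OTHER);
    }
  }
  return map;
}

console.log(classifySketch('a1 !')); // [0, 1, 2, 3]
```

Precomputing one code per character means a later boundary test can compare small integers at neighboring indexes instead of re-examining the characters.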
_isWordBoundary(map, index)
<p> Returns <code>true</code> if there is a word boundary between the specified character index and the next character index (or the end of the string). </p>
<p> Note that there are always word breaks at the beginning and end of a string, so <code>_isWordBoundary(_classify(''), 0)</code> and <code>_isWordBoundary(_classify('a'), 0)</code> will both return <code>true</code>. </p>
Parameters:
- map (Array): Character classification map generated by <code>_classify</code>.
- index (Number): Character index to test.
Returns:
<code>true</code> for a word boundary, <code>false</code> otherwise.
getUniqueWords(string, options)
Returns an array containing only unique words from the specified string. For example, the string <code>'foo bar baz foo'</code> would result in the array <code>['foo', 'bar', 'baz']</code>.
Parameters:
- string (String): String to split.
- options (Object): (optional) Options (see <code>getWords()</code> for details).
Returns:
Array of unique words.
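The deduplication described above can be sketched as follows. This is a stand-in under stated assumptions, not the library's code: <code>splitWordsSimple</code> and <code>getUniqueWordsSketch</code> are hypothetical names, and the splitter is a plain regular expression rather than the Unicode word breaking algorithm.

```javascript
// Hypothetical, simplified splitter: runs of letters, digits, or underscores.
function splitWordsSimple(string) {
  return string.match(/[\p{L}\p{Nd}_]+/gu) || [];
}

// Keeps the first occurrence of each word, preserving original order.
function getUniqueWordsSketch(string) {
  return Array.from(new Set(splitWordsSimple(string)));
}

console.log(getUniqueWordsSketch('foo bar baz foo')); // ['foo', 'bar', 'baz']
```

A <code>Set</code> iterates in insertion order, so the result keeps words in the order they first appear.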
getWords(string, options)
Splits the specified string into an array of individual words.
Parameters:
- string (String): String to split.
- options (Object): (optional) Options object containing zero or more of the following properties:
<dl> <dt>ignoreCase (Boolean)</dt> <dd> If <code>true</code>, the string will be converted to lowercase before being split. Default is <code>false</code>. </dd>
<dt>includePunctuation (Boolean)</dt> <dd> If <code>true</code>, the returned array will include punctuation characters. Default is <code>false</code>. </dd>
<dt>includeWhitespace (Boolean)</dt> <dd> If <code>true</code>, the returned array will include whitespace characters. Default is <code>false</code>. </dd> </dl>
Returns:
Array of words.
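The effect of the three options can be sketched with a simplified tokenizer. This is an approximation for illustration only: <code>getWordsSketch</code> is a hypothetical name, the regular expression stands in for the Unicode word breaking algorithm, and emitting each whitespace or punctuation character as its own array element is an assumption.

```javascript
// Simplified sketch of the option handling described above.
function getWordsSketch(string, options = {}) {
  // ignoreCase: lowercase the whole string before splitting.
  const text = options.ignoreCase ? string.toLowerCase() : string;

  // Tokens: a run of word characters, one whitespace character,
  // or one other (punctuation/symbol) character.
  const tokens = text.match(/[\p{L}\p{Nd}_]+|\s|[^\p{L}\p{Nd}_\s]/gu) || [];

  return tokens.filter((token) => {
    if (/^\s$/.test(token)) {
      return options.includeWhitespace === true;
    }
    if (/^[\p{L}\p{Nd}_]+$/u.test(token)) {
      return true; // words are always kept
    }
    return options.includePunctuation === true;
  });
}

console.log(getWordsSketch('Hello, World!'));
// ['Hello', 'World']
console.log(getWordsSketch('Hello, World!', { ignoreCase: true }));
// ['hello', 'world']
console.log(getWordsSketch('Hello, World!', { includePunctuation: true }));
// ['Hello', ',', 'World', '!']
console.log(getWordsSketch('Hello, World!', { includeWhitespace: true }));
// ['Hello', ' ', 'World']
```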
isWordBoundary(string, index)
<p> Returns <code>true</code> if there is a word boundary between the specified character index and the next character index (or the end of the string). </p>
<p> Note that there are always word breaks at the beginning and end of a string, so <code>isWordBoundary('', 0)</code> and <code>isWordBoundary('a', 0)</code> will both return <code>true</code>. </p>
Parameters:
- string (String): String to test.
- index (Number): Character index to test within the string.
Returns:
<code>true</code> for a word boundary, <code>false</code> otherwise.
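The index semantics described above (a boundary between <code>index</code> and <code>index + 1</code>, or at the end of the string) can be sketched as follows. This is a deliberately reduced approximation, not the library's algorithm: <code>isWordChar</code> and <code>isWordBoundarySketch</code> are hypothetical names, and the only rule applied is "no break between two adjacent word characters", so cases the Unicode rules treat specially, such as apostrophes inside words, are not handled.

```javascript
// Hypothetical helper: letters, decimal digits, and underscore count as word characters.
function isWordChar(chr) {
  return /[\p{L}\p{Nd}_]/u.test(chr);
}

// Returns true if there is a boundary between `index` and `index + 1`,
// or if `index` is at (or past) the last character of the string.
function isWordBoundarySketch(string, index) {
  // End of string; this also covers the empty string.
  if (index >= string.length - 1) {
    return true;
  }
  const current = string.charAt(index);
  const next = string.charAt(index + 1);
  // Simplified rule: the only non-boundary is between two word characters.
  return !(isWordChar(current) && isWordChar(next));
}

console.log(isWordBoundarySketch('', 0));    // true
console.log(isWordBoundarySketch('a', 0));   // true
console.log(isWordBoundarySketch('ab', 0));  // false
console.log(isWordBoundarySketch('a b', 0)); // true
```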