Breaks up $string-var1 into substrings based on the regular expression delimiter specified in $string-var2.
If the value of $string-var1, $string-var2, or $string-var3 is the empty sequence, the empty sequence is returned. The empty sequence is a sequence containing zero items (), which is similar to null in SQL.
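As a minimal sketch of this behavior, the following query passes the empty sequence as the source string. It assumes that the literal () is accepted for $string-var1. Because xf:tokenize would then return the empty sequence, the for clause has nothing to iterate over and the resulting l element would be expected to contain no i children.

```xquery
<l>{
for $tok in xf:tokenize((), ",")
return <i>{ $tok }</i>
}</l>
```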
If any character other than m or i is specified in $string-var3 (the flags string), a TransformException is raised with the RT_REGEXP_FLAGS fault code. In the mapper, the following error message is displayed:
Error occurred while executing XQuery: Invalid regular expression syntax (flags: "a")
To learn more about using fault codes, see Getting the TransformException Fault Code Programmatically.
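For illustration, the following sketch passes a as the flags string. Because a is neither m nor i, this call would be expected to raise the TransformException described above rather than return any tokens. The source string and pattern are arbitrary and were chosen only for this example.

```xquery
<l>{
for $tok in xf:tokenize("1a2A3a4A5", "a", "a")
return <i>{ $tok }</i>
}</l>
```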
xf:tokenize(xs:string? $string-var1, xs:string? $string-var2) -> xs:string*
xf:tokenize(xs:string? $string-var1, xs:string? $string-var2, xs:string? $string-var3) -> xs:string*
$string-var2: Represents the regular expression that determines how to break up the source string. To learn more, see W3C Regular Expressions.
$string-var3: Specifies flags that affect how the regular expression is interpreted. (Optional)
m: If specified, the match operates in multiline mode. By default, the match operates in string mode.
i: If specified, the match operates in case-insensitive mode. By default, the match operates in case-sensitive mode.
Returns the sequence of substrings produced by breaking up $string-var1 at each match of the pattern specified in $string-var2.
Invoking tokenize("Jane fell down the hill", "\s") returns the following sequence of strings: ("Jane", "fell", "down", "the", "hill"). The regular expression \s specifies that the delimiter is white space, so in this case the source string is broken up by white space as shown in the following example query:
<l>{
for $tok in xf:tokenize("Jane fell down the hill", "\s")
return <i>{ $tok }</i>
}</l>
The preceding query generates the following result:
<l>
<i>Jane</i>
<i>fell</i>
<i>down</i>
<i>the</i>
<i>hill</i>
</l>
Invoking tokenize("3,20,,27,60", ",") returns the following strings: ("3", "20", "", "27", "60"). In this case, the delimiter is a comma, so the source string is broken up by commas as shown in the following example query:
<l>{
for $tok in xf:tokenize("3,20,,27,60", ",")
return <i>{ $tok }</i>
}</l>
The preceding query generates the following result:
<l>
<i>3</i>
<i>20</i>
<i></i>
<i>27</i>
<i>60</i>
</l>
Invoking tokenize("3, 4, 27,67", ",\s") returns the following strings: ("3", "4", "27,67"). The regular expression ,\s specifies that the delimiter is a comma followed by a white space character, so in this case the source string is broken up wherever a comma is followed by white space, as shown in the following example query:
<l>{
for $tok in xf:tokenize("3, 4, 27,67", ",\s")
return <i>{ $tok }</i>
}</l>
The preceding query generates the following result:
<l>
<i>3</i>
<i>4</i>
<i>27,67</i>
</l>
Note: The numbers 27 and 67 are not broken up as tokens because there is no white space between the 27 and the 67 in the source string (just a comma).
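If the intent is to split on every comma whether or not white space follows it, one option is to make the white space optional in the pattern. The following sketch assumes that the quantifier * (zero or more occurrences) is supported, as it is in the W3C regular expression syntax:

```xquery
<l>{
for $tok in xf:tokenize("3, 4, 27,67", ",\s*")
return <i>{ $tok }</i>
}</l>
```

Under that assumption, the pattern ,\s* matches each comma together with any white space after it, so the expected tokens are ("3", "4", "27", "67").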
Invoking tokenize("1a2A3a4A5", "a", "i") returns the following strings: ("1", "2", "3", "4", "5"). The i flag in $string-var3 specifies that case is ignored during matching, so in this case the source string is broken up by both the capital A and the lowercase a characters, as shown in the following example query:
<l>{
for $tok in xf:tokenize("1a2A3a4A5", "a", "i")
return <i>{ $tok }</i>
}</l>
The preceding query generates the following result:
<l>
<i>1</i>
<i>2</i>
<i>3</i>
<i>4</i>
<i>5</i>
</l>
W3C tokenize function description.
W3C Regular Expressions description.