Regular Expression Conditions
The regex_like
function performs regular expression matching. A regular expression is a pattern that the regular expression engine attempts to match with an input string. The syntax for invoking the regex_like
function in a query is the same as all other functions, described in the Function Calls section. The regex_like
function has two signatures with two and three parameters, respectively.
Syntax
boolean regex_like(any*, string)
boolean regex_like(any*, string, string)
Semantics
The regex_like
function provides functionality similar to the LIKE operator in standard SQL, that is, it can be used to check if an input string matches a given pattern. The input string and the pattern are computed by the first and second arguments, respectively. A third, optional, argument specifies a set of flags that affect how the matching is done.
Normally, the regex_like
function expects each of its arguments to return a single string. If that is not the case, it behaves as follows:
- If it can be detected at compile time that the first argument will never return a string, it raises a compile-time error. Otherwise, it returns false if the first argument returns nothing, more than one item, or a single item that is neither a string nor NULL.
- It raises an error if the pattern or flags do not return a single string or NULL.
- It returns NULL if any of the arguments returns a single NULL.
Otherwise, the regex_like function behaves as follows:
- Raises an error if the pattern string is not valid or its length is greater than 512 characters.
- Returns false if the pattern does not match the input string.
- Returns true if the pattern matches the input string.
- Period (.)
You use period metacharacter to match every character except a new line
- Greedy quantifier (*)
You use the greedy quantifier (*) to indicate zero or more occurrences of the preceding element.
For example, the character D with a combination of period (.) and (*)
"D.*"
, matches any string that starts with the character 'D' and is followed by zero or more characters. - Quotation constructs (\, \Q, \E)
You use the backslash '\' character as a quotation construct with other metacharacters to instruct the regular expression engine to interpret the following metacharacter as a literal character.
For example, you use the '\*' pattern to match the literal asterisk '*' character and not interpret it as the greedy quantifier (*).
You use '\Q' and '\E' quotation constructs in the query to match the exact string pattern that starts after the '\Q' character and ends at the '\E' character.
For example, you use "\\Q**Houston**\\E" to match any string that has a value '**Houston**'.
Note:
In the SQL shell or other equivalent tools,you must use a double backslash '\\' in place of a single backslash '\' quotation construct. - Escape sequences
You use the escape sequences to match certain predefined characters. The following escape sequences are supported:
Table 6-2 Escape sequences
Character | Description |
---|---|
\x{h...h} | Matches the character with hexadecimal value 0xh...h (Character.MIN_CODE_POINT <= 0xh...h <= Character.MAX_CODE_POINT) |
\xhh | Matches the character with hexadecimal value 0xhh |
\uhhhh |
Matches the Unicode character specified by the hexadecimal number 0xhhhh. Example: 'u000A' matches the newline character |
\t | Matches the tab character ('\u0009') |
\r | Matches the carriage-return character ('\u000D') |
\n | Matches the newline (line feed) character ('\u000A') |
\f | Matches the form-feed character ('\u000C') |
\e | Matches the escape character ('\u001B') |
\cx | Matches the control character corresponding to x |
\a | Matches the alert (bell) character ('\u0007') |
\\ | Matches the backslash '\' character ('\u005C') |
\0nn | Matches the character with octal value 0nn(0 <= n<= 7) |
\0n | Matches the character with octal value 0n (0 <= n <= 7) |
\0mnn | Matches the character with octal value 0mnn(0 <= m<= 3, 0 <= n<= 7) |
The flag contains one or more characters, where each character flag specifies some particular behavior. The full list of flag characters and their semantics are listed in the following table:
Table 6-3 Predefined Quoted Characters
Flag | Full Name | Description |
---|---|---|
"d" | UNIX_LINES |
Enables Unix lines mode. The new line character ('\n' ) is the only line termination method recognized in this mode. |
"i" | CASE_INSENSITIVE |
Enables case-insensitive matching. By default, CASE_INSENSITIVE matching assumes that only characters in the US-ASCII character set are being matched. You can enable Unicode-aware CASE_INSENSITIVE by specifying the UNICODE_CASE flag in conjunction with this flag. Specifying this flag may impose a slight performance penalty. |
"x" | COMMENTS |
Permits white space and comments in pattern. In this mode, white space is ignored, and embedded comments starting with # are ignored until the end of a line. |
"l" | LITERAL |
When LITERAL is specified then the input string that specifies the pattern is treated as a sequence of literal characters. There is no special meaning for Metacharacters or escape sequences. The flags CASE_INSENSITIVE and UNICODE_CASE retain their impact on matching when used in conjunction with this flag. The other flags become superfluous. |
"s" | DOTALL |
Enables DOTALL mode. In DOTALL mode, the expression dot (.) matches any character, including a line terminator. However, by default, the expression dot (.) does not match line terminators. |
"u" | UNICODE_CASE |
When you enable the CASE_INSENSITIVE flag, by default, it matches using only the characters in the US-ASCII character set. When you specify UNICODE_CASE then it performs CASE_INSENSITIVE matching using the Unicode standard. Specifying this flag may impose a performance penalty. |
"c" | CANON_EQ |
When this flag is specified then two characters will be considered to match if, and only if, their full canonical decompositions match. When you specify this flag, the expression "a\u030A", for example, will match the string "\u00E5. By default, matching does not take canonical equivalence into account. Specifying this flag may impose a performance penalty. |
"U" | UNICODE_CHARACTER_CLASS |
Enables the Unicode version of Predefined character classes and POSIX character classes. When you specify this flag, then the (US-ASCII only) Predefined character classes and POSIX character classes are in conformance with Unicode Technical Standards. See http://unicode.org/reports/tr18/#Compatibility_Properties. The flag implies UNICODE_CASE; it enables Unicode-aware case folding.Specifying this flag may impose a performance penalty. |
Note:
Theregex_like
function will not be used for index scan optimization.
Note:
Run the queries provided in the below examples from the SQL shell. Make sure that you use a double backslash '\\' in place of a single backslash '\' for quotation constructs as shown in the queries.Example 6-76 Fetch from the user data application table the list of users whose last name starts with 'S'
SELECT id, firstName, lastName FROM users WHERE regex_like(lastName, "S.*")
Explanation: In the above query, you use the regex_like
function to match the lastName
field beginning with an 'S' character. You create a pattern with the first character as 'S' followed by the period metacharacter (.) and the greedy quantifier (* ) to match zero or more occurrences of any other character.
{"id":10,"firstName":"John","lastName":"Smith"}
{"id":20,"firstName":"Jane","lastName":"Smith"}
{"id":30,"firstName":"Adam","lastName":"Smith"}
{"id":40,"firstName":"Joanna","lastName":"Smith"}
Example 6-77 Fetch from the user data application table the list of users whose last name has at least one 'w' or 'W'
SELECT id, lastName FROM users WHERE regex_like(lastname,".*w.*","i")
Explanation: In the above query, use the regex_like
function with the required pattern and the 'i' flag to enable matching that is not case-sensitive.
{"id":60,"lastName":"Law"}
{"id":50,"lastName":"Winslet"}
Example 6-78 Validate the format of an e-mail address in the user data application table
SELECT id, firstName, lastName, email FROM users WHERE
regex_like(email,".*@.*\\..*")
email
field contains the e-mail address of the user. Assuming the email
field has an '@' character and ends with a '.string' pattern such as '.com', '.us', '.in', and so forth, you can validate the e-mail address using the regex_like
function as follows:
- Use the '@' character preceded and followed by a combination of period metacharacter (.) and the greedy quantifier (* ). This combination creates a pattern to match zero or more occurrences of other characters before and after the '@' character in the
email
field. - To achieve a domain name pattern such as '.com', you use the quotation construct backslash '\' character to match the period as a literal '.' character and not a period (.) metacharacter. Further, use the combination of period metacharacter (.) and the greedy quantifier (* ) to allow any domain name.
Note:
This example only validates the e-mail address format by checking the pattern availability of '@' and '.' characters in the specified order. It does not assure the validity of the e-mail address itself. All the rows that include the mentioned character pattern are fetched.{"id":50,"firstName":"Aubrey","lastName":"Winslet","email":"reachaubrey@somemail.co.us"}
{"id":60,"firstName":"Jimmy","lastName":"Law","email":"reachjimlaw@gotmail.co.us"}
{"id":20,"firstName":"Jane","lastName":"Smith","email":"jane.smith201@reachmail.com"}
{"id":10,"firstName":"John","lastName":"Smith","email":"john.smith@reachmail.com"}
Example 6-79 Fetch from the user data application table the list of users with a five-star rating for community service
SELECT id, firstName, lastName FROM users WHERE
regex_like(communityService,"\\Q*****\\E")
communityService
field. You use the regex_like
function with the quotation constructs '\Q' and '\E' to match the pattern of five asterisk '*' characters. In this query, the asterisk (* ) character is used as a literal '*' character and not as a greedy quantifier (* ).
Note:
If you create the pattern without quotation constructs, an error is generated indicating that the specified pattern for theregex_like
function is invalid.
{"id":20,"firstName":"Jane","lastName":"Smith"}
Example 6-80 Fetch from the user data application table the list of users whose street attribute of the address field has a suspected data entry error due to the presence of a tab character
SELECT id, firstName, users.address.street FROM users WHERE regex_like(users.address.street, ".*\t.*")
regex_like
function to identify the rows that include escape sequences. For a detailed list of supported predefined characters, see Table 6-2. In this query, you fetch the list of users from the user data application table whose street
attribute of the address
field erroneously includes a tab character. You use the regex_like
function with the following pattern:
- Use escape sequence '\t' to identify the tab character.
- Use the combination of period metacharacter (.) and the greedy quantifier (*) before and after the escape sequence to allow zero or more occurrences of any other character in the street attribute.
{"id":70,"firstName":"Dierdre","street":"Maine\t(Suburb)"}
SELECT id, firstName, users.address.street FROM users WHERE regex_like(users.address.street,
".*\u0009.*")
Note:
Oracle NoSQL Database supports insertion of control characters (ASCII code 0~31) and characters with ASCII code > 128 using their Unicode hexadecimal values in the SQL shell or equivalent tools.
For example, you can insert the Escape (ESC) character using its Unicode hexadecimal value 0x001B as given in the user data application table (see the state field in the row with id = 70).
SELECT id, firstName, users.address[].state FROM users WHERE
EXISTS(users.address[regex_like($element.state, ".*\\e")])
Explanation: In this query, you check whether or not the state
attribute of the address
field includes a string that ends with an escape character. You use the quotation construct '\' to match the escape character preceded by the combination of period metacharacter (.) and greedy quantifier (*) to allow zero or more occurrences of other characters before the escape character.
{"id":70,"firstName":"Dierdre","state":"TX\u001B"}
Example 6-81 Find all the bags that traveled through ORD airport as an intermediate hop between the source of a trip and the final destination
SELECT
ticketNo AS TICKETNUM,
fullName AS NAME,
BaggageInfo.bagInfo.routing[] AS ROUTING
FROM BaggageInfo WHERE regex_like(BaggageInfo.bagInfo.routing[],".*/ORD/.*")
BaggageInfo
table contains the airport codes in the format source/transit/destination
. In this query, you use the regex_like
function to match the airport code of the transit airport as follows:
- Use the combination of the period (. ) metacharacter and greedy quantifier (*) to allow any source airport code characters.
- Include the airport code of the transit airport ORD between the two forward slash '/' characters.
- Use the combination of the period (. ) metacharacter and greedy quantifier (*) to allow any destination airport code characters.
{"TICKETNUM":176234463813,"NAME":"Henry Jenkins","ROUTING":"SFO/ORD/FRA"}
{"TICKETNUM":1762392135540,"NAME":"Adelaide Willard","ROUTING":"GRU/ORD/SEA"}
Example 6-82 Find all the passengers with area code 364 in their contact phone.
SELECT ticketNo, contactPhone, fullName FROM BaggageInfo WHERE regex_like(contactPhone,"364-.*")
contactPhone
field contains the US-based contact number in a three-digit area code followed by a seven-digit local number format. Assuming the contact phone pattern in the contactPhone
field is XXX-YYY-ZZZZ where X, Y, and Z are digits between 0-9, you use the regex_like
function as follows:
- Use the area code 364 followed by a hyphen '-' character.
- Use the combination of the period (.) metacharacter and greedy quantifier (*) to allow the pattern matching of any three-digit number followed by a hyphen '-' character and a four-digit number.
{"ticketNo":1762320369957,"contactPhone":"364-610-4444","fullName":"Lorenzo Phil"}
{"ticketNo":1762320569757,"contactPhone":"364-610-4455","fullName":"Lucinda Beckman"}
User data application table
users
table.CREATE TABLE users ( id INTEGER,
firstName STRING,
lastName STRING,
otherNames ARRAY(RECORD(first STRING, last STRING)),
age INTEGER,
income INTEGER,
address JSON,
connections ARRAY(INTEGER),
email STRING,
communityService STRING,
PRIMARY KEY (id) )
users
tables with sample
rows.INSERT INTO users VALUES (
10,
"John",
"Smith",
[ {"first" : "Johny", "last" : "Good"}, {"first" : "Johny2", "last" : "Brave"},{"first" : "Johny3", "last" : "Kind"},{"first" : "Johny4", "last" : "Humble"} ],
22,
45000,
{
"street" : "Pacific Ave",
"number" : 101,
"city" : "Santa Cruz",
"state" : "CA",
"zip" : 95008,
"phones" : [
{ "area" : 408, "number" : 4538955, "kind" : "work" },
{ "area" : 831, "number" : 7533341, "kind" : "home" },
{ "area" : 831, "number" : 7533382, "kind" : "mobile" }
]
},
[ 30, 55, 43 ],
"john.smith@reachmail.com",
"****"
)
INSERT INTO users VALUES (
20,
"Jane",
"Smith",
[ {"first" : "Jane", "last" : "BeGood"} ],
22,
55000,
{
"street" : "Atlantic Ave",
"number" : 201,
"city" : "San Jose",
"state" : "CA",
"zip" : 95005,
"phones" : [
{ "area" : 608, "number" : 6538955, "kind" : "work" },
{ "area" : 931, "number" : 9533341, "kind" : "home" },
{ "area" : 931, "number" : 9533382, "kind" : "mobile" }
]
},
[ 40, 75, 63 ],
"jane.smith201@reachmail.com",
"*****"
)
INSERT INTO users VALUES (
30,
"Adam",
"Smith",
[ {"first" : "Adam", "last" : "BeGood"} ],
45,
75000,
{
"street" : "Indian Ave",
"number" : 301,
"city" : "Houston",
"state" : "TX",
"zip" : 95075,
"phones" : [
{ "area" : 618, "number" : 6618955, "kind" : "work" },
{ "area" : 951, "number" : 9613341, "kind" : "home" },
{ "area" : 981, "number" : 9613382, "kind" : "mobile" }
]
},
[ 60, 45, 73 ],
"adam.smith201reachmail.com",
"***"
)
INSERT INTO users VALUES (
40,
"Joanna",
"Smith",
[ {"first" : "Joanna", "last" : "Smart"} ],
NULL,
75000,
{
"street" : "Tex Ave",
"number" : 401,
"city" : "Houston",
"state" : "TX",
"zip" : 95085,
"phones" : [
{ "area" : NULL, "number" : 1618955, "kind" : "work" },
{ "area" : 451, "number" : 4613341, "kind" : "home" },
{ "area" : 481, "number" : 4613382, "kind" : "mobile" }
]
},
[ 70, 30, 40 ],
NULL,
"**"
)
INSERT INTO users VALUES (
50,
"Aubrey",
"Winslet",
[ {"first" : "Aubrey", "last" : "Regal"} ],
NULL,
15000,
{
"street" : "Tex Ave",
"number" :501,
"city" : "Houston",
"state" : "TX",
"zip" : 95085,
"phones" : [
{ "area" : 723, "number" : 7618955, "kind" : "work" },
{ "area" : 751, "number" : 7613341, "kind" : "home" },
{ "area" : 781, "number" : 7613382, "kind" : "mobile" }
]
},
[ 50, 20, 40 ],
"reachaubrey@somemail.co.us",
"****"
)
INSERT INTO users VALUES (
60,
"Jimmy",
"Law",
[ {"first" : "Jimmy", "last" : "Smart"} ],
NULL,
25000,
{
"street" : "Maine",
"number" :501,
"city" : "San Jose",
"state" : "TX",
"zip" : 95085,
"phones" : [
{ "area" : 223, "number" : 2618955, "kind" : "work" },
{ "area" : 251, "number" : 2613341, "kind" : "home" },
{ "area" : 281, "number" : 2613382, "kind" : "mobile" }
]
},
[ 50, 20, 40 ],
"reachjimlaw@gotmail.co.us",
"***"
)
INSERT INTO users VALUES (
70,
"Dierdre",
"Amador",
[ {"first" : "Dierdre", "last" : "Kind"} ],
NULL,
25000,
{
"street" : "Maine\t(Suburb)",
"number" :701,
"city" : "San Jose",
"state" : "TX\u001B",
"zip" : 95085,
"phones" : [
{ "area" : 223, "number" : 6718955, "kind" : "work" },
{ "area" : 251, "number" : 6213341, "kind" : "home" },
{ "area" : 281, "number" : 6213382, "kind" : "mobile" }
]
},
[ 10, 60, 40 ],
NULL,
"***"
)