Regular Functions

Regular functions work as if they are applied to each row separately (for each row, the result of the function does not depend on the other rows). Regular functions have these characteristics:

Strong typing: In contrast to standard SQL, Cytomic Orion does not make implicit conversions between types. Each function works for a specific set of types. This means that sometimes you need to use type conversion functions.
Common subexpression elimination: All expressions in a query that have the same AST (the same record or same result of syntactic parsing) are considered to have identical values. Such expressions are concatenated and executed once. Identical subqueries are also eliminated this way.
Types of results: All functions return a single return as the result (not multiple values, and not zero values). The type of result is usually defined only by the types of arguments, not by the values.
Constants: For simplicity, certain functions can only work with constants for some arguments. For example, the right argument of the LIKE operator must be a constant. Almost all functions return a constant for constant arguments. The exception is functions that generate random numbers. The now function returns different values for queries that were run at different times, but the result is considered a constant, since constancy is only important within a single query. A constant expression is also considered a constant (for example, the right half of the LIKE operator can be constructed from multiple constants).
NULL processing: If at least one of the arguments of the function is NULL, the function result is also NULL, except in functions where it is specified otherwise.
Constancy: Functions cannot change the values of their arguments – any changes are returned as the result. Thus, the result of calculating separate functions does not depend on the order in which the functions are written in the query.
Error handling: Some functions can generate an exception if the data is invalid. In this case, the query is canceled and Cytomic Orion returns an error message to the client.
Argument evaluation: In almost all programming languages, certain arguments might not be evaluated with some operators, such as &&, ||, and ?:. In Cytomic Orion, the arguments of functions (operators) are always evaluated. This is because whole parts of the columns are evaluated at the same time instead of calculating each row separately.

These are the most important functions:

Arithmetic Functions

For all arithmetic functions, the result type is calculated as the smallest number type that the result fits in, if there is such a type. The minimum is taken simultaneously based on the number of bits, whether it is signed, and whether it. The minimum is taken simultaneously based on the number of bits, whether it is signed, and whether it is a floating-point number. If there are not enough bits, the highest bit type is taken.

Arithmetic functions work for any pair of types from UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, or Float64.

Function	Description
plus(a, b) a + b operator	Calculates the sum of two numbers. You can also add integer numbers with a date (Date) or date and time (DateTime). In the case of a date, adding an integer means adding the corresponding number of days. For a date with time, it means adding the corresponding number of seconds.
minus(a, b) a - b operator	Calculates the difference between two numbers. The result is always signed. You can also calculate integer numbers from a date (Date) or date with time (DateTime).
divide (a, b) a / b operator	Calculates the quotient of the numbers. The result type is always a floating-point type. It is not integer division. For integer division, use the `intDiv` function. When dividing by zero you get `inf`, `-inf`, or `nan`.
intDiv (a, b)	Calculates the quotient of the numbers. Divides into integers, rounding down (by the absolute value). An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.
intDivOrZero (a, b)	Differs from `intDiv` in that it returns zero when dividing by zero or when dividing a minimal negative number by minus one.
module (a, b) a % b operator	Calculates the remainder after division.. If arguments are floating-point numbers, they are pre-converted to integers. The remainder is taken in the same sense as in C++. Truncated division is used for negative numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.
negate(a) -a operator	Calculates a number with the reverse sign. The result is always signed.
abs (a)	Calculates the absolute value of the number (`a`). That is, if a < 0, it returns -`a`. For unsigned types it does not do anything. For signed integer types, it returns an unsigned number.
gcd (a, b)	Returns the greatest common divisor of the numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.
lcm(a, b)	Returns the least common multiple of the numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.
Arithmetic functions

Comparison Functions

Comparison functions always return 0 or 1 (Uint8). These types can be compared:

Numbers.
Character strings (String) and fixed-length character strings (FixedString(N)).
Dates (Date).
Dates with times (DateTime)

For example, you cannot compare a date with a string. You have to use a function to convert the string to a date, or vice versa.

Strings are compared by bytes. A shorter string is smaller than all strings that start with it and that contain at least one more character..

The comparison operators are:

Equals: a = b and a == b
NotEquals: a != b and a <> b
Less: a < b
Greater: a > b
LessOrEquals: a <= b
GreaterOrEquals: a >= b

Logical Functions

Logical functions accept any numeric types, but return a UInt8 number equal to 0 or 1.

Zero as an argument is considered "false," while any non-zero value is considered "true".

AND: AND
OR: OR
Not: NOT
Xor: XOR

Type Conversion Functions

The basic supported conversions are:

Conversion to unsigned data types: toUInt8, toUInt16, toUInt32, toUInt64.
Conversion to signed data types: toInt8, toInt16, toInt32, toInt64, toFloat32, toFloat64, toDate, toDateTime.
Conversion to zero if error: toUInt8OrZero, toUInt16OrZero, toUInt32OrZero, toUInt64OrZero, toInt8OrZero, toInt16OrZero, toInt32OrZero, toInt64OrZero, toFloat32OrZero, toFloat64OrZero, toDateOrZero, toDateTimeOrZero.
Conversion to null if error: toUInt8OrNull, toUInt16OrNull, toUInt32OrNull, toUInt64OrNull, toInt8OrNull, toInt16OrNull, toInt32OrNull, toInt64OrNull, toFloat32OrNull, toFloat64OrNull, toDateOrNull, toDateTimeOrNull.

These are more complex data type conversions:

Function	Description
toDecimal32(value, S), toDecimal64(value, S), toDecimal128(value, S)	Converts `value` to decimal of precision `S`. The `value` can be a number or a string. The `S` parameter specifies the number of decimal places.
toString	Functions for converting between numbers, strings (but not fixed strings), dates (Dates), and dates with times (DateTimes). All these functions accept one argument. When converting to or from a string, the value is formatted or parsed using the same rules as for the tab-separated format (TSV). If the string cannot be parsed, an exception is thrown and the request is canceled. When converting dates to numbers or vice versa, the date corresponds to the number of days since the beginning of the Unix epoch (1/1/1970). The date and date-with-time formats for the `toDate`/`toDateTime` functions are defined as follows: `YYYY-MM-DD` `YYYY-MM-DD hh: mm: ss` As an exception, if converting from UInt32, Int32, UInt64, or Int64 numeric types to Date, and if the number is greater than or equal to 65536, the number is interpreted as a Unix timestamp (and not as the number of days). This allows support for the common occurrence of writing `toDate (unix_timestamp)`, which otherwise would be an error and would require writing `toDate (toDateTime (unix_timestamp))`. Conversion between a date (Date) and date with time (DateTime) is performed the natural way: by adding a null time or deleting the time. Conversion between numeric types uses the same rules as assignments between different numeric types in C++. Additionally, the `toString` function of the `DateTime` argument can take a second String argument containing the name of the time zone.
toFixedString(s, N)	Converts a String type argument to a FixedString(N) type (a string with fixed length `N`). `N` must be a constant. If the string has fewer bytes than `N`, it is passed with null bytes to the right. If the string has more bytes than `N`, an exception is thrown.
toStringCutToZero(s)	Accepts a String or FixedString argument. It returns the String with the content truncated at the first zero byte found.
reinterpretAsUInt8, reinterpretAsUInt16, reinterpretAsUInt32, reinterpretAsUInt64 reinterpretAsInt8, reinterpretAsInt16, reinterpretAsInt32, reinterpretAsInt64 reinterpretAsFloat32, reinterpretAsFloat64 reinterpretAsDate, reinterpretAsDateTime	These functions accept a string and interpret the bytes placed at the beginning of the string as a number (little endian). If the string isn't long enough, the functions work as if the string is padded with the necessary number of null bytes. If the string is longer than needed, the extra bytes are ignored. A date (Date) is interpreted as the number of days since the beginning of the Unix Epoch (1/1/1970), and a date with time (DateTime) is interpreted as the number of seconds since the beginning of the Unix Epoch.
reinterpretAsString	This function accepts a number or date (Date) or date with time (DateTime), and returns a string containing bytes representing the corresponding value in little endian format. Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a string that is one byte long.
reinterpretAsFixedString	This function accepts a number or date (Date) or date with time (DateTime), and returns a FixedString containing bytes representing the corresponding value in little endian format. Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a FixedString that is one byte long.
CAST(x, t)	Converts `x` to the `t` data type.
toIntervalYear, toIntervalQuarter, toIntervalMonth, toIntervalWeek, toIntervalDay, toIntervalHour, toIntervalMinute, toIntervalSecond	Converts a Number type argument to a Interval type (duration). The interval type is actually very useful, you can use this type of data to perform arithmetic operations directly with Date or DateTime.
parseDateTimeBestEffort	Parses a number type argument to a Date or DateTime type. Unlike toDate and toDateTime, `parseDateTimeBestEffort` can return a more complex date format.
parseDateTimeBestEffortOrNull	Same as for `parseDateTimeBestEffort`, except that it returns null when it encounters a date format that cannot be processed.
parseDateTimeBestEffortOrZero	Same as for `parseDateTimeBestEffort`, except that it returns zero date or zero date time when it encounters a date format that cannot be processed.
Type conversion functions

Functions for Working with Dates and Times

All functions for working with the date and time that have a logical use for the time zone can accept a second optional time zone argument. Only time zones that differ from UTC by a whole number of hours are supported

Function	Description
toTimeZone	Converts time (Date) or date and time (DateTime) to the specified time zone.
toYear	Converts a date (Date) or date with time (DateTime) to a UInt16 number containing the year number.
toQuarter	Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the quarter number.
toMonth	Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the month number (1-12).
toDayOfYear	Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the number of the day of the year (1-366).
toDayOfMonth	Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the number of the day of the month (1-31).
toDayOfWeek	Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the number of the day of the week (Monday is 1, and Sunday is 7).
toHour	Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the number of the hour in 24-hour time (0-23). This function assumes that if clocks are moved ahead for daylight saving time, it is by one hour and occurs at 2 A.M., and if clocks are moved back, it is by one hour and occurs at 3 A.M.
toMinute	Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the number of the minute of the hour (0-59).
toSecond	Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the number of the second in the minute (0-59). Leap seconds are not accounted for.
toUnixTimestamp	Converts a date (Date) or date with time (DateTime) to a unix timestamp.
toStartOfYear	Rounds down a date (Date) or date with time (DateTime) to the first day of the year. It returns the date (Date).
toStartOfISOYear	Rounds down a date (Date) or date with time (DateTime) to the first day of ISO year. It returns the date (Date).
toStartOfQuarter	Rounds down a date (Date) or date with time (DateTime) to the first day of the quarter (1 January, 1 April, 1 July, or 1 October). It returns the date (Date).
toStartOfMonth	Rounds down a date (Date) or date with time (DateTime) to the first day of the month. It returns the date (Date).
toMonday	Rounds down a date (Date) or date with time (DateTime) to the nearest Monday. It returns the date (Date).
toStartOfDay	Rounds down a date (Date) or date with time (DateTime) to the start of the day.
toStartOfHour	Rounds down a date (Date) or date with time (DateTime) to the start of the hour.
toStartOfMinute	Rounds down a date (Date) or date with time (DateTime) to the start of the minute.
toStartOfFiveMinute	Rounds down a date (Date) or date with time (DateTime) to the start of the five-minute interval.
toStartOfTenMinutes	Rounds down a date (Date) or date with time (DateTime) to the start of the ten-minute interval.
toStartOfFifteenMinutes	Rounds down a date (Date) or date with time (DateTime) to the start of the fifteen-minute interval.
toTime	Converts a date (Date) or date with time (DateTime) to a certain fixed date, while preserving the time.
toRelativeYearNum	Converts a date (Date) or date with time (DateTime) to the number of the year, starting from a certain fixed point in the past.
toRelativeQuarterNum	Converts a date (Date) or date with time (DateTime) to the number of the quarter, starting from a certain fixed point in the past.
toRelativeMonthNum	Converts a date (Date) or date with time (DateTime) to the number of the month, starting from a certain fixed point in the past.
toRelativeWeekNum	Converts a date (Date) or date with time (DateTime) to the number of the week, starting from a certain fixed point in the past.
toRelativeDayNum	Converts a date (Date) or date with time (DateTime) to the number of the day, starting from a certain fixed point in the past.
toRelativeHourNum	Converts a date (Date) or date with time (DateTime) to the number of the hour, starting from a certain fixed point in the past.
toRelativeMinuteNum	Converts a date (Date) or date with time (DateTime) to the number of the minute, starting from a certain fixed point in the past.
toRelativeSecondNum	Converts a date (Date) or date with time (DateTime) to the number of the second, starting from a certain fixed point in the past.
toISOYear	Converts a date (Date) or date with time (DateTime) to a UInt16 number containing the ISO year number.
toISOWeek	Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the ISO week number.
now	Accepts zero arguments. It returns the current time at the time of function execution. This function returns a constant.
today	Accepts zero arguments. It returns the current date at the time of function execution. The same as `toDate (now ())`.
yesterday	Accepts zero arguments. It returns yesterday's date at the time of function execution. The same as `today () - 1`.
timeSlot	Rounds the time to the half hour.
toYYYYMM	Converts a date (Date) or date with time (DateTime) to a UInt32 number containing the year and month number (YYYY * 100 + MM).
toYYYYMMDD	Converts a date (Date) or date with time (DateTime) to a UInt32 number containing the year and month number (YYYY * 10000 + MM * 100 + DD).
toYYYYMMDDhhmmss	Converts a date (Date) or date with time (DateTime) to a UInt64 number containing the year and month number (YYYY * 10000000000 + MM * 100000000 + DD * 1000000 + hh * 10000 + mm * 100 + ss).
addYears, addMonths, addWeeks, addDays, addHours, addMinutes, addSeconds, addQuarters	This function adds a Date/DateTime interval to a Date/DateTime variable and then return the Date/DateTime.
subtractYears, subtractMonths, subtractWeeks, subtractDays, subtractHours, subtractMinutes, subtractSeconds, subtractQuarters	This function subtracts a Date/DateTime interval to a Date/DateTime variable and then return the Date/DateTime.
dateDiff('unit', t1, t2, [timezone])	Returns the difference between two times expressed in `unit` e.g. `hours`. `t1` and `t2` can be Date or DateTime. If `timezone` is specified, it is applied to both arguments. If not, timezones from data types `t1` and `t2` are used. If the timezones are not the same, the result is unspecified. Supported unit values: `second, minute, hour, day, week, month, quarter, year`.
formatDateTime(Time, Format[, Timezone])	This function formats `Time` according to the format specified in the `Format` string. For more information about the format parameters, see table Format codes for the formatDateTime function.
Functions for working with dates and times

Format Codes for the formatDateTime Function

Code	Description	Example
%C	Year divided by 100 and truncated to integer (00-99).	20
%d	Day of the month, zero-padded (01-31).	02
%D	Short MM/DD/YY date, equivalent to %m/%d/%y.	01/07/2023
%e	Day of the month, space-padded ( 1-31)	2
%F	Short YYYY-MM-DD date, equivalent to %Y-%m-%d.	2023-01-07
%H	Hour in 24h format (00-23).	22
%I	Hour in 12h format (01-12).	10
%j	Day of the year (001-366).	007
%m	Month as a decimal number (01-12).	01
%M	Minute (00-59).	33
%n	New-line character. '`\n`'.
%p	AM or PM.	PM
%R	24-hour HH:MM time, equivalent to %H:%M.	22:33
%S	Second (00-59).	44
%t	Horizontal-tab character (‘`\t`').
%T	ISO 8601 time format (HH:MM:SS), equivalent to %H:%M:%S.	22:33:44
%u	ISO 8601 weekday as number with Monday as 1 (1-7).	2
%V	ISO 8601 week number (01-53).	01
%w	Weekday as a decimal number with Sunday as 0 (0-6).	2
%y	Year, last two digits (00-99).	23
%Y	Year	2023
%%	A `%` sign.
Format codes for the formatDateTime function

Functions for Working with Strings

Function	Description
empty	Returns 1 for an empty string or 0 for a non-empty string. The result type is UInt8. A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.
notEmpty	Returns 0 for an empty string or 1 for a non-empty string. The result type is UInt8. The function also works for arrays.
length	Returns the length of a string in bytes (not in characters, and not in code points). The result type is UInt64.
lengthUTF8	Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it does not throw an exception. The result type is UInt64.
char_length, CHAR_LENGTH	Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it does not throw an exception. The result type is UInt64.
character_length, CHARACTER_LENGTH	Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it does not throw an exception. The result type is UInt64.
lower, lcase	Converts ASCII Latin symbols in a string to lowercase.
upper, ucase	Converts ASCII Latin symbols in a string to uppercase.
lowerUTF8	Converts a string to lowercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text. It does not detect the language. If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result could be incorrect. If the string contains a set of bytes that is not UTF-8, then the behavior is undefined.
upperUTF8	Converts a string to uppercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text. It does not detect the language. If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result could be incorrect. If the string contains a set of bytes that is not UTF-8, then the behavior is undefined.
isValidUTF8	Returns 1, if the set of bytes constitutes valid UTF-8-encoded text, otherwise 0.
reverse	Reverses the string (as a sequence of bytes).
reverseUTF8	Reverses a sequence of Unicode code points, assuming that the string contains a set of bytes representing a UTF-8 text. Otherwise, it does not throw an exception.
concat(s1, s2, ...)	Concatenates the strings listed in the arguments, without a separator.
concatAssumeInjective(s1, s2, ...)	Same as `concat`, the difference is that you need to ensure that `concat (s1, s2, s3) -> s4` is injective. It is used for optimization the `GROUP BY` clause.
substring(s, offset, length), mid(s, offset, length), substr(s, offset, length)	Returns a substring starting with the byte from the `offset` index that is `length` bytes long. Character indexing starts from one (as in standard SQL). The `offset` and `length` arguments must be constants.
substringUTF8(s, offset, length)	The same as `substring`, but for Unicode code points. It works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, it does not throw an exception.
appendTrailingCharIfAbsent(s, c)	If the `s` string is non-empty and does not contain the `c` character at the end, it appends the `c` character to the end.
convertCharset(s, from, to)	Returns the string `s` that was converted from the encoding in `from` to the encoding in `to`.
base64Encode(s)	Encodes the `s` string into base64.
base64Decode(s)	Decodes the base64-encoded string `s` into its original string. In case of failure, it raises an exception.
tryBase64Decode(s)	Similar to `base64Decode`, but in case of error an empty string is returned.
endsWith(s, suffix)	Returns 1 if the string ends with the specified suffix, otherwise it returns 0.
startsWith(s, prefix)	Returns 1 if the string starts with the specified prefix, otherwise it returns 0.
trimLeft(s)	Returns a string that removes the whitespace characters on the left side.
trimRight(s)	Returns a string that removes the whitespace characters on the right side.
trimBoth(s)	Returns a string that removes the whitespace characters on either side.
Functions for working with strings

Functions for Searching Strings

The search is case-sensitive by default in all these functions. There are separate variants for case insensitive search.

Function	Description
position(haystack, needle), locate(haystack, needle)	Searches for the substring `needle` in the string `haystack`. It returns the position (in bytes) of the found substring, starting from 1, or returns 0 if the substring was not found. For a case-insensitive search, use the function `positionCaseInsensitive`.
positionUTF8(haystack, needle)	The same as `position`, but the position is returned in Unicode code points. It works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, it does not throw an exception. For a case-insensitive search, use the function `positionCaseInsensitiveUTF8`.
multiSearchFirstPosition(haystack, [needle1, needle2, ..., needlen])	The same as `position` but returns the leftmost offset of the string `haystack` that matches some of the `needle`. For a case-insensitive search or/and in UTF-8 format use functions `multiSearchFirstPositionCaseInsensitive, multiSearchFirstPositionUTF8, multiSearchFirstPositionCaseInsensitiveUTF8`.
multiSearchFirstIndex(haystack, [needle1, needle2, ..., needlen])	Returns the index `i` (starting from 1) of the leftmost found `needle` in the string `haystack` and 0 otherwise. For a case-insensitive search or/and in UTF-8 format use functions: `multiSearchFirstIndexCaseInsensitive, multiSearchFirstIndexUTF8, multiSearchFirstIndexCaseInsensitiveUTF8`.
multiSearchAny(haystack, [needle1, needle2, ..., needlen])	Returns 1, if at least one string `needle` matches the string `haystack` and 0 otherwise. For a case-insensitive search or/and in UTF-8 format use functions: `multiSearchAnyCaseInsensitive, multiSearchAnyUTF8, multiSearchAnyCaseInsensitiveUTF8.` Note: In all `multiSearch` * functions, the number of `needle` parameters should be less than 28.
match(haystack, pattern)	Checks whether the string matches the pattern regular expression. The syntax of the re2 regular expressions is more limited than the syntax of the Perl regular expressions. It returns 0 if it does not match, or 1 if it matches. Note that the backslash symbol (`\`) is used for escaping in the regular expression. The same symbol is used for escaping in string literals. So, in order to escape the symbol in a regular expression, you must write two backslashes (`\\`) in a string literal. The regular expression works with the string as if it is a set of bytes. The regular expression cannot contain null bytes. For patterns to search for substrings in a string, it is better to use `LIKE` or `position`, necause they work much faster.
multiMatchAny(haystack, [pattern1, pattern2, ..., patternn])	The same as `match`, but it returns 0 if none of the regular expressions are matched and 1 if any of the patterns matches. For patterns to search substrings in a string, it is better to use `multiSearchAny`, because it works much faster. Note: The length of any of the `haystack` string must be less than 232 bytes, otherwise an exception is thrown.
multiMatchAnyIndex(haystack, [pattern1, pattern2, ..., patternn])	The same as `multiMatchAny`, but it returns any index that matches the `haystack`.
multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, ..., patternn])	The same as `multiMatchAny`, but it returns 1 if any pattern matches the `haystack` within a constant edit distance. This function is in an experimental mode and can be extremely slow.
multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, ..., patternn])	The same as `multiFuzzyMatchAny`, but it returns any index that matches the `haystack` within a constant edit distance. Note: `multiFuzzyMatch *` functions do not support UTF-8 regular expressions, and such expressions are treated as bytes.
extract(haystack, pattern)	Extracts a fragment of a string using a regular expression. If `haystack` does not match the `pattern` regex, an empty string is returned. If the regex doesn’t contain subpatterns, it takes the fragment that matches the entire regex. Otherwise, it takes the fragment that matches the first subpattern.
like(haystack, pattern), haystack LIKE pattern operator	Checks whether a string matches a simple regular expression. The regular expression can contain the metasymbols ‘`%`’ and ‘`_`’. ‘%’: indicates any quantity of any bytes (including zero characters). ‘_’: indicates any one byte.. Use the backslash (`\`) for escaping metasymbols. See the note on escaping in the description of the `match` function. For regular expressions such as `%needle%` the code is more optimal and works as fast as the `position` function. For other regular expressions, the code is the same as for the `match` function.
notLike(haystack, pattern), haystack NOT LIKE pattern operator	The same thing as `like`, but negative.
Functions for searching strings

Functions for Replacing in Strings

Function	Description
replaceOne(haystack, pattern, replacement)	Replaces the first occurrence, if it exists, of the `pattern` substring in `haystack` with the `replacement` substring. `pattern` and `replacement` must be constants.
replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement)	Replaces all occurrences of the `pattern` substring in `haystack` with the `replacement` substring.
replaceRegexpOne(haystack, pattern, replacement)	Replacement using the `pattern` regular expression. A re2 regular expression. It replaces only the first occurrence, if it exists. A pattern can be specified as `replacement`. This pattern can include substitutions `\0 - \9`. The substitution `\0` includes the entire regular expression. Substitutions `\1 - \9` correspond to the subpattern numbers. To use the `\` character in a template, escape it using `\`. Also keep in mind that a string literal requires an extra escape. Example. Copying a string ten times: `SELECT replaceRegexpOne ('Hello, World!', '. *', '\\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0) AS res` Result: `Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! ¦`
replaceRegexpAll(haystack, pattern, replacement)	This does the same thing as `replaceRegexpOne`, but replaces all the occurrences. Example: `SELECT replaceRegexpAll('Hello, World!', '.', '\\0\\0') AS res` `HHeelllloo,, WWoorrlldd!!` If a regular expression worked on an empty substring, the replacement is not made more than once. Example: `SELECT replaceRegexpAll('Hello, World!', '^', 'here: ') AS res` `here: Hello, World!`
regexpQuoteMeta(s)	The function adds a backslash before some predefined characters in the string. Predefined characters: `'0', '\', '\|', '(', ')', '^', '$', '.', '[', ']', '?', '*' , '+', '{', ':', '-'`. This implementation slightly differs from re2. It escapes zero byte as `\0` instead of `\x00`, and escapes only required characters.
Functions for replacing in strings

Mathematical Functions

All these functions return a Float64 number. The accuracy of the result is close to the maximum precision possible, but the result might not coincide with the machine representable number nearest to the corresponding real number.

Function	Description
e()	Returns a Float64 number that is close to the number e.
pi()	Returns a Float64 number that is close to the number π.
exp(x)	Accepts a numeric argument and returns a Float64 number close to the exponent of the argument.
log(x), ln(x)	Accepts a numeric argument and returns a Float64 number close to the natural logarithm of the argument.
exp2(x)	Accepts a numeric argument and returns a Float64 number close to 2 to the power of x.
log2(x)	Accepts a numeric argument and returns a Float64 number close to the binary logarithm of the argument.
exp10(x)	Accepts a numeric argument and returns a Float64 number close to 10 to the power of x.
log10(x)	Accepts a numeric argument and returns a Float64 number close to the decimal logarithm of the argument.
sqrt(x)	Accepts a numeric argument and returns a Float64 number close to the square root of the argument.
cbrt(x)	Accepts a numeric argument and returns a Float64 number close to the cubic root of the argument.
erf(x)	If `x` is non-negative, then `erf` is the probability that a random variable having a normal distribution with standard deviation `s` takes the value that is separated from the expected value by more than `x`.
erfc(x)	Accepts a numeric argument and returns a Float64 number close to `1 - erf (x)`, but without loss of precision for large `x` values.
lgamma(x)	The logarithm of the gamma function.
tgamma(x)	Gamma function.
sin(x)	The sine.
cos(x)	The cosine.
tan(x)	The tangent.
asin(x)	The arc sine.
acos(x)	The arc cosine.
atan(x)	The arc tangent.
pow(x, y), power(x, y)	Takes two numeric arguments `x` and `y`. It returns a Float64 number close to `x` to the power of `y`.
intExp2	Accepts a numeric argument and returns a UInt64 number close to 2 to the power of `x`.
intExp10	Accepts a numeric argument and returns a UInt64 number close to 10 to the power of `x` .
Arithmetic functions

Rounding Functions

Function	Description
floor(x[, N])	Returns the largest round number that is less than or equal to `x`. A round number is a multiple of 1/10N, or the nearest number of the appropriate data type if 1 / 10N is not exact. `N` is an integer constant, optional parameter. By default it is zero, which means to round to an integer. `N` may be negative. Examples: `floor (123.45, 1) = 123.4` `floor (123.45, -1) = 120.` x is any numeric type. The result is a number of the same type. For integer arguments, it makes sense to round with a negative `N` value. For a non-negative `N`, the function does not do anything.
ceil(x[, N]), ceiling(x[, N])	Returns the smallest round number that is greater than or equal to `x`. In every other way, it is the same as the `floor` function.
round(x[, N])	Rounds `x` to a specified number of decimal places (`N`). It rounds to the nearest even integer. In case when given number has equal distance to surrounding numbers, the function returns the number that has the closest even digit. Parameters: x: The number to be rounded. It can be any expression returning the numeric data type. N: An integer value. If `N` > 0 then the function rounds the value to the right of the decimal point. If `N` < 0 then the function rounds the value to the left of the decimal point. If `N` = 0 then the function rounds the value to integer. In this case, the argument can be omitted. Returned value: The rounded number of the same type as the input number.
roundToExp2(num)	Accepts a number. If the number is less than one, it returns 0. Otherwise, it rounds the number down to the nearest (whole non-negative) degree of two..
Rounding functions

Random Number Generation Functions

Non-cryptographic generators of pseudo-random numbers are used. All the functions accept zero arguments or one argument. If an argument is passed, it can be any type, and its value is not used for anything. The only purpose of this argument is to prevent common subexpression elimination, so that two different instances of the same function return different columns with different random numbers.

Function	Description
rand	Returns a pseudo-random UInt32 number, evenly distributed among all UInt32-type numbers. It uses a linear congruential generator.
rand64	Returns a pseudo-random UInt64 number, evenly distributed among all UInt64-type numbers. It uses a linear congruential generator.
randConstant	Returns a pseudo-random UInt64 number, evenly distributed among all UInt64-type numbers.
Functions for generating random numbers

Encoding Functions

Function	Description
hex	Accepts arguments of types: String, UInt, Date, or DateTime. It returns a string containing the argument's hexadecimal representation. It uses uppercase letters A-F. It does not use 0x prefixes or h suffixes. For character strings, all bytes are simply encoded as two hexadecimal numbers. Numbers are converted to big endian format. Date is encoded as the number of days since the beginning of the Unix epoch. DateTime is encoded as the number of seconds since the beginning of the Unix epoch.
unhex(str)	Accepts a string containing any number of hexadecimal digits, and returns a string containing the corresponding bytes. It supports both uppercase and lowercase letters A-F. The number of hexadecimal digits does not have to be even. If it is odd, the last digit is interpreted as the younger half of the 00-0F byte. If the argument string contains anything other than hexadecimal digits, an exception is not thrown. If you want to convert the result to a number, you can use the `reverse` and `reinterpretAsType` functions.
bitmaskToList(num)	Accepts an integer. It returns a string containing the list of powers of two that total the source number when summed. They are comma-separated without spaces in text format, in ascending order.
Encoding functions

Functions for Working with URLs

Function Description

protocol

Returns the protocol. Examples: http, ftp, mailto, imap, etc.

domain

Returns the domain.

domainWithoutWWW

Returns the domain and removes no more than one 'www.' from the beginning of it, if present

topLevelDomain

Returns the top-level domain.

Example: .com.

firstSignificantSubdomain

Returns the "first significant subdomain".

The first significant subdomain is a second-level domain if it is 'com', 'net', 'org', or 'co'.
Otherwise, it is a third-level domain.

cutToFirstSignificantSubdomain

Returns the part of the domain that includes top-level subdomains up to the "first significant subdomain" (see the explanation above).

path

Returns the path. The path does not include the query string.

pathFull

The same as above, but including query string and fragment.

Example: /top/news.html?page=2#comments

queryString

Returns the query string. The query string does not include the initial question mark, #, or anything after #.

fragmenttext

Returns the fragment identifier. It does not include the initial hash # symbol.

queryStringAndFragment

Returns the query string and fragment identifier.

extractURLParameter(URL, name)

Returns the value of the name parameter in the URL, if present. Otherwise, an empty string. If there are many parameters with this name, it returns the first occurrence. This function works under the assumption that the parameter name is encoded in the URL exactly the same way as in the passed argument.

extractURLParameters(URL)

Returns an array of name=value strings corresponding to the URL parameters. The values are not decoded in any way.

extractURLParameterNames(URL)

Returns an array of name strings corresponding to the names of URL parameters. The values are not decoded in any way.

URLHierarchy(URL)

Returns an array containing the URL, truncated at the end by the symbols /,? in the path and query-string. Consecutive separator characters are counted as one. The cut is made in the position after all the consecutive separator characters.

Example:

URLPathHierarchy('https://example.com/browse/CONV-6788') =

[

'/browse/',

'/browse/CONV-6788'

]

URLPathHierarchy(URL)

The same as above, but without the protocol and host in the result. The /’ element (root) is not included.

decodeURLComponent(URL)

Returns the decoded URL.

cutWWW

Removes no more than one 'www.' from the beginning of the URL's domain, if present.

cutQueryString

Removes the query string. The question mark is also removed.

cutFragment

Removes the fragment identifier. The number sign is also removed.

cutQueryStringAndFragment

Removes the query string and fragment identifier. The question mark and number sign are also removed.

cutURLParameter(URL, name)

Removes the 'name' URL parameter, if present. This function works under the assumption that the parameter name is encoded in the URL exactly the same way as in the passed argument.

Functions for working with URLs

Functions for Working with IP Addresses

Function	Description
IPv4NumToString (num)	Takes a UInt32 number. Interprets it as an IPv4 address in big endian. It returns a string containing the corresponding IPv4 address in the format A.B.C.d (dot-separated numbers in decimal form).
IPv4StringToNum(s)	The reverse function of `IPv4NumToString`. If the IPv4 address has an invalid format, it returns 0.
IPv4NumToStringClassC(num)	Similar to `IPv4NumToString`, but using xxx instead of the last octet.
IPv6NumToString(x)	Accepts a FixedString(16) value containing the IPv6 address in binary format. It returns a string containing this address in text format. IPv6-mapped IPv4 addresses are output in the format :: ffff: 111.222.33.44.
IPv6StringToNum(s)	The reverse function of `IPv6NumToString`. If the IPv6 address has an invalid format, it returns a string of null bytes. HEX can be uppercase or lowercase.
IPv4ToIPv6(x)	Takes a UInt32 number. Interprets it as an IPv4 address in big endian. It returns a FixedString(16) value containing the IPv6 address in binary format.
cutIPv6(x, bitsToCutForIPv6, bitsToCutForIPv4)	Accepts a FixedString(16) value containing the IPv6 address in binary format. It returns a string containing the address of the specified number of bits removed in text format.
IPv4CIDRtoIPv4Range(ipv4, cidr)	Accepts an IPv4 and an UInt8 value containing the CIDR. It returns a tuple with two IPv4 containing the lower range and the higher range of the subnet.
IPv6CIDRtoIPv6Range(ipv6, cidr)	Accepts an IPv6 and an UInt8 value containing the CIDR. It returns a tuple with two IPv6 containing the lower range and the higher range of the subnet.
Functions for working with IP addresses

Functions for Working with Nullable Arguments

Function	Description
isNull(x)	Checks whether the argument is NULL. Parameters: x: A value with a non-compound data type. Returned value: 1: If x is NULL. 0: If x is not NULL.
isNotNull (x)	Parameters: x: A value with a non-compound data type. Returned value: 0: If x is NULL. 1: If x is not NULL.
coalesce(x,...)	Parameters: Any number of parameters of a non-compound type. All parameters must be compatible by data type. Returned values: The first non-NULL argument. NULL, if all arguments are NULL.
ifNull	Returns an alternative value if the main argument is NULL.
ifNull (x, alt)	Parameters: x: The value to check for NULL. alt: The value that the function returns if `x` is NULL. Returned values The value `x`, if `x` is not NULL. The value `alt`, if `x` is NULL.
nullIf(x,y)	Returns NULL if the arguments are equal. Parameters: x, y: Values for comparison. They must be compatible types, or the solution generates an exception. Returned values: NULL, if the arguments are equal. The x value, if the arguments are not equal.
assumeNotNull(x)	Parameters: x: The original value. Returned values The original value from the non-Nullable type, if it is not NULL. The default value for the non-Nullable type if the original value was NULL.
toNullable(x)	Converts the argument type to Nullable. Parameters: x: The value of any non-compound type. Returned value: The input value with a non-Nullable type.
Functions for working with Nullable aggregates