Regular Functions

Regular functions work as if they are applied to each row separately (for each row, the result of the function does not depend on the other rows). Regular functions have these characteristics:

  • Strong typing: In contrast to standard SQL, Cytomic Orion does not make implicit conversions between types. Each function works for a specific set of types. This means that sometimes you need to use type conversion functions.

  • Common subexpression elimination: All expressions in a query that have the same AST (the same record or same result of syntactic parsing) are considered to have identical values. Such expressions are concatenated and executed once. Identical subqueries are also eliminated this way.

  • Types of results: All functions return a single return as the result (not multiple values, and not zero values). The type of result is usually defined only by the types of arguments, not by the values.

  • Constants: For simplicity, certain functions can only work with constants for some arguments. For example, the right argument of the LIKE operator must be a constant. Almost all functions return a constant for constant arguments. The exception is functions that generate random numbers. The now function returns different values for queries that were run at different times, but the result is considered a constant, since constancy is only important within a single query. A constant expression is also considered a constant (for example, the right half of the LIKE operator can be constructed from multiple constants).

  • NULL processing: If at least one of the arguments of the function is NULL, the function result is also NULL, except in functions where it is specified otherwise.

  • Constancy: Functions cannot change the values of their arguments – any changes are returned as the result. Thus, the result of calculating separate functions does not depend on the order in which the functions are written in the query.

  • Error handling: Some functions can generate an exception if the data is invalid. In this case, the query is canceled and Cytomic Orion returns an error message to the client.

  • Argument evaluation: In almost all programming languages, certain arguments might not be evaluated with some operators, such as &&, ||, and ?:. In Cytomic Orion, the arguments of functions (operators) are always evaluated. This is because whole parts of the columns are evaluated at the same time instead of calculating each row separately.

These are the most important functions:

Arithmetic Functions

For all arithmetic functions, the result type is calculated as the smallest number type that the result fits in, if there is such a type. The minimum is taken simultaneously based on the number of bits, whether it is signed, and whether it. The minimum is taken simultaneously based on the number of bits, whether it is signed, and whether it is a floating-point number. If there are not enough bits, the highest bit type is taken.

Arithmetic functions work for any pair of types from UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, or Float64.

Function Description

plus(a, b)

a + b operator

Calculates the sum of two numbers. You can also add integer numbers with a date (Date) or date and time (DateTime). In the case of a date, adding an integer means adding the corresponding number of days. For a date with time, it means adding the corresponding number of seconds.

minus(a, b)

a - b operator

Calculates the difference between two numbers. The result is always signed. You can also calculate integer numbers from a date (Date) or date with time (DateTime).

divide (a, b)

a / b operator

Calculates the quotient of the numbers. The result type is always a floating-point type. It is not integer division. For integer division, use the intDiv function. When dividing by zero you get inf, -inf, or nan.

intDiv (a, b)

Calculates the quotient of the numbers. Divides into integers, rounding down (by the absolute value). An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.

intDivOrZero (a, b)

Differs from intDiv in that it returns zero when dividing by zero or when dividing a minimal negative number by minus one.

module (a, b)

a % b operator

Calculates the remainder after division.. If arguments are floating-point numbers, they are pre-converted to integers. The remainder is taken in the same sense as in C++. Truncated division is used for negative numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.

negate(a)

-a operator

Calculates a number with the reverse sign. The result is always signed.

abs (a)

Calculates the absolute value of the number (a). That is, if a < 0, it returns -a. For unsigned types it does not do anything. For signed integer types, it returns an unsigned number.

gcd (a, b)

Returns the greatest common divisor of the numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.

lcm(a, b)

Returns the least common multiple of the numbers. An exception is thrown when dividing by zero or when dividing a minimal negative number by minus one.

Arithmetic functions

Comparison Functions

Comparison functions always return 0 or 1 (Uint8). These types can be compared:

  • Numbers.

  • Character strings (String) and fixed-length character strings (FixedString(N)).

  • Dates (Date).

  • Dates with times (DateTime)

For example, you cannot compare a date with a string. You have to use a function to convert the string to a date, or vice versa.

Strings are compared by bytes. A shorter string is smaller than all strings that start with it and that contain at least one more character..

The comparison operators are:

  • Equals: a = b and a == b

  • NotEquals: a != b and a <> b

  • Less: a < b

  • Greater: a > b

  • LessOrEquals: a <= b

  • GreaterOrEquals: a >= b

Logical Functions

Logical functions accept any numeric types, but return a UInt8 number equal to 0 or 1.

Zero as an argument is considered "false," while any non-zero value is considered "true".

  • AND: AND

  • OR: OR

  • Not: NOT

  • Xor: XOR

Type Conversion Functions

The basic supported conversions are:

  • Conversion to unsigned data types: toUInt8, toUInt16, toUInt32, toUInt64.

  • Conversion to signed data types: toInt8, toInt16, toInt32, toInt64, toFloat32, toFloat64, toDate, toDateTime.

  • Conversion to zero if error: toUInt8OrZero, toUInt16OrZero, toUInt32OrZero, toUInt64OrZero, toInt8OrZero, toInt16OrZero, toInt32OrZero, toInt64OrZero, toFloat32OrZero, toFloat64OrZero, toDateOrZero, toDateTimeOrZero.

  • Conversion to null if error: toUInt8OrNull, toUInt16OrNull, toUInt32OrNull, toUInt64OrNull, toInt8OrNull, toInt16OrNull, toInt32OrNull, toInt64OrNull, toFloat32OrNull, toFloat64OrNull, toDateOrNull, toDateTimeOrNull.

These are more complex data type conversions:

Function Description

toDecimal32(value, S), toDecimal64(value, S), toDecimal128(value, S)

Converts value to decimal of precision S. The value can be a number or a string. The S parameter specifies the number of decimal places.

toString

Functions for converting between numbers, strings (but not fixed strings), dates (Dates), and dates with times (DateTimes). All these functions accept one argument.

When converting to or from a string, the value is formatted or parsed using the same rules as for the tab-separated format (TSV). If the string cannot be parsed, an exception is thrown and the request is canceled.

When converting dates to numbers or vice versa, the date corresponds to the number of days since the beginning of the Unix epoch (1/1/1970).

The date and date-with-time formats for the toDate/toDateTime functions are defined as follows:

YYYY-MM-DD

YYYY-MM-DD hh: mm: ss

As an exception, if converting from UInt32, Int32, UInt64, or Int64 numeric types to Date, and if the number is greater than or equal to 65536, the number is interpreted as a Unix timestamp (and not as the number of days). This allows support for the common occurrence of writing toDate (unix_timestamp), which otherwise would be an error and would require writing toDate (toDateTime (unix_timestamp)).

Conversion between a date (Date) and date with time (DateTime) is performed the natural way: by adding a null time or deleting the time.

Conversion between numeric types uses the same rules as assignments between different numeric types in C++.

Additionally, the toString function of the DateTime argument can take a second String argument containing the name of the time zone.

toFixedString(s, N)

Converts a String type argument to a FixedString(N) type (a string with fixed length N). N must be a constant. If the string has fewer bytes than N, it is passed with null bytes to the right. If the string has more bytes than N, an exception is thrown.

toStringCutToZero(s)

Accepts a String or FixedString argument. It returns the String with the content truncated at the first zero byte found.

reinterpretAsUInt8, reinterpretAsUInt16, reinterpretAsUInt32, reinterpretAsUInt64

reinterpretAsInt8, reinterpretAsInt16, reinterpretAsInt32, reinterpretAsInt64

reinterpretAsFloat32, reinterpretAsFloat64

reinterpretAsDate, reinterpretAsDateTime

These functions accept a string and interpret the bytes placed at the beginning of the string as a number (little endian). If the string isn't long enough, the functions work as if the string is padded with the necessary number of null bytes. If the string is longer than needed, the extra bytes are ignored. A date (Date) is interpreted as the number of days since the beginning of the Unix Epoch (1/1/1970), and a date with time (DateTime) is interpreted as the number of seconds since the beginning of the Unix Epoch.

reinterpretAsString

This function accepts a number or date (Date) or date with time (DateTime), and returns a string containing bytes representing the corresponding value in little endian format. Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a string that is one byte long.

reinterpretAsFixedString

This function accepts a number or date (Date) or date with time (DateTime), and returns a FixedString containing bytes representing the corresponding value in little endian format. Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a FixedString that is one byte long.

CAST(x, t)

Converts x to the t data type.

toIntervalYear, toIntervalQuarter, toIntervalMonth, toIntervalWeek, toIntervalDay, toIntervalHour, toIntervalMinute, toIntervalSecond

Converts a Number type argument to a Interval type (duration). The interval type is actually very useful, you can use this type of data to perform arithmetic operations directly with Date or DateTime.

parseDateTimeBestEffort

Parses a number type argument to a Date or DateTime type. Unlike toDate and toDateTime, parseDateTimeBestEffort can return a more complex date format.

parseDateTimeBestEffortOrNull

Same as for parseDateTimeBestEffort, except that it returns null when it encounters a date format that cannot be processed.

parseDateTimeBestEffortOrZero

Same as for parseDateTimeBestEffort, except that it returns zero date or zero date time when it encounters a date format that cannot be processed.

Type conversion functions

Functions for Working with Dates and Times

All functions for working with the date and time that have a logical use for the time zone can accept a second optional time zone argument. Only time zones that differ from UTC by a whole number of hours are supported

Function Description

toTimeZone

Converts time (Date) or date and time (DateTime) to the specified time zone.

toYear

Converts a date (Date) or date with time (DateTime) to a UInt16 number containing the year number.

toQuarter

Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the quarter number.

toMonth

Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the month number (1-12).

toDayOfYear

Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the number of the day of the year (1-366).

toDayOfMonth

Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the number of the day of the month (1-31).

toDayOfWeek

Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the number of the day of the week (Monday is 1, and Sunday is 7).

toHour

Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the number of the hour in 24-hour time (0-23). This function assumes that if clocks are moved ahead for daylight saving time, it is by one hour and occurs at 2 A.M., and if clocks are moved back, it is by one hour and occurs at 3 A.M.

toMinute

Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the number of the minute of the hour (0-59).

toSecond

Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the number of the second in the minute (0-59). Leap seconds are not accounted for.

toUnixTimestamp

Converts a date (Date) or date with time (DateTime) to a unix timestamp.

toStartOfYear

Rounds down a date (Date) or date with time (DateTime) to the first day of the year. It returns the date (Date).

toStartOfISOYear

Rounds down a date (Date) or date with time (DateTime) to the first day of ISO year. It returns the date (Date).

toStartOfQuarter

Rounds down a date (Date) or date with time (DateTime) to the first day of the quarter (1 January, 1 April, 1 July, or 1 October). It returns the date (Date).

toStartOfMonth

Rounds down a date (Date) or date with time (DateTime) to the first day of the month. It returns the date (Date).

toMonday

Rounds down a date (Date) or date with time (DateTime) to the nearest Monday. It returns the date (Date).

toStartOfDay

Rounds down a date (Date) or date with time (DateTime) to the start of the day.

toStartOfHour

Rounds down a date (Date) or date with time (DateTime) to the start of the hour.

toStartOfMinute

Rounds down a date (Date) or date with time (DateTime) to the start of the minute.

toStartOfFiveMinute

Rounds down a date (Date) or date with time (DateTime) to the start of the five-minute interval.

toStartOfTenMinutes

Rounds down a date (Date) or date with time (DateTime) to the start of the ten-minute interval.

toStartOfFifteenMinutes

Rounds down a date (Date) or date with time (DateTime) to the start of the fifteen-minute interval.

toTime

Converts a date (Date) or date with time (DateTime) to a certain fixed date, while preserving the time.

toRelativeYearNum

Converts a date (Date) or date with time (DateTime) to the number of the year, starting from a certain fixed point in the past.

toRelativeQuarterNum

Converts a date (Date) or date with time (DateTime) to the number of the quarter, starting from a certain fixed point in the past.

toRelativeMonthNum

Converts a date (Date) or date with time (DateTime) to the number of the month, starting from a certain fixed point in the past.

toRelativeWeekNum

Converts a date (Date) or date with time (DateTime) to the number of the week, starting from a certain fixed point in the past.

toRelativeDayNum

Converts a date (Date) or date with time (DateTime) to the number of the day, starting from a certain fixed point in the past.

toRelativeHourNum

Converts a date (Date) or date with time (DateTime) to the number of the hour, starting from a certain fixed point in the past.

toRelativeMinuteNum

Converts a date (Date) or date with time (DateTime) to the number of the minute, starting from a certain fixed point in the past.

toRelativeSecondNum

Converts a date (Date) or date with time (DateTime) to the number of the second, starting from a certain fixed point in the past.

toISOYear

Converts a date (Date) or date with time (DateTime) to a UInt16 number containing the ISO year number.

toISOWeek

Converts a date (Date) or date with time (DateTime) to a UInt8 number containing the ISO week number.

now

Accepts zero arguments. It returns the current time at the time of function execution. This function returns a constant.

today

Accepts zero arguments. It returns the current date at the time of function execution. The same as toDate (now ()).

yesterday

Accepts zero arguments. It returns yesterday's date at the time of function execution. The same as today () - 1.

timeSlot

Rounds the time to the half hour.

toYYYYMM

Converts a date (Date) or date with time (DateTime) to a UInt32 number containing the year and month number (YYYY * 100 + MM).

toYYYYMMDD

Converts a date (Date) or date with time (DateTime) to a UInt32 number containing the year and month number (YYYY * 10000 + MM * 100 + DD).

toYYYYMMDDhhmmss

Converts a date (Date) or date with time (DateTime) to a UInt64 number containing the year and month number (YYYY * 10000000000 + MM * 100000000 + DD * 1000000 + hh * 10000 + mm * 100 + ss).

addYears, addMonths, addWeeks, addDays, addHours, addMinutes, addSeconds, addQuarters

This function adds a Date/DateTime interval to a Date/DateTime variable and then return the Date/DateTime.

subtractYears, subtractMonths, subtractWeeks, subtractDays, subtractHours, subtractMinutes, subtractSeconds, subtractQuarters

This function subtracts a Date/DateTime interval to a Date/DateTime variable and then return the Date/DateTime.

dateDiff('unit', t1, t2, [timezone])

Returns the difference between two times expressed in unit e.g. hours. t1 and t2 can be Date or DateTime. If timezone is specified, it is applied to both arguments. If not, timezones from data types t1 and t2 are used. If the timezones are not the same, the result is unspecified.

Supported unit values: second, minute, hour, day, week, month, quarter, year.

formatDateTime(Time, Format[, Timezone])

This function formats Time according to the format specified in the Format string. For more information about the format parameters, see table Format codes for the formatDateTime function.

Functions for working with dates and times

Format Codes for the formatDateTime Function
Code Description Example

%C

Year divided by 100 and truncated to integer (00-99).

20

%d

Day of the month, zero-padded (01-31).

02

%D

Short MM/DD/YY date, equivalent to %m/%d/%y.

01/07/2023

%e

Day of the month, space-padded ( 1-31)

2

%F

Short YYYY-MM-DD date, equivalent to %Y-%m-%d.

2023-01-07

%H

Hour in 24h format (00-23).

22

%I

Hour in 12h format (01-12).

10

%j

Day of the year (001-366).

007

%m

Month as a decimal number (01-12).

01

%M

Minute (00-59).

33

%n

New-line character. '\n'.

 

%p

AM or PM.

PM

%R

24-hour HH:MM time, equivalent to %H:%M.

22:33

%S

Second (00-59).

44

%t

Horizontal-tab character (‘\t').

 

%T

ISO 8601 time format (HH:MM:SS), equivalent to %H:%M:%S.

22:33:44

%u

ISO 8601 weekday as number with Monday as 1 (1-7).

2

%V

ISO 8601 week number (01-53).

01

%w

Weekday as a decimal number with Sunday as 0 (0-6).

2

%y

Year, last two digits (00-99).

23

%Y

Year

2023

%%

A % sign.

 

Format codes for the formatDateTime function

Functions for Working with Strings

Function Description

empty

Returns 1 for an empty string or 0 for a non-empty string. The result type is UInt8. A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.

notEmpty

Returns 0 for an empty string or 1 for a non-empty string. The result type is UInt8. The function also works for arrays.

length

Returns the length of a string in bytes (not in characters, and not in code points). The result type is UInt64.

lengthUTF8

Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it does not throw an exception. The result type is UInt64.

char_length, CHAR_LENGTH

Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it does not throw an exception. The result type is UInt64.

character_length, CHARACTER_LENGTH

Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it does not throw an exception. The result type is UInt64.

lower, lcase

Converts ASCII Latin symbols in a string to lowercase.

upper, ucase

Converts ASCII Latin symbols in a string to uppercase.

lowerUTF8

Converts a string to lowercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text. It does not detect the language. If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result could be incorrect. If the string contains a set of bytes that is not UTF-8, then the behavior is undefined.

upperUTF8

Converts a string to uppercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text. It does not detect the language. If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result could be incorrect. If the string contains a set of bytes that is not UTF-8, then the behavior is undefined.

isValidUTF8

Returns 1, if the set of bytes constitutes valid UTF-8-encoded text, otherwise 0.

reverse

Reverses the string (as a sequence of bytes).

reverseUTF8

Reverses a sequence of Unicode code points, assuming that the string contains a set of bytes representing a UTF-8 text. Otherwise, it does not throw an exception.

concat(s1, s2, ...)

Concatenates the strings listed in the arguments, without a separator.

concatAssumeInjective(s1, s2, ...)

Same as concat, the difference is that you need to ensure that concat (s1, s2, s3) -> s4 is injective. It is used for optimization the GROUP BY clause.

substring(s, offset, length), mid(s, offset, length), substr(s, offset, length)

Returns a substring starting with the byte from the offset index that is length bytes long. Character indexing starts from one (as in standard SQL). The offset and length arguments must be constants.

substringUTF8(s, offset, length)

The same as substring, but for Unicode code points. It works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, it does not throw an exception.

appendTrailingCharIfAbsent(s, c)

If the s string is non-empty and does not contain the c character at the end, it appends the c character to the end.

convertCharset(s, from, to)

Returns the string s that was converted from the encoding in from to the encoding in to.

base64Encode(s)

Encodes the s string into base64.

base64Decode(s)

Decodes the base64-encoded string s into its original string. In case of failure, it raises an exception.

tryBase64Decode(s)

Similar to base64Decode, but in case of error an empty string is returned.

endsWith(s, suffix)

Returns 1 if the string ends with the specified suffix, otherwise it returns 0.

startsWith(s, prefix)

Returns 1 if the string starts with the specified prefix, otherwise it returns 0.

trimLeft(s)

Returns a string that removes the whitespace characters on the left side.

trimRight(s)

Returns a string that removes the whitespace characters on the right side.

trimBoth(s)

Returns a string that removes the whitespace characters on either side.

Functions for working with strings

Functions for Searching Strings

The search is case-sensitive by default in all these functions. There are separate variants for case insensitive search.

Function Description

position(haystack, needle), locate(haystack, needle)

Searches for the substring needle in the string haystack. It returns the position (in bytes) of the found substring, starting from 1, or returns 0 if the substring was not found.

For a case-insensitive search, use the function positionCaseInsensitive.

positionUTF8(haystack, needle)

The same as position, but the position is returned in Unicode code points. It works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, it does not throw an exception.

For a case-insensitive search, use the function positionCaseInsensitiveUTF8.

multiSearchFirstPosition(haystack, [needle1, needle2, ..., needlen])

The same as position but returns the leftmost offset of the string haystack that matches some of the needle.

For a case-insensitive search or/and in UTF-8 format use functions multiSearchFirstPositionCaseInsensitive, multiSearchFirstPositionUTF8, multiSearchFirstPositionCaseInsensitiveUTF8.

multiSearchFirstIndex(haystack, [needle1, needle2, ..., needlen])

Returns the index i (starting from 1) of the leftmost found needle in the string haystack and 0 otherwise.

For a case-insensitive search or/and in UTF-8 format use functions: multiSearchFirstIndexCaseInsensitive, multiSearchFirstIndexUTF8, multiSearchFirstIndexCaseInsensitiveUTF8.

multiSearchAny(haystack, [needle1, needle2, ..., needlen])

Returns 1, if at least one string needle matches the string haystack and 0 otherwise.

For a case-insensitive search or/and in UTF-8 format use functions: multiSearchAnyCaseInsensitive, multiSearchAnyUTF8, multiSearchAnyCaseInsensitiveUTF8.

Note: In all multiSearch * functions, the number of needle parameters should be less than 28.

match(haystack, pattern)

Checks whether the string matches the pattern regular expression. The syntax of the re2 regular expressions is more limited than the syntax of the Perl regular expressions.

It returns 0 if it does not match, or 1 if it matches.

Note that the backslash symbol (\) is used for escaping in the regular expression. The same symbol is used for escaping in string literals. So, in order to escape the symbol in a regular expression, you must write two backslashes (\\) in a string literal.

The regular expression works with the string as if it is a set of bytes. The regular expression cannot contain null bytes. For patterns to search for substrings in a string, it is better to use LIKE or position, necause they work much faster.

multiMatchAny(haystack, [pattern1, pattern2, ..., patternn])

The same as match, but it returns 0 if none of the regular expressions are matched and 1 if any of the patterns matches. For patterns to search substrings in a string, it is better to use multiSearchAny, because it works much faster.

Note: The length of any of the haystack string must be less than 232 bytes, otherwise an exception is thrown.

multiMatchAnyIndex(haystack, [pattern1, pattern2, ..., patternn])

The same as multiMatchAny, but it returns any index that matches the haystack.

multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, ..., patternn])

The same as multiMatchAny, but it returns 1 if any pattern matches the haystack within a constant edit distance. This function is in an experimental mode and can be extremely slow.

multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, ..., patternn])

The same as multiFuzzyMatchAny, but it returns any index that matches the haystack within a constant edit distance.

Note: multiFuzzyMatch * functions do not support UTF-8 regular expressions, and such expressions are treated as bytes.

extract(haystack, pattern)

Extracts a fragment of a string using a regular expression. If haystack does not match the pattern regex, an empty string is returned. If the regex doesn’t contain subpatterns, it takes the fragment that matches the entire regex. Otherwise, it takes the fragment that matches the first subpattern.

like(haystack, pattern), haystack LIKE pattern operator

Checks whether a string matches a simple regular expression. The regular expression can contain the metasymbols ‘%’ and ‘_’.

  • ‘%’: indicates any quantity of any bytes (including zero characters).

  • ‘_’: indicates any one byte..

Use the backslash (\) for escaping metasymbols. See the note on escaping in the description of the match function.

For regular expressions such as %needle% the code is more optimal and works as fast as the position function. For other regular expressions, the code is the same as for the match function.

notLike(haystack, pattern), haystack NOT LIKE pattern operator

The same thing as like, but negative.

Functions for searching strings

Functions for Replacing in Strings

Function Description

replaceOne(haystack, pattern, replacement)

Replaces the first occurrence, if it exists, of the pattern substring in haystack with the replacement substring. pattern and replacement must be constants.

replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement)

Replaces all occurrences of the pattern substring in haystack with the replacement substring.

replaceRegexpOne(haystack, pattern, replacement)

Replacement using the pattern regular expression. A re2 regular expression. It replaces only the first occurrence, if it exists. A pattern can be specified as replacement. This pattern can include substitutions \0 - \9. The substitution \0 includes the entire regular expression. Substitutions \1 - \9 correspond to the subpattern numbers. To use the \ character in a template, escape it using \. Also keep in mind that a string literal requires an extra escape.

Example. Copying a string ten times:

SELECT replaceRegexpOne ('Hello, World!', '. *', '\\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0) AS res

Result:

Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! ¦

replaceRegexpAll(haystack, pattern, replacement)

This does the same thing as replaceRegexpOne, but replaces all the occurrences.

Example:

SELECT replaceRegexpAll('Hello, World!', '.', '\\0\\0') AS res

HHeelllloo,, WWoorrlldd!!

If a regular expression worked on an empty substring, the replacement is not made more than once.

Example:

SELECT replaceRegexpAll('Hello, World!', '^', 'here: ') AS res

here: Hello, World!

regexpQuoteMeta(s)

The function adds a backslash before some predefined characters in the string. Predefined characters: '0', '\', '|', '(', ')', '^', '$', '.', '[', ']', '?', '*' , '+', '{', ':', '-'.

This implementation slightly differs from re2. It escapes zero byte as \0 instead of \x00, and escapes only required characters.

Functions for replacing in strings

Mathematical Functions

All these functions return a Float64 number. The accuracy of the result is close to the maximum precision possible, but the result might not coincide with the machine representable number nearest to the corresponding real number.

Function Description

e()

Returns a Float64 number that is close to the number e.

pi()

Returns a Float64 number that is close to the number π.

exp(x)

Accepts a numeric argument and returns a Float64 number close to the exponent of the argument.

log(x), ln(x)

Accepts a numeric argument and returns a Float64 number close to the natural logarithm of the argument.

exp2(x)

Accepts a numeric argument and returns a Float64 number close to 2 to the power of x.

log2(x)

Accepts a numeric argument and returns a Float64 number close to the binary logarithm of the argument.

exp10(x)

Accepts a numeric argument and returns a Float64 number close to 10 to the power of x.

log10(x)

Accepts a numeric argument and returns a Float64 number close to the decimal logarithm of the argument.

sqrt(x)

Accepts a numeric argument and returns a Float64 number close to the square root of the argument.

cbrt(x)

Accepts a numeric argument and returns a Float64 number close to the cubic root of the argument.

erf(x)

If x is non-negative, then erf is the probability that a random variable having a normal distribution with standard deviation s takes the value that is separated from the expected value by more than x.

erfc(x)

Accepts a numeric argument and returns a Float64 number close to 1 - erf (x), but without loss of precision for large x values.

lgamma(x)

The logarithm of the gamma function.

tgamma(x)

Gamma function.

sin(x)

The sine.

cos(x)

The cosine.

tan(x)

The tangent.

asin(x)

The arc sine.

acos(x)

The arc cosine.

atan(x)

The arc tangent.

pow(x, y), power(x, y)

Takes two numeric arguments x and y. It returns a Float64 number close to x to the power of y.

intExp2

Accepts a numeric argument and returns a UInt64 number close to 2 to the power of x.

intExp10

Accepts a numeric argument and returns a UInt64 number close to 10 to the power of x .

Arithmetic functions

Rounding Functions

Function Description

floor(x[, N])

Returns the largest round number that is less than or equal to x. A round number is a multiple of 1/10N, or the nearest number of the appropriate data type if 1 / 10N is not exact. N is an integer constant, optional parameter. By default it is zero, which means to round to an integer. N may be negative.

Examples:

floor (123.45, 1) = 123.4

floor (123.45, -1) = 120.

x is any numeric type. The result is a number of the same type. For integer arguments, it makes sense to round with a negative N value. For a non-negative N, the function does not do anything.

ceil(x[, N]), ceiling(x[, N])

Returns the smallest round number that is greater than or equal to x. In every other way, it is the same as the floor function.

round(x[, N])

Rounds x to a specified number of decimal places (N). It rounds to the nearest even integer. In case when given number has equal distance to surrounding numbers, the function returns the number that has the closest even digit.

Parameters:

  • x: The number to be rounded. It can be any expression returning the numeric data type.

  • N: An integer value.

    • If N > 0 then the function rounds the value to the right of the decimal point.

    • If N < 0 then the function rounds the value to the left of the decimal point.

    • If N = 0 then the function rounds the value to integer. In this case, the argument can be omitted.

Returned value:

The rounded number of the same type as the input number.

roundToExp2(num)

Accepts a number. If the number is less than one, it returns 0. Otherwise, it rounds the number down to the nearest (whole non-negative) degree of two..

Rounding functions

Random Number Generation Functions

Non-cryptographic generators of pseudo-random numbers are used. All the functions accept zero arguments or one argument. If an argument is passed, it can be any type, and its value is not used for anything. The only purpose of this argument is to prevent common subexpression elimination, so that two different instances of the same function return different columns with different random numbers.

Function Description

rand

Returns a pseudo-random UInt32 number, evenly distributed among all UInt32-type numbers. It uses a linear congruential generator.

rand64

Returns a pseudo-random UInt64 number, evenly distributed among all UInt64-type numbers. It uses a linear congruential generator.

randConstant

Returns a pseudo-random UInt64 number, evenly distributed among all UInt64-type numbers.

Functions for generating random numbers

Encoding Functions

Function Description

hex

Accepts arguments of types: String, UInt, Date, or DateTime. It returns a string containing the argument's hexadecimal representation. It uses uppercase letters A-F. It does not use 0x prefixes or h suffixes. For character strings, all bytes are simply encoded as two hexadecimal numbers. Numbers are converted to big endian format. Date is encoded as the number of days since the beginning of the Unix epoch. DateTime is encoded as the number of seconds since the beginning of the Unix epoch.

unhex(str)

Accepts a string containing any number of hexadecimal digits, and returns a string containing the corresponding bytes. It supports both uppercase and lowercase letters A-F. The number of hexadecimal digits does not have to be even. If it is odd, the last digit is interpreted as the younger half of the 00-0F byte. If the argument string contains anything other than hexadecimal digits, an exception is not thrown. If you want to convert the result to a number, you can use the reverse and reinterpretAsType functions.

bitmaskToList(num)

Accepts an integer. It returns a string containing the list of powers of two that total the source number when summed. They are comma-separated without spaces in text format, in ascending order.

Encoding functions

Functions for Working with URLs

Function Description

protocol

Returns the protocol. Examples: http, ftp, mailto, imap, etc.

domain

Returns the domain.

domainWithoutWWW

Returns the domain and removes no more than one 'www.' from the beginning of it, if present

topLevelDomain

Returns the top-level domain.

Example: .com.

firstSignificantSubdomain

Returns the "first significant subdomain".

  • The first significant subdomain is a second-level domain if it is 'com', 'net', 'org', or 'co'.

  • Otherwise, it is a third-level domain.

cutToFirstSignificantSubdomain

Returns the part of the domain that includes top-level subdomains up to the "first significant subdomain" (see the explanation above).

path

Returns the path. The path does not include the query string.

pathFull

The same as above, but including query string and fragment.

Example: /top/news.html?page=2#comments

queryString

Returns the query string. The query string does not include the initial question mark, #, or anything after #.

fragmenttext

Returns the fragment identifier. It does not include the initial hash # symbol.

queryStringAndFragment

Returns the query string and fragment identifier.

extractURLParameter(URL, name)

Returns the value of the name parameter in the URL, if present. Otherwise, an empty string. If there are many parameters with this name, it returns the first occurrence. This function works under the assumption that the parameter name is encoded in the URL exactly the same way as in the passed argument.

extractURLParameters(URL)

Returns an array of name=value strings corresponding to the URL parameters. The values are not decoded in any way.

extractURLParameterNames(URL)

Returns an array of name strings corresponding to the names of URL parameters. The values are not decoded in any way.

URLHierarchy(URL)

Returns an array containing the URL, truncated at the end by the symbols /,? in the path and query-string. Consecutive separator characters are counted as one. The cut is made in the position after all the consecutive separator characters.

Example:

URLPathHierarchy('https://example.com/browse/CONV-6788') =

[

'/browse/',

'/browse/CONV-6788'

]

URLPathHierarchy(URL)

The same as above, but without the protocol and host in the result. The /’ element (root) is not included.

decodeURLComponent(URL)

Returns the decoded URL.

cutWWW

Removes no more than one 'www.' from the beginning of the URL's domain, if present.

cutQueryString

Removes the query string. The question mark is also removed.

cutFragment

Removes the fragment identifier. The number sign is also removed.

cutQueryStringAndFragment

Removes the query string and fragment identifier. The question mark and number sign are also removed.

cutURLParameter(URL, name)

Removes the 'name' URL parameter, if present. This function works under the assumption that the parameter name is encoded in the URL exactly the same way as in the passed argument.

Functions for working with URLs

Functions for Working with IP Addresses

Function Description

IPv4NumToString (num)

Takes a UInt32 number. Interprets it as an IPv4 address in big endian. It returns a string containing the corresponding IPv4 address in the format A.B.C.d (dot-separated numbers in decimal form).

IPv4StringToNum(s)

The reverse function of IPv4NumToString. If the IPv4 address has an invalid format, it returns 0.

IPv4NumToStringClassC(num)

Similar to IPv4NumToString, but using xxx instead of the last octet.

IPv6NumToString(x)

Accepts a FixedString(16) value containing the IPv6 address in binary format. It returns a string containing this address in text format. IPv6-mapped IPv4 addresses are output in the format :: ffff: 111.222.33.44.

IPv6StringToNum(s)

The reverse function of IPv6NumToString. If the IPv6 address has an invalid format, it returns a string of null bytes. HEX can be uppercase or lowercase.

IPv4ToIPv6(x)

Takes a UInt32 number. Interprets it as an IPv4 address in big endian. It returns a FixedString(16) value containing the IPv6 address in binary format.

cutIPv6(x, bitsToCutForIPv6, bitsToCutForIPv4)

Accepts a FixedString(16) value containing the IPv6 address in binary format. It returns a string containing the address of the specified number of bits removed in text format.

IPv4CIDRtoIPv4Range(ipv4, cidr)

Accepts an IPv4 and an UInt8 value containing the CIDR. It returns a tuple with two IPv4 containing the lower range and the higher range of the subnet.

IPv6CIDRtoIPv6Range(ipv6, cidr)

Accepts an IPv6 and an UInt8 value containing the CIDR. It returns a tuple with two IPv6 containing the lower range and the higher range of the subnet.

Functions for working with IP addresses

Functions for Working with Nullable Arguments

Function Description

isNull(x)

Checks whether the argument is NULL.

Parameters:

  • x: A value with a non-compound data type.

Returned value:

  • 1: If x is NULL.

  • 0: If x is not NULL.

isNotNull (x)

Parameters:

  • x: A value with a non-compound data type.

Returned value:

  • 0: If x is NULL.

  • 1: If x is not NULL.

coalesce(x,...)

  • Parameters: Any number of parameters of a non-compound type. All parameters must be compatible by data type.

  • Returned values: The first non-NULL argument. NULL, if all arguments are NULL.

ifNull

Returns an alternative value if the main argument is NULL.

ifNull (x, alt)

Parameters:

  • x: The value to check for NULL.

  • alt: The value that the function returns if x is NULL.

Returned values

  • The value x, if x is not NULL.

  • The value alt, if x is NULL.

nullIf(x,y)

Returns NULL if the arguments are equal.

Parameters:

  • x, y: Values for comparison. They must be compatible types, or the solution generates an exception.

Returned values:

  • NULL, if the arguments are equal.

  • The x value, if the arguments are not equal.

assumeNotNull(x)

Parameters:

  • x: The original value.

Returned values

  • The original value from the non-Nullable type, if it is not NULL.

  • The default value for the non-Nullable type if the original value was NULL.

toNullable(x)

Converts the argument type to Nullable.

Parameters:

  • x: The value of any non-compound type.

Returned value:

  • The input value with a non-Nullable type.

Functions for working with Nullable aggregates