Supported Data Types
This section describes the data types supported in Cytomic Orion and special considerations when you use them, if any.
Integers (INT and UINT)
Fixed-length integers, with or without a sign.
Signed Integer Range
-
Int8 - [-128 : 127]
-
Int16 - [-32768 : 32767]
-
Int32 - [-2147483648 : 2147483647]
-
Int64 - [-9223372036854775808 : 9223372036854775807]
Unsigned Integer Range
-
UInt8 - [0 : 255]
-
UInt16 - [0 : 65535]
-
UInt32 - [0 : 4294967295]
-
UInt64 - [0 : 18446744073709551615]
Decimal Numbers (DECIMALX)
Signed fixed point numbers that keep precision during add, subtract and multiply operations. For division, least significant digits are discarded (not rounded).
Parameters
-
P - Precision. Valid range: [1: 38]. Determines how many decimal digits the number can have (including the fraction).
-
S - Scale. Valid range: [0: P]. Determines how many decimal digits the fraction can have.
Decimal Value Ranges
-
Decimal32 (S) - (-1 * 10 ^ (9 - S), 1 * 10 ^ (9 - S))
-
Decimal64 (S) - (-1 * 10 ^ (18 - S), 1 * 10 ^ (18 - S))
-
Decimal128 (S) - (-1 * 10 ^ (38 - S), 1 * 10 ^ (38 - S))
For example, Decimal32(4) can contain numbers from -99999.9999 to 99999.9999 with 0.0001 step.
Internal Representation
Internally data is represented as normal signed integers with respective bit width. Real value ranges that can be stored in memory are a bit larger than specified above, which are checked only on convertion from string.
Because modern CPUs do not support 128 bit integers natively, operations on Decimal128 are emulated. Because of this, Decimal128 works significantly slower than Decimal32/Decimal64.
Operations and Result Type
Binary operations on decimal result in a wider result type (with any order of arguments).
-
Decimal64 (S1) Decimal32 (S2) -> Decimal64 (S)
-
Decimal128 (S1) Decimal32 (S2) -> Decimal128 (S)
-
Decimal128 (S1) Decimal64 (S2) -> Decimal128 (S)
Rules for scale:
-
add, subtract: S = max (S1, S2)
-
multiply: S = S1 + S2.
-
divide: S = S1.
For similar operations between decimal and integers, the result is a decimal of the same size as the argument.
Operations between decimal and Float32/Float64 are not defined. If you really need them, you can explicitly convert one of the arguments by using toDecimal32, toDecimal64, toDecimal128 o toFloat32, toFloat64
. Keep in mind that the result will lose precision and type conversion is a computationally expensive operation.
Some functions on decimal return result as Float64 (for example, var
stddev
). Intermediate calculations might still be performed in decimal, which might lead to different results between Float64 and decimal inputs with the same values.
Overflow Checks
During calculations on decimal, integer overflows might happen. Excessive digits in the fraction are discarded (not rounded). Excessive digits in the integer part will lead to an exception.
Overflow checks lead to operations slowdown. Overflow checks happen not only on arithmetic operations, but also on value comparison.
Boolean Values
There is not a separate type for boolean values. They use the UInt8 type, restricted to the values 0 or 1.
Character Strings (STRING)
Strings of an arbitrary length. The length is not limited. The value can contain an arbitrary set of bytes, including null bytes. The String type replaces the types VARCHAR, BLOB, CLOB, and others from other DBMSs.
Encodings
Cytomic Orion does not support the concept of encodings. Strings can contain an arbitrary set of bytes, which are stored and output as-is. If you need to store texts, we recommend using UTF-8 encoding. As the analysis console supports UTF-8, you can read and write your values without making conversions. Similarly, certain functions for working with strings have separate variations that work under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. For example, the length
function calculates the string length in bytes, while the lengthUTF8
function calculates the string length in Unicode code points, assuming that the value is UTF-8 encoded.
Fixed-length Character Strings (FIXEDSTRING)
FixedString(N) is a fixed-length string of N bytes (not characters or code points).
To declare a column of FixedString type, use this syntax:
|
Where N
is a natural number.
The FixedString type is efficient when data has the length of precisely N
bytes. In all other cases, it is likely to reduce efficiency.
Examples of values that can be efficiently stored in FixedString-typed columns:
-
The binary representation of IP addresses (FixedString(16) for IPv6).
-
Language codes (en_US, ru_RU, etc.).
-
Currency codes (USD, RUB, etc.).
-
The Binary representation of hashes (FixedString(16) for MD5, FixedString(32) for SHA256).
To store UUID values, use the UUID data type.
When inserting data, Cytomic Orion:
-
Complements the string with null bytes if the string contains fewer than
N
bytes. -
Throws the
Too large value for FixedString(N)
exception if the string contains more thanN
bytes.
When you select data, Cytomic Orion does not remove the null bytes at the end of the string. If you use the WHERE
clause, you should add null bytes manually to match the FixedString value. This behaviour differs from MySQL for the CHAR type (where strings are padded with spaces, and the spaces are removed for output),
Note that the length of the FixedString(N) value is constant. The length function returns N
even if the FixedString(N) value is filled only with null bytes, but the empty function returns 1 in this case
Date (DATE)
A date. Stored in two bytes as the number of days since 1970-01-01 It allows storing values from just after the beginning of the Unix Epoch to the upper threshold defined in 2105. The minimum value is 0000-00-00.
The date value is stored without the time zone.
Date and Time (DATETIME)
DateTime is stored in four bytes as a Unix timestamp. It allows storing values in the same range as the one defined for the Date type. Time is stored with a precision of up to one second (without leap seconds).
Time Zones
The DateTime type is converted from text (divided into component parts) to binary and back, using the system's time zone at the time the server starts. In text format, information about daylight savings is lost.
By default, the client switches to the timezone of the server when it connects. So, when you work with a textual date (for example, when you save text dumps), keep in mind that there might be ambiguity during changes for daylight savings time, and there might be problems matching data if the time zone changed.
Nullable
Nullable (TypeName) allows to store a special marker (NULL) that denotes "missing value" alongside normal values allowed by TypeName. For example, a Nullable(Int8) type column can store Int8 type values, and the rows that do not have a value store NULL
.