uniqCombined64
Calculates the approximate number of different argument values. It is the same as uniqCombined, but uses a 64-bit hash for all data types rather than just for the String data type.
Parameters
HLL_precision: The base-2 logarithm of the number of cells in HyperLogLog. Optionally, you can use the function asuniqCombined64(x[, ...]). The default value forHLL_precisionis 17, which is effectively 96 KiB of space (2^17 cells, 6 bits each).X: A variable number of parameters. Parameters can beTuple,Array,Date,DateTime,String, or numeric types.
Returned value
- A number UInt64-type number.
Implementation details
The uniqCombined64 function:
- Calculates a hash (64-bit hash for all data types) for all parameters in the aggregate, then uses it in calculations.
- Uses a combination of three algorithms: array, hash table, and HyperLogLog with an error correction table.
- For a small number of distinct elements, an array is used.
- When the set size is larger, a hash table is used.
- For a larger number of elements, HyperLogLog is used, which will occupy a fixed amount of memory.
- Provides the result deterministically (it does not depend on the query processing order).
Since it uses 64-bit hash for all types, the result does not suffer from very high error for cardinalities significantly larger than UINT_MAX like uniqCombined does, which uses a 32-bit hash for non-String types.
Compared to the uniq function, the uniqCombined64 function:
- Consumes several times less memory.
- Calculates with several times higher accuracy.
Example
In the example below uniqCombined64 is run on 1e10 different numbers returning a very close approximation of the number of different argument values.
Query:
Result:
By comparison the uniqCombined function returns a rather poor approximation for an input this size.
Query:
Result:
See Also
uniqCombined64
Introduced in: v20.1
Calculates the approximate number of different argument values.
It is the same as uniqCombined, but uses a 64-bit hash for all data types rather than just for the String data type.
This function provides the result deterministically (it does not depend on the query processing order).
Since it uses 64-bit hash for all types, the result does not suffer from very high error for cardinalities significantly larger than UINT_MAX like uniqCombined does, which uses a 32-bit hash for non-String types.
Compared to the uniq function, the uniqCombined64 function:
- Consumes several times less memory
- Calculates with several times higher accuracy
Details
Implementation details
This function calculates a 64-bit hash for all data types for all parameters in the aggregate, then uses it in calculations. It uses a combination of three algorithms: array, hash table, and HyperLogLog with an error correction table:- For a small number of distinct elements, an array is used
- When the set size is larger, a hash table is used
- For a larger number of elements, HyperLogLog is used, which will occupy a fixed amount of memory
Syntax
Parameters
HLL_precision— Optional. The base-2 logarithm of the number of cells in HyperLogLog. The default value is 17, which is effectively 96 KiB of space (2^17 cells, 6 bits each). Range: [12, 20].UInt8
Arguments
x— A variable number of parameters.Tuple(T)orArray(T)orDateorDateTimeorStringor(U)Int*orFloat*orDecimal
Returned value
Returns a UInt64-type number representing the approximate number of different argument values. UInt64
Examples
Large dataset example
Comparison with uniqCombined