quantileGK
Computes the quantile of a numeric data sequence using the Greenwald-Khanna algorithm. The Greenwald-Khanna algorithm is an algorithm used to compute quantiles on a stream of data in a highly efficient manner. It was introduced by Michael Greenwald and Sanjeev Khanna in 2001. It is widely used in databases and big data systems where computing accurate quantiles on a large stream of data in real-time is necessary. The algorithm is highly efficient, taking only O(log n) space and O(log log n) time per item (where n is the size of the input). It is also highly accurate, providing an approximate quantile value with high probability.
quantileGK is different from other quantile functions in ClickHouse, because it enables user to control the accuracy of the approximate quantile result.
Syntax
Alias: medianGK.
Arguments
-
accuracy— Accuracy of quantile. Constant positive integer. Larger accuracy value means less error. For example, if the accuracy argument is set to 100, the computed quantile will have an error no greater than 1% with high probability. There is a trade-off between the accuracy of the computed quantiles and the computational complexity of the algorithm. A larger accuracy requires more memory and computational resources to compute the quantile accurately, while a smaller accuracy argument allows for a faster and more memory-efficient computation but with a slightly lower accuracy. -
level— Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. Default value: 0.5. Atlevel=0.5the function calculates median. -
expr— Expression over the column values resulting in numeric data types, Date or DateTime.
Returned value
- Quantile of the specified level and accuracy.
Type:
- Float64 for numeric data type input.
- Date if input values have the
Datetype. - DateTime if input values have the
DateTimetype.
Example
See Also
quantileGK
Introduced in: v23.4
Computes the quantile of a numeric data sequence using the Greenwald-Khanna algorithm.
The Greenwald-Khanna algorithm is an algorithm used to compute quantiles on a stream of data in a highly efficient manner. It was introduced by Michael Greenwald and Sanjeev Khanna in 2001. It is widely used in databases and big data systems where computing accurate quantiles on a large stream of data in real-time is necessary. The algorithm is highly efficient, taking only O(log n) space and O(log log n) time per item (where n is the size of the input). It is also highly accurate, providing an approximate quantile value with high probability.
quantileGK is different from other quantile functions in ClickHouse, because it enables user to control the accuracy of the approximate quantile result.
Syntax
Aliases: medianGK
Parameters
accuracy— Accuracy of quantile. Constant positive integer. Larger accuracy value means less error. For example, if the accuracy argument is set to 100, the computed quantile will have an error no greater than 1% with high probability. There is a trade-off between the accuracy of the computed quantiles and the computational complexity of the algorithm. A larger accuracy requires more memory and computational resources to compute the quantile accurately, while a smaller accuracy argument allows for a faster and more memory-efficient computation but with a slightly lower accuracy.UInt*level— Optional. Level of quantile. Constant floating-point number from 0 to 1. Default value: 0.5. Atlevel=0.5the function calculates median.Float*
Arguments
expr— Expression over the column values resulting in numeric data types, Date or DateTime.(U)Int*orFloat*orDecimal*orDateorDateTime
Returned value
Returns the quantile of the specified level and accuracy. Float64 or Date or DateTime
Examples
Computing quantile with different accuracy levels
Higher accuracy quantile