Selectivity Estimation using the Generalized Cumulative Density Histogram 


Vol. 11,  No. 4, pp. 983-990, Aug.  2004
10.3745/KIPSTD.2004.11.4.983


PDF
  Abstract

Multiple-count problem is occurred when rectangle objects span across several buckets. The CD histogram is a technique which solves this problem by keeping four sub-histograms corresponding to the four points of rectangle. Although it provides exact results with constant response time, there is still a considerable issue. Since it is based on a query window which aligns with a given grid, a number of errors may be occurred when it is applied to real applications. In this paper, we propose selectivity estimation techniques using the generalized cumulative density histogram based on two probabilistic models : ① probabilistic model which considers the query window area ratio, ② probabilistic model which considers intersection area between a given grid and objects. Our method has the capability of eliminating an impact of the restriction on query window which the existing cumulative density histogram has. We experimented with real datasets to evaluate the proposed methods. Experimental results show that the proposed technique is superior to the existing selectivity estimation techniques. Furthermore, selectivity estimation technique based on probabilistic model considering the intersection area is very accurate(less than 5% errors) at 20% query window. The proposed techniques can be used to accurately quantify the selectivity of the spatial range query on rectangle objects.

  Statistics


  Cite this article

[IEEE Style]

J. H. Chi, S. H. Kim, K. H. Ryu, "Selectivity Estimation using the Generalized Cumulative Density Histogram," The KIPS Transactions:PartD, vol. 11, no. 4, pp. 983-990, 2004. DOI: 10.3745/KIPSTD.2004.11.4.983.

[ACM Style]

Jeong Hee Chi, Sang Ho Kim, and Keun Ho Ryu. 2004. Selectivity Estimation using the Generalized Cumulative Density Histogram. The KIPS Transactions:PartD, 11, 4, (2004), 983-990. DOI: 10.3745/KIPSTD.2004.11.4.983.