数据库设计 – 空间索引可以帮助“范围 – 按限制排序”查询
| 提出这个问题,特别是Postgres,因为它对R树/空间索引有很好的支持. 我们有下表,其中包含单词及其频率的树结构(嵌套集模型): lexikon ------- _id integer PRIMARY KEY word text frequency integer lset integer UNIQUE KEY rset integer UNIQUE KEY 和查询: SELECT word FROM lexikon WHERE lset BETWEEN @Low AND @High ORDER BY frequency DESC LIMIT @N 我认为(lset,frequency,word)上的覆盖索引是有用的,但是如果(@High,@ Low)范围内的lset值太多,我觉得它可能表现不佳. 当使用该索引的搜索早期产生与范围条件匹配的@N行时,(频率DESC)上的简单索引有时也可能是足够的. 但似乎性能很大程度上取决于参数值. 有没有一种方法可以让它快速执行,无论范围(@ Low,@ High)是宽还是窄,无论顶级频率字是否幸运地在(窄)选定范围内? R树/空间索引会有帮助吗? 添加索引,重写查询,重新设计表,没有限制. 解决方法通过先搜索频率较高的行,您可以获得更好的性能.这可以通过“粒化”频率然后在程序上逐步执行来实现,例如如下:–testbed和lexikon虚拟数据: begin; set role dba; create role stack; grant stack to dba; create schema authorization stack; set role stack; -- create table lexikon( _id serial,word text,frequency integer,lset integer,width_granule integer); -- insert into lexikon(word,lset) select word,(1000000/row_number() over(order by random()))::integer as frequency,lset from (select 'word'||generate_series(1,1000000) word,generate_series(1,1000000) lset) z; -- update lexikon set width_granule=ln(frequency)::integer; -- create index on lexikon(width_granule,lset); create index on lexikon(lset); -- the second index is not used with the function but is added to make the timings 'fair' 颗粒分析(主要用于信息和调整): create table granule as 
select width_granule,count(*) as freq,min(frequency) as granule_start,max(frequency) as granule_end 
from lexikon group by width_granule;
--
select * from granule order by 1;
/*
 width_granule |  freq  | granule_start | granule_end
---------------+--------+---------------+-------------
             0 | 500000 |             1 |           1
             1 | 300000 |             2 |           4
             2 | 123077 |             5 |          12
             3 |  47512 |            13 |          33
             4 |  18422 |            34 |          90
             5 |   6908 |            91 |         244
             6 |   2580 |           245 |         665
             7 |    949 |           666 |        1808
             8 |    349 |          1811 |        4901
             9 |    129 |          4926 |       13333
            10 |     47 |         13513 |       35714
            11 |     17 |         37037 |       90909
            12 |      7 |        100000 |      250000
            13 |      2 |        333333 |      500000
            14 |      1 |       1000000 |     1000000
*/
alter table granule drop column freq;
--首先扫描高频的功能: create function f(p_lset_low in integer,p_lset_high in integer,p_limit in integer)
       returns setof lexikon language plpgsql set search_path to 'stack' as $$
declare
  m integer;
  n integer := 0;
  r record;
begin 
  for r in (select width_granule from granule order by width_granule desc) loop
    return query( select * 
                  from lexikon 
                  where width_granule=r.width_granule 
                        and lset>=p_lset_low and lset<=p_lset_high );
    get diagnostics m = row_count;
    n = n+m;
    exit when n>=p_limit;
  end loop;
end;$$;结果(时间应该用一小撮盐来进行,但每次查询运行两次以对抗任何缓存) 首先使用我们编写的函数: timing on -- select * from f(20000,30000,5) order by frequency desc limit 5; /* _id | word | frequency | lset | width_granule -----+-----------+-----------+-------+--------------- 141 | word23237 | 7092 | 23237 | 9 246 | word25112 | 4065 | 25112 | 8 275 | word23825 | 3636 | 23825 | 8 409 | word28660 | 2444 | 28660 | 8 418 | word29923 | 2392 | 29923 | 8 Time: 80.452 ms */ select * from f(20000,5) order by frequency desc limit 5; /* _id | word | frequency | lset | width_granule -----+-----------+-----------+-------+--------------- 141 | word23237 | 7092 | 23237 | 9 246 | word25112 | 4065 | 25112 | 8 275 | word23825 | 3636 | 23825 | 8 409 | word28660 | 2444 | 28660 | 8 418 | word29923 | 2392 | 29923 | 8 Time: 0.510 ms */ 然后使用简单的索引扫描: select * from lexikon where lset between 20000 and 30000 order by frequency desc limit 5; /* _id | word | frequency | lset | width_granule -----+-----------+-----------+-------+--------------- 141 | word23237 | 7092 | 23237 | 9 246 | word25112 | 4065 | 25112 | 8 275 | word23825 | 3636 | 23825 | 8 409 | word28660 | 2444 | 28660 | 8 418 | word29923 | 2392 | 29923 | 8 Time: 218.897 ms */ select * from lexikon where lset between 20000 and 30000 order by frequency desc limit 5; /* _id | word | frequency | lset | width_granule -----+-----------+-----------+-------+--------------- 141 | word23237 | 7092 | 23237 | 9 246 | word25112 | 4065 | 25112 | 8 275 | word23825 | 3636 | 23825 | 8 409 | word28660 | 2444 | 28660 | 8 418 | word29923 | 2392 | 29923 | 8 Time: 51.250 ms */ timing off -- rollback; 根据您的实际数据,您可能希望改变颗粒的数量和用于将行放入其中的函数.频率的实际分布在这里是关键,因为极限条款的预期值和所寻求的lset范围的大小. (编辑:海洋资讯信息网) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! | 

