single-table hash cluster practise

March 3rd, 2007 | Categories: Boring | Tags:

single-table hash cluster对于Where cluster_key = value 理论上最好情况下只需要1个逻辑IO读,比IOT的(Where PK = value)性能还好。但为了实现这个1个IO的最优化,cluster的参数设置(SIZE HASHKEYS)还真是不容易。

先简单计算一下每行的平均长度

SQL> create table t as select * from dba_objects where rownum<100 and owner!=’SYS’;
Table created.
SQL> analyze table t compute statistics;
Table analyzed.
SQL> select AVG_ROW_LEN from user_tables where table_name=’T';
AVG_ROW_LEN
———–
90

SIZE should be set to the average amount of space required to hold all rows for any given hash key. Therefore, to properly determine SIZE, you must be aware of the characteristics of your data:

  • If the hash cluster is to contain only a single table and the hash key values of the rows in that table are unique (one row for each value), SIZE can be set to the average row size in the cluster.

因此设置size最低为90,我选择100

SQL> CREATE CLUSTER trial_cluster (object_id NUMBER)
2 TABLESPACE cr_data
3 size 100 HASH IS object_id HASHKEYS 20000;
CREATE CLUSTER trial_cluster (object_id NUMBER)
*
ERROR at line 1:
ORA-02456: The HASH IS column specification must be NUMBER(*,0)

SQL> !oerr ora 2456
02456, 00000, “The HASH IS column specification must be NUMBER(*,0)”
// *Cause: The column specification must specify an integer.
// *Action: Specify the column definition as type NUMBER(precision, 0).

Specify the HASH IS parameter only if the cluster key is a single column of the NUMBER datatype, and contains uniformly distributed integers.要求还的是整数

SQL> CREATE CLUSTER trial_cluster (object_id NUMBER(32,0))
2 TABLESPACE cr_data
3 size 100 HASH IS object_id HASHKEYS 20000;
Cluster created.

察看为cluster分配的区间大小 TRIAL_CLUSTER CLUSTER 2M

SQL> col object_name format a25
SQL> select object_name,object_type from user_objects;
OBJECT_NAME OBJECT_TYPE
————————- ——————
TRIAL_CLUSTER CLUSTER

SQL> CREATE TABLE trial (
2 OWNER VARCHAR2(30),
3 OBJECT_NAME VARCHAR2(128),
4 SUBOBJECT_NAME VARCHAR2(30),
5 OBJECT_ID NUMBER primary key,
6 DATA_OBJECT_ID NUMBER,
7 OBJECT_TYPE VARCHAR2(18),
8 CREATED DATE,
9 LAST_DDL_TIME DATE,
10 TIMESTAMP VARCHAR2(19),
11 STATUS VARCHAR2(7),
12 TEMPORARY VARCHAR2(1),
13 GENERATED VARCHAR2(1),
14 SECONDARY VARCHAR2(1)
15 )
16 CLUSTER trial_cluster (object_id);
CLUSTER trial_cluster (object_id) *
ERROR at line 16:
ORA-01753: column definition incompatible with clustered column definition

SQL> CREATE TABLE trial (
OWNER VARCHAR2(30),
OBJECT_NAME VARCHAR2(128),
SUBOBJECT_NAME VARCHAR2(30),
OBJECT_ID NUMBER(32,0) primary key,
DATA_OBJECT_ID NUMBER,
OBJECT_TYPE VARCHAR2(18),
CREATED DATE,
LAST_DDL_TIME DATE,
TIMESTAMP VARCHAR2(19),
STATUS VARCHAR2(7),
TEMPORARY VARCHAR2(1),
GENERATED VARCHAR2(1),
SECONDARY VARCHAR2(1)
)
CLUSTER trial_cluster (object_id);
Table created.

SQL> select object_name,object_type from user_objects;
OBJECT_NAME OBJECT_TYPE
————————- ——————
SYS_C009597 INDEX
TRIAL TABLE
TRIAL_CLUSTER CLUSTER

SQL> insert into trial select * from dba_objects;
18298 rows created. 小于设置的HASHKEYS(20000),应该可以得到1个IO吧

SQL> commit;
Commit complete.

SQL> alter session set sql_trace=true;

Session altered.

SQL> declare
2 a varchar2(128);
3 begin
4 for i in ( select object_id from dba_objects ) loop
5 select object_name into a from trial where object_id=i.object_id;
6 end loop;
7 end;
8 /

PL/SQL procedure successfully completed.

=—————————————

SELECT OBJECT_NAME FROM TRIAL WHERE OBJECT_ID=:B1

call count cpu elapsed disk query current rows
——- —— ——– ———- ———-

Parse 1 0.00 0.00 0 0 0 0
Execute 18298 0.67 0.62 0 2 0 0
Fetch 18298 0.51 0.57 0 29544 0 18298
——- —— ——– ———- ———- –
total 36597 1.18 1.20 0 29546 0 18298

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 25 (recursive depth: 1)

Rows Row Source Operation
——- —————————————————
18298 TABLE ACCESS HASH TRIAL

29546 个buffer gets,执行18298次,看来存在collision

再调整参数进行测试

SQL> drop CLUSTER trial_cluster;
drop CLUSTER trial_cluster
*
ERROR at line 1:
ORA-00951: cluster not empty

SQL> alter table trial move;
alter table trial move
*
ERROR at line 1:
ORA-14512: cannot perform operation on a clustered object

SQL> truncate table trial;
truncate table trial
*
ERROR at line 1:
ORA-03292: Table to be truncated is part of a cluster

SQL> drop table trial;
Table dropped.
SQL> drop CLUSTER trial_cluster;
Cluster dropped.

CREATE CLUSTER trial_cluster (object_id NUMBER(32,0))
TABLESPACE cr_data
size 300 HASH IS object_id HASHKEYS 20000;

设置大一点的size

SELECT OBJECT_NAME
FROM
TRIAL WHERE OBJECT_ID=:B1
call count cpu elapsed disk query current rows
——- —— ——– ———- ———- ——-
Parse 1 0.00 0.00 0 0 0 0
Execute 18298 0.63 0.64 0 2 0 0
Fetch 18298 0.59 0.53 0 23210 0 18298
——- —— ——– ———- ———- ——–
total 36597 1.22 1.18 0 23212 0 18298
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 25 (recursive depth: 1)
Rows Row Source Operation

——- ————————————–
18298 TABLE ACCESS HASH TRIAL

23212 个buffer gets,比上个测试好了点;但还不是最优(1 IO per exec)
设置更大的HASHKEYS再次进行测试

SQL> CREATE CLUSTER trial_cluster (object_id NUMBER(32,0))
2 TABLESPACE cr_data
3 size 300 HASH IS object_id HASHKEYS 40000;
Cluster created.

察看cluster size发现有13M,空间真是浪费啊,希望能够换到时间~

******************************************************

SELECT OBJECT_NAME
FROM
TRIAL WHERE OBJECT_ID=:B1

call count cpu elapsed disk query current rows
——- —— ——– ———- ———- ———-
Parse 1 0.00 0.00 0 0 0 0
Execute 18298 0.84 0.68 0 2 0 0
Fetch 18298 0.48 0.40 0 18298 0 18298
——- —— ——– ———- ———- ———-
total 36597 1.32 1.09 0 18300 0 18298
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 25 (recursive depth: 1)
Rows Row Source Operation
——- —————————————————
18298 TABLE ACCESS HASH TRIAL

18300/18298 = 1,Great。总算以空间换来了时间。

SIZE &HASHKEYS 在Production使用single table hash cluster的时候还真是个技术活。

OK……..注意到上边的测试在create cluster的时候没有指定SINGLE TABLE !!!mmd,single-table hash cluster的测试要重新来过~

测试 size 300 HASHKEYS 20000

SQL> CREATE CLUSTER trial_cluster (object_id NUMBER(32,0))
TABLESPACE cr_data
size 300 SINGLE TABLE HASH IS object_id HASHKEYS 20000;

***********************************************
SELECT OBJECT_NAME
FROM
TRIAL WHERE OBJECT_ID=:B1
call count cpu elapsed disk query current rows
——- —— ——– ———- ———- ———- ———- ———-
Parse 1 0.00 0.00 0 0 0 0
Execute 18299 0.86 0.63 0 2 0 0
Fetch 18299 0.43 0.41 0 18299 0 18299
——- —— ——– ———- ———- ———- ———- ———-
total 36599 1.29 1.04 0 18301 0 18299

***************************************************

Cool~ 指定了SINGLE TABLE 后,IO降低了23212-18301=4911,达到了最好的IO性能(1 io per exec),差别挺大的;看来cluster的内部存储结构发生了微妙变化。

再测试 size 100 HASHKEYS 20000

SQL> CREATE CLUSTER trial_cluster (object_id NUMBER(32,0))
TABLESPACE cr_data
size 100 SINGLE TABLE HASH IS object_id HASHKEYS 20000;

***************************************************
SELECT OBJECT_NAME
FROM
TRIAL WHERE OBJECT_ID=:B1
call count cpu elapsed disk query current rows
——- —— ——– ———- ———- ———- ———- ———-
Parse 1 0.00 0.00 0 0 0 0
Execute 18299 0.69 0.60 0 0 0 0
Fetch 18299 0.60 0.53 0 29549 0 18299
——- —— ——– ———- ———- ———- ———- ———-
total 36599 1.29 1.14 0 29549 0 18299
***********************************************

NO-SINGLE TABLE的时候差不多,看来size设置的小了。看来只有估计参数size是个技术活.

不过怎么看collision的情况哪?难题~

Del.icio.us Google书签 Digg Live Bookmark Technorati Furl Yahoo书签 Facebook 百度搜藏 新浪 ViVi 365Key 网摘 天极网摘 和讯网摘 博拉网 POCO 网摘 饭否 QQ 书签 Digbuzz 我挖网 Mister Wong
  1. logzgh
    March 5th, 2007 at 10:26
    Quote | #1

    不错。

    这些参数其实是一个衡量的东东。
    设置大了,就会乱费空间。设置小了又影响性能。如果要达到最优的效果的话,size要比最大的行大小还要大一点。
    而hashkeys只要比行数大一点即可。

    在实际当中的难点在于hashkeys难以准确的估计。对于小表做single hash tables没什么必要,往往是大表。而大表的记录不太好估计,太大了,空间太乱费了。太小了冲突又多。

    不过要量化地估计冲突有多少确实是一个难点。

  2. anysql
    March 7th, 2007 at 10:19
    Quote | #2

    好处是还可以在cluster的列上面加索引