Greenplum在创建表的时候可以选择表存储方式,比如是行存储,还是列存储,是普通的heap表,还是append optimized表。
下面分别介绍以下
行存储是行为单位存储数据,一行中越是靠后的列,那么查询需要的cost相对越大,这个以前oracle做过相应比较,都是一样的道理,行存储更适合OLTP的系统
列存储是以列为单位存储数据,物理上一列会对应一个或者多个数据文件,而且列存储的压缩比比较高,但是如果查询的时候,如果返回的列很多,那么效率不如行存储,列存储更适合对某一列做相关统计,列存储更适合OLAP的系统
堆表,我们普通的创建的表默认都是堆表,适合频繁的更新删除操作的小表,适合OLTP系统
AO表,适合批量数据写入,不适合单行的insert,适合大表使用,所以一般用在数据仓库系统,适合OLAP系统
创建一个heap表
CREATE TABLE foo (a int, b text) DISTRIBUTED BY (a);
不带压缩的ao表
CREATE TABLE bar (a int, b text)
WITH (appendonly=true)
DISTRIBUTED BY (a);
创建列存储表
CREATE TABLE bar (a int, b text)
WITH (appendonly=true, orientation=column)
DISTRIBUTED BY (a);
另外压缩,可以分为列压缩,以及表压缩
列压缩和表压缩支持的算法
表存储类型 | 可用的压缩类型 | 支持的算法 |
Row | Table | ZLIB and QUICKLZ1 |
Column | Column and Table | RLE_TYPE, ZLIB, andQUICKLZ1 |
这里可以看到,行存储类型的只能在表级别压缩,支持ZLIB和QUICKLZ压缩算法
列存储类型可以选择表级别的压缩,也可以选择列级别的压缩,支持的压缩算法多了RLE_TYPE
创建压缩表
CREATE TABLE foo (a int, b text)
WITH (appendonly=true, compresstype=zlib, compresslevel=5);
检查AO表的压缩率和分布的内置函数
函数 | 返回类型 | 描述 |
get_ao_distribution(name) get_ao_distribution(oid) | Set of (dbid, tuplecount) rows | Shows the distribution of an append-optimized table's rows across the array. Returns a set of rows, each of which includes a segment dbid |
get_ao_compression_ratio(name) get_ao_compression_ratio(oid) | float8 | Calculates the compression ratio for a compressed append-optimized table. If information is not available, this function returns a value of -1. |
列级别压缩参数介绍
- Compression type 压缩类型
- Compression level 压缩级别
- Block size for a column 列的块大小
Name | Definition | Values | Comment |
COMPRESSTYPE | 压缩类型 | zlib: deflate algorithm quicklz: fast compression RLE_TYPE: run-length encoding none: no compression | Values are not case-sensitive. |
COMPRESSLEVEL | 压缩级别 | zlib compression: 1-9 | 1最快,但是压缩比最低 9最慢,但是压缩比最高 |
QuickLZcompression: 1 | 默认值为1 | ||
RLE_TYPEcompression: 1 – 4 1 2 3 4 | 1 is the fastest method with the least compression. 4 is the slowest method with the most compression. 1 | ||
BLOCKSIZE | 表中每一个块的大小 | 8192 – 2097152 | 值必须是8192的倍数,最大值为2097152,也就是2M |
如果什么参数都不写,那么默认不压缩,块大小就是服务端参数设置的块大小
压缩参数设置的优先权
表级别指定的列压缩设置优先于整个表的压缩设置
分区指定的列压缩设置优先于任何表或者列的压缩设置
子分区指定的列压缩设置优先于任何表,列,以及分区的压缩设置
当一个encoding设置的参数和with设置的参数冲突,那么encoding参数的优先级高于with设置的参数
压缩参数设置的相关例子:
例1:
CREATE TABLE T1 (c1 int ENCODING (compresstype=zlib),
c2 char ENCODING (compresstype=quicklz, blocksize=65536),
c3 char ) WITH (appendonly=true, orientation=column); --官方文档少一又括号
c1 zlib压缩类型,c2 quicklz压缩类型,块大小为65536, c3 不压缩,块大小为服务端配置文件的数据块大小,整个表为列存储的AO表
例2:
CREATE TABLE T2 (c1 int ENCODING (compresstype=zlib),
c2 char ENCODING (compresstype=quicklz, blocksize=65536),
c3 char,
COLUMN c3 ENCODING (compresstype=RLE_TYPE)
)
WITH (appendonly=true, orientation=column)
C1压缩类型为zlib,C2压缩类型为quicklz,块大小为65536,c3压缩类型为RLE_TYPE,块大小为系统默认块大小
例3:
CREATE TABLE T3 (c1 int ENCODING (compresstype=zlib),
c2 char ENCODING (compresstype=quicklz, blocksize=65536),
c3 char, COLUMN c3 ENCODING (compresstype=RLE_TYPE) )
WITH (appendonly=true, orientation=column)
PARTITION BY RANGE (c3) (START ('1900-01-01'::DATE)
END ('2100-12-31'::DATE),
COLUMN c3 ENCODING (zlib));
这里c1,zlib压缩,c2,为quicklz压缩,块大小65536,c3,注意partition为zlib,不是表级别设置的RLE_TYPE.因为partition的优先级高于table的优先级。
例4:
CREATE TABLE T4 (c1 int ENCODING (compresstype=zlib),
c2 char,
c4 smallint ENCODING (compresstype=none),
DEFAULT COLUMN ENCODING (compresstype=quicklz,
blocksize=65536),
COLUMN c3 ENCODING (compresstype=RLE_TYPE)
)
WITH (appendonly=true, orientation=column);
c1为zlib压缩,c2为quicklz压缩,块大小为65536,c3为RLE_TYPE,块大小为65536,c4无压缩类型,块大小为65536
这里注意设置的默认列的参数,如果没有在encoding命令指定参数,那么久用默认设置的参数。
例5:
CREATE TABLE T5(i int, j int, k int, l int)
WITH (appendonly=true, orientation=column)
PARTITION BY range(i) SUBPARTITION BY range(j)
(
partition p1 start(1) end(2)
( subpartition sp1 start(1) end(2)
column i encoding(compresstype=zlib, blocksize=65536)
),
partition p2 start(2) end(3)
( subpartition sp1 start(1) end(2)
column i encoding(compresstype=rle_type)
column k encoding(blocksize=8192)
)
); --官方文档又少一个关键字partition
这里p1分区的子分区sp1中的列i压缩类型为zlib,块大小为65536,p2的sp1中的i压缩类型为rle_type,块大小为默认的块大小
自定义一个type,然后把type指定给表的列
CREATE TYPE comptype (
internallength = 4,
input = comptype_in,
output = comptype_out,
alignment = int4,
default = 123,
passedbyvalue,
compresstype="quicklz",
blocksize=65536,
compresslevel=1
);
CREATE TABLE t2 (c1 comptype) WITH (APPENDONLY=true, ORIENTATION=column);
alter修改表
设置非空约束
ALTER TABLE address ALTER COLUMN street SET NOT NULL;
修改表的分布策略,修改分布策略后,数据会自动重新分布
ALTER TABLE address ALTER COLUMN street SET NOT NULL;
修改表的存储模式
创建一个空表,从原表插入数据,删除原表,重命名新表,赋予所有权限
CREATE TABLE sales2 (LIKE sales)
WITH (appendonly=true, compresstype=quicklz,
compresslevel=1, orientation=column);
INSERT INTO sales2 SELECT * FROM sales;
DROP TABLE sales;
ALTER TABLE sales2 RENAME TO sales;
GRANT ALL PRIVILEGES ON sales TO admin;
GRANT SELECT ON sales TO guest;
给一个表增加一个压缩列
ALTER TABLE T1
ADD COLUMN c4 int DEFAULT 0
ENCODING (COMPRESSTYPE=zlib);
压缩设置的继承
CREATE TABLE ccddl (i int, j int, k int, l int)
WITH
(APPENDONLY = TRUE, ORIENTATION=COLUMN)
PARTITION BY range(j)
SUBPARTITION BY list (k)
SUBPARTITION template(
SUBPARTITION sp1 values(1, 2, 3, 4, 5),
COLUMN i ENCODING(COMPRESSTYPE=ZLIB),
COLUMN j ENCODING(COMPRESSTYPE=QUICKLZ),
COLUMN k ENCODING(COMPRESSTYPE=ZLIB),
COLUMN l ENCODING(COMPRESSTYPE=ZLIB))
(PARTITION p1 START(1) END(10),
PARTITION p2 START(10) END(20))
;
ALTER TABLE ccddl
ADD PARTITION p3 START(20) END(30)
;
gpadmin=# \dt+ ccddl*
List of relations
Schema | Name | Type | Owner | Storage | Description
--------+--------------------------+-------+---------+----------------------+-------------
public | ccddl | table | gpadmin | append only columnar |
public | ccddl_1_prt_p1 | table | gpadmin | append only columnar |
public | ccddl_1_prt_p1_2_prt_sp1 | table | gpadmin | append only columnar |
public | ccddl_1_prt_p2 | table | gpadmin | append only columnar |
public | ccddl_1_prt_p2_2_prt_sp1 | table | gpadmin | append only columnar |
public | ccddl_1_prt_p3 | table | gpadmin | append only columnar |
public | ccddl_1_prt_p3_2_prt_sp1 | table | gpadmin | append only columnar |
gpadmin=# \d+ ccddl_1_prt_p1_2_prt_sp1
Append-Only Columnar Table "public.ccddl_1_prt_p1_2_prt_sp1"
Column | Type | Modifiers | Storage | Compression Type | Compression Level | Block Size | Description
--------+---------+-----------+---------+------------------+-------------------+------------+-------------
i | integer | | plain | zlib | 1 | 32768 |
j | integer | | plain | quicklz | 1 | 32768 |
k | integer | | plain | zlib | 1 | 32768 |
l | integer | | plain | zlib | 1 | 32768 |
Checksum: t
Check constraints:
"ccddl_1_prt_p1_2_prt_sp1_check" CHECK (k = 1 OR k = 2 OR k = 3 OR k = 4 OR k = 5)
"ccddl_1_prt_p1_check" CHECK (j >= 1 AND j < 10)
Inherits: ccddl_1_prt_p1
Has OIDs: no
Options: appendonly=true, orientation=column
Distributed by: (i)
gpadmin=# \d+ ccddl_1_prt_p3_2_prt_sp1
Append-Only Columnar Table "public.ccddl_1_prt_p3_2_prt_sp1"
Column | Type | Modifiers | Storage | Compression Type | Compression Level | Block Size | Description
--------+---------+-----------+---------+------------------+-------------------+------------+-------------
i | integer | | plain | zlib | 1 | 32768 |
j | integer | | plain | quicklz | 1 | 32768 |
k | integer | | plain | zlib | 1 | 32768 |
l | integer | | plain | zlib | 1 | 32768 |
Checksum: t
Check constraints:
"ccddl_1_prt_p3_2_prt_sp1_check" CHECK (k = 1 OR k = 2 OR k = 3 OR k = 4 OR k = 5)
"ccddl_1_prt_p3_check" CHECK (j >= 20 AND j < 30)
Inherits: ccddl_1_prt_p3
Has OIDs: no
Options: appendonly=true, orientation=column
Distributed by: (i)
通过以上的实例看到新增p3的sp1子分区ccddl_1_prt_p3_2_prt_sp1继承了子分区设置的压缩属性