|
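I've noticed that AWS Redshift recommends column compression encodings that differ from the ones it creates automatically when loading data (via COPY) into an empty table.
For example, I created a table and loaded data from S3 as follows: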

CREATE TABLE Client (
    Id               varchar(511),
    ClientId         integer,
    CreatedOn        timestamp,
    UpdatedOn        timestamp,
    DeletedOn        timestamp,
    LockVersion      integer,
    RegionId         varchar(511),
    OfficeId         varchar(511),
    CountryId        varchar(511),
    FirstContactDate timestamp,
    DidExistPreCirts boolean,
    IsActive         boolean,
    StatusReason     integer,
    CreatedById      varchar(511),
    IsLocked         boolean,
    LockType         integer,
    KeyWorker        varchar(511),
    InactiveDate     timestamp,
    Current_Flag     varchar(511)
);

Table Client created
Execution time: 0.3s

copy Client from 's3://<strong>//Client.csv' credentials 'aws_access_key_id=; aws_secret_access_key=' csv fillrecord truncatecolumns ignoreheader 1 timeformat as 'YYYY-MM-DDTHH:MI:SS' gzip acceptinvchars compupdate on region 'ap-southeast-2';

Warnings:
Load into table 'client' completed, 24284 record(s) loaded successfully.
Load into table 'client' completed, 6 record(s) were loaded with replacements made for ACCEPTINVCHARS. Check 'stl_replacements' system table for details.

0 rows affected
COPY executed successfully

Execution time: 3.39s

Having done this, I can look at the column compression encodings that COPY applied:

select "column", type, encoding, distkey, sortkey, "notnull" from pg_table_def where tablename = 'client';

which gives:

╔══════════════════╦═════════════════════════════╦══════════╦═════════╦═════════╦═════════╗
║ column           ║ type                        ║ encoding ║ distkey ║ sortkey ║ notnull ║
╠══════════════════╬═════════════════════════════╬══════════╬═════════╬═════════╬═════════╣
║ id               ║ character varying(511)      ║ lzo      ║ false   ║ 0       ║ false   ║
║ clientid         ║ integer                     ║ delta    ║ false   ║ 0       ║ false   ║
║ createdon        ║ timestamp without time zone ║ lzo      ║ false   ║ 0       ║ false   ║
║ updatedon        ║ timestamp without time zone ║ lzo      ║ false   ║ 0       ║ false   ║
║ deletedon        ║ timestamp without time zone ║ none     ║ false   ║ 0       ║ false   ║
║ lockversion      ║ integer                     ║ delta    ║ false   ║ 0       ║ false   ║
║ regionid         ║ character varying(511)      ║ lzo      ║ false   ║ 0       ║ false   ║
║ officeid         ║ character varying(511)      ║ lzo      ║ false   ║ 0       ║ false   ║
║ countryid        ║ character varying(511)      ║ lzo      ║ false   ║ 0       ║ false   ║
║ firstcontactdate ║ timestamp without time zone ║ lzo      ║ false   ║ 0       ║ false   ║
║ didexistprecirts ║ boolean                     ║ none     ║ false   ║ 0       ║ false   ║
║ isactive         ║ boolean                     ║ none     ║ false   ║ 0       ║ false   ║
║ statusreason     ║ integer                     ║ none     ║ false   ║ 0       ║ false   ║
║ createdbyid      ║ character varying(511)      ║ lzo      ║ false   ║ 0       ║ false   ║
║ islocked         ║ boolean                     ║ none     ║ false   ║ 0       ║ false   ║
║ locktype         ║ integer                     ║ lzo      ║ false   ║ 0       ║ false   ║
║ keyworker        ║ character varying(511)      ║ lzo      ║ false   ║ 0       ║ false   ║
║ inactivedate     ║ timestamp without time zone ║ lzo      ║ false   ║ 0       ║ false   ║
║ current_flag     ║ character varying(511)      ║ lzo      ║ false   ║ 0       ║ false   ║
╚══════════════════╩═════════════════════════════╩══════════╩═════════╩═════════╩═════════╝

I can then run:

analyze compression client;

which gives:

╔════════╦══════════════════╦══════════╦═══════════════════╗
║ Table  ║ Column           ║ Encoding ║ Est_reduction_pct ║
╠════════╬══════════════════╬══════════╬═══════════════════╣
║ client ║ id               ║ zstd     ║ 40.59             ║
║ client ║ clientid         ║ delta    ║ 0.00              ║
║ client ║ createdon        ║ zstd     ║ 19.85             ║
║ client ║ updatedon        ║ zstd     ║ 12.59             ║
║ client ║ deletedon        ║ raw      ║ 0.00              ║
║ client ║ lockversion      ║ zstd     ║ 39.12             ║
║ client ║ regionid         ║ zstd     ║ 54.47             ║
║ client ║ officeid         ║ zstd     ║ 88.84             ║
║ client ║ countryid        ║ zstd     ║ 79.13             ║
║ client ║ firstcontactdate ║ zstd     ║ 22.31             ║
║ client ║ didexistprecirts ║ raw      ║ 0.00              ║
║ client ║ isactive         ║ raw      ║ 0.00              ║
║ client ║ statusreason     ║ raw      ║ 0.00              ║
║ client ║ createdbyid      ║ zstd     ║ 52.43             ║
║ client ║ islocked         ║ raw      ║ 0.00              ║
║ client ║ locktype         ║ zstd     ║ 63.01             ║
║ client ║ keyworker        ║ zstd     ║ 38.79             ║
║ client ║ inactivedate     ║ zstd     ║ 25.40             ║
║ client ║ current_flag     ║ zstd     ║ 90.51             ║
╚════════╩══════════════════╩══════════╩═══════════════════╝

i.e. quite different results.
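
I'm keen to know why this would be the case. I realize that 24K records is fewer than the 100K that AWS specifies as required for a meaningful compression analysis sample, but it still seems strange that COPY and ANALYZE give different results for the same 24K-row table.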
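
One way to probe this further would be to recreate the table with the encodings that ANALYZE COMPRESSION suggested and reload it with automatic compression switched off, so that COPY cannot overwrite them. The following is only a minimal sketch: the table name client_zstd is hypothetical, and the S3 path and credentials are placeholders rather than values from the load above. The encodings are taken from the ANALYZE COMPRESSION output above, and the column names follow the pg_table_def listing.

```sql
-- Sketch only: client_zstd is a hypothetical name used for illustration.
-- Each ENCODE value is copied from the ANALYZE COMPRESSION output above.
CREATE TABLE client_zstd (
    Id               varchar(511) ENCODE zstd,
    ClientId         integer      ENCODE delta,
    CreatedOn        timestamp    ENCODE zstd,
    UpdatedOn        timestamp    ENCODE zstd,
    DeletedOn        timestamp    ENCODE raw,
    LockVersion      integer      ENCODE zstd,
    RegionId         varchar(511) ENCODE zstd,
    OfficeId         varchar(511) ENCODE zstd,
    CountryId        varchar(511) ENCODE zstd,
    FirstContactDate timestamp    ENCODE zstd,
    DidExistPreCirts boolean      ENCODE raw,
    IsActive         boolean      ENCODE raw,
    StatusReason     integer      ENCODE raw,
    CreatedById      varchar(511) ENCODE zstd,
    IsLocked         boolean      ENCODE raw,
    LockType         integer      ENCODE zstd,
    KeyWorker        varchar(511) ENCODE zstd,
    InactiveDate     timestamp    ENCODE zstd,
    Current_Flag     varchar(511) ENCODE zstd
);

-- Same load options as the original COPY, except COMPUPDATE OFF, which
-- disables automatic compression so the declared encodings are kept.
-- The bucket, path, key and secret below are placeholders.
COPY client_zstd
FROM 's3://<bucket>/<path>/Client.csv'
CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
CSV FILLRECORD TRUNCATECOLUMNS IGNOREHEADER 1
TIMEFORMAT AS 'YYYY-MM-DDTHH:MI:SS'
GZIP ACCEPTINVCHARS COMPUPDATE OFF
REGION 'ap-southeast-2';

-- Confirm which encodings the new table ended up with.
SELECT "column", type, encoding
FROM pg_table_def
WHERE tablename = 'client_zstd';
```

With both tables loaded from the same file, the two encoding choices can be compared side by side instead of relying on the estimated reduction percentages alone.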
Solution: |
|