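HBase's hbase.hregion.max.filesize property sets the threshold for splitting a region. Its default value is 268435456 (256 MB). When a column family's store file grows beyond this size, the region is split in two.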
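For reference, 256 MB in bytes is 256 × 1024 × 1024 = 268435456. The sketch below spells out that arithmetic and the size comparison described above. The class and method names are hypothetical, and HBase's real split decision involves more than this one check.

```java
public class SplitThreshold {
    // Default value of hbase.hregion.max.filesize quoted above: 256 MB in bytes.
    static final long DEFAULT_MAX_FILESIZE = 256L * 1024 * 1024;

    // Hypothetical helper: true once a column family's store file has grown past the threshold.
    static boolean exceedsSplitThreshold(long storeFileBytes, long maxFileSize) {
        return storeFileBytes > maxFileSize;
    }

    public static void main(String[] args) {
        System.out.println(DEFAULT_MAX_FILESIZE);                                            // 268435456
        System.out.println(exceedsSplitThreshold(300L * 1024 * 1024, DEFAULT_MAX_FILESIZE)); // true
        System.out.println(exceedsSplitThreshold(100L * 1024 * 1024, DEFAULT_MAX_FILESIZE)); // false
    }
}
```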
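An HBase table can have a great many columns, and there are two ways to lay it out. One is a wide table, where each row has many columns. The other is a narrow table, where each row has few columns.
Take, for example, a table that stores users' emails.
With a wide-table design, all of one user's emails go into a single row, so it looks like this:
userid1 email1 email2 email3 ... ... ... ... ... emailn
userid2 email1 email2 email3 ... ... ... ... ... emailn
useridn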
With a narrow-table design, the rowkey combines the user ID and the email ID, so it looks like this:
userid1_emailid1 email1
userid1_emailid2 email2
userid1_emailid3 email3
userid1_emailidn emailn
userid2_emailid1 email1
userid2_emailid2 email2
userid2_emailid3 email3
userid2_emailidn emailn
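The sketch below models both layouts to make the difference concrete. It builds narrow-table row keys as `userid + "_" + emailid`, following the pattern above. The helper names and the email bodies are hypothetical. In the wide layout there is one row per user. In the narrow layout every email is its own row, so there are far more row boundaries at which data can be divided.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TableLayouts {
    // Hypothetical helper: narrow-table row key = user ID + "_" + email ID.
    static String narrowRowKey(String userId, String emailId) {
        return userId + "_" + emailId;
    }

    // Wide layout: one row per user, one column per email.
    static Map<String, Map<String, String>> wide(List<String> users, int emailsPerUser) {
        Map<String, Map<String, String>> rows = new TreeMap<>();
        for (String user : users) {
            Map<String, String> columns = new TreeMap<>();
            for (int i = 1; i <= emailsPerUser; i++) {
                columns.put("email" + i, "body of email " + i);
            }
            rows.put(user, columns);
        }
        return rows;
    }

    // Narrow layout: one row per (user, email) pair, each row holding a single column.
    static Map<String, Map<String, String>> narrow(List<String> users, int emailsPerUser) {
        Map<String, Map<String, String>> rows = new TreeMap<>();
        for (String user : users) {
            for (int i = 1; i <= emailsPerUser; i++) {
                Map<String, String> columns = new TreeMap<>();
                columns.put("email", "body of email " + i);
                rows.put(narrowRowKey(user, "emailid" + i), columns);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> users = List.of("userid1", "userid2");
        // TreeMap keeps row keys sorted, as HBase does.
        System.out.println(wide(users, 3).size());   // 2
        System.out.println(narrow(users, 3).size()); // 6
        System.out.println(narrow(users, 3).keySet());
    }
}
```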
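These two designs affect region splitting differently. Today I was reading the HFileOutputFormat code and noticed a restriction in the RecordWriter it creates: it starts a new file only when the rowkey changes. As long as the rowkey stays the same, it keeps writing to the current file, even after the written size has exceeded hbase.hregion.max.filesize.
The RecordWriter code: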
```java
public void write(ImmutableBytesWritable row, KeyValue kv)
    throws IOException {
  long length = kv.getLength();
  byte [] family = kv.getFamily();
  WriterLength wl = this.writers.get(family);
  // && binds tighter than ||: open a new writer if there is none yet, or if
  // the current file has reached maxsize AND this cell starts a new row.
  if (wl == null || ((length + wl.written) >= maxsize) &&
      Bytes.compareTo(this.previousRow, 0, this.previousRow.length,
        kv.getBuffer(), kv.getRowOffset(), kv.getRowLength()) != 0) {
    // Get a new writer.
    Path basedir = new Path(outputdir, Bytes.toString(family));
    if (wl == null) {
      wl = new WriterLength();
      this.writers.put(family, wl);
      if (this.writers.size() > 1) throw new IOException("One family only");
      // If wl == null, first file in family. Ensure family dir exists.
      if (!fs.exists(basedir)) fs.mkdirs(basedir);
    }
    wl.writer = getNewWriter(wl.writer, basedir);
    LOG.info("Writer=" + wl.writer.getPath() +
      ((wl.written == 0)? "": ", wrote=" + wl.written));
    wl.written = 0;
  }
  kv.updateLatestStamp(this.now);
  wl.writer.append(kv);
  wl.written += length;
  // Copy the row so we know when a row transition.
  this.previousRow = kv.getRow();
}
```
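In Java, `&&` binds more tightly than `||`, so the condition above reads as `wl == null || (sizeReached && rowChanged)`. The sketch below pulls that expression out into a standalone function so each combination can be checked. The name `needsNewWriter` and its parameters are hypothetical stand-ins for the fields used in `write()`.

```java
public class RollCondition {
    // Same shape as the if-condition in write(): && binds tighter than ||,
    // so this is noWriter || (sizeReached && rowChanged).
    static boolean needsNewWriter(boolean noWriter, long written, long length,
                                  long maxsize, boolean rowChanged) {
        return noWriter || (length + written) >= maxsize && rowChanged;
    }

    public static void main(String[] args) {
        long max = 10;
        // First cell of a family: always opens a writer.
        System.out.println(needsNewWriter(true, 0, 36, max, false));    // true
        // Over the limit but still on the same row: keep writing to the current file.
        System.out.println(needsNewWriter(false, 360, 36, max, false)); // false
        // Over the limit and the row has changed: roll to a new file.
        System.out.println(needsNewWriter(false, 36, 36, max, true));   // true
        // Row changed but still under the limit: no roll.
        System.out.println(needsNewWriter(false, 0, 4, max, true));     // false
    }
}
```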
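The `if` condition shows that a new file is started only when two things are both true: the written size has reached hbase.hregion.max.filesize, and the current row differs from the previously written row. The same principle applies to regions, which are only ever split at a row boundary. Two consequences follow:
1. With a wide table, a single row larger than hbase.hregion.max.filesize will not be split.
2. If many versions are written under the same rowkey, there is no split either, even when their total size exceeds hbase.hregion.max.filesize.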
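Both consequences can be checked by replaying the same rule over a sequence of cells. In this sketch the class and method names are hypothetical, and each cell is reduced to its row key plus an assumed size of 36 bytes. The column qualifier and timestamp are left out because the `if` condition never looks at them. To the rule, a wide row (case 1) and many versions of one row (case 2) therefore look identical.

```java
import java.util.ArrayList;
import java.util.List;

public class RollReplay {
    // One cell to write: its row key and serialized length in bytes.
    static final class Cell {
        final String row;
        final long length;

        Cell(String row, long length) {
            this.row = row;
            this.length = length;
        }
    }

    // Replays the rollover rule of the RecordWriter and returns how many files
    // would be opened. Column and timestamp are not modeled: the rule ignores them.
    static int countFiles(List<Cell> cells, long maxsize) {
        int files = 0;
        long written = 0;
        String previousRow = null;
        for (Cell c : cells) {
            boolean noWriter = files == 0;
            boolean rowChanged = previousRow != null && !previousRow.equals(c.row);
            if (noWriter || (c.length + written) >= maxsize && rowChanged) {
                files++;
                written = 0; // the new file starts empty
            }
            written += c.length;
            previousRow = c.row;
        }
        return files;
    }

    public static void main(String[] args) {
        long maxsize = 10;                      // 10 bytes, as in the experiment's hbase-site.xml
        List<Cell> sameRow = new ArrayList<>(); // cases 1 and 2: every cell in row1
        List<Cell> newRows = new ArrayList<>(); // contrast: one cell per row
        for (int i = 0; i < 10; i++) {
            sameRow.add(new Cell("row1", 36));  // 36 bytes: assumed size of one cell
            newRows.add(new Cell("row" + i, 36));
        }
        System.out.println(countFiles(sameRow, maxsize)); // 1
        System.out.println(countFiles(newRows, maxsize)); // 10
    }
}
```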
Let's verify this.
To see the effect quickly, change two settings in hbase-site.xml:
```xml
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>5</value>
  <description>
  Memstore will be flushed to disk if size of the memstore
  exceeds this number of bytes. Value is checked by a thread that runs
  every hbase.server.thread.wakefrequency.
  </description>
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10</value>
  <description>
  Maximum HStoreFile size. If any one of a column families' HStoreFiles has
  grown to exceed this value, the hosting HRegion is split in two.
  Default: 256M.
  </description>
</property>
```
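These tiny values work because even a single cell from the test that follows is larger than both limits. The sketch below computes one cell's serialized size under an assumed HBase 0.90-era `KeyValue` layout: 4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type, then the value. Memstore accounting and HFile overhead add more on top, so treat the result as a lower bound.

```java
import java.nio.charset.StandardCharsets;

public class KeyValueSize {
    // Assumed KeyValue layout: keyLength(4) + valueLength(4) + key + value, where
    // key = rowLength(2) + row + familyLength(1) + family + qualifier + timestamp(8) + type(1).
    static long keyValueLength(String row, String family, String qualifier, String value) {
        long key = 2 + bytes(row) + 1 + bytes(family) + bytes(qualifier) + 8 + 1;
        return 4 + 4 + key + bytes(value);
    }

    static long bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        long oneCell = keyValueLength("row1", "f1", "c0", "swallow0");
        System.out.println(oneCell);      // 36
        System.out.println(oneCell > 5);  // true: above the 5-byte flush size
        System.out.println(oneCell > 10); // true: above the 10-byte max file size
        System.out.println(10 * oneCell); // 360 bytes for ten such cells
    }
}
```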
Create the test tables t1 and t2:
```
hbase(main):076:0* create 't1','f1'
0 row(s) in 1.6460 seconds
hbase(main):077:0> create 't2','f1'
0 row(s) in 1.1790 seconds
```
Check the system table .META.:
```
hbase(main):081:0* scan '.META.'
ROW COLUMN+CELL
t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:regioninfo, timestamp=1314720667384, value=REGION => {NAME => 't1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad.', STARTKEY => '', ENDK
. EY => '', ENCODED => d8acd6bc659ac8326b88850d645a90ad, TABLE => {{NAME => 't1', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
=> '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:server, timestamp=1314720667941, value=yinjie:60020
.
t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:serverstartcode, timestamp=1314720667941, value=1314716290123
.
t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:regioninfo, timestamp=1314720672241, value=REGION => {NAME => 't2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.', STARTKEY => '', ENDK
. EY => '', ENCODED => 16bb3d2563eab3b4e25477c64e007e71, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
=> '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:server, timestamp=1314720672346, value=yinjie:60020
.
t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:serverstartcode, timestamp=1314720672346, value=1314716290123
.
2 row(s) in 0.0230 seconds
```
At this point t1 and t2 each have one region.
First, insert 10 records into t1, all under the same rowkey (row1, columns c0 to c9):
```
hbase(main):086:0* for i in 0..9 do\
hbase(main):087:1* put 't1','row1',"f1:c#{i}","swallow#{i}"\
hbase(main):088:1* end
0 row(s) in 0.0180 seconds
0 row(s) in 0.0070 seconds
0 row(s) in 0.0420 seconds
0 row(s) in 0.0620 seconds
0 row(s) in 0.0120 seconds
0 row(s) in 0.0770 seconds
0 row(s) in 0.0150 seconds
0 row(s) in 0.1290 seconds
0 row(s) in 10.0740 seconds
0 row(s) in 0.1230 seconds
=> 0..9
hbase(main):089:0>
```
Scan t1:
```
hbase(main):089:0> scan 't1'
ROW COLUMN+CELL
row1 column=f1:c0, timestamp=1314720946495, value=swallow0
row1 column=f1:c1, timestamp=1314720946507, value=swallow1
row1 column=f1:c2, timestamp=1314720946903, value=swallow2
row1 column=f1:c3, timestamp=1314720946939, value=swallow3
row1 column=f1:c4, timestamp=1314720946976, value=swallow4
row1 column=f1:c5, timestamp=1314720947055, value=swallow5
row1 column=f1:c6, timestamp=1314720947070, value=swallow6
row1 column=f1:c7, timestamp=1314720947198, value=swallow7
row1 column=f1:c8, timestamp=1314720957272, value=swallow8
row1 column=f1:c9, timestamp=1314720957392, value=swallow9
1 row(s) in 0.0300 seconds
```
Check .META. again:
```
hbase(main):090:0> scan '.META.'
ROW COLUMN+CELL
t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:regioninfo, timestamp=1314720667384, value=REGION => {NAME => 't1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad.', STARTKEY => '', ENDK
. EY => '', ENCODED => d8acd6bc659ac8326b88850d645a90ad, TABLE => {{NAME => 't1', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
=> '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:server, timestamp=1314720667941, value=yinjie:60020
.
t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:serverstartcode, timestamp=1314720667941, value=1314716290123
.
t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:regioninfo, timestamp=1314720672241, value=REGION => {NAME => 't2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.', STARTKEY => '', ENDK
. EY => '', ENCODED => 16bb3d2563eab3b4e25477c64e007e71, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
=> '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:server, timestamp=1314720672346, value=yinjie:60020
.
t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:serverstartcode, timestamp=1314720672346, value=1314716290123
.
2 row(s) in 0.0210 seconds
```
t1 still has only one region.
Next, insert the same 10 records into t2, but this time each under a different rowkey:
```
hbase(main):091:0> for i in 0..9 do\
hbase(main):092:1* put 't2',"row#{i}","f1:c#{i}","swallow#{i}"\
hbase(main):093:1* end
0 row(s) in 0.1140 seconds
0 row(s) in 0.0080 seconds
0 row(s) in 0.0410 seconds
0 row(s) in 0.0820 seconds
0 row(s) in 0.0210 seconds
0 row(s) in 0.0410 seconds
0 row(s) in 0.0200 seconds
0 row(s) in 0.1210 seconds
0 row(s) in 0.0140 seconds
0 row(s) in 0.0360 seconds
=> 0..9
```
Scan t2:
```
hbase(main):097:0* scan 't2'
ROW COLUMN+CELL
row0 column=f1:c0, timestamp=1314721110769, value=swallow0
row1 column=f1:c1, timestamp=1314721110787, value=swallow1
row2 column=f1:c2, timestamp=1314721110830, value=swallow2
row3 column=f1:c3, timestamp=1314721110916, value=swallow3
row4 column=f1:c4, timestamp=1314721110932, value=swallow4
row5 column=f1:c5, timestamp=1314721110971, value=swallow5
row6 column=f1:c6, timestamp=1314721110989, value=swallow6
row7 column=f1:c7, timestamp=1314721111121, value=swallow7
row8 column=f1:c8, timestamp=1314721111130, value=swallow8
row9 column=f1:c9, timestamp=1314721111172, value=swallow9
10 row(s) in 1.0450 seconds
```
Check .META. again:
```
hbase(main):102:0> scan '.META.'
ROW COLUMN+CELL
t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:regioninfo, timestamp=1314720667384, value=REGION => {NAME => 't1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad.', STARTKEY => '', ENDK
. EY => '', ENCODED => d8acd6bc659ac8326b88850d645a90ad, TABLE => {{NAME => 't1', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
=> '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:server, timestamp=1314720667941, value=yinjie:60020
.
t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad column=info:serverstartcode, timestamp=1314720667941, value=1314716290123
.
t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:regioninfo, timestamp=1314721112130, value=REGION => {NAME => 't2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.', STARTKEY => '', ENDK
. EY => '', ENCODED => 16bb3d2563eab3b4e25477c64e007e71, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILT
ER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOC
KCACHE => 'true'}]}}
t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:server, timestamp=1314720672346, value=yinjie:60020
.
t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:serverstartcode, timestamp=1314720672346, value=1314716290123
.
t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:splitA, timestamp=1314721112130, value=REGION => {NAME => 't2,,1314721111490.71df02214242923574b71fe5e2a19360.', STARTKEY => '', ENDKEY =
. > 'row0', ENCODED => 71df02214242923574b71fe5e2a19360, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
=> '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71 column=info:splitB, timestamp=1314721112130, value=REGION => {NAME => 't2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b061ca.', STARTKEY => 'row0',
. ENDKEY => '', ENCODED => 915ee8d4a32c59a4ec3960e335b061ca, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SC
OPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
t2,,1314721111490.71df02214242923574b71fe5e2a19360 column=info:regioninfo, timestamp=1314721112267, value=REGION => {NAME => 't2,,1314721111490.71df02214242923574b71fe5e2a19360.', STARTKEY => '', ENDK
. EY => 'row0', ENCODED => 71df02214242923574b71fe5e2a19360, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SC
OPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
t2,,1314721111490.71df02214242923574b71fe5e2a19360 column=info:server, timestamp=1314721112267, value=yinjie:60020
.
t2,,1314721111490.71df02214242923574b71fe5e2a19360 column=info:serverstartcode, timestamp=1314721112267, value=1314716290123
.
t2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b0 column=info:regioninfo, timestamp=1314721112627, value=REGION => {NAME => 't2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b061ca.', STARTKEY => 'row
61ca. 0', ENDKEY => '', ENCODED => 915ee8d4a32c59a4ec3960e335b061ca, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATIO
N_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
t2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b0 column=info:server, timestamp=1314721112627, value=yinjie:60020
61ca.
t2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b0 column=info:serverstartcode, timestamp=1314721112627, value=1314716290123
61ca.
4 row(s) in 0.0380 seconds
```
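The t2 region has split. The parent region is now marked OFFLINE => true, SPLIT => true. Its two daughter regions (splitA and splitB) cover the rowkey ranges ending at 'row0' and starting at 'row0'.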
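Why could t2 split when t1 could not? Every row lives in exactly one region, so a region can only be divided at a row key. The sketch below is a simplified model built on that point, not HBase's actual split code. It takes the row of the middle cell as the candidate split row. It refuses to split when the first, middle and last cells all share one row key, which is the situation in t1. The names are hypothetical. The model is not meant to reproduce the split key row0 seen above, which depends on which store files existed when the split happened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class SplitPoint {
    // sortedCellRows: the row key of every cell in the region, in sorted order.
    // Returns the row at which the region could be split, or empty if it cannot.
    static Optional<String> splitRow(List<String> sortedCellRows) {
        if (sortedCellRows.isEmpty()) {
            return Optional.empty();
        }
        String first = sortedCellRows.get(0);
        String last = sortedCellRows.get(sortedCellRows.size() - 1);
        String mid = sortedCellRows.get(sortedCellRows.size() / 2);
        // If first, middle and last cells share one row key, every cell is in
        // that row and there is no row boundary to split at.
        if (mid.equals(first) && mid.equals(last)) {
            return Optional.empty();
        }
        return Optional.of(mid);
    }

    public static void main(String[] args) {
        List<String> t1 = new ArrayList<>();
        List<String> t2 = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            t1.add("row1");    // ten columns, all in row1
            t2.add("row" + i); // one column in each of row0..row9
        }
        System.out.println(splitRow(t1)); // Optional.empty
        System.out.println(splitRow(t2)); // Optional[row5]
    }
}
```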