This article walks through the common methods for importing Hive data into HBase, with detailed, reproducible examples for reference.
There are basically two approaches to importing Hive data into HBase:

1. Create a table in HBase, then create an external table in Hive mapped to it; data written through Hive then shows up in HBase as well.
2. Read the Hive data with a MapReduce job and write it into HBase (through the API, or as HFiles via Bulkload).
Creating the HBase table
(1) Create a table named classes with one column family, user

```
create 'classes','user'
```
(2) Inspect the table's structure

```
hbase(main):005:0> describe 'classes'
DESCRIPTION                                                                        ENABLED
 'classes', {NAME => 'user', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',  true
 REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
```
(3) Insert two rows of data

```
put 'classes','001','user:name','jack'
put 'classes','001','user:age','20'
put 'classes','002','user:name','liza'
put 'classes','002','user:age','18'
```
(4) Scan the data in classes

```
hbase(main):016:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
```
(5) Create the external Hive table and verify with a query

```
create external table classes(id int, name string, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,user:name,user:age")
TBLPROPERTIES("hbase.table.name" = "classes");

select * from classes;
OK
1    jack    20
2    liza    18
```
(6) Add more data to HBase

```
put 'classes','003','user:age','1820183291839132'

hbase(main):025:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
 003   column=user:age, timestamp=1404981476497, value=1820183291839132
```
(7) Query from Hive and look at the new row

```
select * from classes;
OK
1    jack    20
2    liza    18
3    NULL    NULL
```

Row 003 comes back as NULL for name because that cell doesn't exist, and NULL for age because 1820183291839132 exceeds int's maximum value.
(8) Verify this behavior further

```
put 'classes','004','user:name','test'
put 'classes','004','user:age','1820183291839112312'   -- already beyond int range

hbase(main):030:0> scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
 003   column=user:age, timestamp=1404981476497, value=1820183291839132
 004   column=user:age, timestamp=1404981558125, value=1820183291839112312
 004   column=user:name, timestamp=1404981551508, value=test

select * from classes;
1    jack    20
2    liza    18
3    NULL    NULL
4    test    NULL     -- values beyond int range are also treated as NULL

put 'classes','005','user:age','1231342'

hbase(main):034:0* scan 'classes'
ROW    COLUMN+CELL
 001   column=user:age, timestamp=1404980824151, value=20
 001   column=user:name, timestamp=1404980772073, value=jack
 002   column=user:age, timestamp=1404980963764, value=18
 002   column=user:name, timestamp=1404980953897, value=liza
 003   column=user:age, timestamp=1404981476497, value=1820183291839132
 004   column=user:age, timestamp=1404981558125, value=1820183291839112312
 004   column=user:name, timestamp=1404981551508, value=test
 005   column=user:age, timestamp=1404981720600, value=1231342

select * from classes;
1    jack    20
2    liza    18
3    NULL    NULL
4    test    NULL
5    NULL    1231342
```
Notes:
1. Empty HBase cells come back as NULL in Hive.
2. Fields that don't line up between Hive and HBase (missing cells, or values the mapped Hive type can't hold) are also filled with NULL.
3. For binary (Bytes-encoded) values, append #b to the column mapping when creating the Hive table:
http://stackoverflow.com/questions/12909118/number-type-value-in-hbase-not-recognized-by-hive
http://www.aboutyun.com/thread-8023-1-1.html
4. Mapping an HBase column family to a Hive map:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
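As a sketch of note 3's #b mapping (table and column names reused from the classes example above; syntax per the Hive HBaseIntegration wiki linked above), the DDL could look like this:

```sql
-- Sketch only: the #b suffix tells the HBase storage handler to decode
-- user:age as a binary-encoded number (written via Bytes.toBytes(int))
-- instead of a UTF-8 string.
CREATE EXTERNAL TABLE classes_b(id string, name string, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,user:name,user:age#b")
TBLPROPERTIES("hbase.table.name" = "classes");
```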
There are two common ways for an MR job to write into HBase: one is to call the HBase API directly, writing with Table and Put; the other is to have the MR job generate HFiles and then Bulkload them into HBase, which is the recommended route for large data volumes.
Notes:

1. What if you need to read values out of the Hive partition path?

```java
private String reg = "stat_date=(.*?)\\/softid=([\\d]+)/";
private String stat_date;
private String softid;

// ------------ inside the map() function ------------
String filePathString = ((FileSplit) context.getInputSplit()).getPath().toString();
// /user/hive/warehouse/snapshot.db/stat_all_info/stat_date=20150820/softid=201/000000_0
// parse out stat_date and softid
Pattern pattern = Pattern.compile(reg);
Matcher matcher = pattern.matcher(filePathString);
while (matcher.find()) {
    stat_date = matcher.group(1);
    softid = matcher.group(2);
}
```
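The path parsing above can be checked in isolation; a self-contained sketch (the class name and path are illustrative, taken from the example in the text):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch of the partition-path parsing shown above.
public class HivePathParser {
    private static final Pattern PART = Pattern.compile("stat_date=(.*?)/softid=(\\d+)/");

    // Returns {stat_date, softid} extracted from a Hive warehouse path,
    // or null when the path carries no matching partition spec.
    public static String[] parse(String path) {
        Matcher m = PART.matcher(path);
        if (m.find()) {
            return new String[] { m.group(1), m.group(2) };
        }
        return null;
    }

    public static void main(String[] args) {
        String p = "/user/hive/warehouse/snapshot.db/stat_all_info/stat_date=20150820/softid=201/000000_0";
        String[] parts = parse(p);
        System.out.println(parts[0] + " " + parts[1]); // 20150820 201
    }
}
```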
2. How should Hive map and list columns be handled?

Hive mainly uses eight delimiters, \001 through \008. The defaults are: ^A (\001) between fields, ^B (\002) between collection items, and ^C (\003) between a map key and its value.

A Hive list is stored underneath as jerrick\002liza\002tom\002jerry, and a map as jerrick\00323\002liza\00318\002tom\0030 (shown here with visible escapes; the separators are the control characters above).

So the MR job needs a little preprocessing on read. For a map, for example: "{" + mapkey.replace("\002", ",").replace("\003", ":") + "}", which can then be parsed as JSON and saved to HBase after toString().
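The replace-based conversion just described can be sketched as a small standalone helper (class and method names are illustrative; quoting of keys and values is omitted, exactly as in the text's expression):

```java
// Sketch of turning a raw Hive map field (\002 between entries, \003
// between key and value) into a JSON-ish string, as described above.
public class HiveMapField {
    public static String toJsonish(String raw) {
        return "{" + raw.replace("\002", ",").replace("\003", ":") + "}";
    }

    public static void main(String[] args) {
        // "jerrick" \003 "23" \002 "liza" \003 "18" \002 "tom" \003 "0"
        String raw = "jerrick\00323\002liza\00318\002tom\0030";
        System.out.println(toJsonish(raw)); // {jerrick:23,liza:18,tom:0}
    }
}
```

Note that map keys or values containing a literal comma or colon would break this naive conversion; a real job may need proper JSON escaping.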
3. A simple example; the code is heavily trimmed, for reference only!
```java
public void map(LongWritable key, Text value,
        Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue>.Context context) {
    String filePathString = ((FileSplit) context.getInputSplit()).getPath().toString();
    // /user/hive/warehouse/snapshot.db/stat_all_info/stat_date=20150820/softid=201/000000_0
    // parse out stat_date and softid
    Pattern pattern = Pattern.compile(reg);
    Matcher matcher = pattern.matcher(filePathString);
    while (matcher.find()) {
        stat_date = matcher.group(1);
        softid = matcher.group(2);
    }
    rowMap.put("stat_date", stat_date);
    rowMap.put("softid", softid);

    String[] vals = value.toString().split("\001");
    try {
        Configuration conf = context.getConfiguration();
        String cf = conf.get("hbase.table.cf", HBASE_TABLE_COLUME_FAMILY);

        String arow = rowkey;
        for (int index = 10; index < vals.length; index++) {
            byte[] row = Bytes.toBytes(arow);
            ImmutableBytesWritable k = new ImmutableBytesWritable(row);
            KeyValue kv = new KeyValue();
            if (index == vals.length - 1) {
                // the last field is a dict and needs the delimiter-to-JSON conversion
                logger.info("d is :" + vals[index]);
                logger.info("d is :" + "{" + vals[index].replace("\002", ",").replace("\003", ":") + "}");
                JSONObject json = new JSONObject("{" + vals[index].replace("\002", ",").replace("\003", ":") + "}");
                kv = new KeyValue(row, cf.getBytes(), Bytes.toBytes(valueKeys[index]),
                        Bytes.toBytes(json.toString()));
            } else {
                kv = new KeyValue(row, cf.getBytes(), Bytes.toBytes(valueKeys[index]),
                        Bytes.toBytes(vals[index]));
            }
            context.write(k, kv);
        }
    } catch (Exception e1) {
        context.getCounter("offile2HBase", "Map ERROR").increment(1);
        logger.info("map error:" + e1.toString());
    }
    context.getCounter("offile2HBase", "Map TOTAL").increment(1);
}
```
4. Bulkload the generated HFiles
```java
int jobResult = (job.waitForCompletion(true)) ? 0 : 1;
logger.info("jobResult=" + jobResult);
Boolean bulkloadHfileToHbase = Boolean.valueOf(conf.getBoolean("hbase.table.hfile.bulkload", false));
if ((jobResult == 0) && (bulkloadHfileToHbase.booleanValue())) {
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(outputDir, hTable);
}
```