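Hive (before 2.2.0) does not support non-equi joins. Wrong: select * from a inner join b on a.id<>b.id; A rewrite such as select * from a inner join b on a.id=b.id and a.id is null; does not work: it always returns no rows, because a.id=b.id can only be true when a.id is not NULL. Workaround: take the Cartesian product and move the inequality into WHERE: select * from a cross join b where a.id<>b.id; If the real goal is the rows of a that have no matching id in b, use an anti-join instead: select a.* from a left outer join b on a.id=b.id where b.id is null;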
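The following is a small Python model, not Hive itself, of the non-equi join problem. It compares the intended a.id <> b.id join with the a.id = b.id and a.id is null rewrite, a cross join filtered by WHERE, and a left-outer anti-join. The sample tables a and b and the helpers sql_eq, sql_ne and inner_join are hypothetical; None stands in for SQL NULL, and a row pair is kept only when its predicate is TRUE, as in SQL.

```python
from itertools import product

# Hypothetical sample tables: each row is (id, name); None models SQL NULL.
a = [(1, "x"), (2, "y"), (None, "z")]
b = [(1, "x"), (3, "w")]


def sql_eq(x, y):
    """Three-valued x = y: NULL (None) if either side is NULL."""
    return None if x is None or y is None else x == y


def sql_ne(x, y):
    """Three-valued x <> y: NULL (None) if either side is NULL."""
    return None if x is None or y is None else x != y


def inner_join(left, right, on):
    """Nested-loop model: keep pairs whose ON predicate is TRUE (FALSE/NULL are dropped)."""
    return [(l, r) for l, r in product(left, right) if on(l, r) is True]


# Intended: a inner join b on a.id <> b.id
intended = inner_join(a, b, lambda l, r: sql_ne(l[0], r[0]))

# Broken rewrite: on a.id = b.id and a.id is null (can never be TRUE)
broken = inner_join(a, b, lambda l, r: sql_eq(l[0], r[0]) is True and l[0] is None)

# Rewrite: a cross join b where a.id <> b.id
cross = list(product(a, b))
cross_where = [(l, r) for l, r in cross if sql_ne(l[0], r[0]) is True]

# Anti-join: a left outer join b on a.id = b.id where b.id is null
b_ids = {r[0] for r in b if r[0] is not None}
anti_join = [l for l in a if l[0] is None or l[0] not in b_ids]

print(len(intended), len(broken), len(cross_where), len(anti_join))  # 3 0 3 2
```

The cross join with a WHERE filter yields exactly the pairs the non-equi join would, while the a.id is null rewrite yields nothing. Note that a full Cartesian product grows as |a| x |b|, so it is only practical when at least one side is small.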
Older Hive versions (before 0.13.0) do not support implicit, comma-separated join syntax. Wrong: select * from dual a, dual b where a.key = b.key; Correct: select * from dual a join dual b on a.key = b.key;
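Hive (before 2.2.0) does not support OR in a join condition. Wrong: select * from a inner join b on a.id=b.id or a.name=b.name; Workaround: split the condition into two equi-joins and combine them with union all. A plain union all returns any pair that satisfies both conditions twice, so exclude those pairs from the second branch: select * from a inner join b on a.id=b.id union all select * from a inner join b on a.name=b.name where a.id<>b.id or a.id is null or b.id is null;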
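To check the OR-join rewrite, here is a minimal Python model (not Hive) of three-valued join predicates. The tables a and b and the helpers sql_eq, sql_or, join, id_eq and name_eq are hypothetical; None stands in for SQL NULL. It compares the intended OR join with a plain union all of two equi-joins and with a union all whose second branch drops pairs that already matched on id.

```python
from collections import Counter
from itertools import product

# Hypothetical sample tables: each row is (id, name); None models SQL NULL.
a = [(1, "x"), (2, "y"), (None, "z")]
b = [(1, "x"), (5, "y"), (None, "z")]


def sql_eq(x, y):
    """Three-valued x = y: NULL (None) if either side is NULL."""
    return None if x is None or y is None else x == y


def sql_or(p, q):
    """Three-valued OR over True/False/None."""
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False


def join(left, right, pred):
    """Keep (l, r) pairs whose join predicate is TRUE."""
    return [(l, r) for l, r in product(left, right) if pred(l, r) is True]


def id_eq(l, r):
    return sql_eq(l[0], r[0])


def name_eq(l, r):
    return sql_eq(l[1], r[1])


# Intended: on a.id = b.id or a.name = b.name
intended = join(a, b, lambda l, r: sql_or(id_eq(l, r), name_eq(l, r)))

# Plain union all of the two equi-joins: pairs matching both appear twice
plain_union = join(a, b, id_eq) + join(a, b, name_eq)

# Second branch keeps only pairs where a.id = b.id is not TRUE, i.e.
# where a.id <> b.id or a.id is null or b.id is null
fixed_union = join(a, b, id_eq) + join(
    a, b, lambda l, r: name_eq(l, r) is True and id_eq(l, r) is not True
)

print(len(intended), len(plain_union), len(fixed_union))  # 3 4 3
```

The pair ((1, "x"), (1, "x")) matches on both id and name, which is why the plain union all has one extra row. Using union instead of union all would also remove it, but union deduplicates every identical row, including duplicates that legitimately exist in the source tables.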
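sort by, order by, distribute by: order by performs a global sort. All rows are sent to a single reducer, which then sorts them; on large data sets this can exceed that one node's disk and memory capacity and make the job fail. distribute by + sort by is the alternative: the distribute by column acts as the key, rows are hash-partitioned across reducers by that key, and sort by then sorts the rows within each reducer. The result is therefore sorted only within each reducer, not globally.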
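A minimal Python sketch of the two strategies, assuming a hypothetical table of (user_id, ts) rows and two reducers. order_by puts every row in one partition and sorts it; distribute_sort_by hash-partitions rows by the distribute by key and sorts each partition separately. Python's built-in hash() is only a stand-in for Hive's own partitioning hash, so real reducer assignments will differ.

```python
# Hypothetical rows: (user_id, ts).
rows = [(3, 30), (1, 10), (2, 25), (1, 5), (3, 7), (2, 1)]
NUM_REDUCERS = 2  # hypothetical reducer count


def order_by(rows, sort_key):
    """ORDER BY: a single reducer receives every row and sorts them globally."""
    return [sorted(rows, key=sort_key)]


def distribute_sort_by(rows, dist_key, sort_key, n):
    """DISTRIBUTE BY dist_key SORT BY sort_key: hash-partition rows across n
    reducers by dist_key, then sort each reducer's rows locally."""
    partitions = [[] for _ in range(n)]
    for row in rows:
        # Python's hash() stands in for Hive's own partitioning hash.
        partitions[hash(dist_key(row)) % n].append(row)
    return [sorted(p, key=sort_key) for p in partitions]


def by_user(row):
    return row[0]


def by_user_ts(row):
    return (row[0], row[1])


global_sort = order_by(rows, by_user_ts)
local_sorts = distribute_sort_by(rows, by_user, by_user_ts, NUM_REDUCERS)

print(global_sort)  # [[(1, 5), (1, 10), (2, 1), (2, 25), (3, 7), (3, 30)]]
for i, part in enumerate(local_sorts):
    print(i, part)
# 0 [(2, 1), (2, 25)]
# 1 [(1, 5), (1, 10), (3, 7), (3, 30)]
```

All rows for one user_id land on the same reducer and are sorted there, and the work is spread over several reducers instead of one. Concatenating the reducers' outputs, however, is not in global order.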