

Which methods do you override in Hadoop?


This article looks at which methods are overridden when writing Hadoop programs. Many people are unsure exactly which methods need overriding, so the notes below collect the relevant HDFS and MapReduce features together with a worked WordCount example; follow along and try it out.

1.   Download (omitted)

2.   Build (omitted)

3.   Configuration (pseudo-distributed and fully distributed setup omitted)

4.   HDFS

1.   Web interface: http://namenode-name:50070/ (shows the datanode list and cluster statistics)

2.   shell command & dfsadmin command

3.   checkpoint node & backup node

1.   How the fsimage and edits files are merged

2.   (probably an early-version feature) Manually recovering a downed cluster: import checkpoint

3.   backupnode: the Backup Node keeps an in-memory copy of the fsimage synchronized from the Namenode, and it also receives the stream of edits from the Namenode and persists it to disk. It merges those edits with the in-memory fsimage to produce a backup copy of the metadata. This is the secret of the Backup Node's efficiency: it never has to download fsimage and edits from the Namenode; it only needs to persist its in-memory metadata to disk and merge.

4.   Balancer: rebalances data that has become unevenly distributed across racks and datanodes

5.   Rack awareness

6.   Safemode: when block data is incomplete, or safemode is entered manually, HDFS is read-only; once the checked blocks reach the configured threshold, or safemode is left manually, the cluster resumes reads and writes.

7.   Fsck: block/file checking command

8.   Fetchdt: fetches a delegation token (security)

9.   Recovery mode

10. Upgrade and Rollback

11. File Permissions and Security

12. Scalability


5.   MapReduce

1.   WordCount example, overriding map() and reduce():

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Each public class below goes in its own .java file.
public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {

   private Text word = new Text();
   private IntWritable one = new IntWritable(1);

   // override the map method
   @Override
   public void map(Object key, Text value, Context context)
         throws IOException, InterruptedException {
      StringTokenizer stringTokenizer = new StringTokenizer(value.toString());
      while (stringTokenizer.hasMoreTokens()) {
         word.set(stringTokenizer.nextToken());
         // emit (word, 1)
         context.write(word, one);
      }
   }
}

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

   private IntWritable result = new IntWritable(0);

   // override the reduce method
   @Override
   protected void reduce(Text key, Iterable<IntWritable> iterator,
         Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable i : iterator) {
         sum += i.get();
      }
      result.set(sum);
      // write the aggregated count for this word
      context.write(key, result);
   }
}

public class WordCountDemo {

   public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf, "word count");
      job.setJarByClass(WordCountDemo.class);
      // set the mapper, reducer and combiner classes
      job.setMapperClass(MyMapper.class);
      job.setReducerClass(MyReducer.class);
      job.setCombinerClass(MyReducer.class);
      // set the final output key/value types
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      // set the input and output paths (FileInputFormat / FileOutputFormat)
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
   }
}

2. Job.setGroupingComparatorClass(Class).
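
As a hedged illustration of what such a grouping comparator could look like (the "primary#secondary" key layout and the class name are assumptions for this sketch, not part of the original example):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Groups reducer input by the part of the key before '#', so one reduce()
// call sees every value whose key shares that prefix.
public class PrefixGroupingComparator extends WritableComparator {

   protected PrefixGroupingComparator() {
      super(Text.class, true);
   }

   @Override
   public int compare(WritableComparable a, WritableComparable b) {
      String left = a.toString().split("#", 2)[0];
      String right = b.toString().split("#", 2)[0];
      return left.compareTo(right);
   }
}
// Registered on the job: job.setGroupingComparatorClass(PrefixGroupingComparator.class);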

3.  Job.setCombinerClass(Class).

4. CompressionCodec
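
As a sketch of wiring compression into the driver above (conf and job are the driver's variables; Snappy and Gzip are common codec choices, and the property names are the standard Hadoop 2.x ones):

// Compress intermediate map output (Snappy needs the native library installed).
conf.setBoolean("mapreduce.map.output.compress", true);
conf.set("mapreduce.map.output.compress.codec",
      "org.apache.hadoop.io.compress.SnappyCodec");
// Compress the final job output with gzip.
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job,
      org.apache.hadoop.io.compress.GzipCodec.class);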

5. Number of maps: Configuration.set(MRJobConfig.NUM_MAPS, int) => roughly dataSize/blockSize by default

6. Number of reducers: Job.setNumReduceTasks(int). A common guideline is 0.95 or 1.75 × (nodes × per-node reduce capacity). With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and launch a second wave, doing a much better job of load balancing. A small sketch of the guideline follows.
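
Inside a driver like WordCountDemo above, that guideline could be sketched like this (the node and slot counts are made-up numbers):

int nodes = 10;               // hypothetical cluster size
int reduceSlotsPerNode = 4;   // hypothetical per-node reduce capacity
// 0.95: all reduces launch at once; 1.75: a second, load-balancing wave
job.setNumReduceTasks((int) (0.95 * nodes * reduceSlotsPerNode));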

7. Reduce -> shuffle: the input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers, via HTTP.

8. Reduce -> sort: the framework groups Reducer inputs by key in this stage (since different mappers may have output the same key). The shuffle and sort phases occur simultaneously; map outputs are merged while they are being fetched.

9. Reduce -> reduce: the reduce(key, values, context) method is then called once for each <key, (list of values)> pair in the grouped inputs, and its output is written to the FileSystem via context.write().

10.  Secondary sort

11.  Partitioner
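
A minimal custom Partitioner sketch (the routing policy and class name are illustrative); it would be registered with job.setPartitionerClass(FirstCharPartitioner.class):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sends all keys beginning with a digit to partition 0 and hashes the rest
// over the remaining partitions.
public class FirstCharPartitioner extends Partitioner<Text, IntWritable> {

   @Override
   public int getPartition(Text key, IntWritable value, int numPartitions) {
      if (numPartitions <= 1) {
         return 0;
      }
      String k = key.toString();
      if (!k.isEmpty() && Character.isDigit(k.charAt(0))) {
         return 0;
      }
      return 1 + (k.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1);
   }
}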

12.  Counter :Mapper and Reducer implementations can use the Counter to report statistics.
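
For instance, inside the map() method shown earlier, a counter could be bumped like this (the group and counter names are made up for the sketch):

// Counts blank input lines under a custom counter group; the totals show up
// in the job's counter report when it finishes.
if (value.toString().trim().isEmpty()) {
   context.getCounter("WordCountDemo", "EMPTY_LINES").increment(1);
}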

13.  Job conf: configuration, e.g. running tasks in a speculative manner (setMapSpeculativeExecution(boolean)/setReduceSpeculativeExecution(boolean)), the maximum number of attempts per task (setMaxMapAttempts(int)/setMaxReduceAttempts(int)), etc.; or, generically, Configuration.set(String, String)/Configuration.get(String). A sketch of these switches follows.
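
A sketch of those switches on the driver's job and conf objects (the values are illustrative):

// Turn speculative execution off and cap retries per task.
job.setMapSpeculativeExecution(false);
job.setReduceSpeculativeExecution(false);
job.setMaxMapAttempts(2);
job.setMaxReduceAttempts(2);
// The string-based equivalent on the Configuration:
conf.set("mapreduce.map.speculative", "false");
System.out.println(conf.get("mapreduce.map.speculative"));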

14.  Task executor & environment -> the user can specify additional options for the child JVM via the mapreduce.{map|reduce}.java.opts configuration parameters in the Job, such as non-standard paths for the run-time linker to search for shared libraries via -Djava.library.path=<>, etc. If the mapreduce.{map|reduce}.java.opts parameter contains the symbol @taskid@, it is interpolated with the task id of the MapReduce task.

15.  Memory management -> users/admins can also specify the maximum virtual memory of the launched child task, and of any sub-process it launches recursively, using mapreduce.{map|reduce}.memory.mb. Note that the value set here is a per-process limit and should be specified in megabytes (MB). It must also be greater than or equal to the -Xmx passed to the JVM, or the VM might not start. A configuration sketch follows.
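
A hedged configuration sketch on the driver's conf (the numbers and the native-library path are placeholders that must be tuned per cluster):

// Container limits per task, in MB; the -Xmx heap must fit inside them.
conf.setInt("mapreduce.map.memory.mb", 2048);
conf.setInt("mapreduce.reduce.memory.mb", 4096);
conf.set("mapreduce.map.java.opts", "-Xmx1536m");
conf.set("mapreduce.reduce.java.opts", "-Xmx3072m -Djava.library.path=/opt/native");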

16.  Map Parameters ...... (http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#MapReduce_Tutorial)

17.  Parameters ()

18.  Job submission and monitoring:

1. Job provides facilities to submit jobs, track their progress, access component-tasks' reports and logs, get the MapReduce cluster's status information, and so on (see the monitoring sketch after this list).

2. The job submission process involves:

1. Checking the input and output specifications of the job.

2. Computing the InputSplit values for the job.

3. Setting up the requisite accounting information for the DistributedCache of the job, if necessary.

4. Copying the job's jar and configuration to the MapReduce system directory on the FileSystem.

5. Submitting the job to the ResourceManager and optionally monitoring its status.

3. Job history
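
As a sketch, the driver could submit without blocking and poll progress itself instead of calling waitForCompletion(true) (the 5-second interval is arbitrary):

job.submit();   // returns immediately, unlike waitForCompletion(true)
while (!job.isComplete()) {
   System.out.printf("map %.0f%%  reduce %.0f%%%n",
         job.mapProgress() * 100, job.reduceProgress() * 100);
   Thread.sleep(5000);
}
System.out.println(job.isSuccessful() ? "done" : "failed");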

19.  Job controller

1. Job.submit() || Job.waitForCompletion(boolean)

2. Multiple MapReduce jobs

1. Iterative MapReduce (the output of one MR job feeds the next; drawbacks: the overhead of creating Job objects, plus heavy local-disk I/O and network traffic)

2. MapReduce JobControl: each job is wrapped together with its dependencies, and a JobControl thread manages the state of the individual jobs.

3. MapReduce ChainMapper/ChainReducer: ChainMapper.addMapper() chains several mapper tasks inside a single job; it cannot be used for jobs that need multiple reduce stages. (A sketch follows this list.)
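
A minimal ChainMapper sketch for the driver, assuming two hypothetical Mapper implementations AMapper and BMapper (they are not classes from this article; ChainMapper is org.apache.hadoop.mapreduce.lib.chain.ChainMapper):

Job job = Job.getInstance(new Configuration(), "chained mappers");
// AMapper tokenizes lines into (word, 1); BMapper post-processes that output.
// Both run inside the same map task, so no extra job or intermediate HDFS files.
ChainMapper.addMapper(job, AMapper.class, LongWritable.class, Text.class,
      Text.class, IntWritable.class, new Configuration(false));
ChainMapper.addMapper(job, BMapper.class, Text.class, IntWritable.class,
      Text.class, IntWritable.class, new Configuration(false));
job.setReducerClass(MyReducer.class);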

20.  Job input & output

1. InputFormat TextInputFormat FileInputFormat

2. InputSplit FileSplit

3. RecordReader

4. OutputFormat OutputCommitter
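
For example, the framework defaults for these could be set explicitly on the WordCount job from earlier (TextInputFormat and TextOutputFormat live under org.apache.hadoop.mapreduce.lib.input and lib.output):

job.setInputFormatClass(TextInputFormat.class);   // one FileSplit per block, records read via LineRecordReader
job.setOutputFormatClass(TextOutputFormat.class); // output committed through FileOutputCommitter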

This concludes the look at which methods are overridden in Hadoop. Hopefully it clears up the common questions; pairing the theory with hands-on practice is the best way to make it stick, so give the examples above a try.
