Setting Up a Hadoop 2.9.1 Pseudo-Distributed Environment and Basic File System Operations

Published: 2020-06-08 15:56:07  Source: Internet  Views: 4072  Author: 斷臂人  Category: Big Data

1. Preparation

1.1 Install a CentOS 7 virtual machine on VMware


1.2 System configuration

Configure the network

# vi /etc/sysconfig/network-scripts/ifcfg-ens33

BOOTPROTO=static

ONBOOT=yes

IPADDR=192.168.120.131

GATEWAY=192.168.120.2

NETMASK=255.255.255.0

DNS1=8.8.8.8

DNS2=8.8.4.4


1.3 Set the hostname

# hostnamectl set-hostname master1

# hostname master1


1.4 Set the time zone (if it is not already Asia/Shanghai)

# ll /etc/localtime

lrwxrwxrwx. 1 root root 35 Jun   4 19:25 /etc/localtime -> ../usr/share/zoneinfo/Asia/Shanghai


If the time zone is wrong, change it as follows:

# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
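
To confirm the change took effect without reading the `ll` output by eye, the symlink target can be resolved in a script. The following is a minimal sketch; the function name `current_zone` is hypothetical, and it assumes the zone files live under a directory named `zoneinfo`, as in the listing above.

```shell
# Print the zone name that a localtime symlink points to, e.g. "Asia/Shanghai".
# The default path is the /etc/localtime symlink used in this article.
current_zone() {
    link="${1:-/etc/localtime}"
    # readlink -f resolves the relative "../usr/share/zoneinfo/..." target.
    target=$(readlink -f "$link") || return 1
    case "$target" in
        # Strip everything up to and including ".../zoneinfo/".
        */zoneinfo/*) printf '%s\n' "${target#*/zoneinfo/}" ;;
        *) return 1 ;;
    esac
}
```

Running `current_zone` with no argument inspects `/etc/localtime`. On CentOS 7, `timedatectl set-timezone Asia/Shanghai` is another way to set the zone.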


1.5 Upload the packages

hadoop-2.9.1.tar

jdk-8u171-linux-x64.tar


2. Building the environment

2.1 Create the user and group

[root@master1 ~]# groupadd hadoop

[root@master1 ~]# useradd -g hadoop hadoop

[root@master1 ~]# passwd hadoop


2.2 Unpack the packages

Switch to the hadoop user

[root@master1 ~]# su hadoop


Create a directory to hold the packages

[hadoop@master1 root]$ cd

[hadoop@master1 ~]$ mkdir src

[hadoop@master1 ~]$ mv *.tar src


Extract the archives

[hadoop@master1 ~]$ cd src

[hadoop@master1 src]$ tar -xf jdk-8u171-linux-x64.tar -C ../

[hadoop@master1 src]$ tar xf hadoop-2.9.1.tar -C ../

[hadoop@master1 src]$ cd

[hadoop@master1 ~]$ mv jdk1.8.0_171 jdk

[hadoop@master1 ~]$ mv hadoop-2.9.1 hadoop


2.3 Configure environment variables

[hadoop@master1 ~]$ vi .bashrc

export JAVA_HOME=/home/hadoop/jdk

export JRE_HOME=$JAVA_HOME/jre

export CLASSPATH=.:$JAVA_HOME/lib

export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/home/hadoop/hadoop

export HADOOP_INSTALL=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin


Apply the configuration

[hadoop@master1 ~]$ source .bashrc


Verify

[hadoop@master1 ~]$ java -version

java version "1.8.0_171"

Java(TM) SE Runtime Environment (build 1.8.0_171-b11)

Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)


[hadoop@master1 ~]$ hadoop version

Hadoop 2.9.1

Subversion https://github.com/apache/hadoop.git -r e30710aea4e6e55e69372929106cf119af06fd0e

Compiled by root on 2018-04-16T09:33Z

Compiled with protoc 2.5.0

From source with checksum 7d6d2b655115c6cc336d662cc2b919bd

This command was run using /home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.9.1.jar


2.4 Edit the Hadoop configuration files

[hadoop@master1 ~]$ cd hadoop/etc/hadoop/

[hadoop@master1 hadoop]$ vi hadoop-env.sh

export JAVA_HOME=/home/hadoop/jdk


[hadoop@master1 hadoop]$ vi core-site.xml

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://192.168.120.131:9000</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/data/hadoop/hadoop_tmp_dir</value>

</property>

</configuration>


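Notes:

fs.defaultFS: the address clients use to reach the NameNode over the hdfs protocol. It can be a host plus port, or the name of a NameNode service (behind which several NameNodes can provide an HA NameNode service).

hadoop.tmp.dir: the directory where Hadoop stores temporary files while the cluster is running.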
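
When checking what a configuration file actually contains, it can help to pull out a single property from the command line. The sketch below uses a hypothetical helper named `get_prop`; it assumes one tag per line, as in the listings in this article, and is not a general XML parser.

```shell
# Print the <value> of a named property from a Hadoop *-site.xml file.
# Usage: get_prop FILE PROPERTY_NAME
get_prop() {
    awk -v want="$2" '
        /<name>/ {
            # Remember the most recent property name.
            name = $0
            sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            # Print the value that follows the wanted name, then stop.
            val = $0
            sub(/.*<value>/, "", val); sub(/<\/value>.*/, "", val)
            print val; exit
        }
    ' "$1"
}
```

For example, `get_prop ~/hadoop/etc/hadoop/core-site.xml fs.defaultFS` should print the URI configured above. On a working installation, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves.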


[hadoop@master1 hadoop]$ vi hdfs-site.xml

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>


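Notes:

dfs.replication: the HDFS replication factor, i.e. how many copies of each block are kept after an uploaded file is split into blocks. The default is 3.


Once the following parameters were configured, the DataNode failed to start, so I left them unset; I have not yet worked out the root cause. The log further down reports incompatible clusterIDs between the NameNode and the DataNode storage directory.

dfs.namenode.name.dir: where the NameNode stores its data, i.e. the metadata for the files in HDFS.

dfs.datanode.data.dir: where the DataNode stores its data, i.e. the blocks themselves.


The exception log follows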

[hadoop@master1 logs]$ pwd

/home/hadoop/hadoop/logs

[hadoop@master1 logs]$ tail -f hadoop-hadoop-datanode-master1.log


2018-06-12 22:30:14,749 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/data/hadoop/hdfs/dn/

java.io.IOException: Incompatible clusterIDs in /data/hadoop/hdfs/dn: namenode clusterID = CID-5bbc555b-4622-4781-9a7f-c2e5131e4869; datanode clusterID = CID-29ec402d-95f8-4148-8d18-f7e4b965be4f

at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:760)


2018-06-12 22:30:14,752 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid f39576ae-b7af-44aa-841a-48ba03b956f4) service to master1/192.168.120.131:9000. Exiting.

java.io.IOException: All specified directories have failed to load.

at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:557)


2018-06-12 22:30:14,753 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid f39576ae-b7af-44aa-841a-48ba03b956f4) service to master1/192.168.120.131:9000

2018-06-12 22:30:14,854 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid f39576ae-b7af-44aa-841a-48ba03b956f4)

2018-06-12 22:30:16,855 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode

2018-06-12 22:30:16,916 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down DataNode at master1/192.168.120.131
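
The "Incompatible clusterIDs" message in this log usually means the NameNode was formatted again while the DataNode directory still held data from an earlier format, so the two sides record different IDs. Each storage directory keeps its ID in `current/VERSION`. The sketch below compares the two; the function names are hypothetical, and the directory layout is the standard one rather than something confirmed by the original author.

```shell
# Print the clusterID recorded in a storage directory's current/VERSION file.
cluster_id() {
    sed -n 's/^clusterID=//p' "$1/current/VERSION"
}

# Succeed only when the NameNode and DataNode directories carry the same clusterID.
# Usage: same_cluster NAMENODE_DIR DATANODE_DIR
same_cluster() {
    nn_id=$(cluster_id "$1") || return 2
    dn_id=$(cluster_id "$2") || return 2
    [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]
}
```

With the directories from this article, the check would be `same_cluster /data/hadoop/hdfs/nn /data/hadoop/hdfs/dn || echo "clusterID mismatch"`. On a test machine with no data worth keeping, a common remedy is to stop HDFS, empty the DataNode directory, and start HDFS again so the DataNode registers with the current clusterID.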


[hadoop@master1 hadoop]$ cp mapred-site.xml.template mapred-site.xml

[hadoop@master1 hadoop]$ vi mapred-site.xml

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>


Notes:

mapreduce.framework.name: sets the MapReduce framework to yarn; in Hadoop 2, MapReduce (MR) also runs on YARN.


[hadoop@master1 hadoop]$ vi yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->

<!-- Address of the ResourceManager -->

<property>

<name>yarn.resourcemanager.hostname</name>

<value>192.168.120.131</value>

</property>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

</configuration>


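Notes:

yarn.resourcemanager.hostname: the host of the YARN ResourceManager, used for its IPC addresses; it can be an IP address or a hostname.

yarn.nodemanager.aux-services: the shuffle service that the cluster provides to MapReduce programs.


2.5 Create the directories and set ownership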

[hadoop@master1 hadoop]$ exit

[root@master1 ~]# mkdir -p /data/hadoop/hadoop_tmp_dir

[root@master1 ~]# mkdir -p /data/hadoop/hdfs/{nn,dn}

[root@master1 ~]# chown -R hadoop:hadoop /data


3. Formatting the file system and starting the services

3.1 Format the file system

[root@master1 ~]# su hadoop

[hadoop@master1 ~]$ cd hadoop/bin

[hadoop@master1 bin]$ ./hdfs namenode -format


Note:

In a cluster environment, HDFS must be formatted only on the master node.


3.2 Start HDFS

[hadoop@master1 bin]$ cd ../sbin

[hadoop@master1 sbin]$ ./start-dfs.sh


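Note:

In a cluster environment, this can be run from any node in the cluster.

If an individual service fails to start and the configuration is correct, the cause is most likely the permissions on the directories created earlier.


3.3 Start YARN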

[hadoop@master1 sbin]$ ./start-yarn.sh


Note:

In a cluster environment, this must be run on the master node.


Check the service status

[hadoop@master1 sbin]$ jps

6708 NameNode

6966 SecondaryNameNode

6808 DataNode

7116 Jps

5791 ResourceManager

5903 NodeManager
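
Rather than scanning the `jps` listing by eye, the expected daemons can be checked in a script. The following is a minimal sketch; the function name `missing_daemons` is hypothetical, and the list of five daemon names is taken from the output above.

```shell
# Read `jps` output on stdin and print every expected daemon that is not listed.
# Returns non-zero if at least one daemon is missing.
missing_daemons() {
    listing=$(cat)
    missing=0
    for daemon in NameNode SecondaryNameNode DataNode ResourceManager NodeManager; do
        # Compare the whole second field, so "NameNode" does not match "SecondaryNameNode".
        if ! printf '%s\n' "$listing" |
            awk -v n="$daemon" '$2 == n { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage would be `jps | missing_daemons`, which prints nothing and succeeds when all five daemons are running.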


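3.4 Check the service status in a browser

View the HDFS status through the web UI

Enter the following in a browser:

http://192.168.120.131:50070


View the YARN status through the web UI

Enter the following in a browser: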

http://192.168.120.131:8088


4. Enabling passwordless SSH

Starting the services above still prompts for the user's login password, as shown here:

[hadoop@master1 sbin]$ ./start-yarn.sh

starting yarn daemons

starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-master1.out

hadoop@localhost's password:


To start the services without a password prompt, configure SSH keys

[hadoop@master1 sbin]$ cd ~/.ssh/

[hadoop@master1 .ssh]$ ll

total 4

-rw-r--r--. 1 hadoop hadoop 372 Jun 12 18:36 known_hosts


[hadoop@master1 .ssh]$ ssh-keygen

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

SHA256:D14LpPKZbih0K+kVoTl23zGsKK1xOVlNuSugDvrkjJA hadoop@master1

The key's randomart image is:

+---[RSA 2048]----+

|                 |

|         .       |

|    .   +        |

|   o . * .       |

|  = = o S .      |

| o.=.@ * O .     |

|E.=oOoB + o      |

|oB+*oo..         |

|ooBo ..          |

+----[SHA256]-----+


Just press Enter at every prompt


[hadoop@master1 .ssh]$ ll

total 12

-rw-------. 1 hadoop hadoop 1675 Jun 12 18:46 id_rsa

-rw-r--r--. 1 hadoop hadoop  396 Jun 12 18:46 id_rsa.pub

-rw-r--r--. 1 hadoop hadoop  372 Jun 12 18:36 known_hosts


[hadoop@master1 .ssh]$ cat id_rsa.pub >> ~/.ssh/authorized_keys


[hadoop@master1 .ssh]$ ll

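total 16

-rw-rw-r--. 1 hadoop hadoop  396 Jun 12 18:47 authorized_keys

-rw-------. 1 hadoop hadoop 1675 Jun 12 18:46 id_rsa

-rw-r--r--. 1 hadoop hadoop  396 Jun 12 18:46 id_rsa.pub

-rw-r--r--. 1 hadoop hadoop  372 Jun 12 18:36 known_hosts


If a password is still required to log in, the cause is the file permissions: authorized_keys is group-writable here (-rw-rw-r--), which sshd rejects by default. Tightening the permissions fixes it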

[hadoop@master1 .ssh]$ chmod 600 authorized_keys
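
The permission problem can be detected before trying to log in. The sketch below flags a path that is writable by group or others, which is the condition sshd's default StrictModes check objects to; the function name `loose_perms` is hypothetical.

```shell
# Succeed (return 0) when a path is writable by group or others.
# Returns 1 when the permissions are tight enough, 2 if the path does not exist.
loose_perms() {
    [ -e "$1" ] || return 2
    # In `ls -ld` output, column 6 is the group write bit and column 9 the other write bit.
    flags=$(ls -ld "$1" | cut -c6,9)
    case "$flags" in
        *w*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `loose_perms ~/.ssh/authorized_keys && echo "too open"`. The same check applies to the `~/.ssh` directory itself, which is commonly set with `chmod 700 ~/.ssh`.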


Passwordless login now works

[hadoop@master1 .ssh]$ ssh localhost

Last login: Tue Jun 12 18:48:38 2018 from fe80::e961:7d5b:6a72:a2a9%ens33

[hadoop@master1 ~]$

 

Passwordless login can also be set up another way.

After running ssh-keygen,

run the following command:

ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master1


5. Basic file system usage and some problems encountered

5.1 Create a directory

Create a directory in the file system

[hadoop@master1 bin]$ hdfs dfs -mkdir -p /user/hadoop

18/06/12 21:25:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


List the directory that was created

[hadoop@master1 bin]$ hdfs dfs -ls /

18/06/12 21:29:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 1 items

drwxr-xr-x   - hadoop supergroup          0 2018-06-12 21:25 /user

   

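5.2 Resolving the warning

A WARN message appears, but it does not affect normal use of Hadoop.


There are two ways to get rid of the warning: recompile the source, or silence the message in the logging configuration. I used the second.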


[hadoop@master1 ]$ cd /home/hadoop/hadoop/etc/hadoop/

[hadoop@master1 hadoop]$ vi log4j.properties

Add:

#native WARN

log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
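
If this edit is scripted instead of made in vi, it is worth making it safe to run more than once so the line is not duplicated. The following is a minimal sketch; the helper name `ensure_line` is hypothetical.

```shell
# Append a line to a file unless an identical line is already present.
# Usage: ensure_line FILE LINE
ensure_line() {
    file="$1"
    line="$2"
    # -x matches the whole line; -F treats the text as a fixed string, not a regex.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For the setting above, the call would be `ensure_line /home/hadoop/hadoop/etc/hadoop/log4j.properties 'log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR'`.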


The warning no longer appears

[hadoop@master1 hadoop]$ hdfs dfs -ls /

Found 1 items

drwxr-xr-x   - hadoop supergroup          0 2018-06-12 21:25 /user


5.3 Upload files to HDFS

[hadoop@master1 bin]$ hdfs dfs -mkdir -p input

[hadoop@master1 hadoop]$ hdfs dfs -put /home/hadoop/hadoop/etc/hadoop input


Hadoop ships with a rich set of examples, including wordcount, terasort, join, and grep. Run the following command to list them:

[hadoop@master1 bin]$ hadoop jar /home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar

An example program must be given as the first argument.

Valid program names are:

aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.

aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.

bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.

dbcount: An example job that count the pageview counts from a database.

distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.

grep: A map/reduce program that counts the matches of a regex in the input.

join: A job that effects a join over sorted, equally partitioned datasets

multifilewc: A job that counts words from several files.

pentomino: A map/reduce tile laying program to find solutions to pentomino problems.

pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.

randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.

randomwriter: A map/reduce program that writes 10GB of random data per node.

secondarysort: An example defining a secondary sort to the reduce.

sort: A map/reduce program that sorts the data written by the random writer.

sudoku: A sudoku solver.

teragen: Generate data for the terasort

terasort: Run the terasort

teravalidate: Checking results of terasort

wordcount: A map/reduce program that counts the words in the input files.

wordmean: A map/reduce program that counts the average length of the words in the input files.

wordmedian: A map/reduce program that counts the median length of the words in the input files.

wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.


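Running a MapReduce job in pseudo-distributed mode works the same way as in standalone mode; the difference is that pseudo-distributed mode reads its files from HDFS (to verify this, delete the local input folder and the output folder created in the standalone steps).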

[hadoop@master1 sbin]$ hadoop jar /home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep input output 'dfs[a-z]+'

18/06/12 22:57:05 INFO client.RMProxy: Connecting to ResourceManager at /192.168.120.131:8032

18/06/12 22:57:07 INFO input.FileInputFormat: Total input files to process : 30

(output omitted)

18/06/12 22:57:08 INFO mapreduce.Job: Running job: job_1528815135795_0001

18/06/12 22:57:23 INFO mapreduce.Job: Job job_1528815135795_0001 running in uber mode : false

18/06/12 22:57:23 INFO mapreduce.Job:  map 0% reduce 0%

18/06/12 22:58:02 INFO mapreduce.Job:  map 13% reduce 0%

(output omitted)

18/06/12 23:00:17 INFO mapreduce.Job:  map 97% reduce 32%

18/06/12 23:00:18 INFO mapreduce.Job:  map 100% reduce 32%

18/06/12 23:00:19 INFO mapreduce.Job:  map 100% reduce 100%

18/06/12 23:00:20 INFO mapreduce.Job: Job job_1528815135795_0001 completed successfully

18/06/12 23:00:20 INFO mapreduce.Job: Counters: 50

File System Counters

FILE: Number of bytes read=46

FILE: Number of bytes written=6136681

FILE: Number of read operations=0

(output omitted)

File Input Format Counters

Bytes Read=138

File Output Format Counters

Bytes Written=24


View the result

[hadoop@master1 sbin]$ hdfs dfs -cat output/*

1 dfsmetrics

1 dfsadmin


Copy the result to the local file system

[hadoop@master1 sbin]$ hdfs dfs -get output /data

[hadoop@master1 sbin]$ ll /data

total 0

drwxrwxrwx. 5 hadoop hadoop 52 Jun 12 19:20 hadoop

drwxrwxr-x. 2 hadoop hadoop 42 Jun 12 23:03 output

[hadoop@master1 sbin]$ cat /data/output/*

1 dfsmetrics

1 dfsadmin
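
One thing to keep in mind when repeating the job: MapReduce refuses to start if the output directory already exists in HDFS, so `output` has to be removed first. The sketch below wraps the two steps; `rerun_grep`, `HDFS_CMD`, and `HADOOP_CMD` are hypothetical names, and the two variables exist only so the sequence can be dry-run by pointing them at `echo`.

```shell
# Commands to run; override with HDFS_CMD=echo HADOOP_CMD=echo for a dry run.
HDFS_CMD="${HDFS_CMD:-hdfs}"
HADOOP_CMD="${HADOOP_CMD:-hadoop}"
# Path to the examples jar used in this article.
EXAMPLES_JAR=/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar

# Remove the previous HDFS output directory, then run the grep example again.
# Usage: rerun_grep REGEX
rerun_grep() {
    # -f keeps the removal from failing when the directory does not exist yet.
    $HDFS_CMD dfs -rm -r -f output &&
        $HADOOP_CMD jar "$EXAMPLES_JAR" grep input output "$1"
}
```

Calling `rerun_grep 'dfs[a-z]+'` repeats the run shown above. The local copy under `/data/output` is separate and would need to be removed by hand before another `hdfs dfs -get`.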


6. Starting the history server

The history server lets you view how jobs ran through the web UI

[hadoop@master1 sbin]$ mr-jobhistory-daemon.sh start historyserver

starting historyserver, logging to /home/hadoop/hadoop/logs/mapred-hadoop-historyserver-master1.out

[hadoop@master1 sbin]$ jps

19985 Jps

15778 ResourceManager

15890 NodeManager

14516 NameNode

14827 SecondaryNameNode

19948 JobHistoryServer

14653 DataNode


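When starting out, keep the configuration as simple as possible; it makes troubleshooting easier when something goes wrong.


References: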

https://www.cnblogs.com/wangxin37/p/6501484.html

https://www.cnblogs.com/xing901022/p/5713585.html

