This article introduces how to set up a Spark development environment in Eclipse. Quite a few people run into problems while working through these steps, so the walkthrough below also covers how to handle those situations; read carefully and you should end up with a working setup.
First, download a prebuilt Spark release that matches your cluster's Hadoop version and extract it to the desired location; make sure the files are owned by the right user.
Change into the extracted directory, referred to below as SPARK_HOME.
Set SPARK_HOME in /etc/profile or ~/.bashrc.
cd $SPARK_HOME/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export SCALA_HOME=/home/hadoop/cluster/scala-2.10.5
export JAVA_HOME=/home/hadoop/cluster/jdk1.7.0_79
export HADOOP_HOME=/home/hadoop/cluster/hadoop-2.6.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Be sure to set this to the IP address; otherwise the Eclipse client will fail
# to connect later with: "All masters are unresponsive! Giving up."
SPARK_MASTER_IP=10.16.112.121
SPARK_LOCAL_DIRS=/home/hadoop/cluster/spark-1.4.0-bin-hadoop2.6
SPARK_DRIVER_MEMORY=1G
sbin/start-master.sh
sbin/start-slave.sh spark://10.16.112.121:7077
You can now open http://yourip:8080 in a browser to check the state of the Spark cluster.
The default Spark master URL is: spark://10.16.112.121:7077
First download the Scala IDE for Eclipse; it is available from the official Scala website.
Open the IDE, create a new Maven project, and fill in pom.xml as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>spark.test</groupId>
  <artifactId>FirstTrySpark</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <properties>
    <!-- Fill in the versions that match your cluster -->
    <hadoop.version>2.6.0</hadoop.version>
    <spark.version>1.4.0</spark.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
      <scope>provided</scope>
      <!-- Remember to exclude the servlet dependency, otherwise it conflicts -->
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
      <version>2.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/java</sourceDirectory>
    <plugins>
      <!-- Bind the maven-assembly-plugin to the package phase; this creates a
           jar with dependencies suitable for deployment to the cluster. -->
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>2.10</scalaVersion>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.5.5</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
    <resources>
      <resource>
        <directory>src/main/resources</directory>
      </resource>
    </resources>
  </build>
</project>
Create a few Source Folders:
src/main/java        # Java source code
src/main/scala       # Scala source code
src/main/resources   # resource files
src/test/java        # Java test code
src/test/scala       # Scala test code
src/test/resources   # test resource files
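To confirm that the scala-maven-plugin and the Scala 2.10 library are wired up correctly before touching Spark at all, a trivial sanity-check class (my own illustration, not part of the original article) can be dropped into src/main/scala:

object BuildCheck {
  def main(args: Array[String]): Unit = {
    // Prints the Scala library version the project actually compiles and runs
    // against; it should report 2.10.x to match the spark-core_2.10 dependency.
    println(s"Scala version: ${scala.util.Properties.versionString}")
  }
}

Run it as a plain Scala application inside Eclipse; if it prints a 2.10.x version, the build setup is consistent with the pom above.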
At this point the environment is fully set up!
The test code is as follows:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/**
 * @author clebeg
 */
object FirstTry {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf
    conf.setMaster("spark://yourip:7077")
    conf.set("spark.app.name", "first-tryspark")

    val sc = new SparkContext(conf)
    val rawblocks = sc.textFile("hdfs://yourip:9000/user/hadoop/linkage")
    println(rawblocks.first)
  }
}
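One note on running the driver directly from Eclipse rather than via spark-submit: the executors on the cluster do not automatically receive your application's classes, so as soon as the job uses closures or classes defined in your own project, tasks can fail with ClassNotFoundException. A minimal sketch of one way to handle this, assuming the project has been packaged with the assembly plugin from the pom above (the jar path is illustrative and must be adjusted to your own build output):

import org.apache.spark.{SparkConf, SparkContext}

object FirstTryWithJars {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://yourip:7077")
      .setAppName("first-tryspark")
      // Ship the packaged application jar to the executors so they can load
      // the job's classes; the path below is only an example.
      .setJars(Seq("target/FirstTrySpark-0.0.1-SNAPSHOT-jar-with-dependencies.jar"))

    val sc = new SparkContext(conf)
    val rawblocks = sc.textFile("hdfs://yourip:9000/user/hadoop/linkage")
    println(rawblocks.first)
    sc.stop()
  }
}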
The first problem encountered when running this code: the job kept reporting
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
To analyze the problem, open the log for the corresponding run ID, which shows the following error:
15/10/10 08:49:01 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/10/10 08:49:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/10 08:49:02 INFO spark.SecurityManager: Changing view acls to: hadoop,Administrator
15/10/10 08:49:02 INFO spark.SecurityManager: Changing modify acls to: hadoop,Administrator
15/10/10 08:49:02 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop, Administrator); users with modify permissions: Set(hadoop, Administrator)
15/10/10 08:49:02 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/10/10 08:49:02 INFO Remoting: Starting remoting
15/10/10 08:49:02 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@10.16.112.121:58708]
15/10/10 08:49:02 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 58708.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:97)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:159)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    ... 4 more
15/10/10 08:51:02 INFO util.Utils: Shutdown hook called
A closer look shows it is a permissions problem. Stop Hadoop right away and add the following to etc/hadoop/core-site.xml:
<property>
    <name>hadoop.security.authorization</name>
    <value>false</value>
</property>
This turns off service-level authorization so that any user can access the cluster; restart Hadoop and the problem is solved immediately.
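An alternative worth mentioning (my own suggestion, not from the original article) is to leave cluster security alone and have the driver identify itself to Hadoop as the hadoop user instead of the local Windows account; the Hadoop client honors the HADOOP_USER_NAME setting for this. A sketch:

object FirstTryAsHadoopUser {
  def main(args: Array[String]): Unit = {
    // Present "hadoop" as the user name to HDFS instead of the local OS account.
    // This must run before the SparkContext (and the Hadoop FileSystem client)
    // is created; it can equally be set as an environment variable.
    System.setProperty("HADOOP_USER_NAME", "hadoop")
    // ... then build the SparkConf / SparkContext exactly as in FirstTry above.
  }
}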
The second problem, which shows up when the driver runs from Eclipse on Windows:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Go to http://www.barik.net/archive/2015/01/19/172716/ and download the recompiled Hadoop 2.6 build that includes winutils.exe. Make sure the download matches your own Hadoop version.
Extract it to the desired location and set the HADOOP_HOME environment variable. Be sure to restart Eclipse afterwards. Problem solved!
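If changing system environment variables is inconvenient, the Hadoop client also accepts the hadoop.home.dir JVM system property as an equivalent of HADOOP_HOME. A minimal sketch with an illustrative Windows path (not taken from the original article):

object FirstTryOnWindows {
  def main(args: Array[String]): Unit = {
    // Point the Hadoop libraries at the directory whose bin\ folder contains
    // winutils.exe; the path below is only an example.
    System.setProperty("hadoop.home.dir", "D:\\hadoop-2.6.0")
    // ... then create the SparkConf / SparkContext as in FirstTry above.
  }
}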
Where can the data used in this article be obtained? From http://bit.ly/1Aoywaq. The commands are as follows:
mkdir linkage
cd linkage/
curl -L -o donation.zip http://bit.ly/1Aoywaq
unzip donation.zip
unzip "block_*.zip"
hdfs dfs -mkdir /user/hadoop/linkage
hdfs dfs -put block_*.csv /user/hadoop/linkage
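To double-check that the CSV files actually landed in HDFS and are readable from Spark, a small sanity-check job (my own sketch, reusing the same master and HDFS addresses as the example above) can be run:

import org.apache.spark.{SparkConf, SparkContext}

object LinkageSanityCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://yourip:7077")
      .setAppName("linkage-sanity-check")
    val sc = new SparkContext(conf)

    val rawblocks = sc.textFile("hdfs://yourip:9000/user/hadoop/linkage")
    // Count the records and peek at a few lines to confirm the upload worked.
    println(s"record count: ${rawblocks.count()}")
    rawblocks.take(5).foreach(println)
    sc.stop()
  }
}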
That covers how to set up a Spark development environment in Eclipse; thanks for reading. If you want to learn more about related topics, you can follow the Yisu Cloud (亿速云) website, where more practical, high-quality articles are published.