After the application is launched with the spark-submit command:
${SPARK_HOME}/bin/spark-submit --master yarn-cluster --class com.bigdata.WordCount --executor-memory 2G \
--num-executors 4 ${SPARK_HOME}/wordcount-1.0-SNAPSHOT.jar hdfs://spark-master:9000 /temp/inputdir /temp/outputdir
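(For reference, the submitted application could look something like the following minimal WordCount. The real com.bigdata.WordCount is not shown here, so treating the three trailing arguments as filesystem URI, input directory, and output directory is an assumption based on the command line above.)

package com.bigdata

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Assumed convention: args = (fsUri, inputDir, outputDir), matching the
    // three trailing arguments of the spark-submit command above.
    val Array(fsUri, inputDir, outputDir) = args

    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))

    sc.textFile(fsUri + inputDir)   // e.g. hdfs://spark-master:9000/temp/inputdir
      .flatMap(_.split("\\s+"))     // split lines into words
      .map(word => (word, 1))
      .reduceByKey(_ + _)           // count occurrences per word
      .saveAsTextFile(fsUri + outputDir)

    sc.stop()
  }
}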
what actually gets started is the org.apache.spark.deploy.SparkSubmit class. Inside its prepareSubmitEnvironment() function there is a piece of code that resolves the class used to launch yarn-cluster jobs:
private[deploy] def prepareSubmitEnvironment(
    args: SparkSubmitArguments,
    conf: Option[HadoopConfiguration] = None)
    : (Seq[String], Seq[String], SparkConf, String) = {
  // ...
  // In yarn-cluster mode, use yarn.Client as a wrapper around the user class
  if (isYarnCluster) {
    childMainClass = YARN_CLUSTER_SUBMIT_CLASS
    // ...
  }
  // ...
}

object SparkSubmit extends CommandLineUtils with Logging {

  private val CLASS_NOT_FOUND_EXIT_STATUS = 101

  // Following constants are visible for testing.
  private[deploy] val YARN_CLUSTER_SUBMIT_CLASS =
    "org.apache.spark.deploy.yarn.YarnClusterApplication"
  private[deploy] val REST_CLUSTER_SUBMIT_CLASS = classOf[RestSubmissionClientApp].getName()
  private[deploy] val STANDALONE_CLUSTER_SUBMIT_CLASS = classOf[ClientApp].getName()
  private[deploy] val KUBERNETES_CLUSTER_SUBMIT_CLASS =
    "org.apache.spark.deploy.k8s.submit.KubernetesClientApplication"

  // ...
}
From this we can see that org.apache.spark.deploy.yarn.YarnClusterApplication is the class that gets launched. It is defined in org.apache.spark.deploy.yarn.Client: the Client submits the application to YARN, YARN picks a NodeManager and starts a container on it, and inside that container org.apache.spark.deploy.yarn.ApplicationMaster is started via a command line. The ApplicationMaster then runs your application directly, as the excerpt below shows.
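For context, the class itself is tiny. Abridged from the Spark 2.x source of Client.scala (exact details may differ slightly between versions), it does little more than wrap Client and call run():

private[spark] class YarnClusterApplication extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    // SparkSubmit would use yarn cache to distribute files & jars in yarn mode,
    // so remove them from SparkConf here for yarn mode.
    conf.remove("spark.jars")
    conf.remove("spark.files")

    // Client.run() calls submitApplication(), which asks the ResourceManager
    // for a container and builds a launch context whose command line starts
    // the ApplicationMaster with your --class as the user class.
    new Client(new ClientArguments(args), conf).run()
  }
}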