We are getting the following error on Runtime 10.x and 11.x when writing to S3 via the saveAsNewAPIHadoopFile function. The same job runs fine on Runtime 9.x and 7.x. The difference between 9.x and 10.x is that the former bundles Spark 3.1 with Hadoop 2.7, while the latter bundles Spark 3.2 with Hadoop 3.2. Is the Databricks runtime missing some jars? Any help is appreciated.
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory not found
  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2720)
  at org.apache.hadoop.mapreduce.lib.output.PathOutputCommitterFactory.getCommitterFactory(PathOutputCommitterFactory.java:179)
  at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCommitter(FileOutputFormat.java:336)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupCommitter(HadoopMapReduceCommitProtocol.scala:116)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:195)
  at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:83)
  at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsNewAPIHadoopDataset$1(PairRDDFunctions.scala:1078)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:411)
  at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1076)
  at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsNewAPIHadoopFile$2(PairRDDFunctions.scala:995)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:411)
  at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:986)
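For context, here is a minimal diagnostic sketch (an illustration, not part of the failing job) that prints the Hadoop version bundled with the runtime and probes for the committer factory class named in the ClassNotFoundException above:

// Diagnostic sketch: confirm the bundled Hadoop version and whether the
// S3A committer factory class is present on the driver classpath.
import org.apache.hadoop.util.VersionInfo

println(s"Bundled Hadoop version: ${VersionInfo.getVersion}")

// Class.forName throws ClassNotFoundException when the jar providing the
// S3A committers is not on the classpath.
try {
  Class.forName("org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory")
  println("S3ACommitterFactory is on the classpath")
} catch {
  case _: ClassNotFoundException =>
    println("S3ACommitterFactory is NOT on the classpath")
}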
We can reproduce the above error on Runtime 10.x and 11.x with the following code in a notebook.
import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.spark.rdd.PairRDDFunctions

// Build a small pair RDD of (IntWritable, Text) records.
val l = List((10, "a"), (20, "b"), (30, "c"), (40, "d"))
val rdd = sc.parallelize(l)
val rddWritable = rdd.map(x => (new IntWritable(x._1), new Text(x._2)))
val pairRDD = new PairRDDFunctions(rddWritable)

// Writing to an s3a:// path triggers the committer factory lookup that fails.
pairRDD.saveAsNewAPIHadoopFile("s3a://bucket/testout.dat",
  classOf[IntWritable],
  classOf[Text],
  classOf[TextOutputFormat[IntWritable, Text]],
  spark.sparkContext.hadoopConfiguration)
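From our reading of PathOutputCommitterFactory.getCommitterFactory (the second frame in the stack trace), Hadoop 3.2 binds the s3a scheme to S3ACommitterFactory through the mapreduce.outputcommitter.factory.scheme.s3a property, and the lookup fails when that class is absent. A possible workaround, which we have not verified, is to clear that per-scheme binding so the writer falls back to the stock FileOutputCommitter:

// Untested workaround sketch: unset the s3a-specific committer factory so
// FileOutputFormat falls back to the default FileOutputCommitter instead of
// trying to load org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory.
spark.sparkContext.hadoopConfiguration
  .set("mapreduce.outputcommitter.factory.scheme.s3a", "")

The same property could presumably be set at cluster level as spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a. Whether this is the right approach on Databricks, or whether the runtime is supposed to supply its own committer here, is part of what we are asking.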