We are getting the following error on Runtime 10.x and 11.x when writing to S3 via the saveAsNewAPIHadoopFile function. The same job runs fine on Runtime 9.x and 7.x. The difference between 9.x and 10.x is that the former bundles Spark 3.1 with Hadoop 2.7, while the latter bundles Spark 3.2 with Hadoop 3.2. Is the Databricks runtime missing some jars? Any help is appreciated.
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory not found
  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2720)
  at org.apache.hadoop.mapreduce.lib.output.PathOutputCommitterFactory.getCommitterFactory(PathOutputCommitterFactory.java:179)
  at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCommitter(FileOutputFormat.java:336)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupCommitter(HadoopMapReduceCommitProtocol.scala:116)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:195)
  at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:83)
  at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsNewAPIHadoopDataset$1(PairRDDFunctions.scala:1078)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:411)
  at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1076)
  at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsNewAPIHadoopFile$2(PairRDDFunctions.scala:995)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:411)
  at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:986)
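For context, here is a minimal diagnostic sketch (an illustration, not part of the failing job) that prints the Hadoop version bundled with the runtime and probes for the committer factory class named in the ClassNotFoundException above:

// Diagnostic sketch: confirm the bundled Hadoop version and whether the
// S3A committer factory class is present on the driver classpath.
import org.apache.hadoop.util.VersionInfo

println(s"Bundled Hadoop version: ${VersionInfo.getVersion}")

// Class.forName throws ClassNotFoundException when the jar providing the
// S3A committers is not on the classpath.
try {
  Class.forName("org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory")
  println("S3ACommitterFactory is on the classpath")
} catch {
  case _: ClassNotFoundException =>
    println("S3ACommitterFactory is NOT on the classpath")
}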
We can reproduce the above error on Runtime 10.x and 11.x with the following code in a notebook.
import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.spark.rdd.PairRDDFunctions

// Build a small pair RDD of (IntWritable, Text) records.
val l = List((10, "a"), (20, "b"), (30, "c"), (40, "d"))
val rdd = sc.parallelize(l)
val rddWritable = rdd.map(x => (new IntWritable(x._1), new Text(x._2)))
val pairRDD = new PairRDDFunctions(rddWritable)

// Writing to an s3a:// path triggers the committer factory lookup that fails.
pairRDD.saveAsNewAPIHadoopFile("s3a://bucket/testout.dat",
  classOf[IntWritable],
  classOf[Text],
  classOf[TextOutputFormat[IntWritable, Text]],
  spark.sparkContext.hadoopConfiguration)
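From our reading of PathOutputCommitterFactory.getCommitterFactory (the second frame in the stack trace), Hadoop 3.2 binds the s3a scheme to S3ACommitterFactory through the mapreduce.outputcommitter.factory.scheme.s3a property, and the lookup fails when that class is absent. A possible workaround, which we have not verified, is to clear that per-scheme binding so the writer falls back to the stock FileOutputCommitter:

// Untested workaround sketch: unset the s3a-specific committer factory so
// FileOutputFormat falls back to the default FileOutputCommitter instead of
// trying to load org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory.
spark.sparkContext.hadoopConfiguration
  .set("mapreduce.outputcommitter.factory.scheme.s3a", "")

The same property could presumably be set at cluster level as spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a. Whether this is the right approach on Databricks, or whether the runtime is supposed to supply its own committer here, is part of what we are asking.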