Versions of code:
Databricks: 7.3 LTS ML (includes Apache Spark 3.0.1, Scala 2.12)
AWS EMR: 6.1.0 (Spark 3.0.0, Scala 2.12)
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-610-release.html
The problem:
A job that works on AWS EMR fails with errors when replicated on Databricks.
Description and setup:
We have a Spark job that essentially runs the `ALSModel.recommendForAllUsers(recommendations_ct)` function and writes the result to AWS S3. It runs on AWS EMR.
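We are attempting to migrate this job to the Databricks environment. We have replicated the same cluster configuration, and the Spark config values and Python code are identical.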
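For reference, the shape of the job can be sketched roughly as follows. This is a minimal sketch and not the actual code: the function name `write_recommendations`, the model and output paths, and the Parquet output format are hypothetical placeholders. Only `ALSModel.recommendForAllUsers` and the `recommendations_ct` argument come from the description above.

```python
def write_recommendations(model_path, output_path, recommendations_ct):
    """Load a fitted ALS model, score every user, and write the result out.

    The paths and the Parquet format are hypothetical placeholders. pyspark is
    imported lazily so this module can be imported without Spark installed.
    """
    from pyspark.ml.recommendation import ALSModel

    # Load a previously fitted model (hypothetical location).
    model = ALSModel.load(model_path)

    # One row per user, with an array of (item, rating) structs holding the
    # top `recommendations_ct` items for that user.
    recs = model.recommendForAllUsers(recommendations_ct)

    # Write the recommendations, e.g. to an s3a:// path.
    recs.write.mode("overwrite").parquet(output_path)
    return recs
```

Scoring every user against every item is shuffle- and broadcast-heavy, which is why the memory and partition settings matter for this job.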
Configuration that runs on EMR and fails on Databricks:
6 r5.8xlarge workers (256 GB, 32 cores)
1 r5.2xlarge driver (64 GB, 8 cores)
Spark config values:
```
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 2000
spark.driver.memoryOverhead 4096
spark.executor.cores 5
spark.executor.memory 35g
spark.driver.cores 5
spark.executor.memoryOverhead 4096
spark.sql.shuffle.partitions 350
spark.broadcast.blockSize 12m
spark.executor.instances 35
spark.driver.memory 35g
spark.default.parallelism 350
fs.s3a.server-side-encryption-algorithm SSE-KMS
spark.hadoop.fs.s3a.stsAssumeRole.arn arn:aws:iam::***masked***:role/databricks-s3-egress-role
spark.hadoop.fs.s3a.acl.default BucketOwnerFullControl
spark.hadoop.fs.s3a.credentialsType AssumeRole
```
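To make the sizing explicit, the arithmetic behind these values can be checked with a few lines of Python. This is a rough sketch under stated assumptions: it treats the instance sizes as GiB, and it ignores any memory the operating system or the cluster manager reserves on each node, so real capacity will be somewhat lower. The `Node` class and `executors_per_node` helper are illustrative names, not part of Spark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    memory_gib: int
    cores: int


def executors_per_node(node, executor_cores, executor_mem_gib, overhead_gib):
    """Return how many executors fit on one node, limited by cores and memory."""
    by_cores = node.cores // executor_cores
    by_memory = node.memory_gib // (executor_mem_gib + overhead_gib)
    return min(by_cores, by_memory)


# Values taken from the cluster description and Spark config above.
worker = Node(memory_gib=256, cores=32)   # r5.8xlarge
num_workers = 6
executor_cores = 5                        # spark.executor.cores
executor_mem_gib = 35                     # spark.executor.memory 35g
overhead_gib = 4096 // 1024               # spark.executor.memoryOverhead 4096 (MiB)
requested_executors = 35                  # spark.executor.instances

per_node = executors_per_node(worker, executor_cores, executor_mem_gib, overhead_gib)
cluster_capacity = per_node * num_workers
used_per_worker = per_node * (executor_mem_gib + overhead_gib)
task_slots = requested_executors * executor_cores

print(f"executors per worker: {per_node}")
print(f"cluster capacity:     {cluster_capacity} (requested {requested_executors})")
print(f"memory per worker:    {used_per_worker} GiB of {worker.memory_gib} GiB")
print(f"concurrent tasks:     {task_slots} for 350 shuffle partitions")
```

Under these assumptions, six executors of 39 GiB each (234 GiB) fit on a worker, leaving about 22 GiB per node for everything else, and the 35 requested executors give 175 task slots, so 350 shuffle partitions run in two waves.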
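Errors we observe:
We consistently see executors lost due to JVM OOM and other issues. This is strange, because the same job runs within the limits of the EMR cluster.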
ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /***masked***
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:122)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:121)
at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:143)
at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:103)
at org.apache.spark.storage.BlockManager.fetchRemoteManagedBuffer(BlockManager.scala:1011)
at org.apache.spark.storage.BlockManager.$anonfun$getRemoteBlock$8(BlockManager.scala:955)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.storage.BlockManager.getRemoteBlock(BlockManager.scala:955)
at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:1093)
at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:195)
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.broadcast.TorrentBroadcast.readBlocks(TorrentBroadcast.scala:184)
at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBroadcastBlock$4(TorrentBroadcast.scala:268)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBroadcastBlock$2(TorrentBroadcast.scala:246)
at org.apache.spark.util.KeyLock.withLock(KeyLock.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBroadcastBlock$1(TorrentBroadcast.scala:241)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1558)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:241)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:118)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:78)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$1(ParquetFileFormat.scala:309)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:291)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.$anonfun$prepareNextFile$1(FileScanRDD.scala:499)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.$anonfun$run$1(SparkThreadLocalForwardingThreadPoolExecutor.scala:104)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:68)
at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured$(SparkThreadLocalForwardingThreadPoolExecutor.scala:54)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:101)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.run(SparkThreadLocalForwardingThreadPoolExecutor.scala:104)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /10.203.234.49:34347
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
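The trace shows a refused connection while a broadcast block was being fetched from a remote block manager (`TorrentBroadcast.readBlocks` via `BlockManager.getRemoteBytes`). One way to check whether the refused endpoints line up with the executors that were lost is to tally them from the logs. Below is a minimal sketch, assuming the log is available as plain text lines. The helper name `refused_endpoints` and the regular expression are illustrative, not part of Spark.

```python
import re
from collections import Counter

# Matches the endpoint in lines such as
# "... AnnotatedConnectException: Connection refused: /10.203.234.49:34347"
REFUSED = re.compile(r"Connection refused: /(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def refused_endpoints(log_lines):
    """Count 'Connection refused' occurrences per (host, port) endpoint."""
    counts = Counter()
    for line in log_lines:
        match = REFUSED.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Sample lines copied from the trace above.
sample = [
    "java.io.IOException: Failed to connect to /***masked***",
    "Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: "
    "Connection refused: /10.203.234.49:34347",
    "Caused by: java.net.ConnectException: Connection refused",
]

for (host, port), n in refused_endpoints(sample).most_common():
    print(f"{host}:{port} refused {n} time(s)")
```

If most refusals point at hosts whose executors were already reported lost, the connection errors are more likely a symptom of the executor failures than a separate networking problem.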
Any help or thoughts would be greatly appreciated.