
Significant performance issues/regression when migrating a job from EMR to Databricks

643926
New Contributor II

Versions of code:

Databricks: 7.3 LTS ML (includes Apache Spark 3.0.1, Scala 2.12)

AWS EMR: 6.1.0 (Spark 3.0.0, Scala 2.12)

https://docs.aws.amazon.com/emr/latest/releaseguide/emr-610-release.html

The problem:

The job errors out when replicated on Databricks, but works on AWS EMR.

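Description and setup:

We have a Spark job that essentially runs the `ALSModel.recommendForAllUsers(recommendations_ct)` function and writes the result to AWS S3. It runs on AWS EMR.

We are trying to migrate this job to the Databricks environment. We have replicated the same cluster configuration, and the Spark configuration values and the Python code are identical.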

Configuration that runs successfully on EMR and fails on Databricks:

6 r5.8xlarge workers (256 GB, 32 cores)

1 r5.2xlarge driver (64 GB, 8 cores)

Spark configuration values:

```
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 2000
spark.driver.memoryOverhead 4096
spark.executor.cores 5
spark.executor.memory 35g
spark.driver.cores 5
spark.executor.memoryOverhead 4096
spark.sql.shuffle.partitions 350
spark.broadcast.blockSize 12m
spark.executor.instances 35
spark.driver.memory 35g
spark.default.parallelism 350
fs.s3a.server-side-encryption-algorithm SSE-KMS
spark.hadoop.fs.s3a.stsAssumeRole.arn arn:aws:iam::***masked***:role/databricks-s3-egress-role
spark.hadoop.fs.s3a.acl.default BucketOwnerFullControl
spark.hadoop.fs.s3a.credentialsType AssumeRole
```
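The settings above imply a particular packing of executors onto the worker nodes, and the arithmetic is easy to get wrong by eye. The following is a rough sketch of that calculation, assuming executors are spread as evenly as possible across the six workers. Whether the platform actually honors `spark.executor.instances`, and how much memory each node reserves for the OS and daemons, is not stated in the post and may differ between EMR and Databricks. The function name `per_node_footprint` is hypothetical.

```python
import math

# Values quoted in the post.
WORKERS = 6
NODE_MEM_GB = 256          # r5.8xlarge
NODE_CORES = 32
EXECUTOR_INSTANCES = 35    # spark.executor.instances
EXECUTOR_CORES = 5         # spark.executor.cores
EXECUTOR_MEM_GB = 35       # spark.executor.memory
EXECUTOR_OVERHEAD_MB = 4096  # spark.executor.memoryOverhead


def per_node_footprint(workers, instances, cores, mem_gb, overhead_mb):
    """Return (executors, cores, memory in GB) used on the busiest node.

    Assumption: executors are distributed as evenly as possible,
    so the busiest node hosts ceil(instances / workers) of them.
    """
    executors_per_node = math.ceil(instances / workers)
    container_gb = mem_gb + overhead_mb / 1024  # heap plus off-heap overhead
    return (
        executors_per_node,
        executors_per_node * cores,
        executors_per_node * container_gb,
    )


execs, cores_used, mem_used = per_node_footprint(
    WORKERS, EXECUTOR_INSTANCES, EXECUTOR_CORES,
    EXECUTOR_MEM_GB, EXECUTOR_OVERHEAD_MB,
)
print(f"executors on busiest node: {execs}")
print(f"cores used: {cores_used} of {NODE_CORES}")
print(f"memory requested: {mem_used:.0f} GB of {NODE_MEM_GB} GB")
```

Under these assumptions the busiest node carries 6 executors, 30 cores, and 234 GB of requested memory, which leaves about 22 GB per node for everything else. If one platform reserves more of the node than the other, the same settings could fit in one environment and not the other.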

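Errors we observe:

Consistent errors about lost executors due to JVM OOM and other issues. This is strange, given that the same job runs within these bounds on EMR.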

ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /***masked***
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:122)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:121)
    at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:143)
    at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:103)
    at org.apache.spark.storage.BlockManager.fetchRemoteManagedBuffer(BlockManager.scala:1011)
    at org.apache.spark.storage.BlockManager.$anonfun$getRemoteBlock$8(BlockManager.scala:955)
    at scala.Option.orElse(Option.scala:447)
    at org.apache.spark.storage.BlockManager.getRemoteBlock(BlockManager.scala:955)
    at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:1093)
    at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:195)
    at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.broadcast.TorrentBroadcast.readBlocks(TorrentBroadcast.scala:184)
    at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBroadcastBlock$4(TorrentBroadcast.scala:268)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBroadcastBlock$2(TorrentBroadcast.scala:246)
    at org.apache.spark.util.KeyLock.withLock(KeyLock.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBroadcastBlock$1(TorrentBroadcast.scala:241)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1558)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:241)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:118)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:78)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$1(ParquetFileFormat.scala:309)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:291)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.$anonfun$prepareNextFile$1(FileScanRDD.scala:499)
    at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
    at scala.util.Success.$anonfun$map$1(Try.scala:255)
    at scala.util.Success.map(Try.scala:213)
    at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
    at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
    at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.$anonfun$run$1(SparkThreadLocalForwardingThreadPoolExecutor.scala:104)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:68)
    at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured$(SparkThreadLocalForwardingThreadPoolExecutor.scala:54)
    at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:101)
    at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.run(SparkThreadLocalForwardingThreadPoolExecutor.scala:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /10.203.234.49:34347
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
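
One way to triage a trace like this is to tally which endpoints are refusing connections. A refused block-fetch connection may simply mean the executor at that address has already exited, so the endpoints can then be matched against executor-loss messages in the driver log. Below is a minimal sketch, assuming the logs are available as English-language text; `refused_endpoints` and `sample` are hypothetical names, and the sample reuses the address from the trace.

```python
import re
from collections import Counter

# Matches lines such as "... Connection refused: /10.203.234.49:34347"
REFUSED = re.compile(r"Connection refused: /(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def refused_endpoints(log_text):
    """Count how often each host:port appears in 'Connection refused' lines."""
    return Counter(f"{host}:{port}" for host, port in REFUSED.findall(log_text))


sample = (
    "Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: "
    "Connection refused: /10.203.234.49:34347\n"
    "Caused by: java.net.ConnectException: Connection refused\n"
)

counts = refused_endpoints(sample)
for endpoint, n in counts.most_common():
    print(endpoint, n)
```

If most refusals point at one or two addresses, that suggests a small number of executors died first and the remaining errors are follow-on failures rather than independent problems.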

Any help or thoughts would be greatly appreciated.
