java.lang.OutOfMemoryError: GC overhead limit exceeded on a count action on a file.
The file is a 217 GB CSV file.
I am using 10 r3.8xlarge (Ubuntu) machines with CDH 5.3.6 and Spark 1.2.0.
Configuration:
spark.app.id: local-1443956477103
spark.app.name: Spark shell
spark.cores.max: 100
spark.driver.cores: 24
spark.driver.extraLibraryPath: /opt/cloudera/parcels/CDH-5.3.6-1.cdh5.3.6.p0.11/lib/hadoop/lib/native
spark.driver.host: ip-172-31-34-242.us-west-2.compute.internal
spark.driver.maxResultSize: 300g
spark.driver.port: 55123
spark.eventLog.dir: hdfs://ip-172-31-34-242.us-west-2.compute.internal:8020/user/spark/applicationHistory
spark.eventLog.enabled: true
spark.executor.extraLibraryPath: /opt/cloudera/parcels/CDH-5.3.6-1.cdh5.3.6.p0.11/lib/hadoop/lib/native
spark.executor.id: driver
spark.executor.memory: 200g
spark.fileserver.uri: http://172.31.34.242:51424
spark.jars:
spark.master: local[*]
spark.repl.class.uri: http://172.31.34.242:58244
spark.scheduler.mode: FIFO
spark.serializer: org.apache.spark.serializer.KryoSerializer
spark.storage.memoryFraction: 0.9
spark.tachyonStore.folderName: spark-88bd9c44-d626-4ad2-8df3-f89df4cb30de
spark.yarn.historyServer.address: http://ip-172-31-34-242.us-west-2.compute.internal:18088
This is what I ran:
val testrdd = sc.textFile("
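The snippet above is cut off; a minimal sketch of how such a count job looks in spark-shell follows. The input path here is a hypothetical placeholder, not the actual path from the job:

```scala
// Sketch of the count action as run in spark-shell (SparkContext `sc` is provided by the shell).
// "hdfs:///path/to/file.csv" is a placeholder for the elided input path.
val testrdd = sc.textFile("hdfs:///path/to/file.csv")
testrdd.count()  // triggers a full scan of the file, which is where the GC overhead error occurs
```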