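Conflicting directory structures error

Use distinct paths for your storage locations; otherwise, conflicting directory structures can cause an error.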

Written by Ashish

Last published at: May 19, 2022

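Problem

You have an Apache Spark job that is failing with the Java assertion error java.lang.AssertionError: assertion failed: Conflicting directory structures detected.

Example stack trace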

Caused by: org.apache.spark.sql.streaming.StreamingQueryException: There was an error when trying to infer the partition schema of the current batch of files. Please provide your partition columns explicitly by using: .option('cloudFiles.partitionColumns', 'comma-separated-list')
=== Streaming Query ===
Identifier: [id = aabc5549-cb4b-4e4e-9403-4e793f4824a0, runId = 4e743dda-909f-4932-9489-3dd0b364d811]
Current Committed Offsets: {}
Current Available Offsets: {CloudFilesSource[://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt]: {'seqNum':423,'sourceVersion':1}}
Current State: ACTIVE
Thread State: RUNNABLE
Logical Plan:
CloudFilesSource[://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt]
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:385)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:268)
Caused by: java.lang.RuntimeException: There was an error when trying to infer the partition schema of the current batch of files. Please provide your partition columns explicitly by using: .option('cloudFiles.partitionColumns', 'comma-separated-list')
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesErrors$.partitionInferenceError(CloudFilesErrors.scala:115)
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceFileIndex.liftedTree1$1(CloudFilesSourceFileIndex.scala:65)
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceFileIndex.partitionSpec(CloudFilesSourceFileIndex.scala:63)
    at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:50)
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesSource.getBatch(CloudFilesSource.scala:361)
    ... 1 more
Caused by: java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
    ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt
    ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/clfy_x_clfy_evt
If provided paths are partition directories, please set 'basePath' in the options of the data source to specify the root directory of the table. If there are multiple root directories, please load them separately and then union them.
    at scala.Predef$.assert(Predef.scala:223)
    at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:204)
    at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parseP

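Cause

You have conflicting directory paths in the storage location.

In the example stack trace, we see two conflicting directory paths.

  • <file-system>://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt
  • <file-system>://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/clfy_x_clfy_evt

Because these directories sit in the same hierarchy (the second is nested inside the first), updates made at the root or at one of the nested levels can conflict.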
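The nesting described above can be checked before a job is started. The following is a minimal sketch, not part of Spark or Databricks: the helper names `split_path`, `is_ancestor`, and `find_nested_paths` are hypothetical, and it assumes paths are written as URIs with a real scheme (the `s3` scheme in the usage note stands in for the `<file-system>` placeholder). It only flags candidate root paths where one is an ancestor of another; it does not reproduce Spark's own partition-parsing logic.

```python
from urllib.parse import urlsplit


def split_path(path):
    """Split a storage URI into (scheme, authority) and a tuple of path segments."""
    parts = urlsplit(path)
    segments = tuple(s for s in parts.path.split("/") if s)
    return (parts.scheme, parts.netloc), segments


def is_ancestor(parent, child):
    """Return True if `parent` is a strict ancestor directory of `child`."""
    parent_root, parent_segments = split_path(parent)
    child_root, child_segments = split_path(child)
    return (
        parent_root == child_root
        and len(parent_segments) < len(child_segments)
        and child_segments[: len(parent_segments)] == parent_segments
    )


def find_nested_paths(paths):
    """Return (ancestor, descendant) pairs found among the given root paths."""
    return [(a, b) for a in paths for b in paths if is_ancestor(a, b)]
```

Calling `find_nested_paths` with the two paths from the stack trace (prefixed with an assumed scheme such as `s3`) returns one pair, with the shorter path as the ancestor. Comparing whole segments rather than string prefixes avoids treating `.../evt` as a parent of `.../evt2`.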

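Solution

Avoid multiple concurrent updates within a hierarchical directory structure, and avoid updates that occur within the same partition.

Once a conflict is detected, create multiple distinct paths for the updates. Alternatively, you can add more partitions.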

These example directories do not conflict.

  • <file-system>://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/evt=clfy_x_clfy_evt1
  • <file-system>://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/evt=clfy_x_clfy_evt2
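With a layout like this, the error message in the stack trace suggests naming the partition columns explicitly through the `cloudFiles.partitionColumns` option. The sketch below shows one way that might look for the `evt` partition used in these paths. The helper names `auto_loader_options` and `read_partitioned_stream` are hypothetical, the file format passed in is an assumption (the article does not state it), and a real Auto Loader job would typically also need a schema or a `cloudFiles.schemaLocation` setting, which is omitted here.

```python
def auto_loader_options(file_format, partition_columns):
    """Build Auto Loader reader options with explicit partition columns."""
    if not partition_columns:
        raise ValueError("at least one partition column is required")
    return {
        "cloudFiles.format": file_format,
        # The option expects a comma-separated list of column names.
        "cloudFiles.partitionColumns": ",".join(partition_columns),
    }


def read_partitioned_stream(spark, root_path, options):
    """Start a streaming read from the table root (not from a nested directory).

    `spark` is an existing SparkSession on a Databricks cluster; this
    function is not exercised outside that environment.
    """
    return spark.readStream.format("cloudFiles").options(**options).load(root_path)
```

For the example directories, `auto_loader_options("parquet", ["evt"])` would pair with the root path ending in `clfy_x_clfy_evt`, so that the `evt=...` directories are read as partitions of a single table rather than as separate roots.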