I created a bronze table with CDF enabled using the following steps:
```python
df = spark.readStream.format("cloudFiles") \
    .option("cloudFiles.schemaLocation", "<schema_loc>") \
    .option("cloudFiles.format", "json") \
    .option("cloudFiles.inferColumnTypes", "true") \
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns") \
    .option("cloudFiles.includeExistingFiles", "true") \
    .load()

df.writeStream \
    .format("delta") \
    .trigger(once=True) \
    .option("mergeSchema", "true") \
    .option("checkpointLocation", )
```
```sql
CREATE TABLE bronze.mytable USING DELTA LOCATION '<file location>';
ALTER TABLE bronze.mytable SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
```
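For reference, the same result can be achieved in a single statement by setting the table property at creation time instead of a separate ALTER TABLE (a sketch reusing the same table name and location placeholder):

```sql
-- Enable the change data feed when the table is created
CREATE TABLE bronze.mytable
USING DELTA
LOCATION '<file location>'
TBLPROPERTIES (delta.enableChangeDataFeed = true);
```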
```python
df = spark.readStream.format("cloudFiles") \
    .option("cloudFiles.schemaLocation", schema_loc) \
    .option("cloudFiles.format", "json") \
    .option("cloudFiles.inferColumnTypes", "true") \
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns") \
    .option("cloudFiles.includeExistingFiles", "true") \
    .load()

df.createOrReplaceTempView("bronze_company_info_dataset")
sql_query = "INSERT INTO bronze.mytable TABLE bronze_dataset"
spark.sql(sql_query)
```
Running the INSERT fails with this traceback:

```
/databricks/spark/python/pyspark/instrumentation_utils.py in wrapper(*args, **kwargs)
     46         start = time.perf_counter()
     47         try:
---> 48             res = func(*args, **kwargs)
     49             logger.log_success(
     50                 module_name, class_name, function_name, time.perf_counter() - start, signature

/databricks/spark/python/pyspark/sql/session.py in sql(self, sqlQuery, **kwargs)
   1117         sqlQuery = formatter.format(sqlQuery, **kwargs)
   1118         try:
-> 1119             return DataFrame(self._jsparkSession.sql(sqlQuery), self)
   1120         finally:
   1121             if len(kwargs) > 0:

/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1319
   1320         answer = self.gateway_client.send_command(command)
-> 1321         return_value = get_return_value(
   1322             answer, self.gateway_client, self.target_id, self.name)
```