Spark 3.3.1 supports the brotli compression codec, but when I use it to read Parquet files from S3, I get:
INVALID_ARGUMENT: Unsupported codec for Parquet page: BROTLI
Example code:
df = (spark.read.format("parquet")
      .option("compression", "brotli")
      .load("s3://<bucket>/<path>/<file>.parquet"))
df.write.saveAsTable("tmp_test")
I have a large amount of data stored with this compression, so switching now would be difficult. It looks like koalas supports it, or I could ingest it manually by spinning up my own Spark cluster, but that would defeat the point of Databricks/Delta Lake/Autoloader. Any suggestions for a workaround?
Edit:
More of the output:
Caused by: java.lang.RuntimeException: INVALID_ARGUMENT: Unsupported codec for Parquet page: BROTLI
	at com.databricks.sql.io.caching.NativePageWriter$.create(Native Method)
	at com.databricks.sql.io.caching.DiskCache$PageWriter.<init>(DiskCache.scala:318)
	at com.databricks.sql.io.parquet.CachingPageReadStore$UnifiedCacheColumn.populate(CachingPageReadStore.java:1183)
	at com.databricks.sql.io.parquet.CachingPageReadStore$UnifiedCacheColumn.lambda$getPageReader$0(CachingPageReadStore.java:1177)
	at com.databricks.sql.io.caching.NativeDiskCache$.get(Native Method)
	at com.databricks.sql.io.caching.DiskCache.get(DiskCache.scala:515)
	at com.databricks.sql.io.parquet.CachingPageReadStore$UnifiedCacheColumn.getPageReader(CachingPageReadStore.java:1178)
	at com.databricks.sql.io.parquet.CachingPageReadStore.getPageReader(CachingPageReadStore.java:1012)
	at com.databricks.sql.io.parquet.DatabricksVectorizedParquetRecordReader.checkEndOfRowGroup(DatabricksVectorizedParquetRecordReader.java:741)
	at com.databricks.sql.io.parquet.DatabricksVectorizedParquetRecordReader.nextBatch(DatabricksVectorizedParquetRecordReader.java:603)