Load data using COPY INTO

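The COPY INTO SQL command lets you load data from a file location into a Delta table. The operation is retriable and idempotent: files in the source location that have already been loaded are skipped.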
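To make the idempotency claim concrete, here is a small toy model in plain Python. It is only an illustration of the "skip files already loaded" behavior described above, not a description of how COPY INTO is implemented; the class name `ToyCopyTarget` and its file ledger are hypothetical.

```python
# Toy model of an idempotent, retriable load.
# ASSUMPTION: this only mimics the observable behavior described in the text
# (already-loaded files are skipped); it is not Databricks' implementation.


class ToyCopyTarget:
    def __init__(self):
        self.loaded_files = set()  # hypothetical ledger of ingested file names
        self.rows = []             # rows that have landed in the "table"

    def copy_into(self, source_files):
        """Load rows from files not seen before; return how many files were new."""
        newly_loaded = 0
        for name, rows in sorted(source_files.items()):
            if name in self.loaded_files:
                continue  # already ingested, so a re-run does not duplicate it
            self.rows.extend(rows)
            self.loaded_files.add(name)
            newly_loaded += 1
        return newly_loaded


source = {"a.json": [1, 2], "b.json": [3]}
target = ToyCopyTarget()

first = target.copy_into(source)    # both files are new
second = target.copy_into(source)   # retry: nothing new to load
source["c.json"] = [4]
third = target.copy_into(source)    # only the new file is picked up
```

Running the same load twice leaves the target unchanged, which is why a retry or a scheduled re-run is safe in this model.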

Note

For a more scalable and robust file ingestion experience, Databricks recommends that SQL users use streaming tables.

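Requirements

An account admin must complete the steps to configure access to data in cloud object storage before users can load data using COPY INTO.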

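Supported source formats

Supported source formats for COPY INTO include CSV, JSON, Avro, ORC, Parquet, text, and binary files. The source can be anywhere that your Databricks workspace has access to.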
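When a COPY INTO statement is assembled in a notebook, it can help to validate the format name before sending anything to the cluster. The sketch below assumes the FILEFORMAT keywords that correspond to the formats listed above (with BINARYFILE standing for binary files); the helper `normalize_file_format` is a hypothetical name, not part of any Databricks API.

```python
# Minimal sketch: validate a user-supplied format name before building SQL.
# ASSUMPTION: these keywords mirror the formats listed in the text above;
# check the COPY INTO reference for the set your runtime supports.
SUPPORTED_FORMATS = {"CSV", "JSON", "AVRO", "ORC", "PARQUET", "TEXT", "BINARYFILE"}


def normalize_file_format(name: str) -> str:
    """Return the upper-cased FILEFORMAT keyword, or raise if it is not listed."""
    fmt = name.strip().upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported source format {name!r}; expected one of {sorted(SUPPORTED_FORMATS)}"
        )
    return fmt


parquet_keyword = normalize_file_format(" parquet ")
```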

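Example: Load data into a schemaless Delta Lake table

Note

This feature is available in Databricks Runtime 11.0 and above.

You can create an empty placeholder Delta table so that its schema is inferred later, during a COPY INTO command: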

    CREATE TABLE IF NOT EXISTS my_table
    [COMMENT <table-description>]
    [TBLPROPERTIES (<table-properties>)];

    COPY INTO my_table
    FROM '/path/to/files'
    FILEFORMAT = <format>
    FORMAT_OPTIONS ('mergeSchema' = 'true')
    COPY_OPTIONS ('mergeSchema' = 'true');

The SQL statement above is idempotent and can be scheduled to run so that data is ingested into a Delta table exactly once.
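For scheduled jobs driven from a Python notebook, the same statement can be assembled as a string and passed to `spark.sql`. The following is a sketch under stated assumptions: `build_copy_into` is a hypothetical helper, JSON is an arbitrary stand-in for the `<format>` placeholder, and the helper performs no quoting or escaping, so it should only receive trusted values.

```python
# Minimal sketch: assemble a COPY INTO statement with optional option clauses.
# ASSUMPTION: build_copy_into is an illustrative helper, not a Databricks API,
# and it does not escape its inputs.


def build_copy_into(table, source_path, file_format,
                    format_options=None, copy_options=None):
    def render(options):
        # Render {'k': 'v'} as "'k' = 'v'" pairs separated by commas.
        return ", ".join(f"'{key}' = '{value}'" for key, value in options.items())

    lines = [
        f"COPY INTO {table}",
        f"FROM '{source_path}'",
        f"FILEFORMAT = {file_format}",
    ]
    if format_options:
        lines.append(f"FORMAT_OPTIONS ({render(format_options)})")
    if copy_options:
        lines.append(f"COPY_OPTIONS ({render(copy_options)})")
    return "\n".join(lines)


statement = build_copy_into(
    "my_table",
    "/path/to/files",
    "JSON",  # stand-in for <format>
    format_options={"mergeSchema": "true"},
    copy_options={"mergeSchema": "true"},
)
```

In a notebook attached to a cluster, the resulting string would be run with `spark.sql(statement)`; because the command is idempotent, running it on every schedule tick should not reload files that were already ingested.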

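Note

The empty Delta table is not usable outside of COPY INTO. INSERT INTO and MERGE INTO are not supported for writing data into schemaless Delta tables. After data is inserted into the table with COPY INTO, the table becomes queryable.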

See Create target tables for COPY INTO.

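Example: Set schema and load data into a Delta Lake table

The following example shows how to create a Delta table and then use the COPY INTO SQL command to load sample data from Databricks datasets into the table. You can run the example Python, R, Scala, or SQL code from a notebook attached to a Databricks cluster. You can also run the SQL code from a query associated with a SQL warehouse in Databricks SQL.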

Python

    table_name = 'default.loan_risks_upload'
    source_data = '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
    source_format = 'PARQUET'

    # Drop any earlier copy of the table so the example can be re-run.
    spark.sql("DROP TABLE IF EXISTS " + table_name)

    # Create the target table with an explicit schema.
    spark.sql("CREATE TABLE " + table_name + " (" +
      "loan_id BIGINT, " +
      "funded_amnt INT, " +
      "paid_amnt DOUBLE, " +
      "addr_state STRING)"
    )

    # Load the sample Parquet file into the table.
    spark.sql("COPY INTO " + table_name +
      " FROM '" + source_data + "'" +
      " FILEFORMAT = " + source_format
    )

    loan_risks_upload_data = spark.sql("SELECT * FROM " + table_name)

    display(loan_risks_upload_data)

    '''
    Result:
    +---------+-------------+-----------+------------+
    | loan_id | funded_amnt | paid_amnt | addr_state |
    +=========+=============+===========+============+
    | 0       | 1000        | 182.22    |            |
    +---------+-------------+-----------+------------+
    | 1       | 1000        | 361.19    |            |
    +---------+-------------+-----------+------------+
    | 2       | 1000        | 176.26    | TX         |
    +---------+-------------+-----------+------------+
    '''

R

    library(SparkR)
    sparkR.session()

    table_name = "default.loan_risks_upload"
    source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
    source_format = "PARQUET"

    # Drop any earlier copy of the table so the example can be re-run.
    sql(paste("DROP TABLE IF EXISTS ", table_name, sep = ""))

    # Create the target table with an explicit schema.
    sql(paste("CREATE TABLE ", table_name, " (",
      "loan_id BIGINT, ",
      "funded_amnt INT, ",
      "paid_amnt DOUBLE, ",
      "addr_state STRING)",
      sep = ""
    ))

    # Load the sample Parquet file into the table.
    sql(paste("COPY INTO ", table_name,
      " FROM '", source_data, "'",
      " FILEFORMAT = ", source_format,
      sep = ""
    ))

    loan_risks_upload_data = tableToDF(table_name)

    display(loan_risks_upload_data)

    # Result:
    # +---------+-------------+-----------+------------+
    # | loan_id | funded_amnt | paid_amnt | addr_state |
    # +=========+=============+===========+============+
    # | 0       | 1000        | 182.22    |            |
    # +---------+-------------+-----------+------------+
    # | 1       | 1000        | 361.19    |            |
    # +---------+-------------+-----------+------------+
    # | 2       | 1000        | 176.26    | TX         |
    # +---------+-------------+-----------+------------+
    # ...

Scala

    val table_name = "default.loan_risks_upload"
    val source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
    val source_format = "PARQUET"

    // Drop any earlier copy of the table so the example can be re-run.
    spark.sql("DROP TABLE IF EXISTS " + table_name)

    // Create the target table with an explicit schema.
    spark.sql("CREATE TABLE " + table_name + " (" +
      "loan_id BIGINT, " +
      "funded_amnt INT, " +
      "paid_amnt DOUBLE, " +
      "addr_state STRING)"
    )

    // Load the sample Parquet file into the table.
    spark.sql("COPY INTO " + table_name +
      " FROM '" + source_data + "'" +
      " FILEFORMAT = " + source_format
    )

    val loan_risks_upload_data = spark.table(table_name)

    display(loan_risks_upload_data)

    /*
    Result:
    +---------+-------------+-----------+------------+
    | loan_id | funded_amnt | paid_amnt | addr_state |
    +=========+=============+===========+============+
    | 0       | 1000        | 182.22    |            |
    +---------+-------------+-----------+------------+
    | 1       | 1000        | 361.19    |            |
    +---------+-------------+-----------+------------+
    | 2       | 1000        | 176.26    | TX         |
    +---------+-------------+-----------+------------+
    */

SQL

    DROP TABLE IF EXISTS default.loan_risks_upload;

    CREATE TABLE default.loan_risks_upload (
      loan_id BIGINT,
      funded_amnt INT,
      paid_amnt DOUBLE,
      addr_state STRING
    );

    COPY INTO default.loan_risks_upload
    FROM '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
    FILEFORMAT = PARQUET;

    SELECT * FROM default.loan_risks_upload;

    -- Result:
    -- +---------+-------------+-----------+------------+
    -- | loan_id | funded_amnt | paid_amnt | addr_state |
    -- +=========+=============+===========+============+
    -- | 0       | 1000        | 182.22    |            |
    -- +---------+-------------+-----------+------------+
    -- | 1       | 1000        | 361.19    |            |
    -- +---------+-------------+-----------+------------+
    -- | 2       | 1000        | 176.26    | TX         |
    -- +---------+-------------+-----------+------------+
    -- ...

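To clean up, run the following code, which deletes the table:

Python

    spark.sql("DROP TABLE " + table_name)

R

    sql(paste("DROP TABLE ", table_name, sep = ""))

Scala

    spark.sql("DROP TABLE " + table_name)

SQL

    DROP TABLE default.loan_risks_upload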

Reference

Additional resources