Load data using COPY INTO
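The COPY INTO SQL command lets you load data from a file location into a Delta table. This is a re-triable and idempotent operation: files in the source location that have already been loaded are skipped.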
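The skip-already-loaded rule can be pictured with a small in-memory model. This is a simplified sketch, not how Databricks tracks load state internally: the hypothetical `ToyCopyInto` class only remembers which file paths it has ingested and ignores them on later runs, so retrying the same load does not duplicate rows, while newly arrived files are still picked up.

```python
from typing import Dict, List, Set


class ToyCopyInto:
    """Hypothetical model of COPY INTO's idempotent file tracking.

    Real COPY INTO keeps its own load metadata; this toy only records
    file paths in a Python set to illustrate the skip-if-loaded rule.
    """

    def __init__(self) -> None:
        self.loaded_files: Set[str] = set()
        self.rows: List[dict] = []

    def run(self, source: Dict[str, List[dict]]) -> int:
        """Load every file in `source` not seen before; return rows added."""
        added = 0
        for path in sorted(source):
            if path in self.loaded_files:
                continue  # already ingested on an earlier run: skip it
            self.rows.extend(source[path])
            self.loaded_files.add(path)
            added += len(source[path])
        return added


# Hypothetical source directory: file path -> rows in that file.
source = {
    "/landing/part-0.parquet": [{"id": 0}, {"id": 1}],
    "/landing/part-1.parquet": [{"id": 2}],
}

table = ToyCopyInto()
print(table.run(source))  # 3: first run loads both files
print(table.run(source))  # 0: the retry skips everything already loaded
source["/landing/part-2.parquet"] = [{"id": 3}]
print(table.run(source))  # 1: only the newly arrived file is loaded
print(len(table.rows))    # 4
```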
Note
For a more scalable and robust file ingestion experience, Databricks recommends that SQL users leverage streaming tables.
Example: Load data into a schemaless Delta Lake table
Note
This feature is available in Databricks Runtime 11.0 and above.
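If a notebook or job needs to decide whether it can rely on this feature, it could compare the runtime version against 11.0. The sketch below is hypothetical: the helper name is made up, and it assumes the version string begins with `<major>.<minor>` (as in `11.3.x-scala2.12`). Adapt the parsing to however your environment actually reports the runtime version.

```python
import re


def supports_schemaless_copy_into(spark_version: str) -> bool:
    """Return True if a runtime version string is 11.0 or above.

    Hypothetical helper: assumes the string starts with "<major>.<minor>",
    for example "11.3.x-scala2.12". Raises ValueError for anything else.
    """
    match = re.match(r"(\d+)\.(\d+)", spark_version)
    if match is None:
        raise ValueError(f"unrecognized runtime version: {spark_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders by major version first, then minor.
    return (major, minor) >= (11, 0)


print(supports_schemaless_copy_into("11.3.x-scala2.12"))  # True
print(supports_schemaless_copy_into("10.4.x-scala2.12"))  # False
```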
You can create empty placeholder Delta tables so that the schema is inferred later, during a COPY INTO command:
CREATE TABLE IF NOT EXISTS my_table
[COMMENT <table-description>]
[TBLPROPERTIES (<table-properties>)];

COPY INTO my_table
FROM '/path/to/files'
FILEFORMAT = <format>
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
The SQL statements above are idempotent and can be scheduled to ingest data into a Delta table exactly once.
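When the table name, path, and format come from job configuration, the two statements can be generated instead of edited by hand. This is a minimal sketch with a hypothetical helper name; it rejects paths containing a single quote rather than guessing at string-literal escaping rules, and it assumes `table` and `file_format` are trusted values, since they are inserted verbatim.

```python
from typing import Tuple


def schemaless_copy_into_sql(table: str, path: str, file_format: str) -> Tuple[str, str]:
    """Return the placeholder CREATE TABLE and the schema-inferring COPY INTO.

    Hypothetical helper. `table` and `file_format` are pasted in verbatim,
    so they should come from trusted configuration, not user input.
    """
    if "'" in path:
        # Refuse instead of guessing at SQL string-literal escaping rules.
        raise ValueError("path must not contain a single quote")
    create_stmt = f"CREATE TABLE IF NOT EXISTS {table}"
    copy_stmt = (
        f"COPY INTO {table}\n"
        f"FROM '{path}'\n"
        f"FILEFORMAT = {file_format}\n"
        "FORMAT_OPTIONS ('mergeSchema' = 'true')\n"
        "COPY_OPTIONS ('mergeSchema' = 'true')"
    )
    return create_stmt, copy_stmt


create_stmt, copy_stmt = schemaless_copy_into_sql("my_table", "/path/to/files", "PARQUET")
print(create_stmt)
print(copy_stmt)
```

The statements are returned separately so that, in a notebook, each can be passed to its own `spark.sql(...)` call.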
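Note
The empty Delta table is not usable outside of COPY INTO. INSERT INTO and MERGE INTO are not supported for writing data into schemaless Delta tables. After data has been inserted into the table with COPY INTO, the table becomes queryable.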
See Create target tables for COPY INTO.
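The rules in the note can be summarized as a tiny state model. This is a hypothetical illustration, not Delta Lake code: the class name is made up, schema inference is reduced to collecting column names (loosely mimicking `mergeSchema = true`), MERGE INTO is left out for brevity, and the model assumes that once COPY INTO has given the table a schema, ordinary writes and queries are allowed.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


class PlaceholderTable:
    """Hypothetical model of a schemaless placeholder table (not Delta Lake code)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.columns: Optional[List[str]] = None  # None means "no schema yet"
        self.rows: List[Row] = []

    def copy_into(self, rows: List[Row]) -> None:
        # COPY INTO is the only operation allowed on a schemaless table.
        if not rows:
            return  # nothing to infer a schema from
        columns = list(self.columns or [])
        for row in rows:
            for col in row:
                if col not in columns:
                    columns.append(col)  # unseen columns are appended
        self.columns = columns
        self.rows.extend(rows)

    def _require_schema(self, operation: str) -> None:
        if self.columns is None:
            raise RuntimeError(f"{operation} is not supported: {self.name} has no schema yet")

    def insert_into(self, rows: List[Row]) -> None:
        self._require_schema("INSERT INTO")
        self.rows.extend(rows)

    def select_all(self) -> List[Row]:
        self._require_schema("SELECT")
        return list(self.rows)


table = PlaceholderTable("my_table")
try:
    table.select_all()
except RuntimeError as err:
    print(err)  # SELECT is not supported: my_table has no schema yet
table.copy_into([{"id": 1, "name": "a"}])
print(table.columns)       # ['id', 'name']
print(table.select_all())  # [{'id': 1, 'name': 'a'}]
```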
Example: Set schema and load data into a Delta Lake table
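The following example shows how to create a Delta table and then use the COPY INTO SQL command to load sample data from Databricks datasets into the table. You can run the example Python, R, Scala, or SQL code from a notebook attached to a Databricks cluster. You can also run the SQL code from a query associated with a SQL warehouse in Databricks SQL.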
Python

table_name = 'default.loan_risks_upload'
source_data = '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
source_format = 'PARQUET'

# Start from a clean slate so the example can be rerun.
spark.sql("DROP TABLE IF EXISTS " + table_name)

# Create the target table with an explicit schema.
spark.sql("CREATE TABLE " + table_name + " (" +
  "loan_id BIGINT, " +
  "funded_amnt INT, " +
  "paid_amnt DOUBLE, " +
  "addr_state STRING)"
)

# Load the Parquet file; the path must be a quoted string literal.
spark.sql("COPY INTO " + table_name +
  " FROM '" + source_data + "'" +
  " FILEFORMAT = " + source_format
)

loan_risks_upload_data = spark.sql("SELECT * FROM " + table_name)

display(loan_risks_upload_data)

'''
Result:
+---------+-------------+-----------+------------+
| loan_id | funded_amnt | paid_amnt | addr_state |
+=========+=============+===========+============+
| 0       | 1000        | 182.22    |            |
+---------+-------------+-----------+------------+
| 1       | 1000        | 361.19    |            |
+---------+-------------+-----------+------------+
| 2       | 1000        | 176.26    | TX         |
+---------+-------------+-----------+------------+
...
'''
R

library(SparkR)
sparkR.session()

table_name = "default.loan_risks_upload"
source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
source_format = "PARQUET"

sql(paste("DROP TABLE IF EXISTS ", table_name, sep = ""))

sql(paste("CREATE TABLE ", table_name, " (",
  "loan_id BIGINT, ",
  "funded_amnt INT, ",
  "paid_amnt DOUBLE, ",
  "addr_state STRING)",
  sep = ""
))

sql(paste("COPY INTO ", table_name,
  " FROM '", source_data, "'",
  " FILEFORMAT = ", source_format,
  sep = ""
))

loan_risks_upload_data = tableToDF(table_name)

display(loan_risks_upload_data)

# Result:
# +---------+-------------+-----------+------------+
# | loan_id | funded_amnt | paid_amnt | addr_state |
# +=========+=============+===========+============+
# | 0       | 1000        | 182.22    |            |
# +---------+-------------+-----------+------------+
# | 1       | 1000        | 361.19    |            |
# +---------+-------------+-----------+------------+
# | 2       | 1000        | 176.26    | TX         |
# +---------+-------------+-----------+------------+
# ...
Scala

val table_name = "default.loan_risks_upload"
val source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
val source_format = "PARQUET"

spark.sql("DROP TABLE IF EXISTS " + table_name)

spark.sql("CREATE TABLE " + table_name + " (" +
  "loan_id BIGINT, " +
  "funded_amnt INT, " +
  "paid_amnt DOUBLE, " +
  "addr_state STRING)"
)

spark.sql("COPY INTO " + table_name +
  " FROM '" + source_data + "'" +
  " FILEFORMAT = " + source_format
)

val loan_risks_upload_data = spark.table(table_name)

display(loan_risks_upload_data)

/*
Result:
+---------+-------------+-----------+------------+
| loan_id | funded_amnt | paid_amnt | addr_state |
+=========+=============+===========+============+
| 0       | 1000        | 182.22    |            |
+---------+-------------+-----------+------------+
| 1       | 1000        | 361.19    |            |
+---------+-------------+-----------+------------+
| 2       | 1000        | 176.26    | TX         |
+---------+-------------+-----------+------------+
...
*/
SQL

DROP TABLE IF EXISTS default.loan_risks_upload;

CREATE TABLE default.loan_risks_upload (
  loan_id BIGINT,
  funded_amnt INT,
  paid_amnt DOUBLE,
  addr_state STRING
);

COPY INTO default.loan_risks_upload
  FROM '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
  FILEFORMAT = PARQUET;

SELECT * FROM default.loan_risks_upload;

-- Result:
-- +---------+-------------+-----------+------------+
-- | loan_id | funded_amnt | paid_amnt | addr_state |
-- +=========+=============+===========+============+
-- | 0       | 1000        | 182.22    |            |
-- +---------+-------------+-----------+------------+
-- | 1       | 1000        | 361.19    |            |
-- +---------+-------------+-----------+------------+
-- | 2       | 1000        | 176.26    | TX         |
-- +---------+-------------+-----------+------------+
-- ...
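Each variant above spells out the column list by hand inside the CREATE TABLE statement. If you prefer to keep the schema as Python data, a small helper can render the same DDL. This is a sketch with a hypothetical helper name; it relies on Python dicts preserving insertion order (guaranteed since Python 3.7), so the columns appear in the order they were written.

```python
from typing import Dict


def create_table_sql(table: str, columns: Dict[str, str]) -> str:
    """Render a CREATE TABLE statement from a column-name -> SQL-type mapping.

    Hypothetical helper: names and types are inserted verbatim, so they
    should come from trusted code, not user input.
    """
    if not columns:
        raise ValueError("an explicit schema needs at least one column")
    body = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE {table} (\n{body}\n)"


loan_schema = {
    "loan_id": "BIGINT",
    "funded_amnt": "INT",
    "paid_amnt": "DOUBLE",
    "addr_state": "STRING",
}
print(create_table_sql("default.loan_risks_upload", loan_schema))
```

In a notebook, the result could be passed directly to `spark.sql(...)` in place of the hand-concatenated CREATE TABLE string.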
To clean up, run the following code, which deletes the table:
Python

spark.sql("DROP TABLE " + table_name)

R

sql(paste("DROP TABLE ", table_name, sep = ""))

Scala

spark.sql("DROP TABLE " + table_name)

SQL

DROP TABLE default.loan_risks_upload
Additional resources
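For common use patterns, including examples of multiple COPY INTO operations against the same Delta table, see Common data loading patterns using COPY INTO.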