Load data using COPY INTO
The COPY INTO SQL command lets you load data from a file location into a Delta table. This is a retriable and idempotent operation; files in the source location that have already been loaded are skipped.

COPY INTO supports secure access in a number of ways, including the ability to load data using COPY INTO with temporary credentials.
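As a minimal sketch of the basic call (assuming an existing Delta table named my_table and a hypothetical directory of CSV files; neither name comes from this page), re-running the same statement skips files that have already been loaded:

COPY INTO my_table
FROM '/path/to/csv-files'   -- hypothetical source directory
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');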
Empty Delta Lake tables
Note
This feature is available in Databricks Runtime 11.0 and above.
You can create empty placeholder Delta tables so that the schema is later inferred during the COPY INTO command:
CREATE TABLE IF NOT EXISTS my_table
[COMMENT <table_description>]
[TBLPROPERTIES (<table_properties>)];

COPY INTO my_table
FROM '/path/to/files'
FILEFORMAT = <format>
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
The preceding SQL statement is idempotent and can be scheduled to run to ingest data exactly once into a Delta table.
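For instance, the following sketch fills in the template above with an invented table name (default.loan_risks_schemaless, not part of this page) and the sample Parquet path used in the example below; the table's schema is inferred from the Parquet files on the first run:

-- Hypothetical placeholder table; no columns are declared here.
CREATE TABLE IF NOT EXISTS default.loan_risks_schemaless
COMMENT 'Schema inferred by the first COPY INTO';

COPY INTO default.loan_risks_schemaless
FROM '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
FILEFORMAT = PARQUET
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');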
Note
The empty Delta table is not usable outside of COPY INTO. INSERT INTO and MERGE INTO are not supported for writing data into schemaless Delta tables. After data is inserted into the table with COPY INTO, the table becomes queryable.
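As a follow-on sketch (reusing the hypothetical placeholder table from the previous sketch): once the first COPY INTO has defined the schema, ordinary reads and writes work against the table:

-- Works only after the first COPY INTO has defined the schema.
SELECT COUNT(*) FROM default.loan_risks_schemaless;

-- INSERT INTO is also supported from this point on
-- (hypothetical row; columns follow the inferred Parquet schema).
INSERT INTO default.loan_risks_schemaless
VALUES (9999, 1000, 0.0, 'CA');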
Example
For common use patterns, see Common data loading patterns with COPY INTO.
The following example shows how to create a Delta table and then use the COPY INTO SQL command to load sample data from the sample datasets into the table. You can run the example Python, R, Scala, or SQL code from a notebook attached to a Databricks cluster. You can also run the SQL code from a query associated with a SQL warehouse in Databricks SQL.
table_name = 'default.loan_risks_upload'
source_data = '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
source_format = 'PARQUET'

spark.sql("DROP TABLE IF EXISTS " + table_name)

spark.sql("CREATE TABLE " + table_name + " (" + \
  "loan_id BIGINT, " + \
  "funded_amnt INT, " + \
  "paid_amnt DOUBLE, " + \
  "addr_state STRING)"
)

spark.sql("COPY INTO " + table_name + \
  " FROM '" + source_data + "'" + \
  " FILEFORMAT = " + source_format
)

loan_risks_upload_data = spark.sql("SELECT * FROM " + table_name)

display(loan_risks_upload_data)

'''
Result:
+---------+-------------+-----------+------------+
| loan_id | funded_amnt | paid_amnt | addr_state |
+=========+=============+===========+============+
| 0       | 1000        | 182.22    | CA         |
+---------+-------------+-----------+------------+
| 1       | 1000        | 361.19    | WA         |
+---------+-------------+-----------+------------+
| 2       | 1000        | 176.26    | TX         |
+---------+-------------+-----------+------------+
...
'''
library(SparkR)
sparkR.session()

table_name = "default.loan_risks_upload"
source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
source_format = "PARQUET"

sql(paste("DROP TABLE IF EXISTS ", table_name, sep = ""))

sql(paste("CREATE TABLE ", table_name, " (",
  "loan_id BIGINT, ",
  "funded_amnt INT, ",
  "paid_amnt DOUBLE, ",
  "addr_state STRING)",
  sep = ""
))

sql(paste("COPY INTO ", table_name,
  " FROM '", source_data, "'",
  " FILEFORMAT = ", source_format,
  sep = ""
))

loan_risks_upload_data = tableToDF(table_name)

display(loan_risks_upload_data)

# Result:
# +---------+-------------+-----------+------------+
# | loan_id | funded_amnt | paid_amnt | addr_state |
# +=========+=============+===========+============+
# | 0       | 1000        | 182.22    | CA         |
# +---------+-------------+-----------+------------+
# | 1       | 1000        | 361.19    | WA         |
# +---------+-------------+-----------+------------+
# | 2       | 1000        | 176.26    | TX         |
# +---------+-------------+-----------+------------+
# ...
val table_name = "default.loan_risks_upload"
val source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
val source_format = "PARQUET"

spark.sql("DROP TABLE IF EXISTS " + table_name)

spark.sql("CREATE TABLE " + table_name + " (" +
  "loan_id BIGINT, " +
  "funded_amnt INT, " +
  "paid_amnt DOUBLE, " +
  "addr_state STRING)"
)

spark.sql("COPY INTO " + table_name +
  " FROM '" + source_data + "'" +
  " FILEFORMAT = " + source_format
)

val loan_risks_upload_data = spark.table(table_name)

display(loan_risks_upload_data)

/*
Result:
+---------+-------------+-----------+------------+
| loan_id | funded_amnt | paid_amnt | addr_state |
+=========+=============+===========+============+
| 0       | 1000        | 182.22    | CA         |
+---------+-------------+-----------+------------+
| 1       | 1000        | 361.19    | WA         |
+---------+-------------+-----------+------------+
| 2       | 1000        | 176.26    | TX         |
+---------+-------------+-----------+------------+
...
*/
DROP TABLE IF EXISTS default.loan_risks_upload;

CREATE TABLE default.loan_risks_upload (
  loan_id BIGINT,
  funded_amnt INT,
  paid_amnt DOUBLE,
  addr_state STRING
);

COPY INTO default.loan_risks_upload
FROM '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
FILEFORMAT = PARQUET;

SELECT * FROM default.loan_risks_upload;

-- Result:
-- +---------+-------------+-----------+------------+
-- | loan_id | funded_amnt | paid_amnt | addr_state |
-- +=========+=============+===========+============+
-- | 0       | 1000        | 182.22    | CA         |
-- +---------+-------------+-----------+------------+
-- | 1       | 1000        | 361.19    | WA         |
-- +---------+-------------+-----------+------------+
-- | 2       | 1000        | 176.26    | TX         |
-- +---------+-------------+-----------+------------+
-- ...
To clean up, run the following code, which deletes the table:
spark.sql("DROP TABLE " + table_name)
sql(粘貼(“刪除表”,table_name,9月=""))
spark.sql("DROP TABLE " + table_name)
DROP TABLE default.loan_risks_upload
Reference
Databricks Runtime 7.x and above: COPY INTO