Load data using COPY INTO

The COPY INTO SQL command lets you load data from a file location into a Delta table. This is a retryable, idempotent operation; files in the source location that have already been loaded are skipped.
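For orientation, here is a minimal sketch of the command; the table name, source path, and CSV options are placeholders rather than values from this example:

-- Minimal sketch: load CSV files from a path into an existing Delta table.
-- `my_table` and '/path/to/csv/files' are placeholders.
COPY INTO my_table
FROM '/path/to/csv/files'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true');
-- Re-running the same command skips files that have already been loaded.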

COPY INTO supports secure access in several ways, including the ability to use temporary credentials to load data with COPY INTO.
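As a hedged sketch of the temporary-credential form, credentials can be supplied inline with the source location; the bucket, path, and credential values below are placeholders:

-- Sketch: passing temporary AWS credentials along with the source location.
-- The bucket name and all credential values are placeholders.
COPY INTO my_table
FROM 's3://my-bucket/json-data' WITH (
  CREDENTIAL (
    AWS_ACCESS_KEY = '<access-key>',
    AWS_SECRET_KEY = '<secret-key>',
    AWS_SESSION_TOKEN = '<session-token>'
  )
)
FILEFORMAT = JSON;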

Empty Delta Lake tables

Note

This feature is available in Databricks Runtime 11.0 and above.

You can create an empty placeholder Delta table so that the schema is inferred later, during execution of a COPY INTO command:

CREATE TABLE IF NOT EXISTS my_table
[COMMENT <table_description>]
[TBLPROPERTIES (<table_properties>)];

COPY INTO my_table
FROM '/path/to/files'
FILEFORMAT = <format>
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');

The SQL statement above is idempotent and can be scheduled to run so that data is ingested into a Delta table exactly once.

Note

The empty Delta table cannot be used outside of COPY INTO. INSERT INTO and MERGE INTO are not supported for writing data into schemaless Delta tables. After data is inserted into the table with COPY INTO, the table becomes queryable.

See Create target tables for COPY INTO.

Example

For common use patterns, see Common data loading patterns with COPY INTO; one such pattern is sketched below.
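As one illustration (not an exhaustive list), the sketch below assumes a folder of CSV files; the target table, base path, and file pattern are placeholder values:

-- Sketch of a common pattern: load only matching CSV files and infer the schema.
-- `my_csv_table`, the base path, and the PATTERN value are placeholders.
COPY INTO my_csv_table
FROM '/base/path/of/csv/files'
FILEFORMAT = CSV
PATTERN = 'subfolder/file_[a-g].csv'
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true');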

The following example shows how to create a Delta table and then use the COPY INTO SQL command to load sample data from Databricks datasets into the table. You can run the example Python, R, Scala, or SQL code from a notebook attached to a Databricks cluster. You can also run the SQL code from a query associated with a SQL warehouse in Databricks SQL.

Python

table_name = "default.loan_risks_upload"
source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
source_format = "PARQUET"

spark.sql("DROP TABLE IF EXISTS " + table_name)

spark.sql("CREATE TABLE " + table_name + " (" +
  "loan_id BIGINT, " +
  "funded_amnt INT, " +
  "paid_amnt DOUBLE, " +
  "addr_state STRING)"
)

spark.sql("COPY INTO " + table_name +
  " FROM '" + source_data + "'" +
  " FILEFORMAT = " + source_format
)

loan_risks_upload_data = spark.sql("SELECT * FROM " + table_name)

display(loan_risks_upload_data)

'''
Result:
+---------+-------------+-----------+------------+
| loan_id | funded_amnt | paid_amnt | addr_state |
+=========+=============+===========+============+
| 0       | 1000        | 182.22    | CA         |
+---------+-------------+-----------+------------+
| 1       | 1000        | 361.19    | WA         |
+---------+-------------+-----------+------------+
| 2       | 1000        | 176.26    | TX         |
+---------+-------------+-----------+------------+
...
'''
R

library(SparkR)
sparkR.session()

table_name = "default.loan_risks_upload"
source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
source_format = "PARQUET"

sql(paste("DROP TABLE IF EXISTS ", table_name, sep = ""))

sql(paste("CREATE TABLE ", table_name, " (",
  "loan_id BIGINT, ",
  "funded_amnt INT, ",
  "paid_amnt DOUBLE, ",
  "addr_state STRING)",
  sep = ""
))

sql(paste("COPY INTO ", table_name,
  " FROM '", source_data, "'",
  " FILEFORMAT = ", source_format,
  sep = ""
))

loan_risks_upload_data = tableToDF(table_name)

display(loan_risks_upload_data)

# Result:
# +---------+-------------+-----------+------------+
# | loan_id | funded_amnt | paid_amnt | addr_state |
# +=========+=============+===========+============+
# | 0       | 1000        | 182.22    | CA         |
# +---------+-------------+-----------+------------+
# | 1       | 1000        | 361.19    | WA         |
# +---------+-------------+-----------+------------+
# | 2       | 1000        | 176.26    | TX         |
# +---------+-------------+-----------+------------+
# ...
Scala

val table_name = "default.loan_risks_upload"
val source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
val source_format = "PARQUET"

spark.sql("DROP TABLE IF EXISTS " + table_name)

spark.sql("CREATE TABLE " + table_name + " (" +
  "loan_id BIGINT, " +
  "funded_amnt INT, " +
  "paid_amnt DOUBLE, " +
  "addr_state STRING)"
)

spark.sql("COPY INTO " + table_name +
  " FROM '" + source_data + "'" +
  " FILEFORMAT = " + source_format
)

val loan_risks_upload_data = spark.table(table_name)

display(loan_risks_upload_data)

/*
Result:
+---------+-------------+-----------+------------+
| loan_id | funded_amnt | paid_amnt | addr_state |
+=========+=============+===========+============+
| 0       | 1000        | 182.22    | CA         |
+---------+-------------+-----------+------------+
| 1       | 1000        | 361.19    | WA         |
+---------+-------------+-----------+------------+
| 2       | 1000        | 176.26    | TX         |
+---------+-------------+-----------+------------+
...
*/
SQL

DROP TABLE IF EXISTS default.loan_risks_upload;

CREATE TABLE default.loan_risks_upload (
  loan_id BIGINT,
  funded_amnt INT,
  paid_amnt DOUBLE,
  addr_state STRING
);

COPY INTO default.loan_risks_upload
FROM '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
FILEFORMAT = PARQUET;

SELECT * FROM default.loan_risks_upload;

-- Result:
-- +---------+-------------+-----------+------------+
-- | loan_id | funded_amnt | paid_amnt | addr_state |
-- +=========+=============+===========+============+
-- | 0       | 1000        | 182.22    | CA         |
-- +---------+-------------+-----------+------------+
-- | 1       | 1000        | 361.19    | WA         |
-- +---------+-------------+-----------+------------+
-- | 2       | 1000        | 176.26    | TX         |
-- +---------+-------------+-----------+------------+
-- ...

To clean up, run the following code, which deletes the table:

Python

spark.sql("DROP TABLE " + table_name)

R

sql(paste("DROP TABLE ", table_name, sep = ""))

Scala

spark.sql("DROP TABLE " + table_name)

SQL

DROP TABLE default.loan_risks_upload

Reference