b1123451020-502,"","{""m"":{""差異"":60}}","","","",2022-02-12T15:40:00.783Z
b1456741975-266,"","{""m"":{""差異"":60}}","","","",2022-02-04T17:03:59.566Z
b1789753479-460,"","","","","",2022-02-18T14:46:57.332Z
b1456741977-123,"","{""m"":{""差異"":60}}","","","",2022-02-04T17:03:59.566Z
df_inputfile = (spark.read.format("com.databricks.spark.csv")
                .option("inferSchema", "true")
                .option("header", "false")
                .option("quotedstring", "\"")
                .option("escape", "\"")
                .option("multiline", "true")
                .option("delimiter", ",")
                .load("<csv path>"))

print(df_inputfile.count())            # Prints 3
print(df_inputfile.distinct().count()) # Prints 4
I am trying to read the data above from a CSV file, and I end up with a wrong count even though the dataframe contains all the expected records. df_inputfile.count() prints 3, though it should be 4.
It seems this all happens because of a comma in the fourth column of the third row. Can someone explain why?
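A check that might help narrow this down (only a sketch, and I am not certain it is the right diagnostic) is to compare the number of physical lines in the file with the number of records Spark parses. It reuses the same spark session and the same "<csv path>" placeholder as my snippet above; raw_lines and df_parsed are just names for illustration.

# Count the physical lines in the file, independent of any CSV parsing.
raw_lines = spark.read.text("<csv path>")
print(raw_lines.count())                 # 4 for the sample above, since no field contains a newline

# Parse with the same options as my original snippet, then inspect how each record
# was split into columns and compare the record count with the raw line count.
df_parsed = (spark.read.format("com.databricks.spark.csv")
             .option("inferSchema", "true")
             .option("header", "false")
             .option("quotedstring", "\"")
             .option("escape", "\"")
             .option("multiline", "true")
             .option("delimiter", ",")
             .load("<csv path>"))

df_parsed.show(truncate=False)           # shows how each record was split into columns
print(df_parsed.count())                 # compare with the raw line count above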
Hi Debayan, there is no syntax error in the code snippet. Using .option("escape", "") made no difference; I still get the wrong count.
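In case it is the option names rather than the values, the variant I plan to try next uses the documented quote option instead of quotedstring (as far as I can tell, quotedstring is not a recognized CSV option, so it may simply be ignored). This is only a sketch against the same "<csv path>" placeholder, not something I have confirmed fixes the count; df_try is just an illustrative name.

df_try = (spark.read.format("csv")       # built-in CSV source
          .option("inferSchema", "true")
          .option("header", "false")
          .option("quote", "\"")         # documented option name for the quote character
          .option("escape", "\"")        # so "" inside a quoted field is read as a literal "
          .option("multiLine", "true")
          .option("delimiter", ",")
          .load("<csv path>"))

print(df_try.count())                    # hoping for 4
print(df_try.distinct().count())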