pyspark.sql.DataFrameReader.csv

DataFrameReader.csv(path: Union[str, List[str]], schema: Union[pyspark.sql.types.StructType, str, None] = None, sep: Optional[str] = None, encoding: Optional[str] = None, quote: Optional[str] = None, escape: Optional[str] = None, comment: Optional[str] = None, header: Union[bool, str, None] = None, inferSchema: Union[bool, str, None] = None, ignoreLeadingWhiteSpace: Union[bool, str, None] = None, ignoreTrailingWhiteSpace: Union[bool, str, None] = None, nullValue: Optional[str] = None, nanValue: Optional[str] = None, positiveInf: Optional[str] = None, negativeInf: Optional[str] = None, dateFormat: Optional[str] = None, timestampFormat: Optional[str] = None, maxColumns: Union[str, int, None] = None, maxCharsPerColumn: Union[str, int, None] = None, maxMalformedLogPerPartition: Union[str, int, None] = None, mode: Optional[str] = None, columnNameOfCorruptRecord: Optional[str] = None, multiLine: Union[bool, str, None] = None, charToEscapeQuoteEscaping: Optional[str] = None, samplingRatio: Union[str, float, None] = None, enforceSchema: Union[bool, str, None] = None, emptyValue: Optional[str] = None, locale: Optional[str] = None, lineSep: Optional[str] = None, pathGlobFilter: Union[bool, str, None] = None, recursiveFileLookup: Union[bool, str, None] = None, modifiedBefore: Union[bool, str, None] = None, modifiedAfter: Union[bool, str, None] = None, unescapedQuoteHandling: Optional[str] = None) → DataFrame

Loads a CSV file and returns the result as a DataFrame.

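If inferSchema is enabled, this function goes through the input once to determine the input schema. To avoid that extra pass over the entire data, disable the inferSchema option or specify the schema explicitly using schema.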
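The trade-off described above can be sketched as two small helpers. This is a minimal illustration, not part of the PySpark API: the function names, the column names, and the types in the schema string are assumptions, and `spark` is assumed to be an active SparkSession supplied by the caller.

```python
# Hypothetical helpers; `spark` is assumed to be an active SparkSession.

def read_inferred(spark, path):
    """Let Spark infer column types; this costs one extra pass over the input."""
    return spark.read.csv(path, header=True, inferSchema=True)


def read_with_schema(spark, path):
    """Supply the schema up front so no inference pass is needed."""
    # The column names and types below are illustrative assumptions.
    return spark.read.csv(path, header=True, schema="name STRING, age INT")
```

On large inputs the second form is usually the safer default, since the column types are fixed by the caller rather than derived from the data.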

Parameters

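path str or list: A string or list of strings giving the input path(s), or an RDD of strings storing CSV rows.
schema pyspark.sql.types.StructType or str, optional: An optional pyspark.sql.types.StructType for the input schema, or a DDL-formatted string (for example col0 INT, col1 DOUBLE).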
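As a rough sketch of how the two parameters combine, the helpers below build a DDL-formatted schema string from (name, type) pairs and pass it along with one or more paths. Both `ddl_schema` and `read_many` are hypothetical names introduced for illustration, and `spark` is assumed to be an active SparkSession.

```python
def ddl_schema(columns):
    """Join (name, type) pairs into a DDL-formatted schema string."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)


def read_many(spark, paths, columns):
    """Read one or more CSV files that share a layout into a single DataFrame."""
    # `paths` may be a single string or a list of strings.
    return spark.read.csv(paths, schema=ddl_schema(columns))
```

For instance, `ddl_schema([("col0", "INT"), ("col1", "DOUBLE")])` produces the string `col0 INT, col1 DOUBLE`, which matches the DDL form shown in the schema description.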

Other Parameters

Extra options: For the extra options, refer to Data Source Option in the version you use.
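Extra options are passed as keyword arguments, using the names that appear in the signature. The sketch below shows one plausible combination; the helper name and the chosen values are assumptions for illustration, and `spark` is assumed to be an active SparkSession.

```python
def read_pipe_delimited(spark, path):
    """Read a pipe-delimited file with a header row (hypothetical helper)."""
    return spark.read.csv(
        path,
        sep="|",               # field delimiter
        header=True,           # first line holds the column names
        nullValue="NA",        # string that should be read as null
        mode="DROPMALFORMED",  # drop records that fail to parse
    )
```

Which options are accepted, and their defaults, depend on the Spark version in use, so the Data Source Option page for that version remains the authority.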

Examples

    >>> df = spark.read.csv('python/test_support/sql/ages.csv')
    >>> df.dtypes
    [('_c0', 'string'), ('_c1', 'string')]
    >>> rdd = sc.textFile('python/test_support/sql/ages.csv')
    >>> df2 = spark.read.csv(rdd)
    >>> df2.dtypes
    [('_c0', 'string'), ('_c1', 'string')]
