DataStreamReader.text(path, wholetext=False, lineSep=None, pathGlobFilter=None, recursiveFileLookup=None)
Loads a text file stream and returns a DataFrame whose schema starts with a string column named "value", followed by partitioned columns if there are any. The text files must be encoded as UTF-8.
DataFrame
By default, each line in the text file is a new row in the resulting DataFrame.
New in version 2.0.0.
path: string, or list of strings, for input path(s).
wholetext: if true, read each file from the input path(s) as a single row.
lineSep: defines the line separator that should be used for parsing. If None is set, it covers all of \r, \r\n and \n.
pathGlobFilter: an optional glob pattern to only include files with paths matching the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter. It does not change the behavior of partition discovery.
recursiveFileLookup: recursively scan a directory for files. Using this option disables partition discovery.
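The default lineSep behavior described above, where \r, \r\n, and \n all terminate a row, mirrors Python's universal-newline splitting. A minimal pure-Python sketch of how the three separators delimit rows (independent of Spark, for illustration only):

```python
# With lineSep=None, each of \r, \r\n, and \n ends a row. This is the
# same rule Python's str.splitlines applies (universal newlines).
raw = "alpha\nbeta\r\ngamma\rdelta"
rows = raw.splitlines()
print(rows)  # ['alpha', 'beta', 'gamma', 'delta']

# With an explicit separator, only that separator delimits rows,
# analogous to passing a specific lineSep such as "\n":
rows_nl = raw.split("\n")
print(rows_nl)  # ['alpha', 'beta\r', 'gamma\rdelta']
```

Note that with wholetext=True there is no splitting at all: the entire file content becomes a single row.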
Notes
This API is evolving.
Examples
>>> import tempfile
>>> text_sdf = spark.readStream.text(tempfile.mkdtemp())
>>> text_sdf.isStreaming
True
>>> "value" in str(text_sdf.schema)
True