pyspark.sql.streaming.DataStreamReader.text

DataStreamReader.text(path, wholetext=False, lineSep=None, pathGlobFilter=None, recursiveFileLookup=None)

Loads a text file stream and returns a DataFrame whose schema starts with a string column named “value”, followed by partitioned columns if there are any. The text files must be encoded as UTF-8.

By default, each line in the text file is a new row in the resulting DataFrame.

New in version 2.0.0.

Parameters:
path : str or list

string, or list of strings, for input path(s).

wholetext : str or bool, optional

if true, read each file from input path(s) as a single row.

lineSep : str, optional

defines the line separator used for parsing. If None (the default), any of \r, \r\n and \n is treated as a line separator.

pathGlobFilter : str or bool, optional

an optional glob pattern to only include files with paths matching the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter. It does not change the behavior of partition discovery.

recursiveFileLookup : str or bool, optional

recursively scan a directory for files. Using this option disables partition discovery.

Notes

This API is evolving.

Examples

>>> import tempfile
>>> text_sdf = spark.readStream.text(tempfile.mkdtemp())
>>> text_sdf.isStreaming
True
>>> "value" in str(text_sdf.schema)
True
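
The effect of wholetext and lineSep on row boundaries can be illustrated without a Spark session. The following is a pure-Python sketch, not Spark's implementation: emulate_text_rows is a hypothetical helper, and it assumes that a separator at the very end of a file does not produce an extra empty row.

```python
import re


def emulate_text_rows(content, wholetext=False, lineSep=None):
    """Hypothetical helper: approximate the "value" rows for one file's content.

    This only mimics the documented row-splitting rules; it is not part of PySpark.
    """
    if wholetext:
        # wholetext=True: the entire file becomes a single row.
        return [content]
    if lineSep is None:
        # Default: \r\n, \r and \n are all treated as line separators.
        # \r\n is listed first so it is not split into two separators.
        parts = re.split(r"\r\n|\r|\n", content)
    else:
        # Custom separator: split on that exact string only.
        parts = content.split(lineSep)
    # Assumption: a trailing separator does not yield a final empty row.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


sample = "a\r\nb\rc\n"
default_rows = emulate_text_rows(sample)
whole_rows = emulate_text_rows(sample, wholetext=True)
custom_rows = emulate_text_rows("x|y|z", lineSep="|")
```

With the default settings the mixed-separator sample splits into three rows, while wholetext=True keeps the file content, separators included, in one row.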