pyspark.sql.functions.sentences

pyspark.sql.functions.sentences(string, language=None, country=None)

Splits a string into arrays of sentences, where each sentence is an array of words. The language and country arguments are optional.

When they are omitted:

1. If both are omitted, Locale.ROOT, i.e. locale(language='', country=''), is used. Locale.ROOT is regarded as the base locale of all locales and serves as the language/country-neutral locale for locale-sensitive operations.
2. If only the country is omitted, locale(language, country='') is used.

When they are null:

1. If both are null, Locale.US, i.e. locale(language='en', country='US'), is used.
2. If the language is null and the country is not null, Locale.US, i.e. locale(language='en', country='US'), is used.
3. If the language is not null and the country is null, locale(language) is used.
4. If neither is null, locale(language, country) is used.
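The locale rules above can be restated as a small plain-Python helper. This is only a sketch: `resolve_locale` and the `NULL` sentinel are hypothetical names that are not part of PySpark, and the function merely mirrors the documented rules instead of calling Spark. Python `None` stands for an omitted argument (as in the signature above), `NULL` stands for a SQL NULL value, and `locale(language)` is represented as a pair with an empty country, which is an assumption.

```python
from typing import Tuple


class _SqlNull:
    """Hypothetical sentinel standing in for a SQL NULL argument value."""

    def __repr__(self) -> str:
        return "NULL"


NULL = _SqlNull()


def resolve_locale(language=None, country=None) -> Tuple[str, str]:
    """Return the (language, country) pair selected by the documented rules.

    Illustration only; not how Spark implements the lookup.
    """
    # An omitted argument behaves like an empty string.
    lang = "" if language is None else language
    ctry = "" if country is None else country

    # A null language selects Locale.US, whatever the country is.
    if lang is NULL:
        return ("en", "US")

    # A non-null language with a null country selects locale(language).
    if ctry is NULL:
        return (lang, "")

    # Otherwise both values are used as given.
    return (lang, ctry)


print(resolve_locale())              # both omitted: Locale.ROOT
print(resolve_locale("fr"))          # country omitted
print(resolve_locale(NULL, "FR"))    # null language: Locale.US
print(resolve_locale("fr", NULL))    # null country: locale(language)
```

Keeping "omitted" and "null" as separate inputs makes the difference explicit: omitting both arguments yields the neutral root locale, while passing nulls falls back to the US locale.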

New in version 3.2.0.

Changed in version 3.4.0: Supports Spark Connect.

Changed in version 4.0.0: Supports sentences(string, language).

Parameters
string : Column or column name

a string to be split

language : Column or column name, optional

a language of the locale

country : Column or column name, optional

a country of the locale

Returns
Column

arrays of split sentences.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("This is an example sentence.", )], ["s"])
>>> df.select("*", sf.sentences(df.s, sf.lit("en"), sf.lit("US"))).show(truncate=False)
+----------------------------+-----------------------------------+
|s                           |sentences(s, en, US)               |
+----------------------------+-----------------------------------+
|This is an example sentence.|[[This, is, an, example, sentence]]|
+----------------------------+-----------------------------------+
>>> df.select("*", sf.sentences(df.s, sf.lit("en"))).show(truncate=False)
+----------------------------+-----------------------------------+
|s                           |sentences(s, en, )                 |
+----------------------------+-----------------------------------+
|This is an example sentence.|[[This, is, an, example, sentence]]|
+----------------------------+-----------------------------------+
>>> df.select("*", sf.sentences(df.s)).show(truncate=False)
+----------------------------+-----------------------------------+
|s                           |sentences(s, , )                   |
+----------------------------+-----------------------------------+
|This is an example sentence.|[[This, is, an, example, sentence]]|
+----------------------------+-----------------------------------+