public class SQLContext extends Object implements Logging, scala.Serializable
The entry point for working with structured data (rows and columns) in Spark. Allows the creation of DataFrame objects as well as the execution of SQL queries.
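For example, in the spark-shell (where sc is the active SparkContext), a SQLContext can be constructed and used to run SQL against a registered temporary table. A short illustrative sketch; the file path and table name are examples, not part of the API:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.jsonFile("examples/src/main/resources/people.json")
df.registerTempTable("people")
sqlContext.sql("SELECT name FROM people").collect().foreach(println)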
Modifier and Type | Class and Description
---|---
class | SQLContext.implicits
Constructor and Description
---
SQLContext(JavaSparkContext sparkContext)
SQLContext(SparkContext sparkContext)
Modifier and Type | Method and Description
---|---
DataFrame | applySchema(JavaRDD<?> rdd, Class<?> beanClass) Applies a schema to an RDD of Java Beans.
DataFrame | applySchema(JavaRDD<Row> rowRDD, StructType schema)
DataFrame | applySchema(RDD<?> rdd, Class<?> beanClass) Applies a schema to an RDD of Java Beans.
DataFrame | applySchema(RDD<Row> rowRDD, StructType schema)
DataFrame | baseRelationToDataFrame(BaseRelation baseRelation) Converts a BaseRelation created for external data sources into a DataFrame.
void | cacheTable(String tableName) Caches the specified table in memory.
void | clearCache() Removes all cached tables from the in-memory cache.
DataFrame | createDataFrame(JavaRDD<?> rdd, Class<?> beanClass) Applies a schema to an RDD of Java Beans.
DataFrame | createDataFrame(JavaRDD<Row> rowRDD, java.util.List<String> columns)
DataFrame | createDataFrame(JavaRDD<Row> rowRDD, StructType schema)
DataFrame | createDataFrame(RDD<?> rdd, Class<?> beanClass) Applies a schema to an RDD of Java Beans.
<A extends scala.Product> DataFrame | createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$3) :: Experimental :: Creates a DataFrame from an RDD of case classes.
DataFrame | createDataFrame(RDD<Row> rowRDD, StructType schema)
DataFrame | createDataFrame(RDD<Row> rowRDD, StructType schema, boolean needsConversion) Creates a DataFrame from an RDD[Row].
<A extends scala.Product> DataFrame | createDataFrame(scala.collection.Seq<A> data, scala.reflect.api.TypeTags.TypeTag<A> evidence$4) :: Experimental :: Creates a DataFrame from a local Seq of Product.
DataFrame | createExternalTable(String tableName, String path) :: Experimental :: Creates an external table from the given path and returns the corresponding DataFrame.
DataFrame | createExternalTable(String tableName, String source, java.util.Map<String,String> options) :: Experimental :: Creates an external table from the given path based on a data source and a set of options.
DataFrame | createExternalTable(String tableName, String source, scala.collection.immutable.Map<String,String> options) :: Experimental :: (Scala-specific) Creates an external table from the given path based on a data source and a set of options.
DataFrame | createExternalTable(String tableName, String path, String source) :: Experimental :: Creates an external table from the given path based on a data source and returns the corresponding DataFrame.
DataFrame | createExternalTable(String tableName, String source, StructType schema, java.util.Map<String,String> options) :: Experimental :: Creates an external table from the given path based on a data source, a schema and a set of options.
DataFrame | createExternalTable(String tableName, String source, StructType schema, scala.collection.immutable.Map<String,String> options) :: Experimental :: (Scala-specific) Creates an external table from the given path based on a data source, a schema and a set of options.
void | dropTempTable(String tableName) Drops the temporary table with the given table name from the catalog.
DataFrame | emptyDataFrame() :: Experimental :: Returns a DataFrame with no rows or columns.
ExperimentalMethods | experimental() :: Experimental :: A collection of methods that are considered experimental, but can be used to hook into the query planner for advanced functionality.
scala.collection.immutable.Map<String,String> | getAllConfs() Returns all the configuration properties that have been set (i.e. not the default values).
String | getConf(String key) Returns the value of the Spark SQL configuration property for the given key.
String | getConf(String key, String defaultValue) Returns the value of the Spark SQL configuration property for the given key, or defaultValue if the key is not set.
org.apache.spark.sql.SQLContext.implicits$ | implicits()
boolean | isCached(String tableName) Returns true if the table is currently cached in memory.
DataFrame | jdbc(String url, String table) :: Experimental :: Constructs a DataFrame representing the database table accessible via JDBC URL url, named table.
DataFrame | jdbc(String url, String table, String[] theParts) :: Experimental :: Constructs a DataFrame representing the database table accessible via JDBC URL url, named table.
DataFrame | jdbc(String url, String table, String columnName, long lowerBound, long upperBound, int numPartitions) :: Experimental :: Constructs a DataFrame representing the database table accessible via JDBC URL url, named table.
DataFrame | jsonFile(String path) Loads a JSON file (one object per line), returning the result as a DataFrame.
DataFrame | jsonFile(String path, double samplingRatio) :: Experimental ::
DataFrame | jsonFile(String path, StructType schema) :: Experimental :: Loads a JSON file (one object per line) and applies the given schema, returning the result as a DataFrame.
DataFrame | jsonRDD(JavaRDD<String> json) Loads an RDD[String] storing JSON objects (one object per record), returning the result as a DataFrame.
DataFrame | jsonRDD(JavaRDD<String> json, double samplingRatio) :: Experimental :: Loads a JavaRDD[String] storing JSON objects (one object per record), inferring the schema, returning the result as a DataFrame.
DataFrame | jsonRDD(JavaRDD<String> json, StructType schema) :: Experimental :: Loads a JavaRDD[String] storing JSON objects (one object per record) and applies the given schema, returning the result as a DataFrame.
DataFrame | jsonRDD(RDD<String> json) Loads an RDD[String] storing JSON objects (one object per record), returning the result as a DataFrame.
DataFrame | jsonRDD(RDD<String> json, double samplingRatio) :: Experimental :: Loads an RDD[String] storing JSON objects (one object per record), inferring the schema, returning the result as a DataFrame.
DataFrame | jsonRDD(RDD<String> json, StructType schema) :: Experimental :: Loads an RDD[String] storing JSON objects (one object per record) and applies the given schema, returning the result as a DataFrame.
DataFrame | load(String path) :: Experimental :: Returns the dataset stored at path as a DataFrame, using the default data source configured by spark.sql.sources.default.
DataFrame | load(String source, java.util.Map<String,String> options) :: Experimental :: (Java-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame.
DataFrame | load(String source, scala.collection.immutable.Map<String,String> options) :: Experimental :: (Scala-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame.
DataFrame | load(String path, String source) :: Experimental :: Returns the dataset stored at path as a DataFrame, using the given data source.
DataFrame | load(String source, StructType schema, java.util.Map<String,String> options) :: Experimental :: (Java-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame, using the given schema as the schema of the DataFrame.
DataFrame | load(String source, StructType schema, scala.collection.immutable.Map<String,String> options) :: Experimental :: (Scala-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame, using the given schema as the schema of the DataFrame.
DataFrame | parquetFile(scala.collection.Seq<String> paths) Loads a Parquet file, returning the result as a DataFrame.
DataFrame | parquetFile(String... paths) Loads a Parquet file, returning the result as a DataFrame.
void | registerDataFrameAsTable(DataFrame df, String tableName) Registers the given DataFrame as a temporary table in the catalog.
void | setConf(java.util.Properties props) Sets Spark SQL configuration properties.
void | setConf(String key, String value) Sets the given Spark SQL configuration property.
SparkContext | sparkContext()
DataFrame | sql(String sqlText) Executes a SQL query using Spark, returning the result as a DataFrame.
DataFrame | table(String tableName) Returns the specified table as a DataFrame.
String[] | tableNames() Returns the names of tables in the current database as an array.
String[] | tableNames(String databaseName) Returns the names of tables in the given database as an array.
DataFrame | tables() Returns a DataFrame containing names of existing tables in the current database.
DataFrame | tables(String databaseName) Returns a DataFrame containing names of existing tables in the given database.
UDFRegistration | udf() A collection of methods for registering user-defined functions (UDF).
void | uncacheTable(String tableName) Removes the specified table from the in-memory cache.
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public SQLContext(SparkContext sparkContext)
public SQLContext(JavaSparkContext sparkContext)
public DataFrame parquetFile(String... paths)
Loads a Parquet file, returning the result as a DataFrame.
public SparkContext sparkContext()
public void setConf(java.util.Properties props)
Sets Spark SQL configuration properties.
public void setConf(String key, String value)
Sets the given Spark SQL configuration property.
public String getConf(String key)
Returns the value of the Spark SQL configuration property for the given key.
public String getConf(String key, String defaultValue)
Returns the value of the Spark SQL configuration property for the given key. If the key is not set, returns defaultValue.
public scala.collection.immutable.Map<String,String> getAllConfs()
Returns all the configuration properties that have been set (i.e. not the default values).
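A minimal sketch of the configuration methods; the property names shown are standard Spark SQL settings, and the values are illustrative:

sqlContext.setConf("spark.sql.shuffle.partitions", "8")
sqlContext.getConf("spark.sql.shuffle.partitions")   // "8"
sqlContext.getConf("spark.sql.codegen", "false")     // falls back to the supplied default
sqlContext.getAllConfs                               // Map of every explicitly set property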
public ExperimentalMethods experimental()
:: Experimental :: A collection of methods that are considered experimental, but can be used to hook into the query planner for advanced functionality.
public DataFrame emptyDataFrame()
:: Experimental :: Returns a DataFrame with no rows or columns.
public UDFRegistration udf()
A collection of methods for registering user-defined functions (UDF).
The following example registers a Scala closure as UDF:
sqlContext.udf.register("myUdf", (arg1: Int, arg2: String) => arg2 + arg1)
The following example registers a UDF in Java:
sqlContext.udf().register("myUDF",
    new UDF2<Integer, String, String>() {
        @Override
        public String call(Integer arg1, String arg2) {
            return arg2 + arg1;
        }
    }, DataTypes.StringType);
Or, to use Java 8 lambda syntax:
sqlContext.udf().register("myUDF",
    (Integer arg1, String arg2) -> arg2 + arg1,
    DataTypes.StringType);
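Once registered, a UDF can be invoked by name from SQL. A sketch, assuming a temporary table people with columns age: Int and name: String has already been registered:

sqlContext.sql("SELECT myUdf(age, name) FROM people").show()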
public boolean isCached(String tableName)
Returns true if the table is currently cached in memory.
public void cacheTable(String tableName)
Caches the specified table in memory.
public void uncacheTable(String tableName)
Removes the specified table from the in-memory cache.
public void clearCache()
Removes all cached tables from the in-memory cache.
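A short caching round trip, assuming a temporary table people has been registered:

sqlContext.cacheTable("people")    // later scans of "people" read from memory
sqlContext.isCached("people")      // true
sqlContext.uncacheTable("people")  // evict just this table
sqlContext.clearCache()            // or evict every cached table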
public org.apache.spark.sql.SQLContext.implicits$ implicits()
public <A extends scala.Product> DataFrame createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$3)
:: Experimental :: Creates a DataFrame from an RDD of case classes.
public <A extends scala.Product> DataFrame createDataFrame(scala.collection.Seq<A> data, scala.reflect.api.TypeTags.TypeTag<A> evidence$4)
:: Experimental :: Creates a DataFrame from a local Seq of Product.
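For example, with a case class (the class and data are illustrative; from Scala the TypeTag evidence parameter is supplied implicitly):

case class Person(name: String, age: Int)

val rdd = sc.parallelize(Seq(Person("Alice", 29), Person("Bob", 31)))
val df = sqlContext.createDataFrame(rdd)   // schema inferred from the case class
df.printSchema()
// root
//  |-- name: string (nullable = true)
//  |-- age: integer (nullable = false)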
public DataFrame baseRelationToDataFrame(BaseRelation baseRelation)
Converts a BaseRelation created for external data sources into a DataFrame.
public DataFrame createDataFrame(RDD<Row> rowRDD, StructType schema)
Creates a DataFrame from an RDD containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided RDD matches the provided schema; otherwise a runtime exception will be thrown.
Example:
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val schema =
  StructType(
    StructField("name", StringType, false) ::
    StructField("age", IntegerType, true) :: Nil)

val people =
  sc.textFile("examples/src/main/resources/people.txt").map(
    _.split(",")).map(p => Row(p(0), p(1).trim.toInt))
val dataFrame = sqlContext.createDataFrame(people, schema)
dataFrame.printSchema
// root
// |-- name: string (nullable = false)
// |-- age: integer (nullable = true)

dataFrame.registerTempTable("people")
sqlContext.sql("select name from people").collect.foreach(println)
public DataFrame createDataFrame(RDD<Row> rowRDD, StructType schema, boolean needsConversion)
Creates a DataFrame from an RDD[Row].
public DataFrame createDataFrame(JavaRDD<Row> rowRDD, StructType schema)
Creates a DataFrame from a JavaRDD containing Rows using the given schema.
public DataFrame createDataFrame(JavaRDD<Row> rowRDD, java.util.List<String> columns)
Creates a DataFrame from a JavaRDD containing Rows by applying a sequence of column names to this RDD; the data type for each column is inferred from the first row.
Parameters:
rowRDD - a JavaRDD of Rows
columns - names for each column
public DataFrame createDataFrame(RDD<?> rdd, Class<?> beanClass)
Applies a schema to an RDD of Java Beans. WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
public DataFrame createDataFrame(JavaRDD<?> rdd, Class<?> beanClass)
Applies a schema to an RDD of Java Beans. WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
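A sketch of the bean-based overload from Scala, using a hypothetical PersonBean whose getters and setters are generated by @BeanProperty (schema inference reads the bean's getters):

class PersonBean extends java.io.Serializable {
  @scala.beans.BeanProperty var name: String = _
  @scala.beans.BeanProperty var age: Int = _
}

val alice = new PersonBean
alice.setName("Alice")
alice.setAge(29)
val df = sqlContext.createDataFrame(sc.parallelize(Seq(alice)), classOf[PersonBean])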
public DataFrame applySchema(RDD<Row> rowRDD, StructType schema)
Creates a DataFrame from an RDD containing Rows by applying a schema to this RDD. It is important to make sure that the structure of every Row of the provided RDD matches the provided schema; otherwise a runtime exception will be thrown.
Example:
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val schema =
  StructType(
    StructField("name", StringType, false) ::
    StructField("age", IntegerType, true) :: Nil)

val people =
  sc.textFile("examples/src/main/resources/people.txt").map(
    _.split(",")).map(p => Row(p(0), p(1).trim.toInt))
val dataFrame = sqlContext.applySchema(people, schema)
dataFrame.printSchema
// root
// |-- name: string (nullable = false)
// |-- age: integer (nullable = true)

dataFrame.registerTempTable("people")
sqlContext.sql("select name from people").collect.foreach(println)
public DataFrame applySchema(JavaRDD<Row> rowRDD, StructType schema)
Creates a DataFrame from a JavaRDD containing Rows by applying a schema to this RDD.
public DataFrame applySchema(RDD<?> rdd, Class<?> beanClass)
Applies a schema to an RDD of Java Beans. WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
public DataFrame applySchema(JavaRDD<?> rdd, Class<?> beanClass)
Applies a schema to an RDD of Java Beans. WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
public DataFrame parquetFile(scala.collection.Seq<String> paths)
Loads a Parquet file, returning the result as a DataFrame.
public DataFrame jsonFile(String path)
Loads a JSON file (one object per line), returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.
public DataFrame jsonFile(String path, StructType schema)
:: Experimental :: Loads a JSON file (one object per line) and applies the given schema, returning the result as a DataFrame.
public DataFrame jsonFile(String path, double samplingRatio)
:: Experimental :: Loads a JSON file (one object per line), using the given sampling ratio of records to infer the schema, returning the result as a DataFrame.
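For instance (the path is illustrative; each line of the file holds one JSON object such as {"name":"Alice","age":29}):

val df = sqlContext.jsonFile("examples/src/main/resources/people.json")
df.printSchema()   // the schema was inferred in a full pass over the file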
public DataFrame jsonRDD(RDD<String> json)
Loads an RDD[String] storing JSON objects (one object per record), returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.
public DataFrame jsonRDD(JavaRDD<String> json)
Loads a JavaRDD[String] storing JSON objects (one object per record), returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.
public DataFrame jsonRDD(RDD<String> json, StructType schema)
:: Experimental :: Loads an RDD[String] storing JSON objects (one object per record) and applies the given schema, returning the result as a DataFrame.
public DataFrame jsonRDD(JavaRDD<String> json, StructType schema)
:: Experimental :: Loads a JavaRDD[String] storing JSON objects (one object per record) and applies the given schema, returning the result as a DataFrame.
public DataFrame jsonRDD(RDD<String> json, double samplingRatio)
:: Experimental :: Loads an RDD[String] storing JSON objects (one object per record), inferring the schema, returning the result as a DataFrame.
public DataFrame jsonRDD(JavaRDD<String> json, double samplingRatio)
:: Experimental :: Loads a JavaRDD[String] storing JSON objects (one object per record), inferring the schema, returning the result as a DataFrame.
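A small sketch of the trade-off between the overloads (data is illustrative):

val json = sc.parallelize(Seq(
  """{"name":"Alice","age":29}""",
  """{"name":"Bob","age":31}"""))
val inferred = sqlContext.jsonRDD(json)        // full pass to infer the schema
val sampled  = sqlContext.jsonRDD(json, 0.5)   // infer from roughly half the records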
public DataFrame load(String path)
:: Experimental :: Returns the dataset stored at path as a DataFrame, using the default data source configured by spark.sql.sources.default.
public DataFrame load(String path, String source)
:: Experimental :: Returns the dataset stored at path as a DataFrame, using the given data source.
public DataFrame load(String source, java.util.Map<String,String> options)
:: Experimental :: (Java-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame.
public DataFrame load(String source, scala.collection.immutable.Map<String,String> options)
:: Experimental :: (Scala-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame.
public DataFrame load(String source, StructType schema, java.util.Map<String,String> options)
:: Experimental :: (Java-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame, using the given schema as the schema of the DataFrame.
public DataFrame load(String source, StructType schema, scala.collection.immutable.Map<String,String> options)
:: Experimental :: (Scala-specific) Returns the dataset specified by the given data source and a set of options as a DataFrame, using the given schema as the schema of the DataFrame.
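Illustrative calls; the paths are examples, and the short source name "json" follows the data source naming used by the Spark 1.3 documentation:

val df1 = sqlContext.load("people.parquet")                     // default data source
val df2 = sqlContext.load("people.json", "json")                // explicit data source
val df3 = sqlContext.load("json", Map("path" -> "people.json")) // Scala-specific options map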
public DataFrame createExternalTable(String tableName, String path)
:: Experimental :: Creates an external table from the given path and returns the corresponding DataFrame.
public DataFrame createExternalTable(String tableName, String path, String source)
:: Experimental :: Creates an external table from the given path based on a data source and returns the corresponding DataFrame.
public DataFrame createExternalTable(String tableName, String source, java.util.Map<String,String> options)
:: Experimental :: Creates an external table from the given path based on a data source and a set of options.
public DataFrame createExternalTable(String tableName, String source, scala.collection.immutable.Map<String,String> options)
:: Experimental :: (Scala-specific) Creates an external table from the given path based on a data source and a set of options.
public DataFrame createExternalTable(String tableName, String source, StructType schema, java.util.Map<String,String> options)
:: Experimental :: Creates an external table from the given path based on a data source, a schema and a set of options.
public DataFrame createExternalTable(String tableName, String source, StructType schema, scala.collection.immutable.Map<String,String> options)
:: Experimental :: (Scala-specific) Creates an external table from the given path based on a data source, a schema and a set of options.
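A sketch of two of the overloads; the table names and paths are illustrative, and a persistent catalog (e.g. a HiveContext-backed one) is normally required for external tables:

val df = sqlContext.createExternalTable("people", "/data/people.parquet")
sqlContext.createExternalTable("people_json", "json", Map("path" -> "/data/people.json"))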
public DataFrame jdbc(String url, String table)
:: Experimental :: Constructs a DataFrame representing the database table accessible via JDBC URL url, named table.
public DataFrame jdbc(String url, String table, String columnName, long lowerBound, long upperBound, int numPartitions)
:: Experimental :: Constructs a DataFrame representing the database table accessible via JDBC URL url, named table. Partitions of the table will be retrieved in parallel based on the parameters passed to this function.
Parameters:
columnName - the name of a column of integral type that will be used for partitioning
lowerBound - the minimum value of columnName to retrieve
upperBound - the maximum value of columnName to retrieve
numPartitions - the number of partitions; the range lowerBound to upperBound will be split evenly into this many partitions
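For example, to read a table in ten parallel partitions over the id range 0 to 1000 (the URL, table, and column names are illustrative; the JDBC driver must be on the classpath):

val df = sqlContext.jdbc(
  "jdbc:postgresql://dbhost/test",
  "people",
  "id",    // integral partitioning column
  0L,      // lowerBound
  1000L,   // upperBound
  10)      // numPartitions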
public DataFrame jdbc(String url, String table, String[] theParts)
:: Experimental :: Constructs a DataFrame representing the database table accessible via JDBC URL url, named table. Each element of theParts is a condition suitable for inclusion in a WHERE clause and defines one partition of the DataFrame.
public void registerDataFrameAsTable(DataFrame df, String tableName)
Registers the given DataFrame as a temporary table in the catalog. Temporary tables exist only during the lifetime of this instance of SQLContext.
public void dropTempTable(String tableName)
Drops the temporary table with the given table name from the catalog.
Parameters:
tableName - the name of the table to be unregistered.
public DataFrame sql(String sqlText)
Executes a SQL query using Spark, returning the result as a DataFrame. The dialect that is used for SQL parsing can be configured with 'spark.sql.dialect'.
public DataFrame table(String tableName)
Returns the specified table as a DataFrame.
public DataFrame tables()
Returns a DataFrame containing names of existing tables in the current database. The returned DataFrame has two columns, tableName and isTemporary (a Boolean indicating if a table is a temporary one or not).
public DataFrame tables(String databaseName)
Returns a DataFrame containing names of existing tables in the given database. The returned DataFrame has two columns, tableName and isTemporary (a Boolean indicating if a table is a temporary one or not).
public String[] tableNames()
Returns the names of tables in the current database as an array.
public String[] tableNames(String databaseName)
Returns the names of tables in the given database as an array.
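For example, a short sketch of listing catalog contents:

sqlContext.tables().filter("isTemporary = true").show()
sqlContext.tableNames().foreach(println)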