public class SparkStrategies.HashJoin extends org.apache.spark.sql.catalyst.planning.GenericStrategy<SparkPlan> implements org.apache.spark.sql.catalyst.expressions.PredicateHelper
| Constructor and Description |
|---|
| SparkStrategies.HashJoin(): Uses the ExtractEquiJoinKeys pattern to find joins where at least some of the predicates can be evaluated by matching hash keys. |
| Modifier and Type | Method and Description |
|---|---|
| scala.collection.Seq<SparkPlan> | apply(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan) |
isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$Logging$$log__$eq, org$apache$spark$Logging$$log_
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
canEvaluate, splitConjunctivePredicates
initializeIfNecessary, initializeLogging, log_
public SparkStrategies.HashJoin()
This strategy applies a simple optimization based on the estimated physical sizes of
the two join sides. When planning a BroadcastHashJoin, if one side has an
estimated physical size smaller than the user-settable threshold
org.apache.spark.sql.SQLConf.AUTO_BROADCASTJOIN_THRESHOLD, the planner marks it as the
"build" relation and marks the other relation as the "stream" side. The build table is
broadcast to all of the executors involved in the join as a
Broadcast object. If both estimates exceed the threshold, the estimates
are instead used to decide the build side in a ShuffledHashJoin.
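
The size-based decision described above can be sketched in a few lines of plain Java. This is only an illustration, not the planner's code: `JoinPlanSketch`, `Choice`, `choose`, and both enums are hypothetical names with no Spark dependency. Three details are assumptions the description does not settle: the right side is tested first, a size exactly equal to the threshold still counts as broadcastable, and when both sides exceed the threshold the smaller one becomes the build side.

```java
// Hypothetical sketch of the size-based choice; none of these names are Spark classes.
final class JoinPlanSketch {
    enum JoinKind { BROADCAST_HASH_JOIN, SHUFFLED_HASH_JOIN }
    enum BuildSide { LEFT, RIGHT }

    // The chosen physical join together with the side used to build the hash table.
    static final class Choice {
        final JoinKind kind;
        final BuildSide buildSide;

        Choice(JoinKind kind, BuildSide buildSide) {
            this.kind = kind;
            this.buildSide = buildSide;
        }

        @Override
        public String toString() {
            return kind + " (build side: " + buildSide + ")";
        }
    }

    // leftBytes and rightBytes are the estimated physical sizes of the two join sides.
    static Choice choose(long leftBytes, long rightBytes, long thresholdBytes) {
        // Assumption: the right side is checked first, and a size equal to the
        // threshold still qualifies for broadcast.
        if (rightBytes <= thresholdBytes) {
            return new Choice(JoinKind.BROADCAST_HASH_JOIN, BuildSide.RIGHT);
        }
        if (leftBytes <= thresholdBytes) {
            return new Choice(JoinKind.BROADCAST_HASH_JOIN, BuildSide.LEFT);
        }
        // Both estimates exceed the threshold: fall back to a shuffled hash join.
        // Assumption: the smaller side is picked as the build side.
        BuildSide side = rightBytes <= leftBytes ? BuildSide.RIGHT : BuildSide.LEFT;
        return new Choice(JoinKind.SHUFFLED_HASH_JOIN, side);
    }

    public static void main(String[] args) {
        long threshold = 10L * 1024 * 1024; // arbitrary example threshold, in bytes
        System.out.println(choose(50_000_000L, 2_000L, threshold));
        System.out.println(choose(80_000_000L, 60_000_000L, threshold));
    }
}
```

In this sketch the first call in `main` has a small right side, so it selects a broadcast join that builds from the right relation; the second call has two large sides, so it falls through to the shuffled variant.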