org.apache.spark.scheduler
Class InputFormatInfo
Object
org.apache.spark.scheduler.InputFormatInfo
- All Implemented Interfaces:
- Logging
public class InputFormatInfo
- extends Object
- implements Logging
:: DeveloperApi ::
Parses and holds information about inputFormat (and files) specified as a parameter.
Constructor Summary |
InputFormatInfo(org.apache.hadoop.conf.Configuration configuration,
Class<?> inputFormatClazz,
String path)
|
Methods inherited from class Object |
getClass, notify, notifyAll, wait, wait, wait |
Methods inherited from interface org.apache.spark.Logging |
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning |
InputFormatInfo
public InputFormatInfo(org.apache.hadoop.conf.Configuration configuration,
Class<?> inputFormatClazz,
String path)
computePreferredLocations
public static scala.collection.immutable.Map<String,scala.collection.immutable.Set<SplitInfo>> computePreferredLocations(scala.collection.Seq<InputFormatInfo> formats)
- Computes the preferred locations based on input(s) and returned a location to block map.
Typical use of this method for allocation would follow some algo like this:
a) For each host, count number of splits hosted on that host.
b) Decrement the currently allocated containers on that host.
c) Compute rack info for each host and update rack -> count map based on (b).
d) Allocate nodes based on (c)
e) On the allocation result, ensure that we dont allocate "too many" jobs on a single node
(even if data locality on that is very high) : this is to prevent fragility of job if a
single (or small set of) hosts go down.
go to (a) until required nodes are allocated.
If a node 'dies', follow same procedure.
PS: I know the wording here is weird, hopefully it makes some sense !
- Parameters:
formats
- (undocumented)
- Returns:
- (undocumented)
configuration
public org.apache.hadoop.conf.Configuration configuration()
inputFormatClazz
public Object inputFormatClazz()
path
public String path()
mapreduceInputFormat
public boolean mapreduceInputFormat()
mapredInputFormat
public boolean mapredInputFormat()
toString
public String toString()
- Overrides:
toString
in class Object
hashCode
public int hashCode()
- Overrides:
hashCode
in class Object
equals
public boolean equals(Object other)
- Overrides:
equals
in class Object