I'm submitting my Python job using the bin/spark-submit script. I want to set the configuration parameters in the Python code through the SparkConf and SparkContext classes. I set appName, master, spark.cores.max, and spark.scheduler.mode on a SparkConf object and passed it as the conf argument to the SparkContext. It turned out that the job was not sent to the master of the standalone cluster I set up (it just ran locally). The print statements below clearly show that the SparkConf I passed to the SparkContext contains all four settings, but the conf retrieved from the SparkContext object has none of my updates, only the defaults.
As alternatives, I tried putting the settings in conf/spark-defaults.conf, and also passing --properties-file my-spark.conf --master spark://spark-master:7077 on the command line. Both work.
I also tried setting the master separately through the master parameter, sc = pyspark.SparkContext(master="spark://spark-master:7077", conf=conf), and that worked as well: you can see the master gets updated in that case.
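For reference, this is a sketch of the two command-line forms that did pick up the master, assuming my-spark.conf is a properties file in the working directory containing the same four settings (spark.master, spark.app.name, spark.cores.max, spark.scheduler.mode):

```shell
# Workaround 1: rely on conf/spark-defaults.conf, which spark-submit reads automatically
bin/spark-submit /mnt/scratch/yidi/docker-volume/test.py

# Workaround 2: pass an explicit properties file plus the master on the command line
bin/spark-submit --properties-file my-spark.conf \
                 --master spark://spark-master:7077 \
                 /mnt/scratch/yidi/docker-volume/test.py
```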
So it seems that only the conf parameter is not picked up by the SparkContext correctly. This is my Python job code:
import operator

import pyspark


def main():
    '''Program entry point'''
    import socket
    print(socket.gethostbyname(socket.gethostname()))
    print('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')
    num_cores = 4
    conf = pyspark.SparkConf()
    conf.setAppName("word-count").setMaster("spark://spark-master:7077")  # .setExecutorEnv("CLASSPATH", path)
    conf.set("spark.scheduler.mode", "FAIR")
    conf.set("spark.cores.max", num_cores)
    print("Conf: {}".format(conf.getAll()))
    # Initialize a Spark context
    sc = pyspark.SparkContext(conf=conf)
    print("SparkContextConf: {}".format(sc.getConf().getAll()))
    # with pyspark.SparkContext(conf=conf) as sc:
    #     # Get an RDD containing lines from this script file
    #     lines = sc.textFile("/mnt/scratch/yidi/docker-volume/war_and_peace.txt")
    #     # Split each line into words and assign a frequency of 1 to each word
    #     words = lines.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1))
    #     # Count the frequency for each word
    #     counts = words.reduceByKey(operator.add)
    #     # Sort the counts in descending order based on the word frequency
    #     sorted_counts = counts.sortBy(lambda x: x[1], False)
    #     # Get an iterator over the counts to print a word and its frequency
    #     for word, count in sorted_counts.toLocalIterator():
    #         print("{} --> {}".format(word.encode('utf-8'), count))
    #     print("Spark conf: {}".format(conf.getAll()))


if __name__ == "__main__":
    main()
Console output running the job above:
root@bdd1ba99bf4f:/usr/spark-2.1.0# bin/spark-submit /mnt/scratch/yidi/docker-volume/test.py
10.0.0.5
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Conf: dict_items([('spark.scheduler.mode', 'FAIR'), ('spark.app.name', 'word-count'), ('spark.master', 'spark://spark-master:7077'), ('spark.cores.max', '4')])
17/04/03 23:24:27 INFO spark.SparkContext: Running Spark version 2.1.0
17/04/03 23:24:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/03 23:24:27 INFO spark.SecurityManager: Changing view acls to: root
17/04/03 23:24:27 INFO spark.SecurityManager: Changing modify acls to: root
17/04/03 23:24:27 INFO spark.SecurityManager: Changing view acls groups to:
17/04/03 23:24:27 INFO spark.SecurityManager: Changing modify acls groups to:
17/04/03 23:24:27 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
17/04/03 23:24:27 INFO util.Utils: Successfully started service 'sparkDriver' on port 38193.
17/04/03 23:24:27 INFO spark.SparkEnv: Registering MapOutputTracker
17/04/03 23:24:27 INFO spark.SparkEnv: Registering BlockManagerMaster
17/04/03 23:24:27 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/04/03 23:24:27 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/04/03 23:24:27 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-7b545c86-9344-4f73-a3db-e891c562714d
17/04/03 23:24:27 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
17/04/03 23:24:27 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/04/03 23:24:27 INFO util.log: Logging initialized @1090ms
17/04/03 23:24:27 INFO server.Server: jetty-9.2.z-SNAPSHOT
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3950ddbb{/jobs,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@286430d5{/jobs/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@404eb034{/jobs/job,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@79e3feec{/jobs/job/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4661596e{/stages,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4f8a3bef{/stages/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7a70fd3a{/stages/stage,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c826806{/stages/stage/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@50e4e8d1{/stages/pool,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4e2fe461{/stages/pool/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@33cb59b3{/storage,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3c06c594{/storage/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4b5300a5{/storage/rdd,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7a6ef942{/storage/rdd/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1301217d{/environment,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@19217cec{/environment/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a241145{/executors,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@470d45aa{/executors/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5d9d8eff{/executors/threadDump,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4fc94fbc{/executors/threadDump/json,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@258dd139{/static,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@880f037{/,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@395b7dae{/api,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3cea7196{/jobs/job/kill,null,AVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77257b2b{/stages/stage/kill,null,AVAILABLE}
17/04/03 23:24:27 INFO server.ServerConnector: Started ServerConnector@788dbc45{HTTP/1.1}{0.0.0.0:4040}
17/04/03 23:24:27 INFO server.Server: Started @1154ms
17/04/03 23:24:27 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/04/03 23:24:27 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://localhost:4040
17/04/03 23:24:27 INFO spark.SparkContext: Added file file:/mnt/scratch/yidi/docker-volume/test.py at file:/mnt/scratch/yidi/docker-volume/test.py with timestamp 1491261867734
17/04/03 23:24:27 INFO util.Utils: Copying /mnt/scratch/yidi/docker-volume/test.py to /tmp/spark-5bb22b93-172d-40d6-8a37-6e3da5dc15a6/userFiles-ed0a6692-b96a-4b3d-be16-215a21cf836b/test.py
17/04/03 23:24:27 INFO executor.Executor: Starting executor ID driver on host localhost
17/04/03 23:24:27 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44223.
17/04/03 23:24:27 INFO netty.NettyBlockTransferService: Server created on 10.0.0.5:44223
17/04/03 23:24:27 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/04/03 23:24:27 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.0.5, 44223, None)
17/04/03 23:24:27 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.0.0.5:44223 with 366.3 MB RAM, BlockManagerId(driver, 10.0.0.5, 44223, None)
17/04/03 23:24:27 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.0.5, 44223, None)
17/04/03 23:24:27 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.0.5, 44223, None)
17/04/03 23:24:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@c06133a{/metrics/json,null,AVAILABLE}
SparkContextConf: [('spark.app.name', 'test.py'), ('spark.rdd.compress', 'True'), ('spark.serializer.objectStreamReset', '100'), ('spark.master', 'local[*]'), ('spark.executor.id', 'driver'), ('spark.submit.deployMode', 'client'), ('spark.files', 'file:/mnt/scratch/yidi/docker-volume/test.py'), ('spark.driver.host', '10.0.0.5'), ('spark.app.id', 'local-1491261867756'), ('spark.driver.port', '38193')]
17/04/03 23:24:27 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/04/03 23:24:27 INFO server.ServerConnector: Stopped ServerConnector@788dbc45{HTTP/1.1}{0.0.0.0:4040}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@77257b2b{/stages/stage/kill,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3cea7196{/jobs/job/kill,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@395b7dae{/api,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@880f037{/,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@258dd139{/static,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4fc94fbc{/executors/threadDump/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5d9d8eff{/executors/threadDump,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@470d45aa{/executors/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4a241145{/executors,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@19217cec{/environment/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1301217d{/environment,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7a6ef942{/storage/rdd/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4b5300a5{/storage/rdd,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3c06c594{/storage/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@33cb59b3{/storage,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4e2fe461{/stages/pool/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@50e4e8d1{/stages/pool,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1c826806{/stages/stage/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7a70fd3a{/stages/stage,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4f8a3bef{/stages/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4661596e{/stages,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@79e3feec{/jobs/job/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@404eb034{/jobs/job,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@286430d5{/jobs/json,null,UNAVAILABLE}
17/04/03 23:24:27 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3950ddbb{/jobs,null,UNAVAILABLE}
17/04/03 23:24:27 INFO ui.SparkUI: Stopped Spark web UI at http://localhost:4040
17/04/03 23:24:27 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/04/03 23:24:27 INFO memory.MemoryStore: MemoryStore cleared
17/04/03 23:24:27 INFO storage.BlockManager: BlockManager stopped
17/04/03 23:24:27 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/04/03 23:24:27 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/04/03 23:24:27 INFO spark.SparkContext: Successfully stopped SparkContext
17/04/03 23:24:27 INFO util.ShutdownHookManager: Shutdown hook called
17/04/03 23:24:27 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-5bb22b93-172d-40d6-8a37-6e3da5dc15a6/pyspark-50d75419-36b8-4dd2-a7c6-7d500c7c5bca
17/04/03 23:24:27 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-5bb22b93-172d-40d6-8a37-6e3da5dc15a6