Placeholder for my scripts for the MOOC final project
Hadoop and MapReduce part
Spark part
- AWS Python SDK
pip install boto3
pip install awscli
- mrjob as MapReduce framework in Python
pip install mrjob
- Download Transportation dataset
- Run DynamoDB
- Set up a Hadoop cluster (or run on a single machine)
Execute the Python scripts on the master NameNode, which dispatches the MapReduce jobs to its Hadoop cluster. The results are saved in HDFS (and DynamoDB).
python mr_job_capstone2-1.py -r hadoop hdfs://<namenode address>:/data/orig_dest/*.csv -o hdfs://<namenode address>:/data/q2_1_output/
python dynamodb-crud/query_table2-1.py <airport name>
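For readers new to the pattern, here is a minimal plain-Python sketch of the map, shuffle and reduce flow that a job like the one above follows. It is not the code in mr_job_capstone2-1.py: the per-origin counting, the "Origin" header name, the column position and the run_local helper are all assumptions made for illustration. In mrjob the same two steps would be written as the mapper(self, _, line) and reducer(self, key, values) methods of an MRJob subclass, and the framework would do the grouping.

```python
import csv
from collections import defaultdict


def mapper(line):
    """Emit (origin, 1) for one CSV line.

    Assumption: the origin airport is in column 0 and the header
    row starts with "Origin". The real dataset layout may differ.
    """
    row = next(csv.reader([line]), None)
    if row and row[0] != "Origin":
        yield row[0], 1


def reducer(key, values):
    """Sum all counts emitted for one key."""
    yield key, sum(values)


def run_local(lines):
    """Hypothetical local driver that imitates the shuffle step."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)

    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    sample = ["Origin,Dest", "JFK,LAX", "JFK,ORD", "SFO,JFK"]
    print(run_local(sample))
```

Running the mapper and reducer locally on a few lines like this is a cheap way to check the logic before submitting to the cluster with -r hadoop.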
Demo videos:
- https://www.youtube.com/watch?v=2_UtJZLKyAk (Task 1)
- https://www.youtube.com/watch?v=Aiv9oiZ8B00 (Task 2)
- https://www.youtube.com/watch?v=qM53BNZ71o8 (Task 2 - detailed version)
ubuntu@ec2-52-xxx-xxx-253:~$ hadoop fs -dus /data/*
hdfs://ec2-52-xxx-xxx-52.compute-1.amazonaws.com:8020/data/on_time 30547789179
hdfs://ec2-52-xxx-xxx-52.compute-1.amazonaws.com:8020/data/orig_dest 32576029024
ubuntu@ec2-52-xxx-xxx-253:~$ hadoop fs -ls /data/on_time
Found 194 items
-rw-r--r-- 2 ubuntu supergroup 142196587 2016-02-06 05:23 /data/on_time/On_Time_On_Time_Performance_1990_1.csv
-rw-r--r-- 2 ubuntu supergroup 131244457 2016-02-06 05:23 /data/on_time/On_Time_On_Time_Performance_1990_10.csv
-rw-r--r-- 2 ubuntu supergroup 123724241 2016-02-06 05:23 /data/on_time/On_Time_On_Time_Performance_1990_11.csv
-rw-r--r-- 2 ubuntu supergroup 125724649 2016-02-06 05:23 /data/on_time/On_Time_On_Time_Performance_1990_12.csv
-rw-r--r-- 2 ubuntu supergroup 112418169 2016-02-06 05:23 /data/on_time/On_Time_On_Time_Performance_1990_2.csv
-rw-r--r-- 2 ubuntu supergroup 127615271 2016-02-06 05:23 /data/on_time/On_Time_On_Time_Performance_1990_3.csv
...
ubuntu@ec2-52-xxx-xxx-253:~$ hadoop fs -ls /data/orig_dest
Found 22 items
-rw-r--r-- 2 ubuntu supergroup 1201493617 2016-02-05 17:37 /data/orig_dest/Origin_and_Destination_Survey_DB1BCoupon_2003_1.csv
-rw-r--r-- 2 ubuntu supergroup 1424484646 2016-02-05 18:10 /data/orig_dest/Origin_and_Destination_Survey_DB1BCoupon_2003_2.csv
-rw-r--r-- 2 ubuntu supergroup 1356612538 2016-02-05 17:37 /data/orig_dest/Origin_and_Destination_Survey_DB1BCoupon_2003_3.csv
-rw-r--r-- 2 ubuntu supergroup 1356612538 2016-02-05 17:37 /data/orig_dest/Origin_and_Destination_Survey_DB1BCoupon_2003_4.csv
...
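The hadoop fs -dus output above reports directory sizes in bytes. The following small sketch turns such output into GiB and a total; the helper names parse_dus and to_gib are made up for this example, and it assumes each line has the form "<path> <bytes>" as shown in the log.

```python
def parse_dus(output):
    """Parse 'hadoop fs -dus' style output into {path: size_in_bytes}.

    Assumption: each non-empty line is '<path> <bytes>'.
    """
    sizes = {}
    for line in output.strip().splitlines():
        if not line.strip():
            continue
        path, size = line.rsplit(None, 1)
        sizes[path] = int(size)
    return sizes


def to_gib(num_bytes):
    """Convert a byte count to GiB (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024 ** 3


if __name__ == "__main__":
    log = """
    hdfs://ec2-52-xxx-xxx-52.compute-1.amazonaws.com:8020/data/on_time 30547789179
    hdfs://ec2-52-xxx-xxx-52.compute-1.amazonaws.com:8020/data/orig_dest 32576029024
    """
    sizes = parse_dus(log)
    for path, size in sizes.items():
        print(f"{path}: {to_gib(size):.2f} GiB")
    print(f"total: {to_gib(sum(sizes.values())):.2f} GiB")
```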