Big Data
Computer Science & Information Systems
Big Data is a buzzword for storing and processing large amounts of data. This course covers the following Big Data topics:
- Big Data Definition
- Big Data Characteristics and Challenges
- Hadoop
- Hadoop Distributed File System (HDFS)
- MapReduce Programming
- Apache Spark
- Resilient Distributed Datasets (RDDs)
- Pair Resilient Distributed Datasets (PairRDDs)
- Spark SQL
- Pandas on Spark
Below you will find the main datasets used in this course and their respective links.
| Dataset | Link |
|---|---|
| Airports.csv | Link |
| Bible.txt | Link |
| Forest Fire | Link |
| JY157487.1 | Link |
| RealEstate | Link |
| Transactions (sample) | Link |
| UK Makerspace | Link |
| UK Postcode | Link |
| Give me Loan | Link |
You will also find setup instructions for your computer here.
The tools we use during this course.