Class Central Tips
At the end of this course, you will be able to:
* Describe the Big Data landscape including examples of real world big data problems including the three key sources of Big Data: people, organizations, and sensors.
* Explain the V’s of Big Data (volume, velocity, variety, veracity, valence, and value) and why each impacts data collection, monitoring, storage, analysis and reporting.
* Get value out of Big Data by using a 5-step process to structure your analysis.
* Identify what are and what are not big data problems and be able to recast big data problems as data science questions.
* Provide an explanation of the architectural components and programming models used for scalable big data analysis.
* Summarize the features and value of core Hadoop stack components including the YARN resource and job management system, the HDFS file system and the MapReduce programming model.
* Install and run a program using Hadoop!
This course is for those new to data science. No prior programming experience is needed, although the ability to install applications and utilize a virtual machine is necessary to complete the hands-on assignments.
Hardware Requirements:
(A) Quad Core Processor (VT-x or AMD-V support recommended), 64-bit; (B) 8 GB RAM; (C) 20 GB disk free. How to find your hardware information: (Windows): Open System by clicking the Start button, right-clicking Computer, and then clicking Properties; (Mac): Open Overview by clicking on the Apple menu and clicking “About This Mac.” Most computers with 8 GB RAM purchased in the last 3 years will meet the minimum requirements.You will need a high speed internet connection because you will be downloading files up to 4 Gb in size.
Software Requirements:
This course relies on several open-source software tools, including Apache Hadoop. All required software can be downloaded and installed free of charge. Software requirements include: Windows 7+, Mac OS X 10.10+, Ubuntu 14.04+ or CentOS 6+ VirtualBox 5+.