What is data science, and how does it differ from statistics and market research? Get introduced to the concepts, the main tools, and the tasks:
1. Volume, Velocity, Variety – what makes data “big data”
2. Data generation
Collect data from various sources; crawling, parsing, scraping; APIs – get data from social networks.
3. Data storage and retrieval
Hadoop and HDFS – the “operating system” of big data; NoSQL databases; real-time analytics – lambda architecture.
4. Data preparation
Data munging – getting rid of errors and random noise; formatting; quality management.
5. Data analysis
Machine learning; exploratory data analysis; agile statistics; geo-data; network analytics; text analytics; visualization.
This module can be offered as a lecture (90 minutes) or as a workshop (half day or full day) with practical examples that participants work through on their own computers.