Data Science – an Introduction (“Data Science 101”)

What is data science, and how does it differ from statistics and market research? This module introduces the concepts, the main tools, and the tasks:

1. Volume, Velocity, Variety – what makes data “big data”

2. Data generation

Collecting data from various sources: crawling, parsing, and scraping; using APIs to get data from social networks.
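As a small illustration of the parsing step, the following sketch extracts link targets from an HTML page using only the Python standard library. The class name `LinkCollector` and the sample page are assumptions made for this example; a real crawler would also fetch pages over the network and respect each site's terms of use.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical sample page, standing in for downloaded HTML
page = (
    '<html><body><a href="https://example.com/a">A</a>'
    '<p>text</p><a href="/b">B</a></body></html>'
)

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

The collected links could then be queued for further crawling, which is how a simple crawler walks from page to page.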

3. Data storage and retrieval

Hadoop and HDFS – the “operating system” of big data; NoSQL databases; real-time analytics – Lambda architecture.
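The idea behind the Lambda architecture can be sketched in a few lines: a batch layer precomputes a view over all historical data, a speed layer keeps a small incremental view of events that arrived since the last batch run, and a query merges both. The following is a toy, in-memory sketch under those assumptions; the function names and the page-view events are hypothetical, and a production setup would use distributed storage and stream processing instead of Python counters.

```python
from collections import Counter


def build_batch_view(historical_events):
    """Batch layer: recompute page-view counts from the full history."""
    return Counter(event["page"] for event in historical_events)


def update_speed_view(speed_view, event):
    """Speed layer: incrementally count an event not yet covered by a batch run."""
    speed_view[event["page"]] += 1


def query(page, batch_view, speed_view):
    """Serving step: merge the batch view and the real-time view."""
    return batch_view[page] + speed_view[page]


# Hypothetical events already processed by the batch layer
historical = [{"page": "/home"}, {"page": "/home"}, {"page": "/pricing"}]
batch = build_batch_view(historical)

# Hypothetical events that arrived after the last batch run
speed = Counter()
for event in [{"page": "/home"}, {"page": "/about"}]:
    update_speed_view(speed, event)

home_views = query("/home", batch, speed)
about_views = query("/about", batch, speed)
print(home_views, about_views)
```

When the next batch run finishes, the speed view would be reset, since the recent events are then contained in the batch view.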

4. Data preparation

Data munging – getting rid of errors and random noise; formatting; quality management.
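A minimal sketch of what data munging can look like in practice: raw text values are parsed into numbers, unusable entries are dropped, and extreme values are filtered out. The sample readings, the function names, and the threshold of three median absolute deviations are assumptions chosen for illustration, not a recommended standard.

```python
import statistics


def clean_readings(raw_values):
    """Parse raw strings to floats, skipping blank or unparseable entries."""
    parsed = []
    for value in raw_values:
        text = value.strip().replace(",", ".")  # normalize decimal commas
        try:
            parsed.append(float(text))
        except ValueError:
            continue  # skip entries such as "n/a" or ""
    return parsed


def drop_outliers(values, max_deviations=3.0):
    """Keep values within max_deviations median absolute deviations of the median."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return list(values)  # no spread to measure against; keep everything
    return [v for v in values if abs(v - median) / mad <= max_deviations]


# Hypothetical raw sensor readings with formatting errors and one outlier
raw = [" 20.1", "19,8", "n/a", "20.4", "", "999", "20.0"]
cleaned = drop_outliers(clean_readings(raw))
print(cleaned)
```

The median-based filter is used here because a single extreme value distorts the mean and standard deviation much more than it distorts the median.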

5. Data analysis

Machine learning; exploratory data analysis; agile statistics; geo-data; network analytics; text analytics; visualization.

This module can be offered as a lecture (90 minutes) or as a workshop (half day or full day) in which participants work through practical examples on their own computers.