Learning Data Science from Scratch (2) Getting Started: 4 Beginner Udemy Courses

June 23, 2017

(This post records my own learning process and is not written for the purpose of sharing.)
I. Goal:

Complete 4 data-science-related Udemy courses in July and August, 93.5 hours in total.

II. Approach:

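In July, go through every course once; in August, pick out the parts I am unfamiliar with or need to strengthen and study them in depth.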

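In July this averages 23.4 hours per week (93.5 hours over four weeks), or about 4.7 hours per day over five study days, plus at least 2 hours of practice.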

(Lectures and cumulative hours are tracked in Google Calendar.)
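As a quick check of the schedule arithmetic, here is a minimal sketch that totals the four course lengths listed in section IV and spreads them over July. It assumes July is treated as four weeks of five study days each (the four-week figure is inferred from 93.5 / 4 ≈ 23.4); the extra practice hours are not included. The function name `study_load` is hypothetical.

```python
# Course lengths in hours, as listed in section IV of this post.
COURSE_HOURS = [21.0, 41.0, 9.0, 22.5]

def study_load(hours, weeks, days_per_week):
    """Return (total hours, hours per week, hours per study day)."""
    total = sum(hours)
    per_week = total / weeks
    per_day = per_week / days_per_week
    return total, per_week, per_day

# Assumption: July counted as 4 weeks with 5 study days per week.
total, per_week, per_day = study_load(COURSE_HOURS, weeks=4, days_per_week=5)
print(f"total: {total} h, per week: {per_week:.1f} h, per study day: {per_day:.1f} h")
```

With these assumptions the weekly load is 23.375 hours (shown as 23.4) and the daily load is 4.675 hours, which rounds to about 4.7 hours per study day.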

III. Principles:

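While watching each lecture, sort its content into: 1. Fundamentals: must understand; 2. the August deep-dive list; 3. a general idea is enough.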

IV. Course Content:

Course 1. Data Science A-Z: Real-Life Data Science Exercises (21 Hours)

This course will give you a full overview of the Data Science journey. Upon completing this course you will know:

How to clean and prepare your data for analysis
How to perform basic visualisation of your data
How to model your data
How to curve-fit your data
And finally, how to present your findings and wow the audience

In this course you will develop a good understanding of the following tools:

SQL
SSIS
Tableau
Gretl

Key points:

1) Learning the overall concepts takes priority over learning individual tools

2) Get a solid grasp of database concepts

Notes:

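1) Tableau is a paid tool (Gretl, by contrast, is free and open source); treat both as hands-on exposure only, not something to master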

Course 2. Machine Learning A-Z: Hands-On Python & R in Data Science (41 Hours)

Course structure:

Part 1 - Data Preprocessing (overlaps with Course 1)
Part 2 - Regression: Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, SVR, Decision Tree Regression, Random Forest Regression
Part 3 - Classification: Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree Classification, Random Forest Classification
Part 4 - Clustering: K-Means, Hierarchical Clustering
Part 5 - Association Rule Learning: Apriori, Eclat
Part 6 - Reinforcement Learning: Upper Confidence Bound, Thompson Sampling
Part 7 - Natural Language Processing: Bag-of-words model and algorithms for NLP
Part 8 - Deep Learning: Artificial Neural Networks, Convolutional Neural Networks
Part 9 - Dimensionality Reduction: PCA, LDA, Kernel PCA
Part 10 - Model Selection & Boosting: k-fold Cross Validation, Parameter Tuning, Grid Search, XGBoost

Key points:

1) Get an initial understanding, and pick a few ML techniques for the August deep-dive list

2) Grasp the core concepts of Deep Learning

3) Learn from the instructor's Python coding style

Notes:

1) Set aside extra time to fill in some statistics and math background

2) Skim the R material for now, without making a point of learning R

Course 3. Data Science and Machine Learning with Python - Hands On (9 Hours)

Extract meaning from large data sets using a wide variety of machine learning, data mining, and data science techniques with the Python programming language.
Perform machine learning on "big data" using Apache Spark and its MLlib package.
Design experiments and interpret the results of A/B tests
Visualize clustering and regression analysis in Python using matplotlib
Produce automated recommendations of products or content with collaborative filtering techniques
Apply best practices in cleaning and preparing your data prior to analysis

Regression analysis
K-Means Clustering
Principal Component Analysis
Train/Test and cross validation
Bayesian Methods
Decision Trees and Random Forests
Multivariate Regression
Multi-Level Models
Support Vector Machines
Reinforcement Learning
Collaborative Filtering
K-Nearest Neighbor
Bias/Variance Tradeoff
Ensemble Learning
Term Frequency / Inverse Document Frequency
Experimental Design and A/B Tests

Key points:

1) Compare how this course explains topics differently from Course 2, and pick a few ML techniques for the August deep-dive list

2) Grasp the core concepts of Apache Spark

3) Learn from the instructor's Python coding style and compare it with Course 2

Notes:

1) Set aside extra time to fill in some statistics and math background

Course 4. Deep Learning A-Z: Hands-On Artificial Neural Networks (22.5 Hours)

Understand the intuition behind Artificial Neural Networks
Apply Artificial Neural Networks in practice
Understand the intuition behind Convolutional Neural Networks
Apply Convolutional Neural Networks in practice
Understand the intuition behind Recurrent Neural Networks
Apply Recurrent Neural Networks in practice
Understand the intuition behind Self-Organizing Maps
Apply Self-Organizing Maps in practice
Understand the intuition behind Boltzmann Machines
Apply Boltzmann Machines in practice
Understand the intuition behind AutoEncoders
Apply AutoEncoders in practice

Key points:

1) Grasp the concepts behind Artificial Neural Networks and the current state of the field

2) Pick one network type to study in depth first

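Quick concept references:

Deep Learning Study Notes Series (Part 1)

Understanding Convolutional Neural Networks (CNN) in One Article

[Young Scholars Column] Recurrent Neural Networks

Autoencoders in Deep Learning

[Deep Learning] Hsuan-Tien Lin, Machine Learning Techniques

Self-organizing map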
