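Learning Data Science from Scratch (Part 2), Beginner Stage: Four Introductory Udemy Courses
Complete four data science courses on Udemy in July and August, 93.5 hours in total.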
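II. Approach:
Go through every course once in July, then use August to revisit the parts that are unfamiliar or need strengthening.
July averages 23.4 hours per week, which is about 4.7 hours per day over a five-day week, plus at least 2 hours of practice.
(Sessions and cumulative hours are logged in Google Calendar.)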
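The workload figures can be checked with a quick calculation. This is a minimal sketch, assuming July is counted as four study weeks (that four-week figure is an assumption; the post only states the five-day week). The course lengths are the ones listed in this post, and the variable names are illustrative.

```python
# Course lengths in hours, as listed in this post.
course_hours = {
    "Data Science A-Z": 21.0,
    "Machine Learning A-Z": 41.0,
    "Data Science and Machine Learning with Python": 9.0,
    "Deep Learning A-Z": 22.5,
}

WEEKS_IN_JULY = 4  # assumption: July counted as four study weeks
DAYS_PER_WEEK = 5  # five study days per week, as stated in the post

total_hours = sum(course_hours.values())
hours_per_week = total_hours / WEEKS_IN_JULY
hours_per_day = hours_per_week / DAYS_PER_WEEK

print(f"total hours:    {total_hours}")
print(f"hours per week: {hours_per_week}")
print(f"hours per day:  {hours_per_day}")
```

Under that assumption the weekly load works out to 23.375 hours, which matches the 23.4 quoted above after rounding.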
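III. Principles:
While watching each lecture, sort its content into: 1. Fundamentals: must understand; 2. August refinement list; 3. A general idea is enough.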
IV. Course content:
Course 1. Data Science A-Z: Real-Life Data Science Exercises (21 Hours)
This course will give you a full overview of the Data Science journey.
Upon completing this course you will know:
- How to clean and prepare your data for analysis
- How to perform basic visualisation of your data
- How to model your data
- How to curve-fit your data
- And finally, how to present your findings and wow the audience
In this course you will develop a good understanding of the following tools:
- SQL
- SSIS
- Tableau
- Gretl
Key points:
1) Prioritize the overall concepts over learning individual tools
2) Get a solid grasp of database-related concepts
Notes:
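1) Tableau is a paid tool (Gretl, by contrast, is free and open source); use both for hands-on exposure only, without aiming for mastery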
Course 2. Machine Learning A-Z: Hands-on Python & R in Data Science (41 Hours)
Structure:
- Part 1 - Data Preprocessing (overlaps with Course 1)
- Part 2 - Regression: Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, SVR, Decision Tree Regression, Random Forest Regression
- Part 3 - Classification: Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree Classification, Random Forest Classification
- Part 4 - Clustering: K-Means, Hierarchical Clustering
- Part 5 - Association Rule Learning: Apriori, Eclat
- Part 6 - Reinforcement Learning: Upper Confidence Bound, Thompson Sampling
- Part 7 - Natural Language Processing: Bag-of-words model and algorithms for NLP
- Part 8 - Deep Learning: Artificial Neural Networks, Convolutional Neural Networks
- Part 9 - Dimensionality Reduction: PCA, LDA, Kernel PCA
- Part 10 - Model Selection & Boosting: k-fold Cross Validation, Parameter Tuning, Grid Search, XGBoost
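Key points:
1) Get a first overview, and pick a few ML tools for the August refinement list
2) Grasp the core Deep Learning concepts
3) Learn the instructor's Python syntax
Notes:
1) Set aside extra time to fill in some statistics and math background
2) Skim the R material for now, without deliberately studying it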
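Part 10 of the Course 2 outline lists k-fold cross validation. As a rough illustration of the idea (not the course's own code), the sketch below splits sample indices into k contiguous folds, using each fold once as the test set and the remaining folds as the training set. The function name `k_fold_indices` is hypothetical, and no shuffling is done; in practice a library helper such as scikit-learn's `KFold` would normally be used instead.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample lands in exactly one test fold, so averaging a model's score over the k folds uses every sample for evaluation once.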
Course 3. Data Science and Machine Learning with Python - Hands On (9 Hours)
- Extract meaning from large data sets using a wide variety of machine learning, data mining, and data science techniques with the Python programming language.
- Perform machine learning on "big data" using Apache Spark and its MLlib package.
- Design experiments and interpret the results of A/B tests
- Visualize clustering and regression analysis in Python using matplotlib
- Produce automated recommendations of products or content with collaborative filtering techniques
- Apply best practices in cleaning and preparing your data prior to analysis
- Regression analysis
- K-Means Clustering
- Principal Component Analysis
- Train/Test and cross validation
- Bayesian Methods
- Decision Trees and Random Forests
- Multivariate Regression
- Multi-Level Models
- Support Vector Machines
- Reinforcement Learning
- Collaborative Filtering
- K-Nearest Neighbor
- Bias/Variance Tradeoff
- Ensemble Learning
- Term Frequency / Inverse Document Frequency
- Experimental Design and A/B Tests
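Key points:
1) Compare how the explanations differ from Course 2, and pick a few ML tools for the August refinement list
2) Grasp the core Apache Spark concepts
3) Learn the instructor's Python syntax and compare it with Course 2
Notes:
1) Set aside extra time to fill in some statistics and math background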
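Term Frequency / Inverse Document Frequency appears in the Course 3 topic list. The sketch below shows one common, unsmoothed variant in plain Python: term frequency is the term's share of a document's tokens, and inverse document frequency is the natural log of (number of documents / number of documents containing the term). This is an illustration under those assumptions, not the course's implementation; the function name `tf_idf` and the sample sentences are made up, and libraries often use smoothed formulas that give different numbers.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)
for doc_scores in scores:
    print(sorted(doc_scores.items(), key=lambda item: -item[1]))
```

A term that occurs in every document (here "the") scores zero, while a term unique to one document (here "mat") scores highest within that document.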
Course 4. Deep Learning A-Z: Hands-on Artificial Neural Networks (22.5 Hours)
- Understand the intuition behind Artificial Neural Networks
- Apply Artificial Neural Networks in practice
- Understand the intuition behind Convolutional Neural Networks
- Apply Convolutional Neural Networks in practice
- Understand the intuition behind Recurrent Neural Networks
- Apply Recurrent Neural Networks in practice
- Understand the intuition behind Self-Organizing Maps
- Apply Self-Organizing Maps in practice
- Understand the intuition behind Boltzmann Machines
- Apply Boltzmann Machines in practice
- Understand the intuition behind AutoEncoders
- Apply AutoEncoders in practice
Key points:
1) Understand the concepts behind Artificial Neural Networks and the current state of the field
2) Pick one topic first and study it in depth
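The building block of an artificial neural network is a single neuron: a weighted sum of the inputs plus a bias, passed through an activation function. The sketch below is a minimal illustration of that forward pass, assuming a sigmoid activation; the function names and the sample numbers are hypothetical and are not taken from the course.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Forward pass of one neuron: sigmoid(weights . inputs + bias)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Illustrative values only: the weighted sum here is 1.0*0.5 + 2.0*(-0.25) = 0.
balanced = neuron_output([1.0, 2.0], [0.5, -0.25], bias=0.0)
shifted = neuron_output([1.0, 2.0], [0.5, -0.25], bias=1.0)
print(balanced, shifted)
```

A full network stacks many such neurons in layers and learns the weights and biases from data; this sketch covers only the forward computation of one unit.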
Quick concept references: