Learning Data Science from Scratch (3): Picking Up the Pace

July 05, 2017

(These posts record my own learning process; they are not written with sharing in mind.)

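Working through all four data science courses on Udemy (93.5 hours in total) would take too long.

To get started on my research topic quickly, I need to cut the learning time substantially, so I am starting with:

Course3. Data Science and Machine Learning with Python - Hands On (9 Hours)

The plan is to go through the commonly used machine learning algorithms once, classify them, and then pick the ones to implement in the first stage.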

Reference article for the classification:
https://unsupervisedmethods.com/cheat-sheet-of-machine-learning-and-python-and-math-cheat-sheets-a4afe4e791b6

These are my preliminary criteria for classifying the algorithms:

•Purpose

•Data characteristic

•Most popular

•Speed demanded

•Accuracy demanded

I do not yet know enough to assess speed and accuracy, so I am skipping those two for now.

The remaining criteria break down as follows:

Purpose:

•Classification

•Value prediction

•Discovering structure (Clustering)

•Finding relationships

•Dimension reduction

•Special purpose (e.g., image recognition)


Data characteristic:

•Whether the data comes with answers (labels) or not

•Type: numeric, string, boolean, time series, spatial, text, other media data
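As a rough illustration of how the "Purpose" and "answers or not" criteria could be combined to shortlist algorithms, here is a minimal sketch in plain Python. The table `SHORTLIST`, the function `shortlist`, and the grouping itself are hypothetical and only meant to show the idea; they are not a finished classification.

```python
# Hypothetical lookup table. The grouping is an illustrative assumption,
# not a definitive classification of these algorithms.
SHORTLIST = {
    # (purpose, data comes with answers?): candidate algorithms
    ("classification", True): [
        "Decision Trees",
        "K-Nearest Neighbor",
        "Naive Bayes",
        "Support Vector Machine",
    ],
    ("value prediction", True): ["Linear Regression"],
    ("discovering structure", False): ["Clustering"],
    ("dimension reduction", False): ["Principal Component Analysis"],
}


def shortlist(purpose, has_answer):
    """Return candidate algorithms for a purpose and label availability.

    An empty list means the table has no entry for that combination.
    """
    return SHORTLIST.get((purpose.lower(), has_answer), [])


print(shortlist("Classification", True))
print(shortlist("Discovering structure", False))
```

A table like this is easy to extend later with the speed and accuracy criteria once they can be assessed.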

The top 16 algorithms and methods used by data scientists (industry)

1.Regression

2.Clustering

3.Decision Trees/Rules

4.Visualization

5.K-Nearest Neighbor

6.PCA (Principal Component Analysis)

7.Statistics

8.Random Forests

9.Time series/Sequence

10.Text Mining

11.Ensemble method

12.SVM

13.Boosting (ensemble method)

14.Neural network

15.Optimization

16.Naïve Bayes

Reference:
http://www.kdnuggets.com/2016/09/poll-algorithms-used-data-scientists.html

The top 16 topics in machine learning from 31 leading journals between 2007 and 2016 (academia)

1.Support vector machine

2.Neural network

3.Data set

4.Objective function (Deep Learning)

5.Markov random field

6.Feature space

7.Generative model

8.Linear matrix inequality

9.Gaussian mixture model

10.Principal component analysis

11.Hidden Markov model

12.Conditional random field

13.Graphical model

14.Maximum likelihood estimation

15.Clustering algorithm

16.Nearest neighbors

Reference:
https://arxiv.org/abs/1703.10121

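Based on the references above, here is a first pick of the algorithms to get familiar with in the first stage:
(I have not thought this through carefully yet; this is only a preliminary list.)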

•Decision Trees (Advanced: Random Forests)

•K-Nearest Neighbor

•Naive Bayes classifiers

•Linear Regression (Advanced: non-linear regression)

•Principal Component Analysis

•Support Vector Machine (Kernel trick)

•Clustering

•Neural Network (Advanced: Deep learning)

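Most of these should have Python packages that can simply be imported. For now I will use the packages to play with some data
and get a feel for what each algorithm is for; later I will try to write the code myself.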
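As one possible way to "play with data using packages", the sketch below tries four of the classifiers from the list above on a small built-in dataset. It assumes scikit-learn is installed; the post does not name a package, so the choice of scikit-learn, the iris dataset, and all parameter values are assumptions made for illustration.

```python
# Assumption: scikit-learn is the package being tried out here.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Small labeled dataset bundled with scikit-learn (150 flowers, 3 classes).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows to check how well each model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Four classifiers from the first-stage list, mostly with default settings.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbor": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # learn from the training rows
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out rows
    print(f"{name}: {scores[name]:.3f}")
```

Because every estimator shares the same `fit` / `score` interface, swapping in another algorithm only means adding one more entry to `models`.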
