Data Science from Scratch (3): Picking Up the Pace

(This post only records my own learning process; it is not written for the purpose of sharing.)



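Listening to all four Udemy data science courses (93.5 hours in total) from start to finish would be a bit too slow.

To start on my research topic quickly, I have to cut the study time down substantially, so I am picking this one first:

 Course 3. Data Science and Machine Learning with Python - Hands On (9 Hours)

The plan is to go through the commonly used machine learning algorithms once, sort them into categories, and then choose the ones to implement in the first stage.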

Reference article for the classification:
https://unsupervisedmethods.com/cheat-sheet-of-machine-learning-and-python-and-math-cheat-sheets-a4afe4e791b6


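Below are my preliminary criteria for classifying the algorithms:
Purpose
Data characteristic
Most popular
Speed demanded
Accuracy demanded

I do not yet have enough knowledge to assess speed and accuracy, so I am skipping those two for now.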




The criteria above break down as follows:
Purpose
Classification
Prediction value
Discovering structure (Clustering)
Finding relationship
Dimension reduction
Special purpose (e.g., image recognition)

Reference article for the classification:
https://unsupervisedmethods.com/cheat-sheet-of-machine-learning-and-python-and-math-cheat-sheets-a4afe4e791b6


Data characteristic
Have answer or not (whether the data is labeled)
Type: Numeric, String, Boolean, Time series, Spatial, Text, Other media data
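
The Purpose and "Have answer or not" criteria can be combined into a small lookup table. The following is only a minimal sketch: the `CANDIDATES` table and the `suggest` function are hypothetical names introduced for illustration, and the pairings are the commonly cited ones (for example, classification needs labeled data, while clustering and dimension reduction do not), not something taken from the reference article. Purposes that are not in the table simply return an empty list.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table (an assumption for illustration):
# (purpose, has_answer) -> algorithm families worth trying first.
CANDIDATES: Dict[Tuple[str, bool], List[str]] = {
    ("classification", True): [
        "Decision Trees",
        "K-Nearest Neighbor",
        "Naive Bayes",
        "Support Vector Machine",
    ],
    ("prediction value", True): ["Linear Regression"],
    ("discovering structure", False): ["Clustering"],
    ("dimension reduction", False): ["Principal Component Analysis"],
}


def suggest(purpose: str, has_answer: bool) -> List[str]:
    """Return candidate algorithm families, or an empty list if none is recorded."""
    key = (purpose.strip().lower(), has_answer)
    return CANDIDATES.get(key, [])


if __name__ == "__main__":
    print(suggest("Classification", True))
    print(suggest("Dimension reduction", False))
    print(suggest("Finding relationship", False))  # not covered by this sketch
```

Keeping the table as plain data makes it easy to revise the pairings later, once speed and accuracy can be judged as well.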





The top 16 algorithms and methods used by data scientists (industry)
1. Regression
2. Clustering
3. Decision Trees/Rules
4. Visualization
5. K-Nearest Neighbor
6. PCA (Principal Component Analysis)
7. Statistics
8. Random Forests
9. Time series/Sequence
10. Text Mining
11. Ensemble method
12. SVM
13. Boosting (ensemble method)
14. Neural network
15. Optimization
16. Naïve Bayes

Reference:
http://www.kdnuggets.com/2016/09/poll-algorithms-used-data-scientists.html



The top 16 topics in machine learning from 31 leading journals between 2007 and 2016 (academia)
1. Support vector machine
2. Neural network
3. Data set
4. Objective function (Deep Learning)
5. Markov random field
6. Feature space
7. Generative model
8. Linear matrix inequality
9. Gaussian mixture model
10. Principal component analysis
11. Hidden Markov model
12. Conditional random field
13. Graphical model
14. Maximum likelihood estimation
15. Clustering algorithm
16. Nearest neighbors

Reference:
https://arxiv.org/abs/1703.10121



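Based on the references above, here is a first pick of the algorithms to get familiar with in the first stage:
(I have not thought this through carefully yet; this is just a first pass.)
Decision Trees (advanced: Random Forests)
K Nearest Neighbor
Naive Bayes classifiers
Linear Regression (advanced: non-linear regression)
Principal Component Analysis
Support Vector Machine (kernel trick)
Clustering
Neural Network (advanced: Deep Learning)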

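Python packages exist for more or less all of the above, so I will start by playing with some data through those packages
to get a feel for what each algorithm is for, and try writing the code myself later on.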
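
As a rough idea of what "playing with data through packages" could look like, here is a minimal sketch that assumes scikit-learn is the package in use (the post does not name one) and borrows its bundled iris dataset purely as a stand-in. It fits four of the shortlisted classifiers with mostly default settings and reports accuracy on a held-out split. The model choices, the split ratio, and the `models` and `scores` names are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Small labeled toy dataset bundled with scikit-learn (stand-in data).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples so accuracy is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Four classifiers from the shortlist, mostly with default settings.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K Nearest Neighbor": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # learn from the training split
    scores[name] = model.score(X_test, y_test)   # mean accuracy on the test split
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Because every estimator shares the same `fit` and `score` interface, adding another algorithm from the shortlist is a one-line change to the `models` dictionary.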



