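Learning Data Science from Scratch (3): Picking Up the Pace
(This post only records my own learning process; it is not written for the purpose of sharing.)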
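Sitting through all four Udemy data science courses (93.5 hours in total) would be too slow.
To get started on my research topic quickly, I need to cut the study time down substantially, so I am starting with
Course3. Data Science and Machine Learning with Python - Hands On (9 Hours)
The plan is to go through the commonly used machine learning algorithms once, sort them into categories, and then pick the ones to implement in the first stage.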
Reference for the categorization:
https://unsupervisedmethods.com/cheat-sheet-of-machine-learning-and-python-and-math-cheat-sheets-a4afe4e791b6
These are my preliminary criteria for categorizing the algorithms:
•Purpose
•Data characteristic
•Most popular
•Speed demanded
•Accuracy demanded
I do not yet know enough to assess speed and accuracy, so I am skipping those two for now.
The remaining criteria break down as follows:
Purpose:
•Classification
•Value prediction (regression)
•Discovering structure (clustering)
•Finding relationships
•Dimensionality reduction
•Special purpose (e.g. image recognition)
Data characteristic:
•Labeled or not (whether the data comes with answers)
•Type: numeric, string, Boolean, time series, spatial, text, other media data
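The first two criteria (purpose, and whether the data comes with answers) can be written down as a small lookup table. The sketch below is only an illustration: the `Algorithm` record, the `CATALOG` entries, the short purpose keys, and the `candidates` helper are my own hypothetical names, and the algorithm-to-purpose assignments follow common textbook usage rather than anything stated in the referenced cheat sheet. K-Means is assumed here as a stand-in for "clustering".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Algorithm:
    name: str
    purpose: str        # shorthand key for one of the purposes listed above
    needs_labels: bool  # True if training requires data that "has answers"


# Illustrative catalog (an assumption, not the author's own classification).
CATALOG = [
    Algorithm("Decision Trees", "classification", True),
    Algorithm("K-Nearest Neighbors", "classification", True),
    Algorithm("Naive Bayes", "classification", True),
    Algorithm("Support Vector Machine", "classification", True),
    Algorithm("Linear Regression", "regression", True),
    Algorithm("K-Means", "clustering", False),
    Algorithm("Principal Component Analysis", "dimension_reduction", False),
]


def candidates(purpose, has_labels):
    """Return the names of algorithms that match a purpose and can run
    with the data at hand (label-free algorithms are always allowed)."""
    return [
        a.name
        for a in CATALOG
        if a.purpose == purpose and (has_labels or not a.needs_labels)
    ]


if __name__ == "__main__":
    print(candidates("classification", has_labels=True))
    print(candidates("dimension_reduction", has_labels=False))
```

Speed and accuracy are left out on purpose, since those criteria are skipped for now.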
The top 16 algorithms and methods used by data scientists (industry)
1. Regression
2. Clustering
3. Decision Trees/Rules
4. Visualization
5. K-Nearest Neighbor
6. PCA (Principal Component Analysis)
7. Statistics
8. Random Forests
9. Time series/Sequence
10. Text Mining
11. Ensemble method
12. SVM
13. Boosting (ensemble method)
14. Neural network
15. Optimization
16. Naïve Bayes
Reference:
http://www.kdnuggets.com/2016/09/poll-algorithms-used-data-scientists.html
The top 16 topics in machine learning from 31 leading journals between 2007 and 2016 (academia)
1. Support vector machine
2. Neural network
3. Data set
4. Objective function (Deep Learning)
5. Markov random field
6. Feature space
7. Generative model
8. Linear matrix inequality
9. Gaussian mixture model
10. Principal component analysis
11. Hidden Markov model
12. Conditional random field
13. Graphical model
14. Maximum likelihood estimation
15. Clustering algorithm
16. Nearest neighbors
Reference:
https://arxiv.org/abs/1703.10121
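Based on the references above, here is a first pass at the algorithms to get familiar with in the first stage:
(I have not thought this through carefully yet; this is just an initial list.)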
•Decision Trees (advanced: Random Forests)
•K-Nearest Neighbors
•Naive Bayes classifiers
•Linear Regression (advanced: non-linear regression)
•Principal Component Analysis
•Support Vector Machine (kernel trick)
•Clustering
•Neural Network (advanced: Deep Learning)
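Most of these have Python packages that can be imported, so the plan is to play with some data using those packages first,
get a feel for what each algorithm is for, and try writing the code myself later.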
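As a rough idea of what "playing with the data using packages" could look like, here is a minimal sketch that runs several of the shortlisted classifiers on the same small dataset. It assumes scikit-learn is installed; the choice of scikit-learn, the Iris dataset, 5-fold cross-validation, and the `compare_classifiers` helper are all my own assumptions for illustration, not something taken from the course.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers():
    """Return the mean 5-fold cross-validation accuracy of each model on Iris."""
    X, y = load_iris(return_X_y=True)

    # Hyperparameters are mostly library defaults; nothing here is tuned.
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
        "Naive Bayes": GaussianNB(),
        "SVM (RBF kernel)": SVC(kernel="rbf", gamma="scale"),
    }

    scores = {}
    for name, model in models.items():
        # cross_val_score fits a fresh clone of the model on each fold.
        scores[name] = cross_val_score(model, X, y, cv=5).mean()
    return scores


if __name__ == "__main__":
    for name, score in compare_classifiers().items():
        print(f"{name}: {score:.3f}")
```

Iris is small and easy, so the scores say little about how these algorithms compare on real data; the point is only to see that they share the same fit-and-score workflow.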