
A Python Machine Learning Online Guide

大数据技术实战 (Big Data Technology in Practice)




First, here is the GitHub address of the guide:


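This guide aims to provide a comprehensive yet simple machine learning course in Python. Machine learning, a tool of artificial intelligence, is one of the most widely applied scientific fields, and a large body of literature on it has already been published. The goal of this project is to cover the most important aspects of machine learning through a series of simple yet comprehensive Python tutorials. We built the tutorials with several well-known machine learning frameworks, such as scikit-learn. In this project you will learn: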



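- Machine learning basics
  - Linear regression
  - Overfitting / underfitting
  - Regularization
  - Cross-validation
- Supervised learning
  - Decision trees
  - kNN
  - Naive Bayes
  - Logistic regression
  - Support vector machines
- Unsupervised learning
  - Clustering
  - Principal component analysis (PCA)
- Deep learning
  - Neural network overview
  - Convolutional neural networks
  - Autoencoders
  - Recurrent neural networks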


1. Machine learning basics




import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Create a data set for analysis
x, y = make_regression(n_samples=500, n_features=1, noise=25, random_state=0)

# Split the data set into testing and training data
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Plot the data (seaborn >= 0.12 requires x and y as keyword arguments;
# ravel() turns the (n, 1) feature matrix into a 1-D array)
sns.set_style("darkgrid")
sns.regplot(x=x_test.ravel(), y=y_test, fit_reg=False)

# Remove ticks from the plot
plt.xticks([])
plt.yticks([])

plt.tight_layout()
plt.show()
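The plotting snippet stops before any model is trained. The following is a minimal sketch of the linear regression, regularization, and cross-validation topics from the outline, assuming the same `make_regression` settings as above. The `coef=True` flag only makes scikit-learn also return the true slope. The Ridge `alpha` values are arbitrary illustrative choices and are not taken from the guide.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score, train_test_split

# Recreate the synthetic data set and split used for the plot;
# coef=True additionally returns the true slope used to generate y
x, y, true_coef = make_regression(n_samples=500, n_features=1, noise=25,
                                  random_state=0, coef=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Ordinary least squares, fitted on the training split only
ols = LinearRegression().fit(x_train, y_train)
test_r2 = ols.score(x_test, y_test)  # R^2 on the held-out test split
print(f"true slope={float(true_coef):.2f}, fitted slope={ols.coef_[0]:.2f}, "
      f"test R^2={test_r2:.3f}")

# Ridge adds an L2 penalty alpha * ||w||^2; a larger alpha shrinks the slope toward 0
heavy = Ridge(alpha=1000.0).fit(x_train, y_train)
print(f"Ridge(alpha=1000) slope={heavy.coef_[0]:.2f}")

# 5-fold cross-validation on the training split to compare penalty strengths
for alpha in [0.1, 10.0, 1000.0]:
    scores = cross_val_score(Ridge(alpha=alpha), x_train, y_train, cv=5, scoring="r2")
    print(f"alpha={alpha:>7}: mean CV R^2 = {scores.mean():.3f}")
```

Cross-validation uses only the training split. This keeps the test split untouched for the final estimate. With a single informative feature and this much data, a very large `alpha` mainly underfits by shrinking the slope.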

2. Supervised learning




# All the libraries we need for linear SVM
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
# This is used for our dataset
from sklearn.datasets import load_breast_cancer

# =============================================================================
# We are using sklearn datasets to create the set of data points about breast cancer
# data is the set of data points
# target is the classification of those data points.
# More information can be found at
# =============================================================================
dataCancer = load_breast_cancer()

# data[:, x:n] selects feature columns x through n-1 for every sample.
# The ":" part takes all rows of the matrix, and 0:2 takes the first 2 columns.
# To use a different pair of features, replace 0:2 with 1:3, 2:4, ..., 28:30;
# the set has 30 features, so the upper bound can be at most 30.
# For a 3-dimensional plot, n - x would need to be 3 instead of 2.
data = dataCancer.data[:, 0:2]
target = dataCancer.target

# =============================================================================
# Creates the linear SVM model and fits it to our data points.
# All optional parameters keep their defaults except these two.
# You can find the other parameters at
# =============================================================================
model = svm.SVC(kernel='linear', C=10000)
model.fit(data, target)

# Plots the points
plt.scatter(data[:, 0], data[:, 1], c=target, s=30, cmap=plt.cm.prism)

# Creates the axis bounds for the grid
axis = plt.gca()
x_limit = axis.get_xlim()
y_limit = axis.get_ylim()

# Creates a grid to evaluate the model
x = np.linspace(x_limit[0], x_limit[1], 50)
y = np.linspace(y_limit[0], y_limit[1], 50)
X, Y = np.meshgrid(x, y)
xy = np.c_[X.ravel(), Y.ravel()]

# Evaluates the decision function over the grid to draw the decision line;
# this works for two classes, use model.predict if you are classifying more than two
decision_line = model.decision_function(xy).reshape(Y.shape)

# Plots the decision line (level 0) and the margins (levels -1 and 1)
axis.contour(X, Y, decision_line, colors='k', levels=[-1, 0, 1], linestyles=['--', '-', '--'])

# Shows the support vectors that determine the decision line
axis.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1], s=100,
             linewidth=1, facecolors='none', edgecolors='k')

# Shows the graph
plt.show()
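The SVM example uses only two features so the result can be plotted. The following is a minimal sketch of comparing all the supervised methods named in the outline on the full 30-feature breast cancer data with 5-fold cross-validation. The hyperparameters (`max_depth=4`, `n_neighbors=5`, `C=1.0`) and the `StandardScaler` pipelines are illustrative assumptions, not the guide's settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Use all 30 features this time, not just the first two
X, y = load_breast_cancer(return_X_y=True)

# One model per supervised method in the outline; scale-sensitive models
# (kNN, logistic regression, SVM) get a StandardScaler in front of them
models = {
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "naive Bayes": GaussianNB(),
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "linear SVM": make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0)),
}

# 5-fold cross-validated accuracy for each model
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name:>20}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The scaler sits inside each pipeline, so it is refitted on every training fold. This avoids leaking statistics from the validation fold into training.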

3. Unsupervised learning

This part covers unsupervised learning methods, including clustering and principal component analysis (PCA).

As in the other parts, the guide provides both a theoretical introduction to each algorithm and a Python implementation.
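The guide's own unsupervised examples are not reproduced here. As a minimal sketch, assuming scikit-learn and reusing the breast cancer data from the SVM example, PCA projects the standardized features onto two components and k-means groups the samples into two clusters. The true labels are used only afterwards, to measure how well the clusters line up with the diagnosis. The choice of `k=2` is an assumption that matches the two classes.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # PCA and k-means are scale-sensitive

# PCA: project the 30 standardized features onto the top 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))

# k-means with k=2; the labels y are not used for fitting
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X_scaled)
ari = adjusted_rand_score(y, clusters)  # 1.0 = perfect agreement, ~0 = random
print("adjusted Rand index vs. true labels:", round(ari, 3))
```

The 2-D `X_2d` array can be passed to `plt.scatter` with `c=clusters` to visualize the clusters in the PCA plane.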

4. Deep learning
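The deep learning chapter (neural network overview, CNNs, autoencoders, RNNs) is not shown in this repost, and the framework it uses is not stated here. As a minimal sketch of the simplest case, a fully connected feed-forward network, the example below uses scikit-learn's `MLPClassifier` on the bundled digits data. This is a swapped-in choice for illustration and not necessarily the guide's approach. The layer size and `max_iter` are arbitrary assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 8x8 handwritten digit images, flattened to 64 pixel features, 10 classes
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# One hidden layer of 64 units; ReLU activation and the Adam optimizer
# are MLPClassifier's defaults
net = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
)
net.fit(X_train, y_train)
accuracy = net.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Convolutional, recurrent, and autoencoder architectures need a dedicated deep learning framework. This sketch only illustrates the basic idea of stacked weighted layers trained by gradient descent.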





Besides publishing this complete machine learning tutorial on GitHub, the author has also made it available to read online at:
