
Chapter 1. Classification -- 03. Statistical Learning Theory for Supervised Learning


The key principle in statistical learning theory is the principle of Ockham’s razor.

Now, Ockham’s razor is the idea that the best models are simple models that fit the data well. It was named after the English friar and philosopher who said that among hypotheses that predict equally well, we should choose the one with the fewest assumptions.

I’m sure he didn’t have statistical learning theory in mind when he said that,

but that’s where that expression comes from.

Ok, so let’s start with a basic one-dimensional regression model.

This model I have on the screen there is not good; it’s really over-fitted.

So it’s just not going to generalize well to new points; it just can’t predict well.
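The lecture shows this as a plot rather than code, but a minimal sketch of the same effect is easy to write. Everything below is an assumption for illustration: a sine-shaped ground truth, 12 noisy samples, and the two polynomial degrees being compared.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 1, 12))
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

    # Degree-11 polynomial: enough parameters to thread through all 12 points.
    overfit = np.polynomial.Polynomial.fit(x, y, deg=11)
    # Degree-3 polynomial: far fewer parameters, a much smoother curve.
    simple = np.polynomial.Polynomial.fit(x, y, deg=3)

    x_new = rng.uniform(0, 1, 5)          # unseen points
    y_new = np.sin(2 * np.pi * x_new)     # true values at those points
    print("over-fit model, squared error on new points:",
          np.mean((overfit(x_new) - y_new) ** 2))
    print("simple model, squared error on new points:",
          np.mean((simple(x_new) - y_new) ** 2))

The over-fit model matches the training points almost exactly, yet its error on the unseen points is far worse than the simple model’s.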

Now let’s say that I have some way to measure model complexity; the more complex the models, the more they tend to over-fit, and the simpler the models, the more they tend to under-fit. Now this plot is the key to understanding learning theory.

If I plot training error, which is this curve over here, then as the models grow more and more complex, the training error continues to decrease, because I can just over-fit more and more.

But at the same time, if I do that, the test error gets worse and worse.

If I, on the other hand, under-fit, then I won’t do well on either training or test.

Where I want to go is this sweet spot in the middle here.

And the idea of this plot holds true for classification, regression, whatever.
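The transcript describes this U-shaped test-error curve without giving numbers. Here is one way to reproduce it, assuming polynomial degree stands in for model complexity; the data-generating setup is made up for the sketch.

    import numpy as np

    rng = np.random.default_rng(1)

    def sample(n):
        # Noisy observations of a sine curve (an assumed ground truth).
        x = rng.uniform(0, 1, n)
        return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

    x_train, y_train = sample(30)
    x_test, y_test = sample(200)

    for degree in range(1, 16):
        model = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
        train_err = np.mean((model(x_train) - y_train) ** 2)
        test_err = np.mean((model(x_test) - y_test) ** 2)
        print(f"degree {degree:2d}  train {train_err:.3f}  test {test_err:.3f}")

    # Training error falls monotonically as the degree grows; test error falls,
    # bottoms out at the sweet spot, then climbs again as the model over-fits.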

So the idea is that the best models are simple models that fit the data well.

So what we need is a balance between accuracy and simplicity.

Now, Ockham probably didn’t know optimization,

but this would have suited him fine, I’m guessing, if he were alive now.

So the most common machine learning methods choose their function f to minimize training error and model complexity, which aims to thwart the curse of dimensionality.

So the curse of dimensionality is that we tend to over-fit when we have a lot of features and not as much data.

The amount of data needs to increase exponentially with the number of features in order to avoid this over-fitting issue.
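The transcript states the exponential growth without figures. A back-of-the-envelope sketch, assuming each feature is discretized into 10 bins and we want at least one training point per cell:

    # If each of d features is split into 10 bins, covering every cell takes
    # 10**d samples -- exponential in the number of features. The bin count
    # of 10 is an assumption for illustration.
    for d in (1, 2, 5, 10):
        print(f"{d:2d} features -> {10 ** d:>14,} cells to cover")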

So we’re going to choose a model that’s both simple – low complexity – and has low training error;

and this is exactly the principle of Ockham’s razor.

And simplicity is measured in several different ways, and is usually called regularization in machine learning.

So this is the main foundation of machine learning;

it’s all about creating functions that minimize the loss, but also keep the model simple.
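Written out, that objective is: choose f to minimize (training error) + C · (model complexity). As one concrete instance, here is a ridge-style linear model in a short sketch; the squared-norm penalty and the trade-off constant C are a common choice, not something this lecture prescribes.

    import numpy as np

    def objective(w, X, y, C):
        training_error = np.mean((X @ w - y) ** 2)  # loss: how well f fits the data
        complexity = np.sum(w ** 2)                 # regularizer: how simple f is
        return training_error + C * complexity

    def fit_ridge(X, y, C):
        # Closed-form minimizer of the objective above for a linear f.
        n, d = X.shape
        return np.linalg.solve(X.T @ X / n + C * np.eye(d), X.T @ y / n)

Larger C pushes the solution toward simpler (smaller-weight) models; C = 0 recovers plain least squares, i.e., pure training-error minimization.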

And this is the bottom line, folks;

we’re going to do this in many different ways throughout the course.

Different machine learning methods have different loss functions and they have different regularization terms.
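For instance, a few familiar methods can be read as different (loss, regularizer) pairings in that same template. The scikit-learn names below are an assumption for illustration; the lecture itself names no library or specific method yet.

    # Different methods = different loss functions + different regularizers,
    # all fit into the "training error + complexity" template.
    from sklearn.linear_model import Ridge, Lasso, LogisticRegression
    from sklearn.svm import LinearSVC

    models = {
        "ridge":      Ridge(alpha=1.0),           # squared loss + L2 penalty
        "lasso":      Lasso(alpha=0.1),           # squared loss + L1 penalty
        "logistic":   LogisticRegression(C=1.0),  # logistic loss + L2 penalty
        "linear svm": LinearSVC(C=1.0),           # hinge loss + L2 penalty
    }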

And so, as we go on, you’ll see more and more inside these machine learning methods,

because I’ll tell you what all of these terms are.

