[Learning theory] Bias variance trade-off (Hoeffding inequality)

mariabeetle 2019. 10. 12. 20:00
  • Assuming a finite hypothesis set, we derive a generalization error bound for the hypothesis that minimizes the training error.
  • Through this generalization error bound, the bias-variance trade-off can be viewed from a learning theory perspective.

1. Preliminaries

Lemma. (The union bound)

  • Let $A_1, A_2, ..., A_k$ be $k$ different events (that may not be independent). Then
    $$P(A_1 \cup \cdots \cup A_k) \leq P(A_1) + \cdots + P(A_k)$$

Lemma. (Hoeffding inequality / Chernoff bound)

  • Let $Z_1, ..., Z_m$ be $m$ independent and identically distributed (iid) random variables drawn from a Bernoulli($\phi$) distribution (i.e. $P(Z_i = 1) = \phi$ and $P(Z_i = 0) = 1-\phi$).
  • Let $\hat{\phi} = \frac{1}{m} \sum_{i=1} ^m Z_i$ be the mean of these random variables, and let any $\gamma > 0$ be fixed. Then
    $$P(|\phi - \hat{\phi}| > \gamma) \leq 2\text{exp}(-2\gamma^2 m)$$
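
As a sanity check, the probability on the left-hand side can be estimated by simulation and compared with the bound on the right. A minimal sketch, assuming NumPy is available (the values of $\phi$, $m$, $\gamma$ below are arbitrary):

```python
import numpy as np

def hoeffding_check(phi=0.3, m=100, gamma=0.1, n_trials=100_000, seed=0):
    """Estimate P(|phi - phi_hat| > gamma) by simulation and compare it
    with the Hoeffding bound 2 * exp(-2 * gamma^2 * m)."""
    rng = np.random.default_rng(seed)
    # n_trials independent samples, each consisting of m Bernoulli(phi) draws.
    draws = rng.random((n_trials, m)) < phi
    phi_hat = draws.mean(axis=1)            # empirical mean of each sample
    empirical = np.mean(np.abs(phi - phi_hat) > gamma)
    bound = 2 * np.exp(-2 * gamma**2 * m)
    return empirical, bound

emp, bound = hoeffding_check()
print(f"empirical P(|phi - phi_hat| > gamma): {emp:.4f}")
print(f"Hoeffding bound:                      {bound:.4f}")
```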

Assumption

  • Binary classification with labels $y \in \{0, 1\}$
  • Training set $S = \{(x^{(i)} , y^{(i)}); i = 1, ..., m\}$ of size $m$, drawn iid from some probability distribution $D$.
Training error (empirical risk or empirical error)

$$\hat{\epsilon}(h) = \frac{1}{m} \sum_{i=1}^m 1\{h(x^{(i)}) \neq y^{(i)}\}$$

Generalization error

$$\epsilon(h) = P_{(x, y) \sim D}(h(x) \neq y) $$

Hypothesis class $\mathcal{H}$
  • $\mathcal{H} = \{h_\theta: h_\theta(x) = 1\{\theta^T x \geq 0\}, \theta \in \mathbb{R}^{n+1}\}$
  • Decision boundary is linear
Empirical risk minimization

$$\hat{h} = \arg \min_{h \in \mathcal{H}} \hat{\epsilon}(h)$$
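
As a concrete sketch of empirical risk minimization over a finite class (illustrative only: the one-dimensional threshold classifiers and the synthetic data below are made-up stand-ins, not the linear hypothesis class above):

```python
import numpy as np

def erm(hypotheses, X, y):
    """Empirical risk minimization over a finite hypothesis class:
    return the hypothesis with the smallest training error."""
    errors = [np.mean(h(X) != y) for h in hypotheses]
    best = int(np.argmin(errors))
    return hypotheses[best], errors[best]

# A toy finite class H: k threshold classifiers on a single feature.
thresholds = np.linspace(-2.0, 2.0, 21)
H = [lambda X, t=t: (X >= t).astype(int) for t in thresholds]

# Toy 1-D training data (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=200)
y = (X + 0.3 * rng.normal(size=200) >= 0.5).astype(int)

h_hat, train_error = erm(H, X, y)
print("training error of h_hat:", train_error)
```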

2. The case of finite $\mathcal{H}$

  • $\mathcal{H} = \{h_1, ..., h_k\}$
  • $\mathcal{H}$ is a set of $k$ functions mapping from $\mathcal{X}$ to $\{0, 1\}$
  • $\hat{h}$ : the hypothesis in $\mathcal{H}$ with the smallest training error.
  • Goal : bound the generalization error of $\hat{h}$.
  • Fix one hypothesis $h_i \in \mathcal{H}$.
  • $Z = 1\{h_i (x) \neq y\}$
    • $Z_j = 1\{h_i (x^{(j)}) \neq y^{(j)}\}$ : the value of $Z$ on the $j$-th training example
    • Since the training set was sampled iid from $D$, the $Z_j$ all share the same distribution as $Z$.

Training error

  • $\hat{\epsilon} (h_i) = \frac{1}{m}\sum_{j=1}^m Z_j$

  • $\hat{\epsilon}(h_i)$ is the average of the $m$ random variables $Z_j$, each of which is drawn from a Bernoulli distribution with mean $\epsilon(h_i)$.

Apply Hoeffding inequality

$$P(|\epsilon(h_i) - \hat{\epsilon}(h_i)| > \gamma) \leq 2 \text{exp}(-2 \gamma^2 m)$$

  • This bound holds for a single fixed $h_i$; we now want a statement that holds for all $h \in \mathcal{H}$ simultaneously.

Uniform convergence

  • Let $A_i$ denote the event that $|\epsilon(h_i) - \hat{\epsilon}(h_i)| > \gamma$.

  • $P(A_i) \leq 2 \text{exp}(-2 \gamma^2 m)$ for each $i$. Applying the union bound, $P(\exists h \in \mathcal{H}.\ |\epsilon(h) - \hat{\epsilon}(h)| > \gamma) \leq \sum_{i=1}^k P(A_i) \leq 2k\,\text{exp}(-2\gamma^2 m)$, so with probability at least $1 - 2k\,\text{exp}(-2\gamma^2 m)$, $|\epsilon(h) - \hat{\epsilon}(h)| \leq \gamma$ holds simultaneously for all $h \in \mathcal{H}$ (uniform convergence).

  • Three quantities

    • $m$ : number of training examples
    • $\gamma$ : allowed gap between training error and generalization error
    • probability of error : $2k\,\text{exp}(-2\gamma^2 m)$, the probability that some hypothesis violates the gap $\gamma$
  • Sample complexity : how many training examples do we need so that, with probability at least $1 - \delta$, the gap between training error and generalization error is at most $\gamma$ for every $h \in \mathcal{H}$? (See the sketch below.)

    • Require $\delta \geq 2k\,\text{exp}(-2 \gamma^2 m)$ and solve for $m$:
    • $m \geq \frac{1}{2\gamma^2}\log\frac{2k}{\delta}$
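
The sample-complexity formula can be evaluated directly; a minimal sketch assuming NumPy, with arbitrary example values of $\gamma$, $\delta$, $k$:

```python
import numpy as np

def sample_complexity(gamma, delta, k):
    """Smallest m such that, with probability at least 1 - delta,
    |training error - generalization error| <= gamma for all k hypotheses."""
    return int(np.ceil(np.log(2 * k / delta) / (2 * gamma**2)))

# Example: gap of at most 0.05, confidence 95%, k = 1000 hypotheses.
print(sample_complexity(gamma=0.05, delta=0.05, k=1000))  # about 2120
```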

3. Generalization bound

  • Set $\delta = 2k\,\text{exp}(-2 \gamma^2 m)$ and solve for $\gamma$:
  • $\gamma = \sqrt{\frac{1}{2m} \log \frac{2k}{\delta}}$
  • With probability at least $1 - \delta$, $|\hat{\epsilon}(h) - \epsilon(h)| \leq \gamma$ for all $h \in \mathcal{H}$ $\cdots (1)$
  • $|\hat{\epsilon}(h) - \epsilon(h)| \leq \sqrt{\frac{1}{2m} \log \frac{2k}{\delta}}$
  • $\hat{h} = \arg \min_{h \in \mathcal{H}} \hat{\epsilon}(h)$ : the hypothesis that minimizes the training error
  • $h^* = \arg\min_{h \in \mathcal{H}} \epsilon(h)$ : the hypothesis that minimizes the generalization error

  • $\epsilon(\hat{h}) \leq \hat{\epsilon}(\hat{h}) + \gamma$ : by inequality (1)
  • $\hat{\epsilon}(\hat{h}) \leq \hat{\epsilon}(h^*)$ : since $\hat{h}$ is, by definition, the minimizer of $\hat{\epsilon}(\cdot)$
  • $\hat{\epsilon}(h^*) \leq \epsilon(h^*) + \gamma$ : by inequality (1), applied to $h^*$
  • Chaining the three inequalities: with probability at least $1 - \delta$, $\epsilon(\hat{h}) \leq \epsilon(h^*) + 2\gamma = \min_{h \in \mathcal{H}} \epsilon(h) + 2\sqrt{\frac{1}{2m} \log \frac{2k}{\delta}}$. The first term is a bias-like quantity (how well the best hypothesis in $\mathcal{H}$ can do) and the second a variance-like quantity; enlarging $\mathcal{H}$ (bigger $k$) can only decrease the first term but increases the second, which is the bias-variance trade-off seen from the learning theory viewpoint. (A numerical sketch of the second term follows.)
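
The sketch below (assuming NumPy; $\delta$ and the values of $m$, $k$ are arbitrary) evaluates only the variance-like penalty $2\gamma = 2\sqrt{\frac{1}{2m}\log\frac{2k}{\delta}}$: it grows logarithmically with $k$ and shrinks with $m$. The bias-like term $\min_{h \in \mathcal{H}} \epsilon(h)$ depends on the data distribution and cannot be computed from $m$, $k$, $\delta$ alone.

```python
import numpy as np

def variance_term(m, k, delta=0.05):
    """The 2*gamma penalty in epsilon(h_hat) <= min_h epsilon(h) + 2*gamma."""
    return 2 * np.sqrt(np.log(2 * k / delta) / (2 * m))

# The penalty grows only logarithmically with the size k of H ...
for k in (10, 10**3, 10**6):
    print(f"m=1000, k={k:>7}: penalty = {variance_term(1000, k):.3f}")

# ... and shrinks as the number of training examples m grows.
for m in (100, 1000, 10000):
    print(f"k=1000, m={m:>5}: penalty = {variance_term(m, 1000):.3f}")
```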

 

 

Reference: http://cs229.stanford.edu/
