Sharing the PyTorch Schedulers I Use

Scheduler

Warm-up + CosineAnnealingLR

import math

warm_up_iter = 10   # number of warm-up steps
T_max = 50          # total number of scheduler steps
lr_max = 0.1        # peak learning rate
lr_min = 1e-5       # minimum learning rate

def WarmupCosineAnnealingLR(cur_iter):
    # Linear warm-up: lr_min -> lr_max over the first warm_up_iter steps
    if cur_iter < warm_up_iter:
        return (lr_max - lr_min) * (cur_iter / warm_up_iter) + lr_min
    # Cosine annealing: lr_max -> lr_min over the remaining T_max - warm_up_iter steps
    else:
        return lr_min + 0.5 * (lr_max - lr_min) * (
            1 + math.cos((cur_iter - warm_up_iter) / (T_max - warm_up_iter) * math.pi)
        )

For the first warm_up_iter steps, the learning rate increases linearly from lr_min to lr_max; over the remaining T_max - warm_up_iter steps, it follows cosine annealing from lr_max back down to lr_min.

[Figure: WarmupCosineAnnealingLR learning-rate curve]
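
As a quick sanity check (a small sketch, assuming the function and constants defined above), evaluating the schedule at a few iterations confirms the warm-up and annealing endpoints:

# Sample the schedule at a few iterations
for it in [0, 5, 10, 30, 50]:
    print(it, WarmupCosineAnnealingLR(it))
# it = 0  -> lr_min (1e-5): start of warm-up
# it = 10 -> lr_max (0.1): warm-up finished, cosine phase begins
# it = 50 -> lr_min (1e-5): end of annealing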

Warm-up + ExponentialLR

import math

warm_up_iter = 10   # number of warm-up steps
T_max = 50          # total number of scheduler steps
lr_max = 0.1        # peak learning rate
lr_min = 1e-5       # minimum learning rate

def WarmupExponentialLR(cur_iter):
    # Decay factor chosen so that lr_max * gamma**(T_max - warm_up_iter) == lr_min
    gamma = math.exp(math.log(lr_min / lr_max) / (T_max - warm_up_iter))
    # Linear warm-up: lr_min -> lr_max over the first warm_up_iter steps
    if cur_iter < warm_up_iter:
        return (lr_max - lr_min) * (cur_iter / warm_up_iter) + lr_min
    # Exponential decay: lr_max -> lr_min over the remaining steps
    else:
        return lr_max * gamma ** (cur_iter - warm_up_iter)

For the first warm_up_iter steps, the learning rate increases linearly from lr_min to lr_max; over the remaining T_max - warm_up_iter steps, it decays exponentially from lr_max down to lr_min.

[Figure: WarmupExponentialLR learning-rate curve]
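
The decay factor gamma comes from solving lr_max * gamma^(T_max - warm_up_iter) = lr_min, i.e. gamma = (lr_min / lr_max)^(1 / (T_max - warm_up_iter)). A quick check (a sketch assuming the constants above) confirms the last step lands exactly on lr_min:

import math

# Equivalent closed form for the decay factor
gamma = (lr_min / lr_max) ** (1 / (T_max - warm_up_iter))
print(gamma)                                     # ~0.794 with the constants above
print(lr_max * gamma ** (T_max - warm_up_iter))  # ~1e-5, i.e. lr_min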

Warm-up + StepLR

import math

warm_up_iter = 10   # number of warm-up steps
T_max = 50          # total number of scheduler steps
lr_max = 0.1        # peak learning rate
lr_min = 1e-5       # minimum learning rate
step_size = 3       # decay the learning rate once every step_size steps

def WarmupStepLR(cur_iter):
    # Decay factor chosen so the final decay step lands on lr_min
    gamma = math.exp(math.log(lr_min / lr_max) / ((T_max - warm_up_iter) // step_size))
    # Linear warm-up: lr_min -> lr_max over the first warm_up_iter steps
    if cur_iter < warm_up_iter:
        return (lr_max - lr_min) * (cur_iter / warm_up_iter) + lr_min
    # Step decay: multiply by gamma once every step_size steps
    else:
        return lr_max * gamma ** ((cur_iter - warm_up_iter) // step_size)

For the first warm_up_iter steps, the learning rate increases linearly from lr_min to lr_max; over the remaining T_max - warm_up_iter steps, it decays exponentially from lr_max down to lr_min, with one decay every step_size steps.

[Figure: WarmupStepLR learning-rate curve]
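
Because the exponent uses integer division, the learning rate stays flat for step_size iterations at a time. Sampling a few consecutive iterations (a sketch assuming the constants above) makes the staircase shape visible:

# The learning rate is piecewise constant, changing only every step_size iterations
for it in range(10, 20):
    print(it, round(WarmupStepLR(it), 6))
# Iterations 10-12 share one value, 13-15 the next, and so on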

Visual Comparison

[Figure: Learning Rate Schedulers (comparison of the three schedules)]
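
The comparison figure can be reproduced with a short matplotlib sketch (matplotlib is assumed to be available, and the three functions above must be defined in the same script):

import matplotlib.pyplot as plt

iters = list(range(T_max + 1))
for fn in (WarmupCosineAnnealingLR, WarmupExponentialLR, WarmupStepLR):
    plt.plot(iters, [fn(it) for it in iters], label=fn.__name__)
plt.xlabel("iteration")
plt.ylabel("learning rate")
plt.legend()
plt.show()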

PyTorch Template

import math
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import LambdaLR

warm_up_iter = 10   # number of warm-up steps
T_max = 50          # total number of scheduler steps
lr_max = 0.1        # peak learning rate
lr_min = 1e-5       # minimum learning rate

def WarmupCosineAnnealingLR(cur_iter):
    # Linear warm-up: lr_min -> lr_max over the first warm_up_iter steps
    if cur_iter < warm_up_iter:
        return (lr_max - lr_min) * (cur_iter / warm_up_iter) + lr_min
    # Cosine annealing: lr_max -> lr_min over the remaining steps
    else:
        return lr_min + 0.5 * (lr_max - lr_min) * (
            1 + math.cos((cur_iter - warm_up_iter) / (T_max - warm_up_iter) * math.pi)
        )

model = nn.Linear(1, 1)
# LambdaLR multiplies the optimizer's base lr by the lambda's return value,
# so the base lr is set to 1.0 to let the function above yield the lr directly.
optimizer = optim.AdamW(model.parameters(), lr=1.0)
scheduler = LambdaLR(optimizer, lr_lambda=WarmupCosineAnnealingLR)

for epoch in range(T_max):
    train(...)          # your training loop
    valid(...)          # your validation loop
    scheduler.step()    # advance the schedule once per epoch
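
A quick check (a sketch that reuses the definitions from the template, not part of the original post): since the base lr is 1.0, scheduler.get_last_lr() should match the hand-written function at every epoch.

optimizer = optim.AdamW(model.parameters(), lr=1.0)
scheduler = LambdaLR(optimizer, lr_lambda=WarmupCosineAnnealingLR)
for epoch in range(T_max):
    # get_last_lr() reports the lr set for the current epoch
    assert abs(scheduler.get_last_lr()[0] - WarmupCosineAnnealingLR(epoch)) < 1e-12
    optimizer.step()    # normally called inside the training loop
    scheduler.step()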