Posts in the 'Error report/Pytorch' category

ERROR report: When the loss does not decrease, custom loss, torch.sqrt 2024.03.07
ERROR report: Torch.no_grad change sequence length 2024.03.04
ERROR report: Pytorch rnnt loss, RuntimeError: output length mismatch 2024.02.26

ERROR report: When the loss does not decrease, custom loss, torch.sqrt

호Tuck 2024. 3. 7. 21:07

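When building models, you occasionally need to design a customized loss.

I replaced the loss given in the paper with a customized one, and the optimization stopped working.

The loss below is the one I wrote myself. The paper implements it as an L2 norm, but the values were too large, so I arbitrarily took the mean and changed it to a different form.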

ATT_loss = torch.sqrt(torch.mean(torch.abs(word_label_lengths - attention.to(device))))

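At first I wondered whether a loss needs requires_grad, so I set that flag as well.

But requires_grad is meant for the weights that should be updated, i.e. the parameters. It should not be set on the labels used to compute the loss, because labels are not something the optimizer updates directly.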
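To make the requires_grad point concrete, here is a minimal sketch. The tensors `weight`, `inputs`, and `labels` are hypothetical stand-ins, not the tensors from my model, and the loss is a plain mean squared error chosen only for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins: one learnable weight matrix, fixed data and labels.
weight = torch.randn(4, 1, requires_grad=True)  # parameter: needs gradients
inputs = torch.randn(8, 4)                      # data: no gradient needed
labels = torch.randn(8, 1)                      # labels: no gradient needed

pred = inputs @ weight
loss = torch.mean((pred - labels) ** 2)

# The loss picks up requires_grad from the parameter it was computed from,
# so there is no need to set the flag on the loss or on the labels.
print(loss.requires_grad)
print(labels.requires_grad)

loss.backward()
print(weight.grad.shape)  # gradient exists for the parameter
print(labels.grad)        # no gradient is stored for the labels
```

The loss tensor ends up with requires_grad set automatically, and only the parameter receives a gradient after backward().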

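After removing the requires_grad flag, I checked whether the formula I had written was a reasonable one.

My loss resembles a variant of MAE (Mean Absolute Error, i.e. L1 loss: the mean of the absolute errors).

With MAE, the gradient magnitude is constant whether the loss is large or small, which is easy to see from a graph of the loss.

So regardless of the size of the error, the gradient magnitude stays the same, and the parameter updates in gradient descent have a consistent size.

In contrast, the L2 norm produces a larger gradient when the error is larger, which makes it effective at reducing large errors.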
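The difference can be checked with a few numbers. The sketch below looks at a single error term and ignores the 1/N factor from the mean; the helper names `mae_grad` and `mse_grad` are made up for this illustration.

```python
def mae_grad(error: float) -> int:
    """d|e|/de: +1 or -1 no matter how large the error is (0 at e == 0)."""
    return (error > 0) - (error < 0)


def mse_grad(error: float) -> float:
    """d(e^2)/de = 2e: grows linearly with the error."""
    return 2.0 * error


for e in (0.1, 1.0, 10.0, 100.0):
    print(f"error={e:>6}: MAE grad={mae_grad(e):+d}, MSE grad={mse_grad(e):+.1f}")
```

The MAE column stays at the same value for every error, while the MSE column scales with the error.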

In my case, the errors are large early in training because the computed attention values are very large. Using the L2 norm should therefore reduce the loss more quickly.

----------------------------------------------------------------------------------------------------------------------------------------------------------------

++ Addendum

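The problem turned out not to be the difference between MAE and MSE loss.

It came from using torch.sqrt, which was causing the gradient to become nan.

The derivative of sqrt(x) is 1/(2*sqrt(x)), so as x approaches 0 the derivative diverges, and in the actual computation it shows up as nan.

After removing torch.sqrt from the code, I confirmed that training converged properly.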
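The failure can be reproduced in isolation. In this sketch, `att_loss_sqrt` has the same form as the loss above, while `att_loss_sqrt_eps` is an assumed alternative that is not from my original code: it keeps the sqrt but adds a small epsilon inside it. The tensors are toy values picked so that the mean absolute error is exactly 0.

```python
import torch


def att_loss_sqrt(pred, target):
    # Same form as the loss in this post: sqrt of the mean absolute error.
    return torch.sqrt(torch.mean(torch.abs(target - pred)))


def att_loss_sqrt_eps(pred, target, eps=1e-8):
    # Assumed alternative (not from the post): add a small epsilon inside the
    # sqrt so the derivative 1 / (2 * sqrt(x)) stays finite at x == 0.
    return torch.sqrt(torch.mean(torch.abs(target - pred)) + eps)


target = torch.tensor([1.0, 2.0, 3.0])

# Prediction exactly equal to the target, so the mean absolute error is 0.
pred = target.clone().requires_grad_(True)
att_loss_sqrt(pred, target).backward()
grad_plain = pred.grad.clone()

pred_eps = target.clone().requires_grad_(True)
att_loss_sqrt_eps(pred_eps, target).backward()
grad_eps = pred_eps.grad.clone()

print(grad_plain)  # not finite
print(grad_eps)    # finite
```

Simply dropping the sqrt, as I did, avoids the issue entirely. The epsilon version is only one possible option if the square root has to stay.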

Learned one more thing today..!

ERROR report: Torch.no_grad change sequence length

호Tuck 2024. 3. 4. 10:32

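While using nn.Transformer, I ran into a problem where the sequence length changed in evaluation mode.

The tensor layout is (batch, sequence, feature). During training the sequence dimension came out as the max length of 65 as expected, but in evaluation the error occurred immediately.

My current version is 1.13.1+cu117.

I found the same issue discussed on the PyTorch forum.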

https://discuss.pytorch.org/t/torch-no-grad-changes-sequence-length-during-evaluation-mode/186176

Torch.no_grad() Changes Sequence Length During Evaluation Mode

The thread recommends downgrading, so I downgraded to the version mentioned there.

First,

pip uninstall torch torchvision torchaudio

I uninstalled the existing packages with this command, then installed the older builds:

pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116

After checking that CUDA still worked, I confirmed that the output shape was correct again!
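A quick way to confirm which build is actually installed after the downgrade is sketched below. The helper name `describe_torch_build` is made up for this illustration, and the values printed depend on the machine.

```python
import torch


def describe_torch_build():
    # Collect the installed version and the CUDA toolkit it was built against.
    return {
        "torch": torch.__version__,            # e.g. 1.12.1+cu116 after the downgrade
        "cuda_build": torch.version.cuda,      # None on a CPU-only build
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_torch_build()
for key, value in info.items():
    print(f"{key}: {value}")
```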

So it was a version problem all along...

ERROR report: Pytorch rnnt loss, RuntimeError: output length mismatch

호Tuck 2024. 2. 26. 22:42

https://pytorch.org/audio/main/generated/torchaudio.functional.rnnt_loss.html

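I ran into an error while implementing the RNN-T loss.

I thought I had studied the RNN Transducer, but the problem came from not understanding how the mechanism works after the joint network.

The error message was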

RuntimeError: output length mismatch

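When drawing the alignment paths over the possible paths, a "null" value has to be present. A diagram of the alignment lattice makes this clear.

Because the algorithm needs a slot for the null value, the target dimension of the logits passed to the rnnt loss must have size target length + 1.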
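The shape rule can be sketched without calling torchaudio at all. The helpers `rnnt_logits_shape` and `lattice_nodes` and the sizes used below are hypothetical and only illustrate the bookkeeping: the target axis runs over the number of labels already emitted, 0 through U, which gives U + 1 positions.

```python
def rnnt_logits_shape(batch, max_src_len, max_tgt_len, num_classes):
    """Logits layout (batch, T, U + 1, classes): the target axis gets one extra slot."""
    return (batch, max_src_len, max_tgt_len + 1, num_classes)


def lattice_nodes(src_len, tgt_len):
    """All (t, u) nodes of the alignment lattice: t in 0..T-1, u in 0..U."""
    return [(t, u) for t in range(src_len) for u in range(tgt_len + 1)]


# Illustrative sizes only (not taken from my model).
shape = rnnt_logits_shape(batch=2, max_src_len=50, max_tgt_len=10, num_classes=30)
print(shape)  # (2, 50, 11, 30)

nodes = lattice_nodes(src_len=3, tgt_len=2)
print(len(nodes))  # 9 nodes: 3 time steps x (2 + 1) label positions
```

Passing logits whose third dimension is U instead of U + 1 is the kind of mismatch that leads to the output length mismatch error above.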

The following references make this easy to understand.

https://lorenlugosch.github.io/posts/2020/11/transducer/

https://github.com/pytorch/audio/issues/3750#issuecomment-1964109967

I have some questions about RNNT loss. · Issue #3750 · pytorch/audio

hello I would like to ask you a question that may be somewhat trivial. The shape of logits of RNN T loss is Batch, max_seq_len, max_target_len+1, class. Why is max_target_len+1 here? Shouldn't the ...

I still could not understand it, so I posted the question as a torchaudio issue, and someone kindly replied in the comments.

Keep it up today, everyone!

호떡의 개발일기 (Hotteok's Dev Diary)