[DL] Skip-Thought Vectors

jiachoi 2022. 9. 21. 21:16

Research Question
Is there a task and a corresponding loss that will allow us to learn highly generic sentence representations?

Proposed Methods

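Develops a model that learns generic sentence embeddings/representations by using the current sentence to predict the previous and next sentences.

Introduces a vocabulary expansion method so that the encoder is not limited to the words seen during training and can instead encode a much larger vocabulary (the paper reports expanding to about a million words).

After training, the extracted vectors can be used for 8 tasks (semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification, and 4 benchmark sentiment and subjectivity datasets).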

1. Skip-Thought Model

1.1 Goal

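Given a tuple of contiguous sentences (s(i-1), s(i), s(i+1)), where s(i) is the i-th sentence, the model encodes s(i) and tries to reconstruct the previous sentence s(i-1) and the next sentence s(i+1).
The input is therefore a sentence triplet (three sentences forming one training example).
In the model figure, unattached arrows connect to the encoder output, and colors indicate which components share parameters. <eos> is the end-of-sentence token.
Let w(ti) denote the t-th word of sentence s(i), and let x(ti) denote its word embedding.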
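
As a rough illustration of how such triplets could be built from an ordered list of sentences, here is a minimal sketch. The function names `make_triplets` and `tokenize`, the whitespace tokenization, and the sample sentences are assumptions made for this example, not something taken from the paper.

```python
from typing import List, Tuple


def make_triplets(sentences: List[str]) -> List[Tuple[str, str, str]]:
    """Return (s(i-1), s(i), s(i+1)) for every sentence that has both neighbors."""
    return [
        (sentences[i - 1], sentences[i], sentences[i + 1])
        for i in range(1, len(sentences) - 1)
    ]


def tokenize(sentence: str) -> List[str]:
    """Whitespace tokenization plus an <eos> marker (a simplifying assumption)."""
    return sentence.split() + ["<eos>"]


# Hypothetical sample text, used only to show the shape of the data.
doc = [
    "I got back home",
    "I could see the cat on the steps",
    "This was strange",
    "I went inside",
]

for prev_s, cur_s, next_s in make_triplets(doc):
    # cur_s is what gets encoded; prev_s and next_s are the decoding targets.
    print(tokenize(cur_s), "->", tokenize(prev_s), "|", tokenize(next_s))
```

The first and last sentences of a document have only one neighbor, so this sketch simply skips them as encoder inputs.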

1.2 Encoder-Decoder Model Framework

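The skip-thought model uses the encoder-decoder framework and consists of three parts: an encoder, a decoder, and an objective function.
Encoder: maps the words of a sentence to a sentence vector (an RNN encoder with GRU activations).
Decoder: generates the surrounding sentences (an RNN decoder with a conditional GRU).
This combination is nearly identical to the RNN encoder-decoder model used in neural machine translation.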

2. Encoder, Decoder, & Objective Function

2.1 Encoder

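The encoder is an RNN with GRU activations.
Let w(1i), ..., w(Ni) be the words in sentence s(i), where N is the number of words in the sentence. At each time step the encoder produces a hidden state h(ti), which can be read as a representation of the sequence w(1i), ..., w(ti) seen so far.
The final hidden state h(Ni) therefore represents the whole sentence.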
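
To make the recurrence concrete, the following is a minimal NumPy sketch of a standard GRU run over the word embeddings of one sentence, returning the last hidden state as the sentence vector. The small dimensions, the random initialization, and the helper names `gru_encode` and `init_gru` are assumptions for illustration only; this is not the authors' implementation.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def init_gru(d_word, d_hid, rng):
    """Random weights in the order Wr, Ur, Wz, Uz, W, U (illustrative only)."""
    shapes = [(d_hid, d_word), (d_hid, d_hid)] * 3
    return [rng.normal(scale=0.1, size=s) for s in shapes]


def gru_encode(X, params):
    """Run a GRU over embeddings X of shape (N, d_word) and return h(Ni)."""
    Wr, Ur, Wz, Uz, W, U = params
    h = np.zeros(Ur.shape[0])                  # initial hidden state
    for x in X:                                # one step per word embedding x(ti)
        r = sigmoid(Wr @ x + Ur @ h)           # reset gate
        z = sigmoid(Wz @ x + Uz @ h)           # update gate
        h_bar = np.tanh(W @ x + U @ (r * h))   # proposed state
        h = (1.0 - z) * h + z * h_bar          # blend old and proposed state
    return h


rng = np.random.default_rng(0)
params = init_gru(d_word=8, d_hid=16, rng=rng)
X = rng.normal(size=(5, 8))                    # a sentence of N = 5 word embeddings
h_N = gru_encode(X, params)
print(h_N.shape)                               # (16,)
```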

2.2 Decoder

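The decoder is an RNN with a conditional GRU.
It is a neural language model conditioned on the encoder output h(i).
Its computation is similar to the encoder's, except that the gates and the proposed state receive extra terms that depend on h(i).
The first decoder generates the next sentence s(i+1), and the second decoder generates the previous sentence s(i-1).
Each decoder has its own parameters.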
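
A single decoder step can be sketched as below: it is a GRU update with one extra conditioning term per gate that depends on the encoder output. The parameter names (`Cr`, `Cz`, `C` for the conditioning matrices), the toy dimensions, and the random weights are assumptions for this sketch rather than the authors' code.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def init_decoder(d_word, d_hid, d_enc, rng):
    """Random decoder weights (illustrative only)."""
    shapes = {
        "Wr": (d_hid, d_word), "Ur": (d_hid, d_hid), "Cr": (d_hid, d_enc),
        "Wz": (d_hid, d_word), "Uz": (d_hid, d_hid), "Cz": (d_hid, d_enc),
        "W": (d_hid, d_word), "U": (d_hid, d_hid), "C": (d_hid, d_enc),
    }
    return {k: rng.normal(scale=0.1, size=s) for k, s in shapes.items()}


def cond_gru_step(x_prev, h_prev, h_enc, p):
    """One decoder step conditioned on the encoder vector h_enc."""
    r = sigmoid(p["Wr"] @ x_prev + p["Ur"] @ h_prev + p["Cr"] @ h_enc)
    z = sigmoid(p["Wz"] @ x_prev + p["Uz"] @ h_prev + p["Cz"] @ h_enc)
    h_bar = np.tanh(p["W"] @ x_prev + p["U"] @ (r * h_prev) + p["C"] @ h_enc)
    return (1.0 - z) * h_prev + z * h_bar


rng = np.random.default_rng(0)
next_dec = init_decoder(8, 16, 16, rng)   # decoder for s(i+1)
prev_dec = init_decoder(8, 16, 16, rng)   # decoder for s(i-1), separate weights

h_enc = rng.normal(size=16)               # stands in for the encoder output h(i)
x_prev = rng.normal(size=8)               # embedding of the previously generated word
h0 = np.zeros(16)

h_next = cond_gru_step(x_prev, h0, h_enc, next_dec)
h_prev_sent = cond_gru_step(x_prev, h0, h_enc, prev_dec)
print(h_next.shape, h_prev_sent.shape)
```

Both decoders receive the same h(i) but, having different weights, produce different hidden states from the same input.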

2.3 Objective Function

Given a tuple (s(i-1), s(i), s(i+1)), the objective to optimize is the sum of the log-probabilities of the next and previous sentences, each conditioned on the encoder representation.
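
Written out in the paper's subscript/superscript notation, where w_i^t corresponds to w(ti) in these notes and h_i is the encoder output for s(i), the objective for one tuple takes the following form:

```latex
\sum_{t} \log P\left(w_{i+1}^{t} \mid w_{i+1}^{<t}, \mathbf{h}_{i}\right)
+ \sum_{t} \log P\left(w_{i-1}^{t} \mid w_{i-1}^{<t}, \mathbf{h}_{i}\right)
```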

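In the objective, the first (left) term is for the next sentence and the second (right) term is for the previous sentence. The total objective is the sum of this quantity over all training tuples.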
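
For a numerical view of the two terms, the sketch below sums the log-probabilities of the target words given per-step decoder scores. It assumes each decoder has already produced a (T, V) array of logits over a vocabulary of size V; the function names and the random inputs are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def sentence_log_prob(logits, target_ids):
    """Sum over t of log P(w^t | w^{<t}, h_i), given logits of shape (T, V)."""
    logp = log_softmax(logits)
    return logp[np.arange(len(target_ids)), target_ids].sum()


def skip_thought_objective(next_logits, next_ids, prev_logits, prev_ids):
    """Next-sentence term plus previous-sentence term for one tuple."""
    return (sentence_log_prob(next_logits, next_ids)
            + sentence_log_prob(prev_logits, prev_ids))


rng = np.random.default_rng(0)
V = 10                                        # toy vocabulary size
next_ids = np.array([3, 1, 4, 0])             # word ids of s(i+1), ending in <eos>
prev_ids = np.array([2, 7, 0])                # word ids of s(i-1), ending in <eos>
next_logits = rng.normal(size=(len(next_ids), V))
prev_logits = rng.normal(size=(len(prev_ids), V))

print(skip_thought_objective(next_logits, next_ids, prev_logits, prev_ids))
```

Training maximizes this quantity, which is equivalent to minimizing its negative as a loss.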

3. Vocabulary Expansion

Even for words the encoder never saw during training, a pre-trained word2vec model is used to obtain representations.
V(w2v): the word embedding space of word2vec representations
V(rnn): the RNN word embedding space
Assuming V(w2v) has a much larger vocabulary than V(rnn), the goal is to learn a mapping f: V(w2v) --> V(rnn), parameterized by a matrix W.
Following work on linear mappings between translation word spaces, W is found by solving an un-regularized L2 linear regression loss on the words the two vocabularies share. Any word in V(w2v) can then be mapped into V(rnn).
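
The regression step can be sketched with ordinary least squares. In the example below the shared-word embeddings are synthetic and the dimensions are tiny; `fit_linear_map` is a hypothetical helper, and `np.linalg.lstsq` is used as one way to solve the un-regularized L2 problem.

```python
import numpy as np


def fit_linear_map(V_w2v, V_rnn):
    """Fit W so that W @ v_w2v approximates v_rnn for shared words (one per row)."""
    # lstsq solves min ||V_w2v @ X - V_rnn||^2; X has shape (d_w2v, d_rnn).
    X, *_ = np.linalg.lstsq(V_w2v, V_rnn, rcond=None)
    return X.T                                 # W has shape (d_rnn, d_w2v)


rng = np.random.default_rng(0)
d_w2v, d_rnn, n_shared = 5, 3, 50

# Synthetic stand-ins for the embeddings of words present in both vocabularies.
W_true = rng.normal(size=(d_rnn, d_w2v))
shared_w2v = rng.normal(size=(n_shared, d_w2v))
shared_rnn = shared_w2v @ W_true.T

W = fit_linear_map(shared_w2v, shared_rnn)

# A word seen only by word2vec can now be projected into the RNN space.
unseen_w2v = rng.normal(size=d_w2v)
unseen_rnn = W @ unseen_w2v
print(W.shape, unseen_rnn.shape)
```

Because the synthetic targets are an exact linear function of the inputs, the fitted W matches `W_true` here; with real embeddings the fit is only approximate.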

4. Models

uni-skip: unidirectional encoder (2400 dims)
bi-skip: bidirectional model with forward and backward encoders of 1200 dims each; their outputs are concatenated into a 2400-dimensional vector
combine-skip: concatenation of uni-skip and bi-skip, giving a 4800-dimensional vector
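
The dimensions of the three variants follow directly from concatenation, as the short sketch below shows. The random vectors are placeholders standing in for real encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder encoder outputs for a single sentence.
uni_skip = rng.normal(size=2400)            # unidirectional encoder
bi_forward = rng.normal(size=1200)          # sentence read in its original order
bi_backward = rng.normal(size=1200)         # sentence read in reverse order

bi_skip = np.concatenate([bi_forward, bi_backward])
combine_skip = np.concatenate([uni_skip, bi_skip])

print(bi_skip.shape, combine_skip.shape)    # (2400,) (4800,)
```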
