How does ChatGPT work?
Created: 14 Mar 2023, 11:09 AM | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge, KnowledgeSharing
-
Adding one word at a time (based on the prompt, the model ranks predicted next words by probability, then picks the next word by sampling from that probability distribution rather than always taking the top-ranked one)
- Temperature - controls the randomness when sampling the next word (lower = more deterministic, higher = more varied); see the sampling sketch after this list
- How to get probs?
- N-grams
- Difficult, and the output is not great: use some large corpus of text, then based on the N previous letters / words, predict the next word
- The number of possible N-grams is huge
- LLM
- Learns the next-word probabilities instead of counting N-grams
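A minimal sketch of the sampling step above, with made-up candidate words and scores (not from any real model): softmax turns scores into probabilities, and temperature rescales how peaked that distribution is before sampling.

```python
import numpy as np

# Illustrative scores (logits) for a few candidate next words - invented for the example
candidates = ["cat", "dog", "car", "banana"]
logits = np.array([2.0, 1.5, 0.3, -1.0])

def sample_next_word(logits, temperature=1.0):
    # Lower temperature sharpens the distribution (more deterministic),
    # higher temperature flattens it (more random).
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

idx = sample_next_word(logits, temperature=0.7)
print(candidates[idx])
```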
-
One-hot encoding / vector representation
- Does not take the context into consideration
- Use embeddings instead
-
Embeddings
- Similar words are clustered together in embedding space
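A toy illustration of "similar words are clustered together": cosine similarity between hand-made 3-dimensional embeddings. Real embeddings are learned and have hundreds of dimensions; these vectors are invented for the example.

```python
import numpy as np

# Hand-made toy embeddings (real ones are learned and much higher-dimensional)
emb = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.1, 0.9, 0.3]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["cat"], emb["dog"]))  # high: similar words sit close together
print(cosine(emb["cat"], emb["car"]))  # lower: less related words are further apart
```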
-
With the new embedding, get a probability for the next word
- How to select the next word? Is it a similarity / distance search against the computed embedding?
- Once you get the new embedding, how to get the probabilities of the next word?
- The probabilities come from a softmax output over all possible (~50k) words in the dictionary, where each value is the probability of that word being the next word in the sentence (see the sketch below)
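A sketch of that last step, assuming an illustrative hidden size of 64 and a 50k-word vocabulary: the final hidden state / embedding is projected by a linear layer (the "LM head") to one logit per vocabulary word, and a softmax turns the logits into next-word probabilities. The weights here are random stand-ins for a trained model.

```python
import numpy as np

vocab_size, hidden_size = 50_000, 64   # illustrative sizes (real hidden sizes are larger, e.g. 768+)

# Pretend this is the final hidden state for the last position in the prompt
hidden_state = np.random.randn(hidden_size)

# The "LM head": a linear projection from hidden size to vocabulary size
W = np.random.randn(vocab_size, hidden_size) * 0.02
logits = W @ hidden_state              # one score per word in the dictionary

probs = np.exp(logits - logits.max())
probs /= probs.sum()                   # softmax: probabilities over all 50k words

print(probs.shape, probs.sum())        # (50000,) ~1.0
print(probs.argmax())                  # index of the most likely next word
```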
History:
Transformer → encoder-decoder architecture, originally for the translation task
BERT
- Stack of encoder layers
- Get embeddings for sentences
- WordPiece tokenisation (see the tokeniser sketch after this list)
- Surfboard ⇒ “surf”, “board”
- Swimming ⇒ “swim”, “ing”
- Will learn the context of the word pieces
- Trained on 2 tasks
- Masked language model: mask words in sentences, predict the masked word
- Next sentence prediction
- Finetuning
- Actually also needs a fairly large dataset, not the small dataset people usually assume finetuning requires
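A rough sketch of the WordPiece idea from the tokenisation bullets above: greedy longest-match-first splitting of a word into known pieces. The mini vocabulary is made up; real WordPiece vocabularies are learned from data and mark continuation pieces with a "##" prefix.

```python
# Made-up mini vocabulary; real WordPiece vocabs are learned and much larger
vocab = {"surf", "board", "swim", "##ming", "##ing", "##board"}

def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece      # continuation pieces get a ## prefix
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        if end == start:                  # no known piece fits: unknown token
            return ["[UNK]"]
        start = end
    return pieces

print(wordpiece_tokenize("surfboard", vocab))  # ['surf', '##board']
print(wordpiece_tokenize("swimming", vocab))   # ['swim', '##ming']
```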
GPT
- Decoder only (i.e. only the Transformer decoder stack with causal self-attention is used; there is no encoder or cross-attention)
- Pretrained only on next-word prediction
- No architecture change involved; it is more about finetuning on top of the pretrained model
- Finetune by adding a linear layer (see the sketch after this list)
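A minimal PyTorch-style sketch of "finetune by adding a linear layer": a small classification head on top of the pretrained model's final hidden state. The hidden states here are random stand-ins; in practice they would come from the pretrained GPT/BERT, whose weights can be frozen or updated during finetuning.

```python
import torch
import torch.nn as nn

hidden_size, num_classes = 768, 2   # illustrative sizes (e.g. sentiment: positive / negative)

class ClassifierHead(nn.Module):
    """Linear layer added on top of a pretrained model for finetuning."""
    def __init__(self, hidden_size, num_classes):
        super().__init__()
        self.linear = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_state):
        return self.linear(hidden_state)   # logits over the task's classes

# Stand-in for the pretrained model's output for a batch of 4 inputs
hidden_states = torch.randn(4, hidden_size)

head = ClassifierHead(hidden_size, num_classes)
logits = head(hidden_states)
print(logits.shape)                         # torch.Size([4, 2])
```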
GPT-2/3
- No finetuning; instead zero-shot / few-shot prompting (see the prompt sketch after this list)
- Zero-shot:
- "give me the result of 1+2"
- One-shot:
- "1+2=3", "give me the result of 3+4"
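A small illustration of the difference: a zero-shot prompt contains only the task, while one-/few-shot prompts also include worked examples. These are just strings; no model is called here.

```python
zero_shot = "Give me the result of 1+2"

one_shot = (
    "1+2=3\n"             # one worked example shown first
    "Give me the result of 3+4"
)

few_shot = (
    "1+2=3\n"
    "2+5=7\n"             # several worked examples
    "Give me the result of 3+4"
)

print(zero_shot, one_shot, few_shot, sep="\n---\n")
```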
Combination of encoder and decoder?
- They have done it - ELECTRA?
Generative capability
- For generative tasks, decoder-only models / GPT are better than BERT
Contextual
- Similarity check between sentences - BERT is better
- Since it has the context of the word
Encoder / decoder
- Needed when you need the context and also need to generate afterwards
- e.g. summarising an article?
Closest to ChatGPT - InstructGPT (that was the one published)
- How to use GPT to overcome misalignment? Using RLHF (reinforcement learning with human feedback)
What is KL Loss?
- To make sure that two output distributions stay similar, so there are no irrelevant outputs
- It is a divergence / dissimilarity metric between probability distributions (see the sketch below)
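A small numeric sketch of KL divergence as a (non-symmetric) dissimilarity measure between two probability distributions; in RLHF it is used as a penalty to keep the finetuned model's output distribution close to the original model's. Both distributions below are made up.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i); equals 0 when the distributions match."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

p = [0.7, 0.2, 0.1]         # e.g. next-word distribution of the original model
q = [0.6, 0.25, 0.15]       # e.g. distribution of the finetuned model

print(kl_divergence(p, p))  # 0.0: identical distributions
print(kl_divergence(p, q))  # small positive number: slightly different
```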