Domain specificity in language models
Created: 28 Mar 2023, 11:12 AM | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge, KnowledgeSharing
NLP - language modeling is essentially classification: predict the next token given the context of the previous tokens
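A minimal sketch of this framing, with a made-up toy vocabulary and logits; the point is that the model head just produces a distribution over the vocabulary, and the next word is a class label:

```python
import numpy as np

# Hypothetical toy vocabulary and logits from a language-model head.
vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([2.0, 0.5, 1.0, 0.1, 1.5])  # one score per vocab token

# Softmax turns the scores into a probability distribution over the vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# "Predicting the next word" is classification: pick the most likely class.
print(vocab[int(np.argmax(probs))])  # -> "the"
```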
Long-tail distribution: most words occur with low frequency
- Every token has the same importance in masking (BERT), regardless of word frequency: rare words are masked as often as frequent words (see the sketch below)
	- Maybe it doesn't matter, since the dataset is huge, or
	- Transformers pay more attention to the context a word is used in, so a rare word can be represented as well as a frequent one, even if it occurs less often
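A minimal sketch of the uniform masking above, assuming BERT's ~15% masking rate; the sentence here is illustrative, and real BERT additionally replaces some selected tokens with random tokens or leaves them unchanged:

```python
import random

# Hypothetical tokenized sentence; a rare word ("perambulated") and frequent
# words ("the") are treated identically.
tokens = ["the", "cat", "perambulated", "on", "the", "mat"]

MASK_PROB = 0.15  # BERT masks roughly 15% of tokens, chosen uniformly

# Every position has the same chance of being masked, regardless of how
# frequent the word is in the corpus.
masked = [tok if random.random() > MASK_PROB else "[MASK]" for tok in tokens]
print(masked)
```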
- Top-p (nucleus) sampling
	- Take a varying number of tokens based on their cumulative probability
	- Unlike top-k, which takes a hard, fixed number of tokens (see the sketch below)
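A minimal sketch contrasting the two cutoffs, assuming an already-normalized probability distribution; the helper names are illustrative:

```python
import numpy as np

probs = np.array([0.40, 0.25, 0.15, 0.10, 0.05, 0.05])  # sums to 1

def top_k_indices(probs, k):
    # Hard cutoff: always keep exactly k tokens.
    return np.argsort(probs)[::-1][:k]

def top_p_indices(probs, p):
    # Nucleus cutoff: keep the smallest set of tokens whose cumulative
    # probability reaches p, so the set size varies with the distribution.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    return order[:cutoff]

print(top_k_indices(probs, 3))    # always exactly 3 tokens
print(top_p_indices(probs, 0.8))  # however many tokens reach 80% of the mass
```

With a peaked distribution, top-p keeps few tokens; with a flat one it keeps many, which is exactly the flexibility top-k lacks.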
Perplexity → 1 means the model is less confused; larger values (approaching the vocabulary size, for a uniform distribution) mean more confused
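A minimal worked example, taking perplexity as the exponentiated average negative log-likelihood the model assigns to the true tokens:

```python
import numpy as np

def perplexity(token_probs):
    # exp(mean negative log-likelihood of the true next tokens)
    return float(np.exp(-np.mean(np.log(token_probs))))

vocab_size = 10

# Confident model: high probability on each correct next token -> near 1.
print(perplexity([0.9, 0.95, 0.85]))     # ~1.11

# Clueless model: uniform over the vocabulary -> equals vocab_size.
print(perplexity([1 / vocab_size] * 3))  # 10.0
```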
Does computer vision have a latent space while NLP doesn't?
- What is a latent space? Latent space vs. embedding space?