Grokking, Deep Double Descent, Overfitting

Tags: knowledge

Overview

Introduction

List of Links

Grokking

Double Descent

Deep double descent | OpenAI: with the dataset size held fixed, test performance first improves as model capacity increases, then degrades, then improves again. The dip occurs around the point where the model is just barely able to fit the training set.

With increased model capacity and a moderate amount of labelled data, the model can “memorize and generalize” the basic pattern faster.
This is the regime where a freshly initialized model can perform better.
Beyond the “peak” of double descent, models move into a regime where more complex patterns can be learned, further improving performance (given more data).
Current LLMs are mostly in the first regime of double descent, which is why they need large capacity to accommodate large datasets.
- Current LLMs are still in the first regime and have not exploited much of their capacity. Most recent large LLMs did not even complete one epoch of training.

Darius Knowledge Hub
