updated 2022-02-06: added entry for gpt-neox-20B

intro

i put together a table of LAMBADA perplexity and accuracy for all the main gpt models. i think this is a useful resource for picking an appropriate model given quality requirements and compute costs.
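for context on the two columns below: LAMBADA PPL is the exponentiated mean negative log-likelihood the model assigns to the final word of each passage (lower is better), and LAMBADA ACC is the fraction of passages where the model predicts that final word exactly (higher is better). a toy sketch of how each is computed (the numbers here are made up, just to illustrate the metrics):

```python
import math

# hypothetical per-example negative log-likelihoods of the final word
nlls = [1.2, 0.4, 2.9, 0.7]

# perplexity is the exponentiated mean NLL: lower is better
ppl = math.exp(sum(nlls) / len(nlls))

# accuracy is the fraction of examples where the greedy prediction
# matched the final word exactly: higher is better
correct = [True, True, False, True]
acc = sum(correct) / len(correct)

print(round(ppl, 3), acc)  # ppl ≈ 3.669, acc = 0.75
```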

data

MODEL                LAMBADA PPL   LAMBADA ACC
═ OpenAI GPT-2 Family
GPT-2-117M [1]            35.130        45.99%
GPT-2-345M [1]            15.600        55.48%
GPT-2-762M [1]            10.870        60.12%
GPT-2-1542M [1]            8.630        63.24%
═ OpenAI GPT-3 Family
GPT-3-124M [2]            18.600        42.70%
GPT-3-350M [2]             9.090        54.30%
GPT-3-Ada [2]              9.950        51.60%
GPT-3-760M [2]             6.530        60.40%
GPT-3-1.3B [2]             5.440        63.60%
GPT-3-Babbage [2]          5.580        62.40%
GPT-3-2.7B [2]             4.600        67.10%
GPT-3-6.7B [2]             4.000        70.30%
GPT-3-Curie [2]            4.000        68.50%
GPT-3-13B [2]              3.560        72.50%
GPT-3-175B [2]             3.000        76.20%
GPT-3-Davinci              2.970        74.80%
═ EleutherAI GPT-Neo Family
GPT-Neo-125M [3]          30.266        37.36%
GPT-Neo-350M [3]          13.876        47.27%
GPT-Neo-1.3B [4]           7.498        57.23%
GPT-Neo-2.7B [4]           5.626        62.22%
═ EleutherAI GPT-NeoX Family
GPT-NeoX-20B [5]               ?        71.98%

sources

[1] openai gpt-2 data is from the gpt-2 paper.

[2] openai gpt-3 data is from the eleutherai blog.

[3] gpt-neo 125M/350M data is from my own testing with the eleutherai evaluation harness, since official numbers weren't published at the time. it's their own test suite, so the results should be comparable to theirs.
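for reference, a harness run looked roughly like this (flag names as i recall them from the early lm-evaluation-harness CLI; treat this as a sketch, not an exact command):

```shell
# clone eleuther's evaluation harness and install it
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .

# evaluate a hugging face checkpoint on the lambada task
python main.py \
  --model gpt2 \
  --model_args pretrained=EleutherAI/gpt-neo-125M \
  --tasks lambada
```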

[4] gpt-neo 1.3B/2.7B data is from the gpt-neo readme.

[5] gpt-neox-20B data is from the eleutherai 20B announcement blog post.