intro

I put together a table of LAMBADA perplexity and accuracy numbers for the main GPT model families (GPT-2, GPT-3, and GPT-Neo). I think it's a useful reference for picking an appropriate model given quality needs and compute budget.

data

MODEL             LAMBADA PPL   LAMBADA ACC
(PPL = perplexity, lower is better; ACC = accuracy, higher is better)
═ GPT-2 Family
GPT-2-117M             35.130        45.99%
GPT-2-345M             15.600        55.48%
GPT-2-762M             10.870        60.12%
GPT-2-1542M             8.630        63.24%
═ GPT-3 Family
GPT-3-125M             18.600        42.70%
GPT-3-350M              9.090        54.30%
GPT-3-Ada               9.950        51.60%
GPT-3-760M              6.530        60.40%
GPT-3-1.3B              5.440        63.60%
GPT-3-Babbage           5.580        62.40%
GPT-3-2.7B              4.600        67.10%
GPT-3-6.7B              4.000        70.30%
GPT-3-Curie             4.000        68.50%
GPT-3-13B               3.560        72.50%
GPT-3-175B              3.000        76.20%
GPT-3-Davinci           2.970        74.80%
═ GPT-Neo Family
GPT-Neo-125M           30.266        37.36%
GPT-Neo-350M           13.876        47.27%
GPT-Neo-1.3B            7.498        57.23%
GPT-Neo-2.7B            5.626        62.22%
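
As a sanity check on what the two columns actually measure, here is a rough sketch of a LAMBADA-style eval with Hugging Face transformers. It assumes the usual last-word formulation: accuracy counts an example as correct when greedy decoding reproduces every token of the final word, and perplexity is computed over the final word's tokens only. The exact protocol (dataset variant, detokenization, how the final word is isolated) differs a bit between the papers and the harness, so treat this as illustrative rather than a faithful reproduction; the model name and example passage are placeholders.

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125m"  # placeholder; any causal LM on the Hub works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def score_example(text):
    """Score the final word of a passage given everything before it."""
    context, last_word = text.rsplit(" ", 1)
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    tgt_ids = tokenizer(" " + last_word, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, tgt_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i+1, so take the slice of positions
    # that predict the final-word tokens.
    tgt_logits = logits[0, ctx_ids.shape[1] - 1 : -1]
    log_probs = torch.log_softmax(tgt_logits, dim=-1)
    token_logps = log_probs[torch.arange(tgt_ids.shape[1]), tgt_ids[0]]
    greedy_correct = bool((tgt_logits.argmax(dim=-1) == tgt_ids[0]).all())
    return token_logps.sum().item(), tgt_ids.shape[1], greedy_correct

# Toy usage on one made-up passage; the benchmark averages log-probs and
# accuracy over roughly 5k held-out passages.
logp, n_tokens, correct = score_example("He poured the coffee and handed her the cup")
print("final-word ppl:", math.exp(-logp / n_tokens), "| greedy correct:", correct)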

sources

The GPT-2 numbers are from the GPT-2 paper, the GPT-3 numbers are from EleutherAI's blog post on the sizes of the OpenAI API models, and the GPT-Neo 1.3B/2.7B numbers are from the GPT-Neo README. The GPT-Neo 125M/350M numbers are from my own testing with EleutherAI's lm-evaluation-harness.
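
If you want to rerun the GPT-Neo 125M/350M evals (or score any other Hugging Face model), the harness can also be driven from Python. This is a minimal sketch, assuming a 0.4.x-style release of lm-evaluation-harness; the model backend and LAMBADA task names have changed across versions (older releases used "gpt2" and "lambada", newer ones "hf" and "lambada_openai"), so check the task list of whatever version you install.

# pip install lm-eval
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                       # Hugging Face causal-LM backend
    model_args="pretrained=EleutherAI/gpt-neo-125m",
    tasks=["lambada_openai"],                         # task name varies by harness version
    batch_size=8,
)
print(results["results"])  # per-task perplexity and accuracy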