International large-scale assessments such as PISA are increasingly being used to benchmark the academic performance of young people across the world. Yet many of the technicalities underpinning these datasets are misunderstood by applied researchers, who sometimes fail to take into account their complex survey and test designs. The aim of this paper is to generate a better understanding amongst economists about how such databases are created, and what this implies for the empirical methodologies one should or should not apply. In doing so, we explain how some of the modelling strategies preferred by economists are at odds with the design of these studies. We thereby hope to promote better practice in the use of international large-scale education datasets.