Concurrent validity and feasibility of short tests currently used to measure early childhood development in large scale studies

M. Caridad Araujo
Orazio Attanasio
Sally Grantham-McGregor
Pablo Muñoz
Marta Rubio Codina

Published on 22 August 2016

In low- and middle-income countries (LIMCs), measuring early childhood development (ECD) with standard tests in large scale surveys and evaluations of interventions is difficult and expensive. Multi-dimensional screeners and single-domain tests (‘short tests’) are frequently used as alternatives. However, their validity in these circumstances is unknown. We examined the feasibility, reliability, and concurrent validity of three multi-dimensional screeners (Ages and Stages Questionnaires (ASQ-3), Denver Developmental Screening Test (Denver-II), Battelle Developmental Inventory screener (BDI-2)) and two single-domain tests (MacArthur-Bates Short-Forms (SFI and SFII), WHO Motor Milestones (WHO-Motor)) in 1,311 children 6–42 months in Bogota, Colombia. The scores were compared with those on the Bayley Scales of Infant and Toddler Development (Bayley-III), taken as the ‘gold standard’. The Bayley-III was given at a center by psychologists; whereas the short tests were administered in the home by interviewers, as in a survey setting. Findings indicated good internal validity of all short tests except the ASQ-3. The BDI-2 took long to administer and was expensive, while the single-domain tests were quickest and cheapest and the Denver-II and ASQ-3 were intermediate. Concurrent validity of the multi-dimensional tests’ cognitive, language, and fine motor scales with the corresponding Bayley-III scale was low below 19 months. However, it increased with age, becoming moderate-to-high over 30 months. In contrast, gross motor scales’ concurrence was high under 19 months and then decreased. Of the single-domain tests, the WHO-Motor had high validity with gross motor under 16 months, and the SFI and SFII expressive scales showed moderate correlations with language under 30 months. Overall, the Denver-II was the most feasible and valid multi-dimensional test and the ASQ-3 performed poorly under 31 months. By domain, gross motor development had the highest concurrence below 19 months, and language above. Predictive validity investigation is needed to further guide the choice of instruments for large scale studies.