A Brief History of Evidence

By DavidNash @ 2025-07-21T11:30 (+9)

This post explores efforts to improve lives globally using evidence. (Part 2 of a 7-week series covering global development).

Systematic observation and rigorous testing, which played a key role in the scientific revolution and the development of clinical trials in medicine, have increasingly influenced how we understand and address other global challenges.

In recent decades, global development has seen a shift towards using evidence to inform action. This application has led to the increasing use of methodologies like randomised controlled trials (RCTs). This shift has generated successes, such as cost-effective interventions to combat infectious diseases, but questions remain about how much evidence can be generalised and how to implement solutions at scale.

The Evidence Revolution

Science

The pursuit of evidence as a cornerstone of understanding and progress has deep historical roots. The scientific revolution of the 16th and 17th centuries marked a fundamental shift away from reliance on tradition and authority towards systematic observation, experimentation and the formulation of testable hypotheses. It hinged on the earlier invention of the printing press by Johannes Gutenberg in the 15th century. This allowed scholarly works to be more widely read and enabled people to build upon previous knowledge, rather than starting from scratch. This acceleration in knowledge dissemination paved the way for evidence-based thinking.

Key moments:

1543 - Copernicus's heliocentric model challenged the long-held geocentric view. This was a shift in perspective, placing the Sun, rather than the Earth, at the centre of the solar system and questioning humanity's central place in the universe
1610 - Galileo used the newly invented telescope to make astronomical observations, discovering the moons of Jupiter, which provided evidence supporting Copernicus and further undermined the accepted understanding of the cosmos. This demonstrated that celestial bodies could orbit something other than the Earth, weakening the view that everything revolved around our planet
Francis Bacon emphasised inductive reasoning and careful observation. His ideas about systematic experimentation and collaborative research helped lead to the establishment of the Royal Society of London (after his death), which formalised scientific investigation by providing a platform for sharing findings, replicating experiments and engaging in peer review
1687 - Newton formulated the laws of motion and universal gravitation, providing a mathematical framework for explaining a wide range of physical phenomena. They offered a unified explanation for both terrestrial and celestial motion, establishing the foundation for classical physics and enabling future technological advancements

These intellectual transformations laid the groundwork for evidence-based approaches in other domains, including a gradual evolution within academia. While the scientific revolution was driven by individual scientists, scientific societies and wealthy patrons largely outside the traditional university system^[1], the growing prestige and practical successes eventually spurred universities to incorporate scientific disciplines. Over time, universities increasingly adopted peer review and empirical methodologies as key parts of academic inquiry.

Medicine

Building upon the foundations laid by the Scientific Revolution, the medical field underwent its own transformative period. It saw a shift from relying on traditional remedies and anecdotal observations towards a scientific understanding of disease and treatment. As diseases were increasingly understood through a biological lens with the advent of germ theory, the need for evaluation of medical interventions became increasingly apparent.

Development of Vaccines: Edward Jenner's work with cowpox vaccination in 1796 demonstrated the possibility of preventing an infectious disease (smallpox) by exposing the immune system to a related, milder one
Germ Theory of Disease: The work of Louis Pasteur and Robert Koch established that many diseases are caused by specific microorganisms. This breakthrough revolutionised sanitation practices and laid the foundation for vaccines and antibiotics
Antiseptic treatments: In 1865 Joseph Lister's application of carbolic acid to sterilise surgical instruments and wounds dramatically reduced infections from operations
Public Health Interventions: The implementation of clean water systems, improved sanitation, and other public health measures significantly reduced mortality from infectious diseases
- John Snow is credited as a founder of modern epidemiology for his study of the Broad Street cholera outbreak and his hypothesis that germ-contaminated water, rather than something in the air called ‘miasma’, was the cause
- The Metropolis Water Act of 1852 introduced the regulation of the water supply companies in London, including minimum standards of water quality for the first time

The advances in understanding the causes of disease, coupled with a growing emphasis on systematic evaluation of treatments, transformed the practice of medicine.

In the late 19th and early 20th centuries, statistical methods began to be applied to medical data, allowing researchers to identify correlations and trends
In the mid-20th century, cohort studies and case-control studies became more widely used, enabling researchers to investigate the relationship between risk factors and disease outcomes
By the 1980s, the importance of synthesising evidence from multiple studies was increasingly recognised, leading to the development and widespread adoption of meta-analyses and systematic reviews as key components of evidence-based practice

The Evidence-Based Medicine Pyramid

Increasingly, cost-effectiveness analysis (CEA) was adopted to evaluate the relative value of different interventions, informing resource allocation decisions by comparing costs to health outcomes achieved. CEA often uses standardised metrics like Quality-Adjusted Life Years^[2], Disability-Adjusted Life Years^[3], and Well-being Adjusted Life Years^[4] to quantify the health and well-being benefits of interventions, allowing for comparisons across different diseases and programs.
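To make the comparison concrete, here is a minimal sketch of how a cost-per-QALY calculation might look. The two programmes, their costs and their quality weights are hypothetical numbers chosen only for illustration, and the sketch ignores discounting and uncertainty, which real CEAs would need to handle.

```python
from dataclasses import dataclass


@dataclass
class Intervention:
    name: str
    total_cost: float      # hypothetical programme cost, in a single currency
    people_reached: int    # hypothetical number of beneficiaries
    years_gained: float    # assumed extra years of life per person
    quality_weight: float  # assumed quality of those years: 0 = dead, 1 = perfect health


def qalys_gained(item: Intervention) -> float:
    """Total QALYs = people x years gained x quality weight."""
    if not 0.0 <= item.quality_weight <= 1.0:
        raise ValueError("quality_weight must be between 0 and 1")
    return item.people_reached * item.years_gained * item.quality_weight


def cost_per_qaly(item: Intervention) -> float:
    """Cost-effectiveness ratio: lower means more health per unit of spending."""
    return item.total_cost / qalys_gained(item)


# Illustrative inputs only; these are not drawn from any real study.
interventions = [
    Intervention("Programme A", 10_000, 10, 5.0, 0.8),
    Intervention("Programme B", 30_000, 20, 3.0, 0.5),
]

# Rank from most to least cost-effective.
ranked = sorted(interventions, key=cost_per_qaly)
for item in ranked:
    print(f"{item.name}: {cost_per_qaly(item):,.0f} per QALY")
```

Ranking by a single ratio is what allows comparisons across very different programmes, but it also means the result is only as good as the outcome estimates fed into it.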

The data needed to perform rigorous CEAs often relies on well-designed studies, and one of the key tools that was developed and used in multiple fields to generate this evidence is the randomised controlled trial (RCT).

Randomised Controlled Trials

A randomised controlled trial (RCT) is an experiment designed to control for factors outside of direct experimental control. In an RCT, participants are randomly allocated to different treatment groups. This random allocation helps to ensure that the groups are statistically comparable, even for characteristics that researchers haven't identified or cannot directly manage. When well-designed, properly conducted and involving a sufficient number of participants, an RCT can provide a robust comparison of the treatments being studied, mitigating the influence of confounding factors.
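The role of random allocation can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real trial: each simulated participant has a hidden trait that affects their outcome, the treatment adds a fixed `true_effect`, and all names and numbers are invented for illustration.

```python
import random
import statistics


def randomly_allocate(participants, rng):
    """Shuffle participants and split them into two equal-sized groups."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def simulate_trial(n=20_000, true_effect=2.0, seed=0):
    """Return (estimated effect, gap in the hidden trait between groups)."""
    rng = random.Random(seed)
    # Hidden trait the researchers never observe (assumed to be normally distributed).
    hidden_traits = [rng.gauss(50.0, 10.0) for _ in range(n)]
    treatment, control = randomly_allocate(hidden_traits, rng)

    # Outcome = hidden trait + treatment effect (treated only) + noise.
    treated_outcomes = [t + true_effect + rng.gauss(0.0, 1.0) for t in treatment]
    control_outcomes = [t + rng.gauss(0.0, 1.0) for t in control]

    estimate = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
    balance_gap = statistics.mean(treatment) - statistics.mean(control)
    return estimate, balance_gap


estimate, balance_gap = simulate_trial()
print(f"Estimated effect: {estimate:.2f} (assumed true effect: 2.00)")
print(f"Difference in hidden trait between groups: {balance_gap:.2f}")
```

Because allocation is random, the hidden trait ends up roughly balanced across the two groups even though the analysis never measures it, so the simple difference in average outcomes lands close to the assumed effect. With small samples the balance is much less reliable, which is why the number of participants matters.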

The history of RCTs: scurvy, poets and beer (4 minutes)
- 1747 - Naval surgeon James Lind compared different treatments for scurvy among six pairs of sailors, trying to make sure that the men receiving the different treatments were as similar as possible^[5]
- 1884 - Psychology researcher Charles Peirce randomised the order in which subjects received different weights to test their perception of small differences
- 1920s - The Rothamsted Agricultural Research Station conducted randomised field experiments in agriculture, laying groundwork for statistical analysis in research
- 1927 - Shaffer and other researchers randomly assigned individuals to groups in psychology experiments
- 1950s - Large-scale RCTs assessed the effectiveness of the polio vaccine, contributing to its widespread adoption
- 1960s - Heather Ross conducted a field experiment that examined the impacts of negative income tax, representing early use of RCTs in economics
- 1974 - One of the first development economics experiments investigated teaching mathematics over the radio in primary schools

In 2004 the Center for Global Development convened the Evaluation Gap Working Group. The group was asked to investigate why rigorous impact evaluations of social development programs, whether financed directly by governments or supported by international aid, were relatively rare.

3ie - Trends in impact evaluation: Did we ever learn? (3 minutes)

Trends in Impact Evaluation - 3ie

Abhijit V. Banerjee and Esther Duflo's book 'Poor Economics', released in 2011, advocated for an increasingly evidence-based approach to tackling global poverty, challenging grand theories and market-based solutions. The book proposed understanding the decisions and circumstances of the poor through rigorous RCTs and argued that small, well-targeted interventions could lead to significant progress.

Following the growing recognition of the need for rigorous evidence in global development, a range of influential organisations emerged or shifted their priorities. The Abdul Latif Jameel Poverty Action Lab (J-PAL) spearheaded the movement by focusing on generating research through hundreds of RCTs globally, partnering with implementing organisations to conduct real-world experiments and influencing billions in development spending through its findings.

Funders like USAID and the Gates Foundation, traditionally focused on large-scale interventions, began allocating significant resources to RCTs and other rigorous evaluations.

Rachel Glennerster - When do innovation and evidence change lives? (5 minutes)

This shift was driven by a desire for greater accountability and a feeling that traditional development approaches had yielded limited or uncertain results.

However, the increased emphasis on RCTs also sparked debates about the limitations of this methodology.

Critiques of RCTs (and impact evaluation generally)

The use of RCTs and broader impact evaluation tools in global development is not without detractors. A range of concerns have been raised, including questions about generalisability, scaling effectiveness, ethical considerations and whether micro-level interventions can address macro development challenges.

Stephanie Wykstra - It’s difficult to test whether poverty relief actually works. Do randomised controlled trials provide a scientific measure? (8 minutes)

Limited Generalisability

RCT results may not easily translate to different contexts, undermining their broad applicability for policy decisions. For example, as the above article notes, a successful teacher hiring program in Kenya failed to replicate when it was implemented with government teachers
David Roodman - The Smartest RCT Critic
- Suggests that while RCTs excel at establishing internal validity within a study, the subsequent process of applying those findings externally still requires assumptions about population similarities, mirroring the challenges RCTs aim to overcome internally
Eva Vivalt - What do 600 papers on 20 types of interventions tell us about how much impact evaluations generalise?
- Impact evaluations of the same intervention can produce widely different results across contexts, with the typical difference between studies being about 50% of the average effect

The Micro vs. Macro Debate

RCTs might demonstrate the effectiveness of interventions like distributing bed nets, but they don't address underlying issues such as trade policy. This raises the question of whether RCTs, which often focus on micro-level interventions, can address the root drivers of poverty in the way broader policy changes might

Research Vulnerabilities

Publication bias and selective reporting can skew the results of even well-designed RCTs
For example, the above article by Wykstra notes that when the US National Heart, Lung and Blood Institute required pre-registration of studies, the share of studies reporting positive results fell dramatically. While this issue affects non-RCTs as well, it demonstrates how the wider research process can undermine methodological rigour
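A short simulation can show why selective reporting matters. This is a deliberately simplified sketch, not a reconstruction of the study mentioned above: it assumes an intervention with no true effect, many equally sized studies, and a hypothetical rule that only statistically significant positive results are written up.

```python
import random
import statistics


def simulate_study_estimates(n_studies=2000, true_effect=0.0,
                             standard_error=1.0, seed=0):
    """Draw one noisy effect estimate per study around the true effect."""
    rng = random.Random(seed)
    return [rng.gauss(true_effect, standard_error) for _ in range(n_studies)]


def significant_positive(estimates, standard_error=1.0, z_threshold=1.96):
    """Keep only estimates that would look significantly positive."""
    return [e for e in estimates if e / standard_error > z_threshold]


all_estimates = simulate_study_estimates()
published = significant_positive(all_estimates)

print(f"Studies run: {len(all_estimates)}, 'published': {len(published)}")
print(f"Mean effect across all studies: {statistics.mean(all_estimates):.2f}")
print(f"Mean effect across published studies: {statistics.mean(published):.2f}")
```

Every individual study in this toy example is unbiased, yet a reader who only sees the published subset would conclude the intervention works. Pre-registration helps because it makes the full set of planned studies and outcomes visible.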

Practical Implementation Challenges

Scaling Challenges

Scaling successful interventions beyond controlled trials introduces more challenges:

Complexity
- Scaling interventions beyond pilot programs introduces greater logistical, management and coordination challenges
Costs
- Implementation costs often don't scale linearly, with diseconomies of scale emerging in complex environments. Initial cost-effectiveness calculations from small trials can underestimate large-scale implementation expenses
Staff motivation and quality
- Interventions initially implemented by small teams of highly motivated researchers often struggle when scaled through government systems or larger bureaucracies. The attention to detail and problem-solving capabilities present in pilot phases may not translate to larger workforces operating under different incentive structures and varying levels of commitment
Political economy factors
- Local power structures can significantly impact implementation. Successful pilots may falter when confronted with bureaucratic resistance, competing priorities or corruption

Research initiatives like the Yale Research Initiative on Innovation and Scale (Y-RISE) are investigating these challenges to understand how to preserve impact while navigating the transition from controlled studies to real-world implementation.

Resource Constraints

RCTs are typically expensive and time-consuming to implement properly, which can limit their feasibility in resource-constrained settings. The high costs of data collection, analysis and monitoring can divert resources from program implementation, raising questions about opportunity costs in development spending.

Methodological Limitations

Hawthorne Effect

Subjects behaving differently simply because they know they're being studied can artificially inflate intervention effects. For example, households that receive regular monitoring visits as part of an RCT may change their behaviour in ways that wouldn't occur in a scaled program without such intensive observation.

Time Horizons

RCTs typically measure short-term effects, but scaled interventions may produce different outcomes over longer timeframes, creating challenges for predicting long-term impact. Many important development outcomes unfold over years or decades, beyond the typical timeframe of an RCT.

This is especially relevant in countries that are changing economically, demographically and culturally in a short amount of time.

Evolution of the Field

While RCTs in development economics face legitimate criticisms, practitioners have evolved their methods to address these critiques.

Tim Ogden - RCTs in Development Economics, Their Critics and Their Evolution

The fundamental disagreements about RCTs stem from different theories of change regarding:

The value of small vs. big interventions
Technocratic expertise vs. local knowledge
The role of individuals vs. institutions in development

Using the ‘Hype Cycle’ framework, Ogden argues that RCTs have matured past inflated expectations and are now in a more productive ‘slope of enlightenment’ phase, with practitioners implementing more sophisticated designs, addressing causal mechanisms, and building institutions to support evidence-based policymaking.

"Hype Cycle" framework

Conclusion

While RCTs have made valuable contributions to development economics and policy, understanding their limitations is critical for appropriate application. The critiques outlined above highlight potential value in methodological pluralism. Rather than relying exclusively on RCTs, the field is moving toward a more nuanced understanding of when they are most useful and how they can be complemented by other research approaches.

Building on this broader understanding of evaluation methodologies, it's useful to examine the progress and ongoing challenges in specific areas of global development. One of the most fundamental of these is global health, where significant strides have been made in improving life expectancy and reducing mortality, but substantial issues persist.

I've split this post into two parts; you can find the second part here.

^[1]
Universities prioritised established curricula and traditional disciplines like theology, law and (observation-based) medicine
^[2]
A QALY is a measure of health outcome that combines the length of life with its quality (or well-being). One QALY represents one year of life in perfect health. QALYs are used to assess the benefit of medical interventions by estimating the number of additional years of life gained due to the intervention, weighted by the quality of those years. A year lived in less-than-perfect health receives a value between 0 and 1. QALYs are often used in cost-effectiveness analysis to compare the value of different medical treatments or public health interventions
^[3]
A DALY is a measure of overall disease burden, expressed as the number of years lost due to ill-health, disability or early death. One DALY represents one lost year of healthy life. DALYs are calculated by summing the years of life lost (YLL) due to premature mortality and the years lived with disability (YLD) due to prevalent cases of the disease or health condition in a population. DALYs allow for comparisons of the burden of different diseases and risk factors, and are used to prioritise health interventions and research efforts
^[4]
A WALY is a newer metric that extends the concept of QALYs by incorporating subjective well-being as a key dimension. It aims to capture not only the health-related quality of life but also the broader aspects of an individual's well-being, such as happiness, life satisfaction, and social connections. WALYs are calculated by weighting years of life by a well-being score, reflecting an individual's overall sense of flourishing. This metric is particularly relevant for evaluating interventions that aim to improve mental health, social support, and overall life satisfaction, in addition to physical health
^[5]
“The cases were as similar as I could have them. They all in general had putrid gums, the spots and lassitude, with weakness of their knees. They lay together in one place, being a proper apartment of the sick in the fore-hold; and had one diet common to all”

SummaryBot @ 2025-07-21T16:06 (+1)

Executive summary: This post offers a historical overview of how evidence—from the scientific revolution to modern randomized controlled trials (RCTs)—has come to shape global development practices, highlighting both the power and the limitations of evidence-based approaches, especially in scaling interventions and addressing systemic challenges; it serves as part two of a reflective and educational seven-part series.

Key points:

Historical foundations of evidence-based thinking: The scientific revolution, driven by figures like Copernicus, Galileo, Bacon, and Newton, emphasized observation, experimentation, and the systematic sharing of knowledge, laying the groundwork for later evidence-based practices in fields like medicine and development.
Transformation of medicine through science: The transition from anecdotal remedies to germ theory, vaccines, and public health infrastructure marked a major shift toward evidence-driven healthcare, later bolstered by statistical methods and cost-effectiveness analysis (CEA) using metrics like QALYs and DALYs.
Rise of RCTs in global development: Inspired by successes in medicine and agriculture, RCTs gained prominence in economics and development, exemplified by the work of Banerjee and Duflo and institutions like J-PAL, promoting a “small interventions, rigorous testing” paradigm.
Critiques and limitations of RCTs: Concerns include limited generalisability across contexts, ethical and practical challenges in scaling, and the narrow focus on micro-level interventions over structural causes of poverty. Research by Wykstra, Vivalt, and others highlights variability in outcomes and potential biases.
Challenges of scaling and sustainability: Implementing interventions beyond controlled trials often faces logistical, motivational, political, and economic hurdles that can erode effectiveness, prompting research initiatives like Y-RISE to study real-world transitions.
Toward methodological pluralism: While RCTs remain valuable, the field is maturing into a more nuanced phase that embraces diverse methods, better causal inference, and institutional capacity-building, moving past “hype” to more balanced, context-sensitive evidence use.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.