Economía y salud
NEWSLETTER - Year 2015. December. No. 84

The tyranny of averages

We asked Juan Merlo, Head of the Unit for Social Epidemiology, Lund University, Sweden, to write a brief paper on what has been called The tyranny of averages, and some colleagues to provide their opinions and criticisms about the essay. We are very grateful for their generous contributions. As usual, readers are invited to send their opinions to the Bulletin, using the template provided at the end of each contribution.

The Editors

The crisis of the epidemiology of the averages - A call for discriminatory accuracy

Juan Merlo
Unit of Social Epidemiology, Faculty of Medicine, Lund University


All over the world, large amounts of economic and intellectual resources are allocated to the study of both traditional risk factors and novel biomarkers for diseases. Today, it is broadly recognized that epidemiology aims at improving our knowledge of individual risk prediction for more satisfactory and individually tailored treatment. For this purpose, however, we typically compute simple measures of average association like the odds ratio (OR). The expectation is that this procedure will help us identify the exposures that accurately discriminate the individuals who will develop the disease from those who will not, so that effective medical or public health interventions can be targeted. Nevertheless, in recent years a number of relevant publications have pointed out that measures of association alone are unsuitable for this discriminatory purpose. In fact, what we normally consider a strong association between a risk factor and a disease (e.g., OR = 10) corresponds to a rather low capacity of the risk factor to discriminate cases from non-cases of disease in the population.

By analogy with diagnostic tests, promoting population-level screening and treatment of risk factors with low discriminatory accuracy may lead to unnecessary side effects and costs. It also raises ethical issues related to risk communication and the perils of both unwarranted medicalization and stigmatization of individuals with the risk factor or biomarker. We need a new epidemiological approach that goes beyond differences between average group risks and systematically reports discriminatory accuracy and inter-individual heterogeneity of effects.

Keywords: Measures of association, risk factors, biomarkers, discriminatory accuracy, inter-individual heterogeneity of effects.

The average approach: studying group differences to understand individual disease risk

Modern medicine, health economics and epidemiology are overwhelmed by a plethora of both risk factors and biomarkers for diseases, and large amounts of economic and intellectual resources are allocated to the identification of new ones. The same is true of the identification of new socioeconomic and ethnic differences in health, as well as of geographical studies, where the existence of “significant small area variation” is an almost universal mantra for most outcomes studied. For this purpose, we typically compute differences in average risk between groups. We can, for instance, investigate exposed and unexposed groups and obtain simple measures of average association like the relative risk (RR) or the odds ratio (OR). We can also use some simple measure of variance to summarize differences between averages, or use graphical approaches like funnel plots, health league tables, choropleth maps or atlases of variation to compare hospital or small geographical area averages. In any case, our implicit expectation is to identify the exposure categories that accurately discriminate the individuals who will develop the disease from those who will not, in order to provide a solid rationale for targeted medical or public health interventions.

A classic example concerns preventive strategies for coronary heart disease, where traditional risk factors like smoking habits and blood pressure are systematically evaluated in health care, frequently within a risk score equation such as Framingham, SCORE or QRISK. Thereafter, individuals receive treatment according to their predicted level of disease risk. That is, screening and preventive interventions are closely connected, since measurement of risk factors is supposed to tell us which individuals are, and which are not, candidates for different degrees of preventive treatment.

Studying group differences may be misleading

Paradoxically, it is well known today that measures of association alone are unsuitable for this discriminatory purpose. In fact, what we normally consider a strong association between a risk factor and a disease (e.g., an OR of 10) corresponds to a rather low capacity of the risk factor to discriminate cases from non-cases of disease in the population. Accordingly, Pepe et al. (1) illustrated that in order to obtain an acceptable discriminatory accuracy of, for example, a true positive fraction (TPF) of 90% and a false positive fraction (FPF) of 5%, we would need an OR higher than 170 (Figure 1). This fact is not widely recognized.

Figure 1. Correspondence between the true-positive fraction (TPF) and the false-positive fraction (FPF) of a binary risk factor and the odds ratio. Values of TPF and FPF that yield the same odds ratio are connected. The figure has been created following the model described elsewhere by Pepe et al. (1).
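The threshold of 170 follows from the definition of the odds ratio of a binary marker, OR = [TPF/(1 - TPF)] / [FPF/(1 - FPF)]. The short Python sketch below is only an illustration of that relationship; the function names are hypothetical and are not taken from Pepe et al. It computes the OR implied by a given TPF and FPF and, conversely, the TPF implied by a given OR at a fixed FPF.

```python
def odds_ratio(tpf: float, fpf: float) -> float:
    """Odds ratio of a binary marker with the given TPF and FPF."""
    return (tpf / (1.0 - tpf)) / (fpf / (1.0 - fpf))


def tpf_from_or(or_value: float, fpf: float) -> float:
    """TPF implied by an odds ratio at a fixed FPF (inverse of odds_ratio)."""
    odds_among_cases = or_value * fpf / (1.0 - fpf)
    return odds_among_cases / (1.0 + odds_among_cases)


if __name__ == "__main__":
    # OR required for TPF = 90% at FPF = 5%
    print(round(odds_ratio(0.90, 0.05)))
    # TPF achieved by an OR of 10 at FPF = 5%
    print(round(tpf_from_or(10.0, 0.05), 3))
```

Under these definitions, a TPF of 90% with an FPF of 5% corresponds to an OR of 171, whereas an OR of 10 at an FPF of 5% identifies only about 34% of the cases.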

Knowledge of the discriminatory accuracy of an average association is relevant not only for diagnostic tests and prognostic or screening markers (1), but also for many other areas of clinical and public health research. For instance, there is an ongoing discussion questioning the incremental value of assessing levels of novel biomarkers in combination with traditional risk factors (e.g., cholesterol, blood pressure, smoking, diabetes) for the prediction of cardiovascular diseases (2). Moreover, some authors even question the value of adding traditional risk factors to risk predictions based exclusively on age (3). Analogous ideas have been used to revisit classical risk factors in cardiovascular (4) and perinatal epidemiology and ethnic differences in health (5), as well as in multilevel studies investigating geographic differences in individual health (6-8) and in the evaluation of hospital performance (9, 10).

The central problem is that measures of association disregard the heterogeneity of individual-level responses around the average. Measures of association are often taken to be the best estimate of what is assumed to be a stochastic individual risk. This imposition of the average value on the individual is very common in epidemiology and has been termed the “tyranny of the means” or the “mean-centric approach”. Beyond the medical field, this problem has been discussed in other scientific disciplines such as political science and evolutionary biology. Similar ideas have also been developed in social epidemiology for the investigation of contextual effects. The key criticism is that common measures of association are abstractions that do not represent the heterogeneity of individual effects. This idea points to the study of inter-individual heterogeneity around group averages as fundamental for understanding the effect of an exposure (e.g., a risk factor) in the population. Analogous ideas were already described in the 19th century by Claude Bernard (1813-1878) and later by Lancelot Hogben (1895-1975), as well as by modern clinical epidemiologists promoting the “n-of-1” design. The same notion also lies behind the current movement towards personalized (or stratified) medicine.

In short, the key idea of this essay is that, from a clinical and public health perspective, it is not enough to know the difference in average disease risk between groups. What matters most is the discriminatory accuracy, i.e., the capacity of the exposure (whether it is a traditional risk factor, a novel biomarker, membership of an ethnic group or residence in a geographic area) to discriminate individuals who will subsequently suffer from disease from those who will not.

An indiscriminate use of risk factors harms the scientific credibility of modern epidemiology

Promoting population-level screening and treatment of risk factors of low discriminatory accuracy may lead to overtreatment, unnecessary side effects and costs. It also raises ethical concerns related to risk communication and the perils of both unwarranted medicalization and stigmatization of exposed individuals. There is also a growing apprehension that financial interests might encourage the introduction and expansion of screening and treatment that lack solid scientific justification. Ultimately, an indiscriminate use of risk factors, biomarkers and even social and geographic factors with low discriminatory accuracy may harm the scientific credibility of modern epidemiology, a phenomenon that was nicely illustrated by Jim Borgman’s cartoon “Today's Random Medical News, from the New England Journal of Panic-Inducing Gobbledygook” (Figure 2).

Figure 2. Sarcastic cartoon by Jim Borgman illustrating the daily avalanche of breakthrough findings in the news. The cartoon was first published by the Cincinnati Enquirer and King Features Syndicate, 27 April 1997, Forum section: 1, and reprinted in the New York Times, 27 April 1997, E4.

Are we misleading the community by raising alarm about risks that may be harmless for most individuals?

Confronting the problem of the low discriminatory accuracy of most current (and past) findings might cast epidemiology into a crisis. If the discriminatory accuracy of most classical risk factors, new biomarkers, and ethnic or geographic categorizations is very low, what becomes of the vast majority of recommendations given so far in epidemiology and public health? Are we misleading the community by raising alarm about risks that may be harmless for most individuals? Are there problems of inefficiency, medicalization and stigmatization? I believe that these questions are of the highest relevance for both the community and the future of public health research.

We need an epidemiological approach based on multilevel analysis of individual heterogeneity

I believe that the time has come for a new epidemiological approach that systematically reports the discriminatory accuracy of exposures at different levels of analysis and the inter-individual heterogeneity of responses, rather than relying solely on averages (11). For this purpose we ideally need large databases with a rich set of variables and repeated measurements within individuals. We also need new methodological approaches focused on understanding heterogeneity. As a minimum, epidemiological articles presenting measures of association (e.g., OR, RR) or of geographical variation should always include information on their discriminatory accuracy (e.g., the area under the ROC curve) and discuss the findings from this perspective.
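As a hedged illustration of this minimum reporting standard, the Python sketch below derives both the OR and the area under the ROC curve (AUC) from the same 2x2 table. For a binary exposure the ROC curve has a single operating point, so its trapezoidal area reduces to (TPF + 1 - FPF)/2. The class name and the counts are invented for illustration only and do not come from any study cited here.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a hypothetical cohort (invented for illustration)."""
    cases_exposed: int
    cases_unexposed: int
    noncases_exposed: int
    noncases_unexposed: int

    def odds_ratio(self) -> float:
        # Cross-product ratio of the 2x2 table
        return (self.cases_exposed * self.noncases_unexposed) / (
            self.cases_unexposed * self.noncases_exposed
        )

    def tpf(self) -> float:
        # Share of cases that carry the exposure
        return self.cases_exposed / (self.cases_exposed + self.cases_unexposed)

    def fpf(self) -> float:
        # Share of non-cases that carry the exposure
        return self.noncases_exposed / (
            self.noncases_exposed + self.noncases_unexposed
        )

    def auc(self) -> float:
        # One-point ROC curve: trapezoidal area under (0,0)-(FPF,TPF)-(1,1)
        return (self.tpf() + 1.0 - self.fpf()) / 2.0


table = TwoByTwo(cases_exposed=345, cases_unexposed=655,
                 noncases_exposed=50, noncases_unexposed=950)
print(f"OR = {table.odds_ratio():.1f}, AUC = {table.auc():.2f}")
```

With these invented counts the exposure shows an OR of about 10 but an AUC of only about 0.65, which is the kind of contrast the essay asks authors to report and discuss.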

Acknowledgements: This work was supported by the Swedish Research Council (PI: Merlo #2013-2484) and by research funds of the Faculty of Medicine at Lund University.

Declaration of conflicting interest: The Author declares that there is no conflict of interest.

Contributor: JM had the original idea of the study and drafted the manuscript.


1. Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol. 2004;159(9):882-90.

2. Kaptoge S, Di Angelantonio E, Pennells L, Wood AM, White IR, Gao P, et al. C-reactive protein, fibrinogen, and cardiovascular disease prediction. N Engl J Med. 2012;367(14):1310-20.

3. Wald NJ, Simmonds M, Morris JK. Screening for future cardiovascular disease using age alone compared with multiple risk factors and age. PLoS One. 2011;6(5):e18742.

4. Sundstrom J, Byberg L, Gedeborg R, Michaelsson K, Berglund L. Useful tests of usefulness of new risk factors: tools for assessing reclassification and discrimination. Scand J Public Health. 2011;39(4):439-41.

5. Mulinari S, Juarez SP, Wagner P, Merlo J. Does Maternal Country of Birth Matter for Understanding Offspring's Birthweight? A Multilevel Analysis of Individual Heterogeneity in Sweden. PLoS One. 2015;10(5):e0129362.

6. Merlo J, Asplund K, Lynch J, Rastam L, Dobson A. Population effects on individual systolic blood pressure: a multilevel analysis of the World Health Organization MONICA Project. Am J Epidemiol. 2004;159(12):1168-79.

7. Merlo J, Viciana-Fernandez FJ, Ramiro-Farinas D. Research Group of Longitudinal Database of Andalusian P. Bringing the individual back to small-area variation studies: a multilevel analysis of all-cause mortality in Andalusia, Spain. Soc Sci Med. 2012;75(8):1477-87.

8. Merlo J, Wagner Ph, Ghith N, Leckie G. A novel stepwise multilevel logistic regression analysis of discriminatory accuracy: the case of neighbourhoods and health. PONE-D-15-36083. 2015.

9. Wagner P, Merlo J. Discriminatory accuracy of a random effect in multilevel logistic regression. 20th IEA World Congress of Epidemiology (WCE2014). 2014.

10. Ghith N, Wagner Ph, Frølich A, Merlo J. Short term survival after admission for heart failure in Sweden: applying multilevel analyses of discriminatory accuracy to evaluate institutional performance. Submitted. 2015.

11. Merlo J. Invited commentary: multilevel analysis of individual heterogeneity - a fundamental critique of the current probabilistic risk factor epidemiology. Am J Epidemiol. 2014;180(2):208-12.


Epidemiological intelligence and population health

Ildefonso Hernández Aguado
Professor of Preventive Medicine and Public Health, Universidad Miguel Hernández, Spain

Low level exposures that affect the whole population may not be good disease predictors at the individual level but can be highly relevant in public health terms.

Translating population risk averages to an individual is misleading and discredits epidemiology. The recent release by the WHO of its report on disease risks associated with the consumption of processed meat has created some anxiety among the public because of the (inevitable) mechanical interpretation of the risks in individual terms.

Geoffrey Rose advised us many years ago to be careful when seeking to identify the fate of every person according to known risk factors. He showed how difficult it is to assess who will have coronary heart disease on the basis of any defined levels of the relevant risk factors. However, one can predict the prevalence of a disease in a population from the population mean of the underlying risk factor. I agree with Merlo that we are misleading the population, given the low discriminatory accuracy of most current (and past) findings.

The close links between medicine and epidemiology, on the one hand, and the lack of sufficient cross-fertilization between epidemiology and the social sciences, on the other, have been a heavy burden for epidemiology. They have induced a proclivity to apply any new epidemiological knowledge to the individual rather than to communities and society as a whole. There is no doubt that we need better methods to apply epidemiological intelligence to predict individual events, but the public health community should be aware that the key issue is population health (1). Health economists should move from disease-oriented science to population health gains, and epidemiologists must get rid of their strong links with clinical medicine and focus on populations. In this sense, Merlo’s proposal is welcome, as he puts forward methodological approaches for the simultaneous consideration of both individual and population questions.


1. Hernández-Aguado I, Parker LA. Intelligence for Health Governance: Innovation in the Monitoring of Health and Well-Being. In: Kickbusch I, editor. Policy Innovation for Health. New York: Springer; 2009, p. 23-66.

Risk evaluation in the future. Round trip from the individual to the group

Beatriz González López-Valcárcel
Professor of Economics, University of Las Palmas de Gran Canaria, Spain

I applaud the initiative of Juan Merlo. It highlights the weakness of many epidemiological studies that measure risks for population groups but have little discriminatory accuracy to predict whether a given individual will win the lottery (in which the prize is a bad thing in studies of disease risk and a good thing in studies of treatment outcomes). Merlo points out that today there is a wealth of individual-level data that could be exploited to improve individual predictions. He recommends using multilevel analysis and (new) measures of individual heterogeneity within the group when assessing risks. These measures would complement the traditional ones based on averages (RR, OR).

Ultimately, it is a round trip from the individual to the group. The population indicators of risks and benefits are necessary but insufficient. In short, what both patients and policy makers want to know is the individual probability of getting cancer if you eat red meat, and the individual probability of getting cancer if you do not.

Public health recommendations are very easy (and strong) in some cases (say, "Stay away from Chernobyl now!"). But such situations, with an OR over 100, are few. Even in the case of tobacco, no one can predict whether you will have lung cancer, let alone in the case of red meat and colorectal cancer!

The methodological challenge is huge, as in any estimate of counterfactuals. "What would have happened if ..." will never be known, because the condition that stands in for the ellipsis never materializes. For a blood test we need just a few drops: our blood is a homogeneous sample. But humans are very heterogeneous, and as we move into the microcosm of the human genome we discover new sources of heterogeneity, even if we are able to stratify by means of risk biomarkers. The environment in which we live adds more variance to the equation.

The markets are ahead. Health insurance companies have sophisticated risk selection models with high discriminatory accuracy to predict the use of resources that a person will make next year, given their medical history and past bills. Using big data, Netflix has created models that predict correctly in more than 95% of cases whether you will like a movie. If the secret services put the focus on you, you can be sure they will be able to predict your behavior better than you can yourself. Thanks to science and to markets, someday it will be possible to predict not only when you will suffer a stroke, but also how much money you will lack to pay for the health care you will need then.

Advances in the direction proposed by Merlo are needed and will possibly be achieved. They are necessary for the credibility of epidemiology and for progress towards a value-based health system, where risks are shared between the stakeholders: between the industry and the public funders, between health organizations and their providers, and between the public healthcare system and its patients. This is a promising avenue for improving health system efficiency and achieving savings.

Geographic Variations in Medical Practice: a showcase for multilevel modelling

Enrique Bernal-Delgado
Instituto Aragonés de Ciencias de la Salud (IACS)

Studies on geographic variations in health care performance (classically known as studies on variations in medical practice) aim at eliciting differences in the behaviour of health care providers (e.g., health plans, health care authorities, hospital catchment areas) irrespective of need differences in the populations they serve.

Note that, in these studies, the unit of analysis is not the population living in a particular geographic area, but the provider. Thus, the approach for this type of research differs from the one discussed in Merlo’s paper, which is very much focused on interpreting individual risks. Yet, although studies on performance variation do not seek to make any inference about the individual risk of patients residing in a particular area, the multilevel approach is crucial in the assessment of health care performance variation.

What multilevel modelling has added to the study of variations

Beyond any consideration of their statistical robustness when analysing clustered data (whether hierarchically distributed or allowing multi-membership), multilevel analyses provide a better estimate of the specific contribution of the provider to the differences in performance (e.g., utilization rates). Summary measures equivalent to the classical Variance Partition Coefficient can be obtained from Bayesian modelling (e.g., Besag-York-Mollié, Shared Component Modelling, CAR models), allowing the estimation of the fraction of variation attributable to each of the different spatial units of interest (e.g., primary care settings, healthcare areas, regions, and over time) beyond differences in populations’ need.
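As a rough illustration of what a Variance Partition Coefficient (VPC) expresses, the Python sketch below uses the simplest two-level setting rather than the Bayesian spatial models named above. The function names and the example variances are hypothetical. For a logistic model, the sketch assumes the common latent-variable convention in which the individual-level variance is fixed at π²/3.

```python
import math


def vpc_linear(between_var: float, within_var: float) -> float:
    """Share of total variance lying between providers (two-level linear model)."""
    return between_var / (between_var + within_var)


def vpc_logistic_latent(between_var: float) -> float:
    """VPC for a two-level logistic model under the latent-variable convention.

    Assumption: the individual-level variance on the logit scale is pi^2 / 3.
    """
    return between_var / (between_var + math.pi ** 2 / 3.0)


if __name__ == "__main__":
    # Hypothetical variance components, chosen only for illustration
    print(f"linear VPC   = {vpc_linear(1.0, 9.0):.2f}")
    print(f"logistic VPC = {vpc_logistic_latent(0.5):.2f}")
```

In this simplified reading, a VPC of 0.10 means that 10% of the total variation sits at the provider level and the remaining 90% is variation between individuals within providers.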

Consistent with Merlo’s paper, we (as health services researchers) do advocate a more frequent use of multilevel analysis, although for different reasons.

The crisis of the epidemiology of the averages: knowing the scope of its implications

Carlos Campillo-Artero

The crystal-clear description of the crisis of the epidemiology of the averages by Juan Merlo was as badly needed as it is challenging. Moreover, it encompasses a wide array of implicit practical implications. It was needed because it highlights the epistemological limitations and, more perilously, the harm that could ensue when population-based interventions disregard both group and individual variations. It is challenging because it points to the need for a thoughtful and unhurried review of some long-accepted epidemiological principles, assumptions and beliefs.

The variety of the practical implications and applications of Merlo’s paper raises a substantial number of issues. Given its clinical, social and economic relevance, one could zero in on one of them: applied clinical research.

It can be contended that the tyranny of means is highly relevant to randomized controlled trials (RCTs). A large number of clinicians misunderstand some of the assumptions and implications related to the treatment effects reported in RCTs owing to their lack of awareness of the tyranny of means. The unit of analysis in an RCT is not the individual but the intervention group (the experimental and control arms). The estimates of the effects of the intervention (e.g., a drug) are group mean effects, not individual effects. With a well-designed and well-conducted RCT, the average response to the drug in each group may be estimated, but this still does not allow accurate discrimination between individuals who will actually respond and those who will not, in either group, whether in the trial or in clinical practice.
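The point about group mean effects can be made concrete with a toy calculation. The Python sketch below uses invented individual treatment effects (which in a real trial cannot be observed directly) to show that two very different response patterns produce the same average effect. All names and numbers are hypothetical.

```python
from statistics import mean

# Hypothetical individual treatment effects for ten patients per scenario.
# Scenario A: every patient benefits a little.
uniform_response = [2.0] * 10
# Scenario B: two patients benefit a lot, eight do not benefit at all.
few_responders = [10.0, 10.0] + [0.0] * 8


def summarize(effects: list[float]) -> tuple[float, float]:
    """Return (mean effect, share of patients with a positive effect)."""
    share_responding = sum(e > 0 for e in effects) / len(effects)
    return mean(effects), share_responding


for label, effects in [("A", uniform_response), ("B", few_responders)]:
    avg, share = summarize(effects)
    print(f"Scenario {label}: mean effect = {avg:.1f}, responders = {share:.0%}")
```

Both scenarios yield a mean effect of 2.0, yet all patients respond in the first and only 20% in the second, so the group mean alone cannot tell a clinician which patients will respond.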

Subgroup effects can also be estimated, but one must be very wary of well-known and frequent flaws in the design of subgroup analyses and of threats to the interpretation and validity of their results. Furthermore, recent advances in stratified and personalized medicine still fall short of allowing us to reach individual estimates. Besides, the number of licensed (and properly validated and qualified) diagnostic, prognostic, predictive and response (including safety) biomarkers, and of clinical areas where they can be used effectively and safely, is still very small. Taken as a whole, they do not yet substantially increase the capacity to discriminate accurately at the individual level in trials. Last but not least, although n-of-1 trials could provide reasonable estimates of individual effects and individual variation, using them on a large scale to overcome the tyranny of means in RCTs is thus far not within reach. The moral is: be wary of mean estimates standing alone, and bear in mind both central measures and group and individual variances before taking action.

There is an inseparable multilevel structure of individual heterogeneity

Juan Merlo

It is often argued in epidemiology that we need to distinguish “sick populations” from “sick individuals”, as Geoffrey Rose advised us many years ago; that there is a “population perspective” separate from the “individual perspective”; or, as Bernal-Delgado indicates, that there is a distinction between studies aimed at evaluating the behaviour of health care providers (e.g., health plans, health care authorities, hospital catchment areas) and those aimed at studying the individual risk of patients residing in a particular area. According to this argument, it is therefore correct to investigate differences between averages when the focus of interest is the population (or the health care provider) and not the individual.

I think, however, that this separation is customary but invalid. What we actually investigate is an inseparable multilevel structure of individual heterogeneity. Therefore, population influences or the behaviour of health care providers are incorrectly measured by differences between group averages. Rather, this influence expresses itself as the share of the individual heterogeneity that lies at the population level.

I do agree with Beatriz González López-Valcárcel and with Carlos Campillo-Artero. With a subtle reinterpretation, I also agree with Hernández Aguado that translating population risk averages to an individual is misleading and discredits epidemiology. Precisely for that reason, we always need to accompany information on population risk averages with measures of discriminatory accuracy. This is especially relevant in epidemiology, as most of the information in this science is not about the individual but about a group of individuals defined by categorizations (for instance, nations, neighbourhoods, ethnic groups, hospital catchment areas, blood pressure level, sex, socioeconomic position, etc.). Observe that what we often call “individual risk” is actually the average risk in a group.

I attach a few references showing the approach I support (1) (2) (3) (4) (5) (6) (7).


  1. Merlo J, Chaix B, Yang M, Lynch J, Rastam L. A brief conceptual tutorial of multilevel analysis in social epidemiology: linking the statistical concept of clustering to the idea of contextual phenomenon. J Epidemiol Community Health. 2005;59(6):443-9.
  2. Merlo J, Ostergren PO, Broms K, Bjorck-Linne A, Liedholm H. Survival after initial hospitalisation for heart failure: a multilevel analysis of patients in Swedish acute care hospitals. J Epidemiol Community Health. 2001;55(5):323-9.
  3. Merlo J, Asplund K, Lynch J, Rastam L, Dobson A. Population effects on individual systolic blood pressure: a multilevel analysis of the World Health Organization MONICA Project. Am J Epidemiol. 2004;159(12):1168-79.
  4. Ohlsson H, Merlo J. Understanding the effects of a decentralized budget on physicians' compliance with guidelines for statin prescription - a multilevel methodological approach. BMC Health Serv Res. 2007;7:68.
  5. Hjerpe P, Ohlsson H, Lindblad U, Bostrom KB, Merlo J. Understanding adherence to therapeutic guidelines: a multilevel analysis of statin prescription in the Skaraborg Primary Care Database. Eur J Clin Pharmacol. 2010.
  6. Cantarero-Arevalo L, Perez Vicente R, Juarez SP, Merlo J. Ethnic differences in asthma treatment among Swedish adolescents: A multilevel analysis of individual heterogeneity. Scand J Public Health. 2015;Nov 9. pii: 1403494815614749. [Epub ahead of print].
  7. Merlo J, Mulinari S. Measures of discriminatory accuracy and categorizations in public health: a response to Allan Krasnik's editorial. Eur J Public Health. 2015;DOI: 10.1093/eurpub/ckv209.


TecnoCampus Mataró-Maresme
Edifici TCM2 | P2. O3.

Av. Ernest Lluch, 32
08302 Mataró (Barcelona)

Tel. 93 755 23 82

Bulletin editors: Carlos Campillo and Cristina Hernández Quevedo

Managing editor: Cristina Hernández Quevedo

Editorial board:
José Mª Abellán Perpiñán, Manuel García Goñi, Ariadna García Prado, Miguel Ángel Negrín, Vicente Ortún, Luz María Peña.

Contributors to this issue: Rebeca Alfranca Pardillos, Mikel Berdud, Enrique Bernal-Delgado, Carlos Campillo-Artero, David Casado Marín, María Errea Rodríguez, Manuel García Goñi, Beatriz González López-Valcárcel, Ildefonso Hernández Aguado, Emilio Herrera Molina, Sergi Jiménez Martín, Félix Lobo, Juan Merlo.