Methodological Quality of Economic Evaluations in Integrated Care: Evidence from a Systematic Review

Introduction: The aim of this review is to systematically assess the methodological quality of economic evaluations in integrated care and to identify challenges with conducting such studies. Theory and methods: Searches of grey literature and scientific papers published from January 2000 to December 2018 were performed. A checklist was developed to assess the quality of economic evaluations. Authors' statements of challenges encountered during their evaluations were qualitatively coded. Results: Forty-four articles were eligible for inclusion. The review found that study design, measurement of costs and outcomes, statistical analysis and presentation of data were the areas with the most variation in quality. Authors identified challenges mostly related to the time horizon of the evaluation, an inadequate or absent comparator group, contamination bias, and a post-hoc evaluation culture. Discussion: Our review found significant differences in quality, with some studies showing poor methodological rigor, challenging conclusions on the cost-effectiveness of integrated care. Conclusion: It is essential for evaluators to use best-practice standards when planning and conducting economic evaluations in order to build a reliable evidence base for decision-making in integrated care.

Despite best-practice recommendations [16,17], very few reviews have conducted quality assessments of the studies included in systematic reviews examining the impact of integrated care [9,12,13,18]. Little is known about current practices in conducting economic evaluations of integrated care interventions, or about the methodological gaps related to the overall study designs, measurements of impact, analysis and presentation of results.
Results of economic evaluations are increasingly being used to inform healthcare decision-making, which further underscores the need to examine whether these evaluations meet methodological standards [19]. Policy makers concerned with the transferability of integrated care models across healthcare settings may be interested in knowing whether the conclusions expressed in studies are acceptable and trustworthy [20,21]. This is especially the case since allocating public funds to healthcare services informed by misleading or biased economic analysis, may pose significant ethical issues and policy implications [19,22]. Employing reliable methods in economic evaluations may also be of importance to clinicians and health practitioners considering integrated care in their health service planning and service delivery [20]. Finally, a systematic assessment of the quality of evidence generated by economic evaluation is important in contextualizing the substantial variation in findings of integrated care interventions, as well as in identifying opportunities for improvements [5]. Therefore, the aim of this review is to systematically assess the methodological quality of economic evaluations in integrated care and to identify challenges with conducting such studies.

Search strategy
Relevant search terms related to the broad concepts of "integrated care" and "economic evaluation" were identified by looking at frequently used terms in previous systematic reviews [5,7,9,10,12,14] and seminal literature [17,19-21,23,24] on the respective topics. Our search was also informed by terms used in the Integrated Care Search tool developed by the International Foundation for Integrated Care [25]. See Appendix 1 for the search terms used in the review.
Searches of grey literature and scientific papers published from January 2000 to December 2018 were performed in the following databases: Medline/PubMed, EMBASE, CINAHL, Web of Science, Scopus, the World Health Organization Library and Information Networks for Knowledge database (WHOLIS), the Database of Abstracts of Reviews of Effects (DARE), the NHS Economic Evaluation Database (NHS EED), and the OECD Library. In addition, hand searching of reference lists in key publications (including systematic reviews on integrated care, dissertations, conference proceedings, opinion pieces, editorials and conference abstracts) was used to identify any relevant missed articles.

Selection process and eligibility criteria
Citations were downloaded and screened in Mendeley, an online citation manager tool. All article abstracts and titles were read independently by two reviewers based on the inclusion criteria detailed below. If the reviewers could not determine whether to exclude an article based on its abstract and title, the full text was retrieved and read until agreement was reached.
Inclusion criteria:
1. Articles published in English.
2. Articles that described the implementation, execution or evaluation of interventions or programmes based on Kodner and Spreeuwenberg's definition of integrated care: "funding, administrative, organisational, service delivery and clinical interventions designed to create connectivity, alignment and collaboration within and/or between the cure and care sectors" [26, p. 3].
3. Empirical economic evaluations as defined by Drummond et al.: "the comparative analysis, measurement, valuing and identification of alternative courses of action in terms of their cost and consequences" [20, p. 3].

Data abstraction and analysis
Two data abstraction templates were adapted from Boland et al. [12], and aligned with elements from the PICO framework for research [27]. The first template was used to extract information about the studies, including: the study objectives, the settings, description of the intervention, target population and size. The table was also used to abstract information on the evaluation, including study design, the type of economic analysis, the evaluation perspective, the length of the observation period and the measures used. The second template was developed to assess the quality of economic evaluations. This took the form of a checklist appraising the economic evaluation methodology and the risk of bias in the study design of the interventions. It provided a binary scoring criterion (yes/no) assessing the strengths and weaknesses of the studies on 30 items. The checklist was a combination of: 1) the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) [17], and 2) the Health Technology Assessment of Disease Management Programmes (HTA-DM) [28]. The CHEERS guideline was developed by an expert panel to optimize the reporting of health economic evaluations [17], while the HTA-DM has been previously used to assess the overall quality of evaluations [28]. Two reviewers (MK, NE) abstracted the information from both templates, and then independently cross-verified each other's information for reporting consistency and data reliability.
Our approach to identifying the challenges and limitations that authors encountered in their evaluations was modeled after the thematic analysis approach used in a systematic review by Thomas and Harden [29]. This involved three stages: 1) coding text, 2) developing descriptive themes from lifted texts and 3) generating analytical themes. Firstly, in the coding phase, when articles described barriers, limitations or challenges that emerged from the economic evaluation of integrated care, these passages were regarded as "attributive statements" and systematically recorded. These statements were most often found in the results and discussion sections of the articles. In the second stage, descriptive themes that emerged from the lifted texts were identified.
Finally, three reviewers (MK, NE and VS) used an inductive approach to identify the dominant themes that emerged from clustering statements of challenges [30].

Results
The initial literature search and screening based on the inclusion criteria yielded 2,263 abstracts. Most of these articles were rejected as they were not based on interventions that fulfilled the definition of integrated care and did not report the results of economic evaluations. After full text reading of 69 articles, 44 articles were eligible for inclusion. Figure 1 shows our PRISMA flowchart at various stages of the selection process.

Study design
Randomized controlled trials (58%) were the most commonly used study design, followed by observational cohort designs (41%); 16% of the studies were quasi-experimental or used pre-post cohort designs. Only one study was cross-sectional in nature.
To minimize bias while conducting the economic evaluations, 91% of the studies identified a comparator or control group that did not receive the intervention. However, the baseline population characteristics of the groups were significantly different in almost a third of the studies, particularly in observational cohort designs. Quasi-experimental approaches such as difference-in-difference analysis of the pre-post intervention period were conducted in 7% of the studies. This was performed to ensure that the evaluations were in fact measuring the impact of the intervention rather than other confounding factors that might bias the results. Authors such as Pimperl et al. used propensity-score matching to ensure individuals enrolled in an accountable care organization were similar at baseline to their comparator group on socio-demographic and clinical variables [63]. The majority of longitudinal studies collected data at multiple points, spanning beyond baseline and follow-up (61%). Furthermore, 70% had a clear description of patient attrition or drop-outs at the follow-up period. Contamination, due to the exposure of individuals in the comparator group to the intervention, occurred in a third of the studies. To avoid this bias, 20% of the studies used a cluster RCT design, where individuals were randomized at the care-setting level rather than at the patient level.
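For illustration, the difference-in-difference adjustment described above amounts to a simple comparison of changes over time. The sketch below uses invented numbers, not figures from any reviewed study:

```python
# Minimal difference-in-differences sketch with invented, illustrative numbers:
# mean costs per patient before and after the intervention period, for an
# intervention group and a comparator group that never receives the programme.
def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """DiD estimate: the change in the treated group minus the change in the
    control group, which nets out secular trends common to both groups."""
    return (treat_post - treat_pre) - (control_post - control_pre)

# Hypothetical mean annual costs per patient.
effect = diff_in_diff(treat_pre=5000, treat_post=4600,
                      control_pre=5100, control_post=5050)
print(effect)  # -350: costs fell 350 more in the intervention group
```

The key assumption, as the review notes, is that without the intervention both groups would have followed parallel trends; baseline imbalance between groups threatens exactly this assumption.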

Measurement of cost and outcomes
Most studies explicitly stated the perspective adopted in their economic evaluation (75%), with the healthcare payer perspective dominating the broader societal perspective (66% vs. 34%). Studies that adopted a societal perspective also considered the indirect impact of the intervention on caregiver burden, out-of-pocket care expenses and productivity loss. Non-medical and indirect costs were considered in 35% of the studies. For example, van Leeuwen et al. valued the indirect costs of a multidisciplinary geriatric primary care team on the informal care of frail elderly patients [31]. Only a third of the studies included costs associated with the development of the intervention or the implementation costs, with the majority only considering intervention and healthcare utilization costs. Healthcare costs and utilization from across all relevant health and social sectors were reflected in 66% of the studies. However, while some studies reported both types of information, others only reported overall healthcare costs without the resource utilization. For example, in a cross-sectional study of a large multispecialty primary care group practice in the US serving Medicare recipients, despite including all relevant healthcare costs (such as home healthcare, long-term care, skilled nursing facilities and acute care), the resource utilization associated with each sector was not reported [68]. The follow-up period varied across studies, from as short as 3 months [34] to up to 5 years [49,67]. Those that measured the impact of the interventions within a year did not require discounting of costs and health benefits (53%) because these were expressed in present values [73]. However, only 18% of the studies with a time horizon longer than a year applied discounting.
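The discounting step mentioned above converts costs and benefits accruing in future years to present values. A minimal sketch follows; the 3% rate and the cost stream are purely illustrative, since recommended discount rates are set by national guidelines:

```python
# Sketch of discounting yearly costs (or benefits) to present value, as is
# required when an evaluation's time horizon exceeds one year. Year 0 is
# "now" and is not discounted; the 3% rate below is an assumption for
# illustration only.
def present_value(amounts_by_year, rate=0.03):
    """Discount a list of yearly amounts (index = years from now)."""
    return sum(a / (1 + rate) ** t for t, a in enumerate(amounts_by_year))

# Hypothetical cost stream of 1000 per year over a 3-year horizon.
pv = present_value([1000, 1000, 1000], rate=0.03)
print(round(pv, 2))  # 2913.47, less than the undiscounted 3000
```

This illustrates why studies with a time horizon of a year or less could skip discounting: with a single year of data there is nothing to discount.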

Statistical analysis and presentation of data
Full economic evaluations of integrated care interventions mostly applied incremental cost-effectiveness ratio (ICER) analysis (55%), rather than the net monetary benefit (NMB) or net health benefit (NHB) approach (7%). Studies that reported the ICER and NHB estimated the joint cost and effect differences between the intervention and a comparator. Various approaches were adopted to address uncertainty around the reported cost-effectiveness estimates, including presenting cost-effectiveness acceptability curves and cost-effectiveness planes. This involved demonstrating whether the integrated care intervention met or surpassed society's willingness to pay (WTP) for an additional unit of health benefit (43%). For example, in an evaluation of a community-based intervention for frail older adults, Looman et al. graphically reported their ICERs on a cost-effectiveness plane [46]. Using this approach, they showed that, compared to usual care, the new intervention was less costly and more effective in only 0.21% of cases, but more costly and less effective in 78.8% of cases. Because decision criteria such as WTP thresholds or cut-off points were not applied to the results, it was challenging to determine whether this intervention could be conclusively deemed cost-effective [46]. On the other hand, Tanajewski et al. determined that a multidisciplinary discharge program was more favorable than usual care using NMB analysis at various WTP thresholds (£20,000-120,000). They also showed that the more decision-makers were willing to invest in the intervention, the higher the probability of its cost-effectiveness [34]. Subgroup analysis was used to examine the heterogeneity of economic impact or the source of variability, but was conducted in only 47% of the studies in this review. For more information, please refer to Table 2 for the checklist assessing the quality of economic evaluations.
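The two summary measures discussed above can be made concrete with a short sketch. The numbers are invented for illustration and are not taken from any reviewed study:

```python
# ICER: incremental cost per unit of incremental effect (e.g. per QALY).
# NMB:  willingness-to-pay (WTP) times the effect difference, minus the cost
#       difference; positive NMB at a given WTP implies cost-effectiveness
#       at that threshold. All inputs below are hypothetical.
def icer(delta_cost, delta_effect):
    return delta_cost / delta_effect

def net_monetary_benefit(delta_cost, delta_effect, wtp):
    return wtp * delta_effect - delta_cost

# Hypothetical: the intervention costs 2000 more and yields 0.25 extra QALYs.
print(icer(2000, 0.25))                         # 8000.0 per QALY gained
print(net_monetary_benefit(2000, 0.25, 20000))  # 3000.0: positive at WTP=20000
```

This also shows the Tanajewski et al. pattern in miniature: holding the cost and effect differences fixed, the NMB (and hence the probability of cost-effectiveness) rises as the WTP threshold rises.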

Time horizon
Limitations associated with the study timeframe were the most frequently cited barrier to conducting robust evaluations of integrated care interventions. In a cost-effectiveness analysis of multidisciplinary residential care for frail older adults, Vroomen et al. note that: "the [six months] duration of the trial was relatively short because of the high risk for drop out owing to the extreme vulnerability of residents" [45, p. 5]. A further limitation of the study's internal validity was the patients' maturation effect: "patients in a residential home have a heterogeneous mix of chronic conditions that naturally erode over time which makes it difficult to know if an intervention would be able to override the downward trend…in such a short time span" [45, p. 6]. Other authors questioned whether measures such as mortality or quality-adjusted life years (QALYs), often the standard in economic evaluations, were sensitive enough to truly capture the effect of the intervention in short follow-up periods [45,63].
Underestimation of the downstream health and monetary benefits of integrated care interventions was also a significant concern in studies with short time horizons [42]. In a 12-month cost-utility analysis of a collaborative care program targeting patients with depression and cardiovascular illness, Donohue et al. highlight that in this type of intervention, "most cost savings [occur] between the first and second years of follow-up, stressing the continued need for adequately powered and longer-term trials" [44, p. 457]. Zulman et al. argue that when determining an appropriate follow-up period, it is important to account not only for the implementation period but also for the potential initial intensity of patients' care needs: "once patients are enrolled in such programs, it takes time to build their trust, modify health behaviors and improve chronic disease trajectories….which could translate to increased [initial need for] health maintenance and screening, but subsequent reductions in future utilizations" [50, p. 172].

Finding suitable comparators
Authors acknowledged the challenge of finding a comparator population that could serve as an appropriate control for those receiving the intervention. Observational designs were often adopted because of the difficulty in identifying participants for a comparator group. Olsson et al. note that it would have been ideal to randomize community-dwelling older adults to usual care vs. a multidisciplinary geriatric primary care team. However, it was impractical due to the administrative burden: "there were concerns about the difficulty for nursing staff working in two care systems at the same time and for patients possibly comparing the treatment they received" [64, p. 1633]. In many observational studies, the significant baseline differences between those receiving the intervention and usual care emerged as a major threat to the studies' validity. In an integrated intervention targeting high-cost Medicare users, McCall et al. highlight that: "… because the comparison group was not based on random sampling….if the intervention had a disproportionate number of high risk, more cost increasing beneficiaries, then the [the evaluation results] could be biased against the intervention" [57, p. 143]. However, even in RCTs, hidden differences in the context of the comparator and the intervention group could jeopardize the attribution of effect, or causality [15].
To address this potential bias, studies adopted quasi-experimental approaches to adjust for baseline differences between comparison groups. In a cost-effectiveness study of disease management programs in the Netherlands, Tsiachristas et al. note that, "[because] of the observational study design, the patients in the comparators in each disease category differed in disease severity and sociodemographic characteristics at baseline. Therefore, we used propensity score matching to reduce confounding caused by these differences" [40, p. 978].

Risk of contamination bias
Controlling for the potential of contamination bias emerged as a challenge in both observational and nonclustered RCT designs. Often due to feasibility issues, both usual care and the integrated care intervention were delivered in the same care setting, alongside each other. Zulman et al. suggest that in their study, clinicians in the usual care group could have observed and adopted the practices of the intensive case management program in the Veterans Affairs medical care home program [50]. In other studies, this bias was even more difficult to control when plans to spread the integrated care intervention across the regional setting were already underway. This was exemplified by a primary care-based collaborative mental health intervention in the Netherlands. The authors note that: "when the study started, about 65% of the general practitioners in the Hague were already participating in the program" [55, p. 78]. In large scale implementations of integrated care, there is a risk of diluting the true magnitude of impact, as it can prove difficult to confine patients receiving usual care from also accessing these services [63].
Post-hoc evaluation culture
A significant barrier was the implementation of the interventions without planning for an economic evaluation. In a study of a community-based program integrating HIV and nutrition care in Malawi and Mozambique, Bergmann et al. noted that "the project was set up without a research design for rigorous impact evaluation" [66, p. 710]. This limited data accessibility and the suitability of the outcome measures selected for the evaluation: "we would have preferred to directly estimate the effects of the [intervention] on morbidity and mortality but did not have the data to do so" [66, p. 710]. Other observational studies that were not initially planned as experimental study designs had to rely on routinely collected data to estimate the magnitude of impact. For example, claims data from widely accessible electronic information systems were used to measure the impact of a population-based accountable care organization in Germany. While this reduced the intensity of resources required for the evaluation, the authors noted that their approach may have led to the underestimation of the broader impact of the intervention, such as its effect on patient and caregiver out-of-pocket spending [63].

Main results
The results of economic evaluations of integrated care should be placed in the broader context of their implementation and the methodological approaches used in their evaluation. To our knowledge, this is the first review that has critically appraised the methodological quality of economic evaluations of integrated care interventions against best-practice guidelines. Our review found significant differences in quality. Some studies showed poor methodological rigor, challenging conclusions on the cost-effectiveness of integrated care. Similar to Nolte and Pitchforth's review of systematic reviews on the economic impacts of integrated care, we found wide variability across study designs, measurements of costs and outcomes, as well as analytical approaches and presentation of results [5].
Some of the key cited challenges to robust economic evaluations in our study were related to: 1) time horizon of the evaluation; 2) inadequate or lack of comparator group; 3) contamination bias due to potential exposure of those in usual care with the treatment; and 4) a post-hoc evaluation culture.

Interpretation of findings in the context of other studies
Several reviews highlighted similar challenges as impediments to arriving at robust evidence on the economic impacts of integrated care. In a review of interventions targeting frequent users of the emergency department, Althaus et al. note that it was difficult to attribute cost savings to the interventions when a significant number of studies did not have comparators [74]. De Bruin et al. highlight that there was large variation in the types of comparators used in their review of the impact of disease management programs on health expenditures. They found that when there were similarities between the intervention and usual care, it was difficult to observe differences. The ability to contrast the two groups was further challenged by the poor description of usual care conditions in the majority of the studies [13].
Because integrated care can impact a broad range of costs within and beyond the healthcare system, a broader societal perspective is preferred when estimating costs [15]. In our review, the healthcare perspective dominated the economic evaluations. This could be because it requires much less time and fewer financial resources for data collection. While there is variation in the cost perspectives suggested by national guidelines for health technology assessment, they often also recommend reporting costs from the societal perspective. This is particularly the case if the intervention has an expected impact on other sectors [75,76]. In a review of collaborative models for individuals with depression, none of the reviewed economic evaluations included costs beyond the healthcare sector [77]. A narrow scope in the cost perspective was also reported in Wong et al.'s systematic review of economic evaluations of integrated care for cardiac rehabilitation. Using the Drummond checklist to examine the quality of economic evaluations, they found that only 9% of studies included all the relevant costs and consequences for both the intervention and comparator [18]. This narrow scope fails to incorporate the impact of integrated care on other types of care (e.g. social services), other government departments (e.g. justice and education) and productivity levels in the overall economy [32,78]. An alternative would be to follow the recommendations of the specific country of origin, and to include a societal perspective as a sensitivity analysis.

Best practice recommendations
Guidelines for health technology assessment around the world recommend adopting a lifetime horizon [16,17,23]. Drummond and colleagues recommend that, in the treatment of chronic diseases where benefits may have long-lasting implications, it is often necessary to extrapolate the effects and costs of the interventions being compared over a lifetime [20]. However, a short time horizon was acknowledged as a major limitation in most studies. In our review, 81% of the studies had a follow-up period of less than two years. Only one study employed decision-analytic modelling to extrapolate costs and outcomes estimated during the follow-up period to the patient's lifetime [49]. This approach also allows the linkage of intermediate endpoints (e.g. clinical status) to final endpoints (e.g. QALYs and mortality) [23]. The scarcity of model-based economic evaluations in integrated care may be due to a lack of health economic modelling expertise in this field of research. Another reason could be the complex nature of integrated care, with a non-linear relationship between interventions and outcomes [15]. In addition, delays in the implementation of integrated care interventions may mean that the "full" treatment effect is only observed 3-5 years after the start of implementation [79]. Decision models would not be able to overcome this limitation. Furthermore, evaluations with longer time horizons have an increased risk of the intervention eventually becoming usual care or being contaminated by other initiatives. Tsiachristas et al. suggest setting up routine monitoring of key measures as a potential strategy to measure the long-term effects of the intervention beyond the research period [15].
Best-practice recommendations emphasize the need for a comparator in economic evaluations, often in the form of current practice or variations of similar programs [15,16,28]. Our review found that identifying an appropriate comparator can be challenging, particularly for observational studies. To ensure that the evaluation is in fact measuring the causal effect of integrated care, an appropriate control population should be chosen [16,17]. RCTs are commonly used to minimize differences between the compared populations (i.e. intervention and control arms). However, addressing these differences in observational studies may prove difficult and resource-intensive because of the richness of the data and the statistical expertise required. Furthermore, the risk of contamination of the control group by the intervention is elevated, especially when integrated care implementation takes place in large parts of the population. As such, even cluster-RCTs may fail to overcome this risk if not well designed [80]. Quasi-experimental designs provide feasible alternatives in the evaluation of integrated care when control groups can be identified. An approach such as propensity-score matching can reduce observed confounding between the comparators, and is increasingly used in observational studies [81,82]. Approaches such as difference-in-differences, instrumental variables, and regression discontinuity can reduce unobserved confounding between the comparators [15,83].
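The matching step behind propensity-score approaches can be sketched simply. In practice the score itself is estimated (e.g. by logistic regression on socio-demographic and clinical covariates); here the scores and unit labels are invented so the example is self-contained:

```python
# Hedged sketch of nearest-neighbour matching on a propensity score, the
# approach the review describes for reducing observed confounding. Scores
# below are invented for illustration; real scores would be estimated from
# baseline covariates.
def match_nearest(treated, controls):
    """Match each treated unit to the control with the closest score.
    treated/controls: lists of (unit_id, propensity_score) pairs.
    Matching is 1:1 without replacement."""
    pool = list(controls)
    pairs = []
    for uid, score in treated:
        best = min(pool, key=lambda c: abs(c[1] - score))
        pool.remove(best)  # without replacement: each control used once
        pairs.append((uid, best[0]))
    return pairs

treated = [("t1", 0.80), ("t2", 0.35)]
controls = [("c1", 0.30), ("c2", 0.78), ("c3", 0.55)]
print(match_nearest(treated, controls))  # [('t1', 'c2'), ('t2', 'c1')]
```

Matching balances only the covariates that enter the score, which is why the review pairs it with designs such as difference-in-differences and instrumental variables that target unobserved confounding.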
Integrated care is considered a complex intervention, amenable to being tailored to the context in which it is implemented. Economic evaluation plays an important role in this complex and adaptive intervention, learning through feedback loops of patient and provider experiences and outcomes [15]. Our review found that economic evaluations in integrated care were often piggybacking on larger-scale evaluations. In some instances, the plans for evaluation began after the intervention had concluded [66]. When this is the case, researchers may lack control over the types of outcome measures included in the routinely collected data [15]. In a qualitative study examining stakeholder perspectives on the evaluation of chronic disease management programmes in six European countries, lack of an evaluation culture was also cited as one of the main barriers to producing sound evidence [84]. There are several potential explanations for this. Policy makers and practitioners may be unaware of the need for, or benefits of, conducting evaluations [84]. There may be a reluctance by stakeholders to support evaluations due to perceived additional administrative and financial burdens [64]. Finally, in some healthcare contexts, particularly in low-resource settings, there might be a lack of personnel capacity or the necessary skill sets to undertake comprehensive economic evaluations [85]. Nonetheless, with the increased reliance by decision-makers on evidence generated from empirical studies, it is important for evaluators to be embedded early, from the design and planning of the integrated care programme, rather than post-implementation [15].

Strengths and Limitations
The extensive search strategy used to capture the concepts of integrated care and economic evaluation is a major strength of this review. A second strength is that our checklist was adapted from the CHEERS and HTA-DM quality assessment guidelines, which were developed by experts in the field for the purpose of optimizing the reporting and quality of health economic evaluations [17,28]. However, this review should be interpreted in the context of several limitations. Integrated care as a concept has been widely used to achieve various objectives, which explains why a universally accepted common definition remains lacking [86-88]. While our search strategy attempted to include studies that broadly fit within our definition, our review may have missed others. Rather than presenting the review as a compendium of all economic evaluations conducted in integrated care, we hope it provides a snapshot of current practices in the field. Secondly, the checklists provided a guiding framework for critically reviewing the economic evaluations reported by the articles [17]. However, there was room for subjective interpretation, which may have biased the scoring. We attempted to address this bias by having two reviewers independently appraise the articles against the checklist, with disagreements resolved through consensus.

Implications for research, policy and practice
Economic evaluation cannot be viewed as separate from programme evaluation. Rather, it needs to become an integral part of any evaluation effort [15]. Given the wide implementation of integrated care as a viable approach to tackling complex health and social needs, it is necessary for researchers to step up their efforts in understanding how exactly integrated care works and which transferable lessons can be drawn [88]. This necessitates a multidisciplinary approach in research, where economic evaluation forms a crucial part of a mixed-methods approach. Recent developments in applying realist evaluation and multi-criteria decision analysis (MCDA) as part of implementation research in Europe and Australia show promising results [89,90]. However, they also highlight the complexity and resource intensity necessary to evaluate integrated care. For policy makers to create enabling conditions, researchers also need to become more engaged in the promotion of an evaluation culture. Routinely reporting on short-, medium- and long-term outcomes may help foster a better understanding among policy makers of the time horizons necessary for adequate evaluation. Policy makers also need to provide financial and regulatory frameworks for research to become an integral part of designing integrated care initiatives, similar to the Innovation Fund in Germany. More importantly, evaluation needs to be viewed as an integral part of integrated care implementation, and as a powerful tool to promote cultural change [88]. This is where practice needs to accept that (economic) evaluation is not a means to punish or cut resources, but can be used to support quality improvement and system change. In a complex environment, regular ongoing feedback and monitoring are prerequisites to reaching better outcomes [88].
Given the continuing resource restrictions, alongside the inadequate and ineffective use of resources in health and social care, economic evaluation should be high on any priority list, in research, policy and practice.

Conclusion
Arguably, a key challenge to the evaluation of integrated care lies in the complexity of the intervention and the short-term evaluation period. This can make it significantly more difficult to perform rigorous economic evaluations, especially when health economists are not involved in the study design. However, methodological gaps in economic evaluations may be more straightforward to address than cultural barriers to evaluation [20]. Therefore, it is important for evaluators to use best-practice standards when planning and conducting economic evaluations. Work by authors such as Drummond et al. has sought to specify the essential components of good economic evaluations in health care and to develop checklists that can help researchers and practitioners critically appraise the methodological quality of studies [17,19,20]. To build a reliable evidence base for decision-makers and practitioners utilizing evidence from integrated care studies, it is important to determine the mechanisms that influence the economic impacts of integrated care. This will require designing studies that can reliably answer: is this intervention cost-effective, using which resources, in which settings and for whom? [15] With the increased interest in integrated care from policy makers, the reporting of economic evaluations should be further standardized to allow transferability and transparency [20]. Thorough and reliable economic evaluation should be an integral part of informing decision-making on integrated care implementation.

Additional File
The additional file for this article can be found as follows: