The Effect of Network-Level Payment Models on Care Network Performance: A Scoping Review of the Empirical Literature

Introduction: Traditional payment models reward volume rather than value. Moving away from reimbursing separate providers to network-level reimbursement is assumed to support structural changes in health care organizations that are necessary to improve patient care. This scoping review evaluates the performance of care networks that have adopted network-level payment models. Methods: A scoping review of the empirical literature was conducted according to the five-step York framework. We identified indicators of performance, categorized them in four categories (quality, utilization, spending and other consequences) and scored whether performance increased, decreased, or remained stable due to the payment model. Results: The 76 included studies investigated network-level capitation, disease-based bundled payments, pay-for-performance and blended global payments. The majority of studies stem from the USA. Studies generally concluded that performance in terms of quality and utilization increased or remained stable. Most payment models were associated with improved spending performance. Overall, our review shows that network-level payment models are moderately successful in improving network performance. Discussion/conclusion: As health care networks are increasingly common, it seems fruitful to continue experimenting with reimbursement models for health care networks. It is also important to broaden the scope to not only scrutinize outcomes, but also the contexts and mechanisms that lead to certain outcomes.


INTRODUCTION
Fragmented health care leads to poor system and patient outcomes. Fragmentation manifests itself in a myriad of ways, such as duplication of services and lack of involvement, ownership, or communication [1]. Ageing populations and multi-morbidity amplify these issues, making it more relevant to address fragmentation. In order to do so, governments and policymakers increasingly rely on networks of health care organizations [2,3]. As an alternative to market or quasi-market structures, networks enable separate health care entities to work together and coordinate care [4,5]. However, the current ways of paying for care seem to impede coordination within networks. Providers are predominantly reimbursed separately, through traditional payment models such as fee-for-service (FFS) or diagnosis-related-groups (DRGs), leaving the paywalls between organizations intact [6]. It is widely assumed that most traditional models reward volume [7], discourage prevention [8], impede care coordination [7], and stimulate delivering the most profitable services [9]. In essence, traditional models are perceived as not being able to create the right incentives for the integration of care, leading instead to an array of misaligned incentives [10]. Moving away from separate provider reimbursement to network-level reimbursement would support interorganizational coordination, flexible use of resources between organizations, and innovation in delivery design and IT [11][12][13]. Subsequently, it is assumed that developing adequate network-level payment models is essential to achieving high-quality patient care. Health care purchasers, policymakers and providers have correspondingly initiated demonstrations and experiments with novel network-level payment models. However, to date, how these payment models contribute to network performance has not been systematically investigated.
The current study adds to previous research by considering all payment models that are aimed specifically at joint reimbursement of networks. Although previous reviews have focused on various subsets of payment models, these reviews have not made a primary distinction between disbursement to a network and to separate providers. For example, Cattel and Eijkenaar [8] focused on key design features of value-based payment (VBP) initiatives and included 24 papers that shed light on VBP effects, but at the initiative level rather than the payment model level. Vlaanderen et al. [14] conducted an analysis of the characteristics of outcome-based payment (OBP) models and their effects in terms of structure, process, and outcome indicators. Kaufman et al. [15] provide an overview of utilization, care, and outcomes associated with accountable care organizations (ACOs) in the USA. Thus, VBP, OBP, and ACO models have been systematically reviewed separately, but an overview of all network-level payment models, transcending definitions of VBP, OBP, and ACO models, and their performance, is lacking. Our aim is to study how such network-level payment models affect the performance of networks. We summarize this in the following research question: what is the effect of network-level payment models on the performance of care networks? From the resulting comprehensive overview of performance indicators, policymakers and health care professionals can, depending on what performance indicators they deem important, make a more informed decision when implementing a network-level payment model.

THEORETICAL FRAMEWORK
PAYMENT MODELS, NETWORKS, AND PERFORMANCE
Payment models refer to the funding mechanisms that health care purchasers adopt in order to financially reimburse providers of care or, in this case, care networks. The term network-level payment model is used to indicate a payment model in which a set of providers or facilities are jointly reimbursed through a contracting entity (i.e., the network or one network provider), which in turn can then disburse the money received to the providers in the care network. Care networks are defined as sets of two or more legally autonomous providers [see 16] that are tasked with the coordination of care pathways and the execution of clinical interventions across providers [5]. The term provider is used to denote a practice, hospital, or other setting, and not an individual physician, unless otherwise noted. Network performance is defined as the ability of the network to satisfy the payment model's objectives as made explicit in the included studies. In our study, the taxonomy of payment models by Tsiachristas [17] has been used to identify and categorize network-level payment models (henceforth referred to as payment models). Non-network-level models have been excluded from this taxonomy (see Table 1) as they are not the focus of our study.

INTENDED AND UNINTENDED CONSEQUENCES OF PAYMENT MODELS
How payment models incentivize structural change will depend on the payment model. It is assumed that, given the appropriate incentives, providers will be able to deliver the right care at the right time in the right way, and at the right place [18,19]. Under a capitation system, providers receive a periodic lump sum per enrolled patient for a defined set of services. This incentivizes providers to minimize costs, thereby encouraging them to innovate in cost-reducing technologies, select lower-cost alternative treatments, and invest in prevention. The downsides are increased financial risk for providers, and the temptation to stint on care and avoid high-risk patients, often referred to as 'cherry picking' [13,20]. Episode-based bundled payments cover medical services delivered during an episode of care. Providers are thereby encouraged to coordinate and organize care activities within an episodic bundle to eliminate unnecessary and expensive care and reduce costs [7]. However, there is little incentive to avoid unnecessary episodes [12] since more care episodes imply more revenue. Disease-based bundled payments have a broader scope, covering all the care required for a patient with a particular disease during a predefined period. As with episode-based bundled payments, coordination between providers is encouraged. Providers are incentivized to improve quality since they bear the financial burden of complications and avoidable services, such as hospital readmissions. For both bundled payment types, costs that exceed the pre-agreed payment are borne by the provider; similarly, if the costs are less than the payment, providers retain the difference. This approach may lead to stinting on care and cherry picking if adequate quality monitoring is not in place, and patient choice might be limited due to a limited and fixed provider set [12].
In another approach, a global payment is made to cover all medical services for a defined population during a period of time. In the literature, this term is used interchangeably with population-based payment and global budgets. A global payment model shares some properties with bundled payments and capitation but can offer greater managerial flexibility in allocating resources and enables innovation in delivery design [12,13]. A specific downside of global payments is that population health might be prioritized above individual health [12].
These basic payment models are often enhanced with additional payment components: pay-for-performance (P4P), pay-for-coordination (P4C), risk-and-gain-sharing, and shared savings. Risk sharing arrangements, such as risk-and-gain-sharing and shared savings, are intended to increase efficiency in care delivery [20]. In part, this works through weakening the providers' tendencies to overtreat patients [21]. Payers or providers can decide whether to agree to one-sided risk only (upside risk) or two-sided risk (upside and downside risk) and can also tweak the percentages of savings and losses that are shared [22]. In a one-sided risk arrangement, providers share only in gains, whereas in a two-sided risk arrangement gains and losses are both shared. Loss aversion theory argues that losses have a stronger psychological effect than gains [23]. This implies that a two-sided risk arrangement will more strongly incentivize providers, and so have the potential to enhance performance. Providers that want to benefit from shared savings will have to improve in terms of quality and cost measures [24]. All the above payment models are risk-based, except for P4P and P4C. Under P4P, providers receive a payment for meeting predetermined performance indicators, with the main goal being to improve patient outcomes. Newhouse [25, p.203] cautions, however, that "payment on specific process measures of quality […] can distort resource allocation to the measured areas and away from unmeasured areas". Hence, a disproportionate focus on measured aspects can be detrimental to aspects of care that are not incentivized [26]. Via P4C, a designated provider receives a payment to coordinate patient care across a set of services. This is intended to provide financial leeway for patient-provider and provider-provider communication, and to limit unnecessary services, and may furthermore increase "flexibility in how, where, and by whom care is provided" [12, p.5].
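The difference between one-sided and two-sided risk arrangements described above can be illustrated with a minimal Python sketch. The function name, the spending benchmark, and the sharing percentages are all hypothetical and chosen for illustration only; real shared savings contracts typically add features (e.g., quality gates or minimum savings thresholds) that are not modelled here.

```python
def shared_savings_payout(benchmark, actual_spending, savings_share, loss_share=0.0):
    """Return the network's bonus (positive) or repayment (negative).

    benchmark:       agreed spending target for the period (hypothetical input)
    actual_spending: realised spending for the same period
    savings_share:   fraction of savings passed on to the network
    loss_share:      fraction of overspending the network must repay;
                     0.0 models a one-sided (upside-only) arrangement
    """
    if actual_spending <= benchmark:
        # Spending stayed at or below the target: the network shares in the gain.
        return savings_share * (benchmark - actual_spending)
    # Spending exceeded the target: only two-sided arrangements pass on losses.
    overspend = actual_spending - benchmark
    return -loss_share * overspend if loss_share else 0.0


# Illustrative figures only (not taken from any included study).
one_sided_gain = shared_savings_payout(1_000_000, 950_000, savings_share=0.5)
one_sided_loss = shared_savings_payout(1_000_000, 1_080_000, savings_share=0.5)
two_sided_loss = shared_savings_payout(1_000_000, 1_080_000, savings_share=0.5, loss_share=0.5)
```

In this sketch the one-sided network is never worse off than receiving no bonus, whereas the two-sided network repays part of any overspending, which is the asymmetry that the loss aversion argument builds on.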

NETWORK INCENTIVES
Theoretically, all payment models in the taxonomy can provide incentives at the network level. Group-level or network-level payments or 'rewards' stimulate structural changes that are seen as preconditions for optimized patient care [11]. A switch from provider-level to network-level reimbursement implies a switch from individual (i.e., provider or organizational) incentives to network incentives. The terms network and group are used interchangeably in the literature on monetary incentives that underpin payment models. In general, network-level incentives seem to be most effective when the delivery of health care services encompasses "significant interdependencies between group members" [27]. This presumes that, between network providers, high levels of clinical, professional, and organizational integration are present [28]. The intensity of network incentives might be attenuated by an increase in the number of providers working under the same target [29]. That is, an increase in network size leads to a weakening of incentives. Similarly, evidence from systematic reviews indicates that individual-level rewards are more powerful than network-level or group-level rewards [21]. In addition to the properties of the specific payment models discussed in the previous paragraph, such idiosyncrasies of network incentives might also influence performance.
Table 1 Taxonomy of network-level payment models, adapted from Tsiachristas [17].

RESEARCH METHODS
Given the broad nature of the research question [30], the polysemous nature of networks in health care, and the lack of uniform terminology of payment models [10], a scoping review was conducted. Scoping reviews are appropriate for topics where the field of literature is large, complex, ambiguous, and lacking in conceptual boundaries [31]. In our review, we complied with PRISMA-ScR reporting guidelines [32] and followed the five steps specified in the York framework, thereby allowing an iterative process. The process framework consists of (i) identifying the research question (see Introduction), (ii) identifying relevant studies, (iii) study selection, (iv) data charting, and (v) reporting on results [33]. In order to assess the evidence quality of studies, the Effective Practice and Organization of Care (EPOC) criteria table was adapted from Minkman et al. [34]. Evidence levels range from A (systematic reviews and RCTs), through B (controlled studies) and C (non-controlled studies), to D (descriptive, non-analytical studies).

IDENTIFYING RELEVANT STUDIES
To identify relevant studies, a broad systematic search was conducted in six bibliographical databases. An information specialist with expertise in improving literature retrieval for systematic reviews [see 35] was consulted to draft the search strings. The initial string consisted of terms similar to 'payment model' and 'interorganizational network'. A first search of four databases (Embase, Medline Ovid, Cochrane Central Register of Controlled Trials, and Web of Science Core Collection) yielded 3892 hits. Author 1 perused a sample of the identified studies to gain familiarity with concepts and identify additional terms that could serve as input for refining the search string [30]. This modified string was used for the second search in October 2019 and yielded 6069 hits including duplicates. For this search, two additional databases were consulted (EconLit ProQuest and CINAHL EBSCOhost) to further broaden the scope. The literature search was updated in November 2021, eventually yielding a total of 6953 studies including duplicates. Studies up to that date have been included with no earliest cut-off date set. Both the initial and final search strings are presented in the supplementary file. Alongside this bibliographical database search, reference lists were consulted to identify further studies that were eligible for inclusion.

STUDY SELECTION
Studies were included if they were of an empirical nature, peer-reviewed, reported an impact on network performance, described a network-level payment model intervention, and were from an OECD country. OECD countries were chosen since the social and health challenges in these countries call for a well-coordinated system approach [36] that networks can contribute to. Systematic reviews were excluded (although their reference lists were scanned for studies eligible for inclusion), as were articles for which the full text could not be retrieved or whose contents were evidently not related to our research question. A concise list of the exclusion criteria can be found in Figure 1, in which the screening process following the PRISMA guidelines is also illustrated [37]. All potential abstracts and titles were imported into EndNote X9 [38]. After deduplication, the remaining titles and abstracts were exported to an MS Excel workbook for further manual screening. All four authors were involved in the process. Before actual screening began, a sample of 90 papers was discussed to align the team members' interpretations of the exclusion criteria. For each potential inclusion, title and abstract screening was conducted by at least two reviewers independently in a double-blind fashion. Author 1 screened all titles and abstracts, and Authors 2, 3, and 4 each screened one-third of the total. Inconsistencies were resolved between the two reviewers who had screened the specific title and abstract. Once this filtering process was completed, the full texts of the still potentially relevant papers were screened by Author 1, and another reviewer was consulted if there were doubts as to whether to include an article.

DATA CHARTING
First, each study was analysed to identify its year, author, country, methodology, intervention program, network configuration, payment model, payment flow, study population, sample size, the investigated indicators of performance, and whether the performance on each indicator increased (+), decreased (-), or showed no (statistically significant) effect (0) under the payment model. The taxonomy discussed in the theoretical framework section was used to code payment models. A distinction is made between payment flows from payer-to-network (i.e., to the network) and network-to-provider (i.e., in the network). As a final step, all the indicators were inductively placed in one of four categories [39]: (i) quality of care, (ii) utilization, (iii) spending, and (iv) other consequences. The fourth category is used for indicators that cannot be assigned to any of the first three categories. These tend to be more abstract measures such as 'level of collaboration' or 'level of integration'. A narrative synthesis of the evidence was conducted.
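The charting scheme described above (one category and one effect direction per indicator) lends itself to a simple tally. The following Python sketch is illustrative only: the function name and the example rows are hypothetical and do not reproduce the authors' Excel-based charting or any data from the included studies.

```python
from collections import Counter, defaultdict

# The four indicator categories and three effect codes named in the text.
CATEGORIES = ("quality", "utilization", "spending", "other")
DIRECTIONS = ("+", "-", "0")


def tally_effects(charted_rows):
    """Count effect directions per indicator category.

    charted_rows: iterable of (category, direction) pairs, one per charted indicator.
    Returns a dict mapping each category that occurs to its direction counts.
    """
    tally = defaultdict(Counter)
    for category, direction in charted_rows:
        if category not in CATEGORIES or direction not in DIRECTIONS:
            raise ValueError(f"unexpected coding: {category!r}, {direction!r}")
        tally[category][direction] += 1
    return {c: dict(tally[c]) for c in CATEGORIES if c in tally}


# Hypothetical charted indicators, for illustration only.
example_rows = [
    ("quality", "+"),
    ("quality", "0"),
    ("spending", "+"),
    ("utilization", "-"),
    ("quality", "+"),
]
summary = tally_effects(example_rows)
```

A tally of this kind supports, but does not replace, the narrative synthesis: it shows how often each direction was observed per category without weighting studies by design or sample size.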

RESULTS
In total, 6960 studies were identified, including seven additional studies that were identified through reference list checks (see Figure 1). Of those, 427 were found eligible for full-text screening. This screening eventually reduced the number of studies to include in the qualitative synthesis to 76.

STUDY CHARACTERISTICS
A comprehensive overview of all the included studies can be found in Table 4 (see below). Most articles stem from the most recent decade (N = 71), and only two of the five older studies were published before 2000. Studies mainly employed quantitative research designs, and, if not, mixed-method designs were employed (see Table 2). Most studies were performed in the USA (N = 70), the others coming from Germany (N = 2) and the Netherlands (N = 4). This might explain the dominance of payments to ACOs as the networks under investigation. Capitation-based payments (N = 4), disease-based bundled payments (N = 5), and P4P (N = 8) were addressed in a total of 17 studies, while the remaining studies focused on global payments. The latter were often combined with additional components such as shared savings (N = 45), shared savings plus P4P (N = 13), and pay-for-coordination (N = 1). Most studies lacked precise descriptions of the network configuration, and payment flows to a network (N = 68) were far more common than payment flows in a network (N = 8). The studied populations ranged from disease-specific groups to entire populations served by a network. The quality of evidence was mixed, but consisted predominantly of controlled studies (N = 65) (see Table 3). For studies with evidence level B, the results presented in Table 4 are statistically significant. For evidence level C studies, significance was only reported in two studies [40,41]. Given the exclusion criteria we had set, no studies were graded A (RCTs) or D (descriptive studies).
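As a reading aid, the counts reported in the search, selection, and study characteristics sections can be checked for internal consistency. The Python sketch below uses only figures stated in the text; the grouping labels are ours, and the number of evidence level C studies is inferred (76 minus the 65 controlled studies, given that no study was graded A or D) rather than reported directly.

```python
INCLUDED_STUDIES = 76

# Breakdowns as reported in the text; each should account for all included studies.
reported_breakdowns = {
    "publication period": {"most recent decade": 71, "older": 5},
    "country": {"USA": 70, "Germany": 2, "Netherlands": 4},
    "payment model": {
        "capitation": 4,
        "disease-based bundled payment": 5,
        "P4P": 8,
        "global payment + shared savings": 45,
        "global payment + shared savings + P4P": 13,
        "global payment + shared savings + P4C": 1,
    },
    "payment flow": {"to the network": 68, "in the network": 8},
}

totals = {name: sum(counts.values()) for name, counts in reported_breakdowns.items()}
for name, total in totals.items():
    assert total == INCLUDED_STUDIES, f"{name} sums to {total}"

# Database hits after the November 2021 update plus reference-list additions.
identified_total = 6953 + 7

# Inferred, not reported: studies that are neither A, B (N = 65), nor D.
implied_level_c = INCLUDED_STUDIES - 65
```

Each breakdown sums to the 76 included studies, and the identified total matches the 6960 records reported at the start of the results.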

PERFORMANCE OF CARE NETWORKS
In general, the results of the studies show that payment models have diverse effects on the performance of a network.

Capitation
From the studies, it can be concluded that a capitation approach, either stand-alone or in combination with elements of risk-and-gain-sharing or P4P, is an effective payment model to reduce spending [42] and improve most types of health care utilization [42][43][44][45], without affecting the quality of care [45]. With regard to utilization, both timely discharge and the length of home health episodes showed the desired increase, and inpatient hospital admissions decreased as was anticipated [42,44,45]. Most visit types were positively impacted for home health beneficiaries and community-dwelling elderly: emergency department (ED) hospital visits and home health visits decreased, whereas office-based and preventive visits increased [42,45]. However, HMO enrollees experienced an unwanted decrease in physician visits [43]. No effects were found for one prevention activity (colonoscopy screening) and hospital readmission rates [45].

Disease-based Bundled Payments
Four of the five studies that considered disease-based bundled payments to the network focused on diabetes management programs [46][47][48][49]. In terms of utilization, use of specialist care decreased as expected and hoped for, but eye testing also decreased, and this had not been an intended outcome. All other measures of medical testing increased as was envisioned [47]. Furthermore, the use of institutional post-discharge facilities was successfully reduced [50]. The model negatively impacted performance on total spending and on medical specialist and medication spending, but post-discharge spending and primary care spending were curbed [46,47,49,50]. One qualitative study [48] mapped other consequences and found some positive effects (better collaboration, greater transparency, and better process quality) but also some negative ones (increased administrative burden, greater price variations, and unwanted dominance by GP care groups). Quality indicators were identified in one study, indicating no significant effect on mortality and a desired decrease in readmissions, with the exception that readmissions for medical episodes were not significantly affected if the bundled payment was not in the setting of an ACO [50].

Pay-for-performance
Of the eight studies on P4P, one described P4P as a means to reimburse on the network level [51], one focused on payment flows both within and to the network [52], while, in the rest of the studies, P4P was used to make disbursements to individual providers in the network. Levin-Scherz et al. [51] only studied the utilization of diabetes-related services: screening and testing were successfully intensified, but a form of asthma therapy was unaffected. The results from the seven other studies are mixed in terms of both quality and utilization [44, 52-57]. Marton et al. [44] observed an unsought increase in the utilization of health care professionals, whereas utilization of outpatient clinics and length of stay were successfully reduced. Substance use disorder (SUD) screening, blood lead level screening and visits that focus on prevention (well care visits) increased as hoped. However, treatments for ADHD and SUD were not affected [54,56]. An overall composite measure of quality showed desired improvements [53], but a more detailed look reveals that the prevalence of asthma, pharyngitis, upper respiratory infection, and rotavirus were not affected, and the performance related to several types of immunizations varied widely [56].
Spending was investigated in one study, which found no significant effects on shared savings or outpatient spending [57].

Global payment with shared savings
Under this payment model, quality tended to improve and, if not, to remain stable [40, 58-75]. The same was true for spending [59-61, 66-68, 72-89], whereas the effects on utilization were more diverse [41, 58-61, 63, 67, 68, 72, 74-76, 82-84, 86, 88, 90-100]. Although quality improved overall, some negative outcomes could be observed. For instance, the percentage of patients that met the quality indicator for LDL-cholesterol testing and the number of people identified as having a depressive disorder had not improved, the latter hinting at an under-detection of depressive disorders [66,71]. Furthermore, medication adherence deteriorated in the first three years after payment model implementation, and adequate care for patients with depression was also negatively affected [88,100]. Findings related to spending performance were clearly mixed. Some studies indicated that spending was successfully curbed overall [59,73,74,85], whereas other studies showed no improvements in general [66, 71, 77, 79, 84, 86-88]. McWilliams et al. [68] found a more nuanced situation: declining spending rates for networks adopting this payment model in 2012 but not in those starting in 2013. These effects of the timing when a network adopts the model are visible specifically in the spending trends of hospital-integrated ACOs (as opposed to physician group ACOs) and for skilled nursing facilities [80,81]. Overall, shared savings arrangements with increased risk exposure show a more positive effect on spending than arrangements with less provider risk [67,72,82]. For arrangements with increased risk exposure, the differences in spending performance could be explained by the number of years using, and hence experience with, the model [66,72] and also by spending category (Medicare Part D or A/B spending) [83].
Performance in terms of utilization varied widely, especially for visits and hospitalizations [58-60, 72, 96]. Some differences in visit rates seem to be explained by location and ACO orientation (primary care or specialty-oriented) [60,96]. Furthermore, use of low-value care (i.e., care that does not or only minimally benefits patients) was not affected according to Modi et al. [94] whereas Schwartz et al. [82] did show favourable reductions. Heightened levels of provider risk did seem to play an important role in increasing testing: some studies showed that the amount of testing was successfully increased [59,67], although others contradicted this [68]. Findings on performance in terms of screening for breast cancer are contradictory. One study [93] observed an unwanted decrease in mammography screening, whereas other studies demonstrate desirable increases in screening [75] or appropriate screening (which refers to the practice of increasing screening rates for patients likely to benefit and decreasing screening rates for those unlikely to benefit) [41,95]. Rates for other types of cancer screening (cervical, prostate and colorectal) were successfully increased [75,93,95].
For all three categories (quality, utilization, and spending), indicator-level differences are in part attributable to geographical state [60], entry cohort [66,68,80,81,89,96,97], and performance year [66,72,75,88,89]. It was observed that performance does not necessarily improve with time; the effects may slip back from one year to the next. In terms of utilization, the type of disease that is being screened for [93,95] or the type of low-value service [82] seems to explain indicator-specific differences. Differences in quality at the indicator level (e.g., the number of readmissions) can be linked to the type of surgical procedure [61] or to the level of risk [66,69]. In shared savings arrangements with little provider risk, two of the ten measures of patient experience improved whereas, when there were higher levels of risk, improvements in patient experience were lacking [101]. Concerning other consequences, the proportion of vulnerable patients served by physician groups was not significantly changed, and neither was the adoption of novel technologies for six surgical procedures [99,102].

Global payment with shared savings and pay-for-performance
This payment model led to some improvement in utilization rates [103][104][105], in quality [103, 106-111], and in spending [105][106][107][108][109]. Utilization did improve for tobacco cessation treatment with increased use of related therapies and drug regimens [104]. In contrast, with the exception of LDL-cholesterol testing, this model had no effect on testing and screening, overall drug utilization, and admission rates for ambulatory-care-sensitive conditions (ACSCs) [103,106,112,113]. The model's effects on substance use disorder services depended on the patient population [103]. The majority of quality indicators showed positive results. Adult preventive care quality (an aggregate indicator for several screening measures and antibiotic use) improved over time [107][108][109] and Chien et al. [110] revealed that quality in terms of measures linked to P4P improved but that no effects were observed for quality measures not tied to P4P. Except for patients up to 21 years of age, total medical spending was successfully contained under this payment model [110]. For specific spending indicators, the findings varied, with SUD spending and drug spending trends unaffected [103,112]. Turning to other consequences, Blewett et al. [114] showed that adopting this payment model in the setting of the Integrated Health Partnership in Minnesota led to the forming of community partnerships and service integration.

Global payment with shared savings and pay-for-coordination
Only one study, on the Total Cost and Care Improvement (TCCI) initiative, investigates a model that combined a global payment with shared savings and pay-for-coordination. Afendulis et al. [115] showed that this specific model had no effects on either utilization or spending, while quality was not investigated.

DISCUSSION
This review compiles the current evidence on the effect of various network-level payment models on the performance of care networks. The empirical results on performance for a set of payment models are mixed. Overall, no single payment model was associated with consistent improvements in network performance on all three criteria categories (utilization, spending, and quality). However, a more detailed look at the individual categories reveals some insights. First, concerning quality, the papers reviewed found that, depending on the quality indicator investigated, quality generally increased or at least remained stable under whichever payment model they were investigating. The same can be said for utilization. Furthermore, all but two payment models showed improved performance in terms of spending. A negative effect on spending performance was found when adopting the disease-based bundled payment model, which failed to curb spending in most instances. Looking at other consequences of these payment models for care networks, some studies identified improvements in performance indicators related to collaboration. However, these conclusions were almost entirely related to the effect of making payments to the network, and the very few studies that investigated payments within the network only addressed the P4P model. Our findings support most, but not all, of the theory-based expectations of the effects of payment models on network performance. The expectation is that, under risk-based payment models such as capitation, disease-based bundled payment, and global payment, providers will be incentivized to minimize costs, control their volume by proactively monitoring utilization and spending, and invest in prevention to curb downstream health care use [13,20,116]. However, our analysis indicates that only capitation proved able to improve performance in terms of both spending and utilization.
When applying disease-based bundled payments, performance in terms of utilization improved as predicted, but spending was not contained. In their study, Mohnen et al. [46] suggest that these results could be due to the negotiated contract working out well for the provider (a high bundle price) and that the short length of their study following the introduction of the scheme might not reveal longer term effects. Turning to the global payment approach, performance in terms of spending and utilization in the various studies was found to generally improve or at least remain stable. In the studies where shared savings had been added to the basic global payment approach, we found that shared savings arrangements where there was a significant risk element showed somewhat better performance in terms of spending compared with arrangements with less risk. This finding corresponds with the view that risk sharing arrangements induce cost-conscious behaviour [117]. The payment models discussed above are, by their very nature, more focused on cost containment than on quality improvement [13,118]. This focus has the associated risk of stinting on care [12]. However, our results do not reveal any adverse effects on the quality of care: quality improved or remained stable, with no clear differences between the models.
P4P has gained much attention in the scholarly literature as it is expected to enhance performance by financially incentivizing providers to deliver the best care. However, the evidence from our analysis is not consistently positive, a finding that is in line with earlier reviews of P4P [119,120]. Further, our results do not convincingly demonstrate that P4P has added value over approaches based on a global payment plus shared savings. That is, no meaningful performance differences could be discerned between global payment plus shared savings arrangements with or without additional P4P. Cattel and Eijkenaar [8] offered a potential explanation for this: that P4P is only a small part of the total reimbursement received by a provider. Following this line of reasoning, the P4P incentive in relation to global payment plus shared savings might thus have been too small to have a significant impact on performance.
Also, our results show that the relation between payment models and effects is not necessarily stable but depends on several other factors. For instance, our results suggest that the cohort entry year (starting year of the payment model), scope of services, and timing of the performance assessment (years since implementation of a payment model) explain differences in performance. In terms of entry cohort, our review shows that early ACO entrants seem to do better overall in improving performance. Related to this, McWilliams et al. [81] found that, for ACOs offering a wide range of services (hospital-integrated ACOs), but not for narrow-scoped ACOs, there were performance differences between early and late adopters. Others have also identified scope of services as one of eight organizational attributes that might explain performance differences between early and late adopters, alongside other attributes such as prior experience with payment reform [121,122]. In terms of changes in the years following the introduction of network payments, it seems that initial performance improvements tail off in later years. Thus, improvements might not continue and may even recede as time goes by. The studies that give insight into longer-term performance span at most three to five years. Other than this, evidence on the sustainability of incentives that derive from the payment models is lacking. More research on incentive sustainability and, accordingly, longer-term impact on performance is warranted. Besides how long performance is observed, it is important to consider what performance is observed or neglected. Except for indicators of quality, patient-reported experience and outcome measures (PREMs and PROMs) have hardly been encountered in our study. As such, it is debatable whether the patient perspective is sufficiently covered in the indicators.
This review has several limitations. First, the insights are mainly drawn from studies in the USA. ACOs were formed after the passing of the Affordable Care Act in 2010 as an instrument to improve patient care but also to reduce costs, in order to tackle the 'affordability crisis' of the US health system [123]. This context might explain the predominance of the USA setting in our review, which limits generalizability. Another limitation is that the implementation of alternative payment models was generally part of a myriad of concurrent interventions, making it difficult to disentangle the effect of a payment model from those associated with other interventions. Additionally, the studies that investigated non-commercial ACOs (Medicare Shared Savings Program and Pioneer) were not explicit as to whether the risks associated with shared savings were one- or two-sided. Hence, we cannot draw any inferences on the relation between the sidedness of risk and performance.
It seems that networks are generally able to improve their performance under the investigated payment models; performance only occasionally remained unchanged and rarely deteriorated. It would be valuable to investigate what circumstances are required to achieve a certain performance. This aspect was emphasized by Kaufman et al. [15, p.270], who state that "looking at outcomes alone misses important information regarding what it takes to produce those outcomes". Here, further research could adopt a mixed-methods approach, combining qualitative research that uncovers contexts, mechanisms, and interpersonal dynamics within networks with quantitative methods that measure quality, utilization, and spending outcomes at the network level. This contextual and interpersonal perspective would be a valuable addition to studies that have comprehensively investigated the more technical aspects of payment reform, such as key design features of payment models [14,124,125]. Furthermore, although bundled payment evaluations are omnipresent in the literature, more research is needed into multi-provider bundled payments, as most evaluations focus on single-provider bundled payments. Additionally, to date, provider participation in reformed payment methods is largely voluntary, although policymakers are exploring the possibilities of mandatory participation [126]. Developing a 'theory-based understanding' [127] of the contexts and mechanisms under which certain outcomes are produced, payment being one of many mechanisms [128], could help providers prepare for future, possibly mandatory, payment reform.

CONCLUSION
The aim of this study was to unravel the effects that network-level payment models have on the multidimensional performance (quality, utilization, spending, and other consequences) of care networks. Although network-level reimbursement schemes are still in their infancy, our review shows that network-level payment has the potential to improve network performance. Given that health care networks are becoming increasingly common, it seems fruitful to continue experimenting with network-level payment models. In future studies, it will be important to broaden the scope beyond outcomes alone and to also take into account the contexts and mechanisms through which networks adopt and implement payment models.