An Instrument to Measure Maturity of Integrated Care: A First Validation Study

Introduction: Lessons captured from interviews with 12 European regions are represented in a new instrument, the B3-Maturity Model (B3-MM). B3-MM aims to assess maturity along 12 dimensions reflecting the various aspects that need to be managed in order to deliver integrated care. The objective of the study was to test the content validity of B3-MM as part of SCIROCCO (Scaling Integrated Care into Context), a European Union funded project. Methods: A literature review was conducted to compare B3-MM’s 12 dimensions and their measurement scales with existing measures and instruments that focus on assessing the development of integrated care. Subsequently, a three-round survey conducted through a Delphi study with international experts in the field of integrated care was performed to test the relevance of: 1) the dimensions, 2) the maturity indicators and 3) the assessment scale used in B3-MM. Results: The 11 articles included in the literature review confirmed all the dimensions described in the original version of B3-MM. The Delphi study rounds resulted in various phrasing amendments of indicators and assessment scale. Full agreement among the experts on the relevance of the 12 B3-MM dimensions, their indicators, and assessment scale was reached after the third Delphi round. Conclusion and discussion: The B3-MM dimensions, maturity indicators and assessment scale showed satisfactory content validity. While the B3-MM is a unique instrument based on existing knowledge and experiences of regions in integrated care, further testing is needed to explore other measurement properties of B3-MM.


Introduction
Health systems around the world are under great pressure to drive forward transformation in order to meet the evolving needs of their populations.The traditional disease orientated approaches currently provided no longer suffice to meet the people's needs [1].In many countries care is too often fragmented and has clear deficiencies in quality, inducing low responsiveness of the health system and low satisfaction with health services [2].To address these challenges, the transformation towards integrated care has the potential to repair deficiencies in order to obtain accessible, quality, effective and sustainable health care.
Integrated care initiatives are being developed around the world [3][4][5].Countries in Europe have endeavoured to improve the performance of their health systems.Many have implemented different types of integrated care programmes representing a diversity in their nature and scope of approaches [6].This is not surprising since the transition to integrated care is a complex undertaking, while sufficient support of a systematic understanding of integration in health systems is scarce [7][8][9].The complexity of integration is reflected in the definition of integrated care provided by Kodner [10, p. 12]: "[a] multilevel, multi-modal, demand driven and patient-centred strategy designed to address complex and costly health needs by achieving better coordination of services across the entire care continuum.Not an end in itself, integrated care is a means of optimizing system performance and attaining quality patient outcomes."As a response to the call for the establishment of a common language and framework of integrated care to better understand integrated care and guide empirical research [10,11], several studies have attempted to clarify the concepts underpinning integrated care [8,12].
In particular, the conceptual framework of Valentijn et al. can be used to aid an understanding of the concept Introduction: Lessons captured from interviews with 12 European regions are represented in a new instrument, the B3-Maturity Model (B3-MM).B3-MM aims to assess maturity along 12 dimensions reflecting the various aspects that need to be managed in order to deliver integrated care.The objective of the study was to test the content validity of B3-MM as part of SCIROCCO (Scaling Integrated Care into Context), a European Union funded project.Methods: A literature review was conducted to compare B3-MM's 12 dimensions and their measurement scales with existing measures and instruments that focus on assessing the development of integrated care.Subsequently, a three-round survey conducted through a Delphi study with international experts in the field of integrated care was performed to test the relevance of: 1) the dimensions, 2) the maturity indicators and 3) the assessment scale used in B3-MM.Results: The 11 articles included in the literature review confirmed all the dimensions described in the original version of B3-MM.The Delphi study rounds resulted in various phrasing amendments of indicators and assessment scale.Full agreement among the experts on the relevance of the 12 B3-MM dimensions, their indicators, and assessment scale was reached after the third Delphi round.

Conclusion and discussion:
The B3-MM dimensions, maturity indicators and assessment scale showed satisfactory content validity.While the B3-MM is a unique instrument based on existing knowledge and experiences of regions in integrated care, further testing is needed to explore other measurement properties of B3-MM. of integrated care [8] and shows the complexity of what transformation to integrated care delivery entails.The framework identifies key elements for achieving integrated service delivery which are organised into six dimensions of integration.The features have complementary roles on the micro (clinical integration), meso (professional and organisational integration), and macro (system integration) levels to deliver comprehensive services that address the needs of people and populations.Functional and normative integration establish connectivity across the micro, meso and macro level.Transitioning to integrated care involves various related activities taking place at different levels of the health system and including diverse actors and organisations with various perspectives on integrated care.
To achieve a successful transformation to more integrated care systems, insights are needed into what factors contribute to the progress and success of integrated care interventions.However, there is a lack of substantiation of the working mechanisms in integrated care, partly stemming from poor or non-existent evaluation and measurement of integrated care interventions [13].
Capturing the complexity of measuring integrated care is a challenging task.A first difficulty is that measurement is complicated by the conceptual haziness adhering to integrated care which thus limits the theoretical foundations of existing instruments [14].A second difficulty is the demonstration of association between the changes in services as part of the integration efforts' and their outcomes [15][16][17].The linkage between changes in services and service outcomes is problematic because most patient or service user outcomes do not emerge from linear cause and effect chains [18].Current measures of quality in health care, such as the structure-processoutcome model, do not clarify the underlying mechanisms governing the components of integrated care [19][20][21][22].A sound analytical method for evaluating the outcomes of integrated care programmes, which would provide insight into why and where they are effective, is lacking [22].Moreover, a recent review by Bautista et al., which has compared 200+ instruments of integrated care by looking at their measurement properties, found that most measurement properties of existing instruments need to be improved [23].Hence the need for measurement, preferably based on sound evidence, of integrated care interventions which captures the complexity of integrated care as reflected in its multiple components and dynamic nature.This measurement will enable identification of potential problems in progress.The insight obtained into the relevant success factors will support the further development of integrated care.
A number of measurement models may be helpful in this activity.One model which has taken the complex dynamic and multiformity nature of integrated care into account and provides insight in the development process of integrated care, by describing four developmental phases, has been designed by Minkman et al. [12].This model intends to be used as a quality management model for integrated care supporting the further development of integrated care practices.The model is, however, developed and used in the context of Dutch disease management programs and it is not clear how the approach extends to complex co-morbidities and long-term conditions.Furthermore, to transfer knowledge about successful integrated care interventions to other settings and thus support the development and scaling up of these interventions, it is important to consider the specific local conditions that influence the implementation and sustainability of a particular integrated care intervention [6].
A second model, has been developed by the B3 Action Group on Integrated Care of the European Innovation Partnership on Active and Healthy Aging (EIP on AHA) [24].Unique to the B3-Maturity Model (B3-MM), is that it is derived from a pragmatic bottom-up approach with decision-makers involved in integrated care delivery from 12 European countries.These experts were interviewed about how healthcare systems are attempting to deliver more integrated care services to citizens.The rich collection of lessons learned are structured into 12 dimensions and reflect the various activities that need to be managed in order to deliver integrated care [25].The B3-MM explicitly focuses on the need to understand the context and environment (i.e. the regional delivery system and political and organisational environment) of integrated care interventions.The goal of the B3-MM is to provide a self-assessment tool for European regions to assess their maturity in the provision of integrated care, thereby revealing strengths and areas for improvement.To demonstrate B3-MM's full potential as a tool for measuring the maturity of integrated care, testing and validation of the tool is, however, needed.
Validity is an important feature in selecting or applying an instrument and is defined as 'the degree to which an instrument truly measures the construct(s) it purports to measure' [26].The construct is a well-defined and precisely demarcated subject of measurement.Three forms of validity can be determined: content validity, criterion validity and construct validity [27].The purpose of a content validation study is to determine whether the instrument adequately represents the construct under study [28].Assessing content validity of an instrument is useful as it provides information on the representativeness and clarity of each item of the instrument.Furthermore, substantial suggestions are obtained to improve the measure, saving numerous revisions of the untested measure through several pilot evaluation studies [29].The improved instrument can then be used in a pilot study to assess other psychometric properties.
This study has two objectives.As part of the European project SCIROCCO [30], it is to test the appropriateness of B3-MM's dimensions, maturity indicators and assessment scale.Moreover, it also aims to test the content validity of B3-MM.In doing so, this paper reports on the following research questions fundamental to the study: 1. How is maturity of integrated care measured by instruments identified in the scientific and nonscientific literature? 2. How relevant are the dimensions, maturity indicators and assessment scale of B3-MM according to international experts in the field of integrated care?

Strategy for expansion of integrated care
Since 2013, the B3 Action Group on Integrated Care of the EIP on AHA has been collecting good practices in integrated care in Europe [25].The extensive collection of good practices has provided a better understanding of the existing solutions, resources and expertise in integrated care delivery.However, the collected good practices are often limited to a particular pilot, project or region and the ambitions of the EIP on AHA and the B3 Action Group in particular is to promote the scaling up of these local initiatives throughout Europe [25].In order to meet this ambition, the challenge remains on how to best leverage the existing evidence and support scaling up of good practices in Europe.
There is an increase in literature describing frameworks for scaling health interventions, the majority of which has an explicit focus on scaling up health actions in low-and middle-income country contexts [31].However, literature is sparse on scaling up long-term care innovations in developed healthcare systems [32].A 2016 study by Nolte et al. [4], examining three pilots in integrated care delivery, focussing on their development, implementation and sustainability including how they impacted the wider system context.It showed that the wider dissemination of the projects studied occurred in an incremental and somewhat random way.To guide the necessary transformation a formal strategy for expansion is needed [4], preferably based on sound evidence.

The B3 Maturity Model
Recognizing the need for a structured approach which could stimulate path-breaking changes towards more sustainable health and care systems, partners of the B3 Action Group on Integrated Care of the EIP on AHA developed the B3-MM (to obtain a more standardised approach for scaling-up integrated care throughout Europe).The B3-MM is derived from an observational study, based on interviews with decision-makers in 12 European countries, or regions within a country, responsible for health care (namely Attica, the Basque Country, Catalonia, Galicia, Northern Ireland Saxony, Medical Delta, Olomouc region, Puglia region, Scotland, Skane, and South Denmark) over 18 months in 2014-2015.The interviews involved asking three sets of questions, to uncover i) the extent of integration already achieved, ii) the journey taken to get to this point, and iii) a view of future plans and investments.The outcomes of the study served as the baseline for the development of the B3-MM [25].The maturity model intends to serve as a selfassessment tool for regions or health care systems that aim to assess progress along 12 dimensions.These dimensions reflect the various aspects to be managed in order to deliver integrated care (Figure 1) [30].By considering each dimension, assessing the current situation within a healthcare system, and allocating a measure of 'maturity' (on a 0-5 scale), it should become possible to develop a simple graphical representation (i.e.spider diagram) of maturity level of the region/healthcare system, including its strengths and weaknesses in the path towards integrated care delivery.Using these insights, and comparing the findings with other regions that have conducted the same self-assessment process, should inform the complementary regions or healthcare systems about the possibility to progress with further knowledge transfer activities in order to improve their maturity in integrated care.The process of information sharing on lessons learned could help other regions is expected to speed up the adoption of integrated care.

Maturity Models
The art of measuring maturity displayed in the Capability Maturity Model (CMM) was introduced in the mid 80's by the Software Engineering Institute (SEI) -Carnegie Mellon University [33].Since then, several disciplines, especially those in the field of information systems, have successfully used maturity models as a way to assess the value and improve the competence of organisations [34].A maturity model is regarded as a conceptual model which is characterised by the display of several maturity levels representing the developmental capabilities of organisations [35].A limited number of studies have adapted these maturity models to the healthcare domain or proposed healthcare specific maturity models [36].Blondiau et al. [37,p. 2] state that these healthcare specific maturity models show "a staged representation of an actual state in relation to a potentially achievable goal state and a description of steps required to achieve this objective."Several models already exist, including the HIMSS Analytics Continuity of Care Model [38], TEMPEST (an Integrative model for health technology assessment) [39], and Maturity Matrix to Support Health and Social Care Integrated Care Partnerships [40].However, the three above mentioned models tend to be either too simplistic [38,40], focusing for example only on the functionality of IT systems, or rather complex, given that they offer a volume of different factors to consider [39].
By regarding the development, implementation and scaling up of health innovations as a multi-stage process [41], the rationale of the B3-MM being a maturity model should be found in the evolution of integrated care services.The B3-MM displays the development of a regional system on several dimensions to achieve integrated care delivery.The B3-MM is considered to be a practical model, and thus easy to use, to comprehensively assess the maturity of the progress of a regional system thereby uncovering gaps and areas for improvement in the development of integrated care delivery.The insights gained by employing the B3-MM are intended to be used as a starting point from which regions with complementary strengths and weakness could be matched and start to share their lessons learned on specific areas.In doing so, the B3-MM is intended to be used to guide the process on how developments in one jurisdiction can inform developments in other regions.This process allows a tailored approach for regions in their journeys towards (more) integrated care as the regions can decide for themselves which areas attention should be paid to in order to speed up the adoption of or the development towards integrated care.Depending on the local needs of the regions, these tailored approaches can operate at the several dimensions and levels of integrated care [8].Sensitivity to the differences between countries in terms of local needs and context to obtain a tailored approach for achieving progress in integrated care is important, as external contextual factors have been found to support the successful implementation of an integrated care model [42].

Methods
In this study, we conducted a literature review and a Delphi study to test the content validity of B3-MM as instrument to measure the level of maturity of integrated care.Content validity can be determined using both quantitative or qualitative methods [43].A qualitative approach consists of an accurate analysis of the representativeness and clarity of items in the literature and by consultation of experts [44].Evidence of content validity is usually obtained by having knowledgeable people look at the test items and make judgments about the appropriateness of each item and overall coverage of the domain [45].

Literature review
A review was conducted to identify articles, papers and/or reports focusing on measures and instruments of the maturity of integrated care.Moreover, we were

BREADTH OF AMBITION STANDARDISATION & SIMPLIFICATION
interested in describing and comparing the dimensions, indicators, measurement scales, and the psychometric property content validity of the selected measures and instruments.
The literature search consisted of two parts.For the first part, we built on the work of Bautista et al. [23] who recently conducted a systematic review in MEDLINE/PubMed on measurement properties of instruments measuring integrated care.The authors selected articles from the systematic literature review which focused on measures and instruments of the development of integrated care with indications of "maturity", "phase", "level", or "degree" of integrated care.
To broaden the search for articles, a narrative review was undertaken.In narrative reviews, the authors have the objective to identify, evaluate and synthesize what is already known about a topic [46].The preliminary search started in the electronic databases Cochrane, Google, Google Scholar, GreyLit, IDEA and OpenGrey using a combination of search terms, as shown in Table 1.The final search was restricted to the databases which retrieved adequate hits; Google (Filter: English only), Google Scholar and IDEA.The search terms used included terms referring to the construct, integrated care, and terms referring to an instrument.We used the terms from the study of Bautista et al. [23] who derived the terms from the work of Uijen et al. [47] and Terwee et al. [48].We added search terms indicating a measurement feature of an instrument.The final key terms used in the ultimate search strategy are presented in Table 5.
To be included in the review, we used the two eligible article criteria: 1. availability of full-text English document; (Due to the large number of hits, we limited the search to that of English language only when possible); 2. description of items/constructs/measurement scales of measures and/or instruments on the maturity of integrated care.
First, one researcher (LG) screened the titles and abstract of the articles from the main search in the three databases to identify articles for full text read.Two researchers (LG and HV) independently screened the full texts to select articles to be included in the final review.
Data extraction and analysis of the literature review Data were extracted by looking for descriptions on dimensions, indicators, and measurement scales in the selected articles which matched with the 12 dimensions, maturity indicators and assessment scale of the B3-MM.We marked all matching items and listed them in a table developed in MS EXCEL.Descriptions on dimensions, indicators or measurement scales in the selected articles which did not match, but which could nevertheless provide an addition to B3-MM, were also identified.Furthermore, we evaluated the overall quality of the measurement property content validity (definition in Box 1) of the instruments identified in the narrative review based on the criteria used by Bautista et al. [23].
In quality assessment, there is an important distinction between the quality of a study on measurement properties and the quality of an instrument [49].In the article by Bautista et al. [23], the quality assessment of the studies and the instruments is guided by the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) [26,28,50,51].In this study, the overall quality of the content validity for the instruments was assessed by the researchers (LG and HV) using the criteria for the levels of evidence and overall assessment of measurement properties of instrument (Table 2) by determining four factors [23,27,47,52].The first factor includes the number of validation studies per instrument.Snowball sampling and hand searching in Google and Google Scholar were performed to identify validation studies on the retrieved instruments from the narrative search.The second factor concerns the assessment of methodological quality of the studies relating to content validity.This assessment was based on the criteria of the COSMIN checklist [28] using the four-point scale in the COSMIN checklist.A study was rated as poor, fair, good, or excellent according to its measurement property content validity.The third factor is about the assessment of the direction of results of the measurement property content validity (whether positive or negative).This was rated using the modified criteria as presented in Table 3 [47].
Box 1: Definition of measurement property content validity (adapted from Uijen et al.) [47] Content validity: the degree to which the content of an instrument is an adequate reflection of the construct to be measured.The fourth factor entails the assessment of the consistency of several studies on the same instrument.

Delphi study
To test the appropriateness of the items of the B3-MM to measure maturity of integrated care, an international Delphi study was performed.The Delphi technique is a widely used research method in healthcare research, which consists of "a series of data collection 'rounds' to capture and structure the knowledge and opinions of a 'panel' of participants on a topic with which they are perceived to have expertise" [42, p. 208].

Selection of experts
The experts were selected on basis of relevant experience in scientific research or having a practical background (medicine, nursing, managerial, policy making) with relevant experience in the development, implementation and/or monitoring of integrated care interventions.An overview of the type of experts who were invited to the first round of the Delphi survey is presented in Table 4.A total number of 55 experts received the email invitation that included information about the purpose and process of the study and a link to an online version of the questionnaire in SurveyMonkey.We asked the experts to commit their participation in two planned Delphi rounds.

First Delphi round
In the first Delphi round, experts were asked to rank the relevance of the dimensions, indicators and assessment scale of B3  [55], referring to the assessment of both the actual rank and the optimum rank of integration, could provide a meaningful addition to the assessment scale as  used in B3-MM.Finally, experts were asked if they had any additional comments/suggestions on B3-MM or the survey.
The survey was anonymised and a single reminder email message was sent to the experts.To diminish potential misunderstandings concerning the interpretation of the survey, the first survey round was pre-tested by two researchers (YM, LB).The survey was adjusted to reflect their feedback, including a clearer introduction to part B and C of the survey about statements on the assessment of the relevance of each indicator and scale.Experts were invited to the first survey in three different streams due to the arrival of late responses to the call for experts.The respondents were given one and a half weeks to complete the first survey.

Second Delphi round
The items for which insufficient agreement was found were rephrased by partners of the SCIROCCO consortium and presented to experts in the second Delphi round.A total number of 44 experts were invited to the second round.They were asked to reassess the relevance of the refined maturity indicators of the B3-MM items on the same 9-point Likert scale.Furthermore, they were asked to what extent they considered the addition to the assessment scale relevant by assessing both the actual rank and the optimum rank of integration using the B3-MM.Again, the experts were asked if they had any comments on the rephrased items or feedback on the survey.The second invitation included a report on the outcomes of round one of the Delphi exercise, including (1) a median agreement rating (interquartile range (IQR)) on every statement, (2) the level of agreement among the experts, (3) the level of disagreement among experts, and (4) whether consensus had been achieved.After discussion among the researchers and members of the SCIROCCO consortium, it was decided to exclude certain participants from the exercise due to a perceived conflict of interest: five members from the SCIROCCO advisory board (who had not participated in the first round of the exercise) and two active members of the SCIROCCO consortium (who had participated in the first round) were excluded from further participation.Again, the experts were given one and a half weeks to complete the second Delphi round.

Third Delphi round
The third Delphi round was conducted to explore the level of agreement among experts on the items with insufficient agreement in the second Delphi round.These items were rephrased by partners in the SCIROCCO consortium.Using the same 9-point Likert scale, experts were asked to reassess the relevance of the refined features of the B3-MM.The 13 experts who participated in the second round were re-invited to participate in the third Delphi round.The invitation included a report on the outcomes of the previous round, including (1) a median agreement rating (IQR) on every statement which was included in the second round, (2) the level of agreement among the experts, (3) the level of disagreement among the experts, and (4) whether consensus had been achieved.Experts were given the opportunity to provide feedback on the survey.Due to the project's deadlines and the small number of statements in the third round, experts were given one week to complete the last round.

Data analysis of the Delphi study
Before conducting the Delphi survey, we defined the conditions of agreement among experts to be applied during the three Delphi rounds.In order to determine consensus within a Delphi study, many studies use a predefined level of agreement among the experts [56].However, no standard threshold for consensus is offered by the literature [53], with thresholds for consensus ranging from 55%-100% [57].In our study we decided on using a 75% cut off point, which is suggested and used by several studies to clearly differentiate the consensus and non-consensus results [53,58,59].The 9-point scale was classified in three options; 1-3 as irrelevant, 4-6 as equivocal and 7-9 as relevant.The experts' overall consensus on every statement on the items in the B3-MM was analysed using the median of the group's scores and the "level of agreement" reached.Agreement among the experts on every statement on the items in the maturity model was reached when more than 75% of the experts' ratings were within the same three-point range (that is, 1-3, 4-6, or 7-9) as well as the observed median.Several studies use a cut-off point of more than 75% of participants scoring 7 to 9, and include the condition (without disagreement) that less than 15% of the participants should have a scoring of between 1 to 3 [60,61].In this study, we used the 75% threshold for reaching consensus, including the condition that less than 15% of the participants should have a scoring in the opposite range of that scale (Figure 2).Furthermore, the qualitative comments derived from the answers to the question on the optimum and actual rank, and the comments/suggestions on B3-MM and feedback on the survey were analysed using a qualitative approach.
Analysis were performed in MS Excel.Under Belgium law no ethical approval is required to interview experts as part of a Delphi panel.

Literature review
Out of the 300 articles included in the study of Bautista et al. [23] a total of seven articles were selected for our review [55,[62][63][64][65][66][67].From the narrative search, an additional number of four articles were retrieved.One duplicate full-text article from Bainbridge et al. [68] selected from Google and Google Scholar described a framework to guide evaluation and a more recent study was available describing the instrument which was based on this framework [69].We included this article in the review instead of the initial full-text article retrieved.Details on the review process are presented in Figure 3.The combination of final search terms used for each database, date searched and the hits retrieved are shown in Table 5.The characteristics of the selected articles are shown in Appendix A.

7-9
Overall, there is considerable similarity between the content of the original B3-MM model, and the instruments described in the articles selected from the literature review.All 12 dimensions and the related indicators described by the B3-MM corresponded with the content of the 11 retrieved articles (Table 6).Two dimensions of the B3-MM ("Information and eHealth services" and "Breadth of Ambition") were described by all 11 articles.The content of over half of the articles matched with descriptions of ten of the dimensions.Less than half of the selected articles described items which matched with the two dimensions, "Population Approach" and "Innovation Management".Apart from looking for matching descriptions, we searched for the use of possible dimensions, indicators or measurement scales which are not part of B3-MM (as it existed at the start of the project), and could complement or refine the B3-MM.One measurement scale was found which could provide a complement to the B3-MM: it was retrieved from the study of Ahgren & Axelsson [55].They use a measurement model that can be used to evaluate the degree of integration, focusing on the functional aspects of clinical integration in arrangements of integrated care.In their model, the actual and the optimum rank of integration between units of the health authority are rated.This measurement feature could provide an extension to the B3-MM.It would enable the B3-MM to assess both the actual rank and the optimum rank of integration.Thus, it would provide a contextual explanation for the current situation in integrated care delivery while measuring the maturity of integrated care.This issue was further explored in the first two rounds of the Delphi study.
Regarding the assessment of the measurement property content validity of the instruments, we retrieved the data on the assessment of the overall quality rating score from the review of Bautista et al. [23] for the seven instruments selected from their study.Out of the 4 articles retrieved from the narrative review, three instruments were identified.No other validation studies on those three instruments were found by the hand searches and snowball sampling.In the dissertation included in the review concerning validation of the DMIC, three more validation studies were found.The results on the quality of the studies, the direction of results and the overall quality of the measurement property content validity of the instruments are shown in Table 7.

First round
A total of 31 experts responded to the first survey round (response rate 56%).Three experts did not complete the survey.Furthermore, two experts were excluded due to a conflict of interest.The final analysis included 26 experts (84% completion rate).Reasons for non-participation included one delivery failure, one retirement, and two time constraints.The rest of the respondents did not provide reasons for not participating.
The outcomes on every statement of the first Delphi round are shown in Appendix B. Sufficient agreement was found among the experts on all 12 dimensions of B3-MM.Insufficient level of agreement was found for the first few indicators per dimension.Additionally, sufficient agreement was found on the assessment scale of the dimensions, except for the scale of "Innovation Management".
Comments and suggestions with regard to the dimensions, indicators or assessment scale of B3-MM were provided by 17 out of 26 experts (65.4%).Although three experts provided positive comments with regard to the B3-MM, three other experts commented that some dimensions were unclear or that indicators in some of the dimensions were already covered by other dimensions.A total of five experts commented that some indicators/scales were ambiguous or contradictory and did not follow a logical structure.From the experts who provided feedback to the survey, two experts stated that the survey was difficult to understand and four experts did not fully understand the scale assessment in part C.
Regarding answers to the question about assessing the actual and optimum rank of integration, 22 out of 26 experts (84.6%) agreed that the actual and the optimum ranks of integration should be taken into account when    3 eHealth deployed in some areas, but limited to specific organisations or patients 1.4 Voluntary use of regional/national eHealth services across the healthcare system 1.5 Mandated or funded use of regional/national eHealth infrastructure across the healthcare system 1.6 Universal, at-scale regional/national eHealth services used by all integrated care stakeholders 1.2 Debate on information standards (e.g., coding, formatting); exploration of options for consolidating ICT 1.3 A recommended set of agreed information standards at local level; a few local attempts at ICT consolidation 1.4 A recommended set of agreed information standards at regional/national level; some shared procurements of new systems at regional/national level; some large-scale consolidations of ICT underway 1.5 A unified set of agreed standards to be used for system implementations specified in procurement documents; many shared procurements of new systems; consolidated data centres and shared services widely deployed 1.6 A unified and mandated set of agreed standards to be used for system implementations fully incorporated into procurement processes; clear strategy for regional/national procurement of new systems; consolidated datacentres and shared services (including the cloud) is normal practice.Minkman [12] Excellent + Minkman [12] Excellent + Minkman [12] Excellent + Minkman [12] Excellent + Longpré [71] Fair ?
a Data on direction of results per instrument was summarised in the review of Bautista et al. [23].No individual data per instrument was provided.
measuring maturity of integrated care in a region or country.

Second round
A total of 14 experts responded to the second survey round (response rate 34%).One expert did not complete the survey.The final analysis included 13 experts (92.9% completion rate).One expert was not able to participate due to time constraints.The rest of the potential respondents did not provide reasons for not participating.
The outcomes for every statement of the second Delphi round are shown in Appendix C. Sufficient agreement was found among experts on the rephrased indicators, except for the two rephrased indicators, 8.2 and 9.1.Furthermore, 92.3% of the experts scored between 7-9 (median 7) in response to the question on the relevance of assessing both the actual rank and the optimum rank of integration, by applying B3-MM to provide a contextual explanation for the current situation while measuring maturity of integrated care.A total of six experts provided comments on the rephrased indicators.Three experts indicated that the rephrasing of the indicators was performed well.Furthermore, two experts emphasised that some of the rephrased indicators could still be made more explicit to distinguish these indicators clearly from the other indicators in their scale.

Third round
A total of 10 experts participated in the third Delphi round (response rate 76.9%).The rest of the potential respondents did not provide reasons for not participating.Sufficient agreement was found on both of the two rephrased indicators 8.2 and 9.1 (Appendix D).
The main characteristics of the expert group who participated in the first, second and Delphi round are presented in Table 8.

Discussion
This study reports on the content validity of the B3-MM instrument, developed to measure the level of maturity of integrated care.The literature review and Delphi study allowed the assessment of the content validity of B3-MM and enabled the instrument to be enhanced.Following on from the review, the dimensions and indicators of the maturity model correspond to the items of instruments measuring maturity of integrated care in the academic literature.The results of the Delphi study showed that all the dimensions of the B3-MM are considered relevant by experts in the field of integrated care.Initially in the first Delphi round, there was insufficient agreement on the first few maturity indicators on every dimension whereas, after rephrasing the indicators during the second and third Delphi rounds, experts agreed that all the indicators were relevant for the assessment of the maturity of integrated care.As a result, B3-MM is considered to be a comprehensive instrument consisting of a wide range of dimensions applicable to the development of integrated care.
The items included in another instrument, called the DMIC, described in two articles, matched all the dimensions of the B3-MM [12,71].While the DMIC is regarded as a validated generic quality management model for integrated care, the model was developed and widely used in the Netherlands [72].In comparison, the B3-MM is of a wider scope, developed on basis of lessons learned in achieving integrated care by 12 different European regions.
In line with other studies [23,47], a variety in the constructs and elements measured by the selected instruments was observed in this study.Furthermore, the level of evidence on the overall quality of the measurement property content validity for only one out of ten instruments assessed in this study, was found to be strong.In their systematic review of measurement properties of care continuity instruments, Uijen et al. [47] indicated that these findings on the levels of evidence do not mean that the quality of the instruments is low, but rather that there is a need for high quality studies that can adequately assess the measurement properties and eventually the instrument quality.Moreover, out of the 300 articles retrieved in the literature review undertaken by Bautista et al. [23], only seven articles were included in this review.The need for high quality studies on measurement properties and the small number of selected articles indicates that the measurement of maturity in integrated care is not yet strongly developed in the academic literature.The complexity of the development, implementation and scale-up of the multi-stage process of integrated care makes the measurement of the maturity of integrated care a difficult exercise.However, if integrated care initiatives are to make a significant contribution to the transformation of health systems, solid measurement of the maturity of integrated care should become an essential element of their development.Measurement of the maturity of integrated care provides insight into both the problems experienced and the success factors that work when making progress on the development of integrated care services.It provides the knowledge needed to guide further development of integrated care initiatives in appropriate directions.
A few limitations need to be considered with regard to this study.The review was based on search terms derived from a systematic literature review which enabled a broad search in several databases.However, a first limitation of the narrative review is the focus on English language studies, which may have led to a language bias.A second limitation is that literature represent a large diversity of concepts (methods and measurements) concerning the measurement of integrated care [73].Since "the definition and application of the concept of integrated care is influenced by the background and health care systems of the various authors" [12, p. 8], the data extraction from the literature conducted by the researchers is inevitably subjective.This is a disputable characteristic of any review that addresses complex interventions focusing on the items described for instruments in different contexts.A third limitation is that the review is susceptible to publication bias, although the search has been broadened to include literature found through various search engines.Concerning the overall assessment of the quality of the measurement property content validity we used data obtained from the review of Bautista et al. [23] on the score for the instruments and applied their criteria to the assessment of the instruments retrieved from our narrative review.The assessment is therefore subject to possible inconsistency although we tried to diminish this by discussing the assessment of the instruments among the researchers (LG and HV).
The Delphi technique has long been regarded as an appropriate research technique to reach consensus amongst groups of experts and has been widely applied in health and social studies [74].However, there are currently no universally agreed criteria for the selection of experts; no directives on the minimum or maximum number of experts on a panel; and no firm guidelines on the correct number of rounds to be organised regarding the Delphi method; "rather the Delphi method appears to be related to common sense and practical possibilities" [46, p. 208].Furthermore, the sample of the expert panels in the Delphi method are not being judged in terms of being representative samples for statistical purposes, but rather assessed on the qualities of the expert [75].Although, we tried to reduce possible artefacts, a few limitations need to be considered for the Delphi study.To reach a reliable consensus in Delphi studies, it is important to establish a balance among the participants who represent a particular topic.The balance between the expert types who were recruited for the Delphi study and who participated in the first Delphi round was as follows: about half of the respondents who participated included researchers with experience in the measurement or development of integrated care.The other half consisted of a pool of experts who were recruited via the SCIROCCO consortium partners (i.e. a mix of experts with a practical experience in the development, implementation and/or monitoring of integrated care interventions, with experience in the field of Information and eHealth services or experts from the B3 Action Group on Integrated care).
The agreement found among the experts on the items of the B3-MM represents the majority opinion of the experts, yet, it does not mean the 'right' answers have been found [53].The results may be biased due to the recruitment strategy that involved partners of the consortium; however, it may be expected that the experts provided their nuanced opinions garnered from their expertise.Furthermore, we provided room for the experts' comments and suggestions as well as ensured that the Delphi rounds were completed anonymously without the influence of other panel members, to obtain a reliable and diverse collection of opinions.Additionally, a gradual decline in the number of experts participating in each Delphi round was observed.Although we provided experts with more than a week for responding and sent reminders, by asking for their participation in several rounds the Delphi technique asks much more dedication from respondents than does a simple survey, and the potential for low responses increases considerably [53].A final limitation to the study is that a few expert respondents found the survey difficult to understand, which indicates that it is not evident that the instrument is easy to understand.The different backgrounds of the experts, concerning their fields of experience and origins (including variations in the types of health care systems, social values, and on-going health reform) may have also an influence on the way in which the instrument is interpreted.To obtain an adequate understanding of the instrument among its users, a clear manual explaining the meaning and application of the instrument would be desirable.

Conclusion
Notwithstanding the pragmatic nature of the initial development of the B3-MM, this study is considered to be the first step to validate the B3-MM instrument to measure the maturity of integrated care.While today the B3-MM is a unique instrument based on existing knowledge and lessons learned in implementing integrated care, further research on its measurement properties is needed to enhance the quality of the B3-MM as instrument.The determination of the validity of an instrument measuring a construct is important.This further research on its measurement properties should preferably be guided by the COSMIN manual [51].Moreover, in the SCIROCCO project, the use of the B3-MM instrument will be further explored as a tool to facilitate the exchange of good practices and scaling-up of integrated care processes in Europe.SCIROCCO will do so by testing a step-based strategy.In the first step, five participating health care regions in Europe are asked to use the tool for self-assessment of the of maturity level of the region or healthcare system in the path towards integrated care delivery.Based on the outcomes of the tool, the regions with complementary levels of maturity are matched.SCIROCCO will then organize twinning and coaching to facilitate shared learning among the regions.The SCIROCCO project will explore how matching the complementary strengths and weaknesses of regions can deliver two major benefits: a strong basis for successful twinning and coaching that facilitates shared learning and a practical support for the scaling up of good practices that promote active and healthy ageing and participation in the community [30].As the B3-MM will be used as a starting point from which regions will be matched and shared learning will be facilitated, insight in the measurement properties of the tool is a prerequisite to ensure a valid and reliable assessment of the maturity level of the regional healthcare system.This will enable the more tailored process of achieving progress in the path towards integrated care for health care regions.

1. 3
Dialogue and consensus-building underway; plan being developed 1.4 Vision or plan embedded in policy; leaders and champions emerging 1.5 Leadership, vision and plan clear to the general public; pressure for change 1.6 Political consensus; public support; visible stakeholder engagement 2. Structure and Governance 6 [12, 55, 64, 67, 69, 71] 1.1 No overall attempt to manage the move to integrated care 1.2 Change underway, but with fragmented organisations & plans 1.3 Formation of task forces, alliances and other informal ways of collaborating 1.4 Governance established at a regional or national level 1.5 Roadmap for a change programme defined and broadly accepted 1.6 Full, integrated programme established, with funding and a clear mandate 3. Information and e-Health Services 11 [12, 55, 62-67, 69-71] 1.1 No connected health services, just isolated medical record systems 1.2No integrated services used, only pilots/local services 1.

1
No systematic attempt to standardise the use of citizen health & care data, or to simplify systems in use

1 1 1 . 1 2
No special funding allocated or available 1.2 Fragmented innovation funding, mostly for pilots 1.3 Consolidated innovation funding available through competitions/grants for individual care providers 1.4 Regional/national (or European) funding or PPP for testing and for scaling-up 1.5 Regional/national funding for scaling-up and on-going operations 1.6 Secure multi-year budget, accessible to all stakeholders, to enable further service development (Contd.)Dimensions and related indicators as described in B3-MM [All projects delayed or cancelled due to inhibitors 1.2 Some projects delayed or cancelled due to inhibitors 1.3 Process for identifying inhibitors in place 1.4 Strategy for removing inhibitors agreed at a high level 1.5 Solutions for removal of inhibitors developed and commonly used 1.6 High completion rate of projects & programmes; inhibitors no longer an issue for service development 7. Population Approach 5 [12, 66, 69-71] No consideration of population health in service provision 1.2 A population focus of risk stratification but no risk stratification tools 1.3 Individual risk stratification for the most frequent service users 1.4 Group risk stratification for those who are at risk of becoming frequent service users 1.5 Population-wide risk stratification started but not fully acted on 1.6 Whole population stratification deployed and fully implemented.8. Citizen empowerment 7 [12, 62, 65-67, 69, 71] 1.1 No systematic plan for empowerment1.2Citizens are not involved in decision-making processes and do not participate in the co-design of their services 1.3 Policies to support citizens' empowerment and protect their rights, but may not reflect their real needs 1.4 Incentives and tools to motivate and support citizens to co-create health and participate in decisionmaking processes 1.5 Citizens are supported and involved in decision-making processes, and have access to information and health data 1.6 Citizens are involved in decision-making processes, and their needs are frequently monitored and reflected in service delivery and policy-making.9. Evaluation methods 6 [12, 64, 67, 69-71] 1.1 No routine evaluation 1.2 Evaluation exists, but not as a part of a systematic approach 1.3 Evaluation established as part of a systematic approach 1.4 Some initiatives and services are evaluated as part of a systematic approach 1.5 Most initiatives are subject to a systematic approach to evaluation; published results 1.6 A systematic approach to evaluation, responsiveness to the evaluation outcomes, and evaluation of the desired impact on service redesign (i.e. a closed loop process) Services in silos; the citizen or their family as the integrator of services 1.3 Integration within the same level of care (e.g., primary care) 1.4 Integration between care levels (e.g., between primary and secondary care) 1.5 Integration includes both social care service and health care service needs 1.6 Fully integrated health & social care services (Contd.)Dimensions and related indicators as described in B3-MM [

Table 1 :
Search terms used in narrative literature review.

Table 3 :
Criteria for rating the adequacy of the reported measurement properties.

Table 2 :
Criteria for the level of evidence and overall assessment of measurement properties.

Table 4 :
List of experts in the first Delphi round.

Table 5 :
Oversight narrative review search terms and hits.

Table 6 :
Overview of articles matching descriptions with B3-MM.

Table 7 :
Number of validation studies, the methodological quality of the studies, the direction (positive or negative) of results of the measurement properties and overall quality measurement property content validity score.

Table 8 :
Characteristics of experts in Delphi rounds 1, 2 and 3 (in % unless stated otherwise).