What makes a good healthcare system?: comparisons, values, drivers

enod40 is a plant gene that participates in the regulation of symbiotic interaction between leguminous plants and bacteria or fungi. Furthermore, it has been suggested to play a general role in non-symbiotic plant development. Although enod40 seems to have multiple functions, being present in many land plants, the molecular mechanisms of its activity are unclear; they may, however, be determined by short peptides and/or RNA structures encoded in the enod40 genes. We utilized conserved RNA structures in enod40 sequences to search nucleotide sequence databases and identified a number of new enod40 homologues in plant species that belong to known, but also to as yet unknown, enod40-containing plant families. RNA secondary structure predictions and comparative sequence analysis of enod40 RNAs allowed us to determine the most conserved structural features, present in all known enod40 genes. Remarkably, the topology and evolution of one of the conserved structural domains are similar to those of the expansion segments found in structural RNAs such as rRNAs, RNase P and SRP RNAs. Surprisingly, the enod40 RNA structural elements are much more strongly conserved than the encoded peptides. This finding suggests that some general functions of the enod40 gene could be determined by the encoded RNA structure, whereas short peptides may be responsible for more diverse functions found only in certain plant families.


INTRODUCTION
While a majority of land plants are able to enter an endosymbiotic programme with mycorrhizal fungi (1)(2)(3), root nodule symbiosis is almost strictly confined to legumes and a few non-legumes that interact with rhizobia and other nitrogen-fixing bacteria (4,5). In both cases, specific signalling pathways activate, establish and maintain the symbiotic plant-microbe programme (6)(7)(8)(9). The soyabean enod40 gene was initially identified as one of the plant genes that are expressed during the early stages of the formation of nitrogen-fixing root nodules in the symbiotic association of legumes with soil rhizobial bacteria (10,11). It is also activated in roots colonized by fungi forming phosphate-acquiring arbuscular mycorrhizae (12). The enod40 gene is present in all legumes studied so far, and is also found in many non-legume plants [reviewed in (13)].
In both legumes and non-legumes, various experiments have demonstrated enod40 expression to be important in nodule organogenesis and development [e.g. (14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)]. The data accumulated so far on the biological effects of enod40 suggest that this gene may have multiple functions that are not restricted to the regulation of symbiosis. However, the molecular mechanisms of its activity are unclear. The enod40 genes lack long open reading frames (ORFs), but encode short conserved peptides which were shown to be functional (26,27). The soyabean Enod40 peptides bind to sucrose synthase, suggesting a role in the regulation of sucrose utilization in nodules (27). The analysis of enod40 sequences and RNA secondary structures from various plants also points to a role for enod40 as a regulatory RNA (14,26,(28)(29)(30)). This role is supported by experiments in alfalfa roots which showed that deletion of RNA structural elements in a mutated enod40 gene, while retaining proper translation, decreased enod40 activity with respect to stimulation of cortical cell division (26). Furthermore, an alfalfa enod40 RNA-binding protein, MtRBP1, was isolated and found to colocalize with enod40 RNA in cytoplasmic granules during nodulation (31). MtRBP1 and its homologues possess an RNA recognition motif (RRM), but the binding sites in enod40 RNA have not yet been identified (31).
A comparative analysis of possible enod40 RNA structures (29) suggests that the presence of some structural domains correlates with the plant's ability to form nitrogen-fixing root nodules. While part of the enod40 structure seems to be well conserved in several plant families, certain domains are typical for legumes, the only group able to develop root nodule symbioses with rhizobia. Furthermore, a structured domain conserved in enod40 RNA of leguminous plants forming indeterminate nodules is completely eliminated in plants forming determinate nodules. In general, the non-legume enod40 RNAs seem to be less structured as compared to those of legumes (29).
The presence of strongly conserved RNA structural elements may be used to increase the efficiency of database mining for un-annotated enod40 homologues. The nucleotide sequence similarity in distantly related species is rather low, and only two high sequence similarity regions (named region I and region II) have been revealed [e.g. (13,17,32)]. While the most conserved short ORF I is encoded by region I, the highest conservation at the nucleotide level is observed in the short region II, where no conserved peptides can be proposed (13). On the other hand, the core of region II is flanked by previously identified (29) conserved RNA secondary structure elements. In this work, we have used this feature to search for unidentified enod40 orthologs in nucleotide sequence databases, in particular, in the GenBank database of expressed sequence tags (ESTs). This allowed us to extend considerably the number of known enod40-possessing non-legume families and species. Furthermore, the analysis of possible RNA secondary structures reveals structural elements that are conserved in enod40 RNAs across the plant kingdom. A comparison of the predicted structures suggests that the evolution of one of the conserved domains resembles that of expansion segments that are found in some structural RNAs.

RESULTS
The sequential application of sequence similarity searches and RNA structure predictions (see Materials and methods section) allowed us to identify a number of enod40-like sequences in various angiosperm species. The list of non-leguminous enod40 homologues found in GenBank at the time of writing is given in Table 1, together with nucleotide positions of conserved structural domains and ORFs. In the case of multiple enod40 EST sequences with minor variations, produced by large-scale sequencing projects for some species, we selected only one representative. In addition to the recent enod40 sequence compilation (13), we have found 22 new enod40 homologues. In particular, we discovered putative enod40 genes in another five plant families: Myrtales, Malvales, Brassicales, Apiales and Gentianales.
Despite the relatively low global sequence similarity in some cases, the suggested enod40 assignments are strongly supported by the presence of similar secondary structures. The deduced global enod40 structure is shown schematically in Figure 1. The most conserved structure is domain 3 [nomenclature of (29)], represented by a relatively small, though stable, hairpin found in all (putative) enod40 RNAs. Figure 2 shows these hairpins in non-legume sequences [legume analogues are described in (29)], with the nucleotide positions given in Table 1. The hairpins consist of 5-9 bp, sometimes interrupted by a mismatch. All hairpins are located in the 3′-proximal part of region II, where many base covariations are observed and sequence diversity is increased as compared to the 5′-end of the region. The hairpins are located at similar positions downstream of the conserved region II core (13) and are easily found by eye inspection of the alignment in this region, even without using an RNA folding program. The only sequence motif, present in almost all hairpins, is a 3-bp unit [CUC/GAG] in the middle of the stem. With some deviations, the motif is found in all hairpins. A similar pattern is observed in homologous legume structures (29). Domain 2 is more variable compared to domain 3. While the 3′-ends of the predicted domain 2 are all located at similar positions just upstream of region II, the size of this structure varies in a broad range of 40-140 nt (Table 1). Nevertheless, the shape of the domain is strongly conserved in all putative enod40 sequences: it is an extended stem-loop structure, sometimes (in larger domains) with branching in the interior (Figure 3). Similar structures can also be folded in legume enod40 RNAs, typically with a size of 120-135 nt, which are sometimes extended in paralogous genes (29). Bearing in mind the very high diversity of enod40 sequences in this region, such a conserved shape is remarkable.
Interestingly, the majority of structures contain rather similar paired sequences GUUUG and CAAAC, or their minor variations (examples are shown in Figure 3), preserving the pairing at the very ends, while interior sequences from different families are dissimilar and difficult to align.
Within plant families, conservation of terminal sequences and structures is extended further inside domain 2, whereas the interior parts are more variable (a multiple alignment of enod40 sequences is shown in Figure S1 of Supplementary Data). This variation originates from frequent insertions that occur predominantly in the loops. Examples of such domain 2 evolution within a family are shown in Figure 3 for Asterales, Brassicales and Solanales. For instance, comparison of four Solanales enod40 sequences (tomato, potato and two tobacco species) shows a gradual increase of the domain size from 55 to 120 nt while the sequence at the lowest part of the structure remains almost unchanged. Similar types of insertions are found in enod40 RNAs from other families as well (not shown). Interestingly, the same type of domain 2 extension occurs in some paralogous enod40 genes of legumes. For instance, in Lotus japonicus, domain 2 is extended from 132 nt in the enod40-1 gene to 176 nt in enod40-2 (29). A comparison of alignments for Trifolium repens enod40 sequences with structural predictions (20,29) indicates an insertion of 70 nt in the interior of enod40-3 domain 2 as compared to the homologous structure in enod40-1, resulting in a domain expansion from 135 to 205 nt. Such a remarkable pattern of domain 2 extensions, observed in enod40 RNA from various plant families, is similar to the evolution of expansion segments in ribosomal RNAs, SRP and RNase P RNAs (37)(38)(39).
Domain 1, located upstream of domain 2, has been previously predicted by various algorithms in a number of enod40 RNAs (26,28,29). It is conserved in legumes and represented by a stem-loop structure of variable length, usually with a purine-rich 5′-half and a pyrimidine-rich 3′-half, resulting in possible 'flipping' of base pairs. While we could putatively locate this structure in the majority of enod40 genes (not shown), in some of them the presence of alternative structures and poor conservation of sequences hampered accurate assignment of the domain borders. For instance, in the absence of sufficient sequence data, the previous analysis of enod40 RNA structures (29) apparently misinterpreted partial predictions for domain 2 in Hordeum vulgare and Lolium perenne sequences as putative domain 1 structures. Furthermore, in some of the RNAs, this part seems to be located in EST regions that are not reliably determined or not sequenced at all.
Upstream of domain 1, we could not reliably predict any conserved secondary structure. This region corresponds to the high-similarity region I containing translatable sORFs (17,18,21,26,27,32) and apparently evolves without strong secondary structure constraints. Also, in line with our previous conclusions (29), we could not detect any structure downstream of domain 3 that might be conserved across both leguminous and non-leguminous plants.

DISCUSSION
We have reliably identified the presence of two conserved secondary structure domains, common to both leguminous and non-leguminous plants in a rather large number of enod40 homologues from various angiosperm plant families. Named according to the previously used nomenclature (29) as domains 2 and 3, these two structures flank the core of region II which has the highest level of sequence similarity shared by enod40 genes (13,17,32). The sequences within the secondary structure domains are more diverse than the conserved spacer between them ( Figure 1). Despite this diversity, the structural features of domains 2 and 3 are absolutely conserved in all currently found enod40 homologues. The most frequently recurring motifs in double-stranded regions, both GUUUG/CAAAC at the bottom of domain 2 and CUC/GAG in the stem of the domain 3 hairpin (Figures 1-3), allow some deviations. Occurrence of the domain 2 motif seems to be different in eudicots and monocots, the latter represented by Poales. The motif is very conserved in eudicots, albeit sometimes with substitutions disrupting 2 bp at most, with a maximum of 3 nt changed, in Euphorbia tirucalli (UUUUG/CGGAC) and Casuarina glauca (CAUUG/CAAAU). In Poales, the motif, with some variations, is present in only four of the enod40 homologues found (two Zea mays genes, Sorghum bicolor and Saccharum officinarum) and the perfect combination is in one of the rice genes, enod40-2 (AU101849). In other Poales species, the motif seems to be lost.
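The pairing behaviour of the domain 2 closing motif and its two quoted variants can be checked with a short script. This is a minimal sketch, not part of the original analysis: the function names `count_pairs` and `changed_nt` are hypothetical, and counting G-U wobble pairs as valid base pairs is an assumption made here for illustration.

```python
# Illustrative sketch only. Treating G-U wobble pairs as valid is an
# assumption of this example, not a rule stated in the text.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

CONSENSUS = ("GUUUG", "CAAAC")  # 5' strand and 3' strand of the closing stem


def count_pairs(five_prime: str, three_prime: str) -> int:
    """Count base pairs between the 5' strand and the antiparallel 3' strand."""
    if len(five_prime) != len(three_prime):
        raise ValueError("strands must have equal length")
    return sum((a, b) in PAIRS for a, b in zip(five_prime, reversed(three_prime)))


def changed_nt(variant: tuple[str, str]) -> int:
    """Count nucleotides that differ from the GUUUG/CAAAC consensus."""
    return sum(
        x != y
        for strand, ref in zip(variant, CONSENSUS)
        for x, y in zip(strand, ref)
    )


motifs = {
    "consensus": CONSENSUS,
    "Euphorbia tirucalli": ("UUUUG", "CGGAC"),
    "Casuarina glauca": ("CAUUG", "CAAAU"),
}

for name, (left, right) in motifs.items():
    paired = count_pairs(left, right)
    print(f"{name}: {paired}/{len(left)} bp paired, "
          f"{len(left) - paired} disrupted, {changed_nt((left, right))} nt changed")
```

Under these assumptions, neither variant loses more than 2 bp or differs from the consensus at more than 3 nt, which agrees with the limits quoted above.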
The topology and evolution of the enod40 domain 2 resemble those of expansion segments, also called divergent or D-domains, which are well known for eukaryotic ribosomal RNAs (37,(40)(41)(42)) and have recently been discovered in signal recognition particle (SRP) RNA and RNase P (38,39). Similar to these structures, domain 2 is an extended stem-loop structure of variable length with more diverse distal sequences as compared to the stems closing the domain. However, some differences either in the expansion mechanism or in the constraints imposed by secondary structure may exist. In rRNA expansion segments, domain elongation was suggested to proceed mostly via a compensatory slippage mechanism, with sequential duplications of short low-complexity sequences (e.g. nucleotide repeats) on one of the helix strands during DNA replication, accompanied by similar compensatory changes on the opposite strand to restore helix symmetry (43)(44)(45). Such a mechanism explains the frequent occurrence of insertion 'indels' on both strands in the middle of a stem-loop structure (45)(46)(47). We did not find any examples of such internal insertions of stem modules in enod40 domain 2: when large insertions occurred in the middle of one of the strands, they were compensated simply by point substitutions on the other, frequently accompanied by opening of some stems and building new ones to preserve the shape, as in Brassicales and Solanales (Figure 3).
In addition to the replication slippage mechanism, other mechanisms of the length increase of expansion segments are possible as well. Particularly, the insertion of large sequences in some rRNA expansion segments was suggested to originate from the (quasi)palindromic character of sequences leading to the formation of stem-loop structures in one of the DNA strands during replication and hence to incorrect copying of the template strand (47,48). Such models may explain large insertions observed in the enod40 expansion domains (e.g. Figure 3).
In some of the rRNA expansion segments (46) and leguminous enod40 domains (29), a relatively frequent occurrence of U-rich bulges and internal loops has been observed. In rRNA, this has been associated with a slippage-like mechanism of helix-length increase leading to frequent 'leftover' bulged Us, in particular, in sequences with biased nucleotide composition (46). In leguminous enod40, such bulges and loops seem to play a functional role, because their positions in domain 2 are conserved and there is an additional domain with conserved U-containing loops in molecules from species forming a specific type of nodules, namely indeterminate nodules (29). Although enod40 domain 2 of non-legumes seems to expand in the same way as legume structures, U-containing loops are less frequent and their positions are variable. We did not notice any other statistically significant bias in nucleotide composition of loops in non-legume enod40 domain 2.
The function of expansion segments in structural RNAs is not clear. One of the hallmarks of their secondary structure, conserved terminal pairings embracing a self-contained internal structure, apparently allows their hypervariability to be compatible with conserved functional cores of RNA molecules. This has led to the suggestion that in rRNA they do not have any function and are only tolerated because their elongation does not disrupt any functional domain (49). On the other hand, some of their structural features seem to be important for the biogenesis and stability of rRNA (50)(51)(52). Correlations between sequences and sizes of various rRNA expansion segments indicate possible functional relationships between them (53,54). Size correlations are also observed for RNase P variable domains (39). The size of the enod40 expansion domain seems to weakly correlate with the plant's ability to form nitrogen-fixing root nodules: in legumes, the domain is typically 123-135 nt (29) while in non-legumes, with some exceptions, it is usually smaller (Table 1). Similar to rRNA expansion segments, possible functions of the enod40 domain 2 may include stabilization of RNA structure and/or interactions with other enod40 domains or molecules.
[Figure caption fragment, likely Figure 2] ... Table 1. The sequences corresponding to the CUC/GAG motif are boxed. For A. thaliana, a genomic sequence is given; it differs from GenBank entry AK220907 by one substitution in the loop of hairpin 3. Species names are abbreviated by the first two characters; for complete names see Table 1.
[Figure caption fragment, likely Figure 3] ... Table 1. The conserved closing stem is boxed. The insertion locations are indicated by small arrows; inserted nucleotides are in a different letter font. Large arrows indicate the transitions between structures of various species determined by insertions (but they should not always correspond to real evolutionary events that may occur in reverse order or include branching). Species names are abbreviated by the first two characters; for complete names see Table 1.
In contrast to the expansion segments of rRNAs, RNase P and SRP RNAs, the enod40 expansion domain 2 does not seem to be inserted into a conserved structural core, but is located upstream of a conserved sequence motif. Whatever the function of the enod40 RNA structure, the overall configuration of secondary structure elements in enod40 RNA (Figure 1) is more conserved than the encoded sORFs. The most conserved sORF is located in region I (sORF I) and encodes a short peptide of 10-15 amino acids shown to be translated in several species (17,18,21,26,27,32). Although the homologous ORF is found in almost all enod40 genes where the appropriate region is sequenced, there are a number of exceptions (Table 1). The first is a deletion of one nucleotide in both Fagales enod40 RNAs (C. glauca and Betula pendula), leading to far longer encoded peptides due to a frameshift. Nevertheless, a major part of the characteristic peptide motif is present. The second, which is more puzzling, is the complete absence of this motif in all enod40 homologues from Brassicales, Apiales and Asterales families. Among all possible reading frames in these sequences, we could not identify any that would be similar to known enod40 sORF I sequences. On the other hand, the suggested enod40 assignments are supported by the presence of typical enod40 RNA structures (Figures 2 and 3). In the case of the Arabidopsis thaliana enod40 homologue, the cDNA sequence (AK220907) is also validated by genomic BLAST comparison showing only two substitutions in the transcript, which are neutral for the proposed structure model.
Probably, for some of the enod40 functions the secondary structure of domains 2 and 3 is absolutely required, while conserved peptide sequences are needed for other purposes and the constraints to preserve them may be released in some species. For instance, the absence of conserved sORFs in Brassicales enod40 sequences (A. thaliana, Thlaspi caerulescens and Brassica napus) could be related to the fact that, in contrast to the majority of angiosperms, these species do not form effective symbiotic mycorrhizal associations with fungi in their natural environment (3,55). Mycorrhizal symbioses are probable evolutionary predecessors of nitrogen-fixing nodule symbioses (7,8), and enod40 seems to be involved in both (12,19). There is a precedent for lower similarity of an A. thaliana homologue of a multifunctional protein required for symbiotic nodule development: the A. thaliana calcium/calmodulin-dependent protein kinase is different from related proteins, presumably because of the non-mycotrophic character of Arabidopsis (56). On the other hand, it is more difficult to explain our failure to find any trace of the conserved enod40 sORFs in sequences from Apiales and Asterales (Table 1): these plants can form arbuscular mycorrhiza (3). Moreover, according to the database annotation, the EST encoding the putative Daucus carota enod40 (BI452209) was isolated from a fungus extraradical mycelium during arbuscular mycorrhizal symbiosis with the plant. Of course, it is also possible that in some of the available ESTs, the 5′-proximal sORF I-encoding enod40 sequences are missing.
Apparently, multifunctionality of enod40 is determined by a complex combination of functions of both encoded peptide(s) and RNA structure(s). sORF I is less conserved than the topology of domains 2 and 3, but more conserved than domains 1, 4, 5 and 6, predicted in some species (29). While domain 1 is probably present in many species, domains 4-6 seem to be specific for legumes only. Thus, the enod40 RNA 5′-proximal region has properties of a peptide-encoding mRNA, while the core of enod40 RNA (Figure 3) has hallmarks of structural RNAs, namely a strongly conserved secondary structure topology despite very high sequence diversity. The non-coding character of the enod40 core is further emphasized by the presence of an expansion segment reminiscent of highly ordered RNAs such as rRNAs, RNase P and SRP RNAs. Furthermore, the conserved RNA structural domains 2 and 3 seem to determine some general enod40 functions whereas enod40-encoded peptides may be responsible for more diverse specific roles.

MATERIALS AND METHODS
Sequence database search
Due to the high sequence diversity of enod40, a straightforward BLAST search (57) using complete enod40 sequences as queries was not very efficient at retrieving distant enod40 homologues. Therefore, we restricted sequence similarity searches to conserved regions only and complemented them with RNA secondary structure analyses. The most conserved region in both legume and non-legume enod40 genes is the so-called region II (13,17,32). Thus, the most conserved region II core sequences (~30 nt, the location is indicated in Figure S1 of the Supplementary Data) were used as queries in BLAST searches 'for short, nearly exact matches'. This BLAST option is more suitable for retrieving distant short similarities because the initial search for matches uses a shorter 'word' size (7 nt) than the standard BLAST search (11 nt). The relatively significant (E < 1) non-redundant sequence hits were further analysed for the presence of conserved secondary structure elements, the so-called domains 2 and 3, located near the potential region II (29). The searches were done in GenBank including EST sequences. We started from the recent compilation of enod40 sequences (13), and the region II sequences of newly found enod40 genes were subsequently used as queries for similar searches as well.
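The effect of word size on seeding can be illustrated with a toy comparison. This is only a sketch of the idea of exact-word seeds, not the BLAST algorithm itself: `shared_words` is a hypothetical helper, and both sequences are invented for illustration (the target differs from the query at two positions).

```python
def shared_words(query: str, target: str, word_size: int) -> set[str]:
    """Return the exact words of length word_size present in both sequences."""
    def words(seq: str) -> set[str]:
        return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}
    return words(query) & words(target)


# Hypothetical 22-nt sequences, invented for this example (not real enod40 data).
# They differ at two positions, so the longest identical run is 8 nt.
query = "CGGCAAGUCACCUUGAGGCAAU"
target = "CGGCAAGUGACCUUGUGGCAAU"

for size in (7, 11):
    seeds = shared_words(query, target, size)
    print(f"word size {size}: {len(seeds)} shared word(s) {sorted(seeds)}")
```

With these invented sequences, 7-nt words still yield seeds from the 8-nt identical run, while no 11-nt word is shared, so a search seeded with 11-nt words would never start an alignment here.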
In order to distinguish putative enod40 homologues from BLAST hits produced by chance, two criteria, derived from known enod40 gene features, were used. First, the potential for coding amino acid sequences homologous to known enod40 sORF I was explored. The second, independent, criterion required the possibility of folding of characteristic structural domains 2 and 3, flanking the conserved core of the potential region II, described (with small deviations) as the consensus sequence CGGCAAGUCA-N(6)-GGCAAN (Figure 1). Both domains should be located 1-3 nt upstream or downstream of the consensus core, domain 3 being a stable hairpin and domain 2 being an extended structure flanked by typical sequences GUUUG and CAAAC or their variations. The sequences satisfying one or both of the described criteria were considered as enod40 homologues.
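Locating the region II core consensus in a candidate sequence can be sketched with a regular expression. This is a simplified illustration, not the procedure used in this work: the function name `find_region2_cores` is hypothetical, the test sequence is invented, and exact matching is an assumption, whereas the consensus is described above as holding only 'with small deviations'.

```python
import re

# Consensus from the text: CGGCAAGUCA-N(6)-GGCAAN, with N any nucleotide.
# Assumption: exact matching only; real homologues may deviate slightly.
CORE = re.compile(r"CGGCAAGUCA[ACGU]{6}GGCAA[ACGU]")


def find_region2_cores(seq: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of exact core matches."""
    rna = seq.upper().replace("T", "U")  # accept cDNA/EST input written with T
    return [(m.start() + 1, m.end()) for m in CORE.finditer(rna)]


# Hypothetical EST-like fragment, invented for this example.
example = "AAACGGCAAGTCACCTTGAGGCAATGGG"
print(find_region2_cores(example))
```

A hit from such a scan would only mark a candidate core; the flanking sequence would still have to be folded to check for domains 2 and 3, as the second criterion requires.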
RNA secondary structure predictions
RNA secondary structure predictions were performed using the genetic algorithm of the STAR package (58) and the Mfold program (59).