Biological systems are complex. In particular, the interactions between molecular components often form dense networks that, more often than not, are criticized for being inscrutable “hairballs.” We argue that one way of untangling these hairballs is through cross-disciplinary network comparison—leveraging advances in other disciplines to obtain new biological insights. In some cases, such comparisons enable the direct transfer of mathematical formalism between disciplines, precisely describing the abstract associations between entities and allowing us to apply a variety of sophisticated formalisms to biology.
In cases in which the detailed structure of the network does not permit the transfer of complete formalisms between disciplines, comparison of mechanistic interactions in systems for which we have significant day-to-day experience can provide analogies for interpreting relatively more abstruse biological networks. Here, we illustrate how these comparisons benefit the field with a few specific examples related to network growth, organizational hierarchies, and the evolution of adaptive systems.
Molecular biologists used to study protein complexes consisting of a few dozen proteins; however, proteomic methods are now able to probe the interactions between thousands of proteins. Similarly, geneticists who would previously manipulate a single gene for functional characterization can now employ high-throughput techniques to study the relationships between all genes in an organism. In many cases, genome-scale information describing how components interact is captured best by a network representation. What approaches might help in deciphering these network “hairballs”?
Throughout the history of science, many advances in biology were catalyzed by discoveries in other disciplines. For instance, the maturation of X-ray diffraction facilitated the discovery of the double helix and, subsequently, the characterization of structures containing thousands of proteins. Thus, one may wonder whether ideas in other areas of science could help us with the “hairball challenge.” While the influx of ideas related to reductionism mostly originated from subfields of physics and chemistry, in order to understand biology from a systems perspective, we may benefit from new catalysts originating in disciplines as diverse as engineering, behavioral science, and sociology. These new ideas are centered on the concept of the network. In fact, comparisons and analogies are not new to biology. For instance, to illustrate the principles of selection, Dawkins coined the meme, a unit carrying cultural information analogous to the gene in biology that undergoes a similar form of selection. Given the complexity of the cell, a certain level of simplification is necessary for useful discussion.
The description of cellular systems can be seen as a spectrum. On one extreme, there is a complete three-dimensional (3D) or four-dimensional (4D) picture of how cellular components and molecules interact in space and time. On the other extreme, there is a simple list of parts that enumerates each component without specifying any relationships. However, neither extreme affords the best understanding for the data we have to hand. The complete 4D picture of all the molecules in a cell is far too ambitious for the current state-of-the-art technology in data acquisition. Conversely, it is widely appreciated that the characteristics of a cellular system cannot be explained by the properties of individual components—the whole is greater than the sum of its parts—and the data we have to hand is considerably richer than the parts list representation.
The network representation conveniently spans these extremes, capturing some of the relationships between individual components in a flexible fashion, especially where connectivity rather than exact spatial location determines function. Networks help reveal and convey the relationships between components of a biological system. Different levels of information can be represented using a network. At an abstract level, a network can denote associations between various nodes. More details, such as excitatory and inhibitory regulatory relationships, can then be layered on top of this basic network.
As additional information about the nodes and the relationships between them is added, the network begins to resemble the real-world entity that it models. For example, the addition of 3D structural information and temporal dynamics onto a network of molecular machine components leads it to more closely resemble the molecular machine itself. There are two approaches for thinking about networks.
In the purest form, a network is an abstract representation of the connections (edges) between constituents (nodes). As physical associations between components in all sorts of complex systems can be viewed as networks, such an abstract approach to networks offers a common mathematical framework for different systems.
In a biological context, in addition to physical associations, connections can be defined more loosely by statistical association. This is exemplified by disease networks. The second way of thinking about networks aims to decipher the organizational principles behind a complex system. The underlying network is assumed to be a backbone that captures the essence of the system.
This is particularly true for networks that capture the mechanistic interactions within systems—for instance, the cellular networks resulting from protein-protein interactions and transcriptional regulation. Thinking about networks in a mechanistic way is a process of concretization, as opposed to the approach in abstract, associative networks. Concrete mechanistic networks aim to get closer to the complete 4D picture. They are intended to describe and integrate many of the physical processes happening inside a living system—for instance, the processing of information, the chemistry of metabolites, and the assembly of molecular machines—and therefore focus on incorporating various details of interactions. Adding further mechanistic detail onto a simple nodes-and-edges skeleton can be visualized as decorating edges with direction, color, thickness, etc.
However, incorporating too much detail makes the description intractable. In particular, the network formalism breaks down if we try to load spatial or temporal information as well as higher-order interactions onto the diagram. At a certain point, the actual 4D picture is required. The two network approaches essentially complement each other. On one hand, thinking in an abstract fashion allows one to transfer mathematical formalism readily between disciplines. This can be beneficial for the biological sciences, as it allows the application of formalism developed elsewhere to find fruitful application in biology.
On the other hand, thinking mechanistically focuses more on the conceptual resemblances between networks. Comparison of appropriately matched networks may provide additional insight into the interactions between molecular components of cells by examining analogous interactions in complex systems for which we have more day-to-day experience.
Abstract Approach: Comparison Leverages Mathematical Formalism. Scale-free networks: The degree distribution of a network is a statistical property that can be used to understand some of its organizing principles. The degree distribution of a random network is a Poisson distribution. Most real-world networks, including biological networks, are organized as scale-free networks, which contain a small number of highly connected hubs. The degree distribution of a scale-free network is better modeled as a power-law distribution.
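The contrast between the two distributions is easy to see computationally. Below is a minimal pure-Python sketch (all parameter values are illustrative) that grows both a random graph and a preferential-attachment graph of the same size; the scale-free graph develops hubs whose degree far exceeds anything in the random graph.

```python
import random
from collections import Counter

random.seed(0)

def erdos_renyi(n, p):
    """Random graph: each node pair is connected independently with probability p."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if random.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def preferential_attachment(n, m):
    """Scale-free graph: each new node links to m targets chosen in proportion
    to their current degree (a standard preferential-attachment sketch)."""
    adj = {i: set() for i in range(n)}
    targets = list(range(m))   # initial seed nodes
    repeated = []              # node list in which each node appears once per link
    for new in range(m, n):
        for t in set(targets):
            adj[new].add(t)
            adj[t].add(new)
            repeated.extend([new, t])
        targets = random.sample(repeated, m)  # degree-weighted choice
    return adj

def degree_counts(adj):
    return Counter(len(nbrs) for nbrs in adj.values())

er = degree_counts(erdos_renyi(2000, 0.005))              # roughly Poisson, peaked near 10
sf = degree_counts(preferential_attachment(2000, 5))      # heavy tail: a few large hubs
max_er, max_sf = max(er), max(sf)
print(max_er, max_sf)  # the scale-free maximum degree is far larger
```

The random graph's degrees cluster tightly around the mean, while the preferential-attachment graph produces a handful of hubs an order of magnitude above its typical degree.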
Hubs in a scale-free network also lead to the formation of small-world networks. Modules (community structure of networks): Most real-world networks can be divided into smaller modules that have a large density of internal edges but relatively fewer edges that connect nodes from different modules. For instance, social networks tend to have communities within them due to the relatively larger number of interactions between people in the same neighborhood, school, or workplace. Similarly, in a biological context, a large number of biological components can form a single functional macromolecular complex, such as the ribosome. A wide variety of methods have been developed to uncover the modular structure of networks.
Most of these methods are based on optimizing the modularity of the network, a score that compares the number of intra-module links with the number of inter-module links. One can see analogous examples of this robustness to random failure in many contexts: just as the Internet functions without any major disruptions even though hundreds of routers malfunction at any given moment, individuals belonging to the same species can in general tolerate considerable numbers of random mutations. However, a cell is not likely to survive if a hub protein is knocked out. For example, highly connected proteins in the yeast protein-protein interaction network are three times more likely to be essential than proteins with only a small number of links.
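The modularity score that community-detection methods optimize can be computed directly. Here is a minimal sketch (the toy graph and partitions are illustrative) that evaluates the score for a small two-community graph: the natural split along the bridge scores high, and a split that cuts across the communities scores negative.

```python
def modularity(adj, partition):
    """Q = sum over modules of (fraction of edges inside the module) minus
    the fraction expected if edges were wired at random given the degrees."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2   # total number of edges
    q = 0.0
    for module in partition:
        ms = set(module)
        internal = sum(1 for u in ms for v in adj[u] if v in ms) / 2
        degree_sum = sum(len(adj[u]) for u in ms)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the bridge edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
good = modularity(adj, [[0, 1, 2], [3, 4, 5]])   # split along the bridge
bad = modularity(adj, [[0, 3], [1, 4], [2, 5]])  # split across the triangles
print(round(good, 3), round(bad, 3))  # the natural split scores much higher
```

Real community-detection algorithms search over partitions to maximize this score rather than evaluating hand-picked ones.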
The number of connections of a node reflects its centrality in the network. There are more elaborate approaches to determining centrality than just counting neighbors, the most famous example of which is the original PageRank algorithm underlying the Google search approach. One can also try to define centrality via network paths using such quantities as “betweenness.” It has been reported that bottlenecks (nodes with high betweenness) in biological networks are more sensitive to mutations than the rest of the network—even more so than hubs for regulatory networks. A particular way to utilize genes with special features is based on the concept of “seed” genes, a form of biological prior knowledge, to drive network creation.
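The betweenness measure mentioned above can be computed directly by counting shortest paths. The sketch below (the toy graph is illustrative) shows that the two nodes bridging a pair of triangles carry all of the cross-module traffic, i.e., they are the bottlenecks.

```python
from collections import deque
from itertools import combinations

def shortest_paths(adj, s, t):
    """All shortest paths from s to t (BFS layering, then backtracking)."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    paths = []
    def back(path):
        u = path[-1]
        if u == t:
            paths.append(path)
            return
        for v in adj[u]:
            if dist.get(v) == dist[u] + 1:
                back(path + [v])
    back([s])
    return paths

def betweenness(adj):
    """Count of shortest paths passing through each node (endpoints excluded),
    with each source-target pair's paths weighted to sum to one."""
    score = {u: 0.0 for u in adj}
    for s, t in combinations(adj, 2):
        paths = shortest_paths(adj, s, t)
        for p in paths:
            for u in p[1:-1]:
                score[u] += 1 / len(paths)
    return score

# Two triangles bridged through nodes 2 and 3: edge 2-3 is the bottleneck.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
b = betweenness(adj)
bottleneck = max(b, key=b.get)
print(bottleneck, b[bottleneck])  # node 2 (equivalently 3) carries all cross-traffic
```

This brute-force version is only suitable for tiny graphs; production implementations use Brandes' algorithm, which avoids enumerating paths explicitly.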
Instead of identifying central genes based on connectivity (i.e., hub genes), seed genes are defined from the literature as being causally implicated in a particular disease or phenotype. In one such example, genes implicated through copy-number variation in autism were used to cluster an expression network in healthy brain development in order to identify larger sets of putative autism-related genes as candidates for future investigation and diagnosis. The same question is extremely common in biology but is discussed using the term “reverse engineering.” For example, how can we infer the developmental gene regulatory network from temporal gene expression dynamics? Ideally, one could fit the temporal data using dynamical equations so as to infer the topology.
However, cellular processes happen faster than most assays can sample, and thus most functional genomics experiments do not contain enough time points. To overcome this drawback, data-mining techniques such as matrix factorization are employed. For instance, given the genome-wide expression profile at different time points, one could project the high-dimensional gene expression data into a low-dimensional space and write differential equations to model the dynamics of the projections. Despite an increasing number of studies applying networks in an abstract mathematical context, scientific concerns have been raised.
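The projection idea above can be sketched with power iteration on a synthetic gene-by-time matrix (all data here are simulated, not from any real experiment): most genes follow one shared temporal program plus noise, and the dominant mode of the covariance recovers that program.

```python
import math
import random

random.seed(1)

# Synthetic "expression matrix": 50 genes x 8 time points, where most genes
# follow one shared temporal program plus noise (hypothetical data).
T = 8
program = [math.sin(2 * math.pi * t / T) for t in range(T)]
genes = []
for g in range(50):
    w = random.uniform(0.5, 2.0)                       # per-gene amplitude
    genes.append([w * program[t] + random.gauss(0, 0.1) for t in range(T)])

# Power iteration on the T x T (uncentered) covariance matrix recovers
# the dominant temporal mode without a full SVD.
cov = [[sum(row[i] * row[j] for row in genes) for j in range(T)] for i in range(T)]
v = [1.0] * T
for _ in range(200):
    v = [sum(cov[i][j] * v[j] for j in range(T)) for i in range(T)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

# The recovered mode should be (anti)parallel to the true program.
unit = math.sqrt(sum(x * x for x in program))
cosine = abs(sum(v[t] * program[t] for t in range(T))) / unit
print(round(cosine, 2))  # close to 1: the shared program was recovered
```

In practice one would use an SVD (or non-negative matrix factorization) of the full matrix and then fit dynamical equations to the low-dimensional trajectories.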
A major concern regarding network analysis comes from the criticism that statistical patterns (e.g., the scale-free degree distribution mentioned above) offer limited insights. Other examples of these patterns include the enrichment of network motifs (small recurrent subgraphs in a network). Statistical patterns suggest that network structures are potentially interesting; nevertheless, understanding their actual functioning requires studying the detailed dynamics of each constitutive part. The previous section discussed insights gained by applying formalisms from various social and technological networks to biological networks. Such wide-ranging insights were possible only because the detailed characterization of the nodes in the network was neglected in the abstract approach.
On the other hand, if details are added to the picture, insights about a system become more specific and, in a sense, more meaningful. However, it is typically harder to apply the same formalism equivalently to two different networks characterized in this more detailed fashion. This situation is manifest, for example, when trying to explain the scale-free degree distribution of various networks described above. Different Mechanistic Intuition for Scale-free Structure. The scenario can be illustrated by the hub-and-spoke system of the airline network. Every time a new airport is created, the airlines have to balance available resources and customer satisfaction, i.e., the cost of adding a new flight versus the customer convenience gained by connecting the new airport to a larger number of destinations. (We acknowledge that this is arguably an idealistic view.
Some may argue that airline companies do not care about customer satisfaction at all and instead only care about their revenues. As a result, they consider customer satisfaction only within limits imposed by revenue maximization.
Nevertheless, the motivation behind the airlines does not affect the conclusion of the model.) The most efficient use of these limited resources occurs if the new airport connects to pre-existing hubs in the network, as this reduces the average travel time to any airport in the entire system due to the small-world nature of scale-free networks. The model is called “preferential attachment” because the newly created nodes prefer to connect to pre-existing hubs in the network.
An alternative growth mechanism is the duplication-divergence model, in which new nodes arise as copies of existing nodes, inheriting their connections, and then diverge by losing some of them. Such a duplication-divergence model is, in a sense, equivalent to the preferential attachment model, since a hub is more likely to increase its connectivity simply because it is more likely to be attached to a neighbor that is being duplicated. However, the model provides more intuition for biological networks via comparison. As gene duplication is one of the major mechanisms driving the evolution of protein families, scale-free behavior in the protein-protein interaction network was proposed to arise via duplication-divergence. Thus, many networks that exhibit similar topologies are the result of different underlying growth mechanisms.
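The duplication-divergence growth process can be simulated in a few lines. In this sketch the retention probability and seed graph are illustrative choices, not values from the cited studies; the point is that copying nodes and pruning links is enough to produce hubs.

```python
import random

random.seed(2)

def duplication_divergence(n, p_keep=0.4, seed_edges=((0, 1), (1, 2), (0, 2))):
    """Grow a network by duplicating a random node and keeping each of the
    parent's links independently with probability p_keep (divergence)."""
    adj = {}
    for u, v in seed_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while len(adj) < n:
        parent = random.choice(list(adj))
        new = len(adj)
        kept = {v for v in adj[parent] if random.random() < p_keep}
        if not kept:            # if every link diverged away, keep the graph
            kept = {parent}     # connected by linking back to the parent
        adj[new] = set(kept)
        for v in kept:
            adj[v].add(new)
    return adj

adj = duplication_divergence(2000)
degrees = sorted((len(nbrs) for nbrs in adj.values()), reverse=True)
print(degrees[:5])  # a handful of hubs dominate, as in preferential attachment
```

As the text notes, the resulting degree distribution is heavy-tailed for essentially the same reason as in preferential attachment: well-connected nodes are more likely to neighbor whichever node gets duplicated.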
Specifically, in the case of scale-free networks, there exists a common topological property but a somewhat different mechanistic explanation in different domains (e.g., airline networks versus gene networks). Some of the domains share the same mechanistic explanation—i.e., the scale-free structure in both protein-protein interaction and web-link networks can be explained by duplication and divergence. Moreover, this latter commonality provides additional intuition about the biological network through comparison to the more commonplace web network, which is conceptually much easier to understand. More Intuition from Social Networks.
The ability to gain intuition about the often arcane world of molecular biology via comparison to commonplace systems is even more clear-cut when considering social networks, where people have very strong intuition for how a “system” can work. A good example of this is transferring understanding of organizational hierarchy to biology.
Many biological networks, such as those involved in transcriptional regulation, have an intrinsic direction of information flow, forming a natural but loose hierarchy. In the purest form of a military hierarchy, multiple individuals of lower rank each report to a single individual of a higher rank, and there are fewer and fewer individuals on the upper levels, eventually culminating in a single individual commanding an entire army. This structure naturally leads to information flow bottlenecks, as all the orders and information related to many low-rank privates must flow through a limited number of mid-level majors. In a biological hierarchy of transcription factors (TFs), one sees a similar pattern, with bottlenecks in the middle. In many cases, such bottlenecks create vulnerabilities. Comparison between the hierarchical organizations in social networks versus biological networks illustrates design principles of biological networks.
The hierarchical organization in biological networks resembles the chain of command in human society, e.g., in the context of the military. The top panel shows a conventional autocratic military hierarchy. The structure is intrinsically vulnerable in the sense that, if a bottleneck agent (star) is disrupted, information propagation breaks down. The introduction of cross-links (blue) avoids the potential problem (middle) because the private at the bottom can then take commands from two different superiors above. The bottom panel shows the hierarchical organization of a biological network, with the existence of cross-links between pathways. These observations reflect a democratic hierarchy as opposed to an autocratic organization.
Moreover, further comparison provides easy intuition into the biological characteristics of regulators at different levels in the hierarchy. Conventionally, one expects the CEOs of companies to gather information from all of their sources and make the widest-ranging and most influential decisions in the company. One also stereotypically expects people at the top of conventional social hierarchies to be the most “conservative” and resistant to change. Likewise, TFs at the top of the hierarchy tend to be more evolutionarily conserved.
They are more connected in the protein-protein interaction network, as they modulate gene expression based upon internal and external stimuli through these interactions.
Sarpeshkar and colleagues have explored the similarities between the biochemical reactions within cells and electron flow in analog circuits. These similarities have enabled the application of intuitive electronic circuit diagrams to describe the processes underlying TF networks. In this analogy, chemical concentrations are represented as electronic currents. For example, mRNA molecules can be thought of as accumulating on a capacitor while a resistor represents mRNA degradation. The analogy extends beyond simply intuitive representations since the mathematical formalisms describing electron flow in subthreshold transistors can be adapted to capture the dynamics of chemical reactions.
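The mRNA-as-capacitor analogy corresponds to the first-order equation dm/dt = k − γm, the same form as a charging RC circuit: production k plays the role of the charging current and the degradation rate γ plays the role of leakage through a resistor. A minimal sketch with hypothetical rate constants:

```python
import math

# Hypothetical rates: production k (molecules/min) "charges" the mRNA pool,
# and the degradation rate gamma (1/min) "discharges" it, like an RC circuit.
k, gamma = 10.0, 0.5
dt, steps = 0.001, 20000          # simulate 20 minutes with small Euler steps

m = 0.0
for _ in range(steps):
    m += dt * (k - gamma * m)     # dm/dt = production - degradation

steady_state = k / gamma                              # 20 molecules at equilibrium
closed_form = steady_state * (1 - math.exp(-gamma * dt * steps))
print(round(m, 2), round(closed_form, 2))             # both approach k/gamma = 20
```

The exponential relaxation time 1/γ is the direct analog of the RC time constant, which is why intuitions about circuit response times transfer to transcript dynamics.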
Thus, this comparison allows us to potentially connect intuitions and mathematical models developed for electronics to transcription. Despite these parallels, it is worth noting some of the differences. For example, in biological networks, more connected components (as measured by their degree or betweenness) tend to be under stronger constraint than less connected ones. This is evident in numerous studies that have analyzed the evolutionary rate of genes in many networks (e.g., protein interaction and transcription regulatory networks) in many organisms (e.g., humans, worms, yeast, E. coli) using many different metrics of selection (e.g., variation within a population or dN/dS for fixed differences).
One’s intuition here is clear: biological systems seek to decentralize functionality, minimizing the average connectivity of nodes and making the system robust to random mutations. However, this architecture requires a few hubs to connect everything, and these more connected components are particularly vulnerable. Is this finding true in general?
And if not, why? Software systems provide insight into this question: software engineers tend to reuse certain bits of code, leading to the sharing of components between modules and thus to highly connected components. These results are also extremely useful for therapeutics, in which a drug directed at a highly connected target can have a very efficient effect on an entire cell, albeit often at the sacrifice of specificity. However, the measurement of connectivity/constraint depends on the cellular context. In regulatory networks and similar systems involving information transfer, it is often better conceptualized in terms of bottlenecks, while in protein-protein interactions it is often better conceptualized in terms of hubs. An example of a chemically exploitable hub is the bacterial ribosome, which is the target of most antibiotics that broadly inhibit protein translation, leading to the rapid death of the organism. Biology is a subject with a strong tradition of utilizing comparative methods.
One hundred years ago, biologists compared the phenotypes of different species. Since the discovery of DNA, biologists have been comparing the sequences of different genes and various “omes” across species. Perhaps we should extend the tradition further by comparing networks in biology to those in other disciplines.
In fact, efforts have already been made in this direction. We have described how abstract approaches that focus on simple connections between entities allow the application of mathematical formalisms across disciplines. We then showed how mechanistic details can be placed onto these simple networks, thereby enabling them to better explain a real process, such as transcriptional regulation or software code development. In this case, the networks are often too detailed to allow for direct transfer of formalisms. Nevertheless, one can gain meaningful intuition about a biological system by comparing it to a more commonplace network, such as a social system, using a similar mechanistic description.
Abstract Complex networks serve as generic models for many biological systems, which have been shown to share a number of common structural properties such as a power-law degree distribution and small-worldness. Real-world networks are composed of building blocks called motifs, which are specific subgraphs of a (usually) small number of nodes. Network motifs are important in the functionality of complex networks, and the role of some motifs, such as the feed-forward loop, in many biological networks has been heavily studied. On the other hand, many biological networks have shown some degree of robustness, in terms of their efficiency and connectedness, against failures in their components. In this paper we investigated how random and systematic failures in the edges of biological networks influence their motif structure. We considered two biological networks, namely, a protein structure network and the human brain functional network.
Furthermore, we considered random failures as well as systematic failures based on different strategies for choosing candidate edges for removal. Failures in the edges attached to high-degree nodes had the most destructive effect on the motif structure of the networks, decreasing their significance levels, while removing edges connected to nodes with high values of betweenness centrality had the least effect on the significance profiles. In some cases, the latter even caused an increase in the significance levels of the motifs. Citation: Mirzasoleiman B, Jalili M (2011) Failure Tolerance of Motif Structure in Biological Networks. PLoS ONE 6(5): e20512. Editor: Matjaz Perc, University of Maribor, Slovenia. Received: April 9, 2011; Accepted: April 28, 2011; Published: May 26, 2011. Copyright: © 2011 Mirzasoleiman, Jalili. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was funded by Sharif University of Technology. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.
Introduction Many real-world complex systems can be described as networks. Examples include the Internet, the World Wide Web, brain functional/anatomical networks, genetic regulatory networks, the metabolism of biological species, ecological systems, and networks of author collaborations. Scholars have found that many real-world networks, from physics to biology, engineering, and sociology, share some common structural properties such as a power-law degree distribution and small-worldness. Studying the properties of such networks can shed light on the underlying phenomena or lead to new insights into the system. For example, studying biological networks helps us to better understand the organization and evolution of their units.
Recent developments in computing facilities allow researchers to mine the data of real-world networks to discover their topological properties. In its simplest form, a network consists of a set of discrete elements called nodes (or vertices) and a set of connections linking these elements called edges (or links). One of the tricky parts of research in this field is to extract the graph of the system under study, that is, to identify the individual nodes and reconstruct the links connecting them. Once the network structure is identified, its structural and dynamical properties can be investigated. Network motifs are among the attributes usually tested for in natural networks. It has been shown that networks in various fields exhibit interesting features in terms of repeated occurrences of certain subgraphs, i.e., network motifs. Network motifs are patterns (particular subgraphs) that are statistically overrepresented or underrepresented within the network.
The significance of a particular subgraph in a network is usually measured by comparing its occurrences in the original network against some properly randomized networks. Network motifs have been identified in networks from different branches of science and are suggested to be the basic building blocks of most complex networks.
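A minimal version of this significance test: count a subgraph (here, triangles, the simplest undirected three-node motif) in an observed network and compare the count against an ensemble of size-matched random networks to obtain a Z-score. The toy network, the simple Erdős–Rényi-style null model, and the ensemble size below are all illustrative choices.

```python
import random
import statistics

random.seed(3)

def triangles(adj):
    """Count triangles, the simplest undirected three-node motif."""
    t = 0
    for u in adj:
        for v in adj[u]:
            for w in adj[v]:
                if u < v < w and w in adj[u]:
                    t += 1
    return t

def random_graph(n, m):
    """Null model matching the observed network's node and edge counts."""
    edges = set()
    while len(edges) < m:
        u, v = random.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

# A clustered toy network: a 6-node clique plus a sparse chain periphery.
n = 40
adj = {i: set() for i in range(n)}
for u in range(6):
    for v in range(u + 1, 6):
        adj[u].add(v)
        adj[v].add(u)
for u in range(6, n - 1):
    adj[u].add(u + 1)
    adj[u + 1].add(u)
m = sum(len(s) for s in adj.values()) // 2

observed = triangles(adj)                        # C(6,3) = 20 triangles
null = [triangles(random_graph(n, m)) for _ in range(200)]
z = (observed - statistics.mean(null)) / (statistics.stdev(null) or 1.0)
print(observed, round(z, 1))  # large positive Z: triangles are enriched
```

A strongly positive Z-score marks the subgraph as a motif; a strongly negative one marks an anti-motif, exactly the quantities tabulated in the paper.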
Analysis of these over/underabundant substructures can help us in determining different network properties and functions, such as hierarchical structure. The motif structure of a network might also be important in determining its dynamical properties. For example, the evolution of cooperativity has been linked to the motif structure of real networks.
One of the important features of many engineering and biological networks is robustness against component failure. Real-world networks may undergo random or systematic failures and consequently lose some of their components, i.e., nodes and/or edges. Therefore, it is essential to investigate the tolerance of critical network properties to errors (failures of randomly chosen nodes and/or edges of the network) and attacks (systematic failures of components that play a critical role in the network). It has been shown that many biological networks exhibit high degrees of robustness against random errors that might happen in their structure.
In general, it has been shown that scale-free networks, i.e., networks whose node-degree distribution follows a power law, are robust against errors but, at the same time, fragile in response to systematic attacks. Several measures have been proposed for quantifying the robustness of networks against attacks and errors. One of the most frequently used is the size of the largest connected component, which in a well-connected network scales linearly with the number of nodes. Efficiency is another important measure studied in the context of robustness of complex networks against attacks/errors.
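The contrast between error tolerance and attack fragility can be demonstrated on a crude hub-dominated toy network (the construction below is an illustrative stand-in, not a faithful scale-free model): removing ten random nodes barely dents the largest connected component, while removing the ten highest-degree nodes shatters it.

```python
import random
from collections import deque

random.seed(4)

def largest_component(adj, removed):
    """Size of the largest connected component, ignoring `removed` nodes."""
    seen, best = set(removed), 0
    for s in adj:
        if s in seen:
            continue
        size, q = 0, deque([s])
        seen.add(s)
        while q:
            u = q.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    q.append(v)
        best = max(best, size)
    return best

def hub_network(hubs, leaves):
    """Hub-dominated toy topology: hubs joined in a ring, each with many leaves."""
    adj = {}
    node = hubs
    for h in range(hubs):
        adj.setdefault(h, set()).add((h + 1) % hubs)
        adj.setdefault((h + 1) % hubs, set()).add(h)
        for _ in range(leaves):
            adj.setdefault(h, set()).add(node)
            adj[node] = {h}
            node += 1
    return adj

adj = hub_network(10, 50)                 # 10 hubs, 500 leaves
n = len(adj)
random_removed = random.sample(range(n), 10)
targeted_removed = sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:10]
print(largest_component(adj, random_removed),
      largest_component(adj, targeted_removed))  # error tolerance vs. attack collapse
```

Random removals almost always hit low-degree leaves and leave the giant component intact; the targeted attack removes exactly the ten hubs and isolates every leaf.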
The errors/attacks also influence the evolution of dynamical processes happening on the networks. Network cooperativity, for instance, has been shown to be extremely robust against random failures, while it is fragile when nodes with maximum degree are removed from the network. In this paper we investigated the influence of link failures on the profile of network motifs. We considered a protein structure network and a functional network of the human brain extracted through the functional magnetic resonance imaging technique. A number of strategies for choosing candidate edges for removal were taken into account, including random removal, removal based on the degrees of the end nodes, removal based on the betweenness centrality of the nodes, and removal based on the closeness centrality of the nodes. We then compared the profile of the network motifs as a function of the percentage of removed edges. Interestingly, different failure strategies resulted in different patterns of changes in the motif structure, with the strategy based on betweenness centrality being the most different from the other three.
Motif Structure Many real-world complex networks have been shown to be composed of well-defined building blocks called motifs. Network motifs are patterns of interconnection, or subgraphs, that occur in natural networks much more frequently than in randomized networks. They can be thought of as simple building blocks of complex networks, and they can provide valuable information about the structural design principles of networks. First discovered in the gene regulation (transcription) network of the bacterium Escherichia coli by Alon and his team, they have since been found in many networks ranging from biochemistry and neurobiology to ecology and engineering. The study of network motifs is therefore promising for revealing the basic building blocks of most complex networks. Some studies have related the function of networks to the structure of their motifs.
Transcription networks are among those heavily studied both theoretically and experimentally. For example, negative autoregulation, one of the simplest and most abundant motifs in Escherichia coli, has been shown to act as a response-acceleration and repair system. The positive-autoregulation motif is important in the bimodal distribution of protein levels in cell populations. The feed-forward loop, which is commonly found in many gene systems and organisms, is important in speeding up the response time of target gene expression following stimulus steps, in pulse generation, and in cooperativity. Dense overlapping regulons, which occur when several regulators combinatorially control a set of genes with diverse regulatory combinations, have also been shown to be important in the function of Escherichia coli. Although subgraphs of different sizes can be studied in natural networks, biological networks contain three- and four-node substructures far more often than randomized networks with similar structural properties. Many beneficial outcomes have ensued from these observations.
Often, network motifs are detected by comparing the network against a null hypothesis: the number of appearances of a specific subgraph is counted in the network and subsequently compared with its number of appearances in properly randomized networks. The randomized networks can be constructed in various ways; however, they should at least share some common properties with the original network. For example, the randomized networks should have the same number of nodes and edges as the original network. One possible method is to build the corresponding Erdős–Rényi version of the network. A better way of constructing the randomized networks is to preserve not only their size and average degree but also their degree distribution, or at least their degree sequence. This can be done simply by shuffling the adjacency matrix.
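Degree-preserving randomization is typically implemented as repeated double edge swaps: pick two edges (a, b) and (c, d) and rewire them to (a, d) and (c, b), rejecting any swap that would create a self-loop or a duplicate edge. A minimal sketch (the toy edge list is illustrative):

```python
import random

random.seed(5)

def double_edge_swap(edges, n_swaps, max_tries=100000):
    """Randomize an undirected network while preserving every node's degree:
    swap edges (a, b), (c, d) into (a, d), (c, b), rejecting swaps that
    would create self-loops or duplicate edges."""
    edges = {tuple(sorted(e)) for e in edges}
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        (a, b), (c, d) = random.sample(sorted(edges), 2)
        e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if len({a, b, c, d}) < 4 or e1 in edges or e2 in edges:
            continue   # would create a self-loop or a multi-edge
        edges -= {(a, b), (c, d)}
        edges |= {e1, e2}
        done += 1
    return edges

def degree_sequence(edges):
    d = {}
    for u, v in edges:
        d[u] = d.get(u, 0) + 1
        d[v] = d.get(v, 0) + 1
    return sorted(d.items())

original = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (0, 2), (3, 5)}
shuffled = double_edge_swap(original, 100)
print(degree_sequence(original) == degree_sequence(shuffled))  # prints True
```

Because every swap conserves all four endpoint degrees, the shuffled network has exactly the original degree sequence while its higher-order structure (e.g., motif counts) is randomized.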
Many of the motif detection strategies use this algorithm for constructing the randomized version of the original network under study. Two Biological Networks Techniques from complex networks have been widely applied to many biological systems (e.g., see reviews). Recent developments in efficient molecular biology techniques have led to an extraordinary amount of data on key cellular networks in a variety of simple organisms.
This has allowed scholars to study protein-interaction, transcriptional regulatory, and metabolic networks in different organisms. Networks have also been widely studied in neuroscience. Brain networks can be studied on a micro scale, as a set of neurons with excitatory/inhibitory connections between them. However, this approach cannot be used for studying whole-brain connectivity. For such cases, one should use functional magnetic resonance imaging, diffusion imaging, magnetoencephalography, or electroencephalography to extract large-scale functional/anatomical brain connectivity networks.
In this work, we considered two biological networks: a protein structure network and a human brain functional network extracted through functional magnetic resonance imaging. Their structures, with nodes and the edges connecting them, are shown in the corresponding figure, and their properties, including size, average degree, standard deviation of the degrees, average path length, and clustering coefficient, are summarized in the corresponding table. We used Mfinder to determine the significance of all three- and four-node subgraphs of these networks.
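Mfinder enumerates subgraphs exhaustively; as a self-contained illustration of what counting subgraphs means, the undirected three-node case (open triads and triangles) can be computed directly from degrees and triangle counts. This sketch assumes networkx, and the helper name is ours, not Mfinder's:

```python
import networkx as nx
from math import comb

def three_node_subgraph_counts(G):
    """Count the two connected undirected 3-node subgraphs: open
    triads (paths of length two) and triangles."""
    triangles = sum(nx.triangles(G).values()) // 3
    # Every pair of edges sharing a node forms either an open triad or
    # one side-pair of a triangle; each triangle contains 3 such pairs.
    wedges = sum(comb(d, 2) for _, d in G.degree())
    open_triads = wedges - 3 * triangles
    return {"open_triads": open_triads, "triangles": triangles}

# A triangle with one pendant node: 1 triangle, 2 open triads.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])
print(three_node_subgraph_counts(G))  # {'open_triads': 2, 'triangles': 1}
```

Four-node subgraphs require genuine enumeration, which is what dedicated tools such as Mfinder are built for.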
To obtain a high level of accuracy, we set the parameters of the tool's random-network generation and motif-counting algorithms as follows: the number of random networks was 10,000; the uniqueness threshold was ignored; and no threshold was placed on the mfactor or on the Z-score when counting motifs. Default values, including the switching method for generating random networks, were used for the remaining parameters. The characteristics of the considered biological networks, and the set of three- and four-node motifs with their corresponding normalized and non-normalized Z-scores, are summarized in the corresponding tables.
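The non-normalized Z-score of a subgraph compares its count in the real network with the mean and standard deviation of its count over an ensemble of degree-preserving randomized networks. A minimal sketch for the triangle subgraph, assuming networkx and NumPy (the function and parameter names are illustrative):

```python
import numpy as np
import networkx as nx

def triangle_zscore(G, n_random=100, seed=0):
    """Non-normalized Z-score of the triangle count: the real count
    compared against an ensemble of degree-preserving randomizations."""
    count = lambda H: sum(nx.triangles(H).values()) // 3
    real = count(G)
    rng = np.random.default_rng(seed)
    rand_counts = []
    for _ in range(n_random):
        R = G.copy()
        nx.double_edge_swap(R, nswap=10 * R.number_of_edges(),
                            max_tries=1000 * R.number_of_edges(),
                            seed=int(rng.integers(1 << 31)))
        rand_counts.append(count(R))
    mu, sigma = float(np.mean(rand_counts)), float(np.std(rand_counts))
    if sigma == 0:                  # degenerate ensemble: sign only
        return float("inf") if real > mu else 0.0
    return (real - mu) / sigma

# 20 disjoint triangles have far more triangles than any degree-matched
# randomization, so the Z-score is strongly positive.
G = nx.disjoint_union_all([nx.cycle_graph(3) for _ in range(20)])
assert triangle_zscore(G, n_random=50) > 0
```

A positive Z-score marks a motif (over-represented subgraph) and a negative one an anti-motif, exactly the convention used in the analysis below.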
As can be seen, motif #7, a four-node motif with five edges, has the highest positive Z-score and is thus the most significant motif in both networks; it can be considered the dominant motif. On the other hand, motif #1 has the highest negative Z-score in both networks and is thus the most significant anti-motif among the three- and four-node subgraphs. There is a significant direct correlation between the Z-scores of the motifs in the two networks (r = 0.9328, P …).

Random and Systematic Failures in the Edges

Random or systematic failures can occur in a network's components, i.e., its nodes and edges. In a protein-protein interaction network, for example, attacking nodes may correspond to the breakdown of polypeptides by appropriate enzymes, while attacking edges can be interpreted as preventing the physical interaction between two polypeptides so that they cannot carry out their biological function.
In this work we considered failures in the edges and investigated their influence on the motif-structure profile of the networks. In general, failures in networks are of two types: random failures, called errors, and systematic failures, called attacks.
Let us first define some preliminary graph-theoretic metrics. Consider an undirected and unweighted network with adjacency matrix A = (a_ij), i, j = 1, …, N, where N is the size of the network, and denote the edge between nodes i and j by e_ij. The degree of node i is

k_i = Σ_j a_ij.    (3)

Edge betweenness centrality (load) is a centrality measure of an edge in a graph that counts the number of shortest paths passing through the edge. The betweenness centrality L_ij of the edge e_ij between nodes i and j is defined by

L_ij = Σ_{p≠u} Γ_pu(e_ij) / Γ_pu,    (4)

where Γ_pu is the number of shortest paths from node p to node u in the graph and Γ_pu(e_ij) is the number of those shortest paths that make use of e_ij.
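As a quick sanity check of the definition, an edge bridging two dense modules should carry all inter-module shortest paths and hence the largest load. A small networkx example:

```python
import networkx as nx

# Two 5-cliques joined by a single edge: every shortest path between
# the two halves must use the connecting edge, so its load dominates.
G = nx.barbell_graph(5, 0)      # cliques {0..4} and {5..9}, bridge (4, 5)
L = nx.edge_betweenness_centrality(G, normalized=False)
bridge = max(L, key=L.get)
assert bridge == (4, 5)
print(bridge, L[bridge])        # all 5 x 5 cross-clique pairs use the bridge
```

Edges internal to either clique carry only a handful of shortest paths, so their load is an order of magnitude smaller.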
The betweenness centrality of an edge is thus the load of shortest paths using that edge: the larger the betweenness centrality of an edge, the more significant that edge is in the formation of shortest paths in the network.
In topological and complex-network analysis, closeness is a basic and important concept. In graph theory, the closeness of a node is the inverse of the sum of its shortest distances to every other node in the network. In other words, the closeness centrality C_i of node i is defined as

C_i = 1 / Σ_j d(i, j),    (5)

where d(i, j) is the length of the shortest path between nodes i and j. Indeed, the closeness centrality of node i is inversely proportional to the average shortest-path length from i to the other nodes of the network. We considered different failure strategies in the networks. Four strategies were used to choose candidate edges for removal:
Random failure: at each step, one edge was randomly chosen and removed from the network. Systematic failure based on node degrees: at each step, the quantity k_i k_j was calculated for each edge e_ij, and the edge with the maximum value of k_i k_j was removed; if several edges shared the maximum value, one of them was removed at random. Systematic failure based on edge betweenness centrality: at each step, the quantity L_ij was calculated for each edge e_ij, and the edge with the maximum L_ij was removed. Systematic failure based on node closeness centrality: at each step, the quantity C_i C_j was calculated for each edge e_ij, and the edge with the maximum C_i C_j was removed.

Results and Discussion

We applied the failure strategies to the two networks, i.e., the protein structure and human brain functional networks.
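The four edge-selection rules can be sketched as follows, assuming networkx; this is an illustrative reimplementation, not the code used in the study:

```python
import random
import networkx as nx

def pick_edge(G, strategy, seed=0):
    """Choose one edge for removal under the four strategies described
    in the text; ties are broken uniformly at random."""
    rng = random.Random(seed)
    edges = list(G.edges())
    if strategy == "random":
        return rng.choice(edges)
    if strategy == "degree":            # maximize k_i * k_j
        score = {(i, j): G.degree(i) * G.degree(j) for i, j in edges}
    elif strategy == "betweenness":     # maximize edge load L_ij
        score = nx.edge_betweenness_centrality(G)
    elif strategy == "closeness":       # maximize C_i * C_j
        C = nx.closeness_centrality(G)
        score = {(i, j): C[i] * C[j] for i, j in edges}
    else:
        raise ValueError(strategy)
    best = max(score.values())
    return rng.choice([e for e in edges if score[e] == best])

# On a barbell graph, the systematic strategies single out the bridge.
G = nx.barbell_graph(5, 0)
assert pick_edge(G, "degree") == (4, 5)
assert pick_edge(G, "betweenness") == (4, 5)
```

Repeatedly calling `pick_edge` and removing the returned edge, recomputing the scores after each removal, reproduces the iterative attack procedure used in the experiments.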
Starting from the original network, at each step a candidate edge (chosen according to a failure strategy) was removed, and the Z-scores of all undirected three- and four-node subgraphs were calculated for the resulting network. Since all terms of the subgraph ratio profile described by Eq. (2) are affected by a removal, the effect of removal on each individual subgraph is not clear from that profile; we therefore studied the non-normalized Z-scores. After each removal, the profiles of non-normalized Z-scores were calculated with respect to corresponding randomized networks with the same degree distribution, and the results were displayed as a function of the percentage of removed edges.
Because motifs correspond to particular functions, the evolution of motif frequencies with the percentage of removed edges is at least as important as the Z-scores. The Z-scores are relative to a random ensemble, so this metric alone does not reveal how the frequency of each subgraph changes. To better understand what happens to the motif composition of the considered networks and their randomized counterparts, we also plotted the motif frequencies against the percentage of removed edges. The corresponding figures show the profiles of the Z-scores of the motifs of sizes three and four in the networks, with edges removed according to the different strategies: random failure (failure strategy 1), systematic failure based on node degrees (failure strategy 2), systematic failure based on betweenness centralities (failure strategy 3), and systematic failure based on node closeness centralities (failure strategy 4). Random failure and the systematic failures based on degree or closeness centrality always weakened the significance of the subgraphs in the resulting networks, i.e., the significance level of the Z-scores decreased. The systematic failure based on betweenness centralities, however, showed different effects: removing the edges with the highest betweenness centrality produced networks in which the significance of some motifs increased while that of others decreased.

Z-score of motifs #1–#8 as a function of the percentage of removed edges for the protein structure network. The blue, green, red, and cyan lines show the changes in the Z-score for random failure (failure strategy 1), systematic failure based on node degrees (failure strategy 2), systematic failure based on betweenness centralities (failure strategy 3), and systematic failure based on node closeness centralities (failure strategy 4), respectively. The case with random failure is averaged over 10 realizations.

Z-score of motifs #1–#8 as a function of the percentage of removed edges for the human brain functional network.
The blue, green, red, and cyan lines show the changes in the Z-score for random failure, systematic failure based on node degrees, systematic failure based on betweenness centralities, and systematic failure based on node closeness centralities, respectively. The case with random failure is averaged over 10 realizations.

Interestingly, systematically removing the edges attached to high-degree nodes had the most catastrophic influence, in both networks, on decreasing the absolute values of the Z-scores, i.e., on decreasing the significance levels of the network motifs and anti-motifs. In other words, the higher the degrees of the vertices at the two ends of an edge, the more critical that edge is for the motif structure. Network motifs are important for network functionality; for example, the dynamical properties of many real-world networks are highly correlated with the relative abundance of motifs in those networks.
In gene regulatory networks, the motif structure is important for the response time of target gene expression following step stimuli, for pulse generation, and for cooperativity. Thus, a degree-based attack on the edges might affect a network's functionality by weakening the significance of its motifs. Consequently, to make the network motifs robust against such attacks, one should protect the edges connecting the hub nodes of the network.
On the other hand, preventing the system from performing a specific function might be desirable in some applications; if such functionality is linked to the motif structure of the network, it can be disrupted by removing the edges connecting hub nodes. Another interesting observation is that, in most cases, random removal of edges is not the weakest strategy for breaking the significance of the motifs. In some cases, e.g., motif #4 in the human brain functional network, it is the most effective strategy for reducing the significance of network motifs. Therefore, in real-world biological networks such as the two studied here, errors (random failures) can be as effective as attacks (systematic failures) in influencing the motif structure.
Among the strategies for systematic removal of edges, the one based on betweenness centrality had the least influence on the Z-scores: the profiles of Z-scores are largely robust against systematic removal of the most heavily loaded edges.
In some cases, e.g., motif #1 and motif #2, removing such edges even increased the significance level of the motif structure in the final networks. This may be because the edges with high betweenness centrality are often those connecting two parts of the network, i.e., bridges or local bridges. Such links usually participate in few graphlets of size three or four, so removing them may increase the relative abundance of those graphlets in the resulting network compared with the randomized networks. The corresponding figures show the rate of decrease of the motif frequencies under the different failure strategies.
The results revealed that the removal strategy based on betweenness centrality was the most effective at decreasing the numbers of the anti-motifs, i.e., motif #1, motif #3, and motif #4. For subgraphs with positive Z-scores, removing edges connected to high-degree nodes had the greatest influence on decreasing the motif frequencies. As with the subgraph significance profiles, the random strategy was not the weakest strategy for reducing the numbers of subgraphs in most cases; it was usually more effective than the systematic failures based on betweenness or closeness centrality.
Therefore, different failure strategies have different influences on the frequency of occurrence and on the significance profile of network motifs in biological networks. Our results showed that removing edges connected to high-degree nodes generally has the greatest influence on decreasing the relative appearance of three- and four-node subgraphs in the resulting networks compared with random networks.
This strategy also plays an important role in decreasing the motif frequencies. On the other hand, removing the most heavily loaded edges has the least influence on the motif significance profiles.

Frequencies of motifs #1–#8 as a function of the percentage of removed edges for the human brain functional network. The blue, green, red, and cyan lines show the changes in the frequencies for random failure, systematic failure based on node degrees, systematic failure based on betweenness centralities, and systematic failure based on node closeness centralities, respectively. The case with random failure is averaged over 10 realizations.

In summary, we investigated the effect of random and systematic edge failures on the profile of three- and four-node motifs in two example networks: a protein structure network and a human brain functional network extracted through functional magnetic resonance imaging.
We considered four strategies for choosing edges for removal: random failure, in which edges are removed at random; systematic failure of the edges connected to high-degree nodes; systematic failure of the edges with high betweenness centrality; and systematic failure of the edges connected to nodes with high closeness centrality. We showed that although biological networks are known to be robust against random failures in terms of network connectedness and efficiency, such failures can have destructive effects on network motifs.
Degree-based systematic failure had the most destructive role in most cases, causing the largest decreases in the frequency of occurrence and in the absolute values of the Z-scores, while attacks on the most heavily loaded edges had the least influence on the motif profile and in some cases even enhanced the significance of the motif structures. Since motifs play important roles in the functionality of real-world biological networks, these results are relevant to studies of the error and attack tolerance of biological networks.