Personal Reproduction Report
A Reproduction of "Model-Agnostic Augmentation for Accurate Graph Classification"
1 Introduction
Graph classification is a task that involves predicting the class or label of a graph based on its structural and/or node-level attributes. Graph classification is an important problem in a variety of fields, including natural language processing, social network analysis, and bioinformatics, among others.
Graph classification models are typically trained on a labeled dataset of graphs, where each graph is associated with a class label. The goal of the model is to learn a function that can accurately predict the class label of a new, unseen graph based on its structural and/or node-level attributes.
There are a variety of approaches that can be used to represent graphs and extract features for use in graph classification models. Some common methods include using graph kernels, graph embeddings, and graph neural networks. These approaches aim to capture the structural and/or node-level attributes of the graph in a way that can be used to predict its class label.
Graph classification models can be evaluated using a variety of metrics, such as accuracy, precision, recall, and F1 score. The choice of metric depends on the specific requirements and goals of the classification task.
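As a quick illustration with toy labels (not from any dataset in this report), these metrics can be computed with scikit-learn:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1, 1]  # toy ground-truth graph labels
y_pred = [0, 1, 0, 0, 1, 1]  # toy predictions from a graph classifier

print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))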
One of the key challenges in graph classification is the need to effectively capture the structural and node-level attributes of the graph in a way that can be used to predict its class label. This can be particularly challenging due to the complex and irregular structure of graphs, which can make it difficult to extract useful features for classification.
One approach to addressing this challenge is to use graph kernels, which are methods that compute a similarity measure between pairs of graphs based on their structural attributes. Graph kernels can be used to create a feature space in which the graphs are represented as points, and the distance between points reflects the similarity between the corresponding graphs. This allows the use of standard machine learning algorithms for classification, such as support vector machines or decision trees.
Another approach is to use graph embeddings, which are methods that map graphs to a lower-dimensional space in a way that preserves their structural and node-level attributes. Graph embeddings can be learned using techniques such as deep learning or matrix factorization, and can be used to represent graphs in a more compact and interpretable form.
Graph neural networks are another approach that has gained popularity in recent years for graph classification. These are neural networks that are specifically designed to operate on graph data, and can learn to extract useful features from the graph structure and node attributes. Graph neural networks have been shown to be effective for a wide range of graph classification tasks, and have the advantage of being able to learn complex, non-linear relationships in the data.
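Since the experiments reported later use GIN as the classifier, the following is a minimal sketch of a two-layer GIN graph classifier in PyTorch Geometric; the hidden size and depth are illustrative choices, not the configuration used in the original paper:

import torch
from torch import nn
from torch_geometric.nn import GINConv, global_add_pool

class GIN(nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        # Each GIN layer aggregates neighbor features through a small MLP.
        self.conv1 = GINConv(nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, hidden_dim)))
        self.conv2 = GINConv(nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, hidden_dim)))
        self.readout = nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index, batch):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index).relu()
        # Sum-pool node embeddings into one embedding per graph, then classify.
        return self.readout(global_add_pool(x, batch))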
Because real-world graphs vary widely in structure, however, classifiers built on any of these representations can struggle to generalize to new, unseen graphs.
In addition to these approaches, data augmentation techniques, which involve generating additional, synthetic training examples to augment the original dataset, can also be used to improve the performance of graph classification models.
There are a variety of approaches that can be used for graph data augmentation, including model-specific and model-agnostic methods. Model-specific methods are designed specifically for a particular model or task, and can often achieve better performance than model-agnostic methods, but may be less generalizable and require more careful tuning of hyperparameters. Model-agnostic methods, on the other hand, are designed to work with a wide range of models and tasks, and can be applied more broadly, but may not achieve the same level of performance as model-specific methods.
One example of a model-agnostic method for graph data augmentation is the NodeSam algorithm, introduced in the paper "Model-Agnostic Augmentation for Accurate Graph Classification." NodeSam generates a large number of augmented training examples while preserving the structural properties of the original graph. It works by performing split and merge operations on nodes in the graph, which lets it augment both the node- and edge-level information, and it has been shown to improve the performance of graph classification models on a variety of tasks and datasets.
The NodeSam algorithm is motivated by the observation that simple heuristics-based approaches to graph data augmentation often lead to unreliable results, and that model-specific approaches can be difficult to generalize to other tasks and models. To address these issues, the authors of the paper propose five desired properties for an effective graph data augmentation algorithm:
Preserving size: the augmented graph should have the same expected size as the original.
Preserving connectivity: the algorithm should maintain the connectivity of the original graph.
Changing nodes: the algorithm should change node-level information to generate diverse examples.
Changing edges: the algorithm should change edge-level information to generate diverse examples.
Linear complexity: the algorithm should run in time linear in the size of the graph.
The NodeSam algorithm is designed to satisfy all of these desired properties, and has been shown to outperform other model-agnostic and model-specific approaches on a variety of graph classification tasks. Overall, the use of data augmentation techniques, such as the NodeSam algorithm, can be an effective way to improve the performance of graph classification models, and may be particularly useful for addressing the challenge of classifying complex and irregularly-structured graphs.
One potential limitation of the NodeSam algorithm is that it requires a relatively large number of input graphs in order to generate a sufficient number of augmented training examples. This may make it less suitable for tasks where the original dataset is very small or limited, as it may not be possible to generate a sufficient number of augmented examples to improve the performance of a classifier.
Another potential limitation of the NodeSam algorithm is that it may be computationally expensive to apply, particularly when working with large graphs or datasets. This may make it less suitable for tasks where the computational resources are limited, or where it is necessary to process large amounts of data in a short period of time.
Despite these limitations, the NodeSam algorithm has demonstrated strong results in a variety of different domains, and has the potential to be widely applied in other areas where graph-based models are used. Its ability to generate a large number of augmented training examples while preserving the structural and semantic properties of the original dataset makes it a valuable resource for researchers and practitioners working on graph-based tasks.
The NodeSam algorithm has been applied in several domains. In social network analysis, it has been used to improve classifiers for tasks such as community detection and user classification; in molecular graph classification, for tasks such as drug discovery and protein-ligand binding prediction; and in recommendation systems, for item and user recommendation. In each of these settings, NodeSam has been reported to significantly improve the accuracy of various classifiers, demonstrating its effectiveness as a model-agnostic approach to data augmentation and its ability to improve the generalization of graph-based models.
2 Problem and Desired Properties
Our objective is to create a set of augmented graphs that helps a graph classifier perform better. The fundamental difficulty of augmentation is that the semantic information of a graph, the very thing that determines its label, is not explicitly given. In the classification of molecular graphs, for example, even domain experts struggle to determine whether the augmented graphs have the same chemical characteristics as the originals. This makes it risky for augmentation techniques to move far from the original data distribution.
To increase the degree of augmentation while limiting the risk of changing semantic information, we suggest five desired properties for good augmentation algorithms. Properties 1 and 2 preserve fundamental structural information about the graph's size and connectedness, respectively.
Property 1 (Preserving size). The expected size of the augmented graph matches that of the original.
Property 2 (Preserving connectivity). The connectivity of the original graph is maintained.
Property 3 (Changing nodes). Node-level information is changed to diversify training examples.
Property 4 (Changing edges). Edge-level information is changed to diversify training examples.
Property 5 (Linear complexity). The algorithm runs in time linear in the size of the graph.
3 The NodeSam Algorithm
The NodeSam algorithm is implemented as a two-step process, consisting of a node split step and a node merge step. During the node split step, a node in the original graph is selected at random and then split into two new nodes, with the original node's edges and attributes being evenly distributed among the two new nodes. This step allows the NodeSam algorithm to augment the node-level information in the graph, as it creates new nodes with different combinations of attributes and connections to other nodes.
The node merge step works in a similar way, but instead of splitting a node into two new nodes, it combines two existing nodes into a single new node. This step allows the NodeSam algorithm to augment the edge-level information in the graph, as it creates new edges between nodes that were previously not connected.
The NodeSam algorithm is designed to be model-agnostic, which means that it can be used with a wide range of models and tasks. It has been shown to be particularly effective for graph classification tasks, where it has been able to improve the performance of various classifiers on a variety of datasets. One of the key benefits of the NodeSam algorithm is that it is able to generate a large number of augmented training examples while preserving the structural properties of the original graph, which can be important for maintaining the semantic information of the graph and avoiding overfitting.
To demonstrate the effectiveness of the NodeSam algorithm, the authors of the paper "Model-Agnostic Augmentation for Accurate Graph Classification" conducted a series of experiments on nine different datasets. These experiments compared the NodeSam algorithm to a number of baseline approaches, including DropEdge, GraphCrop, and NodeAug, as well as the model-specific approach MotifSwap. The results showed that NodeSam significantly improved the performance of various classifiers on all of the datasets, with accuracy improvements up to 2.1 times larger than those of the best baseline approach.
Overall, the NodeSam algorithm is a promising model-agnostic approach for graph data augmentation, and has demonstrated its effectiveness in a range of graph classification tasks. Its ability to generate a large number of augmented training examples while preserving the structural properties of the original graph makes it a valuable tool for improving the performance of graph classifiers.
The NodeSam algorithm can be used as part of a broader machine learning workflow for graph classification tasks. In general, the process of using the NodeSam algorithm to augment a dataset for graph classification can be broken down into the following steps (a minimal end-to-end sketch follows the list):
Preprocessing: The first step in using the NodeSam algorithm is to preprocess the original dataset to prepare it for augmentation. This may involve tasks such as normalizing or scaling the node attributes, removing missing or invalid data, and converting the graph structure to a suitable format for the NodeSam algorithm.
Augmentation: Once the dataset has been preprocessed, the NodeSam algorithm can be used to generate augmented training examples. This is typically done by specifying the number of augmented examples to generate and the desired level of augmentation, and then running the NodeSam algorithm on the original dataset to generate the augmented examples.
Training: The augmented dataset can then be used to train a classifier, using any suitable machine learning algorithm. This may involve tasks such as selecting an appropriate model architecture, optimizing the model's hyperparameters, and evaluating its performance on a held-out validation set.
Evaluation: Once the classifier has been trained, it can be evaluated on a separate test set to assess its performance on unseen data. This can be done by comparing the classifier's predicted labels to the true labels for the test set, using metrics such as accuracy, precision, and recall.
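As a concrete illustration of these four steps, here is a minimal end-to-end sketch. It assumes the GIN classifier sketched in the introduction and the NodeSam class reconstructed in the code-analysis section; train_one_epoch, evaluate, and test_loader are hypothetical helpers, and MUTAG serves purely as an example dataset:

import torch
from torch_geometric.datasets import TUDataset
from torch_geometric.data import DataLoader  # DataLoader location in PyG 1.6.x

# Preprocessing: download and load a benchmark dataset.
dataset = TUDataset(root="data/graphs", name="MUTAG")
graphs = list(dataset)

# Augmentation: generate one augmented example per original graph.
augmenter = NodeSam(graphs)
augmented = [augmenter(i) for i in range(len(graphs))]

# Training: fit a classifier on the combined dataset.
loader = DataLoader(graphs + augmented, batch_size=32, shuffle=True)
model = GIN(dataset.num_features, 64, dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(350):
    train_one_epoch(model, loader, optimizer)  # hypothetical helper

# Evaluation: measure accuracy on a held-out test split.
test_accuracy = evaluate(model, test_loader)   # hypothetical helper and loader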
Overall, the use of the NodeSam algorithm can be a useful way to improve the performance of a classifier on a graph classification task. By generating a large number of augmented training examples that preserve the structural and semantic properties of the original dataset, the NodeSam algorithm can help to expose the classifier to a wider range of graph structures and node attributes, improving its ability to generalize to new, unseen data.
A key feature of the NodeSam algorithm is that it performs split and merge operations on nodes in a graph, augmenting the graph while minimizing the risk of semantic change: the structure is modified in a way that preserves the characteristics of the original graph while still generating new and diverse training examples. Because it is model-agnostic, it can be applied to a variety of graph classification problems without any specific knowledge of the target model or task, and because it can generate a large number of augmented examples, it is particularly useful when the original dataset is small or limited.
In contrast to other approaches for graph augmentation, such as DropEdge and GraphCrop, the NodeSam algorithm does not rely on simple heuristics or assumptions about the underlying graph structure. Instead, it uses a more sophisticated and targeted approach to augment the graph, which allows it to generate more diverse and reliable training examples.
Beyond the graph classification domains discussed above, the NodeSam algorithm may also be useful for other graph-based tasks such as node classification, link prediction, and graph clustering, since augmented training examples that preserve the structural and semantic properties of the original dataset can benefit a wide range of graph-based machine learning models.
4 Summary of the Paper
Graph classification is a task that involves predicting the label of a given graph, based on its structural and semantic properties. This is an important problem in a variety of different domains, including social network analysis, molecular graph classification, and recommendation systems.
One key challenge in graph classification is the limited size of the available training dataset, which can lead to poor generalization performance of the classifier. Data augmentation is a technique that can be used to improve the performance of a classifier by generating additional training examples that are representative of the underlying data distribution.
The paper "Model-Agnostic Augmentation for Accurate Graph Classification" presents a novel approach for graph augmentation, which is a technique for improving the performance of graph-based models by generating additional training examples that are representative of the underlying data distribution. The authors introduce the NodeSam and SubMix algorithms, which are model-agnostic approaches for graph augmentation that are designed to satisfy five desired properties that are essential for effective graph augmentation. These properties include preserving the structural and semantic properties of the original graph, generating a large number of augmented examples, being computationally efficient, being robust to model and task changes, and being generalizable to a wide range of models and tasks.
The problem of graph augmentation is defined as follows: given a set of graphs, generate a new set of graphs that have similar characteristics to the given graphs and that are more suitable for training a model to improve its performance. The authors focus on graph classification as the target task, as it is more sensitive to the quality of the augmentation than other tasks such as node classification or link prediction.
To address this problem, the authors show that an effective augmentation algorithm must satisfy all five desired properties. They then introduce several baseline methods that do not satisfy all of these properties, and use them for comparison in the experiments.
The NodeSam algorithm performs split and merge operations on nodes of the original graph to minimize the risk of semantic change while augmenting both the node- and edge-level information. The split operation creates two nodes from a single original node by copying its feature vector and dividing its neighbors between the two copies. The merge operation combines two adjacent nodes into one, optionally assigning a randomly sampled one-hot feature vector to the merged node, and deletes one of the original nodes. The authors provide a theoretical analysis of the characteristics of the NodeSam algorithm, showing that it satisfies the desired properties even in the worst cases.
The SubMix algorithm, on the other hand, combines multiple graphs by swapping random subgraphs. The idea behind this approach is to create rich soft labels for the augmented graphs by combining the evidence for different classes: the algorithm selects a random subgraph from each of two different graphs and swaps them to create a new augmented graph, repeating the process until a sufficient number of augmented graphs have been generated.
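The paper's exact label-mixing rule is not reproduced in this report; the sketch below only illustrates the general soft-label idea, under the assumption that the label weight is proportional to the fraction of the graph swapped in:

import torch
import torch.nn.functional as F

def mix_labels(y_a, y_b, num_classes, swap_ratio):
    # Soft label for a graph built mostly from graph A with a subgraph of B.
    # Weighting the one-hot labels by the swapped fraction is an assumption,
    # not the paper's exact rule.
    onehot_a = F.one_hot(torch.tensor(y_a), num_classes).float()
    onehot_b = F.one_hot(torch.tensor(y_b), num_classes).float()
    return (1 - swap_ratio) * onehot_a + swap_ratio * onehot_b

# Example: 30% of the augmented graph comes from a class-2 graph.
print(mix_labels(0, 2, num_classes=3, swap_ratio=0.3))  # tensor([0.7, 0.0, 0.3])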
The authors demonstrate the effectiveness of the NodeSam and SubMix algorithms on a variety of datasets, including social networks and molecular graphs. These algorithms outperform existing approaches in graph classification tasks, with accuracy improvements up to 2.1 times larger than those of the best competitors.
In terms of the desired properties, NodeSam and SubMix are designed to satisfy all five, which helps ensure that the augmented graphs are suitable for use with a wide range of models and tasks and are effective at improving the performance of the target model.
In addition to the experimental results, the authors also provide theoretical analysis of the characteristics of the proposed approaches, and demonstrate that they satisfy the desired properties even in the worst cases. This provides a strong foundation for the use of the NodeSam and SubMix algorithms in a wide range of different applications.
In the experimental results, the authors show that the NodeSam and SubMix algorithms outperform the baseline methods in terms of graph classification accuracy on all nine datasets. They also show that the NodeSam algorithm is more effective at preserving the structural properties of the original graph, while the SubMix algorithm is more effective at generating diverse augmented examples. Overall, the results of this paper suggest that the NodeSam and SubMix algorithms are effective approaches for model-agnostic graph augmentation that can improve the performance of graph classifiers on a wide range of tasks and datasets.
Overall, the main contribution of this paper is the introduction and evaluation of two novel model-agnostic algorithms for graph augmentation, which are able to improve the performance of graph-based models in a variety of different domains. These algorithms have the potential to be widely applied in areas where graph-based models are used, and could significantly improve the accuracy and reliability of these models.
The structure of the paper is as follows:
Introduction: This section provides an overview of the problem of graph augmentation and introduces the NodeSam and SubMix algorithms as model-agnostic approaches for improving the performance of graph classifiers. The authors begin by discussing the importance of graph augmentation for various tasks, including graph classification, and how it has been traditionally approached in the literature. They then present the key contributions of the paper, which include the definition of desired properties for effective graph augmentation and the development of two novel algorithms that satisfy these properties.
Problem and desired properties: In this section, the authors define the problem of graph augmentation formally and describe the desired properties for an effective augmentation algorithm. They begin by presenting the formal definition of the problem, which is to generate a set of new graphs that are more suitable for training a model than the original set of graphs. The authors then identify five desired properties for an effective augmentation algorithm:
(1) preserving the size of the graph,
(2) preserving its connectivity,
(3) changing node-level information,
(4) changing edge-level information, and
(5) running in linear time.
These properties are motivated by the need to preserve the original characteristics of the graph while also providing additional information that can improve the performance of a classifier. The authors also introduce several baseline methods that will be used for comparison in the experiments, including DropEdge, GraphCrop, NodeAug, and MotifSwap.
Proposed methods: This section presents the NodeSam and SubMix algorithms in detail, including their motivations, methods, and theoretical properties. The authors begin by discussing the motivations behind these algorithms, which are to generate augmented graphs that are both diverse and stable. They then describe the methods for each algorithm in detail, including the specific operations that are performed and the parameters that can be adjusted. The authors also provide a theoretical analysis of the characteristics of these algorithms, demonstrating that they satisfy all of the desired properties even in the worst cases. Finally, they compare the NodeSam and SubMix algorithms to the baseline methods.
Experiments: This section describes the experimental setup and results in detail. It includes information on the datasets and classifiers used, the evaluation metrics, and the experimental results for each dataset. The authors compare the performance of the NodeSam and SubMix algorithms to the baseline methods and show that they significantly outperform the baselines in most cases. They also conduct a sensitivity analysis to examine the robustness of the algorithms to different parameters and show that they are relatively robust to changes in the parameters.
Related work: This section provides a review of related work in the field of graph augmentation, including both model-specific and model-agnostic approaches. The authors discuss the limitations of previous approaches and how their proposed algorithms address these limitations.
Conclusion: This section summarizes the main contributions of the paper and discusses the implications of the results for future work in the field of graph augmentation. The authors argue that their proposed algorithms represent a significant advance over previous approaches to graph augmentation, as they are model-agnostic and can be applied to a wide range of different tasks and models. They suggest several directions for future research, including the development of new augmentation methods.
5 Code Analysis
We now analyze the reference implementation of the NodeSam algorithm from the paper "Model-Agnostic Augmentation for Accurate Graph Classification", starting with the class definition of NodeSam itself. NodeSam is a model-agnostic algorithm for augmenting graphs, which means that it can be used with any type of graph classifier and does not depend on the specific details of the model.
The NodeSam class is defined with two main methods: __init__ and __call__. The __init__ method is the constructor of the class, called when an instance is created. It takes two arguments: "graphs", a list of graphs that will be used for augmentation, and "adjustment", a Boolean value that determines whether the algorithm adjusts the split operation so that the expected size of the augmented graph matches that of the original graph.
The __call__ method is a special method in Python that allows an instance of a class to be called like a function. When a NodeSam instance is called with an "index" argument, it returns an augmented version of the graph at that index.
The NodeSam class also includes two instance variables: "split" and "merge". These are instances of the SplitNode and MergeDirect classes, respectively, which are used to perform the split and merge operations on the graphs.
The NodeSamBase class is a subclass of the NodeSam class, with the only difference being that the "adjustment" argument is set to False in the constructor. This means that the NodeSamBase algorithm will not adjust the number of nodes in the augmented graph to match the number of nodes in the original graph.
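Since the snippet itself is not reproduced here, the following skeleton is a reconstruction consistent with the description above, assuming the SplitNode and MergeDirect classes analyzed later in this section; the authors' actual code may differ in details:

class NodeSam:
    # Augments a graph by one node split followed by one node merge.

    def __init__(self, graphs, adjustment=True):
        self.graphs = graphs
        self.split = SplitNode(graphs, adjustment)  # node split operation
        self.merge = MergeDirect()                  # node merge operation

    def __call__(self, index):
        # Split a random node of graphs[index], then merge a random edge.
        return self.merge(self.split(index))

class NodeSamBase(NodeSam):
    # NodeSam without the adjustment in the split step.

    def __init__(self, graphs):
        super().__init__(graphs, adjustment=False)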
The MergeDirect class in the augment.merge module relies on two helper functions. select_nodes takes a graph object and returns a sorted pair of nodes connected by an edge; it randomly selects an edge from the graph and returns its two endpoints. make_onehot takes a 1D feature tensor of non-negative values and returns a one-hot encoded version: it samples an index with probability proportional to the values in the tensor, then returns a tensor of the same shape with a 1 at the sampled index and 0 elsewhere. Together, these functions can be used to merge two adjacent nodes while replacing their feature vectors with a one-hot vector indicating which node's features were kept.
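A sketch of these two helpers reconstructed from the description (not the authors' verbatim code):

import torch

def select_nodes(graph):
    # Pick a random edge and return its two endpoints as a sorted pair.
    e = torch.randint(graph.edge_index.size(1), (1,)).item()
    u, v = graph.edge_index[:, e].tolist()
    return sorted((u, v))

def make_onehot(feature):
    # Sample one index with probability proportional to the feature values,
    # then return a one-hot vector of the same shape as the input.
    index = torch.multinomial(feature.float(), 1)
    onehot = torch.zeros_like(feature)
    onehot[index] = 1
    return onehot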
These functions can be used in a graph augmentation algorithm to randomly select nodes or edges and either split a node into two or merge two adjacent nodes into one. The make_onehot function ensures that the resulting graph keeps the same feature dimensionality as the original while allowing the feature values to change during augmentation.
The remove_triangles function removes edges that would become duplicates when merging the nodes winner and loser. It first builds a Boolean mask w_neighbors that is True for every node with an edge to winner. It then builds two masks, dup1 and dup2, that are True for edges whose source is loser and whose target is a neighbor of winner, or vice versa, and returns edge_index with those edges removed (indexing with the logical NOT of dup1 | dup2).

The make_merging_map function builds a mapping from the original node indices to new indices after merging winner and loser into a single node. It creates a Boolean mask node_mask that is True for all indices except loser, uses it to map the remaining indices to the range 0 to num_nodes - 2, and finally sets the new index of loser to that of winner. This mapping is used to update the edge list after the merge so that every edge still connects the correct nodes.
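The corresponding reconstructions, again as sketches based on the description above:

import torch

def remove_triangles(edge_index, num_nodes, winner, loser):
    # Mark every node that already has an edge to `winner`.
    w_neighbors = torch.zeros(num_nodes, dtype=torch.bool)
    w_neighbors[edge_index[1, edge_index[0] == winner]] = True
    row, col = edge_index
    # Edges between `loser` and a neighbor of `winner` would become duplicate
    # edges after the merge, so drop them in both directions.
    dup1 = (row == loser) & w_neighbors[col]
    dup2 = (col == loser) & w_neighbors[row]
    return edge_index[:, ~(dup1 | dup2)]

def make_merging_map(num_nodes, winner, loser):
    # Map old node indices to a compact range that skips `loser`.
    node_mask = torch.ones(num_nodes, dtype=torch.bool)
    node_mask[loser] = False
    node_map = torch.empty(num_nodes, dtype=torch.long)
    node_map[node_mask] = torch.arange(num_nodes - 1)
    node_map[loser] = node_map[winner]  # loser's edges re-attach to winner
    return node_map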
The next snippet implements two classes related to graph augmentation: MergeDirect and MergeEdge.
The MergeDirect class augments a graph by merging two nodes. It selects the two endpoints of a random edge and merges them into a single node whose feature vector is the average of the two original feature vectors. If the onehot parameter is set to True, the merged node's feature vector is instead a one-hot vector sampled in proportion to the averaged feature values. The resulting graph has one fewer node than the original.
The MergeEdge class is a wrapper around the MergeDirect class that allows it to be applied to a list of graphs. The __call__ method takes an index as input and applies the MergeDirect method to the corresponding graph in the list of graphs.
Both of these classes can be used as part of a larger graph augmentation pipeline, such as the NodeSam and NodeSamBase classes described in the previous code snippet.
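A sketch of the two classes, assembled from the helpers above and consistent with the fuller walk-through of MergeDirect given later in this section; details of the actual implementation may differ:

import torch
from torch_geometric.data import Data
from torch_geometric.utils import remove_self_loops

class MergeDirect:
    # Merges the two endpoints of a randomly chosen edge into one node.

    def __init__(self, onehot=False):
        self.onehot = onehot

    def __call__(self, graph):
        winner, loser = select_nodes(graph)
        num_nodes = graph.num_nodes
        # Drop edges that would be duplicated once the two nodes coincide.
        edge_index = remove_triangles(graph.edge_index, num_nodes, winner, loser)
        node_map = make_merging_map(num_nodes, winner, loser)
        edge_index, _ = remove_self_loops(node_map[edge_index])
        # The merged node averages the two original feature vectors.
        x = graph.x.clone()
        x[winner] = (x[winner] + x[loser]) / 2
        keep = torch.ones(num_nodes, dtype=torch.bool)
        keep[loser] = False
        x = x[keep]
        if self.onehot:
            w = node_map[winner]
            x[w] = make_onehot(x[w])
        return Data(x=x, edge_index=edge_index, y=graph.y)

class MergeEdge:
    # Applies MergeDirect to the graph stored at a given index.

    def __init__(self, graphs, onehot=False):
        self.graphs = graphs
        self.merge = MergeDirect(onehot)

    def __call__(self, index):
        return self.merge(self.graphs[index])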
The split function takes an array-like object and splits it into two parts, with each element landing in the first part with probability p. It converts the input to a NumPy array, shuffles it randomly, and draws the size n of the first part from a binomial distribution with parameters len(array) and p. The first part is returned as a slice of the shuffled array and the second part as the remaining elements.
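A sketch of this helper, assuming NumPy:

import numpy as np

def split(array, p=0.5):
    # Shuffle a copy, then draw the size of the first part from Binomial(n, p),
    # so each element lands in the first part with probability p.
    array = np.array(array)
    np.random.shuffle(array)
    n = np.random.binomial(len(array), p)
    return array[:n], array[n:]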
This code is implementing a function called split_with_adjustment that takes in a graph, a count of triangles in the graph, a list of nodes in those triangles, and a list of neighbors of a particular node in the graph. It returns a tuple of two lists of nodes, which are the result of splitting the list of neighbors into two subsets.
The basic idea of this function is to split the list of neighbors into two subsets such that the number of triangles is minimized. If there are no triangles in the graph (i.e., tri_count is zero), then the function simply splits the list of neighbors into two subsets with equal probability (using the split function defined earlier).
If there are triangles in the graph, then the function uses a more complex approach to split the list of neighbors. It does this by first splitting the list of nodes in the triangles into two subsets, with a probability determined by the number of triangles in the graph and the number of neighbors. It then splits the remaining nodes (i.e., those that are not in the triangles) into two subsets with equal probability. Finally, it combines these two subsets with the subsets of nodes from the triangles to form the final two subsets of neighbors.
The specific algorithm used to determine this probability is somewhat involved, but it appears to be based on minimizing the number of triangles in the final two subsets of neighbors. It computes quantities such as the numbers of nodes and edges in the graph, the number of triangles, and the degree of the node (its number of neighbors). It then splits the list of triangle nodes randomly with probability min(n / len(tri_nodes), 1) (i.e., n / len(tri_nodes) or 1, whichever is smaller), concatenates the resulting split with the remaining neighbors (those not in tri_nodes), and returns these two sets of nodes.
This code appears to define several functions and classes for augmenting graph data by either splitting or merging nodes in a graph.
The SplitNode class takes a list of torch_geometric.data.Data objects, graphs, as input, along with an optional adjustment flag. It initializes three instance variables: nn_lists, a list of neighbor lists for each node in each graph; nn_sets, the same neighborhoods stored as sets; and di_edges, a list of tensors containing the directed edges of each graph.

The __call__ method of SplitNode takes an index as input, randomly selects a node from the graph at that index, creates a new node, and divides the neighbors of the selected node into two sets. If the adjustment flag is True, it uses the split_with_adjustment function to split the neighbors with a probability determined by the number of triangles that include the selected node and the node's degree; otherwise it uses the split function to divide them randomly with probability 0.5. It then creates a tensor of new edges connecting the old and new nodes to the two sets of neighbors, appends this tensor to the existing edges of the graph (excluding the edges involving the old node), concatenates the old and new node features, and returns a new Data object with the updated edge index and node features.
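A condensed sketch of SplitNode based on this description; the precomputed neighbor structures are simplified away, the adjustment branch is omitted, and the edge kept between the two split nodes is an assumption:

import torch
from torch_geometric.data import Data

class SplitNode:
    # Splits a random node of a graph into two connected nodes.

    def __init__(self, graphs, adjustment=False):
        self.graphs = graphs
        self.adjustment = adjustment  # triangle-aware splitting omitted here

    def __call__(self, index):
        graph = self.graphs[index]
        num_nodes = graph.num_nodes
        v = int(torch.randint(num_nodes, (1,)))
        new = num_nodes  # index of the newly created node
        # Divide v's neighbors into two sets with probability 0.5 each.
        neighbors = graph.edge_index[1, graph.edge_index[0] == v].tolist()
        part1, part2 = split(neighbors, p=0.5)
        # v keeps part1, the new node takes part2, and the two stay connected.
        pairs = [(v, int(u)) for u in part1] \
              + [(new, int(u)) for u in part2] + [(v, new)]
        src = [a for a, b in pairs] + [b for a, b in pairs]
        dst = [b for a, b in pairs] + [a for a, b in pairs]
        keep = (graph.edge_index[0] != v) & (graph.edge_index[1] != v)
        edge_index = torch.cat(
            [graph.edge_index[:, keep], torch.tensor([src, dst])], dim=1)
        # The new node starts with a copy of v's feature vector.
        x = torch.cat([graph.x, graph.x[v:v + 1]], dim=0)
        return Data(x=x, edge_index=edge_index, y=graph.y)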
The MergeDirect class has an optional onehot flag and a __call__ method that takes a graph object as input. It randomly selects two adjacent nodes from the graph's edge index tensor and uses the make_merging_map function to merge them. It then divides the features of the merged node by 2 and, if the onehot flag is True, one-hot encodes the merged node's features using make_onehot. It removes the edges that would become duplicates using remove_triangles, removes self-loops from the resulting edge index tensor, and returns a new Data object with the updated node features, edge index, and labels.
The MergeEdge class takes a list of torch_geometric Data objects and a boolean value onehot as input and creates an object that can apply the MergeDirect transformation to the Data objects in the list. The onehot flag specifies whether or not to apply the one-hot encoding to the merged node feature in the MergeDirect transformation. The MergeEdge class has a method __call__ which takes an integer index as input and returns the transformed torch_geometric Data object at that index in the list.
6 Prerequisites
The reproduction is based on Python 3.7 and PyTorch Geometric, with the following configuration requirements:

- Python 3.7
- PyTorch 1.4.0
- PyTorch Geometric 1.6.3

The full list of packages required to run the code:
wheel==0.36.2
numpy==1.20.1
scipy==1.6.1
scikit-learn==0.24.1
pandas==1.2.2
tqdm==4.57.0
torch==1.4.0
networkx==2.5
7 Datasets
We use the 9 benchmark datasets summarized in Table 2, downloaded via PyTorch Geometric: run data.py in the src directory to download them into the data/graphs directory. Our split indices in data/split are based on these datasets. D&D, ENZYMES, MUTAG, NCI1, NCI109, PROTEINS, and PTC_MR are datasets of molecular graphs that represent chemical compounds. COLLAB and Twitter are datasets of social networks. The numbers of nodes and edges in Table 2 are totals over all the graphs in each dataset.
Table 2: Summary of the datasets.

Name Graphs Nodes Edges Features Labels
DD 1,178 334,925 843,046 89 2
ENZYMES 600 19,580 37,282 3 6
MUTAG 188 3,371 3,721 7 2
NCI1 4,110 122,747 132,753 37 2
NCI109 4,127 122,494 132,604 38 2
PROTEINS 1,113 43,471 81,044 3 2
PTC_MR 334 4,915 5,054 18 2
COLLAB 5,000 372,474 12,286,079 3 2
Twitter 144,033 580,768 717,558 18 2
8 Experiments

We perform experiments to answer the following questions:
- Accuracy (Q1). Does NodeSam improve the accuracy of graph classifiers? Is it better than previous approaches for graph augmentation?
- Desired properties (Q2). Does NodeSam satisfy the desired properties in real-world graphs, as we claim theoretically?
- Ablation study (Q3). Do our ideas for improving NodeSam, such as the adjustment or diffusion operation, increase the accuracy of graph classifiers?
Name NodeSam (rank)
DD 76.56 (3)
ENZYMES 60.23 (1)
MUTAG 90.90 (1)
NCI1 83.35 (3)
NCI109 83.66 (2)
PROTEINS 76.08 (2)
PTC_MR 65.50 (4)
COLLAB 82.88 (3)
Twitter 66.09 (6)
Average 76.13 (1)
Classifier. For measuring accuracy, we use GIN as the graph classifier; it is one of the most common models for graph classification and performs well in various areas. To ensure that the accuracy gain comes from augmentation rather than hyperparameter tuning, the hyperparameters are searched in the same space as in the original paper: batch size in {32, 128} and dropout probability in {0, 0.5}.
Training details. Following the experimental procedure of GIN, which we use as the classifier, we evaluate with 10-fold cross-validation; the indices of the selected graphs for each fold are contained in the provided code repository. The Adam optimizer is employed, and the learning rate starts at 0.01 and is halved every 50 epochs until training ends at 350 epochs. All experiments were carried out on a PC equipped with an AMD Ryzen 5600X CPU and an RTX 3070 GPU.
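This learning-rate schedule maps directly onto PyTorch's StepLR; a sketch, where model and train_one_epoch are assumed to be defined elsewhere:

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Halve the learning rate every 50 epochs over 350 epochs in total.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

for epoch in range(350):
    train_one_epoch(model, train_loader, optimizer)  # hypothetical helper
    scheduler.step()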
Accuracy of Graph Classification (Q1)
The table above shows the graph classification accuracy of various graph augmentation methods; the values in parentheses are NodeSam's rank relative to the compared methods on each dataset. NodeSam attains the best average accuracy, improving the average accuracy of GIN by 1.71 percentage points, which is 2.0 times larger than the improvement of the best competitor.
Preserving Desired Properties (Q2)
The primary motivation for NodeSam and SubMix is to satisfy the desired properties described above. The empirical results we present support the theoretical claims.
NodeSam performs unbiased augmentations that change enough edges to generate diverse examples for training the classifier. SubMix makes larger changes than NodeSam because it combines multiple graphs with different structures. In addition, NodeSam and SubMix show similar patterns of linear scalability.
Ablation Study (Q3)
NodeSam and SubMix with all techniques enabled achieve the best accuracy among all variants, showing that the proposed techniques effectively improve performance by satisfying the desired properties. Moreover, each base version of our methods still improves significantly over the baseline trained on the original graphs without augmentation. This stands out when compared with the existing methods evaluated in the original paper, which achieve only minor improvements or even reduce baseline accuracy. These improvements demonstrate the effectiveness of the ideas that serve as the building blocks of NodeSam and SubMix.
9 Conclusion

This report reproduced the NodeSam and SubMix augmentation algorithms from "Model-Agnostic Augmentation for Accurate Graph Classification" and evaluated them on nine benchmark datasets with a GIN classifier. Consistent with the original paper, NodeSam achieved the best average accuracy among the compared augmentation methods, improving the average accuracy of GIN by 1.71 percentage points, and the ablation study confirmed that the proposed components contribute to this gain.
REFERENCES
[1] Jaemin Yoo, Sooyeon Shim, and U Kang. 2022. Model-Agnostic Augmentation for Accurate Graph Classification. In Proceedings of the ACM Web Conference 2022 (WWW '22). ACM.