Guiding Search in Relational Pathfinding-based Concept Discovery via Bivariate Statistical Methods
Keywords: concept discovery, frequency ratio, hazard index, heuristic, relational-path, weight of evidence
Abstract
Relational pathfinding-based systems learn concept descriptors by extending candidate concept descriptors one literal at a time. Because such learning systems usually deal with large search spaces, choosing which literal to use to extend a candidate concept descriptor is an essential issue. In this study we empirically analyze the applicability of three bivariate statistical methods, namely frequency ratio, hazard index, and weight of evidence, as heuristics for choosing the literals that extend candidate concept descriptors. Experiments using 10-fold cross-validation on three benchmark datasets showed that frequency ratio, hazard index, and weight of evidence reduced the search space and hence provided speedups compared to extending candidate concept descriptors with a randomly chosen literal. Moreover, the heuristic-based settings improved predictive accuracy.
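The abstract names the heuristics but gives no formulas. The following is a minimal sketch. It assumes the frequency ratio and weight-of-evidence definitions commonly used in landslide susceptibility mapping. It also assumes a mapping in which a candidate literal plays the role of a map class and the positive examples play the role of observed events. The sketch scores candidate literals with each method and contrasts that with the random-choice baseline. All names (`LiteralStats`, `choose_literal`, `random_literal`) and the smoothing constant `eps` are hypothetical. Hazard index is omitted because its definition is not given here. This is not the authors' implementation.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class LiteralStats:
    """Hypothetical coverage counts for one candidate literal."""
    name: str
    pos_covered: int  # positive examples covered when the literal is added
    neg_covered: int  # negative examples covered when the literal is added


def frequency_ratio(s: LiteralStats, total_pos: int, total_neg: int) -> float:
    """Share of positives covered divided by share of all examples covered."""
    covered = s.pos_covered + s.neg_covered
    if covered == 0 or total_pos == 0:
        return 0.0
    share_of_positives = s.pos_covered / total_pos
    share_of_all = covered / (total_pos + total_neg)
    return share_of_positives / share_of_all


def weight_of_evidence(s: LiteralStats, total_pos: int, total_neg: int,
                       eps: float = 0.5) -> float:
    """Contrast C = W+ - W-; additive smoothing eps (an assumption) avoids log(0)."""
    p_lit_given_pos = (s.pos_covered + eps) / (total_pos + 2 * eps)
    p_lit_given_neg = (s.neg_covered + eps) / (total_neg + 2 * eps)
    w_plus = math.log(p_lit_given_pos / p_lit_given_neg)
    w_minus = math.log((1 - p_lit_given_pos) / (1 - p_lit_given_neg))
    return w_plus - w_minus


Score = Callable[[LiteralStats, int, int], float]


def choose_literal(candidates: Sequence[LiteralStats], total_pos: int,
                   total_neg: int, score: Score) -> LiteralStats:
    """Heuristic setting: extend with the highest-scoring literal."""
    return max(candidates, key=lambda s: score(s, total_pos, total_neg))


def random_literal(candidates: Sequence[LiteralStats],
                   rng: random.Random) -> LiteralStats:
    """Baseline setting: extend with a uniformly random literal."""
    return rng.choice(candidates)
```

Consider an example with 10 positive and 10 negative examples. A literal covering 8 positives and 2 negatives gets a frequency ratio of (8/10)/(10/20) = 1.6. A literal covering 5 of each gets 1.0, meaning it is no more frequent among positives than overall. Under this sketch, both heuristics would prefer the first literal, whereas the baseline picks either one with equal probability.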
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.