Guiding Search in Relational Pathfinding-based Concept Discovery via Bivariate Statistical Methods

Alev Mutlu; Furkan Goz; Hatice Zeren

doi:10.18201/ijisae.2019355373

Authors

Alev Mutlu Department of Computer Engineering Kocaeli University http://orcid.org/0000-0003-0547-0653
Furkan Goz Department of Computer Engineering Kocaeli University http://orcid.org/0000-0002-6726-3679
Hatice Zeren Vadi Kurumsal Bilgi Sistemleri, Istanbul http://orcid.org/0000-0002-0798-2874

Keywords:

concept discovery, frequency ratio, hazard index, heuristic, relational-path, weight of evidence

Abstract

Relational pathfinding-based systems learn concept descriptors by extending candidate concept descriptors one literal at a time. Because such learning systems usually deal with large search spaces, choosing which literal to add to a candidate concept descriptor becomes an essential issue. In this study, we empirically analyze the applicability of three bivariate statistical methods, namely frequency ratio, hazard index, and weight of evidence, as heuristics for choosing these literals. 10-fold experiments conducted on three benchmark datasets showed that all three heuristics reduced the search space and hence provided speedups compared to extending candidate concept descriptors with a randomly chosen literal. Moreover, the heuristic-based settings provided improved predictive accuracy.
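
To make the idea of scoring candidate literals concrete, the following is a minimal sketch and not the implementation evaluated in the study. It assumes each candidate literal is summarized by how many positive and negative examples remain covered once it is added, and it borrows the usual frequency ratio and positive weight of evidence definitions from susceptibility mapping. The names LiteralStats, frequency_ratio, positive_weight, and rank_literals, the smoothing constant eps, and the example literals and counts are all hypothetical. Hazard index is left out because the abstract does not define it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LiteralStats:
    """Hypothetical coverage counts for one candidate literal."""
    name: str
    pos_covered: int  # positive examples still covered after adding the literal
    neg_covered: int  # negative examples still covered after adding the literal


def frequency_ratio(s, total_pos, total_neg):
    """Share of positives covered divided by share of all examples covered."""
    covered = s.pos_covered + s.neg_covered
    if covered == 0:
        return 0.0
    pos_share = s.pos_covered / total_pos
    all_share = covered / (total_pos + total_neg)
    return pos_share / all_share


def positive_weight(s, total_pos, total_neg, eps=0.5):
    """W+ = ln(P(literal | positive) / P(literal | negative)).

    Additive smoothing (eps) is an assumption made here to avoid log(0).
    """
    p_pos = (s.pos_covered + eps) / (total_pos + 2 * eps)
    p_neg = (s.neg_covered + eps) / (total_neg + 2 * eps)
    return math.log(p_pos / p_neg)


def rank_literals(candidates, total_pos, total_neg, score=frequency_ratio):
    """Return candidates sorted from most to least promising under `score`."""
    return sorted(
        candidates,
        key=lambda s: score(s, total_pos, total_neg),
        reverse=True,
    )


# Made-up counts for illustration only.
candidates = [
    LiteralStats("parent(A, C)", pos_covered=8, neg_covered=2),
    LiteralStats("sibling(A, C)", pos_covered=5, neg_covered=5),
    LiteralStats("married(A, C)", pos_covered=1, neg_covered=9),
]
best = rank_literals(candidates, total_pos=10, total_neg=10)[0]
print(best.name)
```

Under these made-up counts, both scores prefer the literal that keeps most positive examples while discarding most negative ones. A random-choice baseline would ignore the counts entirely.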
