iZi: Easy Prototyping of Interesting Pattern Mining Algorithms

LIRIS - Université de Lyon

PPME - University of New Caledonia


Although motivated by real problems, pattern mining techniques are still only marginally used, and their implementation can only be carried out by specialist programmers familiar with “low-level” code. Technology transfer is thus slowed down by several limitations, among them the time needed to develop operational programs.

iZi has been devised to simplify the prototyping of an important and broad class of pattern mining problems: the discovery of interesting patterns in large databases when the patterns can be represented as sets [1], i.e. when the pattern space is isomorphic to a Boolean lattice. Many problems fit into this class, from frequent itemsets to the learning of Boolean functions and inclusion dependencies.
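To make the "patterns as sets" idea concrete, here is a minimal sketch for the frequent itemset case. It is not taken from the iZi sources: the names `Pattern`, `support` and `is_frequent`, and the bitmask encoding, are assumptions chosen for illustration. A pattern is a subset of the items, so it is one element of the Boolean lattice of subsets, and the predicate is anti-monotone: every subset of a frequent itemset is frequent too.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical encoding: a pattern is a subset of items {0, ..., 31},
// stored as a bitmask (bit i set means item i belongs to the pattern).
using Pattern = std::uint32_t;

// Number of transactions of `db` that contain every item of `x`.
int support(const std::vector<Pattern>& db, Pattern x) {
    int count = 0;
    for (Pattern t : db) {
        if ((t & x) == x) ++count;  // x is a subset of transaction t
    }
    return count;
}

// Interestingness predicate for frequent itemsets: x is interesting
// when at least `minsup` transactions contain it.
bool is_frequent(const std::vector<Pattern>& db, Pattern x, int minsup) {
    assert(minsup >= 1);
    return support(db, x) >= minsup;
}
```

With this encoding, set inclusion is a single bitwise test, which is why the subset check in `support` stays so short.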

In this setting, the iZi library aims to simplify the C++ coding of such problems. iZi relies on a generic algorithm called ABS (Adaptive Borders Search, FIMI 2004 paper), which combines a levelwise search (aka Apriori) with a dualization-based search (aka Dualize&Advance).
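The levelwise half of this combination can be sketched as follows. This is a textbook Apriori-style search written for illustration, not the ABS implementation shipped with iZi: the dualization step is not shown, and the names `Pattern`, `Predicate` and `levelwise_positive_border` are hypothetical. It assumes an anti-monotone predicate and returns the maximal interesting patterns (the positive border), ignoring the empty pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <utility>
#include <vector>

using Pattern = std::uint32_t;                   // subset of {0..n-1} as a bitmask
using Predicate = std::function<bool(Pattern)>;  // assumed anti-monotone

// Explores the lattice level by level (patterns of size 1, then 2, ...)
// and collects the interesting patterns that have no interesting superset.
std::vector<Pattern> levelwise_positive_border(int n, const Predicate& interesting) {
    assert(n >= 0 && n < 32);
    std::vector<Pattern> border;
    std::set<Pattern> level;
    for (int i = 0; i < n; ++i) {  // level 1: interesting singletons
        Pattern single = Pattern{1} << i;
        if (interesting(single)) level.insert(single);
    }
    while (!level.empty()) {
        std::set<Pattern> next;
        for (Pattern x : level) {
            bool maximal = true;
            for (int i = 0; i < n; ++i) {
                Pattern bit = Pattern{1} << i;
                if (x & bit) continue;
                Pattern y = x | bit;  // candidate one level up
                // Apriori pruning: every subset of y with one item removed
                // must be interesting, i.e. present in the current level.
                bool all_subsets = true;
                for (int j = 0; j < n && all_subsets; ++j) {
                    Pattern b = Pattern{1} << j;
                    if ((y & b) && !level.count(y & ~b)) all_subsets = false;
                }
                // Test the predicate only once per candidate.
                if (all_subsets && (next.count(y) || interesting(y))) {
                    next.insert(y);
                    maximal = false;
                }
            }
            if (maximal) border.push_back(x);
        }
        level = std::move(next);
    }
    return border;
}
```

The predicate is passed as a `std::function`, so the same search can be reused for any problem of the class by swapping in another anti-monotone test.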

In the current version of iZi (version 1.2, July 2014), three apparently quite different pattern mining problems, namely Frequent Itemsets (FIM), Minimal Keys (Keys) and Inclusion Dependencies (INDs), have been implemented with iZi and can easily be tested from the command line (see the instructions in the README.txt file).

From those three instances of pattern mining problems, the developer can (hopefully!) easily guess how to deal with her own pattern mining problem. She only has to code the specific properties of her problem (mainly data format and interestingness predicate).
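As an illustration of what a problem-specific predicate might look like, here is a minimal sketch for the key discovery problem. It does not reproduce the iZi interface: the `Row` type and the `is_superkey` function are hypothetical, and the data format (rows of strings, attributes given as column indices) is an assumption. A set of attributes is a key when no two rows agree on all of them, and the minimal keys are the smallest such sets.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical data format: a table is a list of rows, each row a list of values.
using Row = std::vector<std::string>;

// True if no two rows of `table` agree on all the attributes
// (column indices) listed in `attrs`.
bool is_superkey(const std::vector<Row>& table, const std::vector<std::size_t>& attrs) {
    std::set<Row> seen;  // projections already encountered
    for (const Row& row : table) {
        Row projection;
        for (std::size_t a : attrs) {
            assert(a < row.size());
            projection.push_back(row[a]);
        }
        // insert() reports false when the projection was already present.
        if (!seen.insert(projection).second) return false;
    }
    return true;
}
```

Note that the complementary property ("is not a key") is the one preserved by subsets, so a levelwise search would typically run on that property and read the minimal keys off its border.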

The library is developed in C++ and is released under the GNU General Public License.

iZi is based on the following seminal work: [1] Heikki Mannila and Hannu Toivonen, Levelwise Search and Borders of Theories in Knowledge Discovery, Data Mining and Knowledge Discovery, volume 1, number 3, 241-258, 1997.

Installation

The following pattern mining problems are already integrated into iZi:

  • The discovery of frequent itemsets (FIM) in transactional databases
  • The discovery of minimal keys (Keys) in tables
  • The discovery of satisfied inclusion dependencies (INDs) in databases

Library

Download the library from here

It can be compiled in different environments (Linux Ubuntu 12 or Mac OS X).

Citing the library

If you want to refer to our library in a publication, please cite the following publication:

Frédéric Flouvat, Fabien De Marchi and Jean-Marc Petit. The iZi project: easy prototyping of interesting pattern mining algorithms. New Frontiers in Applied Data Mining, PAKDD 2009 International Workshops, Revised Selected Papers, LNCS 5669, ©Springer-Verlag, p. 1-15, 2009.

  1. Lhouari Nourine, Jean-Marc Petit. Extending Set-Based Dualization: Application to Pattern Mining. ECAI 2012, IOS Press ed. Montpellier, France. 2012.
  2. Frédéric Flouvat, Fabien De Marchi and Jean-Marc Petit. The iZi project: easy prototyping of interesting pattern mining algorithms. New Frontiers in Applied Data Mining, PAKDD 2009 International Workshops, Revised Selected Papers, LNCS 5669, ©Springer-Verlag, p. 1-15, 2009.
  3. Hélène Jaudoin, Frédéric Flouvat, Jean-Marc Petit and Farouk Toumani. Towards a scalable query rewriting algorithm in presence of value constraints. Journal on Data Semantics (JoDS XII), Vol. 12, p. 37-65, 2009.
  4. Frédéric Flouvat, Fabien De Marchi and Jean-Marc Petit. The open source library iZi for pattern mining problems, Open Source in Data Mining (OSDM) workshop, in conjunction with PAKDD'09, p. 14-25, Bangkok, Thailand, April 2009.
  5. Frédéric Flouvat, Fabien De Marchi and Jean-Marc Petit. iZi: a new toolkit for pattern mining problems, 17th International Symposium on Methodologies for Intelligent Systems (ISMIS), p. 131-136, Toronto, Canada, May 2008.
  6. Frédéric Flouvat, Fabien De Marchi and Jean-Marc Petit. Rapid prototyping of pattern mining problems isomorphic to boolean lattices, IEEE International Conference on Research Challenges in Information Science (RCIS), p. 1-11, Marrakech, Morocco, June 2008. (41 long papers accepted out of 108 submissions)
  7. Frédéric Flouvat, Fabien De Marchi and Jean-Marc Petit. Pattern mining problems isomorphic to boolean lattices: from problem statements to efficient implementations, Third Franco-Japanese Workshop on Information Search, Integration and Personalization (ISIP'07), Hokkaido, Japan, June 2007. (invited paper)
  8. Fabien De Marchi, Frédéric Flouvat and Jean-Marc Petit. Adaptive strategies for mining the positive border of interesting patterns: application to inclusion dependencies in databases, Constraint-based Mining and Inductive Databases, Revised Selected Papers, Jean-Francois Boulicaut, Luc De Raedt and Heikki Mannila Ed., LNCS 3848, p. 81-101, 2005. ©Springer-Verlag 2005.
  9. Frédéric Flouvat, Fabien De Marchi and Jean-Marc Petit, A thorough experimental study of datasets for frequent itemsets, 5th IEEE International Conference on Data Mining (ICDM'05), Houston, USA, November 2005.
  10. Frédéric Flouvat, Fabien De Marchi and Jean-Marc Petit, ABS: Adaptive Borders Search of frequent itemsets, 2nd Workshop on Frequent Itemsets Mining Implementation (FIMI'04), in conjunction with ICDM'04, Brighton, UK, November 2004. CEUR Workshop Proceedings, Vol. 126.
  11. Fabien De Marchi and Jean-Marc Petit, Zigzag: a new algorithm for mining large inclusion dependencies in database, 3rd IEEE International Conference on Data Mining (ICDM'03), Boston, USA, November 2003.
  12. Fabien De Marchi, Stéphane Lopes and Jean-Marc Petit, Efficient Algorithms for Mining Inclusion Dependencies, 8th International Conference on Extending Database Technology (EDBT 2002), Prague, Czech Republic, March 2002.
  13. Stéphane Lopes, Jean-Marc Petit and Lotfi Lakhal, Efficient Discovery of Functional Dependencies and Armstrong Relations, 7th International Conference on Extending Database Technology (EDBT 2000), Konstanz, Germany, March 2000.
  • Data Mining Template Library (DMTL), M.J. Zaki et al., Computer Science Department, Rensselaer Polytechnic Institute. DMTL is an open-source generic data mining library for the discovery of frequent patterns (itemsets, sequences, trees and graphs).
  • DAG (2009-2013) (ANR DEFIS 2009 funding)

iZi contributors

Frédéric Flouvat, Associate professor, PPME, University of New Caledonia, New Caledonia.

Jean-Marc Petit, Professor, LIRIS, Université de Lyon, INSA-Lyon, France.

Fabien De Marchi, Associate professor, LIRIS, Université de Lyon, Université Lyon 1, France.

Emmanuel Coquery, Associate professor, LIRIS, Université de Lyon, Université Lyon 1, France.

Internships

Carine Lacroix, undergraduate student, INSA (M1 level, 4 months, 2014)

Prisca Bonnet, undergraduate student, INSA (L3 level, 2 months, 2012)

Florent Weillaert, undergraduate student, INSA (L3 level, 2 months, 2012)

Feedback

If you have any questions or suggestions concerning the use and development of the library, please email Frédéric Flouvat.

start.txt · Last modified: 2014/08/06 11:01 by jmpetit
CC Attribution-Noncommercial-Share Alike 3.0 Unported