Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/2440
DC FieldValueLanguage
dc.contributor.authorSayrafi, Bassem
dc.contributor.authorVan Gucht, Dirk
dc.contributor.authorPurdom, Paul W.
dc.date.accessioned2016-10-13T05:28:14Z
dc.date.available2016-10-13T05:28:14Z
dc.date.issued2005
dc.identifier.urihttp://hdl.handle.net/20.500.11889/2440
dc.descriptionA paper submitted to : OSDM '05 Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations, Pages 46-55, Chicago, Illinois — August 21 - 21, 2005
dc.description.abstractWe study the relative effectiveness and the efficiency of computing support-bounding rules that can be used to prune the search space in algorithms to solve the frequent item-sets mining problem (FIM). We develop a formalism wherein these rules can be stated and analyzed using the concept of differentials and density functions of the support function. We derive a general bounding theorem, which provides lower and upper bounds on the supports of item-sets in terms of the supports of their subsets. Since, in general, many lower and upper bounds exists for the support of an item-set, we show how to the best bounds. The result of this optimization shows that the best bounds are among those that involve the supports of all the strict subsets of an item-set of a particular size q. These bounds are determined on the basis of so called q-rules. In this way, we derive the bounding theorem established by Calders [5]. For these types of bounds, we consider how they compare relative to each other, and in so doing determine the best bounds. Since determining these bounds is combinatorially expensive, we study heuristics that efficiently produce bounds that are usually the best. These heuristics always produce the best bounds on the support of item-sets for basket databases that satisfies independence properties. In particular, we show that for an item-set I determining which bounds to compute that lead to the best lower and upper bounds on freq(I) can be done in time O(|I|). Even though, in practice, basket databases do not have these independence properties, we argue that our analysis carries over to a much larger set of basket databases where local “near” independence hold. Finally, we conduct an experimental study using real baskets databases, where we compute upper bounds in the context of generalizing the Apriori algorithm. Both the analysis and the study confirm that the q-rule (q odd and larger than 1) will almost always do better than the 1-rule (Apriori rule) on large dense baskets databases. Our experiment reveal that on these baskets databases, the 3-rule prunes almost 100% of the search space while, the 1-rule prunes 96% of the search space in the early stages of the algorithm. We also observe a reduction in wasted effort when applying the 3-rule to sparse baskets databases. In addition, we give experimental evidence that the combined use of the lower and upper bounds determine the exact support of many frequent item-sets without counting.
dc.language.isoenen_US
dc.publisherRobert Schuman Centre for Advanced Studiesen_US
dc.subject.lcshData mining
dc.subject.lcshInformation storage and retrieval systems
dc.subject.lcshDatabase management
dc.titleOn the effectiveness and efficiency of computing bounds on the support of item-sets in the frequent item-sets mining problemen_US
dc.typeReportsen_US
newfileds.conferenceOpen source data mining: frequent pattern mining implementations (1st : 2005 : Chicago, Illinois)
newfileds.item-access-typeopen_accessen_US
newfileds.general-subjectLaw and Judiciaryen_US
item.grantfulltextopen-
item.languageiso639-1other-
item.fulltextWith Fulltext-
Appears in Collections:Fulltext Publications
Files in This Item:
File Description SizeFormat
6503.pdf452.57 kBAdobe PDFView/Open
Show simple item record

Page view(s)

49
Last Week
0
Last month
2
checked on May 11, 2022

Download(s)

3
checked on May 11, 2022

Google ScholarTM

Check


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.