High-Utility Occupancy Pattern Mining Paper Published Online in the International Journal IEEE TCYB
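Recently, the utility mining paper "HUOPM: High-Utility Occupancy Pattern Mining" by Dr. Wensheng Gan of Harbin Institute of Technology (Shenzhen) was published online, after 24 months of peer review, in IEEE Transactions on Cybernetics (SCI, IF: 10.387, JCR Q1, CAS Zone 1, CCF B), an authoritative journal in artificial intelligence and related fields. DOI: 10.1109/TCYB.2019.2896267. Harbin Institute of Technology (Shenzhen) is the first-author affiliation of the paper. The research was funded by the Young Scientists Fund of the National Natural Science Foundation of China, the Shenzhen Peacock Plan, and the China Scholarship Council. The contributors are Wensheng Gan of Harbin Institute of Technology (Shenzhen); Prof. Jerry Chun-Wei Lin of Western Norway University of Applied Sciences (formerly an associate professor at Harbin Institute of Technology (Shenzhen), who left in June 2018); Prof. Philippe Fournier-Viger of Harbin Institute of Technology (Shenzhen); Prof. Han-Chieh Chao of National Dong Hwa University, Taiwan; and Prof. Philip S. Yu of the University of Illinois at Chicago.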
Paper link: https://ieeexplore.ieee.org/abstract/document/8645787
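The paper proposes a high-utility occupancy pattern mining algorithm built on the utility occupancy measure. The HUOPM algorithm introduces, for the first time, two highly compressed data structures, the utility-occupancy list and the frequency-utility table, which store the frequency and utility information of the items in the transaction data. It also introduces, for the first time, the concept of remaining utility occupancy, which is used to compute an upper bound on the utility occupancy and thereby reduce the search space that actually has to be explored. Extensive experimental results show that HUOPM can effectively mine valuable high-utility occupancy patterns from transactional data, that its results are complete with no patterns missed, and that it performs well. The algorithm addresses the new research problem of mining high-utility occupancy patterns. Its concepts and techniques may extend to other types of data and help broaden the scope of utility mining.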
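To make the utility occupancy measure concrete, the following is a minimal brute-force sketch in Python. It assumes the commonly used definition: the utility occupancy of a pattern in one transaction is the pattern's utility divided by the total utility of that transaction, and the utility occupancy of the pattern is the average of that ratio over the transactions containing it. The function names, the dictionary-based transaction layout, and the thresholds `alpha` and `beta` are illustrative assumptions. The sketch does not reproduce HUOPM's utility-occupancy list, frequency-utility table, or upper-bound pruning.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed layout: each transaction maps an item to its utility in that transaction.
Transaction = Dict[str, float]


def utility_occupancy(pattern: Iterable[str], db: List[Transaction]) -> Tuple[int, float]:
    """Return (support, utility occupancy) of `pattern` by scanning every transaction."""
    items = set(pattern)
    ratios = []
    for t in db:
        if items.issubset(t):  # the transaction contains the whole pattern
            total = sum(t.values())  # total utility of the transaction
            ratios.append(sum(t[i] for i in items) / total)
    support = len(ratios)
    uo = sum(ratios) / support if support else 0.0
    return support, uo


def is_high_utility_occupancy(pattern: Iterable[str], db: List[Transaction],
                              alpha: float, beta: float) -> bool:
    """Hypothetical check: frequent enough (alpha) and occupancy high enough (beta)."""
    support, uo = utility_occupancy(pattern, db)
    return support >= alpha * len(db) and uo >= beta


# Toy database made up for illustration only.
db = [
    {"a": 5, "b": 5},
    {"a": 2, "c": 4, "d": 2},
    {"b": 5, "c": 5},
]
```

On this toy data, the pattern `{"a"}` occurs in two transactions with ratios 0.5 and 0.25, so its utility occupancy is 0.375. A full scan like this costs one pass over the database per pattern, which is the cost that compact list structures and an upper bound on occupancy are meant to avoid.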
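IEEE Transactions on Cybernetics (IEEE TCYB; SCI, IF: 10.387, JCR Q1, CAS Zone 1, CCF B) is one of the high-impact international journals in the artificial intelligence area of computer science. In 2018 it ranked near the top of the more than 120 JCR journals in that field, with an impact factor of 10.387. It mainly publishes the latest research advances and techniques in computational intelligence, artificial intelligence, data science, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, and related areas.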
Paper title: HUOPM: High-Utility Occupancy Pattern Mining
Article link: https://ieeexplore.ieee.org/abstract/document/8645787
Authors: Wensheng Gan, Jerry Chun-Wei Lin*, Philippe Fournier-Viger, Han-Chieh Chao, and Philip S. Yu
Abstract:
Mining useful patterns from varied types of databases is an important research topic, which has many real-life applications. Most studies have considered the frequency as sole interestingness measure to identify high-quality patterns. However, each object is different in nature. The relative importance of objects is not equal, in terms of criteria, such as the utility, risk, or interest. Besides, another limitation of frequent patterns is that they generally have a low occupancy, that is, they often represent small sets of items in transactions containing many items and, thus, may not be truly representative of these transactions. To extract high-quality patterns in real-life applications, this paper extends the occupancy measure to also assess the utility of patterns in transaction databases. We propose an efficient algorithm named high-utility occupancy pattern mining (HUOPM). It considers user preferences in terms of frequency, utility, and occupancy. A novel frequency-utility tree and two compact data structures, called the utility-occupancy list and frequency-utility table, are designed to provide global and partial downward closure properties for pruning the search space. The proposed method can efficiently discover the complete set of high-quality patterns without candidate generation. Extensive experiments have been conducted on several datasets to evaluate the effectiveness and efficiency of the proposed algorithm. Results show that the derived patterns are intelligible, reasonable, and acceptable, and that HUOPM with its pruning strategies outperforms the state-of-the-art algorithm, in terms of runtime and search space, respectively.