Many data sharing applications require that publishing data should protect sensitive information pertaining to individuals, such as diseases of patients, the credit rating of a customer, and the salary of an employee. Meanwhile, certain information is required to be published. In this paper, we consider data-publishing applications where the publisher specifies both sensitive information and shared information. An adversary can infer the real value of a sensitive entry with a high confidence by using publishing data. The goal is to protect sensitive information in the presence of data inference using derived association rules on publishing data. We formulate the inference attack framework, and develop complexity results. We show that computing a safe partial table is an NP-hard problem. We classify the general problem into subcases based on the requirements of publishing information, and propose algorithms for finding a safe partial table to publish. We have conducted an empirical study to evaluate these algorithms on real data. The test results show that the proposed algorithms can produce approximate maximal published data and improve the performance of existing algorithms.
We consider the problem of efficiently computing distributed geographical k-NN queries in an unstructured peer-to-peer (P2P) system, in which each peer is managed by an individual organization and can only communicate with its logical neighboring peers. Such queries are based on local filter query statistics, and require as less communication cost as possible which makes it more difficult than the existing distributed k-NN queries. Especially, we hope to reduce candidate peers and degrade communication cost. In this paper, we propose an efficient pruning technique to minimize the number of candidate peers to be processed to answer the k-NN queries. Our approach is especially suitable for continuous k-NN queries when updating peers, including changing ranges of peers, dynamically leaving or joining peers, and updating data in a peer. In addition, simulation results show that the proposed approach outperforms the existing Minimum Bounding Rectangle (MBR)-based query approaches, especially for continuous queries.
In order to improve the quality of web search,a new query expansion method by choosing meaningful structure data from a domain database is proposed.It categories attributes into three different classes,named as concept attribute,context attribute and meaningless attribute,according to their semantic features which are document frequency features and distinguishing capability features.It also defines the semantic relevance between two attributes when they have correlations in the database.Then it proposes trie-bitmap structure and pair pointer tables to implement efficient algorithms for discovering attribute semantic feature and detecting their semantic relevances.By using semantic attributes and their semantic relevances,expansion words can be generated and embedded into a vector space model with interpolation parameters.The experiments use an IMDB movie database and real texts collections to evaluate the proposed method by comparing its performance with a classical vector space model.The results show that the proposed method can improve text search efficiently and also improve both semantic features and semantic relevances with good separation capabilities.
In this paper we propose a Filter-based Uniform Algorithm (FbUA) for optimizing top-κ query in distributed networks, which has been a topic of much recent interest. The basic idea of FhUA is to set a filter at each node to pre vent it from sending out the data with little chance to contrib ute to the top-κ result. FbUA can gain exact answers to top-κ query through two phrases of round trip communications between query station and participant nodes. The experiment results show that FbUA reduces network bandwidth consumption dramatically.
ZHAO Zhibin YAO Lan YANG Xiaochun LI Binyang YU Ge