Many semantics have been proposed for answering top-k queries on uncertain data in various applications. However, most of them spend much of their running time computing position probabilities. Our approach to supporting various top-k queries is based on sharing position probability distributions (PPDs). In this paper, we propose a PPD-tree structure and several basic operations on it to support various top-k queries. In addition, we propose an approximation method to improve the efficiency of PPD generation. We verify the effectiveness and efficiency of our approach through both theoretical analysis and experiments.
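To illustrate why one PPD can be shared across several top-k semantics, here is a minimal sketch under stated assumptions. It is not the PPD-tree or the approximation method from the abstract. Tuples are assumed independent, each with a score and an existence probability, and scores are assumed distinct. A tuple t sits at position j exactly when t exists and exactly j-1 higher-scored tuples exist, so one dynamic-programming pass in score order yields every tuple's distribution over positions 1..k. The helper names `position_probabilities`, `u_k_ranks` and `pt_k` are hypothetical. The last two answer U-kRanks and PT-k queries from the same PPD without recomputing it.

```python
from typing import Dict, List, Tuple

def position_probabilities(tuples: List[Tuple[str, float, float]], k: int) -> Dict[str, List[float]]:
    """tuples: (id, score, existence probability); assumes independent tuples, k >= 1.
    Returns ppd[id][j] = Pr(tuple is ranked at position j+1)."""
    ranked = sorted(tuples, key=lambda t: t[1], reverse=True)
    # dist[j] = Pr(exactly j of the already-processed, higher-scored tuples exist), j < k
    dist = [1.0] + [0.0] * (k - 1)
    ppd: Dict[str, List[float]] = {}
    for tid, _score, p in ranked:
        # t is at position j+1 iff t exists and exactly j higher-scored tuples exist
        ppd[tid] = [p * dist[j] for j in range(k)]
        # fold t into the count distribution; mass for counts >= k is not needed
        new = [0.0] * k
        for j in range(k):
            new[j] += dist[j] * (1 - p)
            if j + 1 < k:
                new[j + 1] += dist[j] * p
        dist = new
    return ppd

def u_k_ranks(ppd: Dict[str, List[float]], k: int) -> List[str]:
    """For each position, the tuple most likely to occupy it."""
    return [max(ppd, key=lambda t: ppd[t][j]) for j in range(k)]

def pt_k(ppd: Dict[str, List[float]], k: int, threshold: float) -> List[str]:
    """Tuples whose probability of being in the top-k reaches the threshold."""
    return [t for t, probs in ppd.items() if sum(probs[:k]) >= threshold]
```

For example, with tuples A (score 10, probability 0.5), B (score 8, 0.6) and C (score 5, 1.0) and k = 2, C's PPD is [0.2, 0.5]: it is first only if neither A nor B exists, and second if exactly one of them does. The two query semantics then read different aggregates (per-position maxima versus row sums) off the same table, which is the reuse that PPD sharing targets.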
Because patents are important, many studies have been conducted on patent analysis. However, the problem of finding competitors' hotspots has seldom been considered. Although hotspot discovery methods exist for micro-blogs and online public opinion, they are difficult to apply directly because of the particular characteristics of patent text. In this paper, we propose a text-clustering-based patent hotspot discovery method to find the hotspots of competitors. We first measure the similarity between patents using both semantic association and IPC association. We then use a hierarchical clustering algorithm to find research topics and assign names to them. Next, we calculate the hotness of technical phrases to identify hotspots. Finally, we present a case study of Huawei to show the effectiveness of the proposed method.
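The first two steps, similarity that combines text and IPC codes followed by hierarchical clustering, can be sketched as follows. The specific choices are assumptions for illustration and are not taken from the paper. Semantic association is approximated by cosine similarity over term bags, and IPC association by Jaccard similarity over IPC code sets. The two are mixed with a hypothetical weight `alpha`, and clusters are merged by average linkage until the best similarity falls below a hypothetical `threshold`. Topic naming by most frequent terms is likewise a placeholder. The hotness computation is not sketched, because the abstract does not specify it.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List

def cosine(a: List[str], b: List[str]) -> float:
    """Cosine similarity of term-frequency bags (stand-in for semantic association)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def jaccard(a: List[str], b: List[str]) -> float:
    """Overlap of IPC code sets (stand-in for IPC association)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def patent_similarity(p: Dict, q: Dict, alpha: float = 0.5) -> float:
    # hypothetical linear mix of the two associations
    return alpha * cosine(p["terms"], q["terms"]) + (1 - alpha) * jaccard(p["ipc"], q["ipc"])

def average_linkage(patents: List[Dict], threshold: float, alpha: float = 0.5) -> List[List[int]]:
    """Agglomerative clustering; stops when no pair of clusters is similar enough."""
    sim = {}
    for i, j in combinations(range(len(patents)), 2):
        sim[(i, j)] = sim[(j, i)] = patent_similarity(patents[i], patents[j], alpha)
    clusters = [[i] for i in range(len(patents))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(sim[(i, j)] for i in clusters[a] for j in clusters[b])
            s = total / (len(clusters[a]) * len(clusters[b]))
            if s > best:
                best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def name_topic(patents: List[Dict], cluster: List[int], n: int = 2) -> List[str]:
    """Placeholder topic label: the n most frequent terms in the cluster."""
    counts = Counter(w for i in cluster for w in patents[i]["terms"])
    return [w for w, _ in counts.most_common(n)]
```

With two antenna patents sharing IPC code H04B and two battery patents sharing H02J, cross-group similarity is 0, so a threshold of 0.5 yields the two expected topic clusters. Tuning `alpha` shifts how much the shared IPC class, rather than the wording, drives the grouping.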
In uncertain data management, lineages are often used to compute the probabilities of result tuples. However, most existing works focus on tuple-level lineage, which results in imprecise data derivation. Moreover, correlations among attributes cannot be captured. In this paper, for base tuples with multiple uncertain attributes, we define attribute-level annotations that annotate each attribute. Using these annotations to generate the lineages of result tuples enables more precise derivation. The same annotations can also be used to construct a dependency graph, which represents not only constraints on schemas but also correlations among attributes. By combining the dependency graph with attribute-level lineage, we can correctly compute the probabilities of result tuples and precisely derive data. Experiments comparing tuple-level and attribute-level lineage show that our method has advantages in derivation precision and storage cost.
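A minimal sketch of probability computation over attribute-level lineage follows, under assumptions that are simpler than the paper's model. Each uncertain attribute of a base tuple carries a distribution over alternative values, and each literal `(tuple, attribute, value)` plays the role of an attribute-level annotation. Alternatives of the same attribute are mutually exclusive. Different attributes are assumed independent, whereas the paper captures such correlations with a dependency graph, which this sketch omits. A result tuple's lineage is a DNF, a list of conjunctive clauses. Its probability is obtained by enumerating the possible worlds of the attributes the lineage mentions. The enumeration is exponential and only meant for small examples. The names `Attr`, `Literal` and `lineage_probability` are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

Attr = Tuple[str, str]          # (tuple id, attribute name)
Literal = Tuple[str, str, str]  # (tuple id, attribute name, value): one annotation

def lineage_probability(dists: Dict[Attr, Dict[str, float]],
                        lineage: List[FrozenSet[Literal]]) -> float:
    """Pr(lineage is true), assuming independent attributes with mutually
    exclusive alternative values; lineage is a DNF over attribute literals."""
    attrs = sorted({(t, a) for clause in lineage for (t, a, _v) in clause})
    total = 0.0
    # each world picks exactly one alternative value per mentioned attribute
    for choice in product(*(list(dists[x].items()) for x in attrs)):
        world = {x: v for x, (v, _p) in zip(attrs, choice)}
        if any(all(world[(t, a)] == v for (t, a, v) in clause) for clause in lineage):
            prob = 1.0
            for _v, p in choice:
                prob *= p
            total += prob
    return total
```

For a base tuple t1 with city in {Paris: 0.7, Rome: 0.3} and job in {eng: 0.6, doc: 0.4}, the lineage of a result selected by "city = Paris OR job = eng" is two single-literal clauses. Its probability is 1 - 0.3 × 0.4 = 0.88. A single tuple-level variable for t1 could not express which attribute the result was derived from.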