A web service wrapping approach for command line programs, which are commonly used in scientific computing, is proposed. First, the software architecture of a basic web service wrapper implementation is given and the functions of its main components are explained. Then, after a comprehensive analysis of data transmission and a job life cycle model, a novel proactive file transmission and job management mechanism is devised to enhance the architecture. With this mechanism, command line programs are wrapped into web services that can transmit files efficiently, supply instant status feedback, and manage jobs automatically. Experiments show that the proposed approach achieves higher performance with less memory usage than related work, and that usability is also improved. This work has already been put into use in a production scientific computing system, where it has greatly improved data processing efficiency.
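The idea of a job life cycle with status feedback can be illustrated with a minimal sketch. It is not the implementation described above: the names `JobState` and `JobManager`, the four states, and the use of a background thread per job are all assumptions made for illustration, and the proactive file transmission mechanism is not modeled. A real wrapper would expose `submit` and `status` as web service operations rather than as local method calls.

```python
import subprocess
import sys
import threading
import uuid
from enum import Enum


class JobState(Enum):
    # Hypothetical life cycle states; the original model may differ.
    PENDING = "pending"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


class JobManager:
    """Hypothetical in-process job registry for wrapped command line programs."""

    def __init__(self):
        self._jobs = {}
        self._lock = threading.Lock()

    def submit(self, argv):
        """Start the command in the background and return a job id at once."""
        job_id = uuid.uuid4().hex
        worker = threading.Thread(target=self._run, args=(job_id, argv), daemon=True)
        with self._lock:
            self._jobs[job_id] = {"state": JobState.PENDING, "stdout": "", "thread": worker}
        worker.start()
        return job_id

    def _run(self, job_id, argv):
        self._update(job_id, state=JobState.RUNNING)
        try:
            proc = subprocess.run(argv, capture_output=True, text=True)
        except OSError:
            # The program could not be started at all.
            self._update(job_id, state=JobState.FAILED)
            return
        state = JobState.FINISHED if proc.returncode == 0 else JobState.FAILED
        self._update(job_id, state=state, stdout=proc.stdout)

    def _update(self, job_id, **fields):
        with self._lock:
            self._jobs[job_id].update(fields)

    def status(self, job_id):
        """Return the current state without blocking on the job."""
        with self._lock:
            return self._jobs[job_id]["state"]

    def wait(self, job_id):
        """Block until the job ends, then return a snapshot of its record."""
        with self._lock:
            worker = self._jobs[job_id]["thread"]
        worker.join()
        with self._lock:
            return dict(self._jobs[job_id])


if __name__ == "__main__":
    manager = JobManager()
    job = manager.submit([sys.executable, "-c", "print('done')"])
    print(manager.status(job))          # may be PENDING, RUNNING, or FINISHED
    print(manager.wait(job)["state"])
```

Because `submit` returns immediately, a client can poll `status` for feedback while the program runs, which is the behavior the abstract refers to as instant status feedback.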
To extract structured data from a web page according to customized requirements, a user labels some document object model (DOM) elements on the page with attribute names. The common features of the labeled elements are used both to guide the user through the labeling process, minimizing user effort, and to retrieve attribute values. To turn the attribute values into a structured result, the attribute pattern needs to be induced. For this purpose, a space-optimized suffix tree called an attribute tree is built to transform the DOM tree into a simpler form while preserving useful properties such as attribute sequence order. The pattern is induced bottom-up on the attribute tree and is then used to build the structured result. Experiments show that the approach performs well in terms of precision, recall, and structural correctness.
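The step of retrieving attribute values from the common features of labeled elements can be sketched roughly as follows. This is an illustration under strong assumptions, not the method described above: the "common feature" is taken to be the root-to-element path of (tag, class) pairs, the sample page and the function names `signatures` and `extract` are hypothetical, and neither the attribute tree nor the pattern induction is attempted.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page used only for this sketch.
PAGE = """
<html><body>
  <div class="item"><span class="name">Pen</span><span class="price">1.20</span></div>
  <div class="item"><span class="name">Ink</span><span class="price">4.50</span></div>
  <div class="ad"><span class="name">Sponsored</span></div>
</body></html>
"""


def signatures(root):
    """Yield (element, signature) pairs in document order.

    The signature is the tuple of (tag, class) pairs from the root down to
    the element; it stands in for the "common features" of labeled elements.
    """
    def walk(elem, prefix):
        sig = prefix + ((elem.tag, elem.get("class")),)
        yield elem, sig
        for child in elem:
            yield from walk(child, sig)

    yield from walk(root, ())


def extract(root, labeled_texts):
    """Generalize one labeled example per attribute to all matching elements.

    labeled_texts maps an attribute name to the text of one element the user
    labeled with that name.
    """
    pairs = list(signatures(root))
    result = {}
    for attr, sample in labeled_texts.items():
        # Signature of the first element whose text equals the labeled sample.
        target = next(sig for elem, sig in pairs if elem.text == sample)
        result[attr] = [elem.text for elem, sig in pairs if sig == target]
    return result


root = ET.fromstring(PAGE.strip())
values = extract(root, {"name": "Pen", "price": "1.20"})
print(values)
```

Labeling a single `name` and a single `price` element is enough here to collect both items while skipping the advertisement, whose `span` sits under a `div` with a different class. Grouping these flat value lists into records is the part that would require the induced attribute pattern.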