Weka provides a large collection of machine learning algorithms written in Java for data pre-processing, classification, clustering, association rules, and visualization, which can be invoked through a common graphical user interface. In Weka, the overall data mining process takes place on a single machine, since the algorithms can be executed only locally.
The goal of Weka4WS is to extend Weka to support remote execution of data mining algorithms through WSRF Web Services. In this way, distributed data mining tasks can be executed concurrently on decentralized Grid nodes, exploiting data distribution and improving application performance. In Weka4WS, the data mining algorithms for classification, clustering, and association rules can also be executed on remote Grid resources. To enable remote invocation, all the data mining algorithms provided by the Weka library are exposed as a Web Service, which can be easily deployed on the available Grid nodes. Weka4WS also extends the Weka GUI so that users can invoke the algorithms exposed as Web Services on remote Grid nodes.
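The idea of running mining tasks concurrently on several nodes can be illustrated with a minimal Java sketch. It is not the Weka4WS client code: the `MiningService` interface, the `stubFor` helper, and the node names are hypothetical, and the remote Web Service call is simulated by a local stub. The string `J48` is used only as an example algorithm label.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: none of these names come from Weka4WS or Globus Toolkit.
class RemoteMiningSketch {

    // Hypothetical stand-in for a mining Web Service hosted on one computing node.
    interface MiningService {
        String execute(String algorithm);
    }

    // Simulates a remote call; a real client would send a Web Service request here.
    static MiningService stubFor(String node) {
        return algorithm -> algorithm + " model built on " + node;
    }

    // Submits the same algorithm to every node concurrently and collects
    // the results in the order the nodes were given.
    static Map<String, String> runOnAll(List<String> nodes, String algorithm)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        try {
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (String node : nodes) {
                MiningService service = stubFor(node);
                pending.put(node, pool.submit(() -> service.execute(algorithm)));
            }
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                // Blocks until the task for this node has finished.
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> results = runOnAll(List.of("nodeA", "nodeB"), "J48");
        results.forEach((node, result) -> System.out.println(node + ": " + result));
    }
}
```

Because each task is submitted before any result is awaited, the total waiting time in this sketch is bounded by the slowest node rather than the sum over all nodes, which is the performance benefit that remote concurrent execution aims for.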
To achieve integration and interoperability with standard Grid environments, Weka4WS has been designed using the Web Services Resource Framework (WSRF) as its enabling technology. In particular, Weka4WS has been developed with the WSRF Java library provided by Globus Toolkit 4.0.x (GT4).
In the Weka4WS framework, all nodes use the GT4 services for standard Grid functionality, such as security and data management. These nodes fall into two categories:
- user nodes, which are the users' local machines running the Weka4WS client software;
- computing nodes, which provide the Weka4WS Web Services allowing the execution of remote data mining tasks.
Weka4WS is therefore distributed in two separate packages:
- Weka4WS-client, which contains the client software (including the extended Weka GUI) to be installed on the user nodes;
- Weka4WS-service, which contains the WSRF-compliant Web Services to be installed on the computing nodes.