This research focuses on the development and generalisation of Kron's method of diakoptics applied to nonlinear systems. The nonlinear systems must be solved subject to the constraint that corrections to the partial solutions have restricted magnitude, so that the local mathematical models of the subsystems are not invalidated in the process. The system is solved as a sequence of Newton-Raphson iterations, with each iterative correction calculated as a diakoptically coordinated solution of the linearised subsystems. To make efficient use of parallel processing hardware, the algorithm has been designed to minimise the data traffic between the subsystem solvers and the coordinating task.
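The magnitude-restricted Newton-Raphson iteration described above can be sketched as follows. This is a minimal serial illustration, not the paper's coordinated parallel algorithm; the function names and the choice of an infinity-norm step cap are illustrative assumptions.

```python
import numpy as np

def capped_newton(f, jac, x0, max_step=0.5, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration in which each correction is clipped in
    magnitude, so that the local linearised model is not invalidated.
    max_step is a hypothetical tuning parameter, not from the paper."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(jac(x), -f(x))   # linearised correction
        norm = np.linalg.norm(dx, np.inf)
        if norm > max_step:                   # restrict the step length
            dx *= max_step / norm
        x = x + dx
        if np.linalg.norm(f(x), np.inf) < tol:
            break
    return x

# toy two-variable system: x0^2 = 2, x1 = x0
root = capped_newton(
    lambda x: np.array([x[0]**2 - 2.0, x[1] - x[0]]),
    lambda x: np.array([[2.0 * x[0], 0.0], [-1.0, 1.0]]),
    np.array([3.0, 3.0]))
```

In the paper's setting the linear solve would itself be decomposed diakoptically across subsystems; here a single dense solve stands in for that step.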
Consider a typical network of N nodes, with each node connected to 4 other nodes. Transferring the nonzero elements of the Jacobian matrix from Task-A to Task-B would involve sending 5xN real values together with the associated pointers, i.e. at least 11xNx4 bytes of data if a sparse storage structure such as pointer-column-index is used, or 15xNx4 bytes if it is not. Instead, in our implementation the Jacobian matrices are derived locally within Task-B from the network topological data, which requires sending only 6xNx4 bytes. Apart from the obvious reduction in data traffic, this yields the additional benefit of concurrent evaluation of the Jacobian matrices.
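The byte counts above can be checked directly. The sketch below assumes 4-byte reals and 4-byte indices and treats the pointer-column-index structure as a CSR-style layout (values, column indices, row pointers); the 6-words-per-node topology figure is taken from the text, its exact encoding being unspecified.

```python
def jacobian_traffic(n, nnz_per_row=5, word=4):
    """Bytes to ship a sparse Jacobian: 5 nonzeros per row (diagonal
    plus 4 neighbours), each with a value and a column index, plus one
    row pointer per row -> (5 + 5 + 1) * N words = 11xNx4 bytes."""
    values = nnz_per_row * n * word
    colidx = nnz_per_row * n * word
    rowptr = n * word
    return values + colidx + rowptr

def topology_traffic(n, word=4):
    """Bytes to ship the topological data instead: 6 words per node,
    as stated in the text (encoding details are an assumption here)."""
    return 6 * n * word

n = 100
ratio = jacobian_traffic(n) / topology_traffic(n)   # savings factor
```

For any N the topology transfer is smaller by a constant factor of 11/6, before counting the gain from evaluating the Jacobians concurrently in Task-B.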
The coordination process, equation (24), requires knowledge of the inverses of the Jacobians. If implemented directly, this would imply sending 3xNxNx4 bytes of data from Task-B to Task-A! This problem was noted by other researchers [2], but no effective solution was proposed at the time. Our implementation resolves it by requiring Task-B to evaluate explicitly only the selected columns of the inverse Jacobian that Task-A actually needs. The required data transfer is thereby reduced to 3xNxCx4 bytes, where C is the number of inter-subnetwork connections and is much smaller than N. This modification reduced the data traffic between Task-B and Task-A by a factor of 10 to 100 for a typical large-scale system.
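Selected columns of the inverse can be obtained without forming the full inverse, by solving against the corresponding unit vectors. The sketch below illustrates this idea with a dense solver; the function name and the choice of boundary indices are illustrative, and a production version would reuse a single factorisation of J for all C right-hand sides.

```python
import numpy as np

def selected_inverse_columns(J, boundary_idx):
    """Return only the columns of inv(J) indexed by boundary_idx,
    by solving J * X = E where E holds the matching unit vectors.
    This yields an N x C block (C << N) instead of the N x N inverse."""
    n = J.shape[0]
    E = np.zeros((n, len(boundary_idx)))
    E[boundary_idx, range(len(boundary_idx))] = 1.0
    return np.linalg.solve(J, E)

# usage: small well-conditioned test Jacobian, two boundary nodes
rng = np.random.default_rng(0)
J = rng.standard_normal((6, 6)) + 6.0 * np.eye(6)
cols = selected_inverse_columns(J, [1, 4])   # 6 x 2 block
```

Only this N x C block need cross the task boundary, which is the source of the 3xNxNx4 to 3xNxCx4 reduction quoted above.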
Testing the program on real-life networks has shown that the time required by Task-B depends quasi-linearly on the number of nodes in the subnetwork, while the time required by Task-A depends quadratically on the number of inter-subnetwork connections. Since the number of inter-subnetwork connections depends not on the network size but only on the number of subsystems and the average node connectivity, the algorithm offers the potential of significant concurrent processing gains for large-scale systems.
With the reduced volume of transferred data, the time spent on communication constituted approximately 1% of the total processing time for a 100-node system. As the volume of data transmitted between the two tasks grows linearly with the network size, the data transfer time can be expected to remain negligible for larger systems.