XDSM - X11-based Distributed Shared Memory System
High performance decision support systems using parallel
processing require distributed processing to overcome the
limitations imposed by single processor systems.
However, when moving from local to distributed
programming, the conventional method of intertask communication by
message passing poses major problems. Message passing forces the
programmer to deal with multiple message sources, destinations,
transmission protocols and formats,
which can become quite complex in
rapidly evolving distributed systems, especially
if there is no software layer translating the programmer's
communication requests into lower-level communication
requests. Shared memory offers a general way of providing
transparent communication links (figure 1).
Shared memory gives the programmer a shared address space
linking separate processes, which decouples coding
requirements from the complexity of the data transfer (figure 2).
Figure 1. The Basic Concept of Shared Memory.
Figure 2. The Implementation Reality of Shared Memory.
Figure 3. The Actual Implementation of Shared Memory.
Distributed shared memory (DSM) extends the shared memory model to
separate CPU systems, where true shared memory cannot be
supported (figure 3). However, the inherent problems of loosely
coupled systems mean that few DSM implementations have been widely accepted.
Figure 4. The Distributed Heterogeneous Architecture Support Harness
We have developed a distributed shared memory system
designed for networks of computers with differing architectures (figure 4).
The system was developed for industrial process control
applications requiring extensive computational power, but
involving only moderate interprocess communication. In
particular, it has been used to support the implementation of a
large software suite for real-time water network monitoring and
control. The system is based on X11
Windows property events, since this offers the advantages of both
portability and an integrated graphics environment for the
development of graphical user interfaces (figure 5). This is reflected in
its name: the X11-based Distributed Shared Memory system - XDSM.
Figure 5. How X11 Windows can be used to support distributed
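The idea of using window properties as shared data can be illustrated with a small simulation. This is only a sketch under stated assumptions, not XDSM's code: the PropertyServer and Client classes, the window id and the property name are hypothetical stand-ins for an X server and its clients. In a real Xlib program the corresponding calls would be XChangeProperty, XGetWindowProperty, and XSelectInput with PropertyChangeMask, which causes PropertyNotify events to be delivered.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class PropertyServer:
    """Hypothetical in-process stand-in for an X server's window properties."""

    def __init__(self) -> None:
        self._props: Dict[Tuple[int, str], bytes] = {}
        self._listeners: Dict[int, List[Callable[[int, str], None]]] = defaultdict(list)

    def select_property_events(self, window: int,
                               callback: Callable[[int, str], None]) -> None:
        """Register interest in property changes on a window."""
        self._listeners[window].append(callback)

    def change_property(self, window: int, name: str, data: bytes) -> None:
        """Store the data on the window, then notify every registered client."""
        self._props[(window, name)] = bytes(data)
        for callback in list(self._listeners[window]):
            callback(window, name)

    def get_property(self, window: int, name: str) -> bytes:
        """Return the data currently attached to the window under this name."""
        return self._props[(window, name)]


class Client:
    """A task that refreshes its local copy whenever it is notified of a change."""

    def __init__(self, server: PropertyServer, window: int) -> None:
        self.server = server
        self.local: Dict[str, bytes] = {}
        server.select_property_events(window, self.on_property_notify)

    def on_property_notify(self, window: int, name: str) -> None:
        self.local[name] = self.server.get_property(window, name)


def demo():
    server = PropertyServer()
    window = 1  # hypothetical window id
    reader_a = Client(server, window)
    reader_b = Client(server, window)
    # One writer updates the property; both readers see the new value.
    server.change_property(window, "XDSM_DEMO_AREA", b"pressure=4.2")
    return reader_a.local, reader_b.local
```

Calling demo() returns the two clients' local copies, which both hold the value written once to the server. The sketch runs in a single process; in the system described here the server and clients would sit on different machines.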
XDSM is built from library modules which
are used for creating graphical user interfaces (GUIs), interfacing
to user language code, and GUI message generation.
At the center of an X11 client there is an infinite loop checking
the input event queue for incoming information (figure 6). X11 I/O events
are extracted and processed by XDSM library code according to
whether they are intended for the X11 GUI toolkit or are
control area X11 property change events required by the control
module. Reception of relevant property change events causes the
configuration and control module to instruct the data access
module to copy XDSM control data to a local copy of the control
Figure 6. X11 and XDSM main event processing
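The dispatch step described above can be sketched as a loop that routes each event either to the GUI toolkit or to the control module. This is a minimal illustration, not the XDSM library: the Event, DataAccess and ControlModule classes and the CONTROL_WINDOW name are hypothetical. Two simplifications are assumed: the loop stops when the queue is empty instead of running forever, and each event carries its data directly, whereas a real PropertyNotify event only signals a change and the data is read separately.

```python
from collections import deque
from dataclasses import dataclass

CONTROL_WINDOW = "xdsm-control"  # hypothetical name for the control area


@dataclass
class Event:
    kind: str              # e.g. "PropertyNotify", "ButtonPress", "Expose"
    window: str            # window the event refers to
    payload: bytes = b""   # assumption: data travels with the event


class DataAccess:
    """Holds the local copy of the control data."""

    def __init__(self):
        self.local_control = b""

    def copy_control_data(self, data):
        self.local_control = bytes(data)


class ControlModule:
    """Reacts to control area changes by instructing the data access module."""

    def __init__(self, data_access):
        self.data_access = data_access

    def handle(self, event):
        self.data_access.copy_control_data(event.payload)


def run_event_loop(queue, control, gui_handler):
    """Drain the queue, routing each event to the control module or the GUI."""
    while queue:
        event = queue.popleft()
        if event.kind == "PropertyNotify" and event.window == CONTROL_WINDOW:
            control.handle(event)
        else:
            gui_handler(event)


def demo():
    data_access = DataAccess()
    control = ControlModule(data_access)
    gui_seen = []
    queue = deque([
        Event("Expose", "main"),
        Event("PropertyNotify", CONTROL_WINDOW, b"start-task"),
        Event("ButtonPress", "main"),
    ])
    run_event_loop(queue, control, gui_seen.append)
    return data_access.local_control, [e.kind for e in gui_seen]
```

In demo(), the property change on the control area updates the local control copy, while the Expose and ButtonPress events are passed through to the GUI handler in their original order.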
Using the XDSM, the following functionality can be provided for
an application suite:
- Definition of shared data areas - with user monitoring and
control of the shared data provided by an interactive
GUI.
- Shared or exclusive access to global data.
- Task coordination and control, including automatic task
start-up and local task control.
- Distributed error handling and recovery - achieved by
maintaining shadow copies of the shared data, and by error
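Two of the capabilities listed above, shared or exclusive access and shadow copies for recovery, can be illustrated together. The sketch below is an assumption-laden, single-machine analogue using threads: the SharedArea class and its method names are hypothetical, and XDSM itself coordinates access between separate machines with a communications protocol that is not reproduced here.

```python
import threading


class SharedArea:
    """Hypothetical shared data area with shared/exclusive access and a shadow copy."""

    def __init__(self, data: bytes):
        self._data = bytearray(data)
        self._shadow = bytes(data)          # last committed (known-good) contents
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        """Many readers may hold shared access, but not while a writer is active."""
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_exclusive(self):
        """A writer waits until there are no readers and no other writer."""
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self, commit=True):
        """Commit the new contents to the shadow copy, or roll back to it."""
        with self._cond:
            if commit:
                self._shadow = bytes(self._data)
            else:
                self._data[:] = self._shadow
            self._writer = False
            self._cond.notify_all()

    def read(self) -> bytes:
        return bytes(self._data)

    def write(self, data: bytes) -> None:
        self._data[:] = data


def demo():
    area = SharedArea(b"valve=open")
    area.acquire_exclusive()
    area.write(b"valve=????")
    area.release_exclusive(commit=False)    # simulated failure: roll back
    rolled_back = area.read()
    area.acquire_exclusive()
    area.write(b"valve=shut")
    area.release_exclusive()                # normal completion: commit
    return rolled_back, area.read()
```

The first update in demo() is abandoned and the area reverts to its shadow copy; the second is committed and becomes the new shadow copy.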
Early work used a network of 4 Sun SC Sparcstations running the
simulation, telemetry, estimation, and operator interface
modules respectively. The cycle time achieved for a 65 node
network was approximately 10 seconds. This is at least an order
of magnitude better than would be expected in real life.
Projecting the results for larger networks, it was expected that
the communications overhead would, at worst, increase linearly
with network size, so the communication to computation time
ratio would actually decrease.
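The projection above holds only if computation time grows faster than linearly with network size. The text does not state the growth rate of the computation, so the exponent of 2 below is an assumption chosen purely for illustration, as are the function name and the unit coefficients.

```python
def comm_to_comp_ratio(nodes, comm_per_node=1.0, comp_coeff=1.0, comp_exponent=2.0):
    """Ratio of communication time to computation time for a given network size.

    Assumptions (not from the source): communication grows linearly with the
    number of nodes, and computation grows as nodes ** comp_exponent.
    """
    communication = comm_per_node * nodes
    computation = comp_coeff * nodes ** comp_exponent
    return communication / computation


def demo():
    # Ratios for the 65 node network and two hypothetical larger networks.
    return [comm_to_comp_ratio(n) for n in (65, 130, 260)]
```

With any exponent greater than 1 the ratio falls as the network grows; with an exponent of exactly 1 it would stay constant.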
Detailed performance tests were done using SLC Sparcstations
with 16 MB RAM, each connected by a 10 Mbit/s rated Ethernet. The
tests used from 1 to 5 clients, each on a separate Sparcstation
using asynchronous TCP/IP communication sockets. Testing
consisted of clients reading 1 megabyte of data from, or writing it to,
a central server using various sizes of shared memory block. The
timings from clients using TCP/IP were compared with those from clients
using XDSM. It was found that the X11/XDSM communication system
compares favorably with the simpler socket-based communication
system with respect to timings, with an optimum block size of 32 kilobytes.
However, very small shared memory sizes (0 to
8 kilobytes) should be avoided. Indeed, other researchers' work using small
block sizes (0 to 2 kilobytes) suggests that 1 kilobyte
represents a performance minimum. We consider that
the user and TCP level transmission overheads associated with
small datagrams may outweigh the lower IP level fragmentation
overheads of larger datagrams. The communications protocol
required to support mutual exclusion imposes a significant
overhead, depending on the nature of the algorithm used.
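The block-size tests described above can be approximated with a small timing harness. This is a sketch under stated assumptions rather than the original benchmark: it moves 1 megabyte through a local socket pair inside one process, so it exercises neither a network nor XDSM, and its absolute timings are not comparable with the reported results. The function names and the three block sizes are illustrative choices.

```python
import socket
import threading
import time

TOTAL_BYTES = 1024 * 1024  # 1 megabyte, as in the tests described above


def _drain(sock, expected, result):
    """Receive until `expected` bytes have arrived; record the count."""
    received = 0
    while received < expected:
        chunk = sock.recv(65536)
        if not chunk:
            break
        received += len(chunk)
    result.append(received)


def time_transfer(block_size, total=TOTAL_BYTES):
    """Send `total` bytes in writes of `block_size`; return (seconds, bytes received)."""
    sender, receiver = socket.socketpair()
    result = []
    reader = threading.Thread(target=_drain, args=(receiver, total, result))
    reader.start()
    block = b"\0" * block_size
    start = time.perf_counter()
    sent = 0
    while sent < total:
        n = min(block_size, total - sent)
        sender.sendall(block[:n])
        sent += n
    reader.join()
    elapsed = time.perf_counter() - start
    sender.close()
    receiver.close()
    return elapsed, result[0]


def demo(sizes=(1024, 8192, 32768)):
    """Time the same 1 megabyte transfer for several block sizes."""
    return {size: time_transfer(size) for size in sizes}
```

Each entry returned by demo() pairs a block size with the elapsed time and the number of bytes received. Smaller blocks need more send calls for the same volume of data, which is one of the per-message overheads the text refers to.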
Within the limited network size range studied, the results suggest
that the communications overhead increased linearly
with network size. It was also observed that distributed
communications are considerably faster than local communications
(unpublished results). This may be attributed to local
bottlenecking due to competing access demands in the underlying
TCP/IP communication system.
XDSM has been used to successfully facilitate the distributed
implementation of a decision support system for the monitoring
and control of water distribution networks. The suite comprises
network and telemetry simulators, state estimators and an
operator interface, configured to provide a classical feedback
control loop. Other modules, which are scheduled for future
incorporation into the suite, are concerned with optimal control
and telemetry confidence limit analysis.
- Argile A., Peytchev E., Bargiela A., Kosonen I., DIME: A Shared Memory Environment for Distributed Simulation, Monitoring and Control of Urban Traffic, Proceedings of European Simulation Symposium ESS'96, Genoa, 1996, pp. 152-156.
- Argile A., Bargiela A., XDSM - an X11 based virtual distributed shared memory system, Proc. of 2nd Int. Conf. on Software for Supercomputers and Multiprocessors, SMS-94, Moscow, 1994, pp. 250-259.
- Argile A., Bargiela A., Using the X11 Windows System to Support the Bulk Synchronous Model of Parallel Computation, Proc. IFIP/IEEE Int. Conf. on Distributed Platforms, Dresden, 1996, ISBN 3 86012 023 9, pp. 308-313.
- Argile A., Bargiela A., Using X11 Windows to provide shared task memory in distributed computer systems, in Integrated Computer Applications, Ed. Coulbeck B., Vol. 1, J Wiley, 1993, ISBN 0 471 94232 4, pp. 305-316.
- Bargiela A., Parallel and distributed telemetry data processing, Proceedings of Parallel Computing and Transputers Conference, PCAT’93, Brisbane, 1993, ISBN 90 5199 1495, pp. 269-275.
- Osman T., Bargiela A., Error detection for reliable distributed simulations, Proc. of 7th European Simulation Symposium ESS’95, Erlangen-Nuremberg, October 1995, ISBN 1-56555-083-8, pp. 358-362.
- Osman T., Bargiela A., FADI: A fault tolerant environment for open distributed computing, IEE Proceedings Software, vol. 147, no. 3, June 2000, pp. 91-99, doi:10.1049/ip-sen:20000702.
- Osman T., Bargiela A., Reliable Distributed Computing for Decision Support Systems, Measurement and Control, Vol. 32, No. 4, May 1999, pp. 115-118.
- Wagealla W., Osman T., Bargiela A., Error detection algorithm for agent-based distributed applications, Proc. Agent-Based Simulation Workshop 2001, Passau, Germany, May 2001.
- Osman T., Bargiela A., Process Checkpointing in an Open Distributed Environment, Proc. of European Simulation Multiconference, ESM'97, Istanbul, June 1997, ISBN 1-565555-115 X, pp. 536-540.
Last update: 7/11/97