Evaluating a Dependable Distributed System with Multiple Critical Tasks

Yinong Chen and Zhongshi He*
Highly Dependable Systems Research Programme
University of the Witwatersrand, Johannesburg, South Africa
{yinong, zhe}@cs.wits.ac.za
http://www.cs.wits.ac.za/research/programme.html



Abstract

The aim of our research is to develop a distributed system that supports a variety of tasks. Currently, we are implementing Internet applications on the system, including firewall, web and mail applications. These applications have different levels of dependability requirements. Depending on its criticality, a single task may execute on one, two or more computer nodes. Fault-tolerant protocols are used to detect disagreement among replicas. A reconfiguration protocol uses the fault reports from the fault-tolerant protocols to identify the faulty nodes. It then isolates the faulty nodes from the system and reallocates their tasks to other working nodes. As part of the project, this work focuses on dependability analysis. The dependability attributes modelled are the reliability of the system and the risk that an unacceptable packet is accepted in the firewall application.
Keywords: reliability, risk, modelling, fault tolerance.

1. Introduction
Dependability has been defined as the property of a computer system such that reliance can justifiably be placed on the service it delivers [3]. Different software and hardware dependability techniques have been developed to produce highly dependable systems with different dependability attributes, including
• Reliability: The property of continuity regarding service delivery. Reliability is denoted by R(t), which is the probability that no failure occurs in the time period [0, t].
• Availability: The property of readiness regarding service delivery.
• Safety: The property of non-occurrence of catastrophic consequence due to a computer failure.
• Confidentiality: The property of non-occurrence of unauthorized disclosure of information.
• Integrity: The property of non-occurrence of improper alteration of information.
• Security: The property of system availability, information confidentiality and integrity.
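As a small illustration of the reliability definition above, the following sketch evaluates R(t) under the common assumption of a constant failure rate, for which R(t) = exp(-λt). The exponential model, the function name and the numeric rate are assumptions made here for illustration only; they are not taken from the models used in this work.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """Probability that no failure occurs in [0, t].

    Assumes a constant failure rate (exponential model), which is an
    illustrative assumption and not a model stated in the paper.
    """
    if failure_rate < 0 or t < 0:
        raise ValueError("failure_rate and t must be non-negative")
    return math.exp(-failure_rate * t)


if __name__ == "__main__":
    # Hypothetical node: 1e-4 failures per hour, observed over 1000 hours.
    print(round(reliability(1e-4, 1000.0), 4))
```

Under this assumption R(0) = 1 and R(t) decreases monotonically as t grows, which matches the reading of R(t) as the probability of failure-free service up to time t.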
    Traditionally, highly dependable systems have been used in safety-critical control and monitoring applications such as nuclear reactors, flight control and traffic scheduling. The recent trend of using commercial off-the-shelf components to build dependable systems has greatly encouraged the use of dependable computing techniques in cost-sensitive commercial systems. The Internet and e-commerce are good examples of such systems. Services provided via the Internet are not safety-critical, at least at this stage. They have, however, become business-critical. In South Africa, all major banks offer Internet-based services to their clients. The banks rely heavily on the correct, secure and continuous operation of their servers. Another example is the traffic cache server used by Internet Service Providers (ISPs). A large portion of South African Internet usage involves overseas accesses. A cost-effective approach for ISPs is to cache information in the traffic servers. Highly dependable cache servers are therefore extremely important for ISPs.
Supported by the South African National Research Foundation and a local ISP, we have been developing a dependable distributed system since 1996. The distributed system uses software and hardware replication to support dependable Internet applications. Reliability, availability and security of the system are the main attributes to be studied. The design of the distributed system is given in [1] and the prototype of the system will be described in [3]. Verification of core parts and applications of the system is outlined in [2]. This work extends our research to dependability modelling in a multitasking environment.
In the next section, we describe the requirements and the design objectives of the system. Section 3 briefly outlines the design and implementation of the system aimed at meeting these objectives and discusses the fault-tolerant protocols. Section 4 examines the dependability models and shows the evaluation results related to system reliability and risk. Section 5 concludes the paper.
5. Concluding Remarks
    We have been working on the experimental distributed system since 1996. The focus has been on the operating system and the fault-tolerant protocol layers. Recently we have extended our research to implementing multiple tasks on the system [2]. This work explores the relationship between the dependability requirements and the fault-tolerant protocols. Fail-safe behaviour (risk) and reliability are the dependability requirements, and the comparison and voting protocols are the fault-tolerant techniques under consideration. The modelling results show that task duplication supported by the comparison protocol is the most effective technique for fail-safe applications such as firewalls. Traditional triple modular redundancy can secure higher reliability but is not suitable for fail-safe applications. We also studied the cooperation of multiple tasks with different levels of redundancy in the distributed system.
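The trade-off between duplication with comparison and triple modular redundancy can be illustrated with a deliberately simplified sketch. It does not reproduce the models evaluated in this work. It assumes that each replica is faulty independently with probability q during a task, and that faulty replicas produce identical wrong outputs (a worst-case assumption). The function names and the value of q are hypothetical.

```python
def duplex_comparison(q: float) -> dict:
    """Two replicas; the output is released only if both agree."""
    return {
        "correct": (1 - q) ** 2,       # both replicas correct
        "safe_stop": 2 * q * (1 - q),  # exactly one faulty: mismatch detected
        "risk": q ** 2,                # both faulty and agreeing (worst case)
    }


def tmr_voting(q: float) -> dict:
    """Three replicas; the majority output is always released."""
    return {
        "correct": (1 - q) ** 3 + 3 * q * (1 - q) ** 2,  # at most one faulty
        "safe_stop": 0.0,                                # voter never withholds
        "risk": 3 * q ** 2 * (1 - q) + q ** 3,           # faulty majority
    }


if __name__ == "__main__":
    q = 0.01  # hypothetical per-task fault probability of one replica
    for name, model in (("duplex", duplex_comparison), ("TMR", tmr_voting)):
        print(name, model(q))
```

Under these assumptions the voting scheme delivers a correct result more often, while the comparison scheme has the lower probability of releasing a wrong result: for small q the risk is about q^2 for duplication against about 3q^2 for triple modular redundancy. This is consistent with the qualitative conclusion stated above, although the actual figures depend on the fault model.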
References
[1] Y. Chen, "On development of a dependable distributed system", Proc. of the 1998 IFIP International Workshop on Dependable Computing and its Applications, Johannesburg, January 1998, pp. 83-96.
[2] Y. Chen, V. Galpin, S. Hazelhurst, R. Mateer, and C. Mueller, "Modelling software development of a decentralised virtual service redirector for Internet applications", The 7th IEEE Workshop on Future Trends of Distributed Computing Systems, Cape Town, December 1999, pp. 235-241.
[3] J.-C. Laprie, "Dependable computing and fault tolerance: Concept and terminology", IEEE 15th Annual Int'l Symposium on Fault-Tolerant Computing (FTCS-15), Michigan, June 1985, pp. 1-11.
[4] R. I. Mateer and Y. Chen, "Highly-Available Firewall Service using Virtual Redirectors", Technical Report TR-Wits-CS-1999-11, August 1999. www.cs.wits.ac.za/research/pubs.html.


*Dr. He is a faculty member of Chongqing University, P.R. China, and is currently a visiting scholar at the University of the Witwatersrand.