Testing for Software Reliability Yinong Chen Programme for Highly Dependable Systems (PHDS) University of the Witwatersrand yinong@cs.wits.ac.za This article introduces different software testing techniques that have been used to evaluate of software reliability. Testing is the process of executing a program with the intention of finding design errors in a given environment. Testing can only prove the incorrectness of software but not its correctness. Some people have argued that testing is useless because what we want is the correctness of software instead of its incorrectness. This idea has led to the research of techniques for developing correct software and proving software correctness. It has been defined that a program is correct, if it meets its specification for all valid inputs. The reliability of a program is 1 if it is correct and 0 if it is incorrect, taken into account that a correct program will not be worn out. Unfortunately, progress in this direction is far from the stage where techniques for developing error-free software or proving software correctness can be used in practice to handle today's large software systems. Obviously, a more strict development process can reduce the possibility of design errors, but it can never guarantee 100% error-free of software, as long as software development is engineered by humans. Since none of today's large software systems are error-free, it doesn't make sense to define software reliability according to the correctness. A more practical approach is to define software reliability as the probability that no failure (an incorrect output) occurs in a specified environment, during a specified exposure period. For some programs the appropriate time unit of exposure period is the calendar or CPU time, and for other programs the appropriate time unit of exposure period is an operational run corresponding with the selection of an input from the input domain of the program. This definition implies that software reliability is evaluated during the execution of a program in a specific environment either during testing or during operation. Different kinds of reliability models are available to evaluate software reliability. According to the nature of failure process, Goel [1] classified reliability models into four kinds: input domain-based models, times-between-failures models, failure-count models and fault seeding models. Following sections discuss how testing techniques can be used in software reliability models. Random Testing The first software reliability model is input-domain based with random selection of inputs, i.e., it randomly generates inputs for testing. Assume that n inputs have been used for testing in which f tests have detected failures. The reliability of the program under testing in an operational run is evaluated by: The reliability in t operational runs is then: ...formula... The problem with random testing is that we don't know whether all parts of the program have been tested. If the random generation of inputs is uneven for some reason, we may end up testing only a small part of the program, no matter how many tests have been executed. Partition Testing Partition testing tries to address the problem of random testing by dividing the input domain into m classes. The critical part of partition testing is to divide the input domain in such a way that inputs in different classes would execute different parts of the program. For example, we can put all inputs that will execute the same path in the program flowchart into one class. Suppose ni inputs are generated from class i, where ni „ 1, and fi failures are detected when these inputs are applied for testing, then the reliability of the program under testing is evaluated by: ...formula... and ...formula... Normally, partition testing is conducted with rounds of testing. In each testing round, m tests are executed with one input selected from each class. Input selection within the class is still random. After k rounds of testing is conducted, the reliability is ...formula... and ...formula... where gj is the number of failures detected in round j. Let's consider the two extreme cases of partition testing. Case 1: If the number of classes is equal to 1, partition testing is in fact random testing. Case 2: If the number of classes is equal to the number of inputs, i.e., each input forms a class, partition testing is equivalent to exhaustive testing that uses all possible inputs for testing. Testing and Operational Profiles The most generic way of generating inputs for testing is to use a probability density function on the input domain to guide the input generation. In other words, each input in the input domain is associated to a probability with which the input is selected for testing. The sum of probability values associated with all possible inputs is equal to 1, therefore the probability distribution forms a probability density function. This function is called testing profile. Similarly, during the application (operation) of a program, there is a probability that associates with each input. The probability distribution forms a density function which is called the operational profile of a program. The design of a testing profile should be guided by the operational profile of the program under testing to ensure that the heavily used operations receive the most testing. Musa [2] described a step by step procedure to develop a practical operational profile for testing purpose. In some cases it is not easy to obtain or to follow the actual operational profile. In some cases, operative usage of software can be unpredictable, unknown, or different for different users. Even if a testing profile is given, testing may not be able to follow the given profile due to the non-determinism of input generation to, say, cover a certain program path. This problem has been addressed by some researchers through theoretical and experimental approaches to sensitivity study of operational profile errors and how to correct the errors [3, 4]. Reliability Growth with Error Corrections Techniques discussed above don't include the effect of error corrections performed during testing. A program is considered as a new program and has to be re-tested and re-evaluated after each error correction. The techniques that take error correction history into account are called reliability growth models. The idea of these models is to measure the change of time intervals between failures when error corrections are performed. These kinds of models are much more complex than the models that don't deal with error corrections. Current reliability growth models cannot accurately predicate the reliability of software in its operation. Researchers and testing practitioners worldwide are working together to find an ultimate solution. Reliability is an important aspect of software quality. A more comprehensive merit is called dependability, that includes reliability, availability, safety and security. The Programme for Highly Dependable Systems is a research group working on software and system dependability. We investigate all aspects that can improve the dependability of computer systems. We are working on formal methods for software specification and verification, software testing and dependability evaluation. Our current projects are related to building reliable and secure network software, including task allocation, traffic redirection and firewall systems. References [1] Goel85 A. L. Goel, Software reliability models: assumptions, limitations, and applicability, IEEE, Trans. Soft. Eng., SE-11, No. 12, Dec. 1985, pp. 1411 - 1423. [2] J.D. Musa, Operational Profiles in Software-Reliability Engineering, IEEE Software, Vol.10, No.2, March 1993. [3] Y. Chen, Modelling Software Operational Reliability via Input Domain-Based Reliability Growth Model, IEEE 28th Annual International Symposium on Fault-Tolerant Computing, Munich, June 1998, pp.314 - 323. [4] A.N. Crespo, P. Matrella, A. Pasquini, Sensitivity of reliability growth models to operational profile error, in Proc. 7th International Symposium on Software Reliability Engineering, IEEE Computer Society Press, October 1996, pp. 35 - 44.