\title{A Method for Evaluating the Quality of String Dissimilarity Measures and Clustering Algorithms for EST Clustering}
\author{Judith Zimmermann\\
  Research Group `Algorithms, Data Structures, and Applications'\\
  Institute of Theoretical Computer Science,\\
  ETH Zurich, CH-8092 Zurich\\
  \ema{judithz@inf.ethz.ch}
\and
  Zsuzsanna Lipt\'ak\\
  Universit\"at Bielefeld\\
  Technische Fakult\"at\\
  AG Genominformatik\\
  33594 Bielefeld, Germany.\\
  \ema{zsuzsa@CeBiTec.Uni-Bielefeld.DE}
\and
  Scott Hazelhurst\\
  School of Computer Science\\
  University of the Witwatersrand\\
  Johannesburg\\
  South Africa\\
  \ema{scott@cs.wits.ac.za}
}

\begin{abstract}
We present a method for evaluating the suitability of different string
dissimilarity measures and clustering algorithms for EST clustering, one of
the main techniques used in transcriptome projects. The method comprises
generating simulated ESTs with user-specified parameters, and then evaluating
the quality of clusterings produced when different dissimilarity measures and
different clustering algorithms are used. We implemented two tools to do this:
ESTSim ({\em EST Simulator}), which generates simulated EST sequences from
mRNAs/cDNAs using user-specified parameters, and ECLEST ({\em Evaluator for
CLusterings of ESTs}), which computes and evaluates a clustering of a set of
input ESTs, where the dissimilarity measure, the clustering algorithm, and the
clustering validity index can be specified independently. We demonstrate the
method on a sample of 699 cDNAs, generating approximately 16,000 simulated
ESTs. We conducted two experiments and derived statistically significant
results from this study comparing subword-based dissimilarity measures to
alignment-based ones.
\end{abstract}