
IMPROVING PROFESSIONAL PRACTICE FOR MOBILE OPERATORS THROUGH THE DEVELOPMENT OF A DECISION SUPPORT TOOL FOR OUTSOURCING

### Decision Tree for Prognostic Classification of Multivariate Survival Data and Competing Risks

### 1. Introduction

A decision tree (DT) is one way to represent the rules underlying data, and it is among the most popular tools for exploring complex data structures. It has also become one of the most flexible, intuitive and powerful data analytic tools for determining distinct prognostic subgroups with similar outcomes within each subgroup but different outcomes between subgroups (i.e., prognostic grouping of patients). DTs are hierarchical, sequential classification structures that recursively partition the set of observations. Prognostic groups are important in assessing disease heterogeneity and for the design and stratification of future clinical trials. Because patterns of medical treatment are changing so rapidly, it is important that the results of the present analysis be applicable to contemporary patients.

Due to their mathematical simplicity, linear regression for continuous data, logistic regression for binary data, proportional hazard regression for censored survival data, marginal and frailty regression for multivariate survival data, and proportional subdistribution hazard regression for competing risks data are among the most commonly used statistical methods. These parametric and semiparametric regression methods, however, may not lead to faithful data descriptions when the underlying assumptions are not satisfied. Sometimes, model interpretation can be problematic in the presence of high-order interactions among predictors.

DT has evolved to relax or remove these restrictive assumptions. In many cases, DT is used to explore data structures and to derive parsimonious models. DT is preferred over traditional regression analysis here for several reasons. Discovering interactions is difficult with traditional regression, because the interactions must be specified a priori. In contrast, DT automatically detects important interactions. Furthermore, unlike traditional regression analysis, DT is useful in uncovering variables that may be largely operative within a specific patient subgroup but may have minimal or no effect in other patient subgroups. DT also provides a superior means for prognostic classification. Rather than fitting a model to the data, DT sequentially divides the patient group into two subgroups based on prognostic factor values (e.g., tumor size < 2 cm vs tumor size ≥ 2 cm). The repeated partitioning creates “bins” of observations that are approximately homogeneous. This permits the use of summary functions (e.g., the Kaplan-Meier estimate or the cumulative incidence function (CIF)) to compare prognosis between the “bins.” The combination of binning and the interpretability of the resulting tree structure makes DT extremely well suited for developing prognostic stratifications.

The landmark work on DT in the statistical community is the Classification and Regression Trees (CART) methodology of Breiman et al. (1984). A different approach was C4.5, proposed by Quinlan (1992). The original DT methods were used for classification and regression with categorical and continuous response variables, respectively. In a clinical setting, however, the outcome of primary interest is often duration of survival, time to event, or some other incomplete (that is, censored) outcome. Therefore, several authors have developed extensions of the original DT methods for censored survival data (Banerjee & Noone, 2008).

In science and technology, interest often lies in studying processes that generate events repeatedly over time. Such processes are referred to as recurrent event processes, and the data they provide are called recurrent event data, which are one form of multivariate survival data. Such data arise frequently in medical studies, where information is often available on many individuals, each of whom may experience transient clinical events repeatedly over a period of observation. Examples include the occurrence of asthma attacks in respirology trials, epileptic seizures in neurology studies, and fractures in osteoporosis studies. In business, examples include the filing of warranty claims on automobiles, or insurance claims for policy holders. Since multivariate survival times frequently arise when individuals under observation are naturally clustered or when each individual might experience multiple events, further extensions of DT have been developed for this kind of data.

In some studies, patients may be simultaneously exposed to several events, each competing for their mortality or morbidity. For example, suppose that a group of patients diagnosed with heart disease is followed in order to observe a myocardial infarction (MI). If by the end of the study each patient was either observed to have MI or was alive and well, then the usual survival techniques can be applied. In real life, however, some patients may die from other causes before experiencing an MI. This is a competing risks situation because death from other causes prohibits the occurrence of MI. MI is considered the event of interest, while death from other causes is considered a competing risk. The patients who died of other causes cannot be considered censored, since their observations are not incomplete.

DT can also be extended to competing risks survival time data. These extensions allow the technique to be applied to clinical trial data to aid in the development of prognostic classifications for chronic diseases.

This chapter covers DT for multivariate and competing risks survival time data as well as their application in the development of medical prognosis. The two kinds of multivariate survival time regression model, i.e. the marginal and frailty regression models, each have their own DT extension. The extension of DT for competing risks, in turn, has two types of tree. First, the “single event” DT is developed based on a splitting function that uses one event only. Second, the “composite events” tree uses all the events jointly.

### 2. Decision Tree

A DT is a tree-like structure used for classification, decision theory, clustering, and prediction functions. It depicts rules for dividing data into groups based on the regularities in the data. A DT can be used for categorical and continuous response variables. When the response variables are continuous, the DT is often referred to as a regression tree. If the response variables are categorical, it is called a classification tree. However, the same concepts apply to both types of trees. DTs are widely used in computer science for data structures, in medical sciences for diagnosis, in botany for classification, in psychology for decision theory, and in economic analysis for evaluating investment alternatives.

DTs learn from data and generate models containing explicit rule-like relationships among the variables. DT algorithms begin with the entire set of data, split the data into two or more subsets by testing the value of a predictor variable, and then repeatedly split each subset into finer subsets until the split size reaches an appropriate level. The entire modeling process can be illustrated in a tree-like structure.

A DT model consists of two parts: creating the tree and applying the tree to the data. To achieve this, DTs use several different algorithms. The most popular algorithm in the statistical community is Classification and Regression Trees (CART) (Breiman et al., 1984). This algorithm helped DTs gain credibility and acceptance in the statistics community. It creates binary splits on nominal or interval predictor variables for a nominal, ordinal, or interval response. The algorithms most widely used by computer scientists are ID3, C4.5, and C5.0 (Quinlan, 1993). The first versions of C4.5 and C5.0 were limited to categorical predictors; however, the most recent versions are similar to CART. Other algorithms include Chi-Square Automatic Interaction Detection (CHAID) for categorical response (Kass, 1980), CLS, AID, TREEDISC, Angoss KnowledgeSEEKER, CRUISE, GUIDE and QUEST (Loh, 2008). These algorithms use different approaches for splitting variables. CART, CRUISE, GUIDE and QUEST use the statistical approach, while CLS, ID3, and C4.5 use an approach in which the number of branches off an internal node is equal to the number of possible categories. Another common approach, used by AID, CHAID, and TREEDISC, is the one in which the number of branches off an internal node varies from two to the maximum number of possible categories. Angoss KnowledgeSEEKER uses a combination of these approaches. Each algorithm employs different mathematical processes to determine how to group and rank variables.

Let us illustrate the DT method with a simplified example of credit evaluation. Suppose a credit card issuer wants to develop a model that can be used for evaluating potential candidates based on its historical customer data. The company’s main concern is the default of payment by a cardholder. Therefore, the model should be able to help the company classify a candidate as a possible defaulter or not. The database may contain millions of records and hundreds of fields. A fragment of such a database is shown in Table 1. The input variables include income, age, education, occupation, and many others, determined by some quantitative or qualitative methods. The model building process is illustrated in the tree structure in Figure 1.

The DT algorithm first selects a variable, income, to split the dataset into two subsets. This variable, and also the splitting value of $31,000, is selected by a splitting criterion of the algorithm. There exist many splitting criteria (Mingers, 1989). The basic principle of these criteria is that they all attempt to divide the data into clusters such that variations within each cluster are minimized and variations between the clusters are maximized.

| Name | Age | Income | Education | Occupation | … | Default |
|---|---|---|---|---|---|---|
| Andrew | 42 | 45600 | College | Manager | … | No |
| Allison | 26 | 29000 | High School | Self Owned | … | Yes |
| Sabrina | 58 | 36800 | High School | Clerk | … | No |
| Andy | 35 | 37300 | College | Engineer | … | No |
| … | … | … | … | … | … | … |

Table 1. Partial records and fields of a database table for credit evaluation

The follow-up splits are similar to the first one. The process continues until an appropriate tree size is reached. Figure 1 shows a segment of the DT. Based on this tree model, a candidate with an income of at least $31,000 and at least a college degree is unlikely to default on payment, but a self-employed candidate whose income is less than $31,000 and whose age is less than 28 is more likely to default.
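The two branches described above can be written out as explicit rules. The following is a minimal sketch rather than the issuer's actual model: the function name `classify_candidate`, the constant names, and the label sets are hypothetical, and only the two paths that the text spells out are encoded. Every other branch returns `None` because the text does not describe it.

```python
from typing import Optional

INCOME_SPLIT = 31_000             # splitting value quoted in the text
AGE_SPLIT = 28                    # age threshold quoted in the text
COLLEGE_OR_HIGHER = {"College"}   # assumption: extend with higher degrees as needed
SELF_EMPLOYED = {"Self Owned"}    # occupation label as it appears in Table 1


def classify_candidate(income: float, age: int, education: str,
                       occupation: str) -> Optional[str]:
    """Follow the two branches of the credit tree that the text describes.

    Returns None for any branch the text leaves unspecified.
    """
    if income >= INCOME_SPLIT:
        if education in COLLEGE_OR_HIGHER:
            return "unlikely to default"
        return None
    if occupation in SELF_EMPLOYED and age < AGE_SPLIT:
        return "likely to default"
    return None


# Records taken from Table 1.
print(classify_candidate(45600, 42, "College", "Manager"))         # Andrew
print(classify_candidate(29000, 26, "High School", "Self Owned"))  # Allison
print(classify_candidate(36800, 58, "High School", "Clerk"))       # Sabrina
```

Each root-to-leaf path of a DT corresponds to one such conjunction of conditions, which is what makes the fitted model easy to read as a set of rules.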

We begin with a discussion of the general structure of a DT algorithm popular in the statistical community, the CART model. A CART model describes the conditional distribution of $y$ given $X$, where $y$ is the response variable and $X$ is a set of predictor variables ($X = (X_1, X_2, \ldots, X_p)$). This model has two main components: a tree $T$ with $b$ terminal nodes, and a parameter $\Theta = (\theta_1, \theta_2, \ldots, \theta_b) \subset \mathbb{R}^k$ which associates the parameter value $\theta_m$ with the $m$th terminal node. Thus a tree model is fully specified by the pair $(T, \Theta)$. If $X$ lies in the region corresponding to the $m$th terminal node then $y \mid X$ has the distribution $f(y \mid \theta_m)$, where we use $f$ to represent a conditional distribution indexed by $\theta_m$. The model is called a regression tree or a classification tree according to whether the response $y$ is quantitative or qualitative, respectively.

### 2.1 Splitting a tree

The DT $T$ subdivides the predictor variable space as follows. Each internal node has an associated splitting rule which uses a predictor to assign observations to either its left or right child node. Each internal node is thus partitioned into two child nodes by its splitting rule. For quantitative predictors, the splitting rule is based on a split value $c$, and assigns observations for which $\{x_i < c\}$ or $\{x_i \ge c\}$ to the left or right child node, respectively. For qualitative predictors, the splitting rule is based on a category subset $C$, and assigns observations for which $\{x_i \in C\}$ or $\{x_i \notin C\}$ to the left or right child node, respectively.

For a regression tree, the conventional algorithm models the response in each region $R_m$ as a constant $\theta_m$. Thus the overall tree model can be expressed as (Hastie et al., 2001):

$$f(x) = \sum_{m=1}^{b} \theta_m I(x \in R_m) \tag{1}$$

where $R_m$, $m = 1, 2, \ldots, b$, form a partition of the predictor space and therefore represent the $b$ terminal nodes. If we adopt minimization of the sum of squares as our criterion to characterize the best split, it is easy to see that the best $\hat{\theta}_m$ is just the average of $y_i$ in region $R_m$:

$$\hat{\theta}_m = \frac{1}{N_m} \sum_{x_i \in R_m} y_i \tag{2}$$

where $N_m$ is the number of observations falling in node $m$. The residual sum of squares, averaged over the node, is

$$Q_m(T) = \frac{1}{N_m} \sum_{x_i \in R_m} (y_i - \hat{\theta}_m)^2 \tag{3}$$

which will serve as an impurity measure for regression trees.

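If the response is a factor taking outcomes $1, 2, \ldots, K$, the impurity measure $Q_m(T)$ defined in (3) is not suitable. Instead, we represent a region $R_m$ with $N_m$ observations by

$$\hat{p}_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k) \tag{4}$$

which is the proportion of class $k$ ($k \in \{1, 2, \ldots, K\}$) observations in node $m$. We classify the observations in node $m$ to class $k(m) = \arg\max_k \hat{p}_{mk}$, the majority class in node $m$. Different measures $Q_m(T)$ of node impurity include the following (Hastie et al., 2001):

$$\begin{aligned}
\text{Misclassification error:}\quad & \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i \ne k(m)) = 1 - \hat{p}_{mk(m)} \\
\text{Gini index:}\quad & \sum_{k \ne k'} \hat{p}_{mk}\,\hat{p}_{mk'} = \sum_{k=1}^{K} \hat{p}_{mk}(1 - \hat{p}_{mk}) \\
\text{Cross-entropy or deviance:}\quad & -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}
\end{aligned} \tag{5}$$

For binary outcomes, if $p$ is the proportion of the second class, these three measures are $1 - \max(p, 1 - p)$, $2p(1 - p)$ and $-p \log p - (1 - p) \log(1 - p)$, respectively.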

All three definitions of impurity are concave, having minimums at p = 0 and p = 1 and a maximum at p = 0.5. Entropy and the Gini index are the most common, and generally give very similar results except when there are two response categories.
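The binary forms of the three measures are easy to evaluate directly. The sketch below is only an illustration of those formulas; the function names are hypothetical, and the entropy uses the natural logarithm with the usual convention that $0 \log 0 = 0$.

```python
import math


def misclassification(p: float) -> float:
    """Misclassification error for a binary node with class-2 proportion p."""
    return 1 - max(p, 1 - p)


def gini(p: float) -> float:
    """Gini index for a binary node."""
    return 2 * p * (1 - p)


def entropy(p: float) -> float:
    """Cross-entropy (deviance) for a binary node, with 0*log(0) taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


# Tabulate the three measures over a grid of proportions.
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p={p:.1f}  error={misclassification(p):.3f}  "
          f"gini={gini(p):.3f}  entropy={entropy(p):.3f}")
```

All three are zero for a pure node and largest at $p = 0.5$, which is the behaviour the paragraph above describes.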

### 2.2 Pruning a tree

To be consistent with conventional notation, let us define the impurity of a node $h$ as $I(h)$ ((3) for a regression tree, and any one of the measures in (5) for a classification tree). We then choose the split with maximal impurity reduction

$$\Delta I = p(h)\,I(h) - p(h_L)\,I(h_L) - p(h_R)\,I(h_R) \tag{6}$$

where $h_L$ and $h_R$ are the left and right child nodes of $h$ and $p(h)$ is the proportion of the sample that falls in node $h$.
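To make the split search concrete, the sketch below scans the candidate cut points of one quantitative predictor and keeps the one with the largest impurity reduction, using the within-node mean squared deviation as the impurity of a regression node. The function names and the toy data are assumptions for illustration, and the exhaustive scan is the simplest possible search, not an optimized implementation.

```python
def node_impurity(y):
    """Mean squared deviation of the responses in a node from the node mean."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / len(y)


def best_split(x, y, n_total=None):
    """Return (cut point c, impurity reduction) for the rule x < c vs x >= c.

    n_total is the size of the whole learning sample, so that p(h) is the
    proportion of the sample falling in a node; it defaults to the node size.
    """
    n_total = n_total or len(y)
    parent = (len(y) / n_total) * node_impurity(y)
    best = None
    for c in sorted(set(x))[1:]:  # each distinct value except the smallest
        left = [yi for xi, yi in zip(x, y) if xi < c]
        right = [yi for xi, yi in zip(x, y) if xi >= c]
        gain = (parent
                - (len(left) / n_total) * node_impurity(left)
                - (len(right) / n_total) * node_impurity(right))
        if best is None or gain > best[1]:
            best = (c, gain)
    return best


# Toy data: the response jumps when the predictor reaches 10.
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
cut, gain = best_split(x, y)
print(cut, round(gain, 4))
```

Growing a tree amounts to repeating this search over every predictor at every node and recursing into the two child nodes.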

How large should we grow the tree then? Clearly a very large tree might overfit the data, while a small tree may not be able to capture the important structure. Tree size is a tuning parameter governing the model’s complexity, and the optimal tree size should be adaptively chosen from the data. One approach would be to split a node only if the decrease in impurity due to the split exceeds some threshold. This strategy is too short-sighted, however, since a seemingly worthless split might lead to a very good split below it.

The preferred strategy is to grow a large tree $T_0$, stopping the splitting process only when some minimum number of observations in a terminal node (say 10) is reached. This large tree is then pruned using a pruning algorithm, such as the cost-complexity or split-complexity pruning algorithm.

To prune the large tree $T_0$ using the cost-complexity algorithm, we define a subtree $T \subset T_0$ to be any tree that can be obtained by pruning $T_0$, that is, by collapsing any number of its internal nodes, and define $\tilde{T}$ to be the set of terminal nodes of $T$. As before, we index terminal nodes by $m$, with node $m$ representing region $R_m$. Let $|\tilde{T}|$ denote the number of terminal nodes in $T$ ($= b$). We use $|\tilde{T}|$ instead of $b$ following the conventional notation and define the risk (cost) of a tree as

$$\begin{aligned}
\text{Regression tree:}\quad & R(T) = \sum_{m=1}^{|\tilde{T}|} N_m Q_m(T), \\
\text{Classification tree:}\quad & R(T) = \sum_{h \in \tilde{T}} p(h)\, r(h),
\end{aligned} \tag{7}$$

where $r(h)$ measures the impurity of node $h$ in a classification tree (it can be any one of the measures in (5)).

We define the cost-complexity criterion (Breiman et al., 1984)

$$R_\alpha(T) = R(T) + \alpha\,|\tilde{T}| \tag{8}$$

where $\alpha\ (\ge 0)$ is the complexity parameter and $|\tilde{T}|$ is the number of terminal nodes of $T$. The idea is, for each $\alpha$, to find the subtree $T_\alpha \subseteq T_0$ that minimizes $R_\alpha(T)$. The tuning parameter $\alpha \ge 0$ “governs the tradeoff between tree size and its goodness of fit to the data” (Hastie et al., 2001). Large values of $\alpha$ result in a smaller tree $T_\alpha$, and conversely for smaller values of $\alpha$. As the notation suggests, with $\alpha = 0$ the solution is the full tree $T_0$.

To find $T_\alpha$ we use weakest-link pruning: we successively collapse the internal node that produces the smallest per-node increase in $R(T)$, and continue until we produce the single-node (root) tree. This gives a (finite) sequence of subtrees, and one can show this sequence must contain $T_\alpha$. See Breiman et al. (1984) and Ripley (1996) for details. The estimate $\hat{\alpha}$ of $\alpha$ is obtained by five- or ten-fold cross-validation. Our final tree is then denoted $T_{\hat{\alpha}}$.
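The weakest-link procedure can be sketched on a small hand-built tree. Everything below is an illustrative assumption: the `Node` class, the function names, and the node risks (integer counts chosen so the arithmetic is easy to follow). For an internal node $h$ with branch $T_h$, the per-node increase in risk from collapsing it is taken as $(R(h) - R(T_h)) / (|\tilde{T}_h| - 1)$; this sketch collapses one node at a time and does not handle ties specially.

```python
class Node:
    """A tree node; `risk` is the node's contribution to R(T) if it were terminal."""

    def __init__(self, risk, left=None, right=None):
        self.risk = risk
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None


def leaves(node):
    return [node] if node.is_leaf() else leaves(node.left) + leaves(node.right)


def internal_nodes(node):
    if node.is_leaf():
        return []
    return [node] + internal_nodes(node.left) + internal_nodes(node.right)


def branch_risk(node):
    """R(T_h): total risk of the terminal nodes below (and including) node."""
    return sum(leaf.risk for leaf in leaves(node))


def link_strength(node):
    """Per-node increase in risk caused by collapsing the branch at node."""
    return (node.risk - branch_risk(node)) / (len(leaves(node)) - 1)


def weakest_link_sequence(root):
    """Collapse the weakest link repeatedly; return (alpha, n_leaves, risk) per subtree."""
    sequence = [(0.0, len(leaves(root)), branch_risk(root))]
    while not root.is_leaf():
        weakest = min(internal_nodes(root), key=link_strength)
        alpha = link_strength(weakest)
        weakest.left = weakest.right = None  # collapse the branch into a leaf
        sequence.append((alpha, len(leaves(root)), branch_risk(root)))
    return sequence


def best_subtree(sequence, alpha):
    """Pick the subtree minimizing R(T) + alpha * |leaves|, preferring the smaller tree."""
    return min(sequence, key=lambda s: (s[2] + alpha * s[1], s[1]))


def build_example_tree():
    left = Node(20, Node(5), Node(10))
    right = Node(25, Node(12), Node(12))
    return Node(60, left, right)


sequence = weakest_link_sequence(build_example_tree())
for alpha, n_leaves, risk in sequence:
    print(f"alpha={alpha:5.1f}  leaves={n_leaves}  R(T)={risk}")
print(best_subtree(sequence, alpha=3.0))
```

In practice the value of $\alpha$ passed to the last step would come from cross-validation, as described above, rather than being fixed by hand.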

It follows that, in CART and related algorithms, classification and regression trees are produced from data in two stages. In the first stage, a large initial tree is produced by splitting one node at a time in an iterative, greedy fashion. In the second stage, a small subtree of the initial tree is selected, using the same data set. Whereas the splitting procedure proceeds in a top-down fashion, the second stage, known as pruning, proceeds from the bottom-up by successively removing nodes from the initial tree.

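Theorem 1 (Breiman et al., 1984, Section 3.3) For any value of the complexity parameter $\alpha$, there is a unique smallest subtree of $T_0$ that minimizes the cost-complexity.

Theorem 2 (Zhang & Singer, 1999, Section 4.2) If $\alpha_2 > \alpha_1$, the optimal subtree corresponding to $\alpha_2$ is a subtree of the optimal subtree corresponding to $\alpha_1$.

More generally, suppose we end up with $m$ thresholds, $0 < \alpha_1 < \alpha_2 < \ldots < \alpha_m$, and let $\alpha_0 = 0$. Also, let the corresponding optimal subtrees be $T_{\alpha_0}, T_{\alpha_1}, \ldots, T_{\alpha_m}$; then

$$T_{\alpha_0} \supset T_{\alpha_1} \supset T_{\alpha_2} \supset \cdots \supset T_{\alpha_m} \tag{9}$$

where $T_{\alpha_1} \supset T_{\alpha_2}$ means that $T_{\alpha_2}$ is a subtree of $T_{\alpha_1}$. These are called nested optimal subtrees.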

### 3. Decision Tree for Censored Survival Data

Survival analysis is the phrase used to describe the analysis of data that correspond to the time from a well-defined time origin until the occurrence of some particular event or end-point. It is important to state what the event is and when the period of observation starts and finishes. In medical research, the time origin will often correspond to the recruitment of an individual into an experimental study, and the end-point is the death of the patient or the occurrence of some adverse event. Survival data are rarely normally distributed; they are skewed and typically comprise many early events and relatively few late ones. It is these features of the data that necessitate the special methods of survival analysis.

The specific difficulties relating to survival analysis arise largely from the fact that only some individuals have experienced the event and, consequently, survival times will be unknown for a subset of the study group. This phenomenon is called censoring and it may arise in the following ways: (a) a patient has not (yet) experienced the relevant outcome, such as relapse or death, by the time the study has to end; (b) a patient is lost to follow-up during the study period; (c) a patient experiences a different event that makes further follow-up impossible. Generally, censoring times may vary from individual to individual. Such a censored survival time underestimates the true (but unknown) time to event. Visualising the survival process of an individual as a time-line, the event (assuming it is to occur) is beyond the end of the follow-up period. This situation is often called right censoring. Most survival data include right-censored observations.
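A short example shows how right-censored observations still contribute information. The sketch below computes the Kaplan-Meier estimate mentioned in the Introduction: a censored subject stays in the risk set until its censoring time and is then removed without counting as an event. The function name and the toy data are assumptions for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : observed times (event time or censoring time, whichever came first)
    events : 1 if the event was observed, 0 if the time is right-censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    curve = []
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censorings at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Five subjects; the second subject at time 3 and the subject at time 8 are censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(f"t={t}  S(t)={s:.3f}")
```

Treating the censored times as event times, or dropping them, would both bias the estimate, which is why tree methods for survival data need splitting criteria that account for censoring.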

In many biomedical and reliability studies, interest focuses on relating the time to event to a set of covariates. The Cox proportional hazards model (Cox, 1972) has been established as the major framework for the analysis of such survival data over the past three decades. In practice, however, one primary goal of survival analysis is often to extract meaningful subgroups of patients determined by prognostic factors, such as patient characteristics that are related to the level of disease. Although the proportional hazards model and its extensions are powerful in studying the association between covariates and survival times, they are usually problematic for prognostic classification. One approach to classification is to compute a risk score based on the estimated coefficients from regression methods (Machin et al., 2006). This approach, however, may be problematic for several reasons. First, the definition of risk groups is arbitrary. Secondly, the risk score depends on the correct specification of the model. It is difficult to check whether the model is correct when many covariates are involved. Thirdly, when there are many interaction terms and the model becomes complicated, the result becomes difficult to interpret for the purpose of prognostic classification. Finally, a more serious problem is that an invalid prognostic group may be produced if no patient is included in a covariate profile. In contrast, DT methods do not suffer from these problems.

Owing to the development of fast computers, computer-intensive methods such as DT methods have become popular. Since these investigate the significance of all potential risk factors automatically and provide interpretable models, they offer distinct advantages to analysts. Recently a large number of DT methods have been developed for the analysis of survival data, where the basic concepts for growing and pruning trees remain unchanged, but the choice of the splitting criterion has been modified to incorporate the censored survival data. The application of DT methods to survival data is described by a number of authors (Gordon & Olshen, 1985; Ciampi et al., 1986; Segal, 1988; Davis & Anderson, 1989; Therneau et al., 1990; LeBlanc & Crowley, 1992; LeBlanc & Crowley, 1993; Ahn & Loh, 1994; Bacchetti & Segal, 1995; Huang et al., 1998; Keleş & Segal, 2002; Jin et al., 2004; Cappelli & Zhang, 2007; Cho & Hong, 2008), including the text by Zhang & Singer (1999).
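As one illustration of a splitting criterion adapted to censored data, the sketch below computes the standard two-sample log-rank statistic for the two groups created by a candidate split and scans the cut points of a single covariate. The text does not single out a particular statistic here, so the choice of the log-rank test, the function names, and the toy data are all assumptions made for illustration.

```python
def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic for left vs right child.

    times   : observed times; events : 1 = event, 0 = right-censored
    in_left : True if the subject is sent to the left child
    """
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)                      # at risk overall
        n1 = sum(1 for ti, g in zip(times, in_left) if ti >= t and g)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, in_left)
                 if ti == t and e == 1 and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def best_logrank_split(z, times, events):
    """Scan cut points c of covariate z (rule z < c) and keep the largest statistic."""
    best = None
    for c in sorted(set(z))[1:]:
        stat = logrank_statistic(times, events, [zi < c for zi in z])
        if best is None or stat > best[1]:
            best = (c, stat)
    return best


z = [1, 2, 3, 4, 5, 6]
times = [1, 2, 3, 10, 11, 12]
events = [1, 1, 1, 1, 1, 1]
print(logrank_statistic(times, events, [zi < 4 for zi in z]))
print(best_logrank_split(z, times, events))
```

Because the statistic measures the difference between the two child nodes rather than the impurity within them, trees grown this way attach a statistic only to internal nodes.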

### 4. Decision Tree for Multivariate Censored Survival Data

Multivariate survival data frequently arise in studies involving multiple treatment centres, family members, or measurements made repeatedly on the same individual. For example, in multi-centre clinical trials, the outcomes for groups of patients at several centres are examined. In some instances, patients in a centre might exhibit similar responses due to uniformity of surroundings and procedures within a centre. This would result in correlated outcomes at the level of the treatment centre. In studies of family members or litters, correlation in outcome is likely for genetic reasons. In this case, the outcomes would be correlated at the family or litter level. Finally, when one person or animal is measured repeatedly over time, correlation will most definitely exist in those responses. Within the context of correlated data, the observations which are correlated for a group of individuals (within a treatment centre or a family) or for one individual (because of repeated sampling) are referred to as a cluster, so that from this point on, the responses within a cluster will be assumed to be correlated.

Analysis of multivariate survival data is complex due to the presence of dependence among survival times and unknown marginal distributions. Multivariate survival times frequently arise when individuals under observation are naturally clustered or when each individual might experience multiple events. A successful treatment of correlated failure times was given by Clayton and Cuzick (1985), who modelled the dependence structure with a frailty term. Another approach is based on a proportional hazards formulation of the marginal hazard function, which has been studied by Wei et al. (1989) and Liang et al. (1993). Notably, Prentice et al. (1981) and Andersen & Gill (1982) also suggested two alternative approaches to analyzing multiple event times.

Extension of tree techniques to multivariate censored data is motivated by the classification issue associated with multivariate survival data. For example, clinical investigators design studies to form prognostic rules. Credit risk analysts collect account information to build up credit scoring criteria. Frequently, in such studies the outcomes of ultimate interest are correlated times to event, such as relapses, late payments, or bankruptcies. Since DT methods recursively partition the predictor space, they are an alternative to conventional regression tools.

This section is concerned with the generalization of DT models to multivariate survival data. Extending DT methods to multivariate survival data requires several additional difficulties to be overcome.

### 4.1 Decision tree for multivariate survival data based on marginal model

Few DT methods exist for multivariate survival data. Almost all the multivariate DT methods have been based on between-node heterogeneity, with the exception of Molinaro et al. (2004), who proposed a general within-node homogeneity approach for both univariate and multivariate data. The multivariate methods proposed by Su & Fan (2001, 2004) and Gao et al. (2004, 2006) concentrated on between-node heterogeneity and used the results of regression models. Specifically, for recurrent event data and clustered event data, Su & Fan (2004) used likelihood-ratio tests while Gao et al. (2004) used robust Wald tests from a gamma frailty model to maximize the between-node heterogeneity. Su & Fan (2001) and Fan et al. (2006) used a robust log-rank statistic while Gao et al. (2006) used a robust Wald test from the marginal failure-time model of Wei et al. (1989).

The generalization of DT to multivariate survival data is developed using the goodness-of-split approach. A DT based on goodness of split is grown by maximizing a measure of between-node difference. Therefore, only internal nodes have associated two-sample statistics. The tree structure differs from CART because, for trees grown by minimizing within-node error, each node, either terminal or internal, has an associated impurity measure. This is why the CART pruning procedure is not directly applicable to such types of trees. However, the split-complexity pruning algorithm of LeBlanc & Crowley (1993) has made trees grown by goodness of split into well-developed tools.

This modified tree technique not only provides a convenient way of handling survival data, but also enlarges the applied scope of DT methods in a more general sense. Especially for those situations where defining prediction error terms is relatively difficult, growing trees by a two-sample statistic, together with the split-complexity pruning, offers a feasible way of performing tree analysis.

The DT procedure consists of three parts: a method to partition the data recursively into a large tree, a method to prune the large tree into a subtree sequence, and a method to determine the optimal tree size.

In the multivariate survival trees, the between-node difference is measured by a robust Wald statistic, which is derived from a marginal approach to multivariate survival data that was developed by Wei et al. (1989). We use the split-complexity pruning of LeBlanc & Crowley (1993) and a test sample to determine the right tree size.

### 4.1.1 The splitting statistic

We consider $n$ independent subjects, each of which may experience up to $K$ potential types or numbers of failures. If the number of failures differs between subjects, then $K$ is the maximum. We let $T_{ik} = \min(Y_{ik}, C_{ik})$, where $Y_{ik}$ is the time of the $k$th type of failure in the $i$th subject and $C_{ik}$ is the potential censoring time of the $i$th subject for the $k$th type of failure, with $i = 1, \ldots, n$ and $k = 1, \ldots, K$. Then $\delta_{ik} = I(Y_{ik} \le C_{ik})$ is the indicator for failure and the vector of covariates is denoted $Z_{ik} = (Z_{1ik}, \ldots, Z_{pik})^T$.

To partition the data, we consider the hazard model for the $i$th unit for the $k$th type of failure, using the distinguishable baseline hazard as described by Wei et al. (1989), namely

$$\lambda_{ik}(t) = \lambda_{0k}(t) \exp\{\beta\, I(Z_{jik} < c)\} \tag{10}$$

where the indicator function $I(Z_{jik} < c)$ equals 1 if $Z_{jik} < c$ and 0 otherwise, which corresponds to a split, say $s$, based on a continuous covariate $Z_j$ ($j = 1, \ldots, p$). If the covariate is categorical, then $I(Z_{jik} \in A)$ for any subset $A$ of its categories needs to be considered.

Parameter $\beta$ is estimated by maximizing the partial likelihood. If the observations within the same unit were independent, the partial likelihood function for $\beta$ for the distinguishable baseline model (10) would be

$$L(\beta) = \prod_{i=1}^{n} \prod_{k=1}^{K} \left[ \frac{\exp\{\beta\, I(Z_{jik} < c)\}}{\sum_{l=1}^{n} I(T_{lk} \ge T_{ik}) \exp\{\beta\, I(Z_{jlk} < c)\}} \right]^{\delta_{ik}} \tag{11}$$

Since the observations within the same unit are not independent for multivariate failure times, we refer to the above function as the pseudo-partial likelihood.

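The estimator $\hat{\beta}$ can be obtained by maximizing the pseudo-partial likelihood, that is, by solving $\partial \log L(\beta) / \partial \beta = 0$. Wei et al. (1989) showed that $\hat{\beta} - \beta$ is asymptotically normally distributed with mean 0. However, the usual estimate, $a^{-1}(\beta)$, for the variance of $\hat{\beta}$, where

$$a(\beta) = -\frac{\partial^2 \log L(\beta)}{\partial \beta^2} \tag{12}$$

is not valid. We refer to $a^{-1}(\beta)$ as the naïve estimator. Wei et al. (1989) showed that the correct (robust) variance estimator of $\hat{\beta}$ is

$$d(\beta) = a^{-1}(\beta)\, b(\beta)\, a^{-1}(\beta) \tag{13}$$

where $b(\beta)$ is a weight and $d(\beta)$ is often referred to as the robust or sandwich variance estimator. Hence, the robust Wald statistic corresponding to the null hypothesis $H_0 : \beta = 0$ is

$$W = \frac{\hat{\beta}^2}{d(\hat{\beta})} \tag{14}$$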

### 4.1.2 Tree growing

To grow a tree, the robust Wald statistic is evaluated for every possible binary split of the predictor space $Z$. The split, $s$, could take several forms: splits on a single covariate, splits on linear combinations of predictors, and Boolean combinations of splits. The simplest form of split involves only one covariate, where the form of the split depends on whether the covariate is ordered or nominal.

The “best split” is defined to be the one corresponding to the maximum robust Wald statistic. Subsequently the data are divided into two groups according to the best split.

Apply this splitting scheme recursively to the learning sample until the predictor space is partitioned into many regions. There will be no further partition to a node when any of the following occurs:

- The node contains fewer than, say, 10 or 20 subjects, if the overall sample size is large enough to permit this. We suggest using a larger minimum node size than that used in CART, where the default value is 5;
- All the observed times in the subset are censored, which results in unavailability of the robust Wald statistic for any split;
- All the subjects have identical covariate vectors, or the node has only complete observations with identical survival times. In these situations, the node is considered ‘pure’.
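The three stopping conditions above can be collected into a single check. The sketch below is an assumed data layout, not the authors' implementation: each subject is a dictionary with hypothetical keys `covariates`, `times` and `events`, where `times` and `events` hold one entry per failure type and an event flag of 0 marks a censored time.

```python
def should_stop(subjects, min_size=10):
    """Return True if a node should not be partitioned any further.

    subjects : list of dicts with keys 'covariates' (tuple), 'times' (list)
               and 'events' (list of 0/1 flags, 0 = censored)
    min_size : minimum number of subjects required to attempt a split
    """
    # Rule 1: too few subjects in the node.
    if len(subjects) < min_size:
        return True

    all_events = [d for s in subjects for d in s["events"]]
    # Rule 2: every observed time is censored, so no Wald statistic is available.
    if not any(all_events):
        return True

    # Rule 3a: all subjects share the same covariate vector.
    if len({tuple(s["covariates"]) for s in subjects}) == 1:
        return True

    # Rule 3b: only complete observations, all with the same survival time.
    all_times = {t for s in subjects for t in s["times"]}
    if all(all_events) and len(all_times) == 1:
        return True

    return False


node = [
    {"covariates": (1, 0), "times": [3.0, 5.0], "events": [1, 0]},
    {"covariates": (2, 0), "times": [4.0, 6.0], "events": [0, 0]},
]
print(should_stop(node, min_size=2))   # splittable with a small minimum size
print(should_stop(node, min_size=10))  # too small under the suggested minimum
```

The default of 10 follows the lower of the two values suggested in the first rule; a value of 20 would be passed as `min_size=20`.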

The whole procedure results in a large tree, which could be used for the purpose of data structure exploration.

### 4.1.3 Tree pruning

Let $T$ denote either a particular tree or the set of all its nodes. Let $S$ and $\tilde{T}$ denote the sets of internal nodes and terminal nodes of $T$, respectively. Therefore, $T = S \cup \tilde{T}$. Also let $|\cdot|$ denote the number of nodes. Let $G(h)$ represent the maximum robust Wald statistic on a particular (internal) node $h$. In order to measure the performance of a tree, a split-complexity measure $G_\alpha(T)$ is introduced as in LeBlanc and Crowley (1993). That is,

$$G_\alpha(T) = G(T) - \alpha\,|S|, \qquad G(T) = \sum_{h \in S} G(h) \tag{15}$$

where the number of internal nodes, $|S|$, measures complexity; $G(T)$ measures goodness of split in $T$; and the complexity parameter $\alpha$ acts as a penalty for each additional split.

Start with the large tree $T_0$ obtained from the splitting procedure. For any internal node $h$ of $T_0$, i.e. $h \in S_0$, a function $g(h)$ is defined as

$$g(h) = \frac{G(T_h)}{|S_h|} \tag{16}$$

where $T_h$ denotes the branch with $h$ as its root and $S_h$ is the set of all internal nodes of $T_h$. Then the ‘weakest link’ $h^*$ in $T_0$ is the node such that

$$g(h^*) = \min_{h \in S_0} g(h)$$

INSIGHTS FROM EGYPTIAN TELECOMMUNICATION SECTOR

Contents

1. INTRODUCTION
2. DEFINITION OF A PILOT STUDY
3. VALUE AND GOAL OF A PILOT STUDY
4. THE PILOT STUDY IN THE CURRENT RESEARCH PROJECT
5. PILOT PROCEDURES AND ACTIVITIES
6. BACKGROUND OF THE STUDY
   6.1 Overview of the Mobile Telecommunications Business in Egypt
   6.2 Motivation
   6.3 Problem Statement
   6.4 Research Objectives
   6.5 About the Researcher
7. THEORETICAL FOUNDATION OF THE PROPOSED FRAMEWORK
8. POWERSIM MODEL
9. CONCLUSION

The pilot study of the current research was the first step in the practical application of a decision support tool for outsourcing. In this essay the researcher covers the theoretical background to the definition and value of pilot studies. He also covers the goal of a pilot study, that is, what he expects from it. The researcher then gives an overview of the theoretical framework underpinning the research to give the reader context and reference. The researcher then discusses the application of the pilot study in the current research. Finally, the outcomes of the pilot study are examined, as they will have a direct influence on the actual research itself. We start with the definition of a pilot study and then state its value, to clarify what a pilot study really is and why it is needed in the research process.

A pilot study is a mini-version of a full-scale study or a trial run done in preparation for the complete study. The latter is also called a ‘feasibility’ study. It can also be a specific pre-testing of research instruments, including questionnaires or interview schedules (compare Polit, et al. & Baker in Nursing Standard, 2002:33-44; VanTeijlingen & Hundley, 2001:1). The pilot study thus follows once the researcher has a clear vision of the research topic and questions, the techniques and methods that will be applied, and what the research schedule will look like. It is “reassessment without tears” (Blaxter, Hughes & Tight, 1996:121): trying out all the research techniques and methods that the researcher has in mind to see how well they will work in practice. If necessary, they can then still be adapted and modified accordingly (Blaxter, Hughes & Tight, 1996:121). The pilot study in the current research can be defined mainly as a try-out of research techniques and methods. The researcher created a system dynamics model and applied it to a pilot group of employees and network KPIs. The value of first piloting the whole research process is discussed in the next section, because if a pilot study is of too little value, the researcher can waste time, energy and money.

The researcher will discuss first the value of a pilot study as explained by different authors and then the applicability to the current study in the following paragraphs. After stating the value of such a study, the researcher will compile the goal of a pilot study for the current research project.

- The Value of a Pilot Study

Things never work quite the way you envisage, even if you have done them many times before, and they have a bad habit of turning out very differently than you expected. “You may think that you know well enough what you are doing, but the value of pilot research cannot be overestimated” (Blaxter, et al., 1996:122). It is thus obvious to the researcher that the pilot study in the current research was essential to prevent the waste of time, energy and money. The value is also emphasised by the points listed below.

Pilot studies can be conducted in qualitative, quantitative, and even mixed methods research. The general application of pilot studies can be summarized in four areas: 1) finding problems and barriers related to participants’ recruitment; 2) being engaged in research as a qualitative researcher; 3) assessing the acceptability of the observation or interview protocol; 4) determining the epistemology and methodology of the research. A pilot study can explore the limitations of access to data due to cultural sensitivities. It can also help researchers refine the sampling strategy. In fact, in addition to providing a ground for self-assessment of researchers’ preparation and capacity, the pilot study can help them practise qualitative inquiry and, as a consequence, enhance the credibility of qualitative research. Since developing a plan for data collection requires the researcher’s insight and creativity beyond a mechanical inquiry within the frame of the research questions and model development, conducting a pilot study can also help in judging whether sufficient and rich data can be obtained to answer the research question. The pilot study in the current research process was very specifically used to identify practical problems in the process, sessions and methods used.

The goal of the research itself is the applicability of the developed decision support tool for outsourcing in the mobile telecommunication industry, to improve its professional practice. The pilot study would thus indicate whether the proposed methods and/or instruments are appropriate. The pilot study could also give advance warning of where certain types of techniques, or the study as a whole, could fail. A pilot study can therefore be of value for testing the feasibility of research instruments or data collection instruments, and of the research process itself. The following section combines these statements of the value of pilot studies into a goal of pilot studies in general as well as for the current research project.

- The Goal of a Pilot Study

The researcher sees the goal of a pilot study in general as related to the aim of the research project of which it forms part. The general goal of a pilot study is to provide information that can contribute to the success of the research project as a whole. The goal is thus to test the study on a small scale first, to sort out all the possible problems that might lead to failure of the research procedure, and so to minimise the risk of failure. In the current study the goal of the pilot study consists of two parts:

- To identify as many as possible of the practical arrangements that might have a negative influence on the success of the research procedure.
- To sort out all practicalities related to the measurement instruments, as well as the applicability of these instruments to the potential outcomes of the study.

The procedure of the pilot study in the current research project is discussed in the following paragraphs.

The pilot study of the current research follows the design phase, which is the research strategy as stated in Chapter 1. Since the participants in outsourcing decision making incompletely assess the use of resources (work, capital and knowledge) and capabilities in the framework model of company policy for the establishment of relationship with the outsourcer, I tried to find out why enterprises decide to enter outsourcing relationships. I had a number of questions that I wanted to explore:

- Why have many outsourcing agreements failed to achieve some or all of their goals?
- How can significant risk to the company be avoided when its outsourcing strategy fails?
- How can such an important decision, which many operators face frequently and which could change their entire strategy and positioning, be facilitated?

Indeed, the impact of a failed outsourcing agreement is not limited to missing the targeted cost savings, efficiency and better performance; it can go as far as taking the company out of competition. From here came the researcher’s idea of developing a **“Decision support tool for outsourcing”**. Three basic premises, given below, arose while thinking about the purpose of the research:

*1. The choice of partners* – higher complexity, specialisation and the division of labour make it possible for outsourcers to carry out several activities with lower costs and a higher added value than in the case of carrying out all activities inside the company; the outsourcing company chooses suppliers who improve the outsourcer’s position on the market through their knowledge, capabilities and technology.

*2. The consequences of short-term placement* – the majority of companies favour short-term and mainly financial results of the outsourcing relationship. They are rarely aware of the long-term consequences of their actions; thus it is necessary to study the use of capabilities brought about by establishing and terminating outsourcing activities and to compare them, in the temporal framework, with the benefits for both the outsourcer and the outsourcee.

*3. The consequences of eventual termination of outsourcing* – the review of the literature shows that enterprises rarely deal with the problems that may arise if a company decides to terminate an outsourcing activity and bring it back in-house. Thus, it is necessary to find out whether the outsourcing company still has the equipment, the professional staff familiar with the process, the financial assets etc., and what position the outsourcee may find itself in. Furthermore, we should analyse the difficulties of the two companies, the outsourcer and the outsourcee, at the time of termination of the outsourcing activity.

- Secondary data collection
- Focus group discussions

In recent years competition has been fierce between the three mobile operators in Egypt (Vodafone Group-backed Vodafone Egypt, Orange-backed Mobinil and Etisalat-backed Etisalat Misr). This has been primarily due to increased market saturation and the implementation of new regulatory policies such as mobile number portability. New market developments, such as the proposed introduction of Mobile Virtual Network Operator (MVNO) services and the introduction of a fourth operator, have intensified the competition, inevitably putting pressure on the operators’ costs and profitability, which have already been affected by recent political events in Egypt, including but not limited to the 2011 and 2013 revolutions. As a result of these changing market dynamics, reducing costs and maintaining profitability are the key issues at the top of Egyptian mobile operators’ agendas. Outsourcing has emerged as an effective strategic tool for these operators to address the key issues of cost reduction and improved profitability. The unrelenting pressure for greater efficiencies has forced many firms to focus increasingly on their core competencies and hence to specialize in a limited number of key areas. This has led operators to outsource some activities that have traditionally been carried out in-house. A number of factors make the intersection of the mobile telecommunications industry, the MENA geography (represented by Egypt) and outsourcing a prime research topic, among them:

- The mobile telecommunication industry is booming in the Middle East and North Africa (MENA), where Egypt represents the largest market in the Arab world.
- Egypt’s mobile market is among the most competitive in MENA and is a key driver of its commerce.
- Many of the studies of outsourcing in the telecommunications industry have focused primarily on the motives for outsourcing and have failed to provide an in-depth understanding of the outcomes associated with outsourcing.

### 6.1 Overview of the Mobile Telecommunications Business in Egypt

Egypt’s mobile subscriber base had risen to 105.5 million from 83.121 million at the end of December 2015 (Table 1). The telecommunication industry of Egypt is one of the fastest developing sectors in the country; by 2017, it is believed that the market will have around 110.9 million subscriptions and a penetration rate of 125.2%. Since the establishment of the Ministry of Communications and Information Technology in 1999, the Egyptian telecom industry has been ushered into an era of liberalized policies and new regulatory laws. In recent times, there has been significant growth in the low-income customer segment owing to stiff competition between the various mobile operators, which has led to tariff reductions. Egypt’s mobile market is among the most competitive in the Middle East and North Africa region, playing host to four major international players: the Vodafone Group-backed Vodafone Egypt, Orange-backed Orange Egypt and Etisalat-backed Etisalat Misr, plus the newly awarded WE, backed by Telecom Egypt (see Table (2)). The business environment has become more competitive, especially with market saturation, the introduction of the fourth operator, mobile-fixed convergence and the implementation of new regulatory policies such as mobile number portability. We expect competition to become even more intense in the next few years following the proposed introduction of MVNO, FVNO and 4G services.

**Table (1):** Mobile Market Overview

Clearly, the political situation in Egypt has had a positive effect on mobile subscriber growth, contrary to previous expectations. However, the effect on operators’ financial indicators is less encouraging, as Orange Egypt and Vodafone Egypt reported a sharp decline in ARPUs and, consequently, in net profits during past years (Figure 1).

**Table (2):** Competitive Landscape

**Source: BMI Egypt Infrastructure Report Q4 2016, published by Business Monitor International, 2016**

There are still a number of negative characteristics of Egypt’s telecoms sector. These include a mobile market that is highly skewed towards prepaid users and falling mobile ARPUs. In addition, price competition has been aggressive since the introduction of compulsory SIM registration in May 2010. Although Egypt’s operators have reported sequential increases in ARPU, the overall trend in the market remains downwards, as strong competition continues to give consumers a large amount of choice and forces prices down.

**Figure (1):** Industry Forecast – ARPU 2011-2018, BMI Forecast. Source: BMI

### 6.2 Motivation

A number of questions motivated the researcher to start his research: How can managers of Egyptian telecommunication operators successfully deal with fierce competition? Can outsourcing be an effective strategic tool for these operators to address cost reductions and improved profitability through the right sourcing decision? What are the key factors that affect a sourcing strategy decision? What is the appropriate sourcing strategy to be pursued by business leaders in the Egyptian telecommunication sector, and why?

### 6.3 Problem Statement

With a population of 90+ million, Egypt is the largest market in the Arab world. However, unemployment by 2018 is high at 15%, a phenomenon that subdues demand. Furthermore, Egyptian wages are low in global terms and, though this offers certain advantages to foreign investors, it also implies that there are limited opportunities to rapidly deploy more lucrative, high-margin telecoms products. Meanwhile, Egypt has a relatively low level of urbanization, with only around 43% of the population living in towns and cities. This presents telecoms network operators with specific challenges when it comes to rapidly extending new services and technologies to the wider populace. Furthermore, Egypt’s mobile operators are starting to show concern over the continued downtrend of the Average Revenue Per User (ARPU) and their weak subscriber mixes. To further compound the challenge, in addition to competing on the price of telecom services, operators in Egypt are also expected to compete in other key areas such as network quality and completeness of coverage. This view is supported by the fact that, in spite of the operators’ margin compression, several large-scale network upgrades and expansion plans were announced by the mobile operators during the five years starting in 2013. In the face of these challenges, telecom operators are studying the adoption of an outsourcing strategy to reduce some of the operational burdens. However, their profitability in the longer term also depends on their ability to innovate, in addition to reducing their operational costs. The problem that the research tackles is to identify the relation between the adoption of an outsourcing strategy by telecom operators and their profitability through delivering competitive network quality at optimized cost. Hence the main research question is: how can the professional practice of mobile operators in the sourcing domain be improved?

### 6.4 Research Objectives

When starting to think about this research, it was clear from professional practice that many outsourcing decisions have failed, leaving the operator bleeding and unable to provide adequate service. The operator may not even be capable of bringing back the skilled resources that were sacrificed after an outsourcing decision that was not well analysed. Hence, the ultimate objective of this research was to answer the following: how can such an important outsourcing decision, which many operators face frequently and which could change their entire strategy and positioning, be facilitated? Is it possible to provide a simple tool that helps executives and practitioners in mobile operators make an adequate decision with minimal risk? This research will also help guide telecom operators in resolving the long-standing questions regarding the adoption of a sound outsourcing strategy. This will cover the academic and practitioner viewpoints by building a model to explore the nature of the relationship between all the addressed constructs, namely strategic sourcing, the mobile telecom operator (market), organization performance (network quality, costs, …) and innovation. This model will enable the examination of how the choice of outsourcing strategy affects organization performance.

### 6.5 About the Researcher:

The researcher is currently a Senior Director of the Network at Orange Egypt, with a cumulative experience of over 20 years in the field of telecommunications. He currently oversees budgeting and dimensioning of the entire Orange network. His background includes strategic planning for start-up engineering projects, operations, maintenance, human resources balancing, and finance. He successfully grew revenue, increased efficiency and productivity, reduced costs, improved operations, and expanded the company footprint to 30+ million subscribers. He is a results-driven executive with a solid understanding of the practices, technologies, and service providers within the Telecom & IT industry. The researcher’s current responsibilities include managing an organization of 300+ employees and an annual operating budget in excess of USD 150 million. This budget must be deployed judiciously and stretched to maximize its impact. In that role, the decision whether to outsource a certain function or keep it in-house is encountered all too often. In our industry, we are inundated with information regarding outsourcing and its benefits; however, information regarding its impact on an operator’s profitability is scarce and hard to come by. With an engineering background, the researcher has always attempted to develop an analytical framework for decision-making. **It is in this context that the researcher has taken a keen interest in the topic and decided to make it the focus of the DBA research thesis.**

In this section, I will go quickly through the main theories tackled during the literature review and highlight the main ideas and elements extracted to form the foundation of my system dynamics model. The literature review showed that the majority of outsourcing models are built on three main theories:

- Transaction Cost Economics (TCE)
- Resource Based View (RBV)
- Agency Theory.

A number of outsourcing studies in the telecoms industry have employed, as theoretical frameworks for their analysis, either Transaction Cost Economics (TCE), as in Edoardo Mollona & Alessandro Sposito (2008), Jacobides, M.G. and Winter, S.G. (2005) and Jiang, B., Belohlav, J. and Young, S. (2007); the Resource-Based View (RBV), as in Lowson (2002), Coates and McDermott (2002), Vastag (2000), Aron, R. and Singh, J.V. (2005), Ellram et al. (2008) and Youngdahl and Ramaswamy (2008); or Agency Theory, as in Logan, Mary S. (2000). The ideas and concepts extracted from the three aforementioned theories will be used jointly in my proposed model, which is possible with our System Dynamics approach. System Dynamics models are computational representations of the causal structure of systems (be they physical, social, or economic) as a set of differential equations using stock and flow variables. The stock and flow variables are arranged in structures called causal loops to eventually form Causal Loop Diagrams, or CLDs. One of the most important reasons to use system dynamics is its capability to manage soft variables included in our model, such as the “Resource Capabilities” variable. As far as soft variables are concerned, numerical data are often unavailable or non-existent. Despite this, such variables are known to be critical to decision making and therefore should be incorporated into system dynamics models. The stocks and flows constituting each loop in the researcher’s proposed CLD will have their theoretical underpinning tied to one of the three aforementioned theories, for which we present a short overview next.
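To make the stock-and-flow idea concrete, the following minimal Python sketch integrates a single stock with a constant inflow and a proportional outflow using a fixed-step Euler scheme. It is only an illustration of the general principle: the function `simulate_stock`, its parameters, and the example numbers are hypothetical and are not taken from the researcher’s Powersim model.

```python
from typing import List


def simulate_stock(
    initial: float,
    inflow: float,        # units added per unit of time (assumed constant)
    outflow_rate: float,  # fraction of the stock leaving per unit of time
    dt: float,
    steps: int,
) -> List[float]:
    """Integrate d(stock)/dt = inflow - outflow_rate * stock with Euler steps."""
    stock = initial
    history = [stock]
    for _ in range(steps):
        net_flow = inflow - outflow_rate * stock
        stock += net_flow * dt
        history.append(stock)
    return history


# Hypothetical example: a pool of 50 qualified engineers, 10 joining per year,
# 10% of the pool leaving per year, simulated for two yearly steps.
print(simulate_stock(initial=50.0, inflow=10.0, outflow_rate=0.1, dt=1.0, steps=2))
```

The outflow depends on the current stock, which is the simplest form of the feedback that a causal loop expresses.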

The Resource Based View (RBV) theory posits that firms create sustained competitive advantage with resources that are rare, valuable, imperfectly imitable and not substitutable (Barney 1991). Firms, as constructs of human interaction, tend to develop their own language for codifying knowledge and their own routines to enhance internal processes. If an activity is highly specific to a company, it is embedded in the company’s language and routines. Employees are then familiar with this “common organization communication code” (Monteverde 1995). Thus, activities can be governed more efficiently within the firm. RBV does not predict how efficient an external purchase can be; rather, it points out that the more firm-specific an activity or resource, “the greater use it makes of firm-specific language and routines, and hence the more efficient is internal governance” (Poppo and Zenger 1998, p. 858). *From there, the effect of the knowledge and experience of the firm’s employees on its performance, and hence on the outsourcing decision, was taken into consideration.*

Following transaction cost economics (TCE), external suppliers can achieve production cost efficiencies through economies of scale and specialization (Marshall et al. 2007), which provides a motive for outsourcing (Poppo and Zenger 1998). However, other costs related to the exchange of services within or across firm boundaries, such as search, selection, bargaining, monitoring and enforcement (Madhok 2002), may offset the production cost savings of external suppliers, given the higher likelihood of opportunistic behavior by an external supplier compared to an internal unit (Williamson 1991). Frequency, asset specificity, and uncertainty are the key drivers of transaction costs. External suppliers build Global Network Operations Centers to serve the networks of many customers. These few but large operation centers presumably work more efficiently than the sum of all the small operation centers managed by and serving single network operators. This is in part due to fixed costs, for example in problem-solving teams. Suppliers serving several networks need only one such team, as network breakdowns rarely occur simultaneously in multiple networks. Hence, specialized external suppliers can offer network operation services at lower cost than internal departments at operators (Hecker and Kretschmer 2010). For transaction costs, Crandall et al. (2009) argued for the telecommunications market that negotiating issues such as “prices for maintaining the network, connecting subscriber lines, and replacing network elements as they depreciate” (p. 506) are complex and that the efficiency of market governance is likely to be low. We analyze market-related transaction costs in detail based on the three major drivers of transaction costs. First, consider the frequency of transaction. Network operators communicate on a daily basis with their external services partner about technical issues. However, they do not change their service supplier frequently but sign contracts for three to five years, avoiding the high costs of on-going searching and negotiating. Second, we assess the asset specificity of network operation services for a network operator. These assets do not have to be of a physical nature (Klein et al. 1978) and can be interpreted as the knowledge and expertise that employees of an external supplier have developed. As mentioned earlier, all major network equipment vendors have the ability to manage not only their own equipment, but also infrastructure initially built by a competitor. The interfaces between billing systems or customer management databases and the network infrastructure have been standardized since the launch of 2G mobile. Hence, operators can easily switch suppliers if they need to. Third, environmental uncertainty primarily refers to the inability to predict market demand (McNally and Griffin 2004), which is constantly “shifting and evolving” in the telecommunications industry (Crandall et al. 2009). Despite this, mobile network operators cannot adapt their physical network quickly to demand fluctuations.

*In summary, from a TCE perspective it is important to take into consideration the cost of performing the service, comparing the internal cost with the outsourcing cost while taking the RBV effect into account. The risk of not being able to get the internal resources back once the decision to outsource has been made should also be considered, as should the service performance comparison (in our case the network performance comparison), which is highlighted in the next section through Agency theory.*

Agency Theory has a long tradition in analyzing situations in which parties cooperate through the division of labor (Eisenhardt 1989). More precisely, it examines situations where a principal delegates work to an agent. The focal point of analysis is how to align the interests of the agent with those of the principal in an efficient and cost-effective way. If an agent’s performance can be measured adequately, market prices provide the most effective incentives for the agent to act in accordance with the principal’s interests (Poppo and Zenger 1998). If the performance of an agent is difficult to measure, however, market contracts might be less efficient than internalizing the principal-agent relationship (Barzel 1989, p. 76). Within an organization, principals can suppress opportunistic behavior of an agent by “behavioral monitoring” (Poppo and Zenger 1998, p. 859) and the use of authority instead of incentives. In market transactions, such instruments are not available. We now ask whether the performance of external network operation services can be measured accurately. If all functions related to operation services are outsourced, the focal variable is the overall quality and reliability of the network, which is crucial for the success of an operator. In practice, network operators include key performance indicators, audits and service benchmarks in their contracts with external suppliers and measure network quality via overall network coverage and the number of breakdowns (Friedrich et al. 2009, p. 14). *From Agency theory, we can understand the importance of taking the service delivered (here the network Key Performance/Quality Indicators) into consideration, along with the costs and the skills, in the outsourcing strategy decision.*

After highlighting the main variables to be taken into consideration by going through the main theories governing the domain, I tried to develop a decision support tool for the executives of mobile operators to guide and support them in deciding on their strategy for managing their network: whether to outsource or not, why, when, and at what cost. By applying the ideas extracted from TCE, RBV and Agency theory in an integrative framework, we can then understand whether or not it is beneficial for local telecommunication network and service providers to outsource activities to international managed service providers. Specifically, the author applies RBV to address the questions related to the strategic importance of those activities, assessing whether or not they are core competences of the firm. By contrast, TCE will assist in assessing whether economic advantages are actually achievable by outsourcing activities. Finally, Agency theory will be used to assess the quality measure. Epistemologically, the qualitative paradigm was chosen in this research for several reasons. First, the main objective of the research is to “explore” how outsourcing affects the mobile operators’ profitability in Egypt. The researcher is trying to examine the current change in the “outsourcing phenomenon” that has started to rise in the telecommunication field. As the researcher is more concerned with the quality and texture of the sourcing experience, in this paradigm the researcher is trying to explore the relation between the sourcing model and operators’ profitability. The research involves observation, measuring and testing:

- Critical observation
- Model Building
- Analysing
- Evaluation


- Initial causal loop diagram:

The initial CLD shown below in Figure (2) was the basic idea of this research; the aim was to give decision support to the executives and management of mobile telecommunication operators on whether or not to outsource their network activities. The starting idea was to compare the profits (revenues minus costs) of those activities when handled in-house with the profits when they are outsourced. The researcher then realised that there are other elements that could affect customer satisfaction besides the employees’ experience/performance and the network. The same applies to revenues, as many factors other than customer satisfaction could affect them, such as the sales force, the competition and many others. After much research and discussion with the supervisor and the executives of the company, and based on the long experience of the researcher in the mobile telecommunication industry, it was decided to focus mainly on the impact of the network KPIs and the sourcing costs.

**Fig (2) Initial Model CLD**

### 8.2 Model Overview

The purpose of this model is to help in the decision to insource or outsource the engineering services of a given project/rollout based on the following parameters:

- Availability of the qualified in-house resources.
- Insourcing cost.
- Out-sourcing cost.
- Best Competitor KPIs.
- Vendor KPIs.

The main focus in the upcoming sections is how to determine the availability of qualified in-house resources and the insourcing cost; later on I can build on this to take into account the effect of the key performance indicators (KPIs) and the outsourcing cost.

### 8.2.1 Insourcing scenario prediction/estimation

**Training hours & platforms knowledge**

In order to simulate the availability of in-house resources and their movements (promotions, churn) inside a given organization, the following assumption is made to quantify the process: the promotion of an employee from one level to another depends on the employee’s knowledge of a specific number of platforms and the availability of open positions at that level. A platform could be the products of a specific OEM or vendor. To measure the knowledge of a given employee, the number of training hours received is taken in this study as the reference for the knowledge of a given number of platforms. Two thresholds of training hours, *Training Hours_ Threshold-1* and *Training Hours_ Threshold-2*, which are user inputs, determine whether one, two or three platforms are known. If the training hours obtained by a given employee are more than *Training Hours_ Threshold-1* and less than *Training Hours_ Threshold-2*, then this employee knows 2 platforms; if they exceed *Training Hours_ Threshold-2*, then 3 platforms are known by that employee. Employees below *Training Hours_ Threshold-1* are taken to know 1 platform, as in the example below.

Example: assuming *Training Hours_ Threshold-1 = 200 and Training Hours_ Threshold-2 = 500*, the following platform knowledge will be determined (Table 3).

| Employee ID | Initial Training Hours | Platforms Knowledge |
|---|---|---|
| 1 | 550 | 3 |
| 2 | 230 | 2 |
| 3 | 220 | 2 |
| 4 | 220 | 2 |
| 5 | 100 | 1 |
| 6 | 90 | 1 |
| 7 | 80 | 1 |
| 8 | 70 | 1 |
| 9 | 100 | 1 |
| 10 | 100 | 1 |

__Table (3) Example for the platform knowledge determination__

Based on the above criteria, the Employees Experience loop is built. An initial state is fed into the model, consisting of the training hours already received (initial data) and the planned number of training hours per year based on the employee's experience (imported data). In the initial trials to build the model, the training hours were modeled as one variable containing the total number of hours of all the employees. This did not work, because it made it impossible to increase the training hours per year differently for each employee. After going through Powersim, it was found that the best way is to model both the training hours for each employee and the rate of increase as **arrays**.

A powerful feature of Powersim is the possibility to define indexed variables, or arrays. One array variable can hold several values, as opposed to an ordinary scalar variable, which holds only a single value. Each array variable consists of several elements. By defining a variable as an array, a group of related values may be represented as one variable. In this case the array holds the training hours received by the different employees, which makes it possible to simulate the real situation of different employees receiving different training sessions.

The next challenge was how to import the initial data into the Powersim model. In the beginning a manual method was used to fill the array, but it was neither accurate nor convenient. It also leads to a static model, i.e. one that cannot be changed, which contradicts the purpose of having a dynamic and flexible model. In addition, entering the data manually limits the number of entries, i.e. the number of employees that can be analyzed.

Going through Powersim, a very useful function called __XLDATA__ was found. The XLDATA function returns the values of an area in an Excel worksheet as a scalar, a vector, a two-dimensional array, or a three-dimensional array. XLDATA cannot be used in composite expressions, i.e. it must define a variable completely, and it cannot be used for writing data to Excel. For example, to import the following data range in Excel, the XLDATA definition will be XLDATA("C:/../Book1.xlsx", "Sheet1", "R1C1:R5C2") (Table 4).

**Table (4) EXCELDATA example**

So, in order to make the model as dynamic and easy to use as possible, the **XLDATA** function was used to create the following 2 arrays:

- The initial data array, which contains the training hours already received by each employee, so its dimension is **1 * Number of employees.**
- The imported data array, which contains the planned number of training hours per year for each employee, so its dimension is **1 * Number of employees.**

Throughout this study, the employees of a given organization are classified into 3 categories, as shown in Table 5 below:

| Employee category | Platforms Knowledge | Representation in the array |
|---|---|---|
| Normal | 1 platform | 1 |
| Experienced | 2 platforms | 2 |
| Rare | 3 platforms | 3 |

Table (5) Employees' categories

Using Powersim simulation, the number of training hours per employee over a future period (quarters, for example) can be estimated based on the settings of the simulation (Figure 3). The number of training hours received is simulated with a "level", which acts as a reservoir that accumulates the flow (training hours / year) going into it.

Figure (3)

The platforms movement is simulated using a nested IF condition, as shown below:

FOR( i='data range' | IF('Training hours'[i] >= 'Training Hours_ Threshold-1', IF ('Training hours'[i] <= 'Training Hours_ Threshold-2',2,3) ,1))

The graph in Figure 4 represents the run of the above model on 20 employees for 9 consecutive quarters.

Table 6 shows the platform movements as an array over the years. As an example, in year 0 we see the following array:

{3,3,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}

This means we have 2 Rare employees knowing 3 platforms, 3 Experienced employees knowing 2 platforms and 15 Normal employees knowing only 1 platform. In year 5 we have 3 Rare employees, 17 Experienced employees and no Normal employees. The array even shows which employee is in which category; for example, the 3rd employee is Experienced and the 4th is Rare:

{3,3,2,3,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2}

Using a **"for loop"**, an **"IF condition"** decides for each employee the number of platforms he or she possesses, based on the threshold training hours. Then, using the **COUNTEQ** function, the number of employees with a given number of platforms is determined (Figure 5). For example, for the array A = {1,1,2,3}, COUNTEQ(A,1) equals 2, which means that the number of elements equal to 1 in the array is 2.

Then, to make the study more practical and closer to the real-case scenario, the churn of employees is taken into consideration and modeled by the variables *Exp. leaving per year* and *Rare leaving per year*.
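The nested IF and the COUNTEQ step described above can be mirrored outside Powersim. The following Python sketch is only an illustration, not the Powersim model itself: the function names `platforms_known` and `count_eq` are hypothetical, and the thresholds and training hours are the example values from Table 3.

```python
from typing import List


def platforms_known(hours: float, threshold_1: float, threshold_2: float) -> int:
    """Mirror the nested IF: below threshold 1 -> 1, up to threshold 2 -> 2, else 3."""
    if hours >= threshold_1:
        return 2 if hours <= threshold_2 else 3
    return 1


def count_eq(values: List[int], target: int) -> int:
    """Hypothetical stand-in for COUNTEQ: number of elements equal to target."""
    return sum(1 for v in values if v == target)


# Example inputs taken from Table 3 (thresholds of 200 and 500 hours).
training_hours = [550, 230, 220, 220, 100, 90, 80, 70, 100, 100]
platforms = [platforms_known(h, 200, 500) for h in training_hours]

normal = count_eq(platforms, 1)       # employees knowing 1 platform
experienced = count_eq(platforms, 2)  # employees knowing 2 platforms
rare = count_eq(platforms, 3)         # employees knowing 3 platforms
print(platforms, normal, experienced, rare)
```

With the Table 3 inputs, the resulting list matches the Platforms Knowledge column of that table, and the three counts correspond to the Normal, Experienced and Rare categories.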
Fig (5) Number of platforms possessed

For a team to achieve certain network KPIs, it should have enough experienced and rare members, so **an experienced employees target count and a rare employees target count** are created in the model as user inputs. These give the user the ability to adjust the model and the simulations based on the requirements and the experience of the decision maker, who in this case will be one of the executives in the MNO. As mentioned earlier, the promotion of an employee from one level to another depends on the availability of open position(s) in the higher one. There will therefore be cases where employees are qualified from a knowledge perspective to be promoted, but there is no availability or need to promote them. To simulate this case, variables called *Experienced ready* and *Rare ready* are created, as these values will be needed later in the decision to promote employees (Figure 6).

**8.2.2 Employees' movement**

The second major part of the "employee experience" loop in the model is the employees' movement loop, which builds on the output of the platforms movement loop. In this loop, the movement of employees between the 3 categories is simulated, taking into account the following parameters:

- The availability of qualified resources to be promoted: *Experienced ready* & *Rare ready*.
- The availability of open positions in the higher categories: *Experienced count target* & *Rare count target*.
- The churn rate.

The available open positions are determined by the level of experience needed. For example, for a given project, 15% of the employees need to be experienced and 5% need to be rare. Based on these count targets, the model simulates the employees' movement to predict when this combination of employees will be available. If there are no qualified resources available to be promoted, the number of experienced or rare employees will stay below the target (required) count, which increases the time needed to insource the project at hand and may lead to outsourcing the project if it is time-sensitive.

In the initial phase of the model, the employees' counts were modeled as auxiliaries. An auxiliary is used to combine or reformulate information. It has no standard form; it is an algebraic computation of any combination of levels, flow rates, or other auxiliaries, i.e. it has no memory. In this trial, the value of the experienced count, for example, was calculated at a given instant (a year, for example) and then calculated again from its defining equation at the next simulation step. This is not adequate for the case at hand, where the counts of a given year depend on those of the previous one. After studying the different variable types of the Powersim tool, it was found that the best way is to model the Normal, Experienced and Rare employees' counts as **"levels"**.

The concept of levels rests on the fact that every element in a feedback loop, and therefore every element in a system, is either a level or a flow. Levels are accumulations, and flows represent the changes to these levels. Flows fill up or drain the levels, much as the flow of water into a bathtub fills it and the drain at the other end (another flow) empties it. This accumulation of flows in levels is the cause of all dynamic behavior in the world.
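The difference between an auxiliary and a level can be illustrated with a short numerical sketch. The Python below is not Powersim code; it assumes a simple step-by-step update with a fixed time step, and the names `simulate_level`, `inflow` and `outflow`, as well as the numbers, are hypothetical.

```python
from typing import Callable, List


def simulate_level(initial: float,
                   inflow: Callable[[int, float], float],
                   outflow: Callable[[int, float], float],
                   steps: int,
                   dt: float = 1.0) -> List[float]:
    """Accumulate (inflow - outflow) * dt into a level, one step at a time.

    Unlike an auxiliary, the level at step t depends on its value at step t-1.
    """
    level = initial
    history = [level]
    for t in range(steps):
        level = level + (inflow(t, level) - outflow(t, level)) * dt
        history.append(level)
    return history


# Bathtub example with made-up rates: 5 units/step flow in, 2 units/step drain out.
bathtub = simulate_level(
    initial=10.0,
    inflow=lambda t, level: 5.0,
    outflow=lambda t, level: 2.0,
    steps=4,
)
print(bathtub)  # [10.0, 13.0, 16.0, 19.0, 22.0]
```

Because each value is computed from the previous one, the level "remembers" its history, which is the property the employee counts need.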
So in our case, the employees' counts in the different categories **are levels**, the promotion decision is the **flow** that fills these levels, and the employees' churn is the flow that drains them. For example, as shown in the loop below (Figure 7), the *Fresh to Normal decision* is the flow that fills the *Normal count level* to compensate for the *Normal leaving per year* flow that drains it.

Figure (7) Normal employee movement

The promotion decision flow is modeled by an IF condition that includes the parameters mentioned above. For example, to model *the Norm to Exp Decision*, I used a *multiple IF condition* as follows:

IF ('EXP Count' <'EXP Count Target' AND 'Exp Ready' > 0 <<emp>>, IF ('Exp Ready'>=('EXP Count Target' -'EXP Count'),'EXP Count Target' -'EXP Count','Exp Ready'), 0 <<emp/year>>)

The first condition compares the current count of experienced employees with the target count and checks whether any employees are ready to be promoted. When the current count is below the target and ready employees are available, the count is increased by the delta between the target and current counts, provided that at least that many employees are ready; otherwise, only the ready employees are promoted. At each simulation step (one time unit, a quarter for example), the employees' count of a specific category is compared to the target count. If it is lower, a number of employees are promoted to reach the target count, given that they are qualified, i.e. know enough platforms.

Churn can happen for different reasons, for example staying in the same position for a long time without a promotion, salary saturation, etc. In this study, the churn flow (rate) is a user-input percentage for the sake of simplicity.

Another major technique I used to model the employees' movement is the *feedback* from one variable to another. For example, in the loop below, the feedback from *the EXP count to the Norm to Exp* Decision is very important for obtaining accurate results (Figure 8).

The charts in Figure 9 represent the run of the above model on 20 employees for 10 years. As an example, in year 1 we can see 15 Normal, 3 Experienced and 1 Rare. Checking the values shows that the model adjusts the counts to match the input target counts; in this simulation the target count was 4 for experienced and 3 for rare, which are the numbers the model is trying to achieve.

To test the outputs of this loop, input target counts are applied as shown in Figure 10. Starting from an initial state of {16, 3, 1} and checking the output of the loop, it is clear that the model managed to achieve the target counts {12, 6, 2} from the 5th quarter onward.
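The logic of the Norm to Exp Decision can be restated in a few lines of Python. This is an illustrative sketch only: the function name `promotion_flow` and its parameter names are hypothetical, and the Powersim units (emp, emp/year) are ignored.

```python
def promotion_flow(count: int, target: int, ready: int) -> int:
    """Number of employees promoted in one simulation step.

    Mirrors the nested IF: promote only when the count is below the target
    and someone is ready; then promote the shortfall, capped by the ready pool.
    """
    if count < target and ready > 0:
        shortfall = target - count
        return shortfall if ready >= shortfall else ready
    return 0


# Shortfall of 3 with 5 ready employees: the full shortfall is promoted.
print(promotion_flow(count=3, target=6, ready=5))  # 3
# Shortfall of 3 with only 1 ready employee: only that employee is promoted.
print(promotion_flow(count=3, target=6, ready=1))  # 1
# Target already met: nobody is promoted.
print(promotion_flow(count=6, target=6, ready=4))  # 0
```

When the outer condition holds, the result is simply the smaller of the shortfall and the number of ready employees.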

__Figure (10) Example for the loop output__

Another example tests the outputs of this loop with the target counts shown in Figure 11. Starting from an initial state of {16, 3, 1}, the model approaches the target counts {14, 3, 3}, reaching {15, 2, 3} from the 5th quarter onward.

__Figure (11) Example for the loop output__

**8.2.3 Insourcing cost:**

Given the counts of employees in the different categories, the insourcing cost can be estimated. The following parameters are taken into consideration to estimate the overall cost of insourcing (Figures 12 and 15):

- Tools for the staff.
- Transportation.
- Training.
- Location.
- Salaries, including annual raise and bonuses.

The costs of tools, transportation, training and location are modeled as constants. Constants represent information that is not changed by the simulation, but they can be changed by the user through input controls, based on the user's requirements.

__Figure (12) Insourcing Cost parameters__

Constants are often used to identify and quantify the boundaries of the model and to represent decision parameters. They are, as the name implies, constant, and the definition only defines the initial value (the definition is only calculated at the start of the simulation). It is also possible to assign a new value to a constant through input controls, thereby changing the scenario of the model. Permanent constants keep their values not only over one simulation run but also between simulation runs, which helps to create simulations that "remember" the input given by the user. It is useful to create constant variables rather than including literal constants in various variable definitions; this helps to clean up the model and to visualize parameters that might be decision parameters in the system. It also helps to gain the full effect of Powersim's unit detection capabilities. In addition, if units need to be changed at a later stage, this only has to be done for a handful of constants rather than by going through all the variables of the system.

The cost of salaries is modeled as a level, as it represents a state of the system that changes over time. Levels are variables with memory, and their value is determined by the flows that flow in and out of them.
The rate of change of the salary (the in-flow) is the annual raise. To take salary saturation after a certain number of years into account, the annual raise is applied only for a given number of years, so that the model provides more realistic results, as shown in the time graphs (Figures 13 and 14).

__Figure (13) Employees' Salaries over the years__

As shown in Figure 13, the salary saturates after year 3, which is one of the main reasons for employee churn.
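The saturating salary level can be sketched numerically. The Python below is an illustration under stated assumptions rather than the Powersim model: the function name `salary_over_years`, the starting salary, the 10% raise, the compounding of the raise and the three-year raise period are all hypothetical choices.

```python
from typing import List


def salary_over_years(start_salary: float,
                      annual_raise: float,
                      raise_years: int,
                      horizon: int) -> List[float]:
    """Salary level after each yearly step.

    The raise (the in-flow) is applied only during the first `raise_years`
    years; afterwards the level stays flat, i.e. the salary saturates.
    """
    salary = start_salary
    salaries = [salary]
    for year in range(1, horizon + 1):
        if year <= raise_years:
            salary = salary * (1 + annual_raise)  # assumed compounding raise
        salaries.append(round(salary, 2))
    return salaries


# Hypothetical inputs: salary of 1000, 10% raise for 3 years, 6-year horizon.
salaries = salary_over_years(1000.0, 0.10, raise_years=3, horizon=6)
print(salaries)
```

The values grow for the first three years and then remain constant, which is the saturation behavior described for the salary level.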

__Figure (14) Example for the loop output__

**IN SUMMARY,** the researcher was able to build and test the first two loops of the model, namely the experienced resources loop and the cost loop. Following Sterman, the researcher is confident that the same approach and concepts can be used to build and test the third loop of the model, representing the key performance indicators, in a straightforward way.

In this pilot project, I achieved two objectives. The first was to identify why many outsourcing agreements fail to achieve some or all of their goals and, in doing so, whether there is a way of minimising the risk to a company if its outsourcing strategy fails. My solution was to build a decision support system. I chose the System Dynamics methodology and the modelling tool Powersim, as my research showed that this is a powerful way of building such systems. The second objective of this pilot project was to test whether Powersim could do all the tasks needed to create an efficient and reliable decision support tool.

Both objectives have been achieved. I have shown to my satisfaction that a decision support tool would be beneficial for the industry, and I have tested some initial loops and functions of Powersim that I will need in my final model. My tests involved modelling insourcing, which helps to predict and simulate the availability of in-house resources; the insourcing cost model was also explained. I have also shown that the insourcing strategy will be able to achieve certain network key performance indicators (KPIs) and that Powersim has the functionality to model the next stage, which is to model the KPIs and relate them to the available experience of the in-house staff. I am now confident that I can create a top-level design that will relate the outputs of the insourcing model and the insourcing cost model to the new KPI loop that will be created.

**References**

Aron, R. and Singh, J. V. (2005). 'Getting offshoring right'. Harvard Business Review, 83, 135-43.

Barney, J. B. (1991). 'Firm Resources and Sustained Competitive Advantage'. Journal of Management 17 (1): 99–120.

Barzel, Y. (1989). Economic analysis of property rights. Political Economy of Institutions and Decisions series. Cambridge; New York and Melbourne: Cambridge University Press.

Blaxter, L., Hughes, C. and Tight, M. (1996, 2002, 2006, etc.). How to Research. Buckingham, Philadelphia: Open University Press. 263 pp.

Coates, T. T. and McDermott, C. M. (2002). An exploratory analysis of new competencies: a resource based view perspective. Journal of Operations Management 20, 435–450.

Crandall, R. W., Eisenach, J. A. and Litan, R. E. (2009). 'Vertical separation of telecommunication networks: Evidence from five countries', available online at: http://ssrn.com/abstract=1471960 [last accessed Jan 5, 2012].

Eisenhardt, K. M. (1989). 'Building theories from case research'. Academy of Management Review, Vol. 14, No. 4, pp. 532-550.

Ellram, L. M., Tate, W. L. and Billington, C. (2008). Offshore outsourcing of professional services: A transaction cost economics perspective. Journal of Operations Management, 26(2), 148–163.

Friedrich, R., Weichsel, P., Miles, J. and Rajvanshi, A. (2009). 'Outsourcing Network Operations – Maximizing the Potential'. Booz & Company.

Hecker, A. and Kretschmer, T. (2010). 'Outsourcing Decisions: The Effect of Scale Economies and Market Structure'. Strategic Organization 8 (2): 155–175.

Jacobides, M. G. and Winter, S. G. (2005). 'The co-evolution of Capabilities and Transaction Costs: Explaining the Institutional Structure of Production'. Strategic Management Journal 26 (5): 395-413.

Jiang, B., Belohlav, J. A. and Young, S. T. (2007). 'Outsourcing Impact on Manufacturing Firms' Value: Evidence from Japan'. Journal of Operations Management 25 (4): 885-900.

Klein, B., Crawford, R. and Alchian, A. (1978). Vertical Integration, Appropriable Rents and the Competitive Contracting Process. Journal of Law and Economics, No. 21, pp. 297-326.

Logan, M. S. (2000). Using Agency Theory to Design Successful Outsourcing Relationship. The International Journal of Logistics Management, Vol. 11 Iss. 2, pp. 21-32.

Lowson, R. H. (2002). Assessing the operational cost of offshore sourcing strategies. International Journal of Logistics Management, 13(2), 79–89.

Madhok, A. (2002). Reassessing the Fundamentals and Beyond: Ronald Coase, the Transaction Cost and Resource-Based Theories of the Firm and the Institutional Structure of Production. Strategic Management Journal, 23, pp. 535-550.

Marshall, D., McIvor, R. and Lamming, R. (2007). Influences and outcomes of outsourcing: insights from the telecommunications industry. Journal of Purchasing and Supply Management 13, 245-260.

McNally, R. C. and Griffin, A. (2004). Firm and individual choice drivers in make-or-buy decisions: a diminishing role for transaction cost economics? The Journal of Supply Chain Management, 40(1), 4–17.

Mollona, E. and Sposito, A. (2007). 'Transaction Costs and Outsourcing Dynamics: A System Dynamics Approach'. International Conference on System Dynamics: 1-16.

Monteverde, K. (1995). Technical dialog as an incentive for vertical integration in the semiconductor industry. Management Science, 41(10), 1624–1638.

Poppo, L. and Zenger, T. (1998). 'Testing alternative theories of the firm: transaction cost, knowledge-based, and measurement explanations for make-or-buy decisions in information services'. Strategic Management Journal 19 (9): 853-877.

Teijlingen, E. R. and Hundley, V. (2001). The Importance of Pilot Studies. Social Research Update, Issue 35, University

Vastag, G. (2000). The theory of performance frontiers. Journal of Operations Management 18 (3), 353-360.

Williamson, O. E. (1991). 'Strategizing, economizing, and economic organization'. Strategic Management Journal 12: 74-94.

Youngdahl, W. and Ramaswamy, K. (2008). Journal of Operations Management, Vol. 26(2), pp. 212-221.