4 Model
4.1 A Workflow Model
A workflow W is represented by a directed acyclic graph G = (V, E), where V = D ∪ C ∪ J and E is the set of edges that encode execution precedence constraints. A directed edge from jx to jy means that jy cannot start executing until jx has completed. A directed edge from dx to jx means that job jx cannot start executing until the file it requires from data-host dx has been transferred to, or made available on, the compute-host executing the job. The components are defined as follows:
1. A set of jobs J = {j1, j2, . . . , jn}
2. A set of files F = {f1, f2, . . . , fn}
3. A set of compute-hosts C = {c1, c2, . . . , cn}
4. A set of data-hosts D = {d1, d2, . . . , dn}
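The precedence semantics above can be illustrated with a minimal Python sketch. It is not the scheduler used in this work: the node names and edges are hypothetical, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) is used to produce an order in which every node appears after all of its predecessors. It also rejects cyclic input, which a workflow DAG must never contain.

```python
from graphlib import TopologicalSorter

# Hypothetical example workflow; node names are illustrative only.
# An edge (u, v) means "v cannot start until u is done":
#   job -> job:        v waits for job u to complete
#   data-host -> job:  v waits until the file on data-host u is staged
edges = [
    ("d1", "j1"),  # j1 needs a file hosted on d1
    ("d2", "j2"),  # j2 needs a file hosted on d2
    ("j1", "j3"),  # j3 cannot start until j1 completes
    ("j2", "j3"),
    ("j3", "j4"),
]


def execution_order(edges):
    """Return a node order that respects every precedence edge (u before v)."""
    preds = {}
    for u, v in edges:
        preds.setdefault(v, set()).add(u)  # record u as a predecessor of v
        preds.setdefault(u, set())         # make sure source nodes appear too
    # static_order() raises graphlib.CycleError if the graph is not acyclic
    return list(TopologicalSorter(preds).static_order())
```

Several valid orders usually exist (here j1 and j2 may run in either order), so callers should rely only on the guarantee that each node follows all of its predecessors.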
A job jx requires a set of files Fx ⊆ F to be staged in before it can execute. Within this set, we write f^t_k for a temporary file, i.e., one produced by the execution of a job, and f^f_k for a fixed file, i.e., one already hosted by data-hosts. A fixed file f^f_k is hosted by multiple data-hosts.
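The distinction between temporary and fixed files can be sketched as follows. This is an illustration under stated assumptions, not the model's formal definition: the `File` record and its `producer` and `hosts` fields are hypothetical, a temporary file is assumed to impose a job-to-job precedence on the job that produces it, and a fixed file is assumed to be stageable from any one of the data-hosts that hold it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class File:
    name: str
    kind: str                       # "temporary" (f^t_k) or "fixed" (f^f_k)
    producer: Optional[str] = None  # hypothetical: job that produces a temporary file
    hosts: frozenset = frozenset()  # hypothetical: data-hosts holding a fixed file


def staging_requirements(files):
    """Split a job's file set Fx into (jobs to wait for, replicas per fixed file).

    Assumption for illustration: a temporary file adds a job -> job precedence
    on its producer; a fixed file can be staged from any one of its data-hosts.
    """
    wait_for_jobs = set()
    replicas = {}
    for f in files:
        if f.kind == "temporary":
            if f.producer is None:
                raise ValueError(f"temporary file {f.name} has no producing job")
            wait_for_jobs.add(f.producer)
        elif f.kind == "fixed":
            if not f.hosts:
                raise ValueError(f"fixed file {f.name} is not hosted by any data-host")
            replicas[f.name] = set(f.hosts)
        else:
            raise ValueError(f"unknown file kind: {f.kind!r}")
    return wait_for_jobs, replicas
```

For example, `staging_requirements([File("f1", "fixed", hosts=frozenset({"d1", "d2"})), File("f2", "temporary", producer="j1")])` returns a pair equal to `({"j1"}, {"f1": {"d1", "d2"}})`: the job must wait for j1, and f1 may be fetched from either d1 or d2.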
4.2 Resource Model
A compute resource is a high-performance computing platform, such as a cluster, with a 'head' node that manages a batch job submission system. Each compute-host has its own storage-constrained data-host. There also exist data-hosts that serve only as storage and provide no computation. Communication between a compute-host and its own data-host is local, so its cost is minimal compared with the cost of communication between different hosts [22].
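The cost assumption of the resource model can be made concrete with a small sketch. Everything here is hypothetical: transfer cost is estimated as file size divided by link bandwidth (MB and MB/s), access from a compute-host's own data-host is treated as zero cost as an approximation of "minimal", and choosing the cheapest replica of a fixed file is one plausible policy, not necessarily the one used in this work.

```python
def transfer_cost(size_mb, src, dst, own_data_host, bandwidth_mb_s):
    """Estimated seconds to make a file of size_mb available on compute-host dst.

    Hypothetical model: reading from dst's own data-host is local and treated
    as free; any other source pays size / bandwidth over the (src, dst) link.
    """
    if own_data_host.get(dst) == src:
        return 0.0
    return size_mb / bandwidth_mb_s[(src, dst)]


def best_replica(size_mb, replicas, dst, own_data_host, bandwidth_mb_s):
    """Pick the replica data-host with the lowest transfer cost to dst.

    Replicas are scanned in sorted order so ties resolve deterministically.
    """
    return min(
        ((src, transfer_cost(size_mb, src, dst, own_data_host, bandwidth_mb_s))
         for src in sorted(replicas)),
        key=lambda pair: pair[1],
    )


own = {"c1": "d1", "c2": "d2"}                  # compute-host -> its own data-host
bw = {("d2", "c1"): 10.0, ("d3", "c1"): 50.0}  # hypothetical link bandwidths, MB/s
```

With these numbers, `best_replica(100, {"d2", "d3"}, "c1", own, bw)` returns `("d3", 2.0)`, while adding c1's own data-host d1 to the replica set makes the local copy win with cost `0.0`.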