HCSP instances

Although the research community has faced the HCSP in the past, there do not exist standardized problem benchmarks or test suites [Theys et al., 2000].

Instances from Braun et al. (2000)

Ali et al. [2000] proposed two methods for designing ETC matrices to represent different HC scheduling scenarios: the range based and the coefficient of variation methods. Both methods account for the three key properties of ETC model (machine heterogeneity, task heterogeneity and consistence), and they only differ in the mathematical model used to define the heterogeneity measure for tasks and machines. The range based method defines two ranges for heterogeneity: (1,RMACH) for machine heterogeneity, and (1,RTASK) for task heterogeneity. Heterogeneity values for machines (τM) and tasks τT are then randomly sampled using a uniform distribution, and the expected time to compute task i in machine j is defined as ETC(i,j) = τT (i) × τM (j). Ali et al. suggested the values RMACH = 10 and RMACH = 100 (low and high machine heterogeneity, respectively), and RTASK = 10 and RTASK = 100000 (low and high task heterogeneity, respectively), selected to model scenarios for the Management System for Heterogeneous Networks Project. The coefficient of variation method uses the quotient between the standard deviation and the mean of execution times values as a measure of machine and task heterogeneity. The expected time to compute values are randomly generated using the uniform distribution or the gamma distribution, which provides an accurate method for approximating the real behavior of HC environments.

In their comparison of eleven static heuristics for the HC scheduling problem, Braun et al. [2001] presented a test suite of twelve problem instances generated using the range based method by Ali et al. Those twelve scenarios have a dimension of 512 tasks and 16 machines, and combine the three ETC model properties (high/low task heterogeneity, high/low machine heterogeneity, and each type of consistency). Braun et al. used a modified version of Ali et al. range based method, using different upper bounds for machine and task heterogeneity intervals, defined by the parameters values RMACH = 10 and RMACH = 1000 (low and high machine heterogeneity, respectively), and RTASK = 100 and RTASK = 3000 (low and high task heterogeneity, respectively). The authors mentioned that their parameter values election was quite arbitrary (they were selected to model several characteristics of the prediction methods used in Armstrong et al. [1998]), and even encouraged researchers to use their own values to generate instances that model specific situations of interest. However, the twelve instances by Braun et al. have become a de facto standard benchmark suite for the evaluation of algorithms applied to solve the HCSP. The instances from Braun et al. are labeled using a name with the pattern d_c_MHTH.0, where d stands for the distribution function used to generate the heterogeneity values (u, for the uniform distribution), c indicates the consistency type (c for consistent, i for inconsistent, and s for semiconsistent). The next characters HT and MT indicate the level of heterogeneity for tasks and machines respectively (lo and hi for low and high heterogeneity, respectively). The final number 0 refers to the number of test case (several benchmark suites were generated by Braun et al., but only the “class 0” instances gained popularity).

New HCSP instances

Besides the previous proposals, there have been little effort to define a standard set suite for HC scheduling. Even today, when HC and grid scheduling have been the focus of many works, researchers have been using the instances from Braun et al. or proprietary instances, often generated without following a methodological basis. In order to study the scalability of evolutionary algorithms to solve the HCSP as the instances dimension grows, a test suite of randomly generated HC scheduling instances was designed in this project, following the methodology proposed by Ali et al. The random generator is implemented in the C language, uses the Standard C library (stdlib.h) and the standard C mathematical library (math.h), and do not require any additional software. The generator implements both the range based and the coefficient of variation methods from Ali et al., regarding several scenario parameters such as dimension (number of tasks and machines), task and machine heterogeneity, consistency, ETC model (Ali et al. or Braun et al.) and duration type (real or integer). The input parameters for the HC scheduling instances generator are presented in Fig.1.

Text Box: Syntaxis: generator <NT> <NM> <TH> <TM> <cons> [model] [type] [seed]
Required parameters:
<NT>: number of tasks.
<NM>: number of machines.
<TH>: task heterogeneity level (0-Low, 1-High).
<MH>: machine heterogeneity level (0-Low, 1-High).
<cons>: consistency type.
		(0-Consistent, 1-Semiconsistent, 2-Inconsistent).
Optional parameters:
[model]: heterogeneity model 
		(0-Ali et al., 1-Braun et al.). 
		Braun et al. model assumed as the default option. 
		Heterogeneity ranks (tasks, machines): 
			Ali et al. (10-100000,10-100), 
			Braun et al. (100-3000,10-1000).
[type]: task execution time type.
		(0-real, 1-integer). Real type assumed as default.
[seed]: random number generator seed.

Fig. 1: Details of HCSP instances generator.

The output file format is similar to the one employed by Braun et al.: a column vector of N×M real numbers (or integers) that represents the ETC matrix ordered by task identifier, as presented in Fig. 2.

Text Box: ETC(t1,m1)
ETC(t1,m2)
. . .
ETC(t1,mM)
ETC(t2,m1)
ETC(t2,m2)
. . .
ETC(tT,m1)
. . .

Fig. 2: Format of HCSP instance file.

The generated test suite include HCSP instances with diverse complexity. The small-sized instances simply extend Braun et al. standard HC problems up to 2048 tasks and 64 machines. The medium-sized instances includes up to 8192 tasks and 256 machines, and are considered as representative of large multi-processor, medium-size clusters od computers, and small grid systems. The large-sized group of instances includes up to 32768 tasks and 1024 machines, a dimension considered as representative of large clusters and medium-size grid systems. The details of the instances are presented in Table 1. For each dimension, 24 problems instances were generated regarding all the heterogeneity and consistency combinations, twelve of them using Ali et al.’s parameter values, and the other twelve using Braun et al.’s parameter values. The instances are named according to Braun et al.’s convention, but the name follows the pattern M_d_c_MHTH, where the first letter (M) describes the heterogeneity model used (A for Ali et al., and B for Braun et al.). The number 0 in the last position of the name was removed, since only one instance was generated for each combination of dimension, heterogeneity model, task and machine, and consistency (a total number of 144 instances were generated). The duration type parameter was incorporated trying to reduce the file size for the largest instances (large size instances have integer ETC values). The generator program, parameters and instances can be downloaded in the download page.

Table 1: HCSP instances details.

HCSP

Heterogeneous Computing Scheduling Problem

Grid

HSCP instances