INPUT FILE FORMATS


POPSTAR supports instances in eight different formats. Instances of the p-median problem can be handled as points on the plane, points on the surface of a sphere (e.g. the Earth), graphs, arbitrary distance matrices, and implicit instances (in which most elements in the distance matrix are infinity). There is also one format to handle arbitrary distance matrices for the uncapacitated facility location problem. Finally, there algorithm also accepts instances of the maximum set covering problem (which are internally reduced to the p-median problem).

The table below summarizes the formats.

extension problem
type
pmm
p-median arbitrary matrix
geo
p-median points on the sphere
pmi
p-median points on the plane
dimacs
p-median graphs
tsp
p-median points on the plane
msc
maximum cover generic
imp
p-median implicit p-median
ufl
facility location arbitrary matrix

For most p-median formats the value of p must be specified in the command line (with the -p option). Even when the input file itself contains the value of p, the user may override it using the command line. For uncapacitated facility location problem, the -p option has no effect.


DIMACS FORMAT (GRAPHS)

Graph files must have extension 'dimacs' and be in extended want DIMACS format. The first line in the file must have the following form:

p <n> <m> <p>
where <n> is the number of vertices, <m> the number of edges, and <p> the number of facilities to open [this number can be changed in the command line using the -centers option]. This line must be  followed by exactly <m> lines of the form:

e <v1> <v2> <cost>
representing an edge between vertices <v1> and <v2> whose cost is <cost>. Vertices must have labels between 1 and n (the total number of vertices).

An example (a graph with 4 vertices, 5 edges, and 2 centers):

p 4 5 2
e 1 2 75
e 2 3 120
e 3 4 60
e 4 1 3
e 4 2 17

TSP FORMAT (POINTS ON THE PLANE)

This format (extension 'tsp') is similar to the format used to represent TSP instances in the TSPLIB. Every point is represents both a user and a potential facility. Each point is represented by a line with the following format:

<i> <x> <y>

In this expression, <i> is the label of the point (an integer between 1 and the total number of points) and <x> and <y> are its coordinates. Any double precision value is allowed for coordinates.

Before the list of points, there must be a line containing the total number of points (<n>) with the following format:

DIMENSION : <n>

An example with 4 points:

DIMENSION : 4
1 1.5 -2.5
2 0 0.999999999
3 1500 1283
4 1 34123


PMI FORMAT (POINTS ON THE PLANE)

This format (extension 'pmi') applies to points on the plane, but in which the set of users is not necessarily the same as the set of potential facilities.

The first line of the file must contain the total number of users (n) and the number of facilities to open (m):

p <n> <m>

There must be <m> lines describing facilities, each with the following format:

f <i> <x> <y>

<i> is the facility label (an integer between 1 and <m>), <x> is its x-coordinate (a double) and <y> is its y-coordinate (also a double).

Similarly, there must be <n> lines describing customers, formatted as follows:

c <i> <x> <y>

Here, <i>, <x>, and <y> mean the same as in lines describing facilities (with <i> between 1 and <n>).

An example with 4 users and 3 facilities:

p 4 3
f 1 0.5 0.5
f 2 0.7 1234
f 3 1834 132
c 1 -12 14
c 2 0.5 0.5
c 3 0 0
c 4 1e5 17

GEO FORMAT (POINTS ON THE SURFACE OF A SPHERE)

This format (extension 'geo') applies to points on the Earth (or any sphere, with adjustments to the distance). Facilities and users are given by their latitudes and longitudes, in degrees, minutes, and seconds.

The first line in the file contains the number of users <n> and the number of facilities <m>, with the following format:

p <n> <m>

There must be <n> lines specifying customers, all with the following format:

c <i> <lat_d> <lat_m> <lat_s> <N|S> <lon_d> <lon_m> <lon_s> <E|W>

In this expression, <i> is the vertex number (between 1 and <n>).

Latitues are reprsenting by the next four fields: <lat_d>, <lat_m>, and <lat_s> are the number of degrees, minutes, and seconds (all integers), and the must be followed by either N (north) or S (south).

Longitudes are represented in a similar way by the last 4 fields (E and W representing east and west).

There must be <m> lines representing the facilities. They have the exact same format as the lines that describe customers, with f as the first letter (instead of c).

An example with 4 users and 3 facilities:

p 4 3
f 1 45 59 0 N 14 27 19 E
f 2 0 0 0 S 150 10 12 W
f 3 89 59 59 S 12 27 19 E
c 1 1 2 3 S 4 5 6 E
c 2 10 20 30 S 40 50 59 W
c 3 31 41 59 N 2 0 0 E
c 4 45 59 0 N 14 27 19 E

PMM FORMAT (ARBITRARY DISTANCE MATRICES)

This format (extension 'pmm') represents arbitrary distance matrices.

The first line of the file specifies the number of potential users <n> and the number of candidate facilities <m>:

p <n> <m>

The file must specify the distance from every user (from 1 to <n>) to every facility (from 1 to <m>). Therefore, there must be <m><n> lines with the following format:

a <u> <f> <d>

Where <u> represents the user label, <f>the facility label, and <d> the distance between them (which can be any nonnegative double value).

An example with 4 users and 3 facilities:

p 4 3
a 1 1 4.00
a 1 2 1.23
a 1 3 2.34
a 2 1 17.01
a 2 2 3
a 2 3 3.141592
a 3 1 1e10
a 3 2 0
a 3 3 17
a 4 1 21341234
a 4 2 17.1
a 4 3 0.00000001

MSC FORMAT (MAXIMUM SET COVER)

This format (extension 'msc') describes instances to the maximum set cover problem. It defines a set of elements and a set of potential sets that covers them. Each element has a weight, and (in its standard version) the algorithm tries to choose p sets so as to maximize the total weight of the elements covered (minus the cost of opening the corresponding sets). It is also possible to associate a "service cost" to each pair (element,set), but only implicitly. The user may associate a factor with each set and each element. The service cost will be the product of those factors; all factors have a default value of zero.

Internally, the program performs a reduction of the maximum set cover problem to the p-median problem. It is a straightforward reduction, and requires one extra set (to take care of "uncovered" elements) and one extra user (to make sure the extra set will be selected). We stress that this reduction is done by the program itself: the input should be just a usual set cover instance.

The first line of the file specifies the total number of elements in the universe <ecount> and the number of candidate sets <scount>:

p <ecount> <scount>

Each element in the universe must be characterized by a line in the following format:

e <i> <w> <c> [<f>]
In this expression <i> is the element number, <w> is its weight, and <c> is the number of sets in the universe that cover it. The last paramter (<f>) is the factor associated with the element (if not given, the default is zero). All parameters must be nonnegative.

There is also one line describing each potential set:

s <i> <c> <o> [<f>]
In this expression <i> is the set number, <c> is the number elements it covers, and <o> is its opening cost. The last parameter (<f>) is the factor associated with the element (the default is zero). All parameters must be nonnegative.

After all e and s lines, coverage information is presented. Each line has the following format:

c <e> <s>
This line means that element <e> covers set <s>.

An example with 4 sets and 6 elements:

p 6 4
e 1 1.23 3 0.1
e 2 2 1
e 3 3.1 1
e 4 100 1 100
e 5 .23 2 18
e 6 14 1 0.001
s 1 2 25
s 2 4 26 1.1
s 3 1 27.1 19
s 4 2 28
c 1 3
c 1 2
c 2 1
c 3 2
c 4 2
c 1 4
c 5 1
c 5 4
c 6 2

IMP FORMAT (IMPLICIT P-MEDIAN)

This format (extension 'imp') is used for instances in which most of the distances are infinite. Its syntax is exactly the same as in the MSC format. The difference is only on how the program deals with the data: in the case of the MSC problem, it actually creates an extra set to take care of "uncovered" elements. Since this does not happen with IMP, there will be cases in which the solution will have an infinite value.


UFL FORMAT (FACILITY LOCATION)

This format (extension 'ufl') reprsents facility location problems defined by arbitrary distance matrix. The format is derived from pmm, with additional lines to represent the cost of facilities.

The first line of the file specifies the number of potential users <n> and the number of candidate facilities <m>:

p <n> <m>

There must be <m> lines describing the facilities themselves; each must have the following format:

f <i> <s>
Here, <i> is the facility number and <s> is its setup cost.

The file must specify the distance from every user (from 1 to <n>)to every facility (from 1 to <m>). Therefore, there must be <m><n> lines with the following format:

a <u> <f> <d>

Where <u> represents the user label, <f>the facility label, and <d> the distance between them (which can be any nonnegative double value).

An example with 4 users and 3 facilities:

p 4 3
f 1 35
f 2 0
f 3 2.71
a 1 1 4.00
a 1 2 1.23
a 1 3 2.34
a 2 1 17.01
a 2 2 3
a 2 3 3.141592
a 3 1 1e10
a 3 2 0
a 3 3 17
a 4 1 21341234
a 4 2 17.1
a 4 3 0.00000001

[ main | installation | command-line parameters | output | input formats | downloads ]