POPSTAR supports instances in eight different formats. Instances of the p-median problem can be handled as points on the plane, points on the surface of a sphere (e.g. the Earth), graphs, arbitrary distance matrices, and implicit instances (in which most elements in the distance matrix are infinity). There is also one format to handle arbitrary distance matrices for the uncapacitated facility location problem. Finally, there algorithm also accepts instances of the maximum set covering problem (which are internally reduced to the p-median problem).
The table below summarizes the formats.
extension | problem | |
p-median | arbitrary matrix | |
p-median | points on the sphere | |
p-median | points on the plane | |
p-median | graphs | |
p-median | points on the plane | |
maximum cover | generic | |
p-median | implicit p-median | |
facility location | arbitrary matrix |
For most p-median formats the value of p must be specified in the command line (with the -p
option). Even when the input file itself contains the value of p, the user may override it using the command line. For uncapacitated facility location problem, the -p
option has no effect.
Graph files must have extension 'DIMACS FORMAT (GRAPHS)
dimacs
' and be in extended want DIMACS format. The first line in the file must have the following form:
where <n> is the number of vertices, <m> the number of edges, and <p> the number of facilities to open [this number can be changed in the command line using the -centers option]. This line must be followed by exactly <m> lines of the form:p <n> <m> <p>
representing an edge between verticese <v1> <v2> <cost>
<v1>
and <v2>
whose cost is <cost>
. Vertices must have labels between 1 and n
(the total number of vertices).
An example (a graph with 4 vertices, 5 edges, and 2 centers):
p 4 5 2
e 1 2 75
e 2 3 120
e 3 4 60
e 4 1 3
e 4 2 17
This format (extension ' In this expression, Before the list of points, there must be a line containing the total number of points ( An example with 4 points:
This format (extension ' The first line of the file must contain the total number of users ( There must be Similarly, there must be Here, An example with 4 users and 3 facilities:
This format (extension ' The first line in the file contains the number of users <n> and the number of facilities <m>, with the following format: There must be In this expression, Latitues are reprsenting by the next four fields: Longitudes are represented in a similar way by the last 4 fields ( There must be An example with 4 users and 3 facilities:
This format (extension ' The first line of the file specifies the number of potential users The file must specify the distance from every user (from 1 to Where An example with 4 users and 3 facilities:
This format (extension ' Internally, the program performs a reduction of the maximum set cover problem to the p-median problem. It is a straightforward reduction, and requires one extra set (to take care of "uncovered" elements) and one extra user (to make sure the extra set will be selected). We stress that this reduction is done by the program itself: the input should be just a usual set cover instance.
The first line of the file specifies the total number of elements in the universe Each element in the universe must be characterized by a line in the following format:
There is also one line describing each potential set:
After all An example with 4 sets and 6 elements:
This format (extension '
This format (extension ' The first line of the file specifies the number of potential users There must be The file must specify the distance from every user (from 1 to Where An example with 4 users and 3 facilities:TSP FORMAT (POINTS ON THE PLANE)
tsp
') is similar to the format used to represent TSP instances in the TSPLIB. Every point is represents both a user and a potential facility. Each
point is represented by a line with the following format:
<i> <x> <y>
<i>
is the label of the point (an integer between 1 and the total number of points) and <x> and <y> are its coordinates. Any double precision value is allowed for coordinates.
<n>
) with the following format:
DIMENSION : <n>
DIMENSION : 4
1 1.5 -2.5
2 0 0.999999999
3 1500 1283
4 1 34123
PMI FORMAT (POINTS ON THE PLANE)
pmi
') applies to points on the plane, but in which the set of users is not necessarily the same as the set of potential facilities.
n
) and the number of facilities to open (m
):
p <n> <m>
<m>
lines describing facilities, each with the following format:
f <i> <x> <y>
<i>
is the facility label (an integer between 1 and <m>
), <x>
is its x-coordinate (a double) and <y>
is its y-coordinate (also a double).<n>
lines describing customers, formatted as follows:
c <i> <x> <y>
<i>
, <x>
, and <y>
mean the same as in lines describing facilities (with <i>
between 1
and <n>
).
p 4 3
f 1 0.5 0.5
f 2 0.7 1234
f 3 1834 132
c 1 -12 14
c 2 0.5 0.5
c 3 0 0
c 4 1e5 17
GEO FORMAT (POINTS ON THE SURFACE OF A SPHERE)
geo
') applies to points on the Earth (or any sphere, with adjustments to the distance). Facilities and users are given by their latitudes and longitudes, in degrees, minutes, and seconds.
p <n> <m>
<n>
lines specifying customers, all with the following format:
c <i> <lat_d> <lat_m> <lat_s> <N|S> <lon_d> <lon_m> <lon_s> <E|W>
<i>
is the vertex number (between 1 and <n>
).
<lat_d>
, <lat_m>
, and <lat_s>
are the number of degrees, minutes, and seconds (all integers), and the must be followed by either N
(north) or S
(south).
E
and W
representing east and west).
<m>
lines representing the facilities. They have the exact same format as the lines that describe customers,
with f
as the first letter (instead of c
).
p 4 3
f 1 45 59 0 N 14 27 19 E
f 2 0 0 0 S 150 10 12 W
f 3 89 59 59 S 12 27 19 E
c 1 1 2 3 S 4 5 6 E
c 2 10 20 30 S 40 50 59 W
c 3 31 41 59 N 2 0 0 E
c 4 45 59 0 N 14 27 19 E
PMM FORMAT (ARBITRARY DISTANCE MATRICES)
pmm
') represents arbitrary distance matrices.<n>
and the number of candidate facilities <m>
:
p <n> <m>
<n>
) to every facility (from 1 to <m>
). Therefore, there
must be <m><n>
lines with the following format:
a <u> <f> <d>
<u>
represents the user label, <f>the facility label, and <d>
the distance between them
(which can be any nonnegative double value).
p 4 3
a 1 1 4.00
a 1 2 1.23
a 1 3 2.34
a 2 1 17.01
a 2 2 3
a 2 3 3.141592
a 3 1 1e10
a 3 2 0
a 3 3 17
a 4 1 21341234
a 4 2 17.1
a 4 3 0.00000001
MSC FORMAT (MAXIMUM SET COVER)
msc
') describes instances to the maximum set cover problem. It defines a set of elements and a set of potential sets that covers them. Each element has a weight, and (in its standard version) the algorithm tries to choose p sets so as to maximize the total weight of the elements covered (minus the cost of opening the corresponding sets). It is also possible to associate a "service cost" to each pair (element,set), but only implicitly. The user may associate a factor with each set and each element. The service cost will be the product of those factors; all factors have a default value of zero.
<ecount>
and the number of candidate sets <scount>
:
p <ecount> <scount>
In this expression
e <i> <w> <c> [<f>]
<i>
is the element number, <w>
is its weight, and <c>
is the number of sets in the universe that cover it. The last paramter (<f>
) is the factor associated with the element (if not given, the default is zero). All parameters must be nonnegative.
In this expression
s <i> <c> <o> [<f>]
<i>
is the set number, <c>
is the number elements it covers, and <o>
is its opening cost. The last parameter (<f>
) is the factor associated with the element (the default is zero). All parameters must be nonnegative.
e
and s
lines, coverage information is presented. Each line has the following format:
This line means that element
c <e> <s>
<e>
covers set <s>
.
p 6 4
e 1 1.23 3 0.1
e 2 2 1
e 3 3.1 1
e 4 100 1 100
e 5 .23 2 18
e 6 14 1 0.001
s 1 2 25
s 2 4 26 1.1
s 3 1 27.1 19
s 4 2 28
c 1 3
c 1 2
c 2 1
c 3 2
c 4 2
c 1 4
c 5 1
c 5 4
c 6 2
IMP FORMAT (IMPLICIT P-MEDIAN)
imp
') is used for instances in which most of the distances are infinite. Its syntax is exactly the same as in the MSC format. The difference is only on how the program deals with the data: in the case of the MSC problem, it actually creates an extra set to take care of "uncovered" elements. Since this does not happen with IMP, there will be cases in which the solution will have an infinite value.
UFL FORMAT (FACILITY LOCATION)
ufl
') reprsents facility location problems defined by arbitrary distance matrix. The format is derived from pmm, with additional lines to represent the cost of facilities.
<n>
and the number of candidate facilities <m>
:
p <n> <m>
<m>
lines describing the facilities themselves; each must have the following format:
Here,
f <i> <s>
<i>
is the facility number and <s>
is its setup cost.
<n>
)to every facility (from 1 to <m>
). Therefore, there
must be <m><n>
lines with the following format:
a <u> <f> <d>
<u>
represents the user label, <f>the facility label, and <d>
the distance between them
(which can be any nonnegative double value).
p 4 3
f 1 35
f 2 0
f 3 2.71
a 1 1 4.00
a 1 2 1.23
a 1 3 2.34
a 2 1 17.01
a 2 2 3
a 2 3 3.141592
a 3 1 1e10
a 3 2 0
a 3 3 17
a 4 1 21341234
a 4 2 17.1
a 4 3 0.00000001