The software in this package is described in the paper

C. Blum, M.J. Blesa, and M. López-Ibáñez. Beam Search for the Longest Common Subsequence Problem. Computers & Operations Research, 36(12), 3178-3186, 2009

IMPORTANT NOTE: When applying this algorithm with the parameter settings as outlined in the paper to the problem instances used in the paper, you will get, in most cases, a slightly different result, which is sometimes a little worse and sometimes a little better than the one reported in the paper. The reason for that is as follows. After the publication of the paper we noticed that our program gave slightly different results when applied to the same problem instance over and over again. This was despite of the fact that the algorithm is deterministic. Finally, we found an implicit stochastisity caused by the fact that our initial implementation used STL maps of pointers to objects. However, as the heap constantly changes, the same objects have different pointers in different runs of the algorithm. Therefore, iterators over these maps treated the objects in changing orders. This "problem" is fixed in the version of the code that you have at hand. The algorithm is now "really deterministic". However, as mentioned above, this implies that you won't get exactly the same results as reported in the paper. 


COMILING THE CODE

Simply type "make". This will produce the executable "lcs-pbs"


USAGE OF lcs-pbs:

./lcs-pbs <instance> -beamwidth <X> -mu <Y> -filter {yes,no}

* <instance> is the file that contains the problem instance

* The beamwidth <X> is an integer greater or equal to 2. A good value  to start with is 10.

* Paramter <Y> is a double greater or equal to 1.0. A good value to start with is 2.0 or 3.0. It determines the number of children that may be chosen at each construction step. Afterwards the upper bound is used to reduce this set to the best <X> children with respect to the upper bound.

* We recommend to use always "-filter yes". 


EXAMPLE

In file "test-instance.dat" you find a test instance with 10 strings of length 100 over an alphabet of size 4. You may execute the following command:

./lcs-pbs test-instance.dat -filter yes -mu 3.0 -beamwidth 10

This will produce the following output:

test-instance.dat       33      0.022
daadbcddaaabddadaaddcdbbaadcdbbdb

This means that the length of the computed solution is 33, the time to compute this solution was 0.022 seconds, and the solution itself is printed below. Note that the software always checks if the computed solution is valid. In other words, you can be sure that the produced solution is valid.


FORMAT OF THE PROBLEM INSTANCES

The first line is as follows:

<n_of_sequences>	<alphabet_size>

Then, for each sequence, follows a line of the following format:

<sequence_length>	<sequence>


CONTACT

In case of problems, comments, suggestions, etc, please contact Christian Blum (christian.c.blum@gmail.com)
