CS 422, Fall 2002
Instructor: Jeffrey Horn
(1) (40 points) TELEPATHIC TEAMS
You are the Chief Algorithmist of the planetary explorer Turing's Machine. Your ship has n humanoid scouts from many different alien races, each of which can communicate well with certain other races. Your commander gives you a list of all the pairs of individual scouts who can communicate telepathically with each other. She then commands you to write a program that will help her form teams of five scouts to send out on reconnaissance missions. Her requirement is that if she needs to send out several teams at one time, they should all be able to communicate telepathically with each other, by having at least one member of each five-"person" team able to communicate telepathically with at least one member of each other team. Having dabbled in programming herself, she instructs you to design your program thusly: (1) first create a graph G in which each vertex represents one of the n scouts, and each edge (i, j) means that scout i and scout j can communicate telepathically; (2) then from G create another graph G' in which each vertex represents one of the possible five-person teams; (3) then make an edge in G' between vertices i and j if and only if the corresponding teams can communicate through a telepathically linked pair, and teams i and j are disjoint (i.e., nobody is on both teams). Now she wants you to write code to find the largest group of teams that can communicate with each other (in other words, any pair of teams in the group can communicate telepathically). Now answer these questions:
- Given that G has n vertices and, say, e edges, how many vertices (call it n') and edges (call it e') does G' have? (For n', recall the formula for "n choose 5" from discrete math.) (For e', if you cannot find an exact formula, then at least try to bound it; e.g., what is the maximum number of edges in a graph?)
n' = ______    e' = ______
- Look at your answers above. Is the rate of growth for n' polynomial in n? Or is it exponential in n?
- Now suppose that instead of five-member teams, your commander wants your code to be able to handle teams of size n/10 (assume for simplicity that n is always a multiple of 10). Now how fast does n' grow (in n)? Is THIS polynomial or exponential in n?
- For this part of the question, assume that n' DOES indeed grow exponentially in n. Can you now say that the problem you have been given is NP-complete or not? Explain! (Consider: You know that MAX-CLIQUE is NP-complete, but this problem has special constraints on which edges are allowed.)
- Now suppose that instead of five-member teams, your commander wants your code to be able to handle teams of any size k. Prove that THIS problem IS NP-complete (reduce MAX-CLIQUE to this problem).
- Fill in the blank: If G has a clique of size h, then G' must also have (at least) one clique of size h, but we can only conclude this IF n > ______ .
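For experimenting with small cases, the commander's three construction steps can be sketched in Java roughly as follows. This is only an illustrative sketch, not the course's code: the class name `TeamGraphSketch`, the methods `allTeams`, `linked`, and `buildTeamGraph`, and the choice of a boolean adjacency matrix for G are all assumptions made here, and the team size is left as a parameter k so the same code covers five-member teams and other sizes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names and representation are assumptions of this sketch.
class TeamGraphSketch {
    // Step (2): enumerate every size-k subset of {0, ..., n-1}; each subset is one vertex of G'.
    static List<int[]> allTeams(int n, int k) {
        List<int[]> teams = new ArrayList<>();
        collect(n, k, 0, new int[k], 0, teams);
        return teams;
    }

    // Fills team[filled..k-1] with increasing scout numbers starting from 'next'.
    private static void collect(int n, int k, int next, int[] team, int filled, List<int[]> out) {
        if (filled == k) {
            out.add(team.clone());
            return;
        }
        for (int v = next; v < n; v++) {
            team[filled] = v;
            collect(n, k, v + 1, team, filled + 1, out);
        }
    }

    // Step (3): two teams are joined in G' iff they share no scout
    // and at least one cross pair of scouts is an edge of G.
    static boolean linked(boolean[][] g, int[] a, int[] b) {
        boolean crossEdge = false;
        for (int x : a) {
            for (int y : b) {
                if (x == y) return false;      // someone is on both teams
                if (g[x][y]) crossEdge = true; // a telepathically linked pair
            }
        }
        return crossEdge;
    }

    // Builds the adjacency matrix of G' from the adjacency matrix g of G (step (1) is assumed done).
    static boolean[][] buildTeamGraph(boolean[][] g, int k) {
        List<int[]> teams = allTeams(g.length, k);
        boolean[][] gPrime = new boolean[teams.size()][teams.size()];
        for (int i = 0; i < teams.size(); i++)
            for (int j = i + 1; j < teams.size(); j++)
                if (linked(g, teams.get(i), teams.get(j)))
                    gPrime[i][j] = gPrime[j][i] = true;
        return gPrime;
    }
}
```

Calling `buildTeamGraph(g, 5)` on a small graph and printing the matrix dimensions is one way to check a hand-derived count against brute force; the sketch is only practical for small n.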
You know that HAMILTON PATH is NP-complete. The enumerative algorithm is to try every ordering of the n vertices of the given graph G to see if any form a complete path. But suppose your brash, young subordinate thinks he has found an algorithm to solve HP much faster than the n! time that the enumerative approach takes. His algorithm is to partition the graph G in half (assume for simplicity's sake that n is even), then find a Hamilton Path for each of the smaller problems, then connect them. He claims this will run in time (n/2)! + (n/2)!, which is less than n!. Answer the following questions:
- If he is right, does that mean that P=NP? Why or why not?
- Is he right? Would his algorithm work? If not, how could you "patch it up" to make it work? What is the correct running time of the correct algorithm (his, if correct, or yours, if you patched his up)?
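For reference, the enumerative algorithm described above (try every ordering of the vertices) might look like the following Java sketch. It is a minimal illustration only: the class name `HamiltonPathSketch`, its method names, and the boolean adjacency-matrix representation are assumptions made here, and it implements the plain brute-force approach, not the subordinate's proposed shortcut.

```java
// Hypothetical brute-force checker; names and representation are assumptions of this sketch.
class HamiltonPathSketch {
    // Returns true if some ordering of all the vertices forms a path in g.
    static boolean hasHamiltonPath(boolean[][] g) {
        int n = g.length;
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        return permute(g, order, 0);
    }

    // Positions 0..pos-1 of the ordering are fixed; try each remaining vertex at position pos.
    private static boolean permute(boolean[][] g, int[] order, int pos) {
        if (pos == order.length) return isPath(g, order);
        for (int i = pos; i < order.length; i++) {
            swap(order, pos, i);
            if (permute(g, order, pos + 1)) return true;
            swap(order, pos, i); // undo before trying the next candidate
        }
        return false;
    }

    // Checks that every consecutive pair in the ordering is joined by an edge.
    static boolean isPath(boolean[][] g, int[] order) {
        for (int i = 0; i + 1 < order.length; i++)
            if (!g[order[i]][order[i + 1]]) return false;
        return true;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Because it may examine every one of the n! orderings, this version is usable only on very small graphs, which makes it handy as a reference when testing any proposed faster method.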
INSERTION SORT is O(n^2). Actually it is Big Theta(n^2) in the worst case, since n^2 is a tight bound. Recall the invariant of the algorithm: given an array A of n values, initially unsorted, at the start of iteration k of the main loop, A[0]..A[k-1] is sorted, and A[k]..A[n-1] has not yet been processed. Then each time through the loop we increment k by one, by inserting A[k] into its proper place in A[0]..A[k], which involves a linear search of A[0]..A[k-1], followed by shifting the values A[i]..A[k-1] one position to the right, assuming A[k] needs to be inserted at A[i].
Now consider replacing the linear search in the above with a binary search, which as you know is O(lg n) (where "lg" is log base 2, and n is the length of the SORTED array to be searched).
- How does this speed up the insertion sort algorithm? Analyze its run time now, in terms of the number of comparisons (of two values in the array):
- How about the number of copies (that is, copying of values into the array)?
- Now assume we are sorting a linked list rather than an array. We can move values around in a linked list just by swapping three or four pointers each time. So now we don't incur a bunch of copies each time we need to insert A[k] into A[0]..A[k-1]. Would binary search make more sense NOW than it did for insertion sort with an array? Why or why not? What extra cost does one pay for using binary search here (over using linear search)?
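To make the array variant concrete, here is one possible Java sketch of insertion sort with the linear search replaced by a binary search, instrumented to count comparisons and copies separately. The class name `BinaryInsertionSortSketch` and the two static counters are hypothetical names introduced for this illustration; the sketch also assumes that writing the key into its slot counts as one copy.

```java
// Hypothetical instrumented sort; names and counting conventions are assumptions of this sketch.
class BinaryInsertionSortSketch {
    static long comparisons; // comparisons between two array values
    static long copies;      // writes of a value into the array

    static void sort(int[] a) {
        comparisons = 0;
        copies = 0;
        for (int k = 1; k < a.length; k++) {
            int key = a[k];
            // Binary search the sorted prefix a[0..k-1] for the first position holding a value > key.
            int lo = 0, hi = k;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                comparisons++;
                if (a[mid] <= key) lo = mid + 1;
                else hi = mid;
            }
            // Shift a[lo..k-1] one slot to the right, then place key in the gap.
            for (int j = k; j > lo; j--) {
                a[j] = a[j - 1];
                copies++;
            }
            a[lo] = key; // counted as a copy even when key does not move
            copies++;
        }
    }
}
```

Running `sort` on arrays of increasing length and printing `comparisons` and `copies` side by side gives an empirical feel for how each quantity grows, which can be checked against a hand analysis.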
Answer questions 1.4-1 and 1.4-2 from page 17 of our text.
1.4-1
1.4-2
Consider the code in Part 2 of this final. In particular, consider the boolean method "cliqueQ(int [] list_of_vertices)" which takes as an argument a list of vertices and returns true if and only if the set of vertices constitutes a clique in the graph G represented in the adjacency matrix edges[][]. Now consider using an adjacency list instead of the matrix. Recall that an adjacency list is simply an array of linked lists, in which list i corresponds to vertex i and contains a list of the vertices adjacent to i. Each list is not necessarily ordered or sorted, and consists of cells containing two items: a vertex and a pointer to the next cell (this pointer has value null if the cell is the end of the list). Now let's compare the two types of graph representations.
- First, storage costs. Assume that a vertex is stored in a complex object of type Vertex, but of course we only store the pointers (references, in Java-speak) to such objects in an adjacency list. So we can calculate the storage cost in terms of the number of pointers stored. Assume that a pointer takes as much storage as an integer (which is what we use to represent edges in the adj. matrix). Now for a graph G of n vertices, the adjacency matrix will be of size n by n, while the adjacency list will use up an array of pointers to lists, plus two pointers for each cell in each list, with one cell for each vertex adjacent to that list's vertex. Assume a graph G with n vertices and edge density d (where 0 <= d <= 1).
- What is the EXACT storage cost, in terms of the number of pointers/integers stored, for the adjacency list representation, and for the adjacency matrix representation? (You will need to calculate the number of edges in G. You can do this given d and n and knowing the maximum number of possible edges in a graph with n vertices. And don't forget to count the pointers in the array of linked lists for the adj. list representation!)
- For what value of d, if any, will the costs be equal? Let this be called d*. For d<d*, which graph representation uses less storage? For d>d*?
- Now for run-time cost. We'll do average case, so you'll need to know the average number of edges per vertex (to find out the average length of the linked lists in the adj. list rep.). You can calculate this knowing d and n. Next, assume we are measuring time in units of "memory accesses". This means following a pointer, as in going to the next cell of a linked list, or looking up the value of an integer in a multi-dimensional array of int (as in the adj. matrix). What is the average case cost of a call to cliqueQ with an array of k vertices, where 0<k<=n for graph G of n vertices, for the adj. matrix rep. versus for the adj. list rep.? I want actual cost (i.e., T( f(n,d,k)) ). Both answers can be in terms of n, d, and k, although not all three might be needed.
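The Part 2 code is not reproduced in this section, so the following Java sketch is only a guess at what a clique test could look like under each representation. It assumes that a nonzero entry in the integer matrix means an edge, it stores a vertex index in each cell in place of a pointer to a Vertex object, and the names `CliqueCheckSketch`, `Cell`, `cliqueQMatrix`, `cliqueQList`, and `toAdjacencyList` are all hypothetical.

```java
// Hypothetical comparison of the two representations; not the Part 2 code.
class CliqueCheckSketch {
    // One cell of an adjacency list: a vertex and a pointer to the next cell (null at the end).
    static final class Cell {
        final int vertex;
        final Cell next;
        Cell(int vertex, Cell next) {
            this.vertex = vertex;
            this.next = next;
        }
    }

    // Matrix version: one array lookup per pair of listed vertices.
    static boolean cliqueQMatrix(int[][] edges, int[] vertices) {
        for (int i = 0; i < vertices.length; i++)
            for (int j = i + 1; j < vertices.length; j++)
                if (edges[vertices[i]][vertices[j]] == 0) return false;
        return true;
    }

    // List version: for each pair, walk the unsorted list of the first vertex looking for the second.
    static boolean cliqueQList(Cell[] adj, int[] vertices) {
        for (int i = 0; i < vertices.length; i++)
            for (int j = i + 1; j < vertices.length; j++)
                if (!contains(adj[vertices[i]], vertices[j])) return false;
        return true;
    }

    // Linear scan of one adjacency list; each step follows one 'next' pointer.
    static boolean contains(Cell head, int target) {
        for (Cell c = head; c != null; c = c.next)
            if (c.vertex == target) return true;
        return false;
    }

    // Converts a 0/1 adjacency matrix into an array of linked lists, one list per vertex.
    static Cell[] toAdjacencyList(int[][] edges) {
        Cell[] adj = new Cell[edges.length];
        for (int i = 0; i < edges.length; i++)
            for (int j = edges.length - 1; j >= 0; j--)
                if (edges[i][j] != 0) adj[i] = new Cell(j, adj[i]);
        return adj;
    }
}
```

Adding a counter to the matrix lookup and to the loop in `contains` is one way to measure "memory accesses" empirically on random graphs of a chosen density and compare the result with a derived formula.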