The Efficiency Of Algorithms

Some mathematical problems can be solved only by methods too slow for even the fastest computers. More efficient methods have not been found, but neither has it been proved that there are no better methods.

by Harry R. Lewis and Christos H. Papadimitriou

Suppose you were asked to plan an itinerary for a traveling salesman who must visit a number of cities. You are given a map on which the distances between the cities are marked and you are asked to find the shortest route that passes through all the cities and returns to the starting point. An approach to this problem that is certain to give the correct answer is to trace all the possible routes, measure their length and pick the shortest one. If the tour included more than a few cities, however, hundreds or thousands of routes would have to be checked. If there were 100 cities, then even the fastest computers would require weeks of calculation to find the shortest path.

In the search for a quicker solution you might try some less rigorous methods. One idea that seems reasonable is always to visit nearby cities before going on to those farther away. You would soon discover, however, that this procedure does not invariably yield the correct answer. Other shortcuts also fail. In fact, the best methods known for solving the problem are not much better than the obvious but laborious procedure of checking all the possible itineraries. Mathematicians now suspect that this problem and many others like it may forever remain beyond our ability to solve in any efficient way. That speculation itself, however, is unconfirmed; although no faster methods of solution have been found, neither has it been proved that faster methods do not exist.
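The exhaustive search described above is easy to state as a program. The Python sketch below is illustrative only (the four-city distance table is a made-up example, not from the article): it measures every possible itinerary, so the work grows factorially with the number of cities.

```python
from itertools import permutations

def shortest_tour(cities, dist):
    """Exhaustive search: measure every possible round trip and keep the shortest."""
    start, *rest = cities
    best_tour, best_len = None, float("inf")
    for order in permutations(rest):
        tour = (start, *order, start)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len

# A made-up map of four cities with symmetric distances.
dist = {
    "A": {"B": 2, "C": 9, "D": 10},
    "B": {"A": 2, "C": 6, "D": 4},
    "C": {"A": 9, "B": 6, "D": 3},
    "D": {"A": 10, "B": 4, "C": 3},
}
tour, length = shortest_tour(list(dist), dist)  # examines (4-1)! = 6 itineraries
```

With four cities the loop checks only six itineraries; with 100 it would have to check 99! of them, which is why the method breaks down.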

In the problem of the traveling salesman's tour it is not the solution for a particular set of cities that is of the greatest importance but a general method for finding the solution for any cities. Such a method is called an algorithm: it is a precisely stated procedure or set of instructions that can be applied in the same way to all instances of a problem. If the problem is to be solved with the aid of a computer, an algorithm is indispensable, because only those procedures that can be stated in the explicit and unambiguous form of an algorithm can be presented to a computer. Instructions that are vague or that rely on intuition are unacceptable.

An example of an algorithm is the procedure taught in the schools for the subtraction of whole numbers. If each of the steps in this procedure is applied correctly one at a time, the algorithm will always yield the correct result. What is more, once the algorithm has been learned or stored in the memory of a computer or embodied in the circuitry of an electronic calculator, it can be applied to an infinite set of subtraction problems. With this one algorithm the difference between any two whole numbers can be determined.

In principle any problem for which an algorithm can be devised can be solved mechanically. It may therefore seem surprising that there are problems for which algorithms exist but for which we so far have no practical general solution. The algorithms for solving these problems always give a correct answer, but they often require an inordinate amount of time. The problem of the traveling salesman's tour is among these intractable tasks.

The efficiency of computer algorithms is a topic of obvious practical importance. It is also of interest in more formal areas of mathematics. There are some problems in mathematics and logic for which no algorithm can ever be written, and there are many others for which efficient, fast algorithms are already known. Between these two groups is a third class of problems that can always be solved in principle, but for which there are only inefficient (and therefore largely unusable) algorithms. For some of these difficult problems mathematicians have been able to demonstrate that efficient algorithms can never be designed. For many of the most important problems, however, there is only the suspicion that good algorithms are impossible.

A given problem can have more than one algorithm for its solution. For example, children in Europe learn a procedure for subtraction slightly different from the one taught in the U.S. Both of the subtraction algorithms, however, give the same result in the same amount of time. That is not invariably the case with different algorithms for solving the same problem. One celebrated problem that can be solved by either a "fast" algorithm or a "slow" one is the problem of the Koenigsberg bridges.

In the 18th-century German city of Koenigsberg (which is now the Russian city of Kaliningrad) a park was built on the banks of the river Pregel and on two islands in the river. Within the park seven bridges connected the islands and the riverbanks. A popular puzzle of the time asked if it was possible to walk through the park by a route that crossed each of the bridges once and only once.

For the solution of the problem the size and shape of the islands and the length of the bridges are immaterial: the only essential information is the pattern of interconnections. This information can be presented compactly in the mathematical structure known as a graph, which is merely a set of points with lines drawn to join them. In the case of the Koenigsberg park each of the riverbanks and each of the islands is condensed to a single point, and each of the bridges is represented by a line between two points. Thus the graph consists of four points and seven lines. If the lines are labeled, any path through the park can be specified by a simple listing of labels.

The obvious approach to the problem is to list all the paths that cross all the bridges and to eliminate from consideration those that cross any bridge more than once. This is the technique of exhaustive search, similar to the one employed in the problem of the traveling salesman. When the mathematician Leonhard Euler was presented with the problem of the Koenigsberg bridges, he recognized the limitations of the technique and found another method. In recognition of his contribution a path that traverses each line of a graph exactly once is now called an Eulerian path.

Euler wrote: "The particular problem of the seven bridges of Koenigsberg could be solved by carefully tabulating all possible paths, thereby ascertaining by inspection which of them, if any, met the requirement. This method of solution, however, is too tedious and too difficult because of the large number of possible combinations, and in other problems where many more bridges are involved it could not be used at all."

Euler's alternative method is much simpler. He showed that a tour of the kind sought must exist if the graph meets two conditions. First, it must be possible to go from any point in the graph to any other point by following the lines of the graph: in other words, the graph may not be disconnected. Second, every point of the graph, with two possible exceptions, must be at the junction of an even number of lines.

It is not hard to understand why a graph cannot have an Eulerian path unless it meets these conditions. All regions of the graph must be connected to one another if there is to be any path that traverses all the lines. Each point must have an even number of lines because half of them are required to reach the point and the other half to leave it. Two points with an odd number of lines can be allowed if they are chosen as the starting and finishing points of the path. Demonstrating that any graph that meets these conditions actually has an Eulerian path requires a somewhat more complicated argument, which we shall not present here, but Euler was able to give a rigorous proof.

It is an easy matter to express Euler's solution to this problem in an algorithm that could be executed by a computer. The first requirement, connectivity, can be established by marking some point of the graph, then similarly marking all points connected to it by lines, then all the points connected to the newly marked points, and so on. The graph is connected if at the end all points have been marked. The second requirement is just as easily tested: the machine is instructed to examine each point of the graph and count the number of lines that terminate at that point. If no more than two of the points have an odd number of lines, the graph has an Eulerian path. The park at Koenigsberg met the first condition but not the second, and so there was no Eulerian tour of the seven bridges.
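The algorithm just described can be written out directly. The Python sketch below performs exactly the two checks: connectivity by marking points, then a count of points with an odd number of lines (the point labels for the Koenigsberg graph are ours, chosen for illustration).

```python
from collections import defaultdict

def has_eulerian_path(edges):
    """Euler's test: the graph must be connected and at most two of its
    points may meet an odd number of lines."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
        graph[b].append(a)
    # Connectivity: mark one point, then everything reachable from it.
    points = list(graph)
    marked, frontier = set(), [points[0]]
    while frontier:
        p = frontier.pop()
        if p not in marked:
            marked.add(p)
            frontier.extend(graph[p])
    if marked != set(points):
        return False
    # Degree condition: count points where an odd number of lines terminate.
    odd = sum(1 for p in points if len(graph[p]) % 2 == 1)
    return odd <= 2

# The Koenigsberg graph: banks N and S, islands I and J, seven bridges.
koenigsberg = [("N", "I"), ("N", "I"), ("S", "I"), ("S", "I"),
               ("N", "J"), ("S", "J"), ("I", "J")]
```

Running the test on the Koenigsberg graph reports that no Eulerian path exists: the graph is connected, but all four of its points meet an odd number of lines.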

Euler's method is unquestionably the more economical approach to the problem of the Koenigsberg bridges: it requires that each point and line of the graph be listed just once, whereas the exhaustive search is not completed until every path that crosses all the bridges has been listed. The number of such paths is much larger than the number of points and lines in the graph. In that sense Euler's method is the better algorithm, but how much better? How can the difference be measured and how can one tell whether the difference is significant?

For a graph with only four points and seven lines both techniques are fast enough to be considered practical. Suppose, however, that more islands and bridges were added to the park, or in other words that more points and lines were added to the graph. If the problem is being solved by Euler's method, each new point merely adds one item to the list of points that must be checked. If the paths are to be examined by exhaustive search, on the other hand, then with each new point and line the size of the list is multiplied by some factor. A moderate increase in the size of the graph results in an explosive increase in the number of paths. Ultimately the list of paths must become prohibitively long.

In this comparison of the two solutions to Euler's problem there is the basis for a completely general method of evaluating the speed or practicality or efficiency of any algorithm. We imagine that the algorithm is supplied with larger and larger inputs, and we note the rate at which the execution time of the algorithm increases. In this way it is possible to make unequivocal judgments of algorithms. Exhaustive search not only is a slower method; in general it is too slow to be of any value. Euler's method remains practical for problems of essentially unlimited size.

As the size of the graphs being examined increases, the lists produced by the method of exhaustive search grow exponentially. Each time some fixed number of points and lines are added to the graph the size of the list doubles. Growth of this kind can be described by a mathematical function such as 2^n, where n is some measure of the size of the graph. Many other functions have similar or even higher rates of growth. Among them are n^n and n! (which is read as "n factorial" and signifies n multiplied by all the integers between 1 and n). For the purposes of this discussion all these functions can be regarded as having the same property of exponential growth.

Mathematical functions of another kind are known as polynomials. The simplest members of this class are linear functions, such as 3n, which designate a simple relation of proportionality. The time needed to solve the problem of Koenigsberg bridges by Euler's method increases as a linear function of the size of the graph. Other polynomials are n^2, n^3 and so on, and the sums of such functions. What distinguishes polynomials from exponential functions is that n never appears in an exponent.

For sufficiently large values of n any exponential function will overtake and exceed any polynomial function. It was the certainty of this result that dismayed Thomas Malthus when he compared the exponential rate of population growth with the polynomial rate of increase in the food supply. For small values of n a given polynomial function may well exceed a given exponential one, but there is always a value of n beyond which the exponential function is the greater. The exact form of the polynomial makes little difference, except in changing the point at which the polynomial function is overtaken.
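The overtaking can be seen directly by comparing, say, n^3 with 2^n. For small n the polynomial is larger, but from n = 10 onward the exponential is permanently ahead. A short Python check (the helper function and its name are ours, for illustration):

```python
def overtake_point(poly, expo, limit=1000):
    """Last n (up to limit) at which the polynomial still equals or
    exceeds the exponential; beyond it the exponential is larger."""
    return max(n for n in range(1, limit) if poly(n) >= expo(n))

# n**3 stays ahead of 2**n only up to n = 9 (729 >= 512);
# at n = 10 the exponential wins (1000 < 1024) and never falls behind again.
n_star = overtake_point(lambda n: n**3, lambda n: 2**n)
```

Substituting a steeper polynomial such as n^5 only moves the crossover point further out; it cannot prevent it.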

It is now generally agreed by computer scientists that algorithms whose execution time increases exponentially as a function of the size of the input are not of practical value. We shall call algorithms of this kind "exponential time" algorithms, or simply inefficient algorithms. The only algorithms that are considered fast or efficient enough for general application are "polynomial time" algorithms.

Of course, even among efficient algorithms some are faster than others, but for the purposes of this discussion it is important only to distinguish polynomial-time algorithms as a class from exponential-time algorithms. Moreover, this system of classification has the advantage that it makes the speed of an algorithm a property of the algorithm itself and independent of the machine on which it is executed. For sufficiently large problems a polynomial-time algorithm executed on even the slowest machine will find an answer sooner than an exponential-time algorithm on the fastest computer.

The mathematical properties of algorithms were studied in the 1930's by the British mathematician A. M. Turing, the inventor of an imaginary computer now called the Turing machine. The Turing machine was conceived to be an automaton equipped with an infinite supply of paper tape marked off in square regions. The machine was capable of just four actions: it could move the tape one square, it could place a mark in a square, it could erase a mark already present and at the end of a calculation it could halt. These operations were to be performed according to a sequence of instructions built into the internal mechanism. Of course, Turing never built such a machine; it was merely a conceptual device for automatically solving problems in mathematics and logic. Indeed, Turing was interested not in actually solving problems with the machine but rather in investigating what kinds of problems it could solve and what kinds it could not.

Turing discovered that even a machine as simple as this one could solve any problem for which an algorithm could be devised. The computation might be laborious and indirect, but given enough time and paper tape the machine would eventually find the solution and halt. Reduced to its essentials the Turing machine is a language for stating algorithms, as powerful in principle as the more sophisticated languages now employed for communicating with computers.

In addition to conceiving these machines, Turing demonstrated their limitations. In 1936 he showed that there are problems that cannot be solved by Turing machines, and it follows that these problems cannot be solved by any automatic computer. They are problems for which algorithms cannot be written, even in principle. The example first studied by Turing is the problem of predicting whether a particular Turing machine, once it is set in motion, will ever finish its calculation and halt. Through an analysis of this problem he was able to show that there can be no general procedure for telling whether mathematical propositions are true or false. Since then a variety of other problems with the same properties have been proposed.

One result of Turing's work was to divide all imaginable problems in mathematics into two classes. Those problems for which algorithms can never be written are in a formal sense permanently unsolvable. Some instances of these problems may be solved by rare perception or by luck, but a general method for their solution will never be found.

All other problems in mathematics and logic can be solved by algorithms. As we have seen, however, some algorithms are more useful than others. The class of solvable problems can therefore be divided into two subgroups: those for which there are efficient, polynomial-time algorithms and those for which there are only exponential-time algorithms. Euler's problem is a member of the class with polynomial-time solutions, since Euler's method is itself a polynomial-time algorithm. Problems that can be proved to have only exponential-time algorithms are also known, although they are rather obscure.

Although these two groups of problems are quite distinct, it is not always a straightforward task to assign a problem to one group or the other. Indeed, a very interesting class of problems seems to fall somewhere between them. For these problems no efficient algorithms are known and the best available solutions require exponentially increasing time, yet no one has been able to prove that the problems do not have polynomial-time solutions.

One such problem was considered in the 19th century by the Irish mathematician William Rowan Hamilton. Superficially Hamilton's problem is much like Euler's. The problem is to decide whether a given graph has a path that takes in each point exactly once (whereas Euler looked for a path that traversed each line once). Actually the tasks are quite different, and Euler's method cannot be applied to Hamilton's problem. The graph derived from the plan of the park at Koenigsberg has a Hamiltonian path, although, as we have seen, it has no Eulerian path. On the other hand, removing two lines results in a graph that has an Eulerian path but not a Hamiltonian one. Many other graphs have neither kind of path.

Hamilton's problem can be solved by exhaustive search: indeed, the procedure is not substantially different from that employed in listing all the possible paths that might have the Eulerian property. For Hamilton's problem, however, no efficient algorithm comparable to Euler's method has been found. The problem has been pondered by many of the best mathematicians of the past century, but the most efficient methods available today are fundamentally no better than exhaustive tabulation. On the other hand, all attempts to prove that there is no better method have also failed, and it must be considered a possibility that an efficient algorithm will be discovered tomorrow.
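The exhaustive search for a Hamiltonian path can be sketched just like the traveling salesman's: try every ordering of the points. The example graph is the Koenigsberg graph with duplicate bridges collapsed to single lines (which does not affect the question, since a Hamiltonian path uses each point once); the labels are ours.

```python
from itertools import permutations

def hamiltonian_path(points, edges):
    """Exhaustive search: test every ordering of the points, returning one
    in which each consecutive pair of points is joined by a line."""
    joined = {frozenset(e) for e in edges}
    for order in permutations(points):
        if all(frozenset(pair) in joined for pair in zip(order, order[1:])):
            return list(order)
    return None

# Koenigsberg with duplicate bridges collapsed: banks N and S, islands I and J.
points = ["N", "S", "I", "J"]
edges = [("N", "I"), ("S", "I"), ("N", "J"), ("S", "J"), ("I", "J")]
path = hamiltonian_path(points, edges)
```

The loop may examine up to n! orderings of n points, so this sketch is exactly the kind of exponential-time tabulation the text describes, not an efficient algorithm.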

Problems that are known to have polynomial-time solutions, such as Euler's problem, are said to be members of the class P (for polynomial). Hamilton's problem is a member of another class, designated by the letters NP, signifying "nondeterministic polynomial." The class NP encompasses all the problems in P, or in other words P is a subset of NP. In addition NP includes problems whose status is less certain. They are all solvable problems in principle: they have algorithms, but for now only exponential-time algorithms are known. They may also have polynomial-time algorithms (in which case NP and P are identical) or they may prove to be permanently intractable, with only inefficient solutions.

The problems considered here, and all problems classified in this way, can be described as an infinite set of similar questions each of which can be answered yes or no. For problems that are formally unsolvable, such as the problem of predicting whether a Turing machine will halt, these questions simply cannot be answered by any algorithmic procedure. For problems of the class P the questions can invariably be answered, whether the answer turns out to be yes or no, by an efficient procedure. In order for a problem to qualify for the class NP there need not be an efficient means of answering the yes-or-no questions. What is required is that whenever the answer is yes there be a short and convincing argument proving it.

Hamilton's problem, for example, meets this condition. It is not possible to tell by any efficient means known today whether a graph has a Hamiltonian path, but if it does, then the path itself can be exhibited. Hence for every Hamiltonian graph it is possible to issue a "certificate" that proves its membership in this special class of graphs. Such a certificate would name the lines in the graph in the order the path traverses them. Finding the path might require weeks of tabulation, but once it has been found it can easily be exhibited. Another problem that belongs to the class NP is the question of whether a whole number is composite, that is, whether it can be written as the product of two other numbers. Again there is no known efficient procedure for answering the question, but if the number is indeed composite there is a succinct proof of that fact, namely a correctly worked-out multiplication with the number on the bottom line.

Care must be taken when asking the yes-or-no question of a problem in the class NP, since the complementary no-or-yes problem might not be in the same class. For example, the complement of Hamilton's problem, in which one is asked to show that a graph does not have a path passing once through each point, may well not be in the class NP. For now the only way to demonstrate the absence of such a path is to list all possible paths, and such a proof is too lengthy to qualify as a certificate of membership in NP. On the other hand, the complement of the composite-number problem, which asks if a number is prime, turns out to be in the class NP. The reason, which is far from obvious, is that relatively short proofs demonstrating that a number has no factors other than 1 and itself were discovered in 1975 by Vaughan Pratt of the Massachusetts Institute of Technology. Still, it is not known whether the composite-number problem and its complement are in the class P.

It is easy to show that every problem in the class P is also in the class NP. If a problem is in P, then by definition there is an efficient algorithm for it. To produce a short and convincing proof that the answer to some instance of the problem is yes, all we need to do is follow the algorithm: a record of its operation constitutes the required certificate.

Another way of defining NP is as the class of yes-or-no problems that can be solved by guessing certificates. If one is given an instance of a problem in NP for which the answer happens to be yes, then with luck one may discover the required certificate fairly quickly by making a sequence of guesses; if the answer is no, guessing cannot possibly yield an answer any faster than an exhaustive search could. For example, in solving Hamilton's problem one might find a correct path (if there is one) on the first try by tracing small portions of the path and guessing at each stage how to proceed. Such a procedure, it should be emphasized, is not an algorithm. It could be made into an algorithm only by crossing off each trial path as it is tested and checking all possible paths, but that is equivalent to the method of exhaustive search.

A mathematical procedure defined in terms of lucky guesses may seem bizarre, but it is a quite legitimate approach to defining the problems in the class NP. In principle the procedure could even be mechanized by building a device called a nondeterministic Turing machine. This device can do all that an ordinary Turing machine can do; in addition, at some points in its operation it may have more than one choice of what to do next. Such a machine would be considered to answer yes to a question if there were some sequence of choices that could lead it to a yes conclusion. NP, the class of nondeterministic polynomial-time problems, consists of precisely those problems whose yes instances can be identified by machines making comparatively short guessing computations.

The inclusion of guessing in the definition of these problems suggests strongly to many mathematicians that P and NP are not the same set and hence that efficient algorithms can never be found for the intractable problems in the class NP. If every problem in NP were actually in P, then all the guesswork and luck could be replaced by some systematic procedure without great sacrifice in time. It is hard to believe the ability to guess and be lucky could win so little.

The class NP includes a variety of commonly encountered problems that seem to defy efficient solution. We have already mentioned Hamilton's problem and the problem of composite numbers. Another example is known as the matching problem. It can be considered in terms of the task faced by the colleges every September, when a new class of freshmen must be assigned to shared dormitory rooms.

For the sake of simplicity let us assume that all the information gathered about the students' smoking habits, bedtime hours, taste in music and so forth results in a single yes-or-no decision as to the compatibility of each possible pair of students. The entire class can then be represented as a graph in which the points correspond to students and a line is drawn connecting every two students who can be placed in the same room. If each room holds just two students, the assignment can be made efficiently by a clever polynomial-time algorithm discovered by Jack Edmonds of the University of Waterloo. If each room is to be shared by three students, however, there is no known efficient algorithm. The problem is in the class NP, since all yes instances have succinct certificates: an acceptable room assignment, once it is discovered, can easily be exhibited. Of course, a solution could be found by exhaustive search, albeit inefficiently. With luck a suitable assignment, if there is one, can be guessed quickly.

Map coloring is a problem in the class NP that concerns mathematicians more than it does cartographers. The question is whether the countries on a given map can be colored with a given number of colors so that no two countries that share a border have the same color. It is easy to find out if a map can be colored with two colors: it can be if there are no places on the map where an odd number of countries meet at one point. It is even easier to tell if a map can be colored with four colors; indeed, there is no need even to look at the map, since Kenneth Appel and Wolfgang Haken of the University of Illinois proved in 1975 that four colors suffice for any map. Surprisingly, however, no efficient algorithm is known for determining whether three colors are enough for a given map. The problem is in the class NP, since a correctly colored map can serve to certify a yes answer.

Map coloring can be regarded as a special case of another problem called graph coloring. Any map can be converted into a graph by reducing each country to a point and drawing a line between two points if the corresponding countries share a border. Coloring the graph is then equivalent to coloring the map, subject to the rule that two points connected by a line cannot have the same color. Graph coloring, however, is a more general problem, with applications outside graph theory. For example, a graph can represent the scheduling of work in a factory. Each point of the graph stands for some job to be done, and two points are connected by a line if the jobs cannot be done concurrently, perhaps because they require the same piece of machinery. A coloring of the graph with three colors would then supply a schedule dividing the work of the factory into three shifts. Like map coloring, the graph-coloring problem is in the class NP.
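As with Hamilton's problem, finding a three-coloring may require exhaustive search, but checking a proposed one (the certificate) is easy. A minimal Python sketch, using made-up job names for the factory example:

```python
def valid_coloring(edges, coloring, colors=3):
    """Certificate check for graph coloring: no line joins two points of the
    same color, and at most `colors` distinct colors are used."""
    return (len(set(coloring.values())) <= colors and
            all(coloring[a] != coloring[b] for a, b in edges))

# A made-up factory graph: jobs joined by a line cannot run concurrently.
jobs = [("cut", "weld"), ("weld", "paint"), ("cut", "paint"), ("paint", "pack")]
shifts = {"cut": 1, "weld": 2, "paint": 3, "pack": 1}
ok = valid_coloring(jobs, shifts)
```

Here the proposed assignment of jobs to three shifts passes the check, so it would serve as a certificate that this graph is three-colorable.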

It often happens that if one problem can be solved efficiently, so can many others. For example, if an efficient algorithm could be found for the problem of graph coloring, it could be applied with only minor modifications to the problems of map coloring and factory scheduling. Map coloring and factory scheduling are therefore said to be efficiently reducible to graph coloring. In the past several years it has become apparent that some of the problems in the class NP have a remarkable property: all the problems in NP are efficiently reducible to them. These elite problems within the class NP are called NP-complete. If any one of them has an efficient algorithm, then every problem in NP can be solved efficiently.

The first proof that a problem is NP-complete was presented in 1971 by Stephen A. Cook of the University of Toronto. His reasoning follows a path essentially parallel to the path of Turing's earlier work on mathematical machines and their relation to problems of formal logic. Cook stated his proof in terms of the propositional calculus, the formal language in which separate logical statements, which individually may be either true or false, are joined together by the lexical elements "and," "or" and "not." In general a sentence in the propositional calculus can be shown to be either true or false depending on which of its component statements are assumed to be true or false. Certain sentences, however, cannot be true under any interpretation because they are self-contradictory. Sentences that cannot be made true are said to be unsatisfiable.

Cook employed the propositional calculus to describe the operation of the nondeterministic Turing machines, the mechanized guessing devices essential to the definition of the class NP. He showed that the calculations of any such machine can be described succinctly by sentences of the propositional calculus. When the machine is given a yes instance of a problem in NP, its operation is described by a satisfiable sentence, whereas the operation of a machine given a no instance is described by a sentence that cannot be satisfied.

It follows from Cook's proof that if one could efficiently determine whether a sentence in the propositional calculus can be satisfied, one could also determine efficiently in advance whether the problem presented to a nondeterministic Turing machine will be answered yes or no. Since the problems in the class NP are by definition all those that can be solved by nondeterministic Turing machines, one would then have an efficient method for solving all those problems. The catch, of course, is that there is no known efficient method of determining whether a sentence in the propositional calculus can be satisfied.
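The only method known to be certain of deciding satisfiability is the obvious one: try every assignment of true and false to the component statements. A minimal sketch in Python (the function name and the clause encoding are illustrative, not part of the original discussion) makes the exponential cost visible, since a sentence with n component statements forces up to 2^n trials:

```python
from itertools import product

def satisfiable(clauses, n_vars):
    """Decide satisfiability of a sentence in the propositional calculus,
    given in conjunctive normal form, by exhaustive search.

    clauses: list of clauses; each clause is a list of nonzero ints,
    where k means statement k is asserted and -k means it is negated.
    The loop runs through all 2**n_vars truth assignments -- no method
    fundamentally faster in every case is known.
    """
    for assignment in product([False, True], repeat=n_vars):
        def lit_true(lit):
            value = assignment[abs(lit) - 1]
            return value if lit > 0 else not value
        if all(any(lit_true(l) for l in clause) for clause in clauses):
            return True
    return False

# (x1 or x2) and (not x1 or x2) and (not x2): self-contradictory
print(satisfiable([[1, 2], [-1, 2], [-2]], 2))   # False
# (x1 or not x2) and (x2): satisfied by making both statements true
print(satisfiable([[1, -2], [2]], 2))            # True
```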

Cook's argument states in essence that the propositional calculus is a universal language for describing problems in the class NP. Every instance of such a problem corresponds to a sentence in that language, and if the sentence is satisfiable, the instance has a yes answer. Many other problems have since been shown to be NP-complete because the satisfiability problem can efficiently be reduced to them. Hamilton's problem, the problem of matching groups of three roommates and the problem of coloring graphs with three colors are all NP-complete. The first to point out the broad applicability of this theory was Richard M. Karp of the University of California at Berkeley. Similar investigations were independently conducted by the Russian mathematician L. A. Levin. Since NP-complete problems capture the difficulty of all other problems in NP, it is widely thought today that all NP-complete problems are computationally intractable. A proof that a problem is NP-complete is usually considered a strong argument for abandoning further efforts to devise an efficient algorithm for its solution.

Even the assumption that all NP-complete problems are intractable would not settle all questions about the class NP. In addition to the mystery of the NP-complete problems there is an even more obscure area: problems in NP for which no efficient algorithms are known but which have not been proved to be NP-complete either. The problem of composite numbers is one of these.

Not all problems that can be solved by a computer are of the yes-or-no type. Another common type is the optimization problem. For example, suppose one is given the positions of some cities on a map and asked to find the shortest possible railroad network connecting them. In one version of this problem one is allowed to lay down a straight section of track between any two cities, but one is not allowed to install isolated junction points; tracks can be joined only at cities. One property of the solution to this problem is immediately apparent: the optimum network can never include a closed circuit, because if it did, the network could be made shorter simply by omitting one link in the circuit. Thus the best network always branches like a tree, and the problem itself is called the spanning-tree problem.

The spanning-tree problem can be solved correctly and quite efficiently by a method called the greedy algorithm, devised by Joseph B. Kruskal of Bell Laboratories. The procedure is simply to connect the closest pair of cities, then the next-closest and so on without adding any superfluous lines (lines joining cities that are already linked indirectly). It is far from obvious that this method always yields the shortest network, but it does, and it has the pleasant property of requiring no foresight and no reconsideration of earlier decisions.
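The greedy algorithm is simple enough to state in a few lines of code. The following Python sketch (the function name and the union-find bookkeeping are our own; Kruskal's original statement is in terms of lines on a map) connects the closest pair first and rejects superfluous lines:

```python
import math

def greedy_spanning_tree(cities):
    """Kruskal's greedy algorithm: connect the closest pair of cities,
    then the next closest, skipping any line that would join two cities
    already linked directly or indirectly."""
    # all candidate lines, shortest first
    edges = sorted((math.dist(cities[i], cities[j]), i, j)
                   for i in range(len(cities))
                   for j in range(i + 1, len(cities)))
    # union-find structure to detect superfluous lines
    parent = list(range(len(cities)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    tree = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # the cities are not yet linked: add the line
            parent[ri] = rj
            tree.append((i, j))
    return tree

cities = [(0, 0), (0, 1), (2, 0), (2, 1)]
print(greedy_spanning_tree(cities))   # [(0, 1), (2, 3), (0, 2)]
```

Note that no decision is ever revisited: once a line is laid down it stays, which is what makes the method so efficient.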

The greedy algorithm can be relied on to find the shortest network between cities under the rules specified, but in general that network will not be the shortest possible one. Further savings can be achieved by establishing junction points between cities. The properties of networks with such junctions were studied by the Swiss mathematician Jakob Steiner. It can be shown that any shortest network must be arranged so that each junction point is made up of three lines that meet at angles of 120 degrees. This rule provides some guidance in evaluating networks, but there are many possible networks with Steiner junction points. No algorithm has been discovered that finds the best network quickly.

The problem of the traveling salesman's tour is closely related. Again one is given a set of cities, but now one is asked to find the shortest round-trip tour. As a first guess the greedy algorithm suggests that perhaps the salesman should always go to the nearest city he has not yet visited, but this procedure does not work. Indeed, the problem is notorious for having resisted all attempts to find an efficient solution.
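The failure of the nearest-city rule is easy to demonstrate in code. A minimal sketch of the rule (function name ours) follows; it runs quickly, but on many maps the tour it returns is longer than the optimum:

```python
import math

def nearest_neighbor_tour(cities, start=0):
    """The 'always go to the nearest unvisited city' rule for the
    traveling salesman.  Fast, but not guaranteed to find the
    shortest round trip."""
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        here = cities[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(here, cities[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour   # returning from tour[-1] to tour[0] closes the trip

cities = [(0, 0), (1, 0), (3, 0), (6, 0)]
print(nearest_neighbor_tour(cities))   # [0, 1, 2, 3]
```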

Optimization problems are not fundamentally different from those that ask yes-or-no questions; in fact, every optimization problem can be rewritten in a yes-or-no form. In the traveling salesman problem, for example, we might be given a length along with a set of cities and asked to state whether a tour can be constructed that does not exceed the specified length. This yes-or-no problem cannot be harder than the associated optimization problem, because if the optimum tour could be found by some efficient method, it would be a trivial task to determine whether it exceeds a given number. Hence if the yes-or-no version is computationally intractable, one cannot hope to solve the optimization problem itself efficiently. For this reason certain optimization problems, such as that of the traveling salesman's tour and that of placing Steiner junction points, are said to be NP-complete.
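The yes-or-no version can be written down directly, though the only certain method of answering it remains exhaustive search over all itineraries. A Python sketch (the function name is ours, and the brute-force method is the obvious one, not an efficient algorithm):

```python
import math
from itertools import permutations

def tour_within(cities, limit):
    """Yes-or-no version of the traveling salesman problem: is there a
    round-trip tour of length at most `limit`?  Decided here by checking
    every itinerary -- an exponential-time method."""
    n = len(cities)
    for perm in permutations(range(1, n)):       # fix city 0 as the start
        route = (0,) + perm + (0,)
        length = sum(math.dist(cities[route[k]], cities[route[k + 1]])
                     for k in range(n))
        if length <= limit:
            return True
    return False

square = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(tour_within(square, 4.0))   # True: the perimeter tour has length 4
print(tour_within(square, 3.9))   # False: every tour is at least 4 long
```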

Optimization problems are encountered often in engineering, economics, operations research and other fields. The discovery that at least some of these problems are NP-complete is therefore of considerable practical interest. Since the NP-complete problems probably have no efficient algorithms, there would seem to be little point in expending further effort in seeking optimum solutions. An alternative that has recently been adopted is to seek approximate solutions that are good even if they are not precisely optimal.

One technique that has been applied to the traveling salesman problem offers a solution that may not be optimum but is guaranteed to be no worse than twice the optimum path. The procedure starts with the shortest spanning tree, which can be generated efficiently by the greedy algorithm. This network can be converted into a tour of the cities by traversing each line twice, returning to the origin and taking shortcuts past cities that have already been visited; the resulting tour is no longer than twice the length of the tree. It is known that the optimum spanning tree must be shorter than any possible tour of the cities, since a tour can be converted into a spanning tree (albeit one without any branches) by simply omitting one segment. Thus the tour produced, being at most twice the length of the optimum spanning tree, cannot be longer than twice the optimum tour. The method is a polynomial-time algorithm. Recently Nicos Christofides of the Imperial College of Science and Technology in London has found a way to improve the algorithm so that it yields a tour guaranteed to be no more than half again as long as the optimum.
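The doubling technique can be sketched compactly. In the version below (function name ours; the tree is grown by Prim's variant of the greedy construction rather than Kruskal's, which yields the same shortest spanning tree) the depth-first walk that lists each city on first arrival is exactly the "traverse each line twice and take shortcuts" step:

```python
import math

def approx_tour(cities):
    """Tour guaranteed to be at most twice the optimum: build the
    shortest spanning tree, walk around it, and skip cities already
    visited."""
    n = len(cities)
    # grow the shortest spanning tree outward from city 0 (Prim's method)
    in_tree = {0}
    children = {i: [] for i in range(n)}
    while len(in_tree) < n:
        d, u, v = min((math.dist(cities[u], cities[v]), u, v)
                      for u in in_tree for v in range(n) if v not in in_tree)
        children[u].append(v)
        in_tree.add(v)
    # depth-first walk of the tree; listing each city on first arrival
    # amounts to traversing every line twice and shortcutting repeats
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour   # returning from tour[-1] to tour[0] closes the trip

print(approx_tour([(0, 0), (0, 1), (1, 1), (1, 0)]))
```

On this square map the walk happens to recover the optimum perimeter tour; in general only the factor-of-two guarantee holds.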

A more profound compromise gives up not only the requirement that a solution be optimal but also the insistence that a less than optimum solution be guaranteed to fall within a specified range. Instead the assurance is given that the solution will not often deviate far from the optimum. An underlying assumption of such techniques is that the maps encountered in practice are not concocted to confound basically plausible techniques; such maps are encountered frequently only when they are constructed by computer scientists to reveal the flaws in methods proposed by their colleagues. Indeed, if the salesman's tour algorithms discussed above are applied to "natural" maps, they deliver far more than they promise. The resulting tours are not 100 percent or 50 percent longer than the optimum but closer to 5 percent.

A reasonable assumption about the properties of many maps is that cities are randomly placed. A theorem describing the statistical properties of optimum tours through such randomly distributed points was proved in 1958 by Jillian Beardwood, J. H. Halton and John M. Hammersley of the University of Oxford. Relying on that theorem, Karp has shown that a simple method of constructing tours almost always yields near-optimum results, when it is applied to maps with many cities.

Karp begins by dividing the map into many small regions. Within each of these regions the cities are few enough for the optimum tour to be found by exhaustive search, even though that method involves an exponential-time algorithm. The tours of the small areas are then linked by a variant of the greedy algorithm. Perhaps significantly, the method is not very different from the method usually adopted by people solving the problem manually.
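A much-simplified sketch of the idea follows (function names, the square grid and the serpentine sweep that strings the regional paths together are our own simplifications; Karp's linking step is more refined):

```python
import math
from itertools import permutations

def shortest_path_through(pts):
    """Exhaustive search for the shortest open path through a few points."""
    return list(min(permutations(pts),
                    key=lambda p: sum(math.dist(p[k], p[k + 1])
                                      for k in range(len(p) - 1))))

def partitioned_tour(cities, cell_size):
    """Karp-style heuristic, simplified: cut the map into small square
    regions, solve each region exhaustively (cheap, since each holds few
    cities), then string the regional paths together in a serpentine
    sweep over the grid."""
    regions = {}
    for c in cities:
        key = (int(c[0] // cell_size), int(c[1] // cell_size))
        regions.setdefault(key, []).append(c)
    tour = []
    # serpentine order: left-to-right on even rows, right-to-left on odd
    for key in sorted(regions,
                      key=lambda k: (k[1], k[0] if k[1] % 2 == 0 else -k[0])):
        tour.extend(shortest_path_through(regions[key]))
    return tour
```

With randomly placed cities and suitably chosen region sizes, tours built this way are almost always close to the optimum, which is the content of Karp's analysis.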

Efficient but approximate solutions can be found for many NP-complete optimization problems. From the standpoint of mathematics, however, the important question is whether NP is identical with P. The repeated failure of attempts to find an efficient algorithm for the NP-complete problems has created considerable confidence that P and NP are not the same, but a proof of their distinctness may be beyond present mathematical capabilities. The question may join that select group of mathematical enigmas that remain unresolved for decades, and the solution may have to await the development of new methods in mathematics.