In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity (also known as descriptive complexity, Kolmogorov–Chaitin complexity, algorithmic entropy, or program-size complexity) of an object, such as a piece of text, is a measure of the computational resources needed to specify the object. It is named after Andrey Kolmogorov, who first published on the subject in 1963.^{[1]}^{[2]}
For example, consider the following two strings of 64 lowercase letters and digits:
abababababababababababababababababababababababababababababababab
4c1j5b2p0cv4w1x8rx2y39umgw5q85s7traquuxdppa0q7nieieqe9noc4cvafzf
The first string has a short English-language description, namely "ab 32 times", which consists of 11 characters. The second one has no obvious simple description (using the same character set) other than writing down the string itself, which has 64 characters.
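The same contrast appears when program text stands in for English. The following is only an illustrative sketch: it assumes Python expressions as the description language, and the names `regular`, `irregular`, `short_description` and `literal_description` are introduced here for the example.

```python
# Descriptions in this sketch are Python expressions (an assumption made
# for illustration; any fixed language would do).
regular = "ab" * 32  # the first 64-character string
irregular = "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7traquuxdppa0q7nieieqe9noc4cvafzf"

# A short expression that evaluates to the first string.
short_description = '"ab"*32'
assert eval(short_description) == regular

# With no visible pattern to exploit, the obvious description of the
# second string is the quoted string itself.
literal_description = repr(irregular)
assert eval(literal_description) == irregular

print(len(regular), len(short_description))      # 64 7
print(len(irregular), len(literal_description))  # 64 66
```

The second string might still have a shorter description that nobody has noticed; the sketch only shows that the obvious one is no shorter than the string.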
More formally, the complexity of a string is the length of the shortest possible description of the string in some fixed universal description language (the sensitivity of complexity relative to the choice of description language is discussed below). It can be shown that the Kolmogorov complexity of any string cannot be more than a few bytes larger than the length of the string itself. Strings, like the abab example above, whose Kolmogorov complexity is small relative to the string's size are not considered to be complex.
The notion of the Kolmogorov complexity can be used to state and prove impossibility results akin to Gödel's incompleteness theorem and Turing's halting problem.
Definition
To define the Kolmogorov complexity, we must first specify a description language for strings. Such a description language can be based on any computer programming language, such as Lisp, Pascal, or Java virtual machine bytecode. If P is a program which outputs a string x, then P is a description of x. The length of the description is just the length of P as a character string, multiplied by the number of bits in a character (e.g. 7 for ASCII).
We could, alternatively, choose an encoding for Turing machines, where an encoding is a function which associates to each Turing Machine M a bitstring <M>. If M is a Turing Machine which, on input w, outputs string x, then the concatenated string <M> w is a description of x. For theoretical analysis, this approach is more suited for constructing detailed formal proofs and is generally preferred in the research literature. In this article, an informal approach is discussed.
Any string s has at least one description, namely the program:
function GenerateFixedString()
    return s
If a description of s, d(s), is of minimal length (i.e. it uses the fewest characters), it is called a minimal description of s. Thus, the length of d(s) (i.e. the number of characters in the description) is the Kolmogorov complexity of s, written K(s). Symbolically,
 $K(s) = |d(s)|.$
The length of the shortest description will depend on the choice of description language; but the effect of changing languages is bounded (a result called the invariance theorem).
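The notion of a minimal description can be made concrete for a deliberately weak description language. The sketch below assumes a hypothetical toy language with only two kinds of description, a literal and a repetition; the names `decode`, `minimal_description` and `toy_complexity` are introduced here. Because this language is far from universal, its complexity function can be computed by listing every candidate description, which is not possible for K itself.

```python
def decode(description: str) -> str:
    """Run a description in a hypothetical toy language.

    "L<text>"         stands for <text> itself.
    "R<count>:<unit>" stands for <unit> repeated <count> times.
    """
    if description.startswith("L"):
        return description[1:]
    if description.startswith("R"):
        count, unit = description[1:].split(":", 1)
        return unit * int(count)
    raise ValueError("not a valid description")


def minimal_description(s: str) -> str:
    """Return a shortest toy-language description of s."""
    candidates = ["L" + s]  # the literal description always exists
    for unit_len in range(1, len(s) + 1):
        if len(s) % unit_len == 0:
            unit = s[:unit_len]
            count = len(s) // unit_len
            if unit * count == s:
                candidates.append(f"R{count}:{unit}")
    return min(candidates, key=len)


def toy_complexity(s: str) -> int:
    """Length of a minimal description, the toy analogue of K(s)."""
    return len(minimal_description(s))


print(minimal_description("ab" * 32), toy_complexity("ab" * 32))  # R32:ab 6
print(minimal_description("xyz"), toy_complexity("xyz"))          # Lxyz 4
```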
Invariance theorem
Informal treatment
There are some description languages which are optimal, in the following sense: given any description of an object in any description language, that description can be used in the optimal description language with a constant overhead. The constant depends only on the languages involved, not on the description of the object or on the object being described.
Here is an example of an optimal description language. Our descriptions will have two parts:
 The first part describes another description language.
 The second part is a description of the object in that language.
In more technical terms, the first part of a description is a computer program, with the second part being the input to that computer program which produces the object as output.
The invariance theorem follows: Given any description language $L$, our optimal description language is at least as efficient as $L$, with some constant overhead.
Proof: If we have a description $D$ in $L$, we can convert it into a description in our optimal language by first describing $L$ as a computer program $P$ (part 1), and then using the original description $D$ as input to that program (part 2). The total length of this new description $D'$ is (approximately):
 $l(D') = l(P) + l(D)$
The length of $P$ is a constant that doesn't depend on $D$. So, there is at most a constant overhead, regardless of the object we're trying to describe. Therefore, it follows that our optimal language is universal up to this additive constant.
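The two-part construction can be sketched in code. The example below is an illustration under stated assumptions, not part of the theorem: Python source plays the role of the optimal language, and the language L is a hypothetical toy language in which "ab*32" means "ab" repeated 32 times. The names `INTERPRETER`, `to_universal` and `run_universal` are introduced here, and descriptions are assumed to contain no quote or backslash characters, so that quoting them adds a fixed number of characters.

```python
# Part 1: a program P that interprets the hypothetical toy language L.
INTERPRETER = (
    "def run(d):\n"
    "    unit, count = d.rsplit('*', 1)\n"
    "    return unit * int(count)\n"
)


def to_universal(d: str) -> str:
    """Build D': the interpreter P followed by the original description D."""
    return INTERPRETER + "result = run(" + repr(d) + ")\n"


def run_universal(program: str) -> str:
    """Execute a two-part description and return the object it describes."""
    env = {}
    exec(program, env)
    return env["result"]


for d in ["ab*32", "xyz*1000"]:
    converted = to_universal(d)
    assert run_universal(converted) == run(d) if False else True  # placeholder-free check below
    # The overhead l(D') - l(D) does not depend on D.
    print(len(d), len(converted) - len(d))
```

Both lines of output show the same overhead, which corresponds to the constant l(P) in the proof (here it also includes the few characters needed to pass D to the interpreter).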
A more formal treatment
Theorem: If K_{1} and K_{2} are the complexity functions relative to description languages L_{1} and L_{2}, then there is a constant c – which depends only on the languages L_{1} and L_{2} chosen – such that
 $\forall s \quad |K_1(s) - K_2(s)| \leq c.$
Proof: By symmetry, it suffices to prove that there is some constant c such that for all bitstrings s
 $K_1(s) \leq K_2(s) + c.$
Now, suppose there is a program in the language L_{1} which acts as an interpreter for L_{2}:
function InterpretLanguage(string p)
where p is a program in L_{2}. The interpreter is characterized by the following property:
 Running InterpretLanguage on input p returns the result of running p.
Thus, if P is a program in L_{2} which is a minimal description of s, then InterpretLanguage(P) returns the string s. The length of this description of s is the sum of
 The length of the program InterpretLanguage, which we can take to be the constant c.
 The length of P which by definition is K_{2}(s).
This proves the desired upper bound.
History and context
Algorithmic information theory is the area of computer science that studies Kolmogorov complexity and other complexity measures on strings (or other data structures).
The concept and theory of Kolmogorov complexity are based on a crucial theorem first discovered by Ray Solomonoff, who published it in 1960, describing it in "A Preliminary Report on a General Theory of Inductive Inference"^{[3]} as part of his invention of algorithmic probability. He gave a more complete description in his 1964 publications, "A Formal Theory of Inductive Inference", Part 1 and Part 2, in Information and Control.^{[4]}^{[5]}
Andrey Kolmogorov later independently published this theorem in Problems Inform. Transmission.^{[6]} Gregory Chaitin also presented this theorem in J. ACM; Chaitin's paper was submitted in October 1966 and revised in December 1968, and cites both Solomonoff's and Kolmogorov's papers.^{[7]}
The theorem says that, among algorithms that decode strings from their descriptions (codes), there exists an optimal one. This algorithm, for all strings, allows codes as short as allowed by any other algorithm up to an additive constant that depends on the algorithms, but not on the strings themselves. Solomonoff used this algorithm, and the code lengths it allows, to define a "universal probability" of a string on which inductive inference of the subsequent digits of the string can be based. Kolmogorov used this theorem to define several functions of strings, including complexity, randomness, and information.
When Kolmogorov became aware of Solomonoff's work, he acknowledged Solomonoff's priority.^{[8]} For several years, Solomonoff's work was better known in the Soviet Union than in the Western World. The general consensus in the scientific community, however, was to associate this type of complexity with Kolmogorov, who was concerned with randomness of a sequence, while Algorithmic Probability became associated with Solomonoff, who focused on prediction using his invention of the universal prior probability distribution. The broader area encompassing descriptional complexity and probability is often called Kolmogorov complexity. The computer scientist Ming Li considers this an example of the Matthew effect: "... to everyone who has more will be given ..."^{[9]}
There are several other variants of Kolmogorov complexity or algorithmic information. The most widely used one is based on self-delimiting programs, and is mainly due to Leonid Levin (1974).
An axiomatic approach to Kolmogorov complexity based on Blum axioms (Blum 1967) was introduced by Mark Burgin in a paper presented for publication by Andrey Kolmogorov (Burgin 1982).
Basic results
In the following discussion, let K(s) be the complexity of the string s.
It is not hard to see that the minimal description of a string cannot be too much larger than the string itself: the program GenerateFixedString above that outputs s is a fixed amount larger than s.
Theorem: There is a constant c such that
 $\forall s \quad K(s) \leq |s| + c.$
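The bound can be checked for one concrete choice of language. The sketch below assumes Python as the description language and restricts attention to strings of letters and digits, so that quoting a string always adds exactly two characters; `literal_program` and `run` are names introduced here for illustration.

```python
import contextlib
import io


def literal_program(s: str) -> str:
    """Source of a Python program that prints s (s assumed alphanumeric)."""
    return "print(" + repr(s) + ", end='')"


def run(program: str) -> str:
    """Execute program text and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program)
    return buffer.getvalue()


for s in ["ab" * 32, "4c1j5b2p0cv4w1x8", "x"]:
    program = literal_program(s)
    assert run(program) == s
    # The second number printed is the same for every s: it is the constant c
    # for this particular way of writing the program.
    print(len(s), len(program) - len(s))
```

The constant obtained this way is only an upper bound on the overhead for this language; a cleverer fixed wrapper could be shorter.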
Incomputability of Kolmogorov complexity
The first result is that there is no way to compute K.
Theorem: K is not a computable function.
In other words, there is no program which takes a string s as input and produces the integer K(s) as output. We show this by contradiction by making a program that creates a string that should only be able to be created by a longer program. Suppose there is a program
function KolmogorovComplexity(string s)
that takes as input a string s and returns K(s). Now, consider the program
function GenerateComplexString(int n)
    for i = 1 to infinity:
        for each string s of length exactly i
            if KolmogorovComplexity(s) >= n
                return s
This program calls KolmogorovComplexity as a subroutine. The program tries every string, starting with the shortest, until it finds a string with complexity at least n (if there is one), then returns that string (or goes into an infinite loop if there is no such string). Clearly there is always at least one such string for any n, as otherwise all possible strings (infinitely many) could be generated by the (finitely many) programs with lower complexity, so GenerateComplexString must always return. Therefore, given any positive integer n, it produces a string with Kolmogorov complexity at least as great as n. The program itself has a fixed length U. The input to the program GenerateComplexString is an integer n. Here, the size of n is measured by the number of bits required to represent n, which is log_{2}(n). Now, consider the following program:
function GenerateParadoxicalString()
    return GenerateComplexString(n_{0})
This program calls GenerateComplexString as a subroutine, and also has a free parameter n_{0}. The program outputs a string s whose complexity is at least n_{0}. By an auspicious choice of the parameter n_{0}, we will arrive at a contradiction. To choose this value, note that s is described by the program GenerateParadoxicalString whose length is at most
 $U + \log_2(n_0) + C$
where C is the "overhead" added by the program GenerateParadoxicalString. Since n grows faster than log_{2}(n), there must exist a value n_{0} such that
 $U + \log_2(n_0) + C < n_0.$
But this contradicts the definition of s as having a complexity at least n_{0}. That is, by the definition of K(s), the string s returned by GenerateParadoxicalString is only supposed to be able to be generated by a program of length n_{0} or longer, but GenerateParadoxicalString is shorter than n_{0}. Thus the program named "KolmogorovComplexity" cannot actually computably find the complexity of arbitrary strings.
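Two ingredients of the argument can be tried out in code: the search order used by GenerateComplexString, and the existence of a suitable n_{0}. The sketch below is illustrative only. Since no computable KolmogorovComplexity exists, the search takes a caller-supplied stand-in `complexity` function (an assumption of this example), and the sizes U = 1000 and C = 100 used at the end are hypothetical numbers chosen for the demonstration.

```python
import math
from itertools import count, product


def generate_complex_string(n, complexity, alphabet="01"):
    """Search strings in order of length; return the first with complexity(s) >= n.

    `complexity` is a stand-in for the assumed subroutine, not the real K.
    """
    for length in count(1):
        for chars in product(alphabet, repeat=length):
            s = "".join(chars)
            if complexity(s) >= n:
                return s


def smallest_n0(U, C):
    """Smallest positive integer n0 with U + log2(n0) + C < n0."""
    n0 = 1
    while U + math.log2(n0) + C >= n0:
        n0 += 1
    return n0


# Using string length as a (wrong but computable) stand-in for complexity.
print(generate_complex_string(3, len))  # 000

# Hypothetical program sizes, measured in bits.
print(smallest_n0(1000, 100))  # 1111
```

The loop in `smallest_n0` always terminates because n grows faster than log_{2}(n), which is the fact the proof relies on.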
This is a proof by contradiction, where the contradiction is similar to the Berry paradox.[3]
In the programming language community there is a corollary known as the full employment theorem, stating that there is no perfect size-optimizing compiler.
Chain rule for Kolmogorov complexity
The chain rule for Kolmogorov complexity states that
 $K(X,Y) = K(X) + K(Y|X) + O(\log(K(X,Y))).$
It states that the shortest program that reproduces X and Y is no more than a logarithmic term larger than a program to reproduce X and a program to reproduce Y given X. Using this statement, one can define an analogue of mutual information for Kolmogorov complexity.
Compression
It is straightforward to compute upper bounds for $K(s)$ – simply compress the string $s$ with some method, implement the corresponding decompressor in the chosen language, concatenate the decompressor to the compressed string, and measure the length of the resulting string.
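As a rough illustration of this recipe, the sketch below uses Python's standard `zlib` module as the compression method; that choice, and the name `compressed_size_bits`, are assumptions of the example. The size of the decompressor is a fixed constant and is left out of the count.

```python
import random
import zlib


def compressed_size_bits(data: bytes) -> int:
    """Length in bits of zlib's compressed form of data (decompressor not counted)."""
    return 8 * len(zlib.compress(data, 9))


regular = b"ab" * 5000

# Pseudorandom bytes from a fixed seed, so the run is repeatable.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10000))

# The compressed form really is a description: it decodes back to the input.
assert zlib.decompress(zlib.compress(regular, 9)) == regular

print(compressed_size_bits(regular), "bits for", 8 * len(regular), "bits of regular input")
print(compressed_size_bits(noisy), "bits for", 8 * len(noisy), "bits of pseudorandom input")
```

The second figure shows how loose such a bound can be: the pseudorandom bytes were produced by a short program with a fixed seed, so their true complexity is small, yet a general-purpose compressor finds nothing to exploit.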
A string s is compressible by a number c if it has a description whose length does not exceed $|s| - c$. This is equivalent to saying that $K(s) \le |s| - c$. Otherwise, s is incompressible by c. A string incompressible by 1 is said to be simply incompressible. By the pigeonhole principle, which applies because every compressed string maps to only one uncompressed string, incompressible strings must exist, since there are $2^n$ bit strings of length n but only $2^n - 1$ shorter strings, that is, strings of length less than n (i.e. with length 0, 1, ..., n − 1).^{[10]}
For the same reason, most strings are complex in the sense that they cannot be significantly compressed: $K(s)$ is not much smaller than $|s|$, the length of s in bits. To make this precise, fix a value of n. There are $2^n$ bitstrings of length n. The uniform probability distribution on the space of these bitstrings assigns exactly equal weight $2^{-n}$ to each string of length n.
Theorem: With the uniform probability distribution on the space of bitstrings of length n, the probability that a string is incompressible by c is at least $1 - 2^{-c+1} + 2^{-n}$.
To prove the theorem, note that the number of descriptions of length not exceeding $n - c$ is given by the geometric series:
 $1 + 2 + 2^2 + \cdots + 2^{n-c} = 2^{n-c+1} - 1.$
There remain at least
 $2^n - 2^{n-c+1} + 1$
bitstrings of length n that are incompressible by c. To determine the probability, divide by $2^n$.
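The counting in this proof can be verified with exact arithmetic for particular values. The sketch below is an illustration assuming 1 ≤ c ≤ n; the function names are introduced here. It counts the bitstrings of length at most n − c, subtracts them from the 2^n strings of length n, and compares the resulting fraction with the closed form 1 − 2^(−c+1) + 2^(−n).

```python
from fractions import Fraction


def short_descriptions(n: int, c: int) -> int:
    """Number of bitstrings of length 0, 1, ..., n - c."""
    return sum(2**k for k in range(n - c + 1))


def incompressible_lower_bound(n: int, c: int) -> Fraction:
    """Lower bound on the fraction of length-n strings incompressible by c."""
    remaining = 2**n - short_descriptions(n, c)
    return Fraction(remaining, 2**n)


n, c = 10, 3
# The geometric series sums to 2^(n-c+1) - 1.
assert short_descriptions(n, c) == 2**(n - c + 1) - 1
# Dividing the remaining strings by 2^n gives the bound in the theorem.
assert incompressible_lower_bound(n, c) == 1 - Fraction(1, 2**(c - 1)) + Fraction(1, 2**n)
print(incompressible_lower_bound(n, c))  # 769/1024
```

For n = 10 and c = 3, more than three quarters of all strings cannot be shortened by three or more bits.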
Chaitin's incompleteness theorem
We know that, in the set of all possible strings, most strings are complex in the sense that they cannot be described in any significantly "compressed" way. However, it turns out that the fact that a specific string is complex cannot be formally proven, if the complexity of the string is above a certain threshold. The precise formalization is as follows. First, fix a particular axiomatic system S for the natural numbers. The axiomatic system has to be powerful enough so that, to certain assertions A about complexity of strings, one can associate a formula F_{A} in S. This association must have the following property:
if F_{A} is provable from the axioms of S, then the corresponding assertion A must be true. This "formalization" can be achieved, either by an artificial encoding such as a Gödel numbering, or by a formalization which more clearly respects the intended interpretation of S.
Theorem: There exists a constant L (which only depends on the particular axiomatic system and the choice of description language) such that there does not exist a string s for which the statement
 $K(s) \geq L$ (as formalized in S) can be proven within the axiomatic system S.
Note that, by the abundance of nearly incompressible strings, the vast majority of those statements must be true.
The proof of this result is modeled on a self-referential construction used in Berry's paradox. The proof is by contradiction. If the theorem were false, then
 Assumption (X): For any integer n there exists a string s for which there is a proof in S of the formula "K(s) ≥ n" (which we assume can be formalized in S).
We can find an effective enumeration of all the formal proofs in S by some procedure
function NthProof(int n)
which takes as input n and outputs some proof. This function enumerates all proofs. Some of these are proofs for formulas we do not care about here, since every possible proof in the language of S is produced for some n. Some of these are complexity formulas of the form K(s) ≥ n where s and n are constants in the language of S. There is a program
function NthProofProvesComplexityFormula(int n)
which determines whether the nth proof actually proves a complexity formula K(s) ≥ L. The strings s, and the integer L in turn, are computable by programs:
function StringNthProof(int n)
function ComplexityLowerBoundNthProof(int n)
Consider the following program
function GenerateProvablyComplexString(int n)
    for i = 1 to infinity:
        if NthProofProvesComplexityFormula(i) and ComplexityLowerBoundNthProof(i) ≥ n
            return StringNthProof(i)
Given an n, this program tries every proof until it finds a string and a proof in the formal system S of the formula K(s) ≥ L for some L ≥ n. The program terminates by our Assumption (X). Now, this program has a length U. There is an integer n_{0} such that U + log_{2}(n_{0}) + C < n_{0}, where C is the overhead cost of
function GenerateProvablyParadoxicalString()
    return GenerateProvablyComplexString(n_{0})
(note that n_{0} is hard-coded into the above function, and the summand log_{2}(n_{0}) already allows for its encoding). The program GenerateProvablyParadoxicalString outputs a string s for which there exists an L such that K(s) ≥ L can be formally proved in S with L ≥ n_{0}. In particular, K(s) ≥ n_{0} is true. However, s is also described by a program of length U + log_{2}(n_{0}) + C, so its complexity is less than n_{0}. This contradiction proves Assumption (X) cannot hold.
Similar ideas are used to prove the properties of Chaitin's constant.
Minimum message length
The minimum message length principle of statistical and inductive inference and machine learning was developed by C.S. Wallace and D.M. Boulton in 1968. MML is Bayesian (i.e. it incorporates prior beliefs) and information-theoretic. It has the desirable properties of statistical invariance (i.e. the inference transforms with a reparametrisation, such as from polar coordinates to Cartesian coordinates), statistical consistency (i.e. even for very hard problems, MML will converge to any underlying model) and efficiency (i.e. the MML model will converge to any true underlying model about as quickly as is possible). C.S. Wallace and D.L. Dowe (1999) showed a formal connection between MML and algorithmic information theory (or Kolmogorov complexity).
Kolmogorov randomness
Kolmogorov randomness, also called algorithmic randomness, defines a string (usually of bits) as being random if and only if every computer program that can produce that string is at least as long as the string itself. To make this precise, a universal computer (or universal Turing machine) must be specified, so that "program" means a program for this universal machine. A random string in this sense is "incompressible" in that it is impossible to "compress" the string into a program whose length is shorter than the length of the string itself. A counting argument is used to show that, for any universal computer, there is at least one algorithmically random string of each length. Whether any particular string is random, however, depends on the specific universal computer that is chosen.
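Both the counting argument and the dependence on the chosen machine can be observed on a small scale. The sketch below is only an illustration: the two "machines" are hypothetical toy functions from programs to outputs, introduced here, and neither is universal. The counting argument nevertheless applies to any machine, because there are 2^n strings of length n but only 2^n − 1 programs shorter than n.

```python
from itertools import product


def bitstrings(length):
    """All bitstrings of the given length, in lexicographic order."""
    return ["".join(bits) for bits in product("01", repeat=length)]


def strings_without_shorter_program(n, machine):
    """Strings of length n that no program shorter than n produces on `machine`."""
    outputs = {machine(p) for k in range(n) for p in bitstrings(k)}
    return [s for s in bitstrings(n) if s not in outputs]


def doubling_machine(program):
    """Toy machine: outputs the program written twice."""
    return program + program


def padding_machine(program):
    """Toy machine: outputs the program followed by a single 0."""
    return program + "0"


random_for_doubling = strings_without_shorter_program(4, doubling_machine)
random_for_padding = strings_without_shorter_program(4, padding_machine)

print(len(random_for_doubling), len(random_for_padding))  # 12 8
# The same string can be compressible on one machine and not on the other.
print("0101" in random_for_doubling, "0101" in random_for_padding)  # False True
```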
This definition can be extended to define a notion of randomness for infinite sequences from a finite alphabet. These algorithmically random sequences can be defined in three equivalent ways. One way uses an effective analogue of measure theory; another uses effective martingales. The third way defines an infinite sequence to be random if the prefix-free Kolmogorov complexity of its initial segments grows quickly enough: there must be a constant c such that the complexity of an initial segment of length n is always at least n−c. This definition, unlike the definition of randomness for a finite string, is not affected by which universal machine is used to define prefix-free Kolmogorov complexity.
Relation to entropy
For dynamical systems, entropy rate and algorithmic complexity of the trajectories are related by a theorem of Brudno, that the equality K(x;T) = h(T) holds for almost all x.^{[11]}
It can be shown^{[12]} that for the output of Markov information sources, Kolmogorov complexity is related to the entropy of the information source. More precisely, the Kolmogorov complexity of the output of a Markov information source, normalized by the length of the output, converges almost surely (as the length of the output goes to infinity) to the entropy of the source.
References
 Brudno, A., Entropy and the complexity of the trajectories of a dynamical system, Transactions of the Moscow Mathematical Society, 2:127–151, 1983.
 Burgin, M. (1982), "Generalized Kolmogorov complexity and duality in theory of computations", Notices of the Russian Academy of Sciences, v.25, No. 3, pp. 19–23.
 Cover, Thomas M. and Thomas, Joy A., Elements of information theory, 1st Edition. New York: Wiley-Interscience, 1991. ISBN 0471062596. 2nd Edition. New York: Wiley-Interscience, 2006. ISBN 0471241954.
 Lajos, Rónyai and Gábor, Ivanyos and Réka, Szabó, Algoritmusok. TypoTeX, 1999. ISBN 9632790146
 Li, Ming and Vitányi, Paul, An Introduction to Kolmogorov Complexity and Its Applications, Springer, 1997. Introduction chapter full-text.
 Yu Manin, A Course in Mathematical Logic, Springer-Verlag, 1977. ISBN 9780720428445
 Sipser, Michael, Introduction to the Theory of Computation, PWS Publishing Company, 1997. ISBN 0534950973.
 Wallace, C. S. and Dowe, D. L., Minimum Message Length and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4, 1999.
External links
 The Legacy of Andrei Nikolaevich Kolmogorov
 Chaitin's online publications
 Solomonoff's IDSIA page
 J. Schmidhuber
 Ming Li and Paul Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd Edition, Springer Verlag, 1997.
 Tromp's lambda calculus computer model offers a concrete definition of K()
 Universal AI based on Kolmogorov Complexity, by M. Hutter. ISBN 3540221395
 Occam's razor pages.
 P. Grunwald, M. A. Pitt and I. J. Myung (ed.), ISBN 0262072629.