Various Properties of Sturmian Words

This overview paper is devoted to Sturmian words. The first part summarizes different characterizations of Sturmian words. Besides the well-known theorem of Hedlund and Morse, it also includes recent results on the characterization of Sturmian words using return words or palindromes. The second part deals with substitution invariant Sturmian words, where we present our recent results. We generalize one-sided Sturmian words using the cut-and-project scheme and give a full characterization of substitution invariant Sturmian words.


Introduction
In recent years, the combinatorial properties of finite and infinite words have become increasingly important in physics, biology, mathematics and computer science. One of the first impulses for extensive research in this field was the discovery of quasi-crystals.
Normal crystal structures show rotational and translational symmetry. In 1982, however, Dan Shechtman discovered an aperiodic structure (formed by rapidly quenched aluminum alloys) that has perfect long-range order but no three-dimensional translational periodicity (see e.g. [1] or [2]). Since then many stable and unstable aperiodic structures have been discovered. They are now known as quasi-crystals.
Aperiodic structures have been studied from various points of view, and there are numerous connections with applications beyond solid-state physics, such as pseudo-random number generators [3], pattern recognition and symbolic dynamical systems. Early results are reported in [4] or [5].
This paper is devoted to Sturmian words. Sturmian words are infinite words over a binary alphabet with exactly $n+1$ factors of length $n$ for each $n \ge 0$. They represent the simplest family of quasi-crystals. The history of Sturmian words dates back to the astronomer J. Bernoulli III [6]. He considered the sequence
$[(n+1)\alpha + 1/2] - [n\alpha + 1/2]$,
where $[x]$ denotes the integer part of $x$, $n \ge 1$ and $\alpha$ is a positive irrational. Based on the continued fraction expansion of $\alpha$, he gave (without proof) an explicit description of the terms of the sequence. There also exist some early works by Christoffel and Markoff. A. A. Markoff was the first to prove the validity of Bernoulli's description. He did so in his work [7], where he described the terms of the sequence $[(n+1)\alpha] - [n\alpha]$, $n \ge 1$ (later known as a mechanical sequence).
The first detailed investigation of Sturmian words is due to Hedlund and Morse [4], who studied such words from the point of view of symbolic dynamics and, in fact, introduced the term "Sturmian", named after the mathematician Charles François Sturm.
It turns out that there are several equivalent ways of constructing Sturmian words. We describe some of them and show the relationship with other notions from combinatorics on words, such as palindromes and return words. Then we make some notes on an extension of Sturmian words using the cut-and-project scheme. The next section is devoted to the important question of the invariance of Sturmian words under a substitution. In the last section we present some open problems related to generalizations of Sturmian words.

Sturmian words
An infinite one-sided word $w = w_0 w_1 w_2 w_3 \cdots$ is a sequence of letters from a finite set $A$, which is called an alphabet. We use the notation $\mathbb{N}$ for the set of non-negative integers. The set of all finite words over the alphabet $A$ is denoted by $A^*$. A finite word $u \in A^*$ is a factor of $w$ if there exist $0 \le k \le l$ such that $u = w_k \cdots w_l$. The empty word is denoted by $\varepsilon$. The set of factors of $w$ of length $n$ is written $L_n(w)$ and the set of all factors of $w$ is denoted by $L(w)$. The set $L(w)$ is often called the language of $w$.
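An infinite word $w$ is ultimately periodic if there exist a word $u$ and a non-empty word $v$ such that $w = uv^\omega$, where $v^\omega$ is the infinite concatenation of the word $v$. It is periodic if $u$ is the empty word. If the infinite word $w$ has neither of these forms, we say that it is aperiodic. We say that a finite word $u$ is a prefix (resp. suffix) of a finite word $v$ if $v = uz$ (resp. $v = zu$) for some word $z$, and that $u$ is a prefix of the infinite word $w$ if $u = w_0 w_1 \cdots w_l$ for some $l \ge 0$. The infinite word $w_l w_{l+1} w_{l+2} \cdots$, $l \ge 0$, is a suffix of $w$.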
An infinite word $w$ is uniformly recurrent if for every integer $k$ there exists an integer $l$ such that each word of $L_k(w)$ occurs in every factor of $w$ of length $l$.
The usual way of defining Sturmian words is via the complexity function. Let $w$ be an infinite word and let $C_w : \mathbb{N} \to \mathbb{N}$ be the mapping that assigns to each $n$ the number of factors of $w$ of length $n$, i.e. $C_w(n) = \# L_n(w)$. The word $w$ is Sturmian if $C_w(n) = n+1$ for all $n \ge 0$. There have been some attempts to extend Sturmian words to alphabets with more than two letters, for instance [8] or [9], but none of these constructions shows such nice properties as Sturmian words. However, the approach of Arnoux and Rauzy presented in [8] resulted in another interesting family of words, called Arnoux-Rauzy sequences.
Another combinatorial definition of Sturmian words is based on the distribution of letters in the word. Let $w$ be an infinite word, let $u, v \in L(w)$ be two factors of the same length $|u| = |v|$, and let the function $\delta$ be defined by $\delta(u,v) = \big|\, |u|_0 - |v|_0 \,\big|$, where $|u|_0$ denotes the number of occurrences of 0 in $u$. We say that $w$ is balanced if and only if $\delta(u,v) \le 1$ for all $u, v \in L(w)$ with $|u| = |v|$. Note that a balanced word over the alphabet $A = \{0,1\}$ is formed either by blocks of 0's between consecutive 1's, or by blocks of 1's between consecutive 0's. It is easy to see that in a balanced word the lengths of the blocks of 0's between two consecutive 1's (resp. blocks of 1's between two consecutive 0's) differ by at most 1. As we will see, aperiodic balanced words are exactly the Sturmian words.
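The balance condition can be tested directly on a finite word. The following is a minimal sketch, not taken from the cited literature: the helper names `factors` and `is_balanced` are hypothetical, and the test words are chosen only for illustration (the first is a prefix of the Fibonacci word).

```python
def factors(word: str, n: int) -> set:
    """Return the set of factors (contiguous blocks) of the given length."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def is_balanced(word: str) -> bool:
    """Check that any two factors of equal length differ by at most one 0."""
    for n in range(1, len(word) + 1):
        zero_counts = {u.count("0") for u in factors(word, n)}
        if max(zero_counts) - min(zero_counts) > 1:
            return False
    return True


print(is_balanced("0100101001001"))  # a prefix of the Fibonacci word
print(is_balanced("0011"))           # contains both 00 and 11
```

The second word fails the test because its factors 00 and 11 have the same length but differ by two occurrences of 0.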
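In the literature one can find several constructions of binary words. We start with the construction presented by Hedlund and Morse in [4]. For real numbers $\alpha \in (0,1)$ and $\beta \in [0,1)$, the mechanical words $\underline{s}_{\alpha,\beta}$ and $\overline{s}_{\alpha,\beta}$ are given by $\underline{s}_{\alpha,\beta}(n) = \lfloor \alpha(n+1) + \beta \rfloor - \lfloor \alpha n + \beta \rfloor$ and $\overline{s}_{\alpha,\beta}(n) = \lceil \alpha(n+1) + \beta \rceil - \lceil \alpha n + \beta \rceil$ for $n \ge 0$. For irrational $\alpha$ the two words can differ only if $\alpha n + \beta$ is an integer for some $n$. If we consider $\beta = 0$ and $n = 0$, then $\underline{s}_{\alpha,0}(0) = 0$ and $\overline{s}_{\alpha,0}(0) = 1$, and we obtain the important special case of mechanical words $\underline{s}_{\alpha,0} = 0c_\alpha$, $\overline{s}_{\alpha,0} = 1c_\alpha$. The infinite word $c_\alpha$ is called the characteristic sequence of $\alpha$.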
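Mechanical words are easy to generate numerically. The sketch below assumes the standard floor and ceiling definitions of the lower and upper mechanical words with slope $\alpha$ and intercept $\beta$; the function names and the particular slope are illustrative choices, and floating-point arithmetic is only reliable here for short prefixes.

```python
import math


def lower_mechanical(alpha: float, beta: float, length: int) -> list:
    """Lower mechanical word: floor(alpha*(n+1)+beta) - floor(alpha*n+beta)."""
    return [math.floor(alpha * (n + 1) + beta) - math.floor(alpha * n + beta)
            for n in range(length)]


def upper_mechanical(alpha: float, beta: float, length: int) -> list:
    """Upper mechanical word: ceil(alpha*(n+1)+beta) - ceil(alpha*n+beta)."""
    return [math.ceil(alpha * (n + 1) + beta) - math.ceil(alpha * n + beta)
            for n in range(length)]


alpha = (3 - math.sqrt(5)) / 2          # an irrational slope, for illustration
low = lower_mechanical(alpha, 0.0, 20)
up = upper_mechanical(alpha, 0.0, 20)

# With zero intercept the two words differ only in their first letter;
# the common tail is a prefix of the characteristic sequence.
characteristic = low[1:]
print(low[0], up[0], characteristic == up[1:])
```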
Crisp et al. study, in their work [10], another way of constructing characteristic sequences. Consider the integer lattice $\mathbb{Z}^2$ and the straight line $y = \rho x$, where $\rho$ is a positive irrational and $x \ge 0$. We label the intersections of the line $y = \rho x$ with the verticals of the grid by 0, and the intersections of $y = \rho x$ with the horizontals by 1. The sequence of labels forms the so-called cutting sequence and is denoted by $S_\rho$. It can be shown that $S_\rho = c_{\rho/(1+\rho)}$ (see e.g. [10]).
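The cutting sequence can be computed by sorting the crossing points of the line with the grid. The sketch below is an illustration under two assumptions: crossings are taken for $x > 0$ only, and the cutting sequence is compared with the characteristic sequence of slope $\rho/(1+\rho)$. The function names and the value of $\rho$ are hypothetical choices.

```python
import math


def cutting_sequence(rho: float, length: int) -> list:
    """Label crossings of y = rho*x, x > 0: 0 for verticals, 1 for horizontals."""
    crossings = []
    # The first `length` crossings involve at most `length` lines of each kind.
    for k in range(1, length + 1):
        crossings.append((float(k), 0))   # vertical line x = k
        crossings.append((k / rho, 1))    # horizontal line y = k, met at x = k/rho
    crossings.sort()
    return [label for _, label in crossings[:length]]


def characteristic_sequence(alpha: float, length: int) -> list:
    """Terms floor((n+1)*alpha) - floor(n*alpha) for n = 1, ..., length."""
    return [math.floor((n + 1) * alpha) - math.floor(n * alpha)
            for n in range(1, length + 1)]


rho = (math.sqrt(5) - 1) / 2
cut = cutting_sequence(rho, 50)
print(cut == characteristic_sequence(rho / (1 + rho), 50))
```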
One often encounters the term $r$-interval exchange map in connection with infinite words. Properties of words generated by the $r$-interval exchange map (called codings of the $r$-interval exchange map) are studied from different aspects in [11], [12] or [13]. Let us briefly mention the 2-interval exchange map and its relation to Sturmian words.
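Let $\alpha \in (0,1)$ be an irrational number and $x \in [0,1)$. Let $I_0 = [0, 1-\alpha)$ and $I_1 = [1-\alpha, 1)$ be the decomposition of the interval $[0,1)$. The map $T_\alpha$, given by the rule $T_\alpha(x) = x + \alpha$ if $x \in I_0$ and $T_\alpha(x) = x + \alpha - 1$ if $x \in I_1$, is called the 2-interval exchange map. It can be written in a more compact way as $T_\alpha(x) = x + \alpha - \lfloor x + \alpha \rfloor$. For the $n$-th iteration of the 2-interval exchange map $T_\alpha$ we obtain $T_\alpha^n(x) = x + n\alpha - \lfloor x + n\alpha \rfloor$. Since $\alpha$ is irrational, $T_\alpha$ does not have any fixed point. It is not difficult to check that $\lfloor x + (n+1)\alpha \rfloor - \lfloor x + n\alpha \rfloor = 1$ if and only if $T_\alpha^n(x) \in I_1$, which implies that the coding of the orbit of $x$ by the intervals $I_0$, $I_1$ is the mechanical word $\underline{s}_{\alpha,x}$.
Let us show the most famous example of a Sturmian word, the Fibonacci word.
Example 2.1: Let $\varphi$ be the map given by $\varphi: 0 \mapsto 01,\ 1 \mapsto 0$, with the property $\varphi(uv) = \varphi(u)\varphi(v)$ for any finite words $u$, $v$ over the alphabet {0, 1}. Define the $n$-th finite Fibonacci word $f_n$ in the following way: $f_0 = 0$ and, for all $n \ge 0$, $f_{n+1} = \varphi(f_n)$. Since $f_0 = 0$ and $\varphi(0)$ starts with 0, $f_n$ is a prefix of $f_{n+1}$ for each $n \ge 0$. The Fibonacci word is defined as the limit of the sequence of words $f_n$, and thus $\lim_{n \to \infty} f_n = 0100101001001010\cdots$. Note that further applications of the map $\varphi$ do not change the Fibonacci word. Observe that the length $|f_n|$ is the $n$-th element of the Fibonacci sequence given by $F_0 = 1$, $F_1 = 2$ and $F_{n+2} = F_{n+1} + F_n$ for all $n \ge 0$. It can be shown that the complexity of the Fibonacci word equals $n+1$, thus the word is Sturmian. The Fibonacci word also coincides with the characteristic sequence $c_\alpha$ of slope $\alpha = 1/\tau^2$, i.e. with the mechanical word of slope $1/\tau^2$ and zero intercept after its first letter is removed, where $\tau = (1+\sqrt{5})/2$ is the golden mean.
We conclude this part with a theorem by Hedlund and Morse [4], which states that Sturmian, balanced and mechanical words are indeed equivalent.
Theorem 2.1: (Hedlund, Morse). Let $w$ be an infinite word over the alphabet $A = \{0,1\}$. The following conditions are equivalent:
1. $w$ is Sturmian;
2. $w$ is balanced and aperiodic;
3. there exist an irrational $\alpha \in (0,1)$ and a real $\beta \in [0,1)$ such that $w_n = \underline{s}_{\alpha,\beta}(n)$ for all $n \ge 0$, or $w_n = \overline{s}_{\alpha,\beta}(n)$ for all $n \ge 0$.
There exist several proofs of the theorem. The original proof [4] is of a combinatorial nature, while another, by Lunnon and Pleasants [14], is based on geometrical considerations.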
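The Fibonacci example and the coding of the 2-interval exchange map can be compared on a finite prefix. The sketch below makes several assumptions: the substitution is $0 \mapsto 01$, $1 \mapsto 0$, the exchange map is the rotation $x \mapsto x + \alpha \bmod 1$ coded by 1 on $[1-\alpha, 1)$, and the starting point $x = \alpha = 1/\tau^2$ is chosen so that the coding reproduces the Fibonacci word. All function names are hypothetical, and the checks only cover short prefixes.

```python
import math


def fibonacci_prefix(length: int) -> str:
    """Prefix of the fixed point of 0 -> 01, 1 -> 0, obtained by iteration."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if letter == "0" else "0" for letter in word)
    return word[:length]


def complexity(word: str, n: int) -> int:
    """Number of distinct factors of length n occurring in a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


def exchange_coding(alpha: float, x: float, length: int) -> str:
    """Code the orbit of x: letter 1 when T^n(x) lies in [1 - alpha, 1)."""
    return "".join("1" if (x + n * alpha) % 1.0 >= 1 - alpha else "0"
                   for n in range(length))


fib = fibonacci_prefix(2000)
print([complexity(fib, n) for n in range(8)])     # n + 1 factors of length n

alpha = (3 - math.sqrt(5)) / 2                    # equals 1 / tau**2
print(exchange_coding(alpha, alpha, 30) == fib[:30])
```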

Other characteristics of Sturmian words
We have mentioned several equivalent definitions of Sturmian words: as words with minimal complexity, as balanced aperiodic sequences and as mechanical words. In the past few years, there have been successful attempts to find new characterizations of Sturmian words. The first one, which we describe below, uses return words, while the second uses palindromes.
Let $w$ be a one-sided infinite word and $u$ a factor of $w$. We say that a finite word $v$ is a return word over $u$ if $vu$ is a factor of $w$, $u$ is a prefix of $vu$ and there are exactly two occurrences of $u$ in $vu$. In other words, the return word $v$ over $u$ starts at an occurrence of $u$ and ends just before the next occurrence of $u$.
Example 2.2: Let 0100101001001010010100100101… be the Fibonacci word. The set of return words over 0101 contains the words 01010010 and 01010. For clarity, here is the Fibonacci word with the return words over 0101 indicated (a dot marks the start of each occurrence of 0101): 010·01010010·01010·01010010·0101…
Vuillon [15] observed that the number of return words indicates whether a word is Sturmian or not. He showed the following theorem.
Theorem 2.2: A binary infinite word $w$ is Sturmian if and only if the set of return words over $u$ has exactly two elements for every non-empty factor $u$ of $w$.
Note that the proof of the necessary condition includes a nice application of Rauzy graphs, which are often used for investigation of growth of the complexity in infinite words [8].
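Return words can be extracted from a finite prefix by locating consecutive occurrences of a factor. The sketch below is only an illustration on a prefix of the Fibonacci word; the function names are hypothetical, and a finite prefix can of course only suggest, not prove, the statement of Theorem 2.2.

```python
def fibonacci_prefix(length: int) -> str:
    """Prefix of the fixed point of 0 -> 01, 1 -> 0, obtained by iteration."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if letter == "0" else "0" for letter in word)
    return word[:length]


def return_words(word: str, u: str) -> set:
    """Blocks starting at an occurrence of u and ending before the next one."""
    positions = [i for i in range(len(word) - len(u) + 1)
                 if word.startswith(u, i)]
    return {word[p:q] for p, q in zip(positions, positions[1:])}


fib = fibonacci_prefix(500)
print(sorted(return_words(fib, "0101")))

# Every short factor of the prefix has exactly two return words.
short_factors = {fib[i:i + n] for n in range(1, 5) for i in range(100)}
print(all(len(return_words(fib, u)) == 2 for u in short_factors))
```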
Let us now focus on palindromes. A palindrome is a finite word that reads the same backwards as forwards. For instance, these are the first palindromes of the Fibonacci word: $\varepsilon$, 0, 1, 00, 010, 101, 1001, 00100, 01010, … In [16], Droubay and Pirillo characterized Sturmian words by the number of palindromes of even and odd length.

Theorem 2.3: An infinite word is Sturmian if and only if, for each nonnegative integer n, there is exactly one palindrome of length n, if n is even, and there are exactly two palindromes of length n, if n is odd.
The mapping that assigns to an integer n the number of palindromes of length n in a word is called the palindromic complexity.
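The palindromic complexity of a finite prefix can be computed by brute force. The following sketch uses a prefix of the Fibonacci word as an illustrative Sturmian example; the function names and the prefix length are arbitrary choices, and the prefix is assumed long enough to contain all factors of the lengths tested.

```python
def fibonacci_prefix(length: int) -> str:
    """Prefix of the fixed point of 0 -> 01, 1 -> 0, obtained by iteration."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if letter == "0" else "0" for letter in word)
    return word[:length]


def palindromic_complexity(word: str, n: int) -> int:
    """Number of distinct palindromic factors of length n in a finite word."""
    factors = {word[i:i + n] for i in range(len(word) - n + 1)}
    return sum(1 for u in factors if u == u[::-1])


fib = fibonacci_prefix(2000)
# One palindrome for even lengths, two for odd lengths.
print([palindromic_complexity(fib, n) for n in range(10)])
```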
In the context of palindromes and Sturmian words, let us draw attention to paper [17]. The authors proved that the number of distinct palindromes occurring in a finite word of length $n$, the empty word included, is less than or equal to $n+1$. Note that this holds for any words (not necessarily Sturmian) over arbitrary alphabets. Moreover, for Sturmian words it was shown in [17] that the number of all palindromes in a factor of length $n$ is equal to $n+1$. The reader may like to try finding words of length $n$ over a 2-letter alphabet with fewer than $n+1$ palindromes; this is not as trivial as it may appear to be.
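Both statements can be explored by exhaustive search on short words. The sketch below counts distinct palindromic factors, with the empty word included (an assumption needed for the bound $n+1$), and looks for the shortest binary word that falls below the bound. The function names are hypothetical; readers who prefer to attempt the exercise by hand may want to postpone running the search.

```python
from itertools import product
from typing import Optional


def palindromic_factors(word: str) -> set:
    """All distinct palindromic factors of a finite word, empty word included."""
    found = {""}
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            u = word[i:j]
            if u == u[::-1]:
                found.add(u)
    return found


def shortest_defective_length(max_length: int) -> Optional[int]:
    """Smallest n <= max_length admitting a binary word with < n + 1 palindromes."""
    for n in range(1, max_length + 1):
        for letters in product("01", repeat=n):
            if len(palindromic_factors("".join(letters))) < n + 1:
                return n
    return None


prefix = "0100101001001"                  # a factor of the Fibonacci word
print(len(palindromic_factors(prefix)) == len(prefix) + 1)
print(shortest_defective_length(9))
```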

Bidirectional Sturmian words
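In the previous sections we have outlined several characteristics of Sturmian words. However, we have limited ourselves, from the very first definitions, to one-sided infinite words. This restriction is typical of a large number of papers, in spite of the fact that the definitions of notions such as balanced words, mechanical words, etc. can be extended very naturally to both sides. The question is whether the theorems listed above still hold for bidirectional infinite words. Let us show a way of generating bidirectional infinite words from which it will be clear that such a generalization is possible.
Consider fixed positive irrational numbers $\varepsilon$, $\eta$ with $\varepsilon < 1$ and a real number $\beta$. The points of the set $\Sigma_{\varepsilon,\eta}(\beta-1,\beta] = \{a + b\eta \mid a, b \in \mathbb{Z},\ a - b\varepsilon \in (\beta-1,\beta]\}$, called a cut-and-project set with acceptance window $(\beta-1,\beta]$, form an increasing sequence $(x_n)_{n \in \mathbb{Z}}$ with $x_n = \lfloor n\varepsilon + \beta \rfloor + n\eta$. We can compute the lengths between two consecutive points of $(x_n)_{n \in \mathbb{Z}}$: $x_{n+1} - x_n = \eta + \lfloor (n+1)\varepsilon + \beta \rfloor - \lfloor n\varepsilon + \beta \rfloor$. The term $\lfloor (n+1)\varepsilon + \beta \rfloor - \lfloor n\varepsilon + \beta \rfloor$ takes only the values 0 and 1, so the only distances are $\eta$ and $1 + \eta$. The sequence $(w_n)_{n \in \mathbb{Z}}$, where $w_n = 1$ if $x_{n+1} - x_n = 1 + \eta$ and $w_n = 0$ otherwise, can be written as $\cdots w_{-2} w_{-1} | w_0 w_1 w_2 \cdots$, where | denotes a delimiter. Since the distances between the consecutive points in $\Sigma_{\varepsilon,\eta}(\beta-1,\beta]$ are ordered as the mechanical word $\underline{s}_{\varepsilon,\beta}$, the word $w_0 w_1 w_2 \cdots$ (i.e. the infinite word to the right of the delimiter) is a Sturmian word. From the construction it is clear that the language of $w_0 w_1 w_2 \cdots$ is the same as the language of $\cdots w_{-2} w_{-1}$. The complexity function $C_w$ has been defined for right-sided infinite words; however, the definition can be naturally extended to left-sided infinite words, say $w' = \cdots w_{-3} w_{-2} w_{-1}$, so the left-sided word is Sturmian as well.
There are a couple of papers dealing with cut-and-project sets. In [18] it is shown that a cut-and-project set has either two or three distances between adjacent points; two distances correspond to the case of the unit length of the acceptance window, and the distances then form a Sturmian word. On the other hand, the words corresponding to the cut-and-project sets with three distances are exactly those which arise from the coding of the 3-interval exchange map, and vice versa. In [19], [20] the authors study substitution properties and the substitutivity of cut-and-project sets.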
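The two-distance property of the cut-and-project construction can be observed numerically. The sketch below assumes that the set consists of the points $x_n = \lfloor n\varepsilon + \beta \rfloor + n\eta$, $n \in \mathbb{Z}$, which corresponds to the acceptance window $(\beta - 1, \beta]$; the parameter values and function names are hypothetical and serve only as an illustration.

```python
import math


def cut_and_project_points(eps: float, eta: float, beta: float,
                           n_min: int, n_max: int) -> list:
    """Points x_n = floor(n*eps + beta) + n*eta for n_min <= n <= n_max."""
    return [math.floor(n * eps + beta) + n * eta
            for n in range(n_min, n_max + 1)]


def code_gaps(points: list, eta: float) -> list:
    """Letter 1 for a gap of length 1 + eta, letter 0 for a gap of length eta."""
    return [round(b - a - eta) for a, b in zip(points, points[1:])]


eps = (3 - math.sqrt(5)) / 2      # irrational, between 0 and 1
eta = math.sqrt(2)                # positive irrational
beta = 0.3
n_min, n_max = -20, 20

word = code_gaps(cut_and_project_points(eps, eta, beta, n_min, n_max), eta)
left, right = word[:-n_min], word[-n_min:]    # letters before and after |

mechanical = [math.floor((n + 1) * eps + beta) - math.floor(n * eps + beta)
              for n in range(n_max)]
print(set(word), right == mechanical)
```

The right-hand part of the coded word agrees with the mechanical word of slope `eps` and intercept `beta`, while the left-hand part extends it to negative indices.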

Sturmian words and substitutions
Let us take a look at Sturmian words from a different point of view. One can see in Example 2.1 that there exist Sturmian words which are generated by certain maps (here we will call them substitutions). In fact, the Fibonacci word mentioned there is a fixed point of the map $\varphi$. The question is whether this is a general property of all Sturmian words, or whether only a certain class of Sturmian words is invariant under substitutions. Let us now state the basic definitions and then give an overview of the most interesting results.
A morphism $\varphi$ is a map of $A^*$ into itself satisfying $\varphi(uv) = \varphi(u)\varphi(v)$ for each $u, v \in A^*$. The morphism is called non-erasing if $\varphi(a_i)$ is not the empty word for any $a_i \in A$. A non-erasing morphism $\varphi$ is called a substitution.
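Czech Technical University in Prague Acta Polytechnica Vol. 45 No. 5/2005
The action of $\varphi$ can be extended to bidirectional infinite words by $\varphi(\cdots w_{-2} w_{-1} | w_0 w_1 \cdots) = \cdots \varphi(w_{-2}) \varphi(w_{-1}) | \varphi(w_0) \varphi(w_1) \cdots$. We say that the word $(w_n)_{n \in \mathbb{Z}}$ is invariant under the substitution $\varphi$ (or is a fixed point of $\varphi$) if $\varphi(\cdots w_{-2} w_{-1} | w_0 w_1 \cdots) = \cdots w_{-2} w_{-1} | w_0 w_1 \cdots$.
Suppose that we have a substitution $\varphi$ and there exist letters $a_i, a_j \in A$ such that $\varphi(a_i) = u a_i$ and $\varphi(a_j) = a_j v$ for some non-empty words $u, v \in A^*$. Then by repeated application of $\varphi$ to the pair $a_i | a_j$ of letters separated by the delimiter | we obtain the words $\varphi^n(a_i) | \varphi^n(a_j)$. Clearly $\varphi^n(a_i) = u_n a_i$ and $\varphi^n(a_j) = a_j v_n$ for certain words $u_n, v_n \in A^*$. The limit of $\varphi^n(a_i) | \varphi^n(a_j)$ for $n \to \infty$ is an infinite bidirectional word $(w_n)_{n \in \mathbb{Z}}$, and we say that $\varphi$ generates the word $(w_n)_{n \in \mathbb{Z}}$.
Let us define a weaker notion of a substitutive word. We say that $(w_n)_{n \in \mathbb{Z}} = \cdots w_{-2} w_{-1} | w_0 w_1 w_2 \cdots$ is a substitutive word if there exist a substitution $\varphi$ on an alphabet $B$ with a fixed point $(v_n)_{n \in \mathbb{Z}}$ and a map $\chi : B \to A$ such that $w_n = \chi(v_n)$ for each $n \in \mathbb{Z}$. Note that all fixed points of a substitution are substitutive. The converse is not true.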
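The generation of a bidirectional word from a pair of letters can be illustrated on a small example. The sketch below uses the square of the Fibonacci substitution, $0 \mapsto 010$, $1 \mapsto 01$, and the pair $1|0$; both are illustrative choices, picked because the image of 1 ends with 1 and the image of 0 starts with 0. The function names are hypothetical.

```python
def apply_substitution(sub: dict, word: str) -> str:
    """Apply a substitution letter by letter."""
    return "".join(sub[letter] for letter in word)


def iterate_pair(sub: dict, left: str, right: str, n: int) -> tuple:
    """Apply the substitution n times to both sides of the pair left|right."""
    for _ in range(n):
        left = apply_substitution(sub, left)
        right = apply_substitution(sub, right)
    return left, right


# Square of the Fibonacci substitution 0 -> 01, 1 -> 0.
phi2 = {"0": "010", "1": "01"}

for n in range(4):
    left, right = iterate_pair(phi2, "1", "0", n)
    print(left + "|" + right)
```

Each left part is a suffix of the next one and each right part is a prefix of the next one, so the finite words grow on both sides of the delimiter towards a bidirectional fixed point.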
Let $A = \{a_1, \ldots, a_k\}$ be an alphabet. To a substitution $\varphi$ one may assign a substitution matrix $\mathbf{A} \in \mathbb{N}^{k \times k}$ in the following way: $(\mathbf{A})_{ij}$ = number of letters $a_j$ in the word $\varphi(a_i)$.
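A substitution matrix is obtained by counting letters in the images of the substitution. The sketch below assumes the convention that entry $(i, j)$ counts the occurrences of $a_j$ in $\varphi(a_i)$; the function name is hypothetical, and the Fibonacci substitution and its square are used only as examples.

```python
def substitution_matrix(sub: dict, alphabet: str) -> list:
    """Entry [i][j] counts occurrences of alphabet[j] in the image of alphabet[i]."""
    return [[sub[a].count(b) for b in alphabet] for a in alphabet]


fibonacci = {"0": "01", "1": "0"}
fibonacci_squared = {"0": "010", "1": "01"}

m = substitution_matrix(fibonacci, "01")
m2 = substitution_matrix(fibonacci_squared, "01")

# Composing the substitution with itself corresponds to squaring the matrix.
product = [[sum(m[i][k] * m[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
print(m, m2, product == m2)
```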
The problem of invariance under a substitution (or the weaker notion of substitutivity) has motivated many papers. There are some partial results, where the authors consider only one-sided Sturmian words, or characterize substitution invariant bidirectional Sturmian words in terms of the slope $\alpha$ when the intercept is $\beta = 0$.
Crisp et al. [10] continued the work of Brown [21] and studied substitution invariant cutting sequences. They proved that the cutting sequence $S_\rho$ (resp. the mechanical sequence $c_\alpha$) is substitution invariant if and only if the continued fraction expansion of $\rho$ (resp. $\alpha$) has a certain form. The author of [22] used some of their results and simplified their condition on the invariance of the characteristic sequence $c_\alpha$ under a substitution. He showed that $c_\alpha$, $\alpha \in (0,1)$, is invariant under a substitution $\varphi$ if and only if $\alpha$ is a quadratic irrational with conjugate $\alpha' \notin (0,1)$. Such an $\alpha$ is called a Sturm number. This result was shown independently by Allauzen [23].
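The Sturm number condition is easy to test for a quadratic irrational given in explicit form. The sketch below assumes the representation $\alpha = (p + q\sqrt{d})/r$ with integers $p$, $q$, $r$ and a non-square positive integer $d$, so that the conjugate is $(p - q\sqrt{d})/r$; the function name and this parametrization are hypothetical, and the caller is responsible for passing a genuine quadratic irrational.

```python
import math


def is_sturm_number(p: int, q: int, d: int, r: int) -> bool:
    """Test alpha = (p + q*sqrt(d)) / r: alpha in (0, 1), conjugate outside (0, 1)."""
    root = math.sqrt(d)
    alpha = (p + q * root) / r
    conjugate = (p - q * root) / r
    return 0 < alpha < 1 and not (0 < conjugate < 1)


# alpha = (3 - sqrt(5)) / 2, the slope of the Fibonacci word; conjugate > 1.
print(is_sturm_number(3, -1, 5, 2))

# alpha = (2 - sqrt(2)) / 4; its conjugate (2 + sqrt(2)) / 4 also lies in (0, 1).
print(is_sturm_number(2, -1, 2, 4))
```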
Parvaix [24] proved that bidirectional non-pointed Sturmian words with $\beta \ne 0$ are invariant under a substitution if and only if $\alpha$ is a Sturm number and the intercept $\beta$ belongs to the quadratic field $\mathbb{Q}(\alpha)$. A complete characterization of infinite one-sided substitution invariant Sturmian words was given by Yasutomi [25]. Berthé et al. [26] studied infinite words which arise from the coding of the 2-interval exchange map and gave an alternative proof of Yasutomi's result using Rauzy fractals associated with invertible primitive substitutions. The authors also defined, for every fixed Sturm number $\alpha$, a matrix $M_\alpha \in \mathbb{Q}^{2 \times 2}$ that is called the generating matrix of $\alpha$ and is closely related to the smallest solution of a Pell equation. They showed that a Sturmian sequence $s_{\alpha,\beta}$ ($\alpha$ a Sturm number, $\beta \in [0,1)$) is a fixed point of a substitution with substitution matrix $\mathbf{A}$ if and only if $\mathbf{A}$ has the form $M_\alpha^{l}$ for some $l \ge 1$.
We complete this overview with a few notes on paper [27]. The authors completely solved the question of the substitution invariance of pointed bidirectional Sturmian words. The main theorem shown in the paper is the following.
Theorem 3.1: Let $\alpha$ be an irrational number, $\alpha \in (0,1)$. The pointed bidirectional Sturmian word with slope $\alpha$ and intercept $\beta$ is invariant under a non-trivial substitution if and only if three conditions on $\alpha$ and $\beta$, stated in [27], are satisfied.
Note that this result is analogous to those derived in [25] and [26] for the one-sided case.
The proof presented in [27] is constructive and is based on the cut-and-project scheme sketched in the section devoted to bidirectional words. This approach had not previously been used in the study of substitution invariant Sturmian words. It turns out to be a good choice because the proof is simple. One of the advantages of the cut-and-project scheme is that the more difficult parts of the proof can be illustrated with vivid examples, which makes the whole paper more comprehensible.
We believe that methods similar to those in [27] can solve the question of the substitution invariance of words over a 3-letter alphabet which arise from the coding of the 3-interval exchange map.

Open problems
We have summarized several interesting properties of Sturmian words, and here we would like to highlight how results about Sturmian words can help further advances in the field of aperiodic words. As Sturmian words are the simplest aperiodic structures, the question of generalizing results obtained for Sturmian words is particularly interesting. The direct generalization of Sturmian words are infinite words which arise from the coding of a 3-interval (resp. $r$-interval) exchange map. We believe that results and techniques used in the study of Sturmian words can be applied with success to open problems connected with words coding a 3-interval exchange map. Below is a list of the most interesting issues.
1. A necessary and sufficient condition for the substitution invariance of the words coding a 3-interval exchange map is still not known. The problem of finding a condition similar to that of Theorem 3.1 is still open, but the techniques used in [27] may bring some answers.
2. Giving a description of the substitution matrices of substitutions which generate the words coding a 3-interval exchange map is a problem closely related to the previous one. The question of finding generating matrices of such substitutions seems to be very challenging. This issue was solved in [26] for Sturmian words.
3. Although a description of palindromes in the words coding a 3-interval exchange map is already known [17], we believe that the use of the cut-and-project scheme can give an alternative proof of this result. So far, the palindromic complexity has been described only for a 2-interval exchange map (see Theorem 2.3) and a 3-interval exchange map [28], but we believe that a general description can be obtained by considerations based on the cut-and-project scheme.
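Let $\alpha$, $\beta$ be real numbers, where $\alpha \in (0,1)$ and $\beta \in [0,1)$. The sequences $\underline{s}_{\alpha,\beta}(n) = \lfloor \alpha(n+1) + \beta \rfloor - \lfloor \alpha n + \beta \rfloor$ and $\overline{s}_{\alpha,\beta}(n) = \lceil \alpha(n+1) + \beta \rceil - \lceil \alpha n + \beta \rceil$, $n \ge 0$, are usually referred to as mechanical words; $\alpha$ is called the slope and $\beta$ the intercept. The mechanical words $\underline{s}_{\alpha,\beta}$ resp. $\overline{s}_{\alpha,\beta}$ have a nice geometrical interpretation. Consider the integer lattice $\mathbb{Z}^2$, the straight line $y = \alpha x + \beta$, $x \ge 0$, and two sequences of integer points $X_n = (n, \lfloor \alpha n + \beta \rfloor)$ and $Y_n = (n, \lceil \alpha n + \beta \rceil)$. The points $X_n$ are the integer points of the lattice just below the line $y = \alpha x + \beta$, and the points $Y_n$ are the points just above the line. If $\underline{s}_{\alpha,\beta}(n) = 0$ then the points $X_n$ and $X_{n+1}$ lie on a horizontal line, and if $\underline{s}_{\alpha,\beta}(n) = 1$ then they lie on a diagonal line. The same holds for $\overline{s}_{\alpha,\beta}(n)$ and $Y_n$. In fact, the sequences $\underline{s}_{\alpha,\beta}$, $\overline{s}_{\alpha,\beta}$ are codings of a discretization of a straight line. Note that if $\alpha$ is irrational then $\underline{s}_{\alpha,\beta}$ and $\overline{s}_{\alpha,\beta}$ differ in at most two values of $n$. Clearly, this can happen only if $n\alpha + \beta$ is an integer for some $n$.
Consider fixed positive irrational numbers $\eta$, $\varepsilon$, a real number $\beta$ and the set $\Sigma_{\varepsilon,\eta}(\beta-1,\beta] = \{a + b\eta \mid a, b \in \mathbb{Z},\ a - b\varepsilon \in (\beta-1,\beta]\}$, i.e. any element of $\Sigma_{\varepsilon,\eta}(\beta-1,\beta]$ has the form $x_n = \lfloor n\varepsilon + \beta \rfloor + n\eta$ for some $n \in \mathbb{Z}$. Using this fact we can write $x_{n+1} - x_n = \eta + \lfloor (n+1)\varepsilon + \beta \rfloor - \lfloor n\varepsilon + \beta \rfloor$, where for $\varepsilon \in (0,1)$ the term $\lfloor (n+1)\varepsilon + \beta \rfloor - \lfloor n\varepsilon + \beta \rfloor$ takes only two values 0 and 1 for each $n \in \mathbb{Z}$. Thus there are only two distances between neighbors in $\Sigma_{\varepsilon,\eta}(\beta-1,\beta]$, namely $1 + \eta$ and $\eta$. Let us define a sequence $(w_n)_{n \in \mathbb{Z}}$ of 0's and 1's by $w_n = 1$ if $x_{n+1} - x_n = 1 + \eta$ and $w_n = 0$ otherwise. The left-sided word $\cdots w_{-2} w_{-1}$ is also Sturmian. Since an arbitrary shift of the whole interval $(\beta-1,\beta]$ (i.e. we are keeping the unit length of the interval) does not change the set $\Sigma_{\varepsilon,\eta}(\beta-1,\beta]$, it just shifts the position of the delimiter, we can conclude that the word $(w_n)_{n \in \mathbb{Z}}$ is the pointed bidirectional infinite Sturmian word. From now on we will consider only bidirectional infinite words. Note that the set $\Sigma_{\varepsilon,\eta}(\beta-1,\beta]$ is called the cut-and-project set and the interval $(\beta-1,\beta]$ is an acceptance window.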