When it comes to the practical side of computer science, we often work with
**regular** and **context-free** languages. Regular languages are most commonly
met through the widely used **regular expressions**. The somewhat stronger
context-free grammars dominate in the field of compilers and various
language processors.

As we know from the Chomsky hierarchy of formal languages, the class of regular languages is a proper subset (a special case, so to speak) of the class of context-free languages. This means that every regular language can be described not only by finite automata, regular expressions or regular grammars, but also by context-free grammars, pushdown automata and even (for the hard-core folks) Turing machines.

In theory this is fine, but it’s not very practical. I mean, if you were developing an app that requires syntax checking, would you rather build some sort of linear bounded automaton or use a library for regular expressions?

Sometimes we want to prove that a language is regular, for example to show
that we don’t need to do magic to check the correct syntax of an email
address. The well-known **pumping lemma**, fairly straightforward as it is,
won’t help here: it can only be used to prove that a language is **not**
regular. The lemma gives a necessary condition for regularity, not a
sufficient one. That’s where the folks Myhill and Nerode come in!

The Myhill-Nerode theorem, on the contrary, provides a necessary and sufficient condition for a language to be regular! Yay! But as you can imagine, it’s not that simple :-P.

## Formal definition

**Definition 1.** Let Σ be an alphabet and ~ an equivalence relation on
Σ^{*}. The relation ~ is **right invariant** if for all *u*, *v*, *w* ∈
Σ^{*}:

*u* ~ *v* ⇒ *uw* ~ *vw*

**Definition 2.** Let *L* be a language (not necessarily regular) over Σ. We
define the relation ~_{L} on Σ^{*}, called **prefix equivalence**, as follows:

*u* ~_{L} *v* ⇔ ∀*w* ∈ Σ^{*}: (*uw* ∈ *L* ⇔ *vw* ∈ *L*)
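
To make the definition of prefix equivalence a bit more tangible, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the example language (strings over {a, b} that end in `ab`), the names `in_L`, `suffixes` and `distinguishing_suffix`, and the bound `max_len`. Since we can only try finitely many suffixes, finding no distinguishing suffix is just evidence that two strings are equivalent, not a proof.

```python
from itertools import product


def in_L(word: str) -> bool:
    """Membership test for the illustrative language: words ending in 'ab'."""
    return word.endswith("ab")


def suffixes(alphabet: str, max_len: int):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def distinguishing_suffix(u, v, member, alphabet="ab", max_len=4):
    """Return a suffix w with (uw in L) != (vw in L), or None if no such
    suffix of length <= max_len exists."""
    for w in suffixes(alphabet, max_len):
        if member(u + w) != member(v + w):
            return w
    return None


print(distinguishing_suffix("a", "b", in_L))   # a suffix that tells them apart
print(distinguishing_suffix("a", "ba", in_L))  # None: no short suffix separates them
```

Here `a` and `b` are not prefix equivalent, because appending `b` puts only the first one into the language, while `a` and `ba` cannot be told apart by any suffix.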

**Theorem 1.** (*Myhill-Nerode*) Let *L* be a language over Σ. Then these three
statements are equivalent:

1. *L* is accepted by some deterministic finite automaton.
2. *L* is the union of some of the equivalence classes of a **right invariant** equivalence relation of finite index.
3. The relation ~_{L} is of finite index.
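
To get a feel for why the first statement leads to the second, here is a small Python sketch. The automaton (strings over {a, b} with an even number of `a`s) and all the names in it (`DELTA`, `run`, `classes`, `right_invariant`) are made up for illustration. It treats two words as equivalent when they reach the same state, and it checks right invariance only on short words, so take it as a sanity check rather than a proof.

```python
from itertools import product

# Hypothetical DFA: accepts words over {a, b} with an even number of a's.
START = "even"
ACCEPTING = {"even"}
DELTA = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}


def run(word: str) -> str:
    """Return the state the DFA reaches after reading `word`."""
    state = START
    for ch in word:
        state = DELTA[(state, ch)]
    return state


def words(max_len: int):
    """Yield every word over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)


def classes(max_len: int) -> dict:
    """Group words by the state they reach: one class per reachable state."""
    groups = {}
    for w in words(max_len):
        groups.setdefault(run(w), []).append(w)
    return groups


def right_invariant(max_len: int) -> bool:
    """Check on short words: same state for u, v implies same state for uw, vw."""
    sample = list(words(max_len))
    return all(
        run(u + w) == run(v + w)
        for u in sample
        for v in sample
        if run(u) == run(v)
        for w in sample
    )


groups = classes(3)
print(sorted(groups))      # the classes, named after the states
print(right_invariant(3))  # True
```

The relation has at most as many classes as the automaton has states, and the accepted language is the union of the classes that belong to accepting states (here just the class `even`).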

## Informal description

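The heart of this theorem is the fact that a finite automaton has only a finite number of states. All it can remember about the part of the input read so far is the state it ended up in, so any two strings that lead to the same state behave identically under every continuation. A language accepted by a finite automaton therefore splits Σ^{*} into only finitely many such groups of strings. This is what the right invariant relation of finite index expresses.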

The prefix equivalence is itself a right invariant equivalence, one that is tied to a particular language. It says that two strings are ~_{L}-equivalent if, whenever we extend both with the same suffix, the results are either both in the language or both outside it. The Myhill-Nerode theorem says that a language is regular exactly when ~_{L} has a finite number of equivalence classes, i.e., when there are only finitely many kinds of prefix that the language can tell apart.
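
The "finitely many kinds of prefix" idea can be explored experimentally. The sketch below is only an approximation under stated assumptions: it looks at prefixes and suffixes up to a length `bound`, and counts how many prefixes behave differently on those suffixes. That number is a lower bound on the real index of ~_{L}. The two languages (words ending in `ab`, and { *a*^{n}*b*^{n} }) and the names `words`, `ends_with_ab`, `is_anbn` and `observed_classes` are picked purely for illustration.

```python
from itertools import product


def words(alphabet: str, max_len: int):
    """Yield every word over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def ends_with_ab(word: str) -> bool:
    """A regular language: words ending in 'ab'."""
    return word.endswith("ab")


def is_anbn(word: str) -> bool:
    """The language { a^n b^n | n >= 0 }."""
    n = len(word) // 2
    return word == "a" * n + "b" * n


def observed_classes(member, bound: int, alphabet: str = "ab") -> int:
    """Count prefixes (length <= bound) that differ on some suffix (length <= bound).

    Two prefixes with different signatures are certainly not prefix equivalent,
    so the result is a lower bound on the index of the relation.
    """
    suffix_list = list(words(alphabet, bound))
    signatures = {
        tuple(member(prefix + w) for w in suffix_list)
        for prefix in words(alphabet, bound)
    }
    return len(signatures)


for bound in (3, 4, 5, 6):
    print(bound, observed_classes(ends_with_ab, bound), observed_classes(is_anbn, bound))
```

For the regular language the count stays the same no matter how far we raise the bound, while for { *a*^{n}*b*^{n} } it keeps growing.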

### Example proof

Now a little example of how to show that a language is not regular by using
this theorem. Let’s take, for instance, the most classic context-free language
of all time, *L* = { *a*^{n}*b*^{n} | *n* ≥ 0 }.

We’ll be interested in the third statement of the Myhill-Nerode theorem, which
says that **the relation ~_{L} is of finite index**. We need to show that it
is not; the language is then not regular, since the Myhill-Nerode theorem
is an equivalence.

In this case, the most elegant way is a proof by contradiction. We will show
that *L* = { *a*^{n}*b*^{n} | *n* ≥ 0 } is not regular.

**Proof 1.** Suppose that *L* is a regular language. Then, by the
Myhill-Nerode theorem, ~_{L} is of finite index. Consider the infinite
sequence of words *a*^{1}, *a*^{2}, *a*^{3}, … over Σ, and let *i*, *j* ∈ *N*
be two natural numbers such that *i* ≠ *j*. We will use three words *u*, *v*,
*w* ∈ Σ^{*}.

Now let’s assign some values to strings *u*, *v* and *w*:

*u* = *a*^{i}, *v* = *a*^{j}, *w* = *b*^{i}.

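According to the definition of prefix equivalence (**definition 2**), if
*a*^{i} and *a*^{j} were ~_{L}-equivalent, we would have

*a*^{i}*b*^{i} ∈ *L* ⇔ *a*^{j}*b*^{i} ∈ *L*

But *a*^{i}*b*^{i} ∈ *L*, while *a*^{j}*b*^{i} ∉ *L* because *i* ≠ *j*. So no
two different words *a*^{i}, *a*^{j} from the sequence are
~_{L}-equivalent, and each of them lies in its own equivalence class. The
sequence is infinite, so the **index of the relation is infinite**. But that
contradicts the Myhill-Nerode theorem, which requires a finite index for a
regular language. Therefore, the language is **not** regular.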
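
The key step of the proof, that the suffix *b*^{i} separates *a*^{i} from *a*^{j} whenever *i* ≠ *j*, is easy to spot-check in Python. This is only a sketch: the names `in_L`, `witness_separates` and `all_pairwise_separated` and the upper `limit` are assumptions of the example, and checking finitely many pairs of course doesn’t replace the argument above.

```python
def in_L(word: str) -> bool:
    """Membership test for L = { a^n b^n | n >= 0 }."""
    n = len(word) // 2
    return word == "a" * n + "b" * n


def witness_separates(i: int, j: int) -> bool:
    """True if the suffix b^i puts a^i into L but keeps a^j out of it."""
    w = "b" * i
    return in_L("a" * i + w) and not in_L("a" * j + w)


def all_pairwise_separated(limit: int) -> bool:
    """Check every pair i != j with 0 <= i, j < limit."""
    return all(
        witness_separates(i, j)
        for i in range(limit)
        for j in range(limit)
        if i != j
    )


print(all_pairwise_separated(50))  # True
```

Every exponent below the limit gets its own equivalence class, and raising the limit never makes two of them collapse into one.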
