# Strings in Computer Science

You have probably used a string data type in some programming
language. In C it’s `<string.h>`

, in C++ `std::string`

, Python has them, even
PHP does! They’re useful and pretty straight-forward. The most beautiful thing is,
that there is a theoretical foundation for them. String operations like
reversation, concatenation, making sub-strings and, of course, the string itself
has a mathematical definition.

Let’s have a look.

**Definition 1.** String is a finite ordered sequence of characters. Let Σ be
an alphabet, then *s* ∈ Σ^{*} is a string over Σ.

The mysterious Σ^{*} is an alphabet iteration. That is a set of all
existing strings over Σ. Which makes it a infinite set, unless the alphabet is
empty. You see, all the strings fall within this definition. s = “abcc” might
be a string over alphabet Σ = {a, b, c}. One, sort of special case exists - an
empty string, that is usually denoted by ε and its length is 0. Now, let’s
proceed to some string operations. One of the most important is
**concatenation**.

**Definition 2.** Let w and x be strings over Σ. Then string y = wx is a
concatenation of these two. It’s also a string over Σ. You see, that
concatenation is not commutative. You cannot switch the strings and expect to
get the same result. Length of the new string |y| = |w| + |x| i.e. is equal to
a sum of lengths of the concatenated strings.

Concatenation is very important string operation. It allows you to connect
multiple strings together. When you have two or multiple strings connected, for
instance w, x, y, z ∈ Σ^{*} , z = wxy, you can say that w is a **prefix**
of z, y is a **suffix** of z and x is a **substring** of z. A **proper prefix**
of a string is not equal to the string itself, i.e. xy != ε and equivalently y
is a **proper suffix** if wx != ε.

**Definition 3.** Let x = a_{1}a_{2}…an be a string over Σ.
Reversal of the string, y = *reverse*(x) =
a_{n}…a_{2}a_{1}.

One rather trivial definition in the end, reverse string is the same sequence of characters backwards. The thing is, why can string reversal be defined this rather intuitive way? It’s because we can be absolutely sure, that each string has a finite number of symbols, i.e. we know the n.

## Note

It’s important to point out, that we’re talking about **theoretical strings**.
You can apply this theory to string literals in programming (like I did) and it
will work, but the strings in theory are far more abstract than that. It
actually depends on what is your alphabet. Let’s say, that a every symbol in Σ
represent a single processor instruction. If you create a string over that Σ,
you’ll get a sequence of instructions. You can do it multiple times and get
programs *x*, *y* ∈ Σ^{*}. Now, you can use concatenation, right? Then
*y* = *xz*, where y is also a program in which *x* and *z* are **subroutines**.

## Conclusion

Strings are useful, when you need to model any **sequential behavior**.
Computers are sequential machines (well, not that much lately, but in
principle) so there is plenty to cover with these couple of definitions!

## Sources

- Češka M., Vojnar T., Smrčka A., Teoretická informatika, Studijní opora.