GCSE Link: None

Regular expressions provide us with a concise language to describe patterns in strings.

Each regular expression (or regex) describes a set of strings (which can be finite or infinite) that follow some given pattern. Regular expressions are made up of literal characters (like a, Z or 1) and meta-characters (special symbols) which can be used on the literal characters.

Table 1 shows the functions of the meta-characters.

Table 1

Meta-character Function Examples
+ Match the previous character/group one or more times a+{"a", "aa", "aaa", ...}
ab+{"ab", "abb", "abbb", ...}
a+b+{"ab", "aab", "abb", "aaab", "aabb", "abbb", ...}
* Match the previous character/group zero or more times a*{"", "a", "aa", "aaa", ...}
ab*{"a", "ab", "abb", "abbb", ...}
a*b*{"", "a", "b", "aa", "ab", "bb", "aaa", "aab", "abb", "bbb" ...}
? Optionally match the previous character/group (i.e. either zero or one times) a?{"", "a"}
ab?{"a", "ab"}
a?b?{"", "a", "b", "ab"}
| Match either the left hand side or the right hand side a|b{"a", "b"}
a|b*{"a", "", "b", "bb", "bbb", ...}
aa?|b+{"a", "aa", "b", "bb", "bbb", ...}
() Creates a matching group (ab)+{"ab", "abab", "ababab", ...}
(a|b)*{"", "a", "b", "aa", "ab", "ba", "bb", ...}
(a?|bb)+{"", "a", "bb", "aa", "abb", "bba", "bbbb", ...}


We've now learned about two different ways of representing sets of strings: finite state machines and regular expressions. It turns out that they are equivalent: it is possible to convert any regular expression to a finite state machine and vice versa.

A regular language is a set of strings that can be defined using a regular expression (or alternatively using a FSM).

Diagram 1 shows a finite state machine.

Diagram 1



Write a regular expression to match the set of strings represented by the FSM in Diagram 1.

Firstly, identify the dead state, S5, and ignore any arrows that go into it.

Next, let's find the quickest ways to get from the initial state to the accepting state: AB (S0 → S1 → S4) and ACCB (S0 → S1 → S2 → S3 → S4). The regex to represent this would be AB|ACCB, or equivalently A(CC)?B.

Finally, we can have ACB at the end any number of times, as it gets us from S4 back to S4 (through S2 and S3), so we add (ACB)* to the end.

Therefore the final answer is A(CC)?B(ACB)* .