Table of Contents
III. Theory and Mathematical Background – 40%
B. Automata and language theory
2. Formal languages and grammars (regular and context free)
a. Definition: The users manual for the computational machines (see III.B.1.a) consist of the things you are allowed to do to them. To say it another way, all of the inputs you are allowed to use on a machine are said to be the machine’s language.
b. We studied context free grammars (CFG) originally to describe human languages, but they are useful for understanding computer operations that need to parse (e.g. compilers). A CFG is a series of statements in the form a®b | t. We call this statement a production. a and b are variables and t is terminal. Given the starting variable, the CFG parses the string until all elements have been reduced to only the terminal statements. Here is a very simple example.
A -> Aa | B
B -> BB | b
This CFG can produce a string that has any number of “a” preceded by at least one “b”. Note that it is common to represent variables as capital letters and terminals as lower case. We also often call the first variable on the first line the start variable.
c. Backus-Naur Form (BNF) is a very common metasyntax for CFGs. We enclose non-terminals in angle brackets < > (See question #12). We read the series of symbols ::= as “can be”, and the pipe symbol | as “or”. We use the normal meaning for ellipse (…); in question #12, it represents a series of letters and numbers.
d. Strings are also elements of a grammar or a language. When expressing strings in languages we sometimes use regular expressions. Boolean operators are legal in regular expressions and they have the typical meaning. To concatenate two sides of an expression we can use the symbol ° or we can use the same shorthand as multiplication. For example, we can write (0 È 1) ° 0 or we can just use the shorthand (0 È 1)0. The star operator * means that we can repeat that section of the expression infinite times. For example, 0*1 means as many zeros as we want (including none) followed by exactly one 1. We can also use it in complex situations: (0 È 1)*0 means any number of 0 or 1 followed by exactly one 0.
e. Regular languages Ì Context-free languages Ì Decidable languages Ì Recursively enumerable languages.
f. A regular language is a language that can be expressed by a regular expression and recognized by a DFA. We can use the pumping lemma to prove that a language is regular. Regular languages are closed under union, concatenation, and the star operation. The compliment of any regular language is also regular.
g. A Context-free language (CFL) is any language that can be expressed using a pushdown automata (PDA).
h. We use a different pumping lemma to show that a language is a CFL or is not a CFL. CFL are closed under concatenation and union. The compliment of a CFL is not necessarily a CFL.