Sunday, September 27, 2009

Principles of Programming Languages - Module 3

Module 3 
Describing Syntax  and Semantics

The study of programming languages, like the study of natural languages, can be divided into examinations of syntax and semantics.  The syntax of a programming language is the form of its expressions, statements, and program units.  Its semantics is the meaning of those expressions, statements, and program units.  For example, the syntax of a C if  statement is if () .  The semantics of this statement form is that if the current value of the expression is true, the embedded statement is selected for execution.

Syntax and semantics are closely related.  In a well-designed programming language, semantics, should follow directly from syntax; that is, the form of statement should strongly suggest what the statement is meant to accomplish.

Languages, whether natural (such as English) or artificial (such as Java), are sets of strings of characters from some alphabet.  The strings of a language are called sentences or statements.  The syntax rules of a language specify which strings of characters from the language's alphabet are in the language. 

Formal descriptions of the syntax of programming languages, often do not include descriptions of the lowest level syntactic units.  These small units are called lexemes.  The lexemes of a programming language include its identifiers, literals, operators, and special words.

A  token  of a language is a category of its lexemes.  An identifier is a token that can have lexemes, or instances.  In some cases, a token has only a single possible lexeme. 

For example:

Given  the C statement:  index = 2 * count + 20;

The lexemes and tokens of this statement

Lexemes        Tokens
index              identifier
=                     equal_sign
2                     int_literal
*                      mult_op
count              identifier
+                     plus_op
20                   int_literal
;                      semicolon

A Grammar for Simple Assignment Statements

Given:   a = b * ( a + c )

    =
      a =
                    a = *
                    a = b *
                    a = b * ( )
                    a = b * ( + )
                    a = b  * ( a + )
                    a = b * ( a +
                    a = b *  ( a + c)

Parse Trees

One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define.  These structures are called parse trees.

Given: a = b * ( a + c )

Ambiguity

A grammar that generates a sentence for which there are two or more distinct parse trees is said to be ambiguous.

Given: a = b *  a + c
Note:  The parse tree will be supplied by the teacher in class.

Operator Precedence

A parse tree of a sentence with multiple operators, regardless of the particular operators involved, has the rightmost operator in the expression at the lowest point in the parse tree, with the other operators in the tree moving progressively higher as one moves to the left in the expression. 

Static Semantics

The static semantics of a language is only indirectly related to the meaning of programs during execution; rather, it has to do with the legal forms of programs (syntax rather than semantics).  In many cases, the static semantic rules of a language state its type constraints.  Static semantics is so named because the analysis required to check these specifications can be done at compile time.

Denotational Semantics

This is the most rigorous widely know method for describing the meaning of programs.  The fundamental concept of denotational semantics is to define for each language entity both a mathematical object and a function that maps instances of that entity onto instances of the   mathematical object.

No comments:

Post a Comment

Related Posts with Thumbnails