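A definition consists of a non-terminal symbol followed by ::= and a series of productions separated by the pipe keyword | and followed by a semicolon ;, which acts as terminator:
S ::= the apple | an orange ;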
the apple
an orange
The symbol S (called a non-terminal) allows the generation of the symbols the apple as well as an orange (called terminals). The probability for the apple to be generated is 1 every 2 times, and the same holds for an orange: thus, when 2 productions occur, each has 1 chance in 2; when 5 occur, each has 1 in 5, etc.
Non-terminal symbols may also appear within productions, where each occurrence is expanded independently:
S ::= the Animal is eating a Animal ;
Animal ::= cat | dog ;
the cat is eating a cat
the cat is eating a dog
the dog is eating a cat
the dog is eating a dog
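The combinations above can be reproduced with a short Python sketch. It assumes a hypothetical toy encoding (not PolyGen's own): a grammar is a dictionary from non-terminal names to lists of productions, each production is a list of tokens, and any token starting with an uppercase letter is a non-terminal, as in the Nonterm lexical rule. Every occurrence of Animal is expanded independently, hence 2 * 2 = 4 sentences.

```python
from itertools import product

# Hypothetical toy encoding: non-terminal -> list of productions,
# each production being a list of tokens. Tokens starting with an
# uppercase letter are non-terminals (as in the Nonterm lexical rule).
Grammar = dict[str, list[list[str]]]

GRAMMAR: Grammar = {
    "S": [["the", "Animal", "is", "eating", "a", "Animal"]],
    "Animal": [["cat"], ["dog"]],
}


def expand(grammar: Grammar, symbol: str) -> list[str]:
    """Return every sentence derivable from `symbol` (no recursion support)."""
    if not symbol[0].isupper():
        return [symbol]  # terminal symbol: generated as-is
    sentences: list[str] = []
    for production in grammar[symbol]:
        # Each token is expanded independently; the cartesian product
        # combines the choices made at every position of the sequence.
        choices = [expand(grammar, token) for token in production]
        for combo in product(*choices):
            sentences.append(" ".join(combo))
    return sentences


if __name__ == "__main__":
    for sentence in expand(GRAMMAR, "S"):
        print(sentence)
```

Since every choice is uniform, each of the four sentences is generated with probability 1/2 * 1/2 = 1/4.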
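By default, the program considers the non-terminal symbol S as the starting one: every grammar must therefore define at least S, unless another starting symbol has been specified as an argument to the program.
Quotes allow any text, including the name of a non-terminal symbol or a keyword, to be generated as a terminal symbol:
S ::= a Pet called "Pet" ;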
Pet ::= cat | pig | dog ;
a cat called Pet
a pig called Pet
a dog called Pet
S ::= "(" (apple | orange) ")" ;
( apple )
( orange )
Within the right-hand side of a definition (after ::=), a subproduction of any form can be specified between round brackets:
S ::= an (apple | orange) is on the (table | desk) ;
an apple is on the table
an orange is on the table
an apple is on the desk
an orange is on the desk
A subproduction enclosed between square brackets is optional, i.e. it may or may not be generated:
S ::= an (apple | orange) is on the (table | desk) [in the (living | dining) room] ;
an apple is on the table
an apple is on the table in the living room
an apple is on the table in the dining room
an orange is on the table
an orange is on the table in the living room
etc.
Comments can be inserted between the (* and *) keywords; such text is completely ignored by PolyGen:
S ::= apple | orange (* | banana *) | mango ;
(* this is a comment too *)
apple
orange
mango
The keyword ^ can be prefixed, suffixed or infixed at any point within a production in order to make the program not insert a white space character in the output string:
S ::= "(" ^ (apple | orange) ^ ")" ;
(apple)
(orange)
S ::= "I" Verb ^ e Verb ^ ing ;
Verb ::= lov | hat ;
I love hating
I love loving
I hate hating
I hate loving
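The keyword _ stands for the empty production, formally called epsilon:
S ::= ball | _ ;
ball
_
Here, and in the following examples, _ in the output denotes the empty string. An optional subproduction is equivalent to an alternative between the subproduction itself and the empty production, so the following definition generates either ball or nothing as output:
S ::= [ball] ;
ball
_
The keyword +, when prefixed to a (sub)production (however nested), raises the probability for it to be generated with respect to the others of that very series; symmetrically, the minus keyword - lowers it. Any number of + and - keywords may be specified:
S ::= the cat is eating (+ an apple | - an orange | some meat | -- a lemon) ;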
the cat is eating an apple
the cat is eating an orange
the cat is eating some meat
the cat is eating a lemon
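The weighting can be sketched in Python as a rewriting of the alternatives into repeated copies. The sketch assumes that an alternative with score s = (number of +) - (number of -) is copied s - min(s) + 1 times, which matches the internal interpretation of S given right after (4, 2, 3 and 1 copies); the function name and the string-based parsing of modifiers are hypothetical.

```python
def expand_modifiers(alternatives: list[str]) -> list[str]:
    """Rewrite '+'/'-'-prefixed alternatives into plain repeated copies.

    Assumption: an alternative with score s (count of '+' minus count of '-')
    is copied s - lowest + 1 times, so the least likely one appears once.
    """
    parsed = []
    for alternative in alternatives:
        body = alternative.lstrip("+- ")  # text without the modifiers
        modifiers = alternative[: len(alternative) - len(body)]
        score = modifiers.count("+") - modifiers.count("-")
        parsed.append((score, body))
    lowest = min(score for score, _ in parsed)
    expanded: list[str] = []
    for score, body in parsed:
        expanded.extend([body] * (score - lowest + 1))
    return expanded


if __name__ == "__main__":
    series = ["+ an apple", "- an orange", "some meat", "-- a lemon"]
    print("S ::= the cat is eating (" + " | ".join(expand_modifiers(series)) + ") ;")
```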
S is internally interpreted as follows:
S ::= the cat is eating (an apple | an apple | an apple | an apple
| an orange | an orange
| some meat | some meat | some meat
| a lemon) ;
Thus the probability for an apple to be generated (4 out of 10) is higher than for some meat (3 out of 10), which is in turn higher than for an orange (2 out of 10) and, finally, for a lemon (1 out of 10).
Consider now the following grammar:
S ::= ugly cat | nice Animal ;
Animal ::= dog | bull | pig ;
PRODUCES
ugly cat
nice dog
nice bull
nice pig
The probability for ugly cat to be generated is 1 every 2 times, but it is not the same for nice dog, nice bull and nice pig, even though a user may find it reasonable for all of them to be generated with the same probability. This is due to ugly cat and nice Animal equally sharing the unit of probability of S: thus the chance for ugly cat to be generated is equal to the chance for nice Animal, i.e. for any one among nice dog, nice bull and nice pig. In the example above the probability distribution appears as follows:
output | probability
ugly cat | 1/2
nice dog | 1/2 * 1/3 = 1/6
nice bull | 1/2 * 1/3 = 1/6
nice pig | 1/2 * 1/3 = 1/6
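The multiplication of probabilities along nested choices can be checked with a small sketch based on Python's fractions module. The grammar encoding (a dict of token lists, with capitalized tokens as non-terminals) and the function name are hypothetical; the sketch only models uniform choice and plain sequencing, not labels or modifiers.

```python
from fractions import Fraction

# Hypothetical toy encoding: non-terminal -> list of productions (token lists).
GRAMMAR = {
    "S": [["ugly", "cat"], ["nice", "Animal"]],
    "Animal": [["dog"], ["bull"], ["pig"]],
}


def distribution(grammar, symbol):
    """Map every sentence derivable from `symbol` to its probability,
    assuming a uniform choice among the productions of each non-terminal."""
    if not symbol[0].isupper():
        return {symbol: Fraction(1)}  # terminal: always itself
    result = {}
    share = Fraction(1, len(grammar[symbol]))  # 1/n for each of n productions
    for production in grammar[symbol]:
        partial = {"": Fraction(1)}
        for token in production:
            token_dist = distribution(grammar, token)
            combined = {}
            for prefix, p in partial.items():
                for text, q in token_dist.items():
                    key = f"{prefix} {text}".strip()
                    combined[key] = combined.get(key, 0) + p * q
            partial = combined
        for sentence, p in partial.items():
            result[sentence] = result.get(sentence, 0) + share * p
    return result


if __name__ == "__main__":
    for sentence, p in distribution(GRAMMAR, "S").items():
        print(f"{sentence} | {p}")
```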
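The user could obtain a uniform distribution by rewriting S this way:
S ::= ugly cat | nice dog | nice bull | nice pig ;
Such a rewriting, however, is not necessary: PolyGen provides the > operator for unfolding non-terminal symbols:
S ::= ugly cat | nice >Animal ;
Animal ::= dog | bull | pig ;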
By prefixing the > keyword to a non-terminal symbol, the program performs the translation above during the preprocessing phase, changing the probability distribution as follows:
output | probability
ugly cat | 1/4
nice dog | 1/4
nice bull | 1/4
nice pig | 1/4
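A sketch of the effect of > on a single non-terminal occurrence: the occurrence is replaced by each of the productions of the non-terminal, which thus become productions of S in their own right. Representing the prefixed symbol as the single token '>Animal' and the helper name are hypothetical.

```python
from fractions import Fraction


def unfold(production, name, alternatives):
    """Replace the token '>' + name in `production` with each alternative,
    returning one new production per alternative."""
    index = production.index(">" + name)
    return [production[:index] + alt + production[index + 1:] for alt in alternatives]


animal = [["dog"], ["bull"], ["pig"]]
# S ::= ugly cat | nice >Animal ;  becomes four top-level productions.
s_productions = [["ugly", "cat"]] + unfold(["nice", ">Animal"], "Animal", animal)
probabilities = {" ".join(p): Fraction(1, len(s_productions)) for p in s_productions}

if __name__ == "__main__":
    for sentence, p in probabilities.items():
        print(f"{sentence} | {p}")
```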
Unfolding applies to subproductions as well. Consider the following grammar:
S ::= (walk | pass) through | look at | (go | come | move | link | run) to ;
Here look at will be generated more often than any other verb, for the same reason discussed in section 2.0.4.1. In order to take output heterogeneity to the desired level, that is where each single verb may be produced with the same probability, the user would have to avoid round brackets altogether, so that there would no longer be 3 macro-productions, and suffix the proper preposition to each verb.
Prefixing > to a subproduction, instead, makes the program delegate its unfolding to the preprocessor, allowing the user to keep the original source architecture unchanged:
S ::= >(walk | pass) through | look at | >(go | come | move | link | run) to ;
which is translated into:
S ::= walk through | pass through | look at | go to | come to | move to | link to | run to ;
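The difference before and after the unfolding can be quantified with a small sketch. It encodes each macro-production of S as a list of verbs plus the preposition they share (a hypothetical encoding) and computes, with Python's fractions module, the probability of every verb phrase in both cases.

```python
from fractions import Fraction

# Hypothetical encoding of the macro-productions of S:
# (alternative verbs, preposition shared by all of them).
MACROS = [
    (["walk", "pass"], "through"),
    (["look"], "at"),
    (["go", "come", "move", "link", "run"], "to"),
]


def nested(macros):
    """Without unfolding: 1/len(macros) per macro-production, then split
    uniformly among the verbs inside its round brackets."""
    dist = {}
    for verbs, preposition in macros:
        for verb in verbs:
            dist[f"{verb} {preposition}"] = Fraction(1, len(macros)) / len(verbs)
    return dist


def unfolded(macros):
    """With > unfolding: every verb phrase becomes a top-level production,
    so all of them are equally likely."""
    phrases = [f"{verb} {preposition}" for verbs, preposition in macros for verb in verbs]
    return {phrase: Fraction(1, len(phrases)) for phrase in phrases}


if __name__ == "__main__":
    before, after = nested(MACROS), unfolded(MACROS)
    for phrase in after:
        print(f"{phrase}: {before[phrase]} -> {after[phrase]}")
```

Without unfolding, look at gets 1/3 while go to gets only 1/15; after unfolding every phrase gets 1/8.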
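When an unfolded subproduction is preceded by a label, the label is propagated to each of the resulting productions:
Digit ::= z: 0 | nz: >(1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9) ;
is translated into:
Digit ::= z: 0 | nz: 1 | nz: 2 | nz: 3 | nz: 4 | nz: 5 | nz: 6 | nz: 7 | nz: 8 | nz: 9 ;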
When unfolding is combined with permutation, an atom marked with the > operator is first permuted, while the unfolding is retained and then performed at the atom's new position within the sequence:
S ::= >{the >(dog | cat)} and {a (fish | bull)} ;
which is translated into:
S ::= the dog and a (fish | bull)
| the cat and a (fish | bull)
| a (fish | bull) and the dog
| a (fish | bull) and the cat ;
S ::= > >> the (dog | cat) | a (fish | bull) << | an alligator ;
is translated into:
S ::= the dog | the cat | a fish | a bull | an alligator ;
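Productions may be tagged with a label followed by a colon, and an atom may be followed by the dot operator and a label, which selects that label while the atom is generated. In the following example, besides unlabelled productions, only those carrying the selected label are generated:
S ::= Verb.inf | Verb.ing ;
Verb ::= (inf: to) (eat | drink | jump) (ing: ^ing) ;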
to eat
to drink
to jump
eating
drinking
jumping
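Label selection, as it appears from this example, can be sketched as follows. The sketch assumes that, when a label is selected, unlabelled alternatives are always kept, labelled ones are kept only if their label is the selected one, and a group left without alternatives generates nothing; the group encoding and the function name are hypothetical, and only a single active label is modelled.

```python
from itertools import product

# Hypothetical encoding of
#   Verb ::= (inf: to) (eat | drink | jump) (ing: ^ing) ;
# as a sequence of groups; each alternative is a (label or None, text) pair.
VERB = [
    [("inf", "to")],
    [(None, "eat"), (None, "drink"), (None, "jump")],
    [("ing", "^ing")],
]


def select(groups, active):
    """List the outputs of `groups` when label `active` is selected."""
    # Keep unlabelled alternatives and those carrying the active label;
    # a group left empty generates nothing (an empty string).
    filtered = [[text for label, text in group if label in (None, active)] or [""]
                for group in groups]
    outputs = []
    for combo in product(*filtered):
        words = []
        for text in combo:
            if text.startswith("^") and words:
                words[-1] += text[1:]  # '^' glues the text to the previous word
            elif text:
                words.append(text)
        outputs.append(" ".join(words))
    return outputs


if __name__ == "__main__":
    print(select(VERB, "inf") + select(VERB, "ing"))
```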
The following grammar uses labels to conjugate verbs according to a pronoun:
S ::= (Conjug.S | Conjug.P).sp | (Conjug.S | Conjug.P).pp ;
Conjug ::= (Pronoun Verb).1 | (Pronoun Verb).2 | (Pronoun Verb).3 ;
Pronoun ::= S: (1: "I" | 2: you | 3: (he | she | it)) | P: (1: we | 2: you | 3: they) ;
Verb ::= (pp: Be) (eat | drink) (sp: (S: (3: ^s)) | pp: ^ing) ;
Be ::= S: (1: am | 2: are | 3: is) | P: are ;
I eat
you eat
he eats
she eats
it eats
we eat
they eat
I am eating
you are eating
he is eating
we are eating
etc.
Since the labels 1, 2, 3, S and P respectively identify the syntactical forms for the first, second and third persons, singular and plural, we managed to correctly conjugate both the simple present (sp) and the present progressive (pp) tenses according to a pronoun. The definition of S simply activates all combinations of the label pairs S, P and sp, pp for the production of Conjug. In order to avoid such frequent, uncomfortable solutions, you are allowed to specify, on the right of the dot operator, a set of labels in round brackets separated by the pipe keyword:
S ::= Conjug.(S|P).(sp|pp) ;
Each label within such a set may be prefixed with the + and - keywords as well:
S ::= Conjug.(+S|--P).(sp|-pp) ;
which is internally interpreted as follows:
S ::= (Conjug.S | Conjug.S | Conjug.S | Conjug.S | Conjug.P).sp
| (Conjug.S | Conjug.S | Conjug.S | Conjug.S | Conjug.P).sp
| (Conjug.S | Conjug.S | Conjug.S | Conjug.S | Conjug.P).pp ;
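The resulting label probabilities can be computed with a short sketch. It assumes, mirroring the expansion above, that a label with score s = (number of +) - (number of -) is repeated s - min(s) + 1 times within its set, and that the two sets are chosen independently; the function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def label_weights(spec):
    """Turn a label set such as ['+S', '--P'] into label probabilities.

    Assumption: a label with score s (count of '+' minus count of '-')
    is repeated s - lowest + 1 times, as in the expansion above.
    """
    scores = {}
    for item in spec:
        label = item.lstrip("+-")
        modifiers = item[: len(item) - len(label)]
        scores[label] = modifiers.count("+") - modifiers.count("-")
    lowest = min(scores.values())
    copies = {label: score - lowest + 1 for label, score in scores.items()}
    total = sum(copies.values())
    return {label: Fraction(count, total) for label, count in copies.items()}


number = label_weights(["+S", "--P"])  # S: 4 copies, P: 1 copy
tense = label_weights(["sp", "-pp"])   # sp: 2 copies, pp: 1 copy
# Each generated sentence picks one label from each set independently.
joint = {(n, t): number[n] * tense[t] for n, t in product(number, tense)}

if __name__ == "__main__":
    for (n, t), p in joint.items():
        print(f"Conjug.{n}.{t}: {p}")
```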
The capitalization keyword \ makes the program capitalize the very following terminal symbol, i.e. switch its first letter to uppercase:
S ::= \ smith (is | "." \) Eulogy ^ "." ;
Eulogy ::= rather a smart man | really a gentleman ;
Smith is rather a smart man.
Smith. Rather a smart man.
Smith is really a gentleman.
Smith. Really a gentleman.
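A sketch of how ^ and \ could be applied when joining generated tokens into the output string, assuming (hypothetically, not necessarily as PolyGen does it) that the generator emits both keywords as marker tokens:

```python
def render(tokens):
    """Join generated tokens into the output string.

    Assumed marker tokens: '^' suppresses the space before the next token,
    and a single backslash capitalizes the next terminal symbol.
    """
    out = ""
    glue = True  # no space before the very first token
    capitalize = False
    for token in tokens:
        if token == "^":
            glue = True
        elif token == "\\":
            capitalize = True
        else:
            if capitalize:
                token = token[:1].upper() + token[1:]
                capitalize = False
            out += token if glue else " " + token
            glue = False
    return out


if __name__ == "__main__":
    print(render(["\\", "smith", "is", "rather", "a", "smart", "man", "^", "."]))
```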
S ::= a \ ^ \ _ b ;
PRODUCES
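When sequences of atoms within a production are enclosed between curly brackets { and }, the program automatically performs all the permutations among them:
S ::= whether {is} {therefore} {he} ;
PRODUCES
whether is therefore he
whether is he therefore
whether therefore is he
whether therefore he is
whether he is therefore
whether he therefore is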
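Listing the permutations of brace-enclosed blocks can be sketched with itertools.permutations. The sketch assumes the fixed atom whether precedes all the blocks; the helper name is hypothetical. PolyGen generates one permutation at a time, while the sketch enumerates them all.

```python
from itertools import permutations


def permute_blocks(prefix, blocks):
    """List every sentence obtained by permuting `blocks` after a fixed prefix."""
    return [" ".join([prefix, *order]) for order in permutations(blocks)]


if __name__ == "__main__":
    for sentence in permute_blocks("whether", ["is", "therefore", "he"]):
        print(sentence)
```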
S ::= {in 10 minutes} ^, {at 3 o'clock} ^, {"I" {will depart} {alone}} ;
PRODUCES
S ::= {in 10 minutes} ^, {at 3 o'clock} ^, ("I" {will depart} {alone}) ;
PRODUCES
Within a subproduction enclosed between the deep unfolding keywords >> and <<, any atom, however nested, for which the unfolding operation makes sense (see section 2.0.4) is unfolded. As a result, every subproduction and non-terminal symbol is completely flattened:
S ::= look at >> the (dog | (sorian | persian) cat) | a (cow | bull | Animal) << ;
Animal ::= pig | (weird | ugly) chicken ;
S is translated into:
S ::= look at (the dog | the sorian cat | the persian cat | a cow | a bull | a pig | a (weird | ugly) chicken) ;
Deep unfolding spares the user from specifying the > operator for every subproduction or non-terminal symbol within a given subproduction; on the other hand, it is still sometimes impossible to deeply unfold every (sub)atom without generating (unintentional) errors. The PolyGen grammar definition language therefore allows the user to lock the unfolding of an atom (for which such an operation would make sense) by means of the prefix operator <:
S ::= look at >> the (dog | <(sorian | persian) cat) | a (cow | bull | <Animal) << ;
Animal ::= pig | (weird | ugly) chicken ;
S is translated into:
S ::= look at (the dog | the (sorian | persian) cat | a cow | a bull | a Animal) ;
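The flattening performed between >> and << on nested subproductions, together with the < lock, can be sketched on a hypothetical nested-list encoding: a subproduction is a list of alternatives, an alternative is a list of items, and an item is a word, a nested subproduction, or a ('<', subproduction) pair marking a locked one. Non-terminal symbols are left out of the sketch, so it does not model the unfolding of Animal.

```python
def deep_unfold(alternatives):
    """Flatten unlocked nested subproductions into top-level alternatives."""
    result = []
    for alternative in alternatives:
        partial = [[]]
        for item in alternative:
            if isinstance(item, list):  # unlocked: flatten recursively
                options = deep_unfold(item)
            else:  # a word or a locked ('<', ...) pair
                options = [[item]]
            partial = [p + o for p in partial for o in options]
        result.extend(partial)
    return result


def show(alternative):
    """Render an alternative back in grammar syntax."""
    parts = []
    for item in alternative:
        if isinstance(item, tuple):  # locked: keep it as a single block
            item = item[1]
        if isinstance(item, list):
            parts.append("(" + " | ".join(show(a) for a in item) + ")")
        else:
            parts.append(item)
    return " ".join(parts)


# the (dog | (sorian | persian) cat) | a (cow | bull)
PLAIN = [["the", [["dog"], [[["sorian"], ["persian"]], "cat"]]],
         ["a", [["cow"], ["bull"]]]]
# the (dog | <(sorian | persian) cat) | a (cow | bull)
LOCKED = [["the", [["dog"], [("<", [["sorian"], ["persian"]]), "cat"]]],
          ["a", [["cow"], ["bull"]]]]

if __name__ == "__main__":
    print(" | ".join(show(a) for a in deep_unfold(PLAIN)))
    print(" | ".join(show(a) for a in deep_unfold(LOCKED)))
```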
A non-terminal symbol can also be defined recursively:
S ::= Digit [^ S] ;
Digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ;
0
23
853211
000000
00011122335
etc.
S1 ::= canary | cow | camel ;
S2 ::= canary | (cow | camel) ;
canary
cow
camel
Although the outputs of S1 and S2 are the same, the probability distribution for the former is:
output | probability
canary | 1/3
cow | 1/3
camel | 1/3
while for the latter it is:
output | probability
canary | 1/2
cow | 1/2 * 1/2 = 1/4
camel | 1/2 * 1/2 = 1/4
since the subproduction (cow | camel) is treated by the program as a single block.
S ::= a (+ _ | beautiful) house ;
a house
a beautiful house
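Here the empty production, prefixed with +, is chosen twice as often as beautiful, so a house is generated 2 times out of 3.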
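The distributions of S1, S2 and of the house example can be reproduced with a short sketch. It assumes a hypothetical encoding in which a nested list stands for a round-bracket subproduction, and it represents the + on the empty production by simply duplicating that alternative, as in the internal interpretation of modifiers described earlier.

```python
from fractions import Fraction


def flat_choice(options):
    """Probability of each output when choosing uniformly among `options`;
    a nested list stands for a round-bracket subproduction (one block)."""
    dist = {}
    share = Fraction(1, len(options))
    for option in options:
        inner = flat_choice(option) if isinstance(option, list) else {option: Fraction(1)}
        for text, p in inner.items():
            dist[text] = dist.get(text, 0) + share * p
    return dist


S1 = flat_choice(["canary", "cow", "camel"])
S2 = flat_choice(["canary", ["cow", "camel"]])
# a (+ _ | beautiful) house: the + duplicates the empty production.
HOUSE = flat_choice(["a house", "a house", "a beautiful house"])

if __name__ == "__main__":
    print(S1, S2, HOUSE, sep="\n")
```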
The following grammar generates natural numbers without leading zeros:
S ::= Digit | S.nz [^ S.] ;
Digit ::= z: 0 | nz: {1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9} ;
0
1
23
23081993
112358
20020723
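etc.
Some grammars are rejected with an error message. For instance, in the following grammar the non-terminal symbol B is not defined:
S ::= A | B ;
A ::= a ;
Similarly, the following definitions are cyclic and can never generate a terminal symbol:
S ::= S | A ;
A ::= B ;
B ::= S | A ;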
In the following grammar, even though S can generate a, it is still possible for a non-terminating path to be entered: such cases are therefore signaled by an error message too.
S ::= a | A ;
A ::= B ;
B ::= A ;
An error is also signaled when the unfolding operator > (see section 2.0.4.1) is applied to a non-terminal symbol in a way that would cause a cyclic recursion:
S ::= >A ;
A ::= >B ;
B ::= >S ;
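One way such a check could be performed (not necessarily PolyGen's algorithm) is a fixed-point computation of the productive non-terminals, i.e. those that can derive a finite sequence of terminals: any defined non-terminal left out of the result lies only on non-terminating paths. The encoding, with capitalized tokens as non-terminals, is hypothetical, and the sketch does not cover the > unfolding case.

```python
def productive(grammar):
    """Return the set of non-terminals that can derive a terminal-only string."""
    known = set()
    changed = True
    while changed:  # iterate until no new symbol is added
        changed = False
        for name, productions in grammar.items():
            if name in known:
                continue
            for production in productions:
                # Productive if every token is a terminal or already known.
                if all(not t[0].isupper() or t in known for t in production):
                    known.add(name)
                    changed = True
                    break
    return known


CYCLIC = {"S": [["S"], ["A"]], "A": [["B"]], "B": [["S"], ["A"]]}
PARTIAL = {"S": [["a"], ["A"]], "A": [["B"]], "B": [["A"]]}

if __name__ == "__main__":
    for grammar in (CYCLIC, PARTIAL):
        print("non-terminating:", sorted(set(grammar) - productive(grammar)))
```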
S ::= A.3 ;
A ::= 1: a | 2: b ;
_
A ::= apple | orange | banana ;
A ::= mandarin | melon ;
A is defined twice.
does not allow the usage of the program -info option.
S ::= A.3 | c ;
A ::= 1: a | 2: b ;
c
_
S ::= a {b} c ;
The following grammar describes the concrete syntax of the PolyGen grammar definition language; S is the starting non-terminal symbol:
S ::= DEF
| DEF S
DEF ::= Nonterm "::=" PRODS ";"
PRODS ::= PROD
| PROD "|" PRODS
PROD ::= SEQ
| MODIF SEQ
MODIF ::= "+"
| "-"
| "+" MODIF
| "-" MODIF
LABELS ::= LABEL
| LABEL "|" LABELS
LABEL ::= Label
| MODIF Label
SEQ ::= ATOMS
| Label ":" ATOMS
ATOMS ::= ATOM
| ATOM ATOMS
ATOM ::= Term
| "^"
| "_"
| "\"
| UNFOLDABLE
| ">" UNFOLDABLE
| "<" UNFOLDABLE
| ATOM "."
| ATOM DotLabel
| ATOM ".(" LABELS ")"
UNFOLDABLE ::= Nonterm
| "(" PRODS ")"
| "[" PRODS "]"
| "{" PRODS "}"
| ">>" PRODS "<<"
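The following grammar describes the abstract syntax, into which the preprocessor translates the concrete syntax, together with the lexical rules:
S ::= DEF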
| DEF S
DEF ::= Nonterm "::=" PRODS ";"
PRODS ::= SEQ
| SEQ "|" PRODS
SEQ ::= ATOMS
| Label ":" ATOMS
ATOMS ::= ATOM
| ATOM ATOMS
ATOM ::= Nonterm
| Term
| "^"
| "_"
| "(" PRODS ")"
| ATOM "."
| ATOM DotLabel
Term ::= [a-z 0-9 , '][a-z A-Z 0-9 , ']*
| " [A-Z a-z 0-9 ( ) _ - ? . , ! : \ & # +
* / % $ [ ] { } ~ @ ; : | < > = ^ ' \ "]* "
Nonterm ::= [A-Z][A-Z a-z 0-9 _]*
Label ::= [A-Z a-z 0-9 _]+
DotLabel ::= . Label
The lexical rule for Term in section 4.1.3 recognizes the backslash character within quotes. A terminal symbol is therefore allowed to contain any of the following escape sequences:
escape | meaning
\\ | backslash
\" | quote
\n | new line
\r | carriage return
\b | backspace
\t | tab
\xyz | ASCII character with decimal code xyz
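Decoding these escape sequences can be sketched with Python's re module. The sketch assumes that \xyz always consists of exactly three decimal digits, as the notation suggests; the function name is hypothetical.

```python
import re

# Single-character escapes from the table above.
SIMPLE = {"\\": "\\", '"': '"', "n": "\n", "r": "\r", "b": "\b", "t": "\t"}


def unescape(text: str) -> str:
    """Decode PolyGen-style escape sequences found inside a quoted terminal."""
    def replace(match: re.Match) -> str:
        code = match.group(1)
        if code.isdigit():
            return chr(int(code))  # \xyz: decimal ASCII code
        return SIMPLE[code]
    return re.sub(r'\\(\d{3}|[\\"nrbt])', replace, text)


if __name__ == "__main__":
    print(unescape(r'say \"hi\" \065'))
```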
The preprocessor translates the concrete syntax into the abstract syntax according to the following rules:
1. concrete: A.(+(a1)-(b1) l1 | ... | +(an)-(bn) ln)
   abstract: (A.l1 |(1) ... |(w1) A.l1 | ... | A.ln |(1) ... |(wn) A.ln)
   where wi = ai - bi - min{a1 - b1, ..., an - bn}
2. concrete: +(a1)-(b1) P1 | ... | +(an)-(bn) Pn
   abstract: P1 |(1) ... |(w1) P1 | ... | Pn |(1) ... |(wn) Pn
   where wi = ai - bi - min{a1 - b1, ..., an - bn}
3. concrete: [P]
   abstract: (_ | (P))
4. concrete: >> P <<
   abstract: (P'), where P' is isomorphic to P with its unfoldable atoms unfolded
5. concrete: P1 | ... | A1 {Q1}.l1 ... An {Qn}.ln | ... | Pn
   abstract: P1 | ... | A1 (Q^1_1).l1 ... An (Q^1_n).ln | ... | A1 (Q^(n!)_1).l1 ... An (Q^(n!)_n).ln | ... | Pn
   where Q^j_i is the i-th element of the j-th permutation of Q1 ... Qn (i = 1..n, j = 1..n!)
6. concrete: P1 | ... | L: A >(Q1 | ... | Qm).l B | ... | Pn
   abstract: P1 | ... | L: A (Q1).l B | ... | L: A (Qm).l B | ... | Pn
7. concrete: P1 | ... | L: A >X.l B | ... | Pn
   abstract: P1 | ... | L: A (Q1).l B | ... | L: A (Qm).l B | ... | Pn, where X ::= Q1 | ... | Qm ;
Notation:
P, Q: productions or series of productions
A, B: atoms or atom (sub)sequences
L, l: labels
X, Y: non-terminal symbols
+(n)-(m): juxtaposition of n + operators and m - operators, respectively
P |(1) ... |(n) P: series of copies of P joined by n pipes, i.e. P repeated n + 1 times