OpenCCG GUI

Tutorial for the CCG GUI

This is a first tutorial which walks you, step by step, through the process of writing a very small English grammar using OpenCCG. The grammar we're going to write is the 'tinytiny' English grammar, originally written by Ben Wing. As Ben says, this is a truly minimal grammar for CCG. If you haven't gone through the WebCCG tutorial, it might be helpful to do that first, to get a feel for grammar writing. BUT be aware that the format used for specifying grammars in WebCCG is not the same as the format for the CCG GUI.

The instructions below assume you're working on a UTCL lab machine.

Other CCG pages on this wiki: main and grammars

Here is the complete tinytiny grammar.

The advanced topics page contains important information for writing your own grammars.

First steps: nuts & bolts

First, let's get some terminology straight. OpenCCG is a comprehensive system for writing, editing, and running grammars in the Combinatory Categorial Grammar (CCG) framework. The bulk of this tutorial is about using the editing component of OpenCCG, the CCG GUI.

Second, open up two terminals, one for running the CCG editor and one for testing the grammar using ''tccg''. Please ask if you need help with this!

Running OpenCCG

The software is installed on the lab machines. To be able to run the OpenCCG command line tools, do the following if you are on the UTCL lab computers:

$ source /usr/local/bin/ccg-setup

Alternatively, if you are on your home computer or another system and have OpenCCG installed in ''/usr/local/openccg'', the following will add the tools to your path:

$ export JAVA_HOME=/usr/local/java
$ export OPENCCG_HOME=/usr/local/openccg
$ export PATH=$OPENCCG_HOME/bin:$PATH

If you installed OpenCCG somewhere else, change the OPENCCG_HOME line as appropriate. Also, set JAVA_HOME to wherever Java is installed on your system.

Create a directory for this grammar and change into that directory (don't type the ''$''; it's the command prompt).

$ mkdir tinytiny
$ cd tinytiny

To begin editing a grammar, first create the file you want to edit using the ''touch'' command, then invoke the editor with ''ccg-edit'', like this:

$ touch tinytiny.ccg
$ ccg-edit tinytiny.ccg

You should now have an editor window showing a blank ''tinytiny.ccg'' file.
Basics of ccg-format syntax
The ''ccg-format'' was developed by Ben Wing as a front-end for conveniently specifying OpenCCG grammars. This is a major improvement over the native XML format, which is not human-friendly and involves a lot of redundant specification of information. The ''ccg-format'' is designed to be expressive, concise, and human-friendly, with as little required duplication as possible.

The general feel of the syntax is like C, Java, or Perl. Except in expansions (discussed later), things like indentation, whitespace, and even commas are subject to no restrictions. To comment out a line, use ''#'', as shown below. (Commented lines are lines that the computer 'doesn't see'; this is your space for making notes and providing any sort of information you like.)

#### tinytiny.ccg ####

# A truly minimal grammar for CCG
# Ben Wing, May 2006

Architecture of the grammar

The old way of specifying grammars with OpenCCG required the user to write and maintain six separate XML files. The new system runs from a single ''.ccg'' file with five subsections. The order of the sections is not fixed, although features and expansions do need to be declared/defined before they can be used.

Features
This is where features are declared. Declaring features allows for simple specification of features in lexical entries and categories.

Words
This is where words are declared and associated with particular categories, features, and lexical items. This is also where morphological information gets specified.

Rules
This section specifies the rules allowed or disallowed in the grammar. The following rules are enabled by default:
  • application (forward & backward)
  • composition (forward & backward)
  • crossed composition (forward & backward)
Lexicon/Categories
This is where lexical families are declared. A lexical family consists of one or more category declarations and an optional declaration of the lexical items that are members of that family. For example, in English the lexical family ''Det'' has just a single category: ''np/^n''. The family for ditransitive verbs, though, has two possible categories, one for the double object construction and one for the pp-complement construction.

Testbed
The testbed section is used for testing the grammar. The contents of this section are a list of sentences and the number of parses you expect the grammar to find for each sentence. Specifying the number of parses is useful for making sure the grammar doesn't overgenerate:

 the policeman saw me: 1;
 the policeman saw I: 0;
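
To make the item format concrete, here is a minimal Python sketch (not part of OpenCCG) that reads items like the two above and returns each sentence with its expected parse count. The function name ''parse_testbed_items'' is hypothetical, and the sketch assumes one ''sentence: count;'' item per line.

```python
def parse_testbed_items(text):
    """Return (sentence, expected_parses) pairs from 'sentence: N;' lines."""
    items = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.endswith(";"):
            raise ValueError(f"item must end with a semicolon: {line!r}")
        # Split on the last colon: sentence on the left, count on the right.
        sentence, _, count = line[:-1].rpartition(":")
        items.append((sentence.strip(), int(count)))
    return items


example = """
 the policeman saw me: 1;
 the policeman saw I: 0;
"""
pairs = parse_testbed_items(example)
```

An expected count of 0 is as informative as a count of 1: it records that the grammar should reject the sentence.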

----
Template for grammar
In grammar writing, it is very helpful to separate these five subsections in the ''.ccg'' file. Copy and paste the grammar template into the blank Edit window.

Now save this change by clicking the ''Save'' button in the upper right-hand corner of the editor window. Next to this button are ''Help'' and ''Quit'' buttons; there's no ''Help'' yet, sorry...

----
Using the editor
The editor has five tabs: one is the Edit window, where you can see the whole ''.ccg'' file. The other four correspond to four of the five grammar subsections. (We don't have a Words tab yet, but we will soon.)

The editor is designed with local, wiki-style editing. This makes it very easy to modify your grammar and to see the effects of changes before making them final. For example, features are edited from inside the Features window, and each category in the Lexicon window can be edited individually. The same applies to the Rules and Testbed windows.

Changes are NOT saved automatically; it's up to you to click the ''Save'' button. The cool thing about this is that you can see the effect of changes without saving them, so you can try things out freely.
The scope of the grammar
Before we get started writing our ''tinytiny'' grammar, it's useful to think about the range of phenomena we want the grammar to capture. As the name implies, this grammar covers a limited core subset of constructions in English. In general, it's a good idea to start with one simple type of construction and gradually increase the complexity of the grammar.

''tinytiny'' covers these aspects of English grammar:
  • declarative sentences only (i.e. no questions or imperatives)
  • transitive verbs
  • intransitive verbs
  • verbs showing transitive/intransitive alternation
  • common nouns, both singular and plural
  • determiners
  • personal pronouns
We'll start by looking at intransitive verbs.

Intransitive verbs

The grammar needs to properly handle the following sentences:

testbed {
  the policeman sleeps: 1;
  the policemen sleeps: 0;
  the policemen sleep: 1;
  the policeman sleeps the peach: 0;
}

Add these to your testbed by clicking on the Edit button inside the Testbed window and entering the code in the box above. This is essentially a single declaration of a set of sentences paired with their correct number of parses; in other words, we are declaring the contents of the testbed. The contents of the testbed must be inside a pair of braces, and each item must end with a semicolon, even if there's only one item in the testbed.
Lexical entry
This section describes, bit by bit, how to implement the lexical family for intransitive verbs. To enter a brand new family, use the Edit window. To make changes to a category once it has been created, use the local editing functionality of the Lexicon tab.

First, nothing fancy. Create the lexical family for intransitive verbs and give it one entry: the basic category for intransitives in English. When creating a new family, the family name should either be the same as a part of speech, or the part of speech should be given in parentheses after the family name. The syntax for ''family'' declarations has the same basic shape as that for the testbed (braces and semicolons).

family IntransV(V) {
  entry: s \ np;
}


Now switch to the Lexicon tab and you'll see a graphic representation of the family and entry you just created. Go ahead and save these changes.
Exercise 1
Expand the feature declaration to include a ''num'' feature (''sg'',''pl'') and a ''pers'' feature (''1st'',''2nd'',''3rd'').

Solution

----
Feature hierarchies
It is sometimes useful to specify a hierarchy of features. For example, in English we want to have a ''non-3rd'' person feature to handle subject-verb agreement. To write a feature hierarchy with the ccg-format, the name of the higher level of the hierarchy is followed by a list of the elements in the next lower level, enclosed in braces. Modify the ''pers'' feature you just wrote to include a ''non-3rd'' hierarchy encompassing ''1st'' and ''2nd''.

pers<2>: non-3rd {1st 2nd} 3rd;

Feature hierarchies can be as deep as they need to be. We'll write a bigger hierarchy later for semantic types.
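
The idea of a hierarchy can be modelled outside OpenCCG in a few lines. Here is a minimal Python sketch, assuming that a higher-level value such as ''non-3rd'' is compatible with any value listed beneath it. The names ''PERS'', ''parents'', and ''subsumes'' are hypothetical, and this is not a description of how OpenCCG handles features internally.

```python
# The hierarchy from the declaration: pers<2>: non-3rd {1st 2nd} 3rd;
PERS = {"non-3rd": {"1st": {}, "2nd": {}}, "3rd": {}}


def parents(tree, parent=None, table=None):
    """Map each value in a nested hierarchy to its parent (None at the top)."""
    table = {} if table is None else table
    for name, children in tree.items():
        table[name] = parent
        parents(children, name, table)
    return table


def subsumes(general, specific, tree=PERS):
    """True if `general` is `specific` itself or one of its ancestors."""
    table = parents(tree)
    node = specific
    while node is not None:
        if node == general:
            return True
        node = table[node]
    return False
```

Under this model a verb form marked ''non-3rd'' goes with a ''1st'' or ''2nd'' person subject but not with a ''3rd'' person one, which is the agreement pattern the hierarchy is meant to capture.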
Adding semantics
We use semantic variables to associate semantics with particular atomic elements of the category. Generally, ''E'' is used as an event variable, and ''X'' and ''Y'' are used to refer to participants in the event. Semantic variables appear in square brackets with the other features. Within the brackets, order doesn't matter, and separating the features with commas is allowed but is not required.

Here's the intransitive verb category enhanced with semantic variables.

family IntransV(V) {
  entry: s<1> [E] \ np<2> [X nom];
}

Now we use these semantic variables to relate the syntactic category with the logical form. This is also where we provide some additional semantic information (''action'' and ''animate-being''). Let's take this in a couple of steps. First, we create the basic logical form.

family IntransV(V) {
  entry: s<1> [E] \ np<2> [X nom] : E (* <Actor>X);
}

The asterisk stands in for the proposition expressed by the verb instantiating the category. It's possible to give semantic types to each of the semantic variables, as shown below.

family IntransV(V) {
  entry: s<1> [E] \ np<2> [X nom] : E:action(* <Actor>X:animate-being);
}

Once you've modified this entry by adding the semantics, hit Done, take a look at how things have changed, and then click over to the Features tab. Just like any other features, the semantic types must be declared as features. We can also declare semantic features such as tense or semantic number in this way. Note the nesting in the ontology. Add these semantic features to your feature declaration (don't enter the dots; they are only meant to show that other declarations have been omitted).

feature {
    .
    .
  tense<E>: past pres;
  sem-num<X>: sg-X pl-X;

  ontology: sem-obj {
    phys-obj {
      animate-being {person animal}
      thing
    }
    situation {
      change {action}
      state
    }
  };
}

Instantiating the category (adding words)
Now we want to instantiate the category with lexical items. This is done in the Words section of the grammar. The editor doesn't have a tab for this section yet, so this part of the editing must be done in the main Edit window.

To declare a word, we use the following format: ''word'' followed by the lexical item and its category (or categories), followed by any features associated with the lexical item. Semantic types can be supplied in parentheses immediately following the category (see the entries for ''policeman'' and ''policemen''). Note that the same lexical item (''sleep'' below) can be given more than one entry.

word the:Det;
word policeman:N(person): sg 3rd;
word policemen:N(person): pl 3rd;
word sleep:IntransV: pres non-3rd sg;
word sleep:IntransV: pres pl;
word sleeps:IntransV: pres 3rd sg;
word slept:IntransV: past;
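
To see the anatomy of these declarations, here is a small Python sketch (not part of OpenCCG) that splits a single-line ''word'' declaration into lexical item, family, optional semantic type, and features. The pattern ''WORD_DECL'' and the function ''parse_word'' are hypothetical, and the sketch only covers the one-line form shown above.

```python
import re

# word <form>:<Family>(<semantic type>): <features>;
WORD_DECL = re.compile(
    r"^word\s+(?P<form>\S+?):(?P<family>\w+)"
    r"(?:\((?P<sem_class>[^)]+)\))?"   # optional semantic type in parentheses
    r"(?::\s*(?P<feats>[^;]*))?;$"     # optional feature list before the ';'
)


def parse_word(line):
    """Split one single-line word declaration into its four parts."""
    match = WORD_DECL.match(line.strip())
    if match is None:
        raise ValueError(f"not a single-line word declaration: {line!r}")
    feats = (match.group("feats") or "").split()
    return (
        match.group("form"),
        match.group("family"),
        match.group("sem_class"),
        feats,
    )


policeman = parse_word("word policeman:N(person): sg 3rd;")
the = parse_word("word the:Det;")
```

Here ''policeman'' comes out with family ''N'', semantic type ''person'', and the features ''sg'' and ''3rd'', while ''the'' has a family but no semantic type and no features.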

Specifying each word individually, as shown above, involves a lot of repetitive work. This is where expansions come in: expansions are a powerful tool for reducing redundant specification. We'll get to expansions in a minute, or you can jump down to the section on using expansions now.
Exercise 2
The word declarations shown above use some categories we don't have yet. Create the lexical family ''N''.

Solution
Determiners
We also need a category for determiners. Recall that our basic category for ''Det'' in English is ''np / n'' but with constraints on the mode of combination. We also want features of the ''n'' to be passed up to the result category ''np''.

To set the two feature structures to be the same, we simply give them the same feature-structure ID. Create this family in the Edit window and then take a look at what you get in the Lexicon window.

family Det {
  entry: np<2> / n<2>;
}


Modalities are implemented in the CCG GUI by "decorating" the slashes with the following characters:
  • ''*'' is the application-only modality
  • ''x'' is the crossing modality
  • ''^'' is the diamond, or harmonic, modality
  • ''.'' or unspecified is the anything-goes modality
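
The list above amounts to a small lookup table. As an illustration only, here is a Python sketch (not part of OpenCCG; the names ''MODALITIES'' and ''slash_modality'' are hypothetical) that reads the decoration off a slash and treats an undecorated slash as anything goes.

```python
# Slash decorations and the modalities they name, as listed above.
MODALITIES = {
    "*": "application only",
    "x": "crossing",
    "^": "diamond (harmonic)",
    ".": "anything goes",
}


def slash_modality(slash):
    """Return the modality of a decorated slash such as '/^' or '/*'."""
    direction, decoration = slash[:1], slash[1:]
    if direction not in ("/", "\\"):
        raise ValueError(f"not a slash: {slash!r}")
    # An undecorated slash counts as the anything-goes modality.
    return MODALITIES[decoration or "."]


det_modality = slash_modality("/^")
```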
Now add the diamond modality to the slash in the ''Det'' category. Use the local edit button in the Lexicon window. Once you've added the modality, click Done and see the result! While you're in the local editing mode, you can scroll through the entire grammar file. The Home button will take you back to the category you were first editing. Before hitting Done, you can use the Undo button to undo any changes you've made during the current local editing session (in other words, since the last time you hit the small Edit button).

While we're at it, let's add features and semantic variables.

family Det {
  entry: np<2> [X pers=3rd] /^ n<2> [X];
}


What about the semantics for ''Det''? The semantic contribution of the determiner is essentially a modification of the ''n''. To implement this, we need to create a new semantic relation, which we'll call ''det''. The relation is declared as a parameter of the lexical family, and once it's declared we can use it as part of the logical form. Make these changes and take a look at the new semantic representation for ''Det''.

family Det(indexRel="det") {
  entry: np<2> [X pers=3rd] /^ n<2> [X]: X:sem-obj(<det>*);
}

Exercise 3
We've already written a word declaration for the definite determiner. Write a declaration for the indefinite determiner ''a''. What feature(s) does ''a'' convey that ''the'' does not?

Solution

The big payoff: testing the grammar

Now we're ready to try parsing a sentence with ''tinytiny''.
The first step is to generate the XML files required by the parser. If you already have a second terminal window open, switch to that terminal and ''cd'' into the directory where ''tinytiny.ccg'' is stored. Type the following command:

$ ccg2xml tinytiny.ccg


As the XML files are created, you'll see messages on the screen. Now we load the grammar into the parser using the command ''tccg'' (stands for "text CCG"). This command takes one argument, the grammar file generated by ''ccg2xml''.

$ tccg tinytiny-grammar.xml


Here are some useful commands:
  • '':h'' shows a list of options
  • '':derivs'' and '':noderivs'' toggle between showing the full derivations and not showing them
  • '':sem'' and '':nosem'' toggle between showing the full semantics and not showing it
  • '':feats'' and '':nofeats'' toggle between showing the features and not showing them
  • '':q'' takes you out of tccg.
To parse a sentence, simply type the sentence (all lowercase, no punctuation) at the ''tccg>'' prompt. Try parsing ''the policeman sleeps'' and look at the output you get with and without derivations, semantics, and features. Don't worry if you don't understand everything immediately. You'll get more comfortable with the representations as you use the system more.

IMPORTANT: If you are having trouble loading the grammar and/or parsing sentences, ask for help right away.

If you make changes to the .ccg file and want to try parsing with the new version of the grammar, you'll need to exit ''tccg'', rerun ''ccg2xml'', and reload the grammar.
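
Since this cycle gets repeated a lot, it can help to see the two commands side by side. The Python sketch below only builds the command lines and does not run anything; ''rebuild_commands'' is a hypothetical helper, and the output name ''<name>-grammar.xml'' is an assumption generalised from the single ''tinytiny'' example above.

```python
from pathlib import Path


def rebuild_commands(ccg_file):
    """Return the two commands to rerun after editing a .ccg file."""
    source = Path(ccg_file)
    # Assumption: ccg2xml names its grammar file <stem>-grammar.xml,
    # as it does for tinytiny.ccg in this tutorial.
    grammar_xml = f"{source.stem}-grammar.xml"
    return [["ccg2xml", source.name], ["tccg", grammar_xml]]


commands = rebuild_commands("tinytiny.ccg")
```

Remember to leave ''tccg'' with '':q'' before running the first command again.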

Transitive verbs

Now we can move on to transitive verbs. For now, we'll write an entirely separate category for transitive verbs, but a more elegant solution might be to define transitive verbs in relation to intransitive verbs. We'll deal with this topic in the next tutorial.
Exercise 4
Create the lexical family for transitive verbs. Start with the basic category, then add features and semantic variables. Finally, create the logical form.

Solution

----
Exercise 5
Now add the following sentences to the testbed with the correct number of parses.

the policeman saw the book
the policeman saw the books
the policemen see the books
the policeman see the book


Solution

----

What about a verb like 'eat'?

Some verbs, like ''eat'', can be transitive or intransitive. This is no problem: we simply give the word membership in both lexical families (both ''IntransV'' and ''TransV'').

word eat:IntransV: pres non-3rd sg;
word eat:IntransV: pres pl;
word eat:TransV: pres non-3rd sg;
word eat:TransV: pres pl;
word eats: .... (etc.)


Of course, writing all of these separate entries is a really tedious process, and one of the great things about the ccg-format is that it makes it easy to eliminate most of these redundancies. This brings us to the topic of more efficient word declarations. But first, add these sentences to the testbed:

the boys eat
the boys eat the peaches

Efficient word declaration

Rather than writing a separate declaration for each form of the same lemma, we can avoid redundantly specifying information with two devices: (1) methods for declaring inflected forms with a single word declaration, and (2) using expansions to do string rewrites. This section covers those two topics.
Multiple inflections: pronouns
Let's use the case of pronouns to look at how to declare inflected forms of a word with a single declaration. To declare the various forms of the first-person pronoun, we'll use an umbrella category ''pro1'' and list the individual forms and their features in braces following.

word pro1:Pro(animate-being){
  I: 1st sg nom;
  me: 1st sg acc;
  we: 1st pl nom;
  us: 1st pl acc;
}
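
One way to picture what this single declaration buys you is to flatten it back out into one record per inflected form. The Python sketch below does just that; ''expand_word'' is a hypothetical helper, and the tuple layout is only an illustration, not OpenCCG's internal representation.

```python
def expand_word(stem, family, sem_class, forms):
    """Flatten one umbrella declaration into one record per inflected form."""
    return [
        (form, stem, family, sem_class, tuple(feats.split()))
        for form, feats in forms.items()
    ]


entries = expand_word("pro1", "Pro", "animate-being", {
    "I": "1st sg nom",
    "me": "1st sg acc",
    "we": "1st pl nom",
    "us": "1st pl acc",
})
```

The family and semantic type are written once and shared by all four forms; only the features vary from form to form.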


Note that we've used a category (''Pro'') that we haven't declared yet. This one will work nicely:

family Pro {
  entry: np<2> [X]: X:sem-obj(*);
}

Exercise 6
Write the declaration for 3rd person feminine pronouns.

Solution
Using expansions
The basic idea behind expansions is very simple: they simply do string substitution, but are made more powerful by the fact that they can take parameters. To define an expansion, give the name of the expansion (''change_me'' in the example below) with its parameters (''Stem, Newstem''), followed by the substitute text (''Newstem'').

def change_me(Stem, Newstem) {
   Newstem
}


This is a really, really dumb expansion. Basically what it says is: every time an expression of the form ''change_me(X,Y)'' occurs, replace it with ''Y''. Another way to think of it is as a rewrite rule with variables: ''change_me(X, Y) -> Y''. In expansions, parameters function as variables.

Let's look at an example using nouns. We're going to write an expansion which will replace a simple statement with a word declaration which declares the inflected forms. Take a look (a more detailed explanation follows):

def noun(Sing, Plur, Class) {
  word Sing:N(Class) {
    *: sg sg-X;
    Plur: pl pl-X;
  }
}


In these types of declarations, the ''*'' is replaced with the lemma/umbrella category (which is the first thing you see after the word ''word'').

Here we have defined an expansion called ''noun'' which takes three arguments:
  • ''Sing'' is a variable for the singular form of the noun
  • ''Plur'' is a variable for the plural form of the noun
  • ''Class'' is a variable for the semantic sort of the noun
When the editor sees a call to that expansion, it "replaces" the call with whatever it finds in braces. In this case, what's in the braces is a word declaration for nouns, with any variables replaced by the values supplied. So if we write ''noun(book, books, thing)'', the variable ''Sing'' is instantiated as ''book'', ''Plur'' is instantiated as ''books'', and ''Class'' is instantiated as ''thing''. The complete word declaration then looks like this (but you don't have to write all of this; all you have to write is ''noun(book, books, thing)'').

word book:N(thing) {
  *: sg sg-X;
  Plur: pl pl-X;
}
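
The substitution itself is easy to imitate. Here is a minimal Python sketch of the idea, assuming whole-word replacement of parameter names; the helper ''expand'' and the constant ''NOUN_BODY'' are hypothetical, and the real expansion mechanism may well differ in its details.

```python
import re

# The body of the noun expansion, with Sing, Plur and Class as parameters.
NOUN_BODY = """word Sing:N(Class) {
  *: sg sg-X;
  Plur: pl pl-X;
}"""


def expand(params, body, args):
    """Replace each whole-word parameter name in `body` with its argument."""
    if len(params) != len(args):
        raise ValueError("wrong number of arguments")
    bindings = dict(zip(params, args))
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, params)) + r")\b")
    # A single pass, so substituted values are never expanded again.
    return pattern.sub(lambda m: bindings[m.group(1)], body)


result = expand(["Sing", "Plur", "Class"], NOUN_BODY, ["book", "books", "thing"])
```

With these arguments ''result'' holds the word declaration for ''book'' with ''Sing'', ''Plur'', and ''Class'' filled in. The same helper models the ''change_me'' expansion: ''expand(["Stem", "Newstem"], "Newstem", ["x", "y"])'' returns ''y''.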


NOTE: This is one of the few cases where the order of declarations matters. Like features, expansions must be defined before they can be used.
Exercise 7
Add the ''noun'' expansion in the ''Words'' section of your .ccg file (this has to be done in the Edit tab; we don't have a Words tab yet).

Now add ''noun'' declarations for ''peach'', ''boy'', and ''policeman''.

Solution

Minutiae

Commas, quotes, reserved words, etc.
More about features
Members of families

Error messages

  - Integer value too large — occurs when
  - Illegal character — occurs when
  - Redefining macro — occurs when
  - Bad tuple pair — occurs when
  - Property X not found in Y — occurs when
  - Family/part-of-speech X not found (word declaration Y) — occurs when
  - Original entry for X not found — occurs when you press Home in the Lexicon view. If you edit the family name in the little edit window that opens when you want to edit a single category in the Lexicon view, and then press Home, the program searches for the original string which is missing now due to editing.
  - Undo all changes till now? — occurs when you press Undo All in the Lexicon view. All edits till then will be lost if you go ahead and okay it.
  - Invalid element in rules hash: X — occurs when
  - Syntax error at X — occurs when parsing cannot be completed due to some error in the syntax. You can reach back to it in the Edit view through the line number
  - Unexpected end of file — occurs when
  - Unknown file in --omit-output argument — occurs when
  - Errors during compilation, files not output. — occurs when

Solutions

Solution 1
feature {
  case<2>: nom acc;
  num<2>: sg pl;
  pers<2>: 1st 2nd 3rd;
}

Solution 2
family N {
  entry: n<2>[X]: X:sem-obj(*);
}

Note that the semantic type ''sem-obj'' is the top level of the ontology, allowing things with the category ''N'' to be of any of the semantic types in the ontology. Again, the asterisk (''*'') represents the semantic proposition denoted by the lexical item.

Solution 3
word the:Det;
word a:Det: sg;

Solution 4
Basic category:

family TransV(V) {
  entry: s \ np / np;
}

Add features:

family TransV(V) {
  entry: s<1> \ np<2> [nom] / np<3> [acc];
}

Add semantic variables:

family TransV(V) {
  entry: s<1> [E] \ np<2> [X nom] / np<3> [Y acc];
}

Add logical form:

family TransV(V) {
  entry: s<1> [E] \ np<2> [X nom] / np<3> [Y acc]:
  E:action(* <Actor>X:animate-being <Patient>Y:sem-obj);
}

Solution 5
testbed {
   .
   .
   the policeman saw the book: 1;
   the policeman saw the books: 1;
   the policemen see the books: 1;
   the policeman see the book: 0;
}

Solution 6
word pro3:Pro(sem-obj) {
   she: 3rd sg nom sg-X;
   her: 3rd sg acc sg-X;
   they: 3rd pl pl-X;
}

If you like, add the masculine pronouns as well as a gender feature. You might also think about whether this is the ideal way to specify third person plural.

You could also write a declaration for 2nd person pronouns. Question: is this really needed? How much inflection is there?

Solution 7
noun(peach, peaches, thing)
noun(boy, boys, person)
noun(policeman, policemen, person)
