VisCCG Tutorial

Main UT OpenCCG page

This first tutorial walks you, step by step, through the process of writing a very small English grammar for OpenCCG using VisCCG. The grammar we're going to write is the 'tinytiny' English grammar, originally written by Ben Wing. As Ben says, this is a truly minimal grammar for CCG. If you haven't gone through the WebCCG tutorial, it might be helpful to do that first, to get a feel for grammar writing. Be aware, though, that the format used for specifying grammars in WebCCG is not the same as DotCCG, the format used in this tutorial.

The instructions below assume you're working on a UTCL lab machine.

Here is the complete tinytiny grammar.

The advanced topics page contains important information for writing your own grammars.

First steps : nuts & bolts

First, let's get some terminology straight. OpenCCG is a comprehensive system for writing, editing, and running grammars in the Combinatory Categorial Grammar (CCG) framework. The bulk of this tutorial is about using VisCCG, an editing and grammar visualization application for OpenCCG.

Second, open up two terminals, one for running the CCG editor and one for testing the grammar using ''tccg''. Please ask if you need help with this!

Running OpenCCG

The software is installed on the lab machines. To be able to run the OpenCCG command line tools, do the following if you are on the UTCL lab computers:

$ source /usr/local/bin/ccg-setup

Alternatively, if you are on your home computer or another system, download the latest version of OpenCCG and install it in ''/usr/local/openccg''. The following commands will then add the tools to your path:

$ export OPENCCG_HOME=/usr/local/openccg
$ export PATH="$OPENCCG_HOME/bin:$PATH"
$ export JAVA_HOME=/usr/local/java

If you would prefer to install OpenCCG somewhere else, change the path in the ''OPENCCG_HOME'' line as appropriate. Also, set ''JAVA_HOME'' to wherever Java is installed on your system.

NOTE: Mac Users — your Java home is likely to be in a directory similar to ''/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home''.

Create a directory for this grammar and change into it (don't type the ''$''; it's the command prompt).

$ mkdir tinytiny
$ cd tinytiny

To begin editing a grammar, you first need to create the file you want to edit using the ''touch'' command. Invoke the editor with ''visccg'', like this:

$ visccg

You should now have an editor window showing a blank grammar with suggestions for where to put your grammar specifications. Now select File->Save As and provide the filename ''tinytiny.ccg''.
Basics of DotCCG syntax
DotCCG was developed by Ben Wing as a front-end for conveniently specifying OpenCCG grammars. This is a major improvement over the native XML format, which is not human-friendly and involves a lot of redundant specification of information. DotCCG is designed to be expressive, concise, and human-friendly, with as little required duplication as possible.

The general feel of the syntax is like C, Java, or Perl. Except in expansions (discussed later), things like indentation, whitespace, and even commas are subject to no restrictions. To comment out a line, use ''#'', as shown below. (Commented lines are lines that the computer 'doesn't see' — this is your space for making notes & providing any sort of information you like.)

#### tinytiny.ccg ####

# A truly minimal grammar for CCG
# Ben Wing, May 2006

Architecture of the grammar

The old way of specifying grammars with OpenCCG required the user to write and maintain six separate XML files. The new system runs from a single ''.ccg'' file with five subsections. The order of the sections is not fixed, although features and expansions do need to be declared/defined before they can be used.

Features: this is where features are declared. Declaring features allows for simple specification of features in lexical entries and categories.

Words: this is where words are declared and associated with particular categories, features, and lexical items. This is also where morphological information gets specified.

Rules: this section specifies the rules allowed or disallowed in the grammar. The following rules are enabled by default:
  • application (forward & backward)
  • composition (forward & backward)
  • crossed composition (forward & backward)

Lexicon: this is where lexical families are declared. A lexical family consists of one or more category declarations and an optional declaration of the lexical items which are members of that family. For example, in English the lexical family ''Det'' has just a single category: ''np/^n''. The family for ditransitive verbs, though, has two possible categories, one for the double object construction and one for the pp-complement construction.

Testbed: this section is used for testing the grammar. Its contents are a list of sentences and the number of parses you expect the grammar to find for each sentence. Specifying the number of parses is useful for making sure the grammar doesn't overgenerate:

 the policeman saw me: 1;
 the policeman saw I: 0;
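
As a rough illustration of what a testbed item encodes, here is a minimal Python sketch that reads ''sentence: count;'' items and compares them against observed parse counts. The ''parse_counts'' dictionary is a made-up stand-in for real parser output, and the function names are hypothetical; this is not how OpenCCG itself runs a testbed.

```python
import re

def read_testbed(text):
    """Turn 'sentence: N;' items into (sentence, expected_parses) pairs."""
    items = []
    for match in re.finditer(r"([^:;{}]+):\s*(\d+)\s*;", text):
        items.append((match.group(1).strip(), int(match.group(2))))
    return items

def check(items, parse_counts):
    """Return the sentences whose observed parse count differs from the expected one."""
    return [s for s, expected in items if parse_counts.get(s, 0) != expected]

testbed_text = """
testbed {
  the policeman saw me: 1;
  the policeman saw I: 0;
}
"""

# Hypothetical parser output: the second sentence wrongly receives a parse.
parse_counts = {"the policeman saw me": 1, "the policeman saw I": 1}

items = read_testbed(testbed_text)
failures = check(items, parse_counts)
print(failures)
```

A sentence that is expected to have 0 parses but receives one shows up in ''failures'', which is exactly the overgeneration case the expected counts are meant to catch.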

Template for grammar
In grammar writing, it is very helpful to separate these five subsections in the ''.ccg'' file. Copy and paste the grammar template into the blank Edit window.

Now save this change by clicking the ''Save'' button in the upper right-hand corner of the editor window. Next to this button are ''Help'' and ''Quit'' buttons — there's no ''Help'' yet, sorry...

Using the editor
The editor has five tabs — one is the **Edit** window, where you can see the whole ''.ccg'' file. The other four correspond to four of the five grammar subsections.

The editor is designed with local, wiki-style editing. This makes it very easy to modify your grammar and to see the effects of changes before making them final. For example, features are edited from inside the Features window, and each category in the Lexicon window can be edited individually. The same applies to the Rules and Testbed windows.

Changes are NOT saved automatically; it's up to you to click the ''Save'' button. The upside is that you can try things out and see their effect without having to save them.
The scope of the grammar
Before we get started writing our ''tinytiny'' grammar, it's useful to think about the range of phenomena we want the grammar to capture. As the name implies, this grammar covers a limited core subset of constructions in English. In general, it's a good idea to start with one simple type of construction and gradually increase the complexity of the grammar.

''tinytiny'' covers these aspects of English grammar:
  • declarative sentences only (i.e. no questions or imperatives)
  • transitive verbs
  • intransitive verbs
  • verbs showing transitive/intransitive alternation
  • common nouns, both singular and plural
  • determiners
  • personal pronouns
We'll start by looking at intransitive verbs.

Intransitive verbs

The grammar needs to properly handle the following sentences:

testbed {
  the policeman sleeps: 1;
  the policemen sleeps: 0;
  the policemen sleep: 1;
  the policeman sleeps the peach: 0;
}

Add these to your testbed by clicking on the Edit button inside the Testbed window and entering the code in the box above. This is essentially a single declaration of a set of sentences paired with their correct number of parses — we are declaring the contents of the testbed. The contents of the testbed must be inside a pair of braces, and each item must end with a semi-colon, even if there's only one item in the testbed.
Lexical entry
This section describes, bit by bit, how to implement the lexical family for intransitive verbs. To enter a brand new family, use the Edit window. To make changes to a category once it has been created, use the local editing functionality of the Lexicon tab.

First, nothing fancy. Create the lexical family for intransitive verbs and give it one entry: the basic category for intransitives in English. When creating a new family, the family name should either be the same as a part of speech, or the part of speech should be given in parentheses after the family name. The syntax for ''family'' declarations has the same basic shape as that for the testbed (braces and semicolons).

family IntransV(V) {
  entry: s \ np;
}

Now switch to the Lexicon tab and you'll see a graphic representation of the family and entry you just created. Go ahead and save these changes.
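
If it helps to see what a category like ''s \ np'' actually claims, the following Python sketch models the two application rules on bare categories. It is a toy under simplifying assumptions (no features, no modalities, no semantics), and the tuple encoding and function names are invented for illustration.

```python
# A category is either an atom ("s", "np", "n") or a triple (result, slash, argument).
INTRANS = ("s", "\\", "np")   # s \ np : looks to its left for an np
DET = ("np", "/", "n")        # np / n : looks to its right for an n

def backward_apply(left, right):
    """Backward application: the functor on the right consumes the category on its left."""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None

def forward_apply(left, right):
    """Forward application: the functor on the left consumes the category on its right."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None

# "the policeman sleeps": Det + n gives np, then np + IntransV gives s.
subject = forward_apply(DET, "n")
sentence = backward_apply(subject, INTRANS)
print(subject, sentence)
```

Once the verb has consumed its subject the result is a plain ''s'', which has no slash left to absorb a further ''np''. That is the intuition behind expecting 0 parses for ''the policeman sleeps the peach''.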
Features & feature structures
Of course this entry is too pared-down and will overgenerate, allowing sentences like ''me slept''. This brings us to the use of features and feature structures. Feature-structure IDs are assigned to category elements using angle brackets. Modify your ''IntransV'' entry by right-clicking on the name of the family (IntransV).

family IntransV(V) {
  entry: s<1> \ np<2>;
}

When you are done, right-click again on the name of the family and select Done.

It doesn't really matter what numbers you use, but the entry above follows the usual conventions for assigning feature-structure numbers. The next step is to indicate that the ''np'' sought by this verb must be nominative case. Features are enclosed in square brackets following the category element they apply to.

family IntransV(V) {
  entry: s<1> \ np<2> [nom];
}

Now would be a good time to look at the check boxes at the top of the Lexicon window, which let you choose how much information you want to see. Note that two of these don't seem to change anything. We'll get to semantics in a minute, but what about **full-form features**?

Without a declaration of the feature, ''nom'' doesn't mean anything yet. Move to the Features tab and click the Edit button. Each feature declaration consists of a feature type followed by a number, followed by possible values for the feature. The number is a feature-structure ID — when an atomic category in a lexical category definition has the corresponding ID, the feature will be inserted into that feature-structure for that category.

feature {
  case<2>: nom acc;
}

Now you can see what the **full-form features** checkbox does.
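
One way to picture the role of the ID in a feature declaration: each value is shorthand for an attribute-value pair, and the ID says which feature structure receives it. The Python sketch below models only that bookkeeping. The dictionary layout and the ''attach'' function are illustrative assumptions, not OpenCCG internals.

```python
# Feature declarations as in the tutorial: attribute -> (feature-structure ID, values).
FEATURES = {
    "case": (2, ["nom", "acc"]),
}

def attach(feature_values, atom_ids):
    """Place each declared value into the feature structure whose ID matches its declaration.

    Values that no declaration mentions are ignored, mirroring the point
    that an undeclared 'nom' does not mean anything.
    """
    structures = {fs_id: {} for fs_id in atom_ids}
    for value in feature_values:
        for attribute, (fs_id, values) in FEATURES.items():
            if value in values and fs_id in structures:
                structures[fs_id][attribute] = value
    return structures

# s<1> \ np<2> with the feature 'nom': only structure 2 receives case=nom.
print(attach(["nom"], [1, 2]))
```

Here ''nom'' ends up as ''case=nom'' on feature structure 2 only, because ''case'' was declared with ''<2>''.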

NOTE: There is a known bug: if your grammar doesn't have a 'feature {}' declaration to begin with, the Features tab won't show anything. You should add a feature specification, quit VisCCG, and then reload. After that the Features tab will display information correctly.


Exercise 1
Expand the feature declaration to include a ''num'' feature (''sg'',''pl'') and a ''pers'' feature (''1st'',''2nd'',''3rd'').


Feature hierarchies
It is sometimes useful to specify a hierarchy of features. For example, in English we want to have a ''non-3rd'' person feature to handle subject-verb agreement. To write a feature hierarchy with DotCCG, the name of the higher level of the hierarchy is followed by a list of the elements in the next lower level, enclosed in braces. Modify the ''pers'' feature you just wrote to include a ''non-3rd'' hierarchy encompassing ''1st'' and ''2nd''.

pers<2>: non-3rd {1st 2nd} 3rd;

Feature hierarchies can be as deep as they need to be. We'll write a bigger hierarchy later for semantic types.
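
To see why a ''non-3rd'' level is handy for agreement, here is a small Python sketch that treats the ''pers'' hierarchy as a nested dictionary and checks whether two values are compatible. It reduces agreement to "one value sits at or below the other", which is a simplification of real feature unification, and the function names are hypothetical.

```python
# The pers hierarchy from the tutorial: non-3rd covers 1st and 2nd.
PERS = {"non-3rd": {"1st": {}, "2nd": {}}, "3rd": {}}

def descendants(tree, name):
    """Collect name plus everything below it in the hierarchy (empty set if absent)."""
    for key, subtree in tree.items():
        if key == name:
            found = {key}
            for child in subtree:
                found |= descendants(subtree, child)
            return found
        deeper = descendants(subtree, name)
        if deeper:
            return deeper
    return set()

def compatible(a, b):
    """Two values are compatible when one lies at or below the other."""
    return b in descendants(PERS, a) or a in descendants(PERS, b)

print(compatible("non-3rd", "1st"))   # a verb marked non-3rd accepts a 1st person subject
print(compatible("non-3rd", "3rd"))   # but not a 3rd person subject
```

With this in place, a single verb form marked ''non-3rd'' covers both ''1st'' and ''2nd'' subjects, so those forms do not need separate entries per person.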
Adding semantics
We use semantic variables to associate semantics with particular atomic elements of the category. Generally, ''E'' is used as an event variable, and ''X'' and ''Y'' are used to refer to participants in the event. Semantic variables appear in square brackets with the other features. Within the brackets, order doesn't matter, and separating the features with commas is allowed but is not required.

Here's the intransitive verb category enhanced with semantic variables.

family IntransV(V) {
  entry: s<1> [E] \ np<2> [X nom];
}

Now we use these semantic variables to relate the syntactic category with the logical form. This is also where we provide some additional semantic information (''action'' and ''animate-being''). Let's take this in a couple of steps. First, we create the basic logical form.

family IntransV(V) {
  entry: s<1> [E] \ np<2> [X nom] : E (* <Actor>X);
}

The asterisk stands in for the proposition expressed by the verb instantiating the category. It's possible to give semantic types to each of the semantic variables, as shown below.

family IntransV(V) {
  entry: s<1> [E] \ np<2> [X nom] : E:action(* <Actor>X:animate-being);
}

Once you've modified this entry by adding the semantics, hit Done, take a look at how things have changed, and then click over to the Features tab. Just like any other features, the semantic types must be declared as features. We can also declare semantic features such as tense or semantic number in this way. Note the nesting in the ontology. Add these semantic features to your feature declaration (don't enter the dots; they are meant to show that material has been left out).

feature {
  ...
  tense<E>: past pres;
  sem-num<X>: sg-X pl-X;

  ontology: sem-obj {
     phys-obj {
        animate-being {person animal}
        ...
     }
     situation {
        change {action}
        ...
     }
  };
}
Instantiating the category (adding words)
Now we want to instantiate the category with lexical items. This is done in the Words section of the grammar. Note that VisCCG's Words tab only displays the words — you'll need to edit them by going to the main Edit window.

To declare a word, we use the following format: ''word'' followed by the lexical item and its category (or categories), followed by any features associated with the lexical item. Semantic types can be supplied in parentheses immediately following the category (see the entries for ''policeman'' and ''policemen''). Note that the same lexical item (''sleep'' below) can be given more than one entry.

word the:Det;
word policeman:N(person): sg 3rd;
word policemen:N(person): pl 3rd;
word sleep:IntransV: pres non-3rd sg;
word sleep:IntransV: pres pl;
word sleeps:IntransV: pres 3rd sg;
word slept:IntransV: past;

Specifying each word individually, as shown above, involves a lot of repetitive work. This is where expansions come in — expansions are a powerful tool for reducing redundant specification. We'll get to expansions in a minute, or you can jump down now.

Exercise 2
The word declarations shown above use some categories we don't have yet. Create the lexical family ''N''.

We also need a category for determiners. Recall that our basic category for ''Det'' in English is ''np / n'' but with constraints on the mode of combination. We also want features of the ''n'' to be passed up to the result category ''np''.

To set the two feature structures to be the same, we simply give them the same feature-structure ID. Create this family in the Edit window and then take a look at what you get in the Lexicon window.

family Det {
  entry: np<2> / n<2>;
}
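
The effect of giving two category elements the same ID can be pictured as both of them pointing at one shared feature structure. The Python sketch below shows just that sharing. The data layout is an assumption made for illustration and says nothing about how OpenCCG stores categories.

```python
# Two slots that carry the same ID point at one shared feature structure.
shared = {}   # stands for the feature structure with ID 2
det = {"result": ("np", shared), "argument": ("n", shared)}

# Features contributed by the noun land in the shared structure...
det["argument"][1].update({"num": "pl"})

# ...so the resulting np shows them too, with no copying step.
print(det["result"][1])
```

Because there is only one structure, whatever the ''n'' carries is automatically visible on the ''np'', which is what "passing features up" amounts to here.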

Modalities are implemented in the CCG GUI by "decorating" the slashes with the following characters:
  • ''*'' is the application only modality
  • ''x'' is the crossing modality
  • ''^'' is the diamond, or harmonic, modality
  • ''.'' or unspecified is the anything goes modality
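
The list above names the modalities but not which rules each one licenses. The Python sketch below fills that in from the usual multi-modal CCG account, so treat the table as background knowledge rather than a description of OpenCCG's code. The names ''MODALITY_RULES'' and ''allows'' are made up for this example.

```python
# Slash markers as typed in a .ccg file, mapped to the rule types they license.
MODALITY_RULES = {
    "*": {"application"},
    "x": {"application", "crossed composition"},
    "^": {"application", "composition"},
    ".": {"application", "composition", "crossed composition"},
}

def allows(marker, rule):
    """True when a slash decorated with marker may take part in rule.

    An undecorated slash (empty marker) is treated like '.'.
    """
    return rule in MODALITY_RULES[marker or "."]

print(allows("^", "composition"))          # harmonic composition is fine
print(allows("^", "crossed composition"))  # crossing is blocked
```

Seen this way, decorating a slash is a way of switching off some of the rules that are enabled by default, for that one slash only.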
Now add the diamond modality to the slash in the ''Det'' category. Use the local edit button in the Lexicon window. Once you've added the modality, click Done and see the result! While you're in the local editing mode, you can scroll through the entire grammar file. The Home button will take you back to the category you were first editing. Before hitting Done, you can use the Undo button to undo any changes you've made during the current local editing session (in other words, since the last time you hit the small Edit button).

While we're at it, let's add features and semantic variables.

family Det {
  entry: np<2> [X pers=3rd] /^ n<2> [X];
}

What about the semantics for ''Det''?  The semantic contribution of the determiner is essentially a modification of the ''n''. To implement this, we need to create a new semantic relation, which we'll call ''det''. The relation is declared as a parameter of the lexical family, and once it's declared we can use it as part of the logical form. Make these changes and take a look at the new semantic representation for ''Det''.

family Det(indexRel="det") {
  entry: np<2> [X pers=3rd] /^ n<2> [X]: X:sem-obj(<det>*);
}

Exercise 3
We've already written a word declaration for the definite determiner. Write a declaration for the indefinite determiner ''a''. What feature(s) does ''a'' convey that ''the'' does not?


The big payoff : testing the grammar

Now we're ready to try parsing a sentence with ''tinytiny''.
The first step is to generate the XML files required by the parser. If you already have a second terminal window open, switch to that terminal and ''cd'' into the directory where ''tinytiny.ccg'' is stored. Type the following command:

$ ccg2xml tinytiny.ccg

As the XML files are created, you'll see messages on the screen.

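By default, ''ccg2xml'' prefixes each output file with the grammar name, giving files such as ''tinytiny-grammar.xml''. You can also use the ''-p'' option (for ''prefix'') with an empty string to tell ''ccg2xml'' not to use any prefix on the files.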

$ ccg2xml -p "" tinytiny.ccg

This command produces the following output:

ccg2xml: Processing tinytiny.ccg
Outputting XML file: ./lexicon.xml
Outputting XML file: ./grammar.xml
Outputting XML file: ./morph.xml
Outputting XML file: ./rules.xml
Outputting XML file: ./testbed.xml
Outputting XML file: ./types.xml

Because these files have no prefix, we don't have to specify an argument for ''tccg''. As long as we're in the right directory, we can simply use this command to run the grammar:

$ tccg
Loading grammar from URL: file:/home/apalmer/test/grammar.xml
Grammar 'tinytiny.ccg' loaded.

Enter strings to parse.
Type ':r' to realize selected reading of previous parse.
Type ':h' for help on display options and ':q' to quit.
You can use the tab key for command completion,
Ctrl-P (prev) and Ctrl-N (next) to access the command history,
and emacs-style control keys to edit the line.


Running the parser
Now we load the grammar into the parser using the command ''tccg'' (short for "text CCG"). This command takes one argument, the grammar file generated by ''ccg2xml''. With the default prefix, that file is ''tinytiny-grammar.xml'':

$ tccg tinytiny-grammar.xml

Here are some useful commands:
  • '':h'' shows a list of options
  • '':derivs'' and '':noderivs'' toggle between showing and hiding the full derivations
  • '':sem'' and '':nosem'' toggle between showing and hiding the full semantics
  • '':feats'' and '':nofeats'' toggle between showing and hiding the features
  • '':q'' takes you out of tccg.
To parse a sentence, simply type the sentence (all lowercase, no punctuation) at the ''tccg>'' prompt. Try parsing the policeman sleeps — look at the output you get with and without derivations, semantics, and features. Don't worry if you don't understand everything immediately. You'll get more comfortable with the representations as you use the system more.

IMPORTANT If you are having trouble loading the grammar and/or parsing sentences, ask for help right away.

If you make changes to the .ccg file and want to try parsing with the new version of the grammar, you'll need to exit ''tccg'', rerun ''ccg2xml'', and reload the grammar.
Running the testbed
The testbed function of OpenCCG provides a nice way for testing the effects of changes in analysis throughout the grammar. A well-designed testbed contains a set of sentences (both grammatical and ungrammatical sentences) which cover the range of phenomena you want your grammar to cover, making sure the grammar gets all of the examples you want it to get but doesn't overgenerate.

To run the testbed, run the following command from the command line:

 $ ccg-test -norealization -g tinytiny-grammar.xml tinytiny-testbed.xml

Transitive verbs

Now we can move on to transitive verbs. For now, we'll write an entirely separate category for transitive verbs, but a more elegant solution might be to define transitive verbs in relation to intransitive verbs. We'll deal with this topic in the next tutorial.

Exercise 4
Create the lexical family for transitive verbs. Start with the basic category, then add features and semantic variables. Finally, create the logical form.


Exercise 5
Now add the following sentences to the testbed with the correct number of parses.

the policeman saw the book
the policeman saw the books
the policemen see the books
the policeman see the book



What about a verb like 'eat'?

Some verbs, like eat, can be transitive or intransitive. This is no problem — we simply give the word membership in both lexical families (both ''IntransV'' and ''TransV'').

word eat:IntransV: pres non-3rd sg;
word eat:IntransV: pres pl;
word eat:TransV: pres non-3rd sg;
word eat:TransV: pres pl;
word eats: .... (etc.)

Of course, writing all of these separate entries is a really tedious process, and one of the great things about DotCCG is that it makes it really easy to eliminate most of these redundancies. This brings us to the topic of more efficient word declarations, but first, add these sentences to the testbed:

the boys eat
the boys eat the peaches

Efficient word declaration

Rather than writing a separate declaration for each form of the same lemma, we can avoid redundantly specifying information with two devices: (1) methods for declaring inflected forms with a single word declaration, and (2) using expansions to do string rewrites. This section covers those two topics.

For more detailed coverage, see the advanced topics on expansions and word declarations.
Multiple inflections: pronouns
Let's use the case of pronouns to look at how to declare inflected forms of a word with a single declaration. To declare the various forms of the first-person pronoun, we'll use an umbrella category ''pro1'' and list the individual forms and their features in braces following.

word pro1:Pro(animate-being) {
  I: 1st sg nom sg-X;
  me: 1st sg acc sg-X;
  we: 1st pl nom pl-X;
  us: 1st pl acc pl-X;
}

Note that we've used a category (''Pro'') that we haven't declared yet. This one will work nicely:

family Pro {
  entry: np<2> [X]: X:sem-obj(*);
}

Exercise 6
Write the declaration for 3rd person feminine pronouns.

Using expansions
The basic idea behind expansions is very simple: they simply do string substitution, but are made more powerful by the fact that they can take parameters. To define an expansion, give the name of the expansion (''change_me'' in the example below) with its parameters (''Stem, Newstem''), followed by the substitute text (''Newstem'').

def change_me(Stem, Newstem) {
  Newstem
}

This is a really, really dumb expansion. Basically, what it says is: every time an expression of the form ''change_me(X,Y)'' occurs, replace it with ''Y''. Another way to think of it is as a rewrite rule with variables: ''change_me(X, Y) -> Y''. In expansions, parameters function as variables.

Let's look at an example using nouns. We're going to write an expansion which will replace a simple statement with a word declaration which declares the inflected forms. Take a look (a more detailed explanation follows):

def noun(Sing, Plur, Class) {
  word Sing:N(Class) {
    *: sg sg-X;
    Plur: pl pl-X;
  }
}

In these types of declarations, the ''*'' is replaced with the lemma/umbrella category (which is the first thing you see after the word ''word'').

Here we have defined an expansion called ''noun'' which takes three arguments:
  • ''Sing'' is a variable for the singular form of the noun
  • ''Plur'' is a variable for the plural form of the noun
  • ''Class'' is a variable for the semantic sort of the noun
When the editor sees a call to that expansion, it "replaces" the call with whatever it finds in braces. In this case, what's in the braces is a word declaration for nouns, replacing any variables with the values supplied. So if we write ''noun(book, books, thing)'', the variable ''Sing'' is instantiated as ''book'', ''Plur'' is instantiated as ''books'', and ''Class'' is instantiated as ''thing''. So the complete word declaration looks like this (you don't have to write all of it; all you have to write is ''noun(book, books, thing)''):

word book:N(thing) {
    *: sg sg-X;
    books: pl pl-X;
}

NOTE This is the only case where order of declarations matters. Expansions must be defined before they can be used.
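
Since expansions are described as string substitution with parameters, the idea can be mimicked in a few lines of Python. This is only a sketch of the substitution step, assuming whole-word replacement of parameter names. The ''expand'' function is hypothetical and is not the code that ''ccg2xml'' uses.

```python
import re

def expand(params, body, args):
    """Substitute each parameter name in body with the matching argument (whole words only)."""
    mapping = dict(zip(params, args))
    pattern = r"\b(" + "|".join(map(re.escape, params)) + r")\b"
    return re.sub(pattern, lambda m: mapping[m.group(1)], body)

NOUN_PARAMS = ["Sing", "Plur", "Class"]
NOUN_BODY = """word Sing:N(Class) {
  *: sg sg-X;
  Plur: pl pl-X;
}"""

# The call noun(book, books, thing) becomes a full word declaration.
print(expand(NOUN_PARAMS, NOUN_BODY, ["book", "books", "thing"]))

# The change_me expansion simply returns its second argument.
print(expand(["Stem", "Newstem"], "Newstem", ["walk", "walked"]))
```

Matching whole words matters: without it, the parameter ''Stem'' would also match the tail of ''Newstem''. The arguments ''walk'' and ''walked'' in the second call are made-up sample values.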

Exercise 7
Add the ''noun'' expansion in the ''Words'' section of your .ccg file (this has to be done in the Edit tab) and then look at the results in the Words tab.

Now add ''noun'' declarations for peach, boy, and policeman.



Commas, quotes, reserved words, etc.
More about features
Members of families

Error messages

  - Integer value too large — occurs when
  - Illegal character — occurs when
  - Redefining macro — occurs when
  - Bad tuple pair — occurs when
  - Property X not found in Y — occurs when
  - Family/part-of-speech X not found (word declaration Y) — occurs when
  - Original entry for X not found — occurs when you press Home in the Lexicon view. If you edit the family name in the little edit window that opens when you want to edit a single category in the Lexicon view, and then press Home, the program searches for the original string which is missing now due to editing.
  - Undo all changes till now? — occurs when you press Undo All in the Lexicon view. All edits till then will be lost if you go ahead and okay it.
  - Invalid element in rules hash: X — occurs when
  - Syntax error at X — occurs when parsing cannot be completed due to some error in the syntax. You can reach back to it in the Edit view through the line number
  - Unexpected end of file — occurs when
  - Unknown file in --omit-output argument — occurs when
  - Errors during compilation, files not output. — occurs when



Solution 1
feature {
  case<2>: nom acc;
  num<2>: sg pl;
  pers<2>: 1st 2nd 3rd;
}

Solution 2
family N {
  entry: n<2>[X]: X:sem-obj(*);
}

Note the semantic type — ''sem-obj'' is the top level of the ontology, allowing things with the category ''N'' to be of any of the semantic types in the ontology. Again, the asterisk (''*'') represents the semantic proposition denoted by the lexical item.

Solution 3
word the:Det;
word a:Det: sg;

Solution 4
Basic category:

family TransV(V) {
  entry: s \ np / np;
}

Add features:

family TransV(V) {
  entry: s<1> \ np<2> [nom] / np<3> [acc];
}

Add semantic variables:

family TransV(V) {
  entry: s<1> [E] \ np<2> [X nom] / np<3> [Y acc];
}

Add logical form:

family TransV(V) {
  entry: s<1> [E] \ np<2> [X nom] / np<3> [Y acc]:
    E:action(* <Actor>X:animate-being <Patient>Y:sem-obj);
}

Solution 5
testbed {
   the policeman saw the book: 1;
   the policeman saw the books: 1;
   the policemen see the books: 1;
   the policeman see the book: 0;
}

Solution 6
word pro3:Pro(sem-obj) {
   she: 3rd sg nom sg-X;
   her: 3rd sg acc sg-X;
   they: 3rd pl pl-X;
}

If you like, add the masculine pronouns as well as a gender feature. You might also think about whether this is the ideal way to specify third person plural.

You could also write a declaration for 2nd person pronouns. Question: is this really needed? How much inflection is there?

Solution 7
noun(peach, peaches, thing)
noun(boy, boys, person)
noun(policeman, policemen, person)