User manual for MonaSearch

  1. Running MonaSearch
  2. Installation instructions
    1. Requirements
    2. Linux
    3. Mac
    4. Windows
  3. First execution
  4. Using MonaSearch
    1. Quick start
    2. Choosing a treebank
    3. Composing formulas
      1. Base formulas
      2. Complex formulas
    4. Submitting your query
    5. Saving your queries
  5. Keyboard shortcuts
  6. Command Line Interface
  7. Developers

Running MonaSearch

Thanks to Java Web Start, you can simply run MonaSearch by clicking the link on the homepage. This should work for the major platforms (Win, Linux, Mac); contact me if it doesn’t.

It will install a desktop shortcut after first launch, so you don’t have to start your browser each time. On platforms where this is supported, it will also insert a submenu in the main menu.

It will ask you to trust a certificate. This is necessary since MonaSearch needs access to the file system to store preprocessed tree banks and intermediate results. I brewed a home-made certificate with digital fingerprint 64:60:D0:63:92:1A:54:88:EF:E7:AE:2A:AF:24:08:54:1C:6A:AE:93. If this seems hokey to you, contact us and we’ll try to get a real certificate.

Java Web Start also takes care of automatically updating the program if a newer version is available. This will require a restart of the program.

Note that it is still necessary to install MONA first.

Installation instructions

Requirements

Java 6 (Mac users see below for where to get it).

MONA

You can install MONA either by using one of the installation packages on the homepage or by getting it from the MONA website. The README linked from the homepage is also worth reading.

Linux

To check which version of Java is the default, type:

java -version

You should see something like this:

java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) Client VM (build 10.0-b22, mixed mode, sharing)

or, on 64-bit systems:

java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b22, mixed mode)

Extract MonaSearch anywhere you like. Open a terminal and cd into the MonaSearch directory. In that directory there should be files called MonaSearch.jar and libmona.so. To run MonaSearch from the current directory simply type:

java -Djava.library.path=. -jar MonaSearch.jar

Alternatively, you can copy libmona.so into a directory where the JVM looks for it and omit the -D option. To find out which directories are suited for this, you have to find out what the variable java.library.path in the JVM is set to. The easiest way to find out is to run Roedy Green’s Wassup applet. You’ll have to grant it full rights, since otherwise it won’t show the restricted values, of which java.library.path is one. Typical values for Linux include /lib and /usr/lib (/usr/lib64 probably won’t do!).
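Once you know the value of java.library.path, a small shell function can tell you whether the library is already in one of those directories. This is only a sketch: the function name find_in_library_path is made up for illustration, and the colon-separated directory list has to be supplied by you (for example, copied from the value you looked up).

```shell
# find_in_library_path LIBRARY PATHLIST
# Prints the first directory in the colon-separated PATHLIST that contains
# LIBRARY and returns 0; returns 1 if no directory contains it.
# (Hypothetical helper, not part of MonaSearch.)
find_in_library_path() {
    library=$1
    path_list=$2
    found=1
    old_ifs=$IFS
    IFS=:
    set -f                          # no glob expansion while splitting
    for dir in $path_list; do
        if [ -f "$dir/$library" ]; then
            printf '%s\n' "$dir"
            found=0
            break
        fi
    done
    set +f
    IFS=$old_ifs
    return "$found"
}

# Example with the typical Linux values mentioned above.
find_in_library_path libmona.so "/lib:/usr/lib" \
    || echo "libmona.so is not in any of these directories"
```

If the function prints a directory, the -D option can probably be omitted; otherwise keep passing -Djava.library.path explicitly.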

Make sure the library and the Java you are running are compiled for the same architecture (in particular, a 32-bit Java will complain about a 64-bit library, and vice versa).

Mac

MonaSearch requires a 64 bit Intel Mac with Java 6. If enough people ask for a version that works with Java 5, we’ll consider making one, so feel free to ask! Java 6 for Leopard is available from Apple’s update site. If you are running Tiger you can try SoyLatte.

Whichever Java you choose, you’ll probably want to make it your default Java. To do that, add Java 6 to your path by adding the following to your ~/.profile (if you use SoyLatte, modify it accordingly):

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
export PATH=$JAVA_HOME/bin:$PATH

Then at the console type

source ~/.profile

Extract MonaSearch anywhere you like. Open a console and cd into the MonaSearch directory. In that directory there should be files called MonaSearch.jar and libmona.jnilib. To run MonaSearch from the current directory simply type

java -Djava.library.path=. -jar MonaSearch.jar

Alternatively, you can copy libmona.jnilib into a directory where the JVM looks for it and omit the -D option. To find out which directories are suited for this, you have to find out what the variable java.library.path in the JVM is set to. The easiest way to find out is to run Roedy Green’s Wassup applet. You’ll have to grant it full rights, since otherwise it won’t show the restricted values, of which java.library.path is one. Typical values for Mac include the current directory, /Library/Java/Extensions, /System/Library/Java/Extensions and /usr/lib/java.

Windows

Extract MonaSearch anywhere you like. Open a Command Line (under ) and cd into the MonaSearch directory. In that directory there should be files called MonaSearch.jar and mona.dll. To run MonaSearch from the current directory simply type

java -Djava.library.path=. -jar MonaSearch.jar

Alternatively, you can copy mona.dll into a directory where the JVM looks for it and omit the -D option. To find out which directories are suited for this, you have to find out what the variable java.library.path in the JVM is set to. The easiest way to find out is to run Roedy Green’s Wassup applet. You’ll have to grant it full rights, since otherwise it won’t show the restricted values, of which java.library.path is one. Typical values for Windows include C:\Windows and C:\Windows\System32. In Windows, the PATH is also included, as is the current working directory, so if you’re in the same directory the library is, it will work without the -D parameter as well.

Make sure the library and the Java you are running are compiled for the same architecture (in particular, a 32-bit Java will complain about a 64-bit library, and vice versa).

First execution

At the first execution, MonaSearch will ask you to choose a directory where preprocessed treebanks and temporary results can be stored. While you can choose any directory for this purpose, it is not necessary to access it yourself at any time. Therefore, it is recommended to make it a hidden directory. Suggested places are:

Linux and Mac
~/.MonaSearch
Windows
%APPDATA%/MonaSearch[]

Java remembers where this directory is in a preference file. A lot of other settings are stored there as well, such as the window size and the location of the last edited file and treebank. If something goes awry, you can try deleting those settings; have a look at Roedy’s page on Java preferences for how to get at them.

Using MonaSearch

The main window allows you to compose queries and submit them for querying. Several related tasks can be done here, such as selecting a treebank and saving the queries to file.
[Figure: The MonaSearch GUI]

Quick start

For people in a hurry, here is a quick summary of the steps:

  1. Choose a treebank.
  2. Compose your query with the menus or open a file containing them.
  3. Make sure the query to submit is a closed formula.
  4. Hit “Submit”.
  5. Review results and save them in a file.

Choosing a treebank

First of all you will want to choose a treebank to pose your queries on. You can do this by clicking the button “Treebank” and choosing a file. For now, only the NEGRA export format is supported.

The first time, MonaSearch will precompile the treebank and store this information in the directory you chose at first execution. As long as this process is going on, you cannot submit queries, but you can already start composing them.

Composing formulas

The central field is a kind of scrap book with formulas. When constructing a query, most people think in a bottom-up fashion focusing first on the atomic relations. This approach is systematically supported: first you add the atomic properties such as node labels or relations between nodes (like dominance), then complex formulas are constructed from simpler ones via boolean connectives and quantification.

Base formulas

Base formulas are divided into two sorts: properties of nodes and relations between nodes or between nodes and node sets. Properties of nodes are linguistic labels such as the category or function for non-terminal nodes and word, lemma, morphology or part-of-speech information for terminal nodes. Relations between nodes can be various forms of dominance, precedence and equality, whereas relations involving node sets can be the containment of a node in a set or the equality of sets.

To enter a base formula, choose the appropriate formula from the “Base” menu. A dialog will show up asking for the name of the node or node set involved. A second dialog will ask you for the name of the second node or node set in case of a binary relation; in case of a node property, the dialog will ask for the value of that particular property.

The names of variables can be chosen freely, with the restriction that names for nodes must be lowercase and names of node sets must be upper case, as is usual in logic.

Linear precedence is defined bottom-up. That is, it is first defined on the terminal nodes, because they have a clearly defined linear order. For internal nodes u and v, u precedes v iff the complete subtree dominated by u precedes the complete subtree dominated by v.

In case of crossing branches, this means that neither u precedes v nor vice versa.

Formally: u linearly precedes v if there are nodes x, y, z such that y and z are daughters of x, y is to the left of z, y dominates u and z dominates v.
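The formal definition can also be written as a single formula. The notation below is an assumption made for this illustration and does not come from MonaSearch itself: ≺ stands for linear precedence, ◁ for immediate dominance (x is the mother of y), <_x for the left-to-right order among the daughters of x, and ⊴* for dominance, taken here to be reflexive so that the definition also covers the daughters y and z themselves.

```latex
\[
  u \prec v \;\iff\; \exists x\,\exists y\,\exists z\;\bigl(
      x \triangleleft y \;\wedge\; x \triangleleft z \;\wedge\; y <_{x} z
      \;\wedge\; y \trianglelefteq^{*} u \;\wedge\; z \trianglelefteq^{*} v
  \bigr)
\]
```

For two sister nodes y and z with y to the left of z, choosing u = y and v = z gives y ≺ z directly.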

Complex formulas

Once the base relations have been entered, you will want to combine them into more complex formulas. To do this, select the formulas to be combined, either by clicking on them while holding the Ctrl key, or by holding the Shift key and using the arrow keys.

For implication, the order of the formulas is important; the upper one will be considered the first one. If need be, you can swap formulas by selecting them and choosing “Swap” in the “Query” menu, or by dragging them around with the mouse.

For conjunction and disjunction, you can select an arbitrary number of formulas.

For the different kinds of quantification, the name of the variable that is quantified over will be asked in a dialog.

The newly created formula will appear at the top of the scrap book.

Submitting your query

An important restriction on supported queries is that they have to be closed. This means that for every variable which occurs in the formula, there must be a quantification over it, either universal or existential. If you do not know what to do with your variables, it is safest to just put existential quantifiers before each of them.

Now that your query is finished, you can submit it. Select it and click the “Submit” button. After a while a window will pop up showing you the results of the query.

Evaluating the results

The results, if any, are presented as a list of tree identifiers. Each identifier is the number of the tree in the original treebank file.
[Figure: The results screen in MonaSearch]

We are working on a visualization component which will show you the tree and the sentence. Check back in a future release!

Saving the results

You can save your results by hitting the save button. They will be stored as a simple text file containing one result per line.

Saving your queries

Since composing queries is tedious, you do not want to do this over and over again. Therefore, it is possible to store the queries in a file. Choose “Save” in the “File” menu and enter the name of the file. Next time, you can open this file by choosing “File” → “Open” to continue editing the queries or to resubmit them.

Keyboard shortcuts

For power users, there are a lot of keyboard shortcuts available:

Keyboard shortcuts in MonaSearch

Command                       Shortcut
Submit                        Enter
Choose treebank               Ctrl + T
Open file                     Ctrl + O
Save file                     Ctrl + S
Save file as                  Ctrl + Shift + S
Quit MonaSearch               Ctrl + Q
Delete a query                Delete
Clear the query field         Ctrl + Delete
Dominance                     Ctrl + D
Proper dominance              Ctrl + G
Immediate dominance           Ctrl + Y
Precedence                    Ctrl + P
Equality                      Ctrl + Z
Membership (inclusion)        Ctrl + I
Category                      Ctrl + C
Word                          Ctrl + W
Morphology                    Ctrl + M
Lemma                         Ctrl + L
Grammatical function          Ctrl + F
Negation                      Ctrl + R
Conjunction                   Ctrl + A (broken)
Disjunction                   Ctrl + V
Implication                   Ctrl + J
Universal quantification      Ctrl + U
Existential quantification    Ctrl + E
Save results                  Ctrl + S
Close result window           Ctrl + W

Command Line Interface

For scripting purposes, a simple command line interface is provided. The usage message is as follows:

Usage:
MonaSearch [options] <query>
options
  --help (-?,-h)            print this message
  verbosity                 determines how much information is written to the
                            command line
    --quiet (-q)            be extra quiet
    --verbose (-v)          be extra verbose
  --logfile (-l) <file>     use given file for log
  --treebank (-t) <name>    use the given treebank (identified by its name)
  --baseFileDir <dir>       directory where additional metadata, such as
                            preprocessed treebanks and precompiled queries
                            reside

The results of the query are written to the standard output, so in the simplest case, this means you would simply invoke it as follows:

java -jar MonaSearch.jar "query" > results.txt

All the other options are set to sensible defaults automatically. That is, just like in the GUI, your preferences are stored, including which treebank you last queried, what the base directory is, etc. You can, however, specify a treebank, a directory to store intermediate results and a logfile.

These options are also remembered from the graphical user interface, so the easiest thing to do is to run MonaSearch in graphical mode once and set all options as wanted. Then afterwards you only need to specify the query at the command line.
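For scripting, a loop over a file of queries could look like the following sketch. The function name run_queries, the MONASEARCH variable, the one-query-per-line file layout and the resultN.txt file names are assumptions made for illustration; only the basic invocation java -jar MonaSearch.jar "query" is taken from this manual.

```shell
# Command prefix used to start the CLI. Override it if the jar lives
# elsewhere or if you need -Djava.library.path (hypothetical variable).
MONASEARCH=${MONASEARCH:-"java -jar MonaSearch.jar"}

# run_queries QUERY_FILE OUT_DIR
# Runs every non-empty line of QUERY_FILE as one query and writes the
# results to OUT_DIR/result1.txt, OUT_DIR/result2.txt, ...
# Prints the number of queries that were run.
run_queries() {
    query_file=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    n=0
    while IFS= read -r query || [ -n "$query" ]; do
        if [ -z "$query" ]; then
            continue                # skip empty lines
        fi
        n=$((n + 1))
        # $MONASEARCH is deliberately unquoted so that it splits into
        # the command and its options; stdin is redirected so that the
        # program cannot swallow the rest of the query file.
        $MONASEARCH "$query" < /dev/null > "$out_dir/result$n.txt"
    done < "$query_file"
    printf '%s\n' "$n"
}
```

A call such as run_queries queries.txt results would then leave one result file per query in the results directory, each containing one tree identifier per line.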

Command Line Arguments

--help, -h, -?
Prints the usage message above and exits the program.
--quiet, -q
Output less information.
--verbose, -v
Output more progress information.
--baseFileDir
Use the given directory for storing preprocessed corpora, intermediate results and log files. If none is specified, the directory from the previous execution is used. If that value cannot be found, ~/.MonaSearch is used.
--logfile, -l
Write error information to the given log file. If not specified, it will be written to a file called MonaSearchXX.log in the base directory, where XX is a number.
--treebank, -t
Use this treebank for querying. If the treebank has not been preprocessed yet, it will be. If none is specified, the treebank from the previous execution is used.
<query>
The query to execute. The format is a Lisp-like format borrowed from fsq, described in the following section. Note that you have to protect spaces in the query against the shell. This is best done by surrounding the whole query with single or double quotes.
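Besides spaces, queries contain characters such as &, |, >, ! and parentheses that the shell would otherwise interpret, so single quotes are usually the safer choice. The sketch below stores a query in a variable and checks that it reaches a command as one single argument. The query itself (an NP node properly dominating a node with the word Haus) and the helper count_args are made up for illustration.

```shell
# Single quotes keep the shell from touching &, |, >, ! and the parentheses.
query='(E1 x (E1 y (& (cat x NP) (>+ x y) (word y Haus))))'

# Hypothetical helper: prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

# Always expand the variable inside double quotes, so that the whole
# query is passed on as one argument.
count_args "$query"     # prints 1
```

The same quoting applies to the real call, for example java -jar MonaSearch.jar "$query" > results.txt.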

As of now the verbosity options do nothing, but this is WIP. Do tell which information you would like to see.

Formula Syntax

The syntax of formulas is LISP-like, i.e., each (sub-)formula is surrounded by parentheses (), and there is a strict prefix notation: the functional head always comes first.

Formulas are divided into atomic formulas and complex formulas. Atomic formulas are further divided into 2 groups. The first group comprises formulas for node labels, the second group comprises relations between nodes and between nodes and sets of nodes.

The same restriction as in the GUI holds: queries must be closed. When in doubt, simply prefix an appropriate number of existential quantifiers.
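Prefixing existential quantifiers can be automated with a few lines of shell. The following is a sketch only: the function name close_query is hypothetical, the free variables have to be listed by hand, and the choice between E1 and E2 relies on the naming convention (lowercase for nodes, uppercase for node sets) rather than on anything the syntax enforces.

```shell
# close_query QUERY VAR...
# Wraps QUERY in one existential quantifier per listed variable.
# Variables starting with an uppercase letter are treated as node sets
# (E2), all others as nodes (E1). Hypothetical helper, not part of
# MonaSearch.
close_query() {
    formula=$1
    shift
    for var in "$@"; do
        case $var in
            [[:upper:]]*) quant=E2 ;;
            *)            quant=E1 ;;
        esac
        formula="($quant $var $formula)"
    done
    printf '%s\n' "$formula"
}

close_query '(in x X)' x X      # prints (E2 X (E1 x (in x X)))
```

The first variable listed ends up with the innermost quantifier. Since only existential quantifiers are added, their order does not change the meaning of the query.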

In the following, let x and y be variables, φ, φ1, φ2, φ3, … and ψ formulas.

Variables

A variable is a string of letters or digits, delimited by white space. Examples: x, y, z, X, Y, Z, v12, V21, 3, … By convention, lowercase letters are used for first-order variables and uppercase letters are used for second-order variables, but the syntax is precise enough that this does not have to be enforced.

Node label formulas

(word x T)
node x has word T (this can also be a punctuation character or any other token)
(lemma x L)
node x has lemma L
(cat x C)
node x is of category (has POS tag) C
(morph x M)
node x has morphological tag M
(fct x F)
node x is of grammatical function (has edge label) F

All label descriptions are expected to be regular expressions as provided by Java. Normally, this should not bother you, but if you get unexpected results, try escaping special symbols with a double backslash.
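If a label contains characters that are special in regular expressions, the escaping can be scripted. The sketch below follows the advice above literally and puts two backslashes in front of each special character. The function name escape_label is hypothetical, backslashes in the input are not handled, and how MonaSearch treats escaped parentheses inside a label is not covered by this manual, so treat the result as a starting point.

```shell
# escape_label TEXT
# Puts two backslashes in front of every regular-expression
# metacharacter in TEXT (the backslash itself is left alone).
# Hypothetical helper, not part of MonaSearch.
escape_label() {
    printf '%s\n' "$1" | sed 's/[].[*+?(){}|^$]/\\\\&/g'
}

# Build a query for the literal word "z.B." instead of "z, any
# character, B, any character".
query="(E1 x (word x $(escape_label 'z.B.')))"
printf '%s\n' "$query"      # prints (E1 x (word x z\\.B\\.))
```

Labels without special characters, such as NP, pass through unchanged.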

Node and set relations

(> x y)
node x is the mother of y
(>+ x y)
node x properly dominates y
>+ is the transitive closure of >
(. x y)
node x precedes y
(= x y)
equality of nodes x and y
(in x X)
node x is a member of set X
(sub X Y)
set X is a subset of set Y
(= X Y)
equality of sets X and Y

Complex Formulas

(! φ)
negation of φ
(& φ1 … φn)
conjunction of φ1 … φn
(| φ1 … φn)
disjunction of φ1 … φn
Disjunctions and conjunctions can have arbitrary width.
(-> φ ψ)
implication: φ implies ψ
(<-> φ ψ)
bi-implication: φ if and only if ψ
(A1 x φ)
first-order universal quantification of x in the formula φ
(E1 x φ)
first-order existential quantification of x in the formula φ
(A2 X φ)
second-order universal quantification of X in the formula φ
(E2 X φ)
second-order existential quantification of X in the formula φ

Developers

Developers can discuss stuff on the Launchpad page.

[] To know what %APPDATA% looks like, open a run command (WinKey+r) and enter it there. Using this directory will probably work well in Windows XP, but I am unsure about Vista. Please report suggestions/problems.