



Semr is the gateway drug framework for supporting natural language processing in your application. Its goal is to follow the 80/20 rule: 80% of what you want to express in a DSL should be possible in a way that is familiar from how developers normally solve problems. (Note: there are other, more flexible solutions, such as treetop, but they come with a higher learning curve.)



sudo gem install semr

The semr gem uses the oniguruma library to leverage more mature regular expression features than Ruby 1.8 supports. This library is part of the Ruby 1.9 build, but it needs to be installed if you run semr on 1.8.* For more info on gem:

sudo gem install oniguruma
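
This page does not list which regular expression features semr depends on. As a rough illustration of the kind of syntax Oniguruma adds over the Ruby 1.8 engine, named groups and lookbehind are two examples. The snippet is plain Ruby (1.9 or later) and does not call semr.

```ruby
statement = 'say hello 6 times'

# Named groups: captures can be read by name instead of by position.
match = /say (?<greeting>\w+) (?<count>\d+) times/.match(statement)
greeting = match[:greeting]
count    = match[:count]

# Lookbehind: match digits only when they directly follow "hello ".
number_after_hello = statement[/(?<=hello )\d+/]

puts "#{greeting} x #{count} (lookbehind found #{number_after_hello})"
```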

The basics

There are two constructs for defining a grammar in semr: a phrase and a concept.


Phrases are the statement structures supported by the DSL, and they name the concepts to be matched. When a user’s statement matches a phrase, the concepts are extracted, processed as defined by the concept (see normalizers), and then passed to the block defined by the phrase for execution. Phrases are tried in the order they are defined, and a statement is matched to only one phrase. Semr splits the user’s input on newlines, so each line is a statement that is executed by one of the phrases if it matches.

  phrase 'match this :some_concept here' do |concept|
    # when the user types a statement that matches the phrase,
    # the concept, :some_concept, is extracted and passed to this block
  end
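
To make the matching rules concrete, here is a minimal sketch in plain Ruby. It is not semr's implementation: `CONCEPTS`, `PHRASES`, `compile_phrase`, `phrase` and `run` are hypothetical names, and the sketch assumes a phrase can be treated as a template whose `:concept` placeholders are swapped for regular expressions.

```ruby
# Hypothetical sketch, not semr's code.
CONCEPTS = { :some_concept => /(\w+)/ }
PHRASES  = []

# Swap each :concept placeholder for its pattern and anchor the result.
def compile_phrase(template)
  source = template.split(' ').map do |token|
    if token.start_with?(':')
      CONCEPTS.fetch(token[1..-1].to_sym).source
    else
      Regexp.escape(token)
    end
  end.join('\s+')
  Regexp.new("\\A#{source}\\z")
end

def phrase(template, &block)
  PHRASES << [compile_phrase(template), block]
end

# Each line is one statement; phrases are tried in definition order
# and only the first phrase that matches is executed.
def run(input)
  input.each_line.map do |line|
    statement = line.strip
    result = nil
    PHRASES.each do |regex, block|
      match = regex.match(statement)
      next unless match
      result = block.call(*match.captures)
      break
    end
    result
  end
end

phrase('match this :some_concept here') { |concept| "got #{concept}" }
phrase('match this :some_concept here') { |concept| "second: #{concept}" }

results = run("match this thing here\nsomething else entirely")
# results => ["got thing", nil]
```

The second phrase never runs because the first one already matched, and the unmatched line produces nothing.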


Concepts are the meaningful variations in the DSL that we want to perform certain actions on. Examples of concepts are any word that changes, a number, multiple words in quotes, or an ActiveRecord model class. After defining the concepts, you express the phrases that they will appear in.

  # Concepts are just named regular expressions
  concept :number,  /([0-9]*)/
  # any_number is a more explicit way to express a number
  # (see: expressions in rdoc for more)
  concept :number,  any_number
  # normalizers handle the concepts before passing them to
  # the phrase block (see: normalizers in rdoc for more)
  concept :number,  any_number, :normalize => as_fixnum
  # matches any word
  concept :a_word, /(\w+)/
  # matches only hi, goodbye, hello
  concept :greeting, words('hi', 'goodbye', 'hello')
  # matches any string of characters in quotes, e.g. 'some words'
  concept :greeting, words_in_quotes
  # built-in support for Rails applications: allows users
  # to directly reference ActiveRecord models
  concept :model,    all_models
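
The idea of a concept as a named regular expression plus an optional normalizer can be sketched in plain Ruby. The `Concept` struct and the `as_fixnum` lambda below are hypothetical stand-ins written for this illustration, not semr's own classes. The number pattern uses `+` rather than `*` so that it cannot match an empty string.

```ruby
# Hypothetical stand-in for a concept: a name, a pattern, an optional normalizer.
Concept = Struct.new(:name, :pattern, :normalizer) do
  # Returns the (normalized) first capture, or nil when the text does not match.
  def extract(text)
    match = pattern.match(text)
    return nil unless match
    raw = match[1]
    normalizer ? normalizer.call(raw) : raw
  end
end

# Assumed behaviour of a fixnum normalizer: turn the matched text into an integer.
as_fixnum = lambda { |raw| raw.to_i }

number   = Concept.new(:number, /([0-9]+)/, as_fixnum)
greeting = Concept.new(:greeting, /\b(hi|goodbye|hello)\b/, nil)

count   = number.extract('say hello 6 times')    # integer 6, not the string "6"
word    = greeting.extract('say hello 6 times')  # "hello"
no_word = greeting.extract('say howdy 6 times')  # nil, "howdy" is not in the list
```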


After processing all of the user’s statements, semr returns the result as a hash. This hash is accessible to the phrases during execution (as context) and is used to store the output of each phrase’s execution.

  phrase 'match this :some_concept here' do |concept|
    context[:value] = concept
  end

  context = language.parse('match this thing here')
  context[:value] #=> 'thing'
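
A rough sketch of how one shared hash could collect output across several statements follows. `TinyLanguage` is a hypothetical class written for this illustration. Unlike semr, it takes ready-made regular expressions and passes the context to the block explicitly, which keeps the example short.

```ruby
# Hypothetical sketch: one hash is shared by every statement in a parse call.
class TinyLanguage
  def initialize
    @phrases = []
  end

  def phrase(regex, &block)
    @phrases << [regex, block]
  end

  # Run the first matching phrase for each line, then return the shared hash.
  def parse(input)
    context = {}
    input.each_line do |line|
      @phrases.each do |regex, block|
        match = regex.match(line.strip)
        next unless match
        block.call(context, *match.captures)
        break
      end
    end
    context
  end
end

language = TinyLanguage.new
language.phrase(/\Amatch this (\w+) here\z/) { |context, concept| context[:value] = concept }
language.phrase(/\Aremember (\w+)\z/) { |context, word| (context[:words] ||= []) << word }

context = language.parse("match this thing here\nremember apples\nremember pears")
context[:value]  # "thing"
context[:words]  # ["apples", "pears"]
```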

Demonstration of usage

Here is a basic example that lets the user express how often to greet someone. Valid greetings are goodbye, hi and hello.

require 'rubygems'
require 'semr'

language = Semr::Language.create do # also accepts a path to a file instead of a block
  concept :number,    any_number, :normalize => as_fixnum
  concept :greeting,  words('hi', 'goodbye', 'hello')
  phrase 'say :greeting :number times' do |greeting, number|
    number.to_i.times { puts greeting }
  end
end

language.parse('say hello 6 times')
# hello
# hello
# hello
# hello
# hello
# hello

language.parse('say goodbye 2 times')
# goodbye
# goodbye

The more complicated example below lets the user find our domain concepts (i.e. models) in relatively non-technical terms.

require 'rubygems'
require 'semr'

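# concepts and phrases are defined inside a language block, as in the basic example
language = Semr::Language.create do
  concept :action, word('feature', 'show', 'highlight')
  concept :model,  word(*all_models.with_synonyms.and.with_plurals.to_a), :normalize => as_class
  # the :attribute and :criteria concept definitions are not shown here

  phrase ':action :model <where><with><from> :attribute <is> :criteria' do |action, model, attribute, criteria|
    context[action] ||= []
    context[action] << model.find(:first, :conditions => {attribute.to_s => criteria})
  end
end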

context = language.parse('feature people where age is 30')
context['feature'] #=> an array of Person models where age is 30

context = language.parse('show events from city bangalore')
context['show'] #=> an array of events in bangalore

context = language.parse("Highlight articles with title 'Growth in China.'")
context['highlight'] #=> an array of articles
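
The angle-bracketed words in the phrase above appear to be optional filler: the sample statements use where, from, or with, and is can be left out. Under that assumption, the sketch below shows one way such a template could be turned into a regular expression in plain Ruby. `template_to_regex` is a hypothetical helper, not part of semr, and it only captures single words, so the quoted-title statement is out of its reach.

```ruby
# Hypothetical sketch: <word> groups are optional literals, :names are captures.
def template_to_regex(template)
  pieces = template.split(' ').map do |token|
    case token
    when /\A:\w+\z/
      '(\S+)\s+'                               # a concept: capture one word
    when /\A(<\w+>)+\z/
      words = token.scan(/<(\w+)>/).flatten
      "(?:(?:#{words.join('|')})\\s+)?"        # any one of the words, or none
    else
      Regexp.escape(token) + '\s+'             # a required literal word
    end
  end
  # Drop the separator that trails the last required token.
  source = pieces.join.sub(/\\s\+\z/, '')
  Regexp.new("\\A#{source}\\z", Regexp::IGNORECASE)
end

regex = template_to_regex(':action :model <where><with><from> :attribute <is> :criteria')

with_fillers    = regex.match('feature people where age is 30').captures
# with_fillers    => ["feature", "people", "age", "30"]
without_is      = regex.match('show events from city bangalore').captures
# without_is      => ["show", "events", "city", "bangalore"]
```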


How to submit patches

Read the 8 steps for fixing other people’s code. You can fetch the source from:

git clone git://


This code is free to use under the terms of the MIT license.


Comments are welcome. Send an email to Matthew Deiters or email the semr group via the forum.

FIXME full name, 14th November 2008