PCI4R Update

We finally made some progress this week on the languishing pci4r project. First, congrats to Sandro Paganotti for the first commit to pci4r; the prize is in the mail. This morning, after a bit of git-fiddling, I managed to get the second commit for the project in. It’s code for document classification, which is the topic of Chapter 6 of Toby Segaran’s “Programming Collective Intelligence”. I deviated quite a bit from Toby’s original code. In some cases this was simply a side effect of porting from Python to idiomatic Ruby. In other cases, though, I made changes for purely aesthetic reasons.

In short, here’s what you can do:

  c = Filtering::NaiveBayes.new

  c.train("Nobody owns the water", :good)
  c.train("the quick rabbit jumps fences", :good)
  c.train("buy pharmaceuticals now", :bad)
  c.train("make quick money at the online casino", :bad)
  c.train("the quick brown fox jumps", :good)

  c.prob("quick rabbit", :good)  #=> ~ 0.156
  c.prob("quick rabbit", :bad)   #=> ~ 0.050

Here we create a new NaiveBayes classifier, train it with some text, and then query it with other text. Nifty, eh? The package also includes a second classifier called Fisher, which uses a slightly more clever classification algorithm.
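For anyone curious where numbers like those come from, here is a minimal sketch of a naive Bayes scorer in plain Ruby. It is not the pci4r code. The class name SketchNaiveBayes, the word-length filter, and the weight and assumed-probability defaults are my own assumptions, loosely following the approach in the chapter. With these settings the sketch gives roughly 0.156 and 0.050 for the training data above.

```ruby
# Hypothetical sketch only; pci4r's Filtering::NaiveBayes may work differently.
class SketchNaiveBayes
  def initialize(weight: 1.0, assumed_prob: 0.5)
    @weight = weight              # how strongly the assumed probability counts
    @assumed_prob = assumed_prob  # starting guess for a feature with little data
    @feature_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # feature => { category => count }
    @category_counts = Hash.new(0)                            # category => documents seen
  end

  # Unique lowercase words of 3 to 19 characters (an assumed filter).
  def features(text)
    text.downcase.split(/\W+/).select { |w| w.length > 2 && w.length < 20 }.uniq
  end

  def train(text, category)
    features(text).each { |f| @feature_counts[f][category] += 1 }
    @category_counts[category] += 1
  end

  # Pr(document | category) * Pr(category), not normalized across categories.
  def prob(text, category)
    return 0.0 if @category_counts[category].zero?

    category_prob = @category_counts[category].to_f / @category_counts.values.sum
    doc_prob = features(text).reduce(1.0) { |p, f| p * weighted_prob(f, category) }
    doc_prob * category_prob
  end

  private

  # Blend the observed feature frequency with the assumed probability,
  # so a word seen only once does not swing the result too hard.
  def weighted_prob(feature, category)
    basic = @feature_counts[feature][category].to_f / @category_counts[category]
    total = @feature_counts[feature].values.sum
    (@weight * @assumed_prob + total * basic) / (@weight + total)
  end
end

c = SketchNaiveBayes.new
c.train("Nobody owns the water", :good)
c.train("the quick rabbit jumps fences", :good)
c.train("buy pharmaceuticals now", :bad)
c.train("make quick money at the online casino", :bad)
c.train("the quick brown fox jumps", :good)

puts c.prob("quick rabbit", :good).round(3)
puts c.prob("quick rabbit", :bad).round(3)
```

Note that the scores are not normalized, so they only make sense when compared with each other for the same text.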

Both of these default to in-memory storage of classification data. You can override it by using the built-in ActiveRecord persistence adapter like so:

  ar_config = Filtering::Persistence::ActiveRecordAdapter.new(
    :adapter => "sqlite3",
    :database => "mydb.sqlite3",
    :timeout => 5000
  )
  c = Filtering::NaiveBayes.new(ar_config)
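
The general idea of swapping the storage back end can be sketched in a few lines. Everything here is hypothetical: MemoryStore, CountingClassifier, and the increment/count methods are names I made up for illustration, and the real pci4r adapter interface may look quite different. The point is only that the classifier talks to a small storage object and never cares where the counts actually live.

```ruby
# Hypothetical storage interface; not the actual pci4r persistence API.
class MemoryStore
  def initialize
    @counts = Hash.new(0)
  end

  # Add one to the counter stored under key.
  def increment(key)
    @counts[key] += 1
  end

  # Read a counter; unseen keys count as zero.
  def count(key)
    @counts[key]
  end
end

# A stripped-down classifier that only records counts through the store.
class CountingClassifier
  # Defaults to in-memory storage; pass any object that responds to
  # increment(key) and count(key) to persist the counts elsewhere.
  def initialize(store = MemoryStore.new)
    @store = store
  end

  def train(text, category)
    text.downcase.scan(/\w+/).uniq.each { |word| @store.increment([word, category]) }
    @store.increment([:category, category])
  end

  def feature_count(word, category)
    @store.count([word, category])
  end
end

shared = MemoryStore.new
c = CountingClassifier.new(shared)
c.train("the quick brown fox jumps", :good)
puts c.feature_count("quick", :good)
```

A database-backed store would implement the same two methods on top of a table of counters, and the classifier itself would not need to change.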

Finally, there’s an executable in the ‘bin’ directory where you can interactively classify RSS feeds using either of the classifiers or persistence mechanisms provided. This relies on the feed_tools gem.

So there it is. The rest of the pci4r team should be spooling up soon and hopefully we’ll make some more progress. Stay tuned…

This entry was posted on Sunday, March 23rd, 2008 at 11:13 am and is filed under Ruby.
