Marpa

I hope Marpa will become the standard for parsing languages that are too complex for regular expressions to handle. Marpa's first stable release is Marpa::XS, a C language extension of Perl.

What Marpa Does

  • Marpa parses anything you can write in BNF, including ambiguous and even infinitely ambiguous grammars.
  • Marpa easily and efficiently handles both left- and right-recursion.
  • If a grammar is in one of the classes in practical use today, Marpa parses it in O(n) (linear) time.
  • Marpa never goes exponential.  Worst case, even for wildly ambiguous grammars, is O(n3) (cubic) time.
  • Marpa's run-time error detection is revolutionary.  Marpa has complete situational awareness.  It knows at all times which rules it is attempting to apply, how far it has progressed in them, and exactly which tokens it can accept.  And Marpa can communicate its situational awareness back to the application.
  • Marpa allows the application to efficiently retry rejected input.  This, combined with its situational awareness, allows "Ruby Slippers" parsing:  When an application has a token rejected, it can ask the parse engine which tokens would be acceptable.  It can then create a virtual token that allows the parse to continue -- the application can make the parse engine's "wishes" come true.  The Ruby Slippers are easy to use and surprisingly wide in their application.

What Marpa is

Marpa is a new parsing algorithm with a decades-long heritage.  Its lineage starts with the algorithm invented by Jay Earley.  Marpa is the first algorithm to combine the improvements to Earley's algorithm made by Joop Leo with those discovered by John Aycock and R. Nigel Horspool.  Marpa's "situational awareness", and Ruby Slippers parsing, are a new feature.


Posts about Marpa

I blog about Marpa on my Ocean of Awareness blog.  Here are the more important posts:
  • Marpa::XS is now 1.000000 is my latest release announcement
  • How to Parse HTML describes a new strategy for parsing HTML, using Marpa and the Ruby Slippers techniques it makes possible.  It is the most worked out example of Marpa use that I've done to date.  The series continues in How to parse HTML, part 2; and  How to parse HTML, part 3.
  • What is the Marpa algorithm? lays out the essentials of the Marpa algorithm.
  • Marpa and the Ruby Slippers talks more about Marpa and the Ruby Slippers.
  • Which Marpa distribution to use? states it with a lot more detail, but the message boils down to this: use Marpa::XS, if that is at all possible.
  • Marpa v. Perl regexes: some numbers is a benchmark.  It pits Marpa directly against Perl regexes.  Marpa will never beat regular expressions at their own game, but regular expressions are often taken out of their game, and in those cases the results can get interesting.
  • Why I Stuck with Perl:  I liked Python very much, and at the time I started Marpa, Python looked like a better choice in terms of mindshare.  In this post, I explain why the initial Marpa implementation uses Perl.
  • Jay Earley's Idea:  A user-friendly description of Earley's algorithm.  Marpa's algorithm is much more complicated, but at heart, Marpa is an Earley parser.
  • Marpa and OO:  Comments on the relationship between object-oriented programming and Marpa.
  • What?  No Lexer!:  A generation of programmers have grown up identifying lexing with parsing.  Marpa is not a lexer, and Marpa::XS does not even contain a lexer.  This post explains why, and how to get your lexing done if you are using Marpa. (This post is rapidly becoming outdated.  Several people have undertaken lexers specifically intended as Marpa front-ends, and I expect that one or more will soon be on CPAN.)
  • Developing parsers incrementally with Marpa:  Hints on how to get started with Marpa::XS.
  • User experiences with Marpa:  How users react to Marpa.  Of course, I am a hopelessly biased source. but I do get a lot of feedback.
  • Making the parsing game safe:  What Marpa means for Domain-Specific Languages (DSLs) and Language-Oriented Programming (LOP).

The Bovicide Series

Yacc and its derivatives are considered the standard in parser generators.  Yacc is part of a great series of traditions it was my privilege to watch being formed, and I have the greatest respect for its inventors.  But to my mind yacc is one tradition more honored in the breach than in the observance.  In this series I describe what will be required for a parsing algorithm to displace yacc's LALR, and why I think the Marpa algorithm has what it takes and more.

The first three requirements are described in my first post, Killing Yacc: 1, 2 & 3:
  • Requirement 1: Handle Arbitrary Context-Free Grammars in O(n3)
  • Requirement 2: Handle "Average" Grammars in Linear Time
  • Requirement 3: A Sound Theoretical Basis
Error reporting has long been overlooked, and it is something at which yacc was astonishingly bad.  I looked at the issue of development-time error reporting in Why the Bovicidal Rage?:
  • Requirement 4: Easy Diagnosis of Grammar Problems
And in Bovicide 5: Parse-time Error Reporting, I looked at the other half of error reporting.
  • Requirement 5: Good Reporting of Parse-time Errors
Both hand-written recursive descent and Marpa meet, to various degrees, all five of the previous requirements.  The last requirement, stated in Bovicide 6: The Final Requirement, was the tie-breaker.  Hand-written recursive descent has many strengths, but it will never be available as a library.
  • Requirement 6: Available as a Library

The "Perl and parsing" series

My "Perl and parsing" series used Perl to illustrate parsing theory and parsing theory to investigate Perl.  I am not one of those who believes that mathematics takes place in vacuum -- it follows trends and even what are more accurately called fashions.  Therefore this series has a lot of discussion of how current parsing theory came to be.

  • Parsing with Ruby Slippers: This is the post where I debuted the "Ruby Slippers" parsing technique, using both HTML and an example from Perl to illustrate it.
  • Parsing Perl 2: Down the Garden Path: Larry Wall did an amazing job getting LALR to work for the Perl grammar, but occasionally the limits show through.  The post contains an example.
  • Parsing Perl 3: Perl and Minimalism: I had intended a couple of posts about Perl & parsing.  At this point I realized the topic was expanding on me.  This post is about the influence of minimalism on computer languages and parsing theory.  Larry Wall is perhaps the only language design who has managed to escape the influence of minimalism and that is the reason Perl is so different from other languages.
  • Minimalism: Philosophy, Tradition, or Lunacy?   An  explanation of why the programmers of my generation and prior -- the ones who shaped the field -- are, almost without exception, stanch minimalists.  Not numbered as part of the series, but it fits into the discussion of minimalism.
  • Perl and Parsing 4: The Golden Age of Right Parsing: Having nibbled at philosophy, I now charge into a history of parsing theory.  My approach is unusual.  I treat parsing theory as what it is: a human endeavor.  This post describes the rise of LR, or right parsing.
  • Perl and Parsing 5: Rewind: If you require evidence that parsing theory can be as much about trends, ideology and fashion as it is about mathematics and techology, here it is.  The dominant parsing algorithm for production compilers today is hand-written recursive descent, exactly the one that was displaced in 1975, when the technology was almost unimaginably less powerful.
  • Perl and Parsing 6: Error Handling: A discussion of Perl error detection.  Despite the limits of its underlying LALR parse engine, the Perl interpreter can seem psychic at times.  I show how the conjuring is done, and give some other examples where the limits imposed by LALR parsing show up very clearly.
  • Perl and Parsing 7: Do List Operators have Left/Right Precedence?: The documents describe Perl's list operators in terms of left- and right-precedence.  Despite the fancy names, these are not terms from parsing theory and are not well-defined.  In this post I show that, whatever their meaning, it is wrong, and I propose an alternative.
  • Perl and Parsing 8: The Where and Why of Rejection: Sometimes Perl is just not that into your syntax.  An illustration of the limits of LALR error detection, using Perl.
  • Perl and Parsing 9: "Use" and the Ruby Slippers: While Marpa is the first parser to allow it in anything close to its full power and generality, Ruby Slippers parsing is not unprecedented.  Perl parses its use statement by dummying up tokens based on the grammar.  But in order to use the Ruby Slippers, so unhelpful is LALR, Perl has to simulate the parse inside the lexer.
  • Perl and Parsing 10: "Use" the Easier Way: In this post I use Marpa to parse Perl's use statement.  Marpa is a lot better at the Ruby Slippers technique, but here the problem can be solved even more simply with another of Marpa's extended capabilities: ambiguous tokens.  Ambiguous tokens are tokens whose type is left uncertain.  It is left up to the parser to decide which token to use, something which Marpa can do easily.
  • Perl and Parsing 11: Are all Perl programs parseable?  The answer is no.  Because I was the first to formally prove it, I initially got the blame, and now get the credit as the discoverer that Perl parsing is undecidable.  While a very interesting result, its significance can be overstated.  In particular, while undecidable parsing has definite benefits for the Perl user, no downside has ever been demonstrated.
  • Perl and Parsing 12: Beating up on the "use" statement.  I had spotted a gotcha in the parsing of the "use" statement, which I saved up from from the previous two posts.  In the meantime chromatic noticed another.  Piling on is really not fair but, hey, can you really blame me for unlucky timing?

Examples

Here are some simple & elegant examples of Marpa parsers that others have done:
Here is a more ambitious one:

Theory

For those interested in the mathematics behind Marpa, there's a paper with pseudocode, and proofs of correctness and of my complexity claims.