Filtering using Scala Parser Combinators
Filtering using Scala Parser Combinators
So I got interesting task to enable record filtering according to some rules that we get from configuration represented as text.
well so far we know two things :
1. we need a filter that accepts a record as an input and might return a record as its output :well so far we know two things :
def filter(record:Record ):Option[Record]
*well I know that "null" is a dirty word in FP but that was the requirement since it was a mix-in with java project.
Following the rule of three so is our todo list :
- parse the text
- evaluate it's Boolean expression in a form of Abstract Syntax Tree.
- filter according the Boolean evaluation.
The complete Source Code can be found here (you are more than welcome to contribute ) Parser Combinators to the rescue
Inspired by this excellent post an introduction to scala parser combinators and by this stackoverflow answer .
let's start writing the preserved words that we will use :
object FilterConstants{ val And = "AND" val OR = "OR" val UnaryAnd = "&&" val UnaryOr = "||" val Equals = "=" val NotEquals = "!=" val Greater = ">" val Smaller = "<" val GreaterOrEqual = ">=" val SmallerOrEqual = "<=" val IsNull = "IS NULL" val IsNotNull = "IS NOT NULL" }
and add our AST objects
next let's write the parser it self for that we will need extend JavaTokenParsers and PackratParsers . since JavaTokenParsers enables us to capture text using regex we need to do a little override to inherited regex of the string literal in order to capture single quote text .
object ASTObjects { sealed abstract class AST sealed abstract class BooleanExpression extends AST sealed abstract class Constant extends AST case class BooleanOperation(op: String, lhs: BooleanExpression, rhs: BooleanExpression) extends BooleanExpression case class Comparison[T <: Constant](lhs: Constant,op: String) extends BooleanExpression case class LRComparison (lhs: Variable, op: String, rhs: Constant) extends BooleanExpression case class NumericConstant(value: Double) extends Constant case object NullConstant extends Constant case class DateConstant(value: Date) extends Constant case class TimestampConstant(value: Timestamp) extends Constant case class StringConstant(value: String) extends Constant case object EmptyConstant extends Constant case class Variable(name: String) extends Constant { def eval(env: Map[String, Constant]) = env(name) override def toString = name } }
The Parser
trait ConditionParser extends JavaTokenParsers with PackratParsers { import FilterConstants._ override def stringLiteral: Parser[String] = """(["'])[^"']*\1""".r var simpleDateFormat:SimpleDateFormat = new SimpleDateFormat("yyyy-mm-dd") val d = """'+\d{4}-\d{2}-\d{2}+'""".r // "'2015-03-25'".matches("""'+\d{4}-\d{2}-\d{2}+'""") def dateLiteral: Parser[String] = d val ts = """'+\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}(\.\d)?+'""".r def dateTimeLiteral: Parser[String] = ts val booleanOperator : PackratParser[String] = literal(UnaryOr) | literal(UnaryAnd) | literal(And) | literal(OR) val comparisonOperator : PackratParser[String] = literal(SmallerOrEqual) | literal(GreaterOrEqual) | literal(Equals) | literal(NotEquals) | literal(Smaller) | literal(Greater) val nullCompare : PackratParser[String] = literal(IsNotNull) | literal(IsNull) val constant : PackratParser[Constant] = floatingPointNumber ^^ { x => NumericConstant(x.toDouble) } | stringLiteral ^^ {s => StringConstant(s.replaceAll("'","")) } | dateLiteral ^^ {d => DateConstant(new Date(simpleDateFormat.parse(d).getTime))} | dateTimeLiteral ^^ {d => DateConstant(new Date(simpleDateFormat.parse(d).getTime))} | stringLiteral ^^ {s => StringConstant(s.replaceAll("'","")) } val variable = ident ^^ {s=>Variable(s)} val comparison : PackratParser[BooleanExpression] = (variable ~ comparisonOperator ~ constant) ^^ { case lhs ~ op ~ rhs => LRComparison(lhs, op, rhs) } | (variable ~ nullCompare) ^^ {case lhs ~ nc => Comparison(lhs,nc)} lazy val p1 : PackratParser[BooleanExpression] = booleanOperation | comparison val booleanOperation : PackratParser[BooleanExpression]= (p1 ~ booleanOperator ~ p1) ^^ { case lhs ~ op ~ rhs => BooleanOperation(op, lhs, rhs) }| "(" ~> booleanOperation <~ ")" def parse(text:String) :Option[BooleanExpression] = { val res = parseAll(p1, text) res match { case Success(r, n) => Some(r) case Failure(msg, n) => println("Failure parsing "+ text+"\nMessage: " +msg) None case Error(msg, n) => println("Error"+msg) None } } }
I do not want to get into those funny symbols but just a short intro to symbols I used above:
~ match a successful parser following other successful parser
| match a parser OR other parser
^^ that if parser succeeds it's content will be used in the following function
<~ if the parser on the left succeeds it is ignored and continues with right side
we can write some tests :
class ConditionParserTest extends FlatSpec with ShouldMatchers with ConditionParser{ "Parser" should "Parse Null comparison" in { parse("foo IS NULL") should equal (Some(Comparison(Variable("foo"),IsNull))) parse("foo IS NOT NULL") should equal (Some(Comparison(Variable("foo"),IsNotNull))) } it should " parse left right comparison numeric values" in { parse("foo = 8") shouldEqual Some(LRComparison(Variable("foo"),Equals,NumericConstant(8D))) parse("foo = 8.5") shouldEqual Some(LRComparison(Variable("foo"),Equals,NumericConstant(8.5))) parse("foo = -8")shouldEqual Some(LRComparison(Variable("foo"),Equals,NumericConstant(-8D))) } it should " parse left right comparison String values" in { parse("foo = 'BAR'") shouldEqual Some(LRComparison(Variable("foo"),Equals,StringConstant("BAR"))) parse("foo != 'BAR'") shouldEqual Some(LRComparison(Variable("foo"),NotEquals,StringConstant("BAR"))) } }
so this concludes the Parsing part
Evaluating the Boolean expressions
trait Evaluator{ import FilterConstants._ def evaluate(expression:BooleanExpression)(implicit env: Map[String, Constant]) : Boolean = expression match { case LRComparison(v:Variable, Equals, c) => (v.eval(env),c) match{ case (StringConstant(l),StringConstant(r)) => l == r case (NumericConstant(l),NumericConstant(r)) => l == r case (StringConstant(l),NumericConstant(r)) => r == l.toDouble case (a,b) => println(s"FAILED to evaluate '==' got $a not equal to $b" ) false } case LRComparison(v:Variable, NotEquals, c) => (v.eval(env),c) match{ case (StringConstant(l),StringConstant(r)) => l != r case (NumericConstant(l),NumericConstant(r)) => l != r case (StringConstant(l),NumericConstant(r)) => r != l.toDouble case (a,b) => println(s"FAILED to evaluate != got $a not equal to $b" ) false } case LRComparison(v:Variable, Greater, c) => (v.eval(env),c) match{ case (NumericConstant(l),NumericConstant(r)) => l > r case (StringConstant(l),NumericConstant(r)) => l.toDouble > r case (a,b) => println(s"FAILED to evaluate $Greater got $a not equal to $b" ) false } case LRComparison(v:Variable, Smaller, c) => (v.eval(env),c) match{ case (NumericConstant(l),NumericConstant(r)) => l < r case (StringConstant(l),NumericConstant(r)) => l.toDouble < r case (a,b) => println(s"FAILED to evaluate $Smaller got $a not equal to $b" ) false } case LRComparison(v:Variable, SmallerOrEqual, c) => (v.eval(env),c) match{ case (NumericConstant(l),NumericConstant(r)) => l <= r case (StringConstant(l),NumericConstant(r)) => l.toDouble <= r case (a,b) => println(s"FAILED to evaluate $SmallerOrEqual got $a not equal to $b" ) false } case LRComparison(v:Variable, GreaterOrEqual, c) => (v.eval(env),c) match{ case (NumericConstant(l),NumericConstant(r)) => l >= r case (StringConstant(l),NumericConstant(r)) => l.toDouble >= r case (a,b) => println(s"FAILED to evaluate $GreaterOrEqual got $a not equal to $b" ) false } case Comparison(v:Variable,IsNotNull) => v.eval(env) match{ case EmptyConstant => false case _ => true } case Comparison(v:Variable,IsNull) => v.eval(env) match{ case EmptyConstant => true case _ => false } case BooleanOperation(UnaryOr, a, b) => evaluate(a) || evaluate(b) case BooleanOperation(OR, a, b) => evaluate(a) || evaluate(b) case BooleanOperation(UnaryAnd, a, b) => evaluate(a) && evaluate(b) case BooleanOperation(And, a, b) => evaluate(a) && evaluate(b) } }
Finally our Filter
class ConditionalFilter(ruleText:String) extends ConditionParser with Evaluator{ val rule = parse(ruleText) val fields = rule match { case Some(r) => extractFields(r) case _ => Nil } def extractFields(b:BooleanExpression):List[String] = b match { case BooleanOperation(_,l,r) => extractFields(l):::extractFields(r) case LRComparison(l,_,_) => List(l.toString) case Comparison(l,_) => List(l.toString) case _ => Nil } def getMap(record:Record,fields:List[String]):Map[String, Constant] = { fields.foldLeft(Map.empty[String, Constant]){(m,f) => m + (f -> { Option(record.getFieldValue(f)) match { case Some(v) => StringConstant (v.toString) case _ => EmptyConstant } }) } } def filter(record:Record ):Option[Record] = { implicit val m = getMap(record,fields) if (rule.exists(evaluate)){ Some(record) } else None } }
Now it is all complete and we can
Test it
class ConditionalFilterTest extends FlatSpec with ShouldMatchers{ def getRecord(fields:Map[String,String]) = Record(fields) def getFilter(expr:String ) = new ConditionalFilter(expr) "Filter by col_name with rule " should "IS NULL include records with null values " in{ val f = getFilter("foo IS NULL" ) val f2 = getFilter("foo IS NOT NULL" ) val r = getRecord(Map("foo"-> null,"bar"-> "5.2")) f.filter(r) should equal( Some(r)) f2.filter(r) should equal( None) } it should "IS NOT NULL remove records with null values " in{ val f = getFilter("foo IS NOT NULL" ) val f2 = getFilter("foo IS NULL" ) val r = getRecord(Map("foo"-> "foo","bar"-> "5.2")) f.filter(r) should equal( Some(r)) f2.filter(r) should equal( None) } it should "= output records that equal the value" in { val f = getFilter("foo = 'FOO'" ) val f2 = getFilter("foo = 'FOOx'" ) val r = getRecord(Map("foo"-> "FOO","bar"-> "5.2")) f.filter(r) should equal( Some(r)) f2.filter(r) should equal( None) } }
more tests and updates can be found in the source code here .
In my next post we I will extend the Expression ADT using Type classes and Implicits .
contributes/feedbacks/issues are welcome
contributes/feedbacks/issues are welcome
Cheers
Avi
Comments
Post a Comment