Scala Data Validation Series Part 1: Why You Should Become Familiar With Monads (For Comprehensions) And Either[E, B] For Validating Data

My goal in this first part is very humble: to convince you to stop, for God’s sake, stop throwing exceptions during your data validation in Scala.

I will provide you with friendly incremental examples to make a case that…

Throwing exceptions in Scala is self-limiting:

throw new Exception("kaboom") // Please never do this

And instead you should embrace the following:

  1. Either[Error, A] as your main transport type for objects in your code.
  2. For comprehensions, with which you can compose a result.

This post is, again, incremental. We will start with a Java-like example, meaning code that responds to corrupt data by throwing exceptions. Then we will gradually add Scala features until we reach our goal: taking advantage of Scala’s capabilities to produce cleaner code with powerful validation techniques.

There will be peaks and valleys of outcomes during this process so get ready for the ride!

I will be presenting Scala 3.x code instead of Scala 2.x code because I would love this blog to be relevant for at least a couple of years.

We will not be using external libraries. The reason is that most Scala shops are very opinionated about the libraries they use. Don’t be surprised if you land a job where you are not allowed to use cats, ZIO, or any library that uses macros.

There are multiple files that support this posting available in repo https://github.com/scala-blog/validation-part1/tree/v1.

Step 1: Throwing Exceptions All Over Town

File Example1JavaLikeValidation.scala summarizes how I often validated data in Java. We had validate() procedures that would either throw an exception or just move on. If you look at the actual file, you will see that all validate(*) methods throw exceptions when something is wrong.

Here is a snippet from file Example1JavaLikeValidation.scala, where we validate very much like we would in Java:

val goodEmail: String = "mrme@xmail.com"
val goodSSN: String = "111-11-2345"
val goodAge: String = "49"

validateEmail(goodEmail)
validateSSN(goodSSN)
validateAge(goodAge)

val goodResult: String = f"email: ${goodEmail}, ssn: ${goodSSN}, age ${goodAge}"

val badAge = "old"
val badSSN = "abc-22-2212"

validateAge(badAge)
validateSSN(badSSN)

val badResult: String = f"email: ${goodEmail}, ssn: ${badSSN}, age ${badAge}"

println(badResult)
println(goodResult)

Here are some issues I see with the code above:

  1. We will never see the output of println(goodResult) because an exception is thrown a few lines above it. We can salvage this situation by adding a few try-catch blocks, but that makes the business logic much harder to understand.
  2. We will not know whether badSSN was actually a bad SSN, since validateAge(badAge) already threw an exception. What if we want to track every time we see a bad SSN? Again, we would have to rely on a series of try-catch blocks, and again, that would make the business logic harder to understand.
  3. Even after validating all the fields, we are still dealing with strings. How do we know for a fact that the string goodSSN is actually a social security number, that goodAge is actually a person’s age, or that goodEmail is actually an email? This can become a source of conflict if the developer validating the fields is not the developer writing the business logic.
  4. We don’t know whether an exception will be thrown after all validate(*) methods have completed. Sure, in my simple example it is easy to tell. But complex code could fail anywhere. Can we handle unexpected exceptions in a better, more “expected” way?
  5. Much of the validation process involves parsing data. Why not reuse that code to build actual data types for age, SSN, and email and make our code safer?
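
To make issues #1 and #2 concrete, here is a minimal sketch of what the try-catch salvage could look like. The validateAge and validateSSN bodies below are hypothetical stand-ins written for this illustration (the real ones live in the repository file), and collectErrors is a made-up helper name.

```scala
// Hypothetical stand-in for validateAge in Example1JavaLikeValidation.scala.
def validateAge(age: String): Unit =
  if age == null || age.isEmpty || !age.forall(_.isDigit) then
    throw new IllegalArgumentException(s"Age '$age' is malformed")

// Hypothetical stand-in for validateSSN; it only checks the NNN-NN-NNNN shape.
def validateSSN(ssn: String): Unit =
  if ssn == null || !ssn.matches("""\d{3}-\d{2}-\d{4}""") then
    throw new IllegalArgumentException(s"SSN '$ssn' is malformed")

// To learn about BOTH bad fields, every call needs its own try-catch.
def collectErrors(age: String, ssn: String): List[String] =
  val ageError: Option[String] =
    try
      validateAge(age)
      None
    catch case e: IllegalArgumentException => Some(e.getMessage)
  val ssnError: Option[String] =
    try
      validateSSN(ssn)
      None
    catch case e: IllegalArgumentException => Some(e.getMessage)
  List(ageError, ssnError).flatten

@main def runCollectErrors(): Unit =
  collectErrors("old", "abc-22-2212").foreach(println)
```

Two fields already need two try-catch blocks, and the actual business logic has not even started yet.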

Step 2: Adding Case Classes

In this step, we add a little Scala flavor by adding case classes. Hopefully we can address issue #3 from step 1 above. Here is a snippet from file Example2JavaLikeValidationWithCaseClasses.scala where we create objects representing real types:

case class SSN(area: Int, group: Int, serial: Int)
case class Email(user: String, domain: String)
case class Age(age: Int)

val goodEmail: String = "mrme@xmail.com"
val goodSSN: String = "111-11-2345"
val goodAge: String = "49"

validateEmail(goodEmail)
validateSSN(goodSSN)
validateAge(goodAge)

val splitEmail: Array[String] = goodEmail.split("@")
val emailUser = splitEmail(0)
val emailDomain = splitEmail(1)
val email: Email = Email(emailUser, emailDomain)

val ageInt = goodAge.toInt
val age: Age = Age(ageInt)

val ssnSplit = goodSSN.split("-")
val ssnArea = ssnSplit(0).toInt
val ssnGroup = ssnSplit(1).toInt
val ssnSerial = ssnSplit(2).toInt
val ssn: SSN = SSN(ssnArea, ssnGroup, ssnSerial)

val goodResult: String = f"email: ${email.user}@${email.domain}, ssn: ${ssn.area}-${ssn.group}-${ssn.serial}, age ${age.age}"
println(goodResult)

Case classes allow us to formalize the types of data we are dealing with. However, we have also added more issues by trying to introduce Scala features.

Here is what we gained at this point:

  • We addressed issue #3 from step 1 above. We don’t deal with strings anymore, but with real emails, SSNs and ages.

But we also added more issues while trying to write actual Scala code!

  1. For example, we are executing the split method twice: once for validation and once for parsing. This type of duplicate execution is unavoidable when validation is separated from parsing.
  2. The code is becoming harder to read. All those split calls and array accesses are making the business logic unreadable.

Please, don’t ask yourself “what is the point of using Scala” yet. Yes, this is a valley in the learning process. But keep following, as this is a necessary learning step.

Step 3: Using Factory Methods From Companion Objects

In this step we will clean up the mess we made in step #2 by adding factory methods inside companion objects. We delegate the responsibility of parsing and validating the string to the companion objects.

Now the code looks a bit cleaner (Thank God!). Here is a snippet from file Example3JavaLikeValidationWithCaseClassesAndFactoryMethods.scala:

..
...
  case class Age private (age: Int)
  object Age:
    // Parses and validates in one place; throws when the string is not a valid age
    def fromString(string: String):Age =
      if(string == null)
        throw new Error("Age is null")
      else if(string.size == 0)
        throw new Error("Age is empty")
      else if(string.size != string.filter(_.isDigit).size)
        throw new Error(s"Age $string is malformed")
      else
        Age(string.toInt)
  
  
  val goodEmail: String = "mrme@xmail.com"
  val goodSSN: String = "111-11-2345"
  val goodAge: String = "49"
  
  
  val email: Email = Email.fromString(goodEmail)
  val age: Age = Age.fromString(goodAge)
  val ssn: SSN = SSN.fromString(goodSSN)
  
  val goodResult: String =  f"email: ${email.user}@${email.domain}, ssn: ${ssn.area}-${ssn.group}-${ssn.serial}, age ${age.age}"
  
  val badAge: String = "old"
  val badSSN: String = "abc-22-2212"

  val ageBroken: Age = Age.fromString(badAge)
  val ssnBroken: SSN = SSN.fromString(badSSN) // We will never make it to here and we will need spaghetti code to handle this case

  // never happens due to previous exceptions
  val badResult: String =  f"email: ${email.user}@${email.domain}, ssn: ${ssnBroken.area}-${ssnBroken.group}-${ssnBroken.serial}, age ${ageBroken.age}"
  

  println(badResult)
  println(goodResult)

In this step, we gained a little, namely:

  • Validation and building of the data types for email, SSN, and age are delegated to methods within companion objects. This is a more logical place for parsing and validating data from strings than doing it a level above, as we did in step #2. Plus, we avoid potentially executing the same code twice, for example the split calls for Email and SSN parsing.

At this point, you may think we have barely progressed from step #1. Don’t get too frustrated. It’s part of the learning process!
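
The snippet above only shows the Age factory. As a rough illustration of how the Email companion object could follow the same pattern, here is a hypothetical sketch. It is not the code from the repository; the exception type and the messages are my own assumptions.

```scala
// Hypothetical sketch; the actual Email factory lives in the post's repository.
case class Email private (user: String, domain: String)

object Email:
  // Parses and validates in one pass, so split("@") runs only once.
  def fromString(string: String): Email =
    if string == null then
      throw new IllegalArgumentException("Email is null")
    else
      val parts = string.split("@")
      if parts.length != 2 || parts(0).isEmpty || parts(1).isEmpty then
        throw new IllegalArgumentException(s"Email '$string' is malformed")
      else
        Email(parts(0), parts(1))

@main def runEmailFactory(): Unit =
  val email = Email.fromString("mrme@xmail.com")
  println(s"${email.user} at ${email.domain}")
```

Because the constructor is private, the only way to obtain an Email from outside the companion object is through fromString, so every Email in the program has been validated.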

Step 4: Introducing Option[A] And For Comprehension

You are entering the no-Java zone. At this point we eliminate all exceptions and give you a sneak peek at what Scala code should look like.

Here is a snippet from file Example4ValidationWithCaseClassesAndFactoryMethodsPlusOption.scala:

.
..
  case class Age private(age: Int)
  object Age:
    def fromString(string: String): Option[Age] =
      if (string == null)
        None
      else if (string.size == 0)
        None
      else if (string.size != string.filter(_.isDigit).size)
        None
      else
        Some(Age(string.toInt))

  val goodEmail: String = "mrme@xmail.com"
  val goodSSN: String = "111-11-2345"
  val goodAge: String = "49"
  
  
  val goodResult: Option[String] = for 
    email <- Email.fromString(goodEmail)
    age <- Age.fromString(goodAge)
    ssn <- SSN.fromString(goodSSN)
  yield f"email: ${email.user}@${email.domain}, ssn: ${ssn.area}-${ssn.group}-${ssn.serial}, age ${age.age}"
  
  val badAge: String = "old"
  val badSSN: String = "abc-22-2212"

  val badResult: Option[String] = for 
    email <- Email.fromString(goodEmail)
    age <- Age.fromString(badAge)
    ssn <- SSN.fromString(badSSN)
  yield f"email: ${email.user}@${email.domain}, ssn: ${ssn.area}-${ssn.group}-${ssn.serial}, age ${age.age}"  

  badResult match 
    case Some(good: String) => println(good)
    case None => println("I know something bad happened. Nothing else.") // this will print out

  goodResult match
    case Some(good: String) => println(good) // this will print out
    case None => println("I know something bad happened. Nothing else.")

There are two key incremental differences in this step:

  1. The factory methods on the companion objects now return Option[A] instead of the actual object of type A, allowing a potential None to be returned.
  2. We are using for comprehensions to glue together the building of a result, namely:
val goodResult: Option[String] = for
  email <- Email.fromString(goodEmail)
  age <- Age.fromString(goodAge)
  ssn <- SSN.fromString(goodSSN)
yield f"email: ${email.user}@${email.domain}, ssn: ${ssn.area}-${ssn.group}-${ssn.serial}, age ${age.age}"

At this point, I invite you to take a pause, meditate, and fully digest this step if you feel you are in unfamiliar territory.

In the code above, if any of the fromString(*) methods returns None, the whole goodResult value becomes None. That is what the for comprehension does for you, for free. Otherwise you get a burrito (container) of type Some[String] containing the entire correctly formatted string result.
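
If the “for free” part feels like magic, it may help to see roughly what a for comprehension over Option turns into: nested flatMap calls ending in a map. The sketch below uses two hypothetical helpers (userOf and yearsOf) in place of the fromString factories so that it is self-contained. It ignores null input to keep things short.

```scala
// Hypothetical helpers standing in for the fromString factory methods.
def userOf(email: String): Option[String] =
  email.split("@") match
    case Array(user, _) if user.nonEmpty => Some(user)
    case _                               => None

def yearsOf(age: String): Option[Int] = age.toIntOption

// The for comprehension style used in this step.
def describe(email: String, age: String): Option[String] =
  for
    user  <- userOf(email)
    years <- yearsOf(age)
  yield s"user: $user, age: $years"

// Roughly what the compiler rewrites the for comprehension into.
// flatMap on None skips the rest, which is why one failure yields None.
def describeDesugared(email: String, age: String): Option[String] =
  userOf(email).flatMap { user =>
    yearsOf(age).map { years =>
      s"user: $user, age: $years"
    }
  }

@main def runDescribe(): Unit =
  println(describe("mrme@xmail.com", "49"))
  println(describeDesugared("mrme@xmail.com", "old"))
```

Both versions behave the same; the for comprehension is just the more readable spelling.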

And as usual during this learning process, we gained some points but we lost some as well.

Here is what we gained at this step:

  1. This pattern allows you to have a clear view of the business logic, while the validation is done in the background. If any of the fromString(*) methods fail, goodResult becomes None.
  2. We can finally see the printed results of goodResult and badResult in the code, handled nicely addressing issue #1 in step #1.

However, unfortunately we also went backwards. In fact, we may be worse off than in step #1!

  • There is no error generation in this code! When a factory method fails, it just returns None, and we don’t know what None means.

We are basically wiping out any information about errors when we return None when a fromString(*) method cannot build a data object.

I intentionally added this step, not to frustrate you further, but as a convenient learning transition to Either[E, A].
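
As a small bridge between the two types, the standard library lets you attach a reason to a None after the fact with Option.toRight. This is only a sketch with hypothetical helper names (ageOption and ageEither), and note its limitation: the caller has to guess the reason, because the parser already threw that information away.

```scala
// Hypothetical Option-returning parser, in the style of this step.
def ageOption(string: String): Option[Int] =
  Option(string)                                     // null becomes None
    .filter(s => s.nonEmpty && s.forall(_.isDigit))  // digits only
    .flatMap(_.toIntOption)                          // None if it overflows an Int

// toRight turns None into Left(message) and Some(a) into Right(a).
def ageEither(string: String): Either[String, Int] =
  ageOption(string).toRight(s"Age '$string' is malformed")

@main def runAgeEither(): Unit =
  println(ageEither("49"))
  println(ageEither("old"))
```

The message here is the same for null, empty, and non-numeric input, which is exactly the detail we lose by returning None from the factory method.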

Step 5: Replacing Option[A] With Either[E, A]

This is when stuff gets real. With Either[E, A] we can finally carry error information in a consistent manner when we return from fromString(*) methods.

Since Either[E, A] is a cousin of Option[A] (they both implement map and flatMap), we can still reuse much of the code from the step above, including the for comprehensions we added in step #4.

Here is a snippet from file Example5ValidationWithEither.scala:

.
..
case class Age private(age: Int)
object Age:
  def fromString(string: String): Either[String, Age] =
    if (string == null)
      Left("Age is null")
    else if (string.size == 0)
      Left("Age is empty")
    else if (string.size != string.filter(_.isDigit).size)
      Left(s"Age is malformed : $string")
    else
      Right(Age(string.toInt))

  val goodEmail: String = "mrme@xmail.com"
  val goodSSN: String = "111-11-2345"
  val goodAge: String = "49"

  val goodResult = for
    email <- Email.fromString(goodEmail)
    age <- Age.fromString(goodAge)
    ssn <- SSN.fromString(goodSSN)
  yield f"email: ${email.user}@${email.domain}, ssn: ${ssn.area}-${ssn.group}-${ssn.serial}, age ${age.age}"
  
  val badAge = "old"
  val badSSN = "abc-22-2212"
  
  val badSsnEither: Either[String, SSN] = SSN.fromString(badSSN)
  
  val badResult = for
    email <- Email.fromString(goodEmail)
    age <- Age.fromString(badAge)
    ssn <- badSsnEither
  yield f"email: ${email.user}@${email.domain}, ssn: ${ssn.area}-${ssn.group}-${ssn.serial}, age ${age.age}"

  badResult match
    case Right(good) => println(good)
    case Left(e) => println(s"I know at least one bad thing happened. And I know what it is: ${e}") // this will print out

  goodResult match
    case Right(good) => println(good) // this will print out
    case Left(e) => println(s"I know at least one bad thing happened. And I know what it is: ${e}")

  badSsnEither match
    case Left(f) => println(s"BAD SSN TRACKER: ERROR: ${f}") // this will print out
    case Right(_) => () // nothing to track when the SSN parsed correctly

There are many similarities with the previous step. However, we also added new and better stuff, namely:

  1. The fromString(*) methods now return Either[E, A] instead of Option[A].
  2. We brought back the error messages that we wiped out in step 4 (yey!). The errors are wrapped in a Left(errorString) expression.
  3. The good results are returned wrapped in Right(data). The convention is that what comes in Right(goodData) is the expected data object, and what comes in Left(error) is bad because the data was not parsed successfully.
  4. We track error details about failures in a safe manner using pattern matching (for example, check the last lines of the snippet for this step).

Great! Finally we have achieved one significant gain…

  • We can easily track any bad data we wish without juggling nested try-catch blocks that make the business logic hard to understand. In this snippet I showed how we can track a bad SSN even though the age failed to parse a few lines above, without worrying about the order of failures.
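
The same idea scales from one SSN to many. Below is a minimal sketch, assuming a simplified parser (parseSSN is a hypothetical stand-in for SSN.fromString and uses a regular expression rather than the repository’s logic). It uses the standard library’s partitionMap to separate all the errors from all the successfully parsed values in one pass.

```scala
case class SSN(area: Int, group: Int, serial: Int)

// Hypothetical, simplified parser; only accepts the NNN-NN-NNNN shape.
private val SsnShape = """(\d{3})-(\d{2})-(\d{4})""".r

def parseSSN(string: String): Either[String, SSN] =
  string match
    case SsnShape(area, group, serial) =>
      Right(SSN(area.toInt, group.toInt, serial.toInt))
    case _ =>
      Left(s"SSN '$string' is malformed")

// partitionMap puts every Left in the first list and every Right in the second.
def splitResults(inputs: List[String]): (List[String], List[SSN]) =
  inputs.map(parseSSN).partitionMap(identity)

@main def runSsnTracker(): Unit =
  val (bad, good) = splitResults(List("111-11-2345", "abc-22-2212", "oops"))
  bad.foreach(error => println(s"BAD SSN TRACKER: ERROR: $error"))
  println(s"Parsed ${good.size} good SSN(s)")
```

No try-catch is needed, and a failure on one input does not prevent us from seeing the others.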

If you reached this point with a good level of understanding, you are in the place where I am hoping you would be.

Step 6: Handling The Unexpected

You may have noticed that I have not dealt with the elephant in the room: even when we avoid throwing exceptions ourselves, exceptions can still be thrown during execution. Perhaps an exception is thrown because we did something dumb, like accessing an index out of bounds. Or perhaps we are using libraries that throw exceptions under specific scenarios.

Despite these unexpected exceptions, we can “normalize” such cases and bring them into the Scala realm by converting exceptions to the Left part of Either[E, A]. Let me explain. Consider this code, which doesn’t check whether string is null:

object Email:
  def fromString(string: String): Either[String, Email] =
      val split = string.split("@")
      if (split.size != 2)
        Left(s"Email '${string}' is malformed")
      else
        Right(Email(user = split(0), domain = split(1)))

A null pointer exception will be thrown if string is null.

But wait.. does this mean we wasted all this time for nothing? What good is this blog if at the end of the day we can’t handle unexpected exceptions? Well, here is how we can handle these cases, including the cases where an external library may throw an unexpected exception:

import scala.util.{Try, Success, Failure}

object Email:
  // Safe entry point: any exception thrown while parsing becomes a Left
  def fromString(string: String): Either[String, Email] =
    Try {
      fromStringUnsafe(string)
    } match
      case Success(good) => good
      case Failure(exception) => Left(s"ERROR: Unexpected: parsing email '${string}': " + exception.getMessage)

  // May throw, for example a NullPointerException when string is null
  private def fromStringUnsafe(string: String): Either[String, Email] =
    val split = string.split("@")
    if (split.size != 2)
      Left(s"Email is malformed. Trying to parse '$string'")
    else
      Right(Email(user = split(0), domain = split(1)))

Of course, you may want to check for a null string, but keep in mind I am doing this to make a point. What we did above is simply carry the information of an unexpected exception into the left side of the Either[E, A], which by convention represents the error container. At this point you may sense that the type we use on the left side of Either[E, A] in this blog (String) may be insufficient to carry the necessary error information. But that may be a topic for another blog.
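
For readers who prefer chaining over pattern matching, the same conversion can also be written with Try.toEither from the standard library. This is an alternative sketch, not the code in the repository; emailUnsafe and emailSafe are hypothetical names, and Email is redefined here with a public constructor only to keep the example self-contained.

```scala
import scala.util.Try

case class Email(user: String, domain: String)

// Hypothetical unsafe parser: throws a NullPointerException when string is null.
def emailUnsafe(string: String): Either[String, Email] =
  val split = string.split("@")
  if split.length != 2 then Left(s"Email '$string' is malformed")
  else Right(Email(user = split(0), domain = split(1)))

// Try captures non-fatal exceptions; toEither gives Either[Throwable, ...].
// We turn the Throwable into a String, then collapse the nested Either.
def emailSafe(string: String): Either[String, Email] =
  Try(emailUnsafe(string))
    .toEither
    .left.map(e => s"ERROR: Unexpected: parsing email '$string': ${e.getMessage}")
    .flatMap(inner => inner)

@main def runEmailSafe(): Unit =
  println(emailSafe("mrme@xmail.com"))
  println(emailSafe("nope"))
  println(emailSafe(null))
```

Expected validation failures ("nope") and unexpected exceptions (null) both end up as a Left, so the caller handles them in the same way.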

What did we achieve here? Simple:

  • We have handled unexpected exceptions using proper Scala.

File Example6HandlingTheUnexpected.scala shows how we change all factory methods to handle unexpected errors.

Step 7: Some Refactoring To Clean Up

We will do two refactorings in this step. We will refactor the Either[E, A] and we will also refactor the factory methods.

Using Either[String, A] all the time is cumbersome. If the left side is always a String, why do we need to repeat it all the time? We address this by using a type alias:

type EitherWithErrorString[A] = Either[String, A]

Now our factory method can look like the following instead:

object Email:
  def fromString(string: String): EitherWithErrorString[Email] =
    Try {
      fromStringUnsafe(string)
    } match
      case Success(good) => good
      case Failure(exception) => Left(s"ERROR: Unexpected: parsing email '${string}': " + exception.getMessage)
...
..
.

The next refactoring is to put the fromString method in a trait so it can be reused in Email, SSN, and Age companion objects.

The resulting modifications in this step can be seen in file Example7TypeAndFactoryMethodRefactoring.scala.
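
To give an idea of the shape of the trait refactoring without opening the file, here is a minimal sketch. The trait name SafeFromString and the typeName member are hypothetical and chosen for this illustration; the repository file may organize things differently.

```scala
import scala.util.{Failure, Success, Try}

type EitherWithErrorString[A] = Either[String, A]

// Hypothetical trait; the names are illustrative, not taken from the repository.
trait SafeFromString[A]:
  // Each companion object supplies a label for messages and its own unsafe parser.
  protected def typeName: String
  protected def fromStringUnsafe(string: String): EitherWithErrorString[A]

  // Shared wrapper: converts any unexpected exception into a Left.
  def fromString(string: String): EitherWithErrorString[A] =
    Try(fromStringUnsafe(string)) match
      case Success(result) => result
      case Failure(exception) =>
        Left(s"ERROR: Unexpected: parsing $typeName '$string': ${exception.getMessage}")

case class Age private (age: Int)

object Age extends SafeFromString[Age]:
  protected def typeName: String = "age"

  // No null check and no overflow check on purpose: the trait catches both.
  protected def fromStringUnsafe(string: String): EitherWithErrorString[Age] =
    if string.isEmpty then Left("Age is empty")
    else if !string.forall(_.isDigit) then Left(s"Age is malformed: $string")
    else Right(Age(string.toInt))

@main def runAgeTrait(): Unit =
  println(Age.fromString("49"))
  println(Age.fromString("old"))
  println(Age.fromString(null))
```

Email and SSN companion objects would extend the same trait and only provide their own fromStringUnsafe, so the Try handling is written once.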

Summary: What Did We Gain In This Blog?

Some readers may be panicking after seeing that our last Scala example is quite a few lines longer than the first example in step 1. But we are doing a lot more, as we are addressing all the issues that were explained. I put together a Java example that attempts to do the same as step 6, with the goal of showing what the code may look like if we didn’t use Scala features.
