ohpauleez/themis

0.1.0-beta12


A validation library for Clojure distilled from practice

dependencies

org.clojure/clojure
1.5.1



 
(ns themis.core
  (:require [themis.protocols :as protocols]
            [themis.extended-protos :as e-protos]
            [themis.rules :as rules]))

Themis

Why another validation library?

  1. The validation rule set should be expressed as data. Rule sets should be able to completely serialize to EDN.
  2. Validators should always resolve to something that supports IFn. The resolution of data to IFn-ables happens at validation time.
  3. Applying validation should be nothing more than applying functions and seq'ing the results together.
  4. Validation is often domain specific, and the library should be open for modification to adapt easily to different requirements.
  5. Validation results should not be conj'd/merge'd onto the original data (unless the user specifically did that).
  6. Validation rule sets should allow for proper namespacing of validators (symbols/functions/etc) and general data.
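
As a small illustration of the first point, a rule set that names its validators by symbol can be printed and read back as EDN. This is only a sketch: `my.app/present?` is a hypothetical validator name that is not part of Themis, and the symbol is never resolved here.

```clojure
(require '[clojure.edn :as edn])

;; A rule set written purely as data. The validator is referenced by a
;; namespaced symbol (hypothetical), so nothing here is a function object.
(def example-rules
  [[[:name :first] [['my.app/present? {:response "First name is missing"}]]]])

;; Serialize to an EDN string and read it back.
(def round-tripped
  (edn/read-string (pr-str example-rules)))

;; The data survives the round trip unchanged.
(= example-rules round-tripped)
```

Rule sets that embed anonymous functions cannot be serialized this way, which is one reason to refer to validators by symbol or keyword.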

Assumptions and expectations

Themis attempts to make no assumptions about the data structure you're validating, the results your validator functions return, or how you want to package up the results of a full validation.

When assumptions are made, there is always an escape hatch allowing you to modify the behavior to better suit your application needs.

Why the name `Themis`?

I was originally using metis, a validation library named after Zeus' first wife. So, I naturally named mine after Zeus' second wife.

Ideal usage

See the comment block below; the intended call is simply (validation my-ds ds-rule-vec)

Fetch the data from within a data structure given coordinates. If the data structure is not extended to Navigable, fall back to get-in. Note: This tucks our internal protocols behind a function for consumers

(defn navigate
  [t coordinate-vec]
  (if (satisfies? protocols/Navigable t)
    (protocols/-navigate t coordinate-vec)
    (get-in t coordinate-vec)))

Create a response vector for a validation call, given the original data structure, the coordinates, the validation function, and the validation function's optional arg map

(defn raw-validation
  [t coordinate-vec validation-fn opt-map]
  [(get opt-map ::return-coordinates coordinate-vec) (validation-fn t (navigate t coordinate-vec) opt-map)])

Given a single validation rule, pull apart the constituents and apply a raw-validation, returning back the validation result vector

(defn validate-vec
  [t validation-vec]
  (let [[coordinates validations] validation-vec]
    (partition-all 2
      (mapcat (fn [[validation-fn opt-map]]
                (raw-validation t coordinates validation-fn (assoc opt-map ::coordinates coordinates)))
              validations))))

Create a lazy sequence of validating a given data structure against a validation rule-set vector/seq. The resulting seq is of (coordinate validation-result) tuples/seq

(defn validation-seq
  [t rule-set]
  (let [normalized-rules (rules/normalize rule-set)]
    (when (rules/balanced? normalized-rules)
      (mapcat #(validate-vec t %) normalized-rules))))

Transform the results of a validation seq into a hashmap

(defn validation-seq->map
  [validation-seq]
   ;; TODO: This can definitely be done better
  (apply merge-with #(flatten (concat [%1] [%2]))
          (mapcat (fn [result-seq]
                    (map #(apply hash-map %) (partition-all 2 result-seq)))
                  validation-seq)))
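
The TODO above hints that the merge step could be simpler. One possible alternative, offered as a sketch rather than as the library's implementation, folds the (coordinate result) pairs with reduce. `results->map` is a hypothetical name, and unlike validation-seq->map it always collects results into a vector, even when a coordinate has only one result.

```clojure
(defn results->map
  "Fold (coordinates result) pairs into a map keyed by the coordinate vector."
  [validation-seq]
  (reduce (fn [acc [coordinates result]]
            ;; The coordinate vector itself is the map key.
            (update-in acc [coordinates] (fnil conj []) result))
          {}
          validation-seq))

;; A hand-written stand-in for the output of validation-seq.
(def sample-seq
  '([[:name :first] nil]
    [[:name :first] {:a 1 :b 2}]
    [[:pets 0] "Too short"]))

(results->map sample-seq)
;; => {[:name :first] [nil {:a 1, :b 2}], [:pets 0] ["Too short"]}
```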

Validate a data structure, t, against a validation query/rule-set. The rule-set will be normalized if it is not already. Note: By default everything is returned in a map, keyed by the coordinate vector. Multiple validation results for the same coordinate are collected together into a single flat sequence. You can optionally pass in a custom :merge-fn or :validation-seq-fn to process the validation and tailor the results

(defn validation
  [t rule-set & opts]
  (let [{:keys [merge-fn validation-seq-fn]} (merge {:merge-fn validation-seq->map
                                                     :validation-seq-fn validation-seq}
                                                    (apply hash-map opts))]
    (merge-fn (validation-seq-fn t rule-set))))
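
A :merge-fn is just a function of the validation seq, so ordinary sequence functions can serve. The sketch below applies two candidates to a hand-written seq instead of calling Themis; `sample-seq`, `failures-only`, and `messages-only` are illustrative names.

```clojure
;; A hand-written stand-in for what validation-seq yields:
;; (coordinates result) pairs, where nil means "no problem reported".
(def sample-seq
  '([[:name :first] nil]
    [[:pets 0] "Too short"]
    [[:pets 0 0] nil]))

;; Keep only the pairs whose result is non-nil.
(def failures-only (partial filter second))

;; Drop the coordinates and keep just the non-nil results.
(def messages-only (partial keep second))

(failures-only sample-seq) ;; => ([[:pets 0] "Too short"])
(messages-only sample-seq) ;; => ("Too short")
```

Either function could be passed as :merge-fn when a map keyed by coordinates is more structure than the caller needs.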

reckon is validation by another name. It exists only to make code using Themis as data generation more readable. See also: validation

(def reckon validation)

Like validation-seq, but chunks rules based on the number of available cores, and validates the chunks in parallel.

(defn pvalidation-seq
  [t rule-set]
  (let [normalized-rules (rules/normalize rule-set)
        chunks (+ 2  (.. Runtime getRuntime availableProcessors))
        rule-count (count normalized-rules)
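        ;; Integer ceiling of rule-count / chunks, so partition-all never receives a Ratio
        chunk-size (max 1 (quot (+ rule-count (dec chunks)) chunks))
        chunked-rules (vec (partition-all chunk-size normalized-rules))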
        validate-vec-fn #(validate-vec t %)]
    (when (rules/balanced? normalized-rules)
      (into [] (apply concat
                      ;; doall forces each chunk inside its pmap worker; mapcat alone is lazy
                      (pmap #(doall (mapcat validate-vec-fn %))
                            chunked-rules))))))
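
The chunk-then-pmap pattern can be sketched on its own, independent of validation. `chunked-pmapcat` is a hypothetical helper, not part of Themis; it assumes the per-item function is pure, and it forces each chunk with doall so that the work happens on the pmap worker and not later on the calling thread.

```clojure
(defn chunked-pmapcat
  "Apply f to every item of coll, processing n-sized chunks in parallel.
  Results are concatenated in the original order."
  [f n coll]
  (into []
        (apply concat
               (pmap #(doall (mapcat f %))
                     (partition-all n coll)))))

;; Each item expands to [x x^2]; chunks of three items run in parallel.
(chunked-pmapcat (fn [x] [x (* x x)]) 3 (range 7))
;; => [0 0 1 1 2 4 3 9 4 16 5 25 6 36]
```

Because pmap preserves input order, the parallel result matches what a plain mapcat would return.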

Like validation, but will create the validation-seqs in parallel via pvalidation-seq - which is based on the number of recognized cores. Note: :validation-seq-fn is ignored in this call.

(defn pvalidation
  [t rule-set & {:keys [merge-fn]}]
  (validation t rule-set
              :validation-seq-fn pvalidation-seq
              :merge-fn (or merge-fn validation-seq->map)))

preckon is pvalidation by another name. It exists only to make code using Themis as data generation more readable. See also: pvalidation

(def preckon pvalidation)

Unfold the themis results map, expanding coordinates to nested maps, and remove nil results

(comment
  (def paul {:name {:first "Paul", :last "deGrandis"}
             :has-pet true
             :pets ["walter"]})
  (defn w-pets [t-map data-point opt-map]
    (assoc opt-map :pet-name-starts data-point))
  (defn degrandis-pets [t-map data-point opt-map]
    (and (= (get-in t-map [:name :last]) "deGrandis")
         (:has-pet t-map)
         nil))
  (require '[themis.validators :refer [from-predicate presence]])
  (require '[themis.predicates :as preds])
  (def paul-rules [[[:name :first] [[presence {:response {:text "First name is not there"}}]
                                    (fn [t-map data-point opt-map]
                                      (Thread/sleep 500)
                                      (and (= data-point "Paul")
                                           {:a 1 :b 2}))]]
                   [[:pets 0] [(from-predicate preds/longer-than? 20 "Too short; Needs to be longer than 20")]]
                   [[:pets 0 0] [[::w-pets {:pet-name-starts nil}] ; placeholder value; w-pets overwrites it
                                 (from-predicate char?)
                                 (from-predicate #(or (Thread/sleep 200) (= % \w)) "The first letter is not `w`")]]
                   ;[[:*] ['degrandis-pets]] ;This is valid, but we can also just write:
                   [:* 'degrandis-pets]])
  (def normal-paul-rules (rules/normalize paul-rules))
  (type (validation-seq paul paul-rules))
  (type (pvalidation-seq paul paul-rules))
  (time (validation paul paul-rules))
  (time (pvalidation paul paul-rules))
  (= (validation paul paul-rules) (pvalidation paul paul-rules))
  (mapcat identity (validation paul paul-rules :merge-fn (partial filter second)))
  (validation paul paul-rules :merge-fn (partial keep second))
  (defn unfold-result
    [themis-result-map]
    (reduce (fn [old [k-vec value]]
              (let [validation-value (remove nil? value)
                    seqd-value (not-empty validation-value)]
                (if seqd-value
                  (assoc-in old k-vec
                            (if (sequential? value)
                              (vec seqd-value)
                              value))
                  old)))
            nil themis-result-map))
  (unfold-result (validation paul paul-rules)))
 
(ns themis.extended-protos
  (:require [themis.protocols :as protocols]))
(extend-protocol protocols/Navigable

  clojure.lang.PersistentVector
  (-navigate [t coordinate-vec]
    (get-in t coordinate-vec))

  clojure.lang.IPersistentMap
  (-navigate [t coordinate-vec]
    (get-in t coordinate-vec))

  ;java.lang.String
  ;(-navigate [t coordinate-vec]
  ;  (get-in t coordinate-vec))
  )
 
(ns themis.predicates)

Predicates

It is often easier to reason about validation by composing smaller predicate functions. Below you'll find common ones supplied by Themis.

These also serve as an example of how to write application-specific validators

(defn longer-than? [t length]
  (> (count t) length))
(defn shorter-than? [t length]
  (< (count t) length))
(defn length? [t length]
  (= (count t) length))
(defn length-between?
  ([t high]
   (length-between? t 0 high))
  ([t low high]
   (let [length (count t)]
     (<= low length high))))
(defn is-in? [t & items]
  (and (some #{t} items)
       true))
(defn is-not-in? [t & items]
  (not (some #{t} items)))
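
In the same spirit, an application-specific predicate is just a function whose first argument is the data point. The sketch below is illustrative only; `blank-free?` and `one-of-roles?` are hypothetical names, not part of Themis.

```clojure
(defn blank-free?
  "True when t is a string containing no whitespace."
  [t]
  (and (string? t)
       (not (re-find #"\s" t))))

(defn one-of-roles?
  "True when t is one of the given roles."
  [t & roles]
  (contains? (set roles) t))

(blank-free? "walter")                ;; => true
(blank-free? "wal ter")               ;; => false
(one-of-roles? :admin :admin :editor) ;; => true
(one-of-roles? :guest :admin :editor) ;; => false
```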
 
(ns themis.protocols)

Protocols

Themis validation is data structure agnostic, but it must be told how it navigates to coordinates within your data structure.

(defprotocol Navigable
  (-navigate [t coordinate-vec]))
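
As a sketch of what teaching Themis to navigate a custom structure might look like, the example below implements the protocol for a store that keeps values under slash-separated string paths. The protocol is redefined locally so the snippet runs on its own; `PathStore` is a hypothetical type, and the sketch assumes every coordinate is a keyword.

```clojure
;; Local copy of the protocol so the sketch is self-contained.
(defprotocol Navigable
  (-navigate [t coordinate-vec]))

;; Hypothetical type: values live under "a/b/c"-style string paths.
(deftype PathStore [entries]
  Navigable
  (-navigate [_ coordinate-vec]
    ;; [:name :first] becomes the lookup key "name/first".
    (get entries (apply str (interpose "/" (map name coordinate-vec))))))

(def store (PathStore. {"name/first" "Paul"}))

(-navigate store [:name :first]) ;; => "Paul"
(-navigate store [:name :last])  ;; => nil
```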
 
(ns themis.rules)

Themis validation rules

Structure

The rule-set/query is specifically just data that gets ingested into some validation engine. This has a few key benefits like composability and packaging/serializing.

A rule-set is a vector of vector pairs. In its short form: [[:coordinate validation-fn]]

In its long form: [[[:coordinate] [[validation-fn opt-map]]]]

The rule vector pair is some coordinate into the data structure you're validating and the validation function that should be applied at that location (on that data point). Care has been taken to make it work well with hash maps, but it should work equally well with other data structures, since the engine is open for modification.

When listing multiple validation functions, pair each function with its options map in its own vector, much like a binding pair. Pass an empty map when a validator needs no options: [[[:some-key] [[one-validator {}] [another-validator {}]]]]

A normalized rule-set is also called a normalized-query in code
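
As a sketch of the two shapes, judging from how normalize and normalize-validation-fns treat their input, the following rule sets should both be acceptable. `my-validator` is a hypothetical validator, and the check at the end only confirms that each rule is a coordinate/validator pair.

```clojure
;; Hypothetical validator with the (t data-point opt-map) signature.
(defn my-validator [t data-point opt-map] nil)

;; Short form: a bare coordinate paired with a bare validator.
(def short-rules [[:name my-validator]])

;; Long form: coordinates in a vector, each validator paired with its options.
(def long-rules [[[:name :first] [[my-validator {:response "bad name"}]]]])

;; Every rule is a pair: coordinates first, validators second.
(every? #(= 2 (count %)) (concat short-rules long-rules)) ;; => true
```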

Ensure that every key/map selection is paired to some validation symbol/keyword/vec

(defn balanced?
  [validation-vec]
  (try
    (every? #(even? (count %)) validation-vec)
    (catch UnsupportedOperationException uoe
      (throw (Exception. (str "The validation vector must only contain vectors; " (.getMessage uoe))
                         uoe)))))

Like name, but respects namespaces

(defn nsed-name
  [sym-or-kw]
  (let [tname (name sym-or-kw)
        tns (namespace sym-or-kw)]
    (if tns
      (str tns "/" tname)
      tname)))

Given an element in a validation query, resolve it to a function

TODO potentially protocol this

(defn- normalize-item
  ([validation-item]
   (normalize-item validation-item #(throw (Exception. (str "Validation items must be symbols, keywords, strings, or functions. Not: " (type %))))))
  ([validation-item else-fn]
   (cond
     (symbol? validation-item) @(resolve validation-item)
     (keyword? validation-item) @(resolve (symbol (nsed-name validation-item)))
     (string? validation-item) @(resolve (symbol validation-item))
     (instance? clojure.lang.IFn validation-item) validation-item
     :else (else-fn validation-item))))
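
The resolution step relies only on core functions, so it can be tried in isolation. The sketch below uses clojure.core/inc as a stand-in for a validator, since it resolves from any namespace; the var names are illustrative.

```clojure
;; name drops the namespace; namespace returns it separately.
(name :my.app/check)      ;; => "check"
(namespace :my.app/check) ;; => "my.app"

;; Rebuilding the qualified name lets a keyword stand in for a symbol.
(def qualified
  (str (namespace :clojure.core/inc) "/" (name :clojure.core/inc)))

;; resolve returns a var; deref it to get the function.
(def from-symbol  @(resolve 'clojure.core/inc))
(def from-keyword @(resolve (symbol qualified)))
(def from-string  @(resolve (symbol "clojure.core/inc")))

(from-keyword 41) ;; => 42
```

Note that resolve returns nil for a name it cannot find, so dereferencing the result of an unknown symbol fails with a NullPointerException.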

Given the validation function vectors, normalize them; resolving symbols/keywords to actual functions

(defn normalize-validation-fns
  [validation-fn-vec]
  (mapv (fn [v-fn]
           (if (vector? v-fn)
             [(normalize-item (first v-fn) identity) (or (second v-fn) {})]
             [(normalize-item v-fn identity) {}])) validation-fn-vec))

Properly wrap query items in vectors

(defn- vectorize
  ([x]
   (vectorize x vector))
  ([x vector-fn]
   (if (vector? x) x (vector-fn x))))

Ensure all coordinates and validators are in a vector (or vector of vectors); Ensure all validation functions are fully resolved

(defn normalize
  [validation-vec]
  (if (::normalized (meta validation-vec))
    validation-vec
    (with-meta
      (map (fn [[coordinates validation]]
             [(vectorize coordinates) (-> validation vectorize normalize-validation-fns)])
           validation-vec)
      {::normalized true})))
(comment
  (def example-map {:foo {:bar [5 6 7]}, :alpha 1})
  (defn valid? [whole-map kw-vec opt-map] {})
  (def valid-query [[[[:foo] [:foo :bar]] ["valid?"]]
                    [:alpha [[valid? {:another-opt true}]]]])
  (def short-query [[:foo valid?]])
  (balanced? (normalize valid-query))
  (balanced? short-query)
  (balanced? [[:a :b] :c])
  (normalize short-query)
  (symbol (nsed-name ::something))
  (nsed-name "themis.core/normalized-query-structure"))
 
(ns themis.validators)

Handling Responses

Responses can come from three places:

  • The :response value in an opt-map
  • Some response passed into validator directly
  • The value of *default-response*

Why the `*default-response*`?

Often in validation, you want the base case to fit some representation. Most often this is nil, but perhaps your application uses some special map as the default response/result from validation.

To allow you full control over the base case, you can rebind *default-response*

(def ^:dynamic *default-response* nil)

Utility functions

One of the key benefits in Themis is being able to use all of Clojure's built-in functions as "validators"

The following utility functions help to integrate common predicate functions and ease the handling of response data.

Resolve and return a validator's response. In order of precedence: the value of :response in the opt-map, the response-data passed directly to the response fn, or the default held in *default-response*

(defn response
  ([response-data]
   (response response-data {}))
  ([response-data opt-map]
   (or (:response opt-map)
        response-data
        *default-response*)))
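
The precedence order and the effect of rebinding can be sketched as follows. The var and function are copied locally so the snippet runs without Themis on the classpath; the {:status :ok} map is only an example value.

```clojure
;; Local copies so the sketch is self-contained.
(def ^:dynamic *default-response* nil)

(defn response
  ([response-data] (response response-data {}))
  ([response-data opt-map]
   (or (:response opt-map)
       response-data
       *default-response*)))

;; Nothing supplied anywhere: fall through to the default, which is nil.
(response nil)                              ;; => nil

;; The opt-map wins over response data passed directly.
(response "invalid" {:response "override"}) ;; => "override"

;; Rebinding changes the base case for the dynamic extent of the body.
(binding [*default-response* {:status :ok}]
  (response nil))                           ;; => {:status :ok}
```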

Given a predicate function that takes a single arg, return a proper validation function for it. You can also abuse this for predicates that take multiple arguments: the data-point is expected to be the predicate's first arg (otherwise you should just use partial), the extra predicate args follow, and the final argument passed to from-predicate is used as the response.

(defn from-predicate
  ([f]
   (fn [_ data-point _]
     (when-not (f data-point)
       (response "invalid"))))
  ([f response-data]
   (fn [_ data-point _]
     (when-not (f data-point)
       (response response-data {}))))
  ([f arg-data & more-data]
   (fn [_ data-point _]
     (when-not (apply f data-point arg-data (butlast more-data))
       (response (last more-data) {})))))
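
The variadic arity treats its trailing arguments in a particular way: everything but the last is handed to the predicate after the data point, and the last becomes the response. The helper below is hypothetical and exists only to make that split visible.

```clojure
;; Hypothetical helper mirroring how the variadic arity divides its arguments.
(defn split-predicate-args
  [arg-data & more-data]
  {:predicate-args (cons arg-data (butlast more-data))
   :response       (last more-data)})

(split-predicate-args 20 "Too short")
;; => {:predicate-args (20), :response "Too short"}

(split-predicate-args 3 10 "Must be between 3 and 10")
;; => {:predicate-args (3 10), :response "Must be between 3 and 10"}
```

So a multi-argument predicate should always be followed by a response value when it is wrapped this way.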

Validation functions

These validation functions can be used directly within a rule set. There is no need to wrap them using the utility functions above

Determine that the coordinate exists in the data structure

(defn required
  [t data-point opt-map]
  (let [coords (:themis.core/coordinates opt-map)
        last-coord (last coords)]
    (when-not (contains? (get-in t (butlast coords)) last-coord)
      (response "required key not found" opt-map))))

Determine if the data-point is non-nil; If there is a value present at a specific coordinate. Note: presence does not imply required - you could fail here because the coordinate doesn't actually exist.

(defn presence
  [t data-point opt-map]
  (when (nil? data-point)
    (response "required value is nil" opt-map)))
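
The difference between required and presence comes down to contains? versus a nil check. The sketch below shows the two underlying checks on plain maps, without calling the validators themselves; the map names are illustrative.

```clojure
(def with-nil-value {:name {:first nil}})
(def without-key    {:name {}})

;; "required" asks: does the key exist at all?
(contains? (get-in with-nil-value [:name]) :first) ;; => true
(contains? (get-in without-key [:name]) :first)    ;; => false

;; "presence" asks: is the value non-nil? Both maps look the same here.
(nil? (get-in with-nil-value [:name :first]))      ;; => true
(nil? (get-in without-key [:name :first]))         ;; => true
```

Using both validators on one coordinate distinguishes a missing key from a key that is present but nil.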

Returns true if (seq x) will succeed, false otherwise.

Taken from an old contrib

(defn- seqable?
  [x]
  (or (sequential? x)
      (coll? x)
      (string? x)
      (instance? clojure.lang.Seqable x)
      (nil? x)
      (instance? Iterable x)
      (-> x .getClass .isArray)
      (instance? java.util.Map x)))

Determine if the data-point is non-empty; If there is a non-empty value present at a specific coordinate.

(defn non-empty
  [t data-point opt-map]
  (when (and (seqable? data-point)
             (empty? data-point))
    (response "required value is empty" opt-map)))
(comment

  )