View collation for join-like behavior in CouchDB

If you've been playing with CouchDB, you have probably run into the problem of having multiple documents that share a relationship yet are complex enough that denormalizing them into one document doesn't make sense. For the sake of example, let's assume we have Authors and Posts. Further, let's assume that an Author has many Posts and that these are represented as independent documents. The goal here is to find the posts created by a specific author. In SQL, this would probably be represented as a table of authors and a table of posts. Getting the aggregate result would be something along the lines of:

SELECT * FROM authors JOIN posts ON authors.id = posts.author_id

This is just a simple standard JOIN, nothing very interesting about it. However, in a document-oriented system this "join" operation doesn't really exist. Enter view collation. View collation allows us to define a map/reduce function pair that takes in multiple document types, collects them under a single key, and gives us a means to search by author id. Let's say we have an author document like the following:

{'type' : 'author', 'name' : 'Chris Chandler', '_id' : '22d43eaa7e06c9c37ed3e0489401a506' }

and some number of post documents similar to:

{'type' : 'post', 'title' : 'Hello world', 'body' : 'body text', 'author' : '22d43eaa7e06c9c37ed3e0489401a506' }

In this case our "foreign key" is 22d43eaa7e06c9c37ed3e0489401a506. The mapping function needs to connect these records based on that key, something like the following:

function(doc) {
  if (doc.type == 'author') {
    emit(doc._id, doc);
  } else if (doc.type == 'post') {
    emit(doc.author, doc);
  }
}
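
To get a feel for what the map step produces, here is a rough sketch that can be run outside CouchDB. The emit function below is only a local stand-in that records rows in an array, and the post ids are made up for the example.

```javascript
// Local stand-in for CouchDB's emit(); it only records rows in an array.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Same logic as the map function above, given a name so it can be called here.
function map(doc) {
  if (doc.type == 'author') {
    emit(doc._id, doc);
  } else if (doc.type == 'post') {
    emit(doc.author, doc);
  }
}

// Sample documents; the post _ids are hypothetical.
var docs = [
  { _id: '22d43eaa7e06c9c37ed3e0489401a506', type: 'author', name: 'Chris Chandler' },
  { _id: 'post-1', type: 'post', title: 'Hello world', body: 'body text',
    author: '22d43eaa7e06c9c37ed3e0489401a506' },
  { _id: 'post-2', type: 'post', title: 'Hello world 2', body: 'body text',
    author: '22d43eaa7e06c9c37ed3e0489401a506' }
];

docs.forEach(map);

// Every row ends up under the author's id, whatever its document type.
rows.forEach(function (row) {
  console.log(row.key, row.value.type, row.value.title || row.value.name);
});
```

The author document and both posts land under the same key, which is what makes the "join" possible.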

This view will generate an intermediate table of rows keyed by the author's id. In essence we have one key (the author's) pointing to either a post the author has created or the author's record itself. To make this view answer the question 'Show me all posts created by a certain author' we need a reduce function that drops the unnecessary author records, so the final table only contains author keys pointing to lists of posts. The reduce also has to cope with the rereduce case, where the values passed in are lists of posts returned by earlier reduce calls rather than raw documents.

function(keys, values, rereduce) {
  var posts = [];
  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      // values[i] is a list of posts from an earlier reduce call
      posts = posts.concat(values[i]);
    } else if (values[i].type == 'post') {
      posts.push(values[i]);
    }
  }
  return posts;
}
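
If you want to poke at the reduce step without a running CouchDB, the sketch below calls a reduce of this shape by hand in two passes. The name collectPosts, the sample rows, and the way they are split into chunks are all made up for illustration; CouchDB decides the real chunking itself.

```javascript
// A reduce that keeps only posts and also handles the rereduce pass.
function collectPosts(keys, values, rereduce) {
  var posts = [];
  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      // values[i] is an array of posts returned by an earlier call
      posts = posts.concat(values[i]);
    } else if (values[i].type == 'post') {
      posts.push(values[i]);
    }
  }
  return posts;
}

// Hand-made rows for a single author key.
var author = { type: 'author', name: 'Chris Chandler' };
var postA = { type: 'post', title: 'Hello world' };
var postB = { type: 'post', title: 'Hello world 2' };
var postC = { type: 'post', title: 'Hello world 3' };

// First pass: two chunks reduced separately (rereduce is false).
var partial1 = collectPosts(null, [author, postA], false);
var partial2 = collectPosts(null, [postB, postC], false);

// Second pass: the partial lists are merged (rereduce is true).
var combined = collectPosts(null, [partial1, partial2], true);

console.log(combined.map(function (p) { return p.title; }));
```

The author record is dropped in the first pass and the second pass only has to stitch the lists together.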

The final result set appears as:

"22d43eaa7e06c9c37ed3e0489401a506" [{_id: "d0d0ea6de45c9f4ff983f12a9fed9008", _rev: "2624588756", body: "Weee!", title: "Hello world 2", author: "22d43eaa7e06c9c37ed3e0489401a506", type: "post"}, {_id: "9de65ae955ecc2ea35055b9339f1651c", _rev: "2347078231", body: "Weee!", title: "Hello world", author: "22d43eaa7e06c9c37ed3e0489401a506", type: "post"}, {_id: "5d1ad3eed26f84879835fd47e44f7f55", _rev: "1163133569", body: "Weee!", title: "Hello world 2", author: "22d43eaa7e06c9c37ed3e0489401a506", type: "post"}, {_id: "0717ae0da9bf5919da0957268667c3f4", _rev: "1063237208", body: "Weee!", title: "Hello world 3", author: "22d43eaa7e06c9c37ed3e0489401a506", type: "post"}]

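To pull just one author's row out of the view you query it by key. Below is a small sketch that builds such a URL. It assumes the design-document URL layout used by current CouchDB releases (older versions differ), and the database name 'blog', design document 'posts', and view 'by_author' are placeholders I picked for the example.

```javascript
// Build the URL for querying a view by author id.
// 'blog', 'posts' and 'by_author' below are placeholder names.
function viewUrl(base, db, designDoc, view, authorId) {
  // View keys are JSON values, so a string key must be JSON-encoded
  // (quotes included) and then URL-encoded.
  var key = encodeURIComponent(JSON.stringify(authorId));
  // group=true asks for one reduced row per distinct key.
  return base + '/' + db + '/_design/' + designDoc + '/_view/' + view +
    '?group=true&key=' + key;
}

var url = viewUrl('http://localhost:5984', 'blog', 'posts', 'by_author',
  '22d43eaa7e06c9c37ed3e0489401a506');
console.log(url);
```

A GET request to that URL should come back with a single row for the author, shaped like the result above.
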
 

Simple stats average with CouchDB

Edit: Thanks to Mike Keen for asking about the reduce function. It was wrong in the rereduce case but I was just lazy and hadn't updated it. It should now work correctly in the rereduce case.

Basic statistics seem to give people new to Map/Reduce difficulty, myself included. Here's a short example of how to get the average value of a group of records. Suppose you have a number of records in the format: { 'value' : 345 } The goal is to write a mapping function that emits a key/value pair with a non-unique key and a value representing the thing you want to average. The non-unique key is important because values emitted under the same key are collected together in a list-style fashion in the intermediate table, rather than replacing one another as they would in a normal hash table.

function(doc) {
  emit('average', doc.value);
}

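This mapping function makes a single key of "average" available to the reducing function. Since all the numerical values we want were emitted under the "average" key, the reducing function only needs the sum of the values and how many of them there are; dividing the sum by the count gives the average. The version below returns the sum and the count rather than the average itself, because an average of averages is not generally the true average. Carrying both numbers lets the function use the rereduce flag and allows incremental processing.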

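function(key, values, rereduce) {
  if (rereduce) {
    // values holds {sum, count} objects returned by earlier reduce calls
    var total = 0;
    var count = 0;
    for (var i = 0; i < values.length; i++) {
      total += values[i].sum;
      count += values[i].count;
    }
    return {"sum" : total, "count" : count};
  } else {
    // values holds the raw numbers emitted by the map function;
    // sum() is the helper CouchDB provides to view functions
    return {"sum" : sum(values), "count" : values.length};
  }
}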
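
Here is a rough way to check the arithmetic outside CouchDB. The sum function below is only a local stand-in for the helper CouchDB provides, the numbers are made up, and the split into two chunks is arbitrary. The final division happens on the client, after the view has returned the sum and the count.

```javascript
// Stand-in for the sum() helper that CouchDB provides to view functions.
function sum(values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Reduce that carries sum and count so partial results can be merged.
function reduce(key, values, rereduce) {
  if (rereduce) {
    var total = 0;
    var count = 0;
    for (var i = 0; i < values.length; i++) {
      total += values[i].sum;
      count += values[i].count;
    }
    return { sum: total, count: count };
  }
  return { sum: sum(values), count: values.length };
}

// Two chunks of emitted values (made-up numbers), reduced separately.
var partialA = reduce(null, [345, 100, 155], false);
var partialB = reduce(null, [200, 400], false);

// Merge the partial results, then divide to get the average.
var merged = reduce(null, [partialA, partialB], true);
var average = merged.sum / merged.count;
console.log(average);
```

Averaging the two partial averages directly would give 250 here, while the true average of the five numbers is 240, which is why the reduce carries the sum and count separately.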