CF Summit notes - Know SQL? Try NoSQL, Dan Wilson

October 29, 2013

(Dan is always a great speaker. He's engaging, and delivers his talks with the right combination of humor and information to keep everyone interested during the entire session.)

Know SQL? Try NoSQL - Dan Wilson

Sharding -- not supported on RDBMS
Sharding = put all records where last name begins with A on server A, all where last name begins with B on server B, etc.

in nosql, no duplicating of data when you do a join
data is nested in hierarchy instead
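
A minimal sketch of what "nested instead of joined" can look like, in plain JavaScript. The customer/order fields are made up for illustration and are not from Dan's slides.

```javascript
// One document holds the customer AND its orders (hypothetical fields),
// so reading it back needs no join and repeats no customer columns.
const customer = {
  _id: 1,
  name: "Ann Smith",
  orders: [
    { sku: "A-100", qty: 2 },
    { sku: "B-200", qty: 1 },
  ],
};

// Walk the hierarchy directly instead of joining customers to orders.
const totalQty = customer.orders.reduce((sum, o) => sum + o.qty, 0);
console.log(customer.name, "ordered", totalQty, "items");
```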

create indexes
similar
but mongo can do them in the background, so creating an index on 1000000 records doesn't block your database

sql/rdbms:
insert into users (...) values (...)

mongo:
db.users.insert()

...SAME syntax for CREATING the user table that you use for inserting records.

so how do you know what's what?
it's your JOB to know what's what -- enforce the schema in the application layer

if you're pipe-lining everything in encapsulated objects, this isn't a big deal and will "just work"
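
One way "enforce the schema in the application layer" might look, as a rough sketch. The schema object, the field names, and the validate() helper are all hypothetical; the idea is just that every document passes through one check before it is saved.

```javascript
// Hypothetical schema for a "users" document; field names are illustrative.
const userSchema = { firstName: "string", lastName: "string", age: "number" };

// Returns a list of problems; an empty list means the document is OK to save.
function validate(doc, schema) {
  const errors = [];
  for (const [field, type] of Object.entries(schema)) {
    if (!(field in doc)) errors.push(`missing ${field}`);
    else if (typeof doc[field] !== type) errors.push(`${field} must be a ${type}`);
  }
  for (const field of Object.keys(doc)) {
    if (!(field in schema)) errors.push(`unexpected ${field}`);
  }
  return errors;
}

const good = validate({ firstName: "Ann", lastName: "Smith", age: 30 }, userSchema);
const bad = validate({ firstName: "Ann", age: "30", nickname: "as" }, userSchema);
console.log(good); // no problems
console.log(bad);  // missing lastName, wrong type for age, unexpected nickname
```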


range

where age > 25 and age <= 50
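
A sketch of how a range condition is written as a Mongo query document. The $gt/$lte operators are real MongoDB syntax; the bounds of 25 and 50, the sample users, and the small matches() function (an in-memory stand-in for the server) are assumptions for illustration.

```javascript
// Query document in MongoDB's operator style: age > 25 AND age <= 50.
// In the mongo shell this would be passed to db.users.find(query).
const query = { age: { $gt: 25, $lte: 50 } };

// Tiny in-memory stand-in for the server, supporting only these operators.
const ops = {
  $gt: (value, bound) => value > bound,
  $gte: (value, bound) => value >= bound,
  $lt: (value, bound) => value < bound,
  $lte: (value, bound) => value <= bound,
};

function matches(doc, q) {
  return Object.entries(q).every(([field, conds]) =>
    Object.entries(conds).every(([op, bound]) => ops[op](doc[field], bound))
  );
}

const users = [
  { name: "a", age: 25 },
  { name: "b", age: 26 },
  { name: "c", age: 50 },
  { name: "d", age: 51 },
];
const inRange = users.filter((u) => matches(u, query)).map((u) => u.name);
console.log(inRange); // b and c
```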


...like statements are done using regex, so learn regex!
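
If you already think in LIKE patterns, a small translator can help while learning. This is a sketch only: likeToRegex() is a hypothetical helper, it is case-sensitive, and it ignores LIKE escape clauses. In the mongo shell the resulting regex would go straight into the query, e.g. db.users.find({ lastName: /^Wil/ }).

```javascript
// Convert a SQL LIKE pattern to a RegExp: % = any run of chars, _ = one char.
function likeToRegex(pattern) {
  // Escape regex metacharacters first so they match literally.
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const body = escaped.replace(/%/g, ".*").replace(/_/g, ".");
  return new RegExp("^" + body + "$");
}

const names = ["Wilson", "Williams", "Smith", "Goodwill"];

// LIKE 'Wil%'  -> starts with "Wil"
const startsWithWil = names.filter((n) => likeToRegex("Wil%").test(n));
// LIKE '%wil%' -> contains lowercase "wil" anywhere
const containsWil = names.filter((n) => likeToRegex("%wil%").test(n));

console.log(startsWithWil); // Wilson, Williams
console.log(containsWil);   // Goodwill
```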


...useful for making pagination easier

(SQL Server 2012 finally has "offset" and "fetch", but earlier versions don't)
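
The slide this note refers to isn't captured above. Assuming it was about Mongo's skip() and limit() cursor methods, pagination boils down to the arithmetic below. The array stands in for a sorted result set, and pageWindow()/getPage() are hypothetical helpers.

```javascript
// Page number (1-based) and page size -> how many records to skip.
function pageWindow(page, pageSize) {
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// In-memory stand-in for a sorted result set of 45 records.
const records = Array.from({ length: 45 }, (_, i) => i + 1);

function getPage(all, page, pageSize) {
  const { skip, limit } = pageWindow(page, pageSize);
  // Shell equivalent: db.users.find().sort(...).skip(skip).limit(limit)
  return all.slice(skip, skip + limit);
}

const page3 = getPage(records, 3, 10); // records 21..30
const page5 = getPage(records, 5, 10); // records 41..45 (short last page)
console.log(page3, page5);
```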


...
{ multi: true }
1. what set do you want me to do this ON
2. what do you want me to do/change
3. tell it YES this may affect multiple records
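
The three parts map onto the three arguments of an update call. A sketch with made-up field values; the update() function here is an in-memory stand-in (equality selectors and $set only), not the driver.

```javascript
// The three parts of an update, as plain objects.
const selector = { status: "trial" };           // 1. which set to act ON
const change = { $set: { status: "expired" } }; // 2. what to change
const options = { multi: true };                // 3. YES, touch many records
// Shell form: db.users.update(selector, change, options)

// In-memory stand-in supporting only equality selectors and $set.
function update(docs, sel, chg, opts = {}) {
  let modified = 0;
  for (const doc of docs) {
    const hit = Object.entries(sel).every(([k, v]) => doc[k] === v);
    if (!hit) continue;
    Object.assign(doc, chg.$set);
    modified += 1;
    if (!opts.multi) break; // without multi, only the first match changes
  }
  return modified;
}

const makeUsers = () => [
  { name: "a", status: "trial" },
  { name: "b", status: "paid" },
  { name: "c", status: "trial" },
];
const withoutMulti = update(makeUsers(), selector, change);
const withMulti = update(makeUsers(), selector, change, options);
console.log(withoutMulti, withMulti); // 1 2
```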


aggregate wants a "series of steps" and it does them in the order you gave them
in this case...
group, don't worry about id
but get the count by "sum"-ing 1 for each record
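
A sketch of the "series of steps" idea. The pipeline array uses real MongoDB stage syntax ($match, $group, $sum); the $match stage, the sample data, and the small aggregate() runner are additions for illustration, and the runner only understands these two stages with a "count" accumulator.

```javascript
// A pipeline is an ordered array of stages; each stage feeds the next.
// Shell form: db.users.aggregate(pipeline)
const pipeline = [
  { $match: { active: true } },
  { $group: { _id: null, count: { $sum: 1 } } },
];

const stages = {
  $match: (docs, cond) =>
    docs.filter((d) => Object.entries(cond).every(([k, v]) => d[k] === v)),
  $group: (docs, spec) => {
    const groups = new Map();
    for (const d of docs) {
      // _id is the group key: null = one big group, "$field" = group by field
      const key = spec._id === null ? null : d[spec._id.slice(1)];
      if (!groups.has(key)) groups.set(key, { _id: key, count: 0 });
      groups.get(key).count += spec.count.$sum; // $sum: 1 adds 1 per record
    }
    return [...groups.values()];
  },
};

// Run the stages strictly in the order given.
function aggregate(docs, pipe) {
  return pipe.reduce((current, stage) => {
    const [name, arg] = Object.entries(stage)[0];
    return stages[name](current, arg);
  }, docs);
}

const users = [
  { state: "NC", active: true },
  { state: "NC", active: false },
  { state: "CA", active: true },
];
const total = aggregate(users, pipeline);
const byState = aggregate(users, [{ $group: { _id: "$state", count: { $sum: 1 } } }]);
console.log(total, byState);
```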

[dan did more complex examples of group by, sum, count, etc that are available in his slide deck]

diverging from RDBMS
can't do natural cross joining across collections(tables)
but if you HAVE to get data from 2 different collections....notion of "map reduce"
concept: create a function in JS that you map to all these collections
that creates sub-sets of collections (tables)
you reduce that into a single entity (collection)
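
The map/reduce concept in plain JavaScript, as a sketch of the idea rather than MongoDB's own mapReduce command. The two "collections", their fields, and the mapReduce() helper are hypothetical.

```javascript
// Two hypothetical "collections" that share a userId.
const logins = [{ userId: 1 }, { userId: 2 }, { userId: 1 }];
const purchases = [
  { userId: 1, total: 30 },
  { userId: 2, total: 5 },
  { userId: 1, total: 12 },
];

// Map: emit a [key, value] pair in one common shape for each source.
const mapLogin = (doc) => [doc.userId, { logins: 1, spent: 0 }];
const mapPurchase = (doc) => [doc.userId, { logins: 0, spent: doc.total }];

// Reduce: fold all values for one key into a single value of the same shape.
const reduceValues = (values) =>
  values.reduce((a, b) => ({ logins: a.logins + b.logins, spent: a.spent + b.spent }));

function mapReduce(sources, reduce) {
  const emitted = new Map();
  for (const [docs, map] of sources) {
    for (const doc of docs) {
      const [key, value] = map(doc);
      if (!emitted.has(key)) emitted.set(key, []);
      emitted.get(key).push(value);
    }
  }
  // One output document per key: the single combined "collection".
  return [...emitted].map(([key, values]) => ({ _id: key, value: reduce(values) }));
}

const summary = mapReduce([[logins, mapLogin], [purchases, mapPurchase]], reduceValues);
console.log(JSON.stringify(summary));
```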

replication --
set up a main server
2 replication servers
9 lines of config code to do this, not hard
if main goes down, other two servers have an "election" to see which has the most current data, and it automatically gets promoted to the main server until "main server" is back up and gets sync'd with the current data
not perfect, but pretty good
whole process only takes 5 or so seconds
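
For a sense of scale, a three-member replica set config is roughly this small. The set name and host names are placeholders, not from the talk. The electPrimary() function is a heavily simplified illustration of the "election" idea (real elections involve votes, priorities, and more), not Mongo's actual algorithm.

```javascript
// Replica set config document; set name and hosts are placeholders.
// In the mongo shell it would be passed to rs.initiate(config).
const config = {
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1.example.com:27017" },
    { _id: 1, host: "mongo2.example.com:27017" },
    { _id: 2, host: "mongo3.example.com:27017" },
  ],
};

// Very simplified election: among members still up, the one with the most
// recent write wins, but only if a majority of the set is reachable.
function electPrimary(members) {
  const up = members.filter((m) => m.up);
  if (up.length * 2 <= members.length) return null; // no majority, no primary
  return up.reduce((best, m) => (m.lastWrite > best.lastWrite ? m : best)).host;
}

// Main server (mongo1) is down; the other two hold an election.
const state = [
  { host: "mongo1.example.com:27017", up: false, lastWrite: 100 },
  { host: "mongo2.example.com:27017", up: true, lastWrite: 98 },
  { host: "mongo3.example.com:27017", up: true, lastWrite: 100 },
];
const newPrimary = electPrimary(state);
console.log(newPrimary); // mongo3.example.com:27017
```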

sharding ---
collection1
-shardA
-shardB
-shardC
-shardD

selection of "shard key" is important
think how you will access your data and how you will distribute it in a fair/balanced way

if you DON'T pass the "shard key" in your query, then EVERY shard will be hit to run the query
not a problem to ask all the shards for info
it's a problem to UNNECESSARILY ask all the shards for info
so pass the shard key when you can
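
A sketch of why the shard key matters at query time. The chunk ranges, the lastName shard key, and targetShards() are hypothetical; the point is only the difference between hitting one shard and hitting all of them.

```javascript
// Hypothetical chunk map: each shard owns a range of the shard key (lastName).
const chunks = [
  { shard: "shardA", min: "a", max: "g" }, // min inclusive, max exclusive
  { shard: "shardB", min: "g", max: "n" },
  { shard: "shardC", min: "n", max: "t" },
  { shard: "shardD", min: "t", max: "{" }, // "{" sorts just after "z"
];

// Decide which shards a query must visit.
function targetShards(query) {
  // No shard key in the query: nothing to route on, so ask everyone.
  if (!("lastName" in query)) return chunks.map((c) => c.shard);
  const key = query.lastName.toLowerCase();
  return chunks.filter((c) => key >= c.min && key < c.max).map((c) => c.shard);
}

const targeted = targetShards({ lastName: "Wilson" }); // one shard
const scattered = targetShards({ firstName: "Dan" });  // every shard
console.log(targeted, scattered);
```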

we don't actually TELL Mongo how to split which things to which box
ex: sharding on first letter of last name.
we don't need 26 boxes to do this
we tell Mongo what the shard key is, and it will figure out how to evenly distribute the work among all the shards
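
To illustrate why 26 boxes aren't needed: key ranges can be cut wherever the data balances out. This greedy splitter is a toy, not Mongo's balancer, and the per-letter record counts are invented.

```javascript
// Hypothetical record counts per first letter of last name.
const counts = { a: 10, b: 30, c: 20, m: 40, s: 20, w: 60 };

// Greedy split: walk keys in order and start a new range once the current
// one has reached its fair share of the total.
function splitRanges(countsByKey, shardCount) {
  const keys = Object.keys(countsByKey).sort();
  const total = keys.reduce((sum, k) => sum + countsByKey[k], 0);
  const target = total / shardCount;
  const ranges = [{ keys: [], size: 0 }];
  for (const k of keys) {
    let current = ranges[ranges.length - 1];
    if (current.size >= target && ranges.length < shardCount) {
      current = { keys: [], size: 0 };
      ranges.push(current);
    }
    current.keys.push(k);
    current.size += countsByKey[k];
  }
  return ranges;
}

const ranges = splitRanges(counts, 3);
console.log(ranges.map((r) => `${r.keys.join("")}: ${r.size}`));
// abc: 60, ms: 60, w: 60
```

Three shards end up with uneven letter ranges but even record counts, which is the kind of balance that matters.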

can NOT change the shard key on the fly
unpleasant to change that on the fly. NOT easy
have to dump ALL your data and rebuild
down time, big pain in the neck

storage formats
BSON
Binary JSON

MongoDB plugin for IntelliJ
can also run at command line

Mongo DOES have the concept of datatypes
they matter in Mongo, it's not as loosey-goosey as CF with that stuff
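
A quick illustration of the kind of surprise this causes. In MongoDB a query for the number 25 does not match a document storing the string "25"; strict equality in JavaScript stands in for that behavior here, and the sample documents are made up.

```javascript
// Same-looking values, different types: a strict match treats them as different.
const docs = [
  { name: "a", age: 25 },   // stored as a number
  { name: "b", age: "25" }, // stored as a string (e.g. from an untyped form field)
];

// Strict equality here stands in for a type-sensitive match on the server.
const byNumber = docs.filter((d) => d.age === 25).map((d) => d.name);
const byString = docs.filter((d) => d.age === "25").map((d) => d.name);

console.log(byNumber); // only "a"
console.log(byString); // only "b"
```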

all MongoDB drivers are platform specific
so the CF driver does things in a CF way
the python driver does it in a python way
if you look at the syntax for the python driver and try to do it in CF, it may not work the same

free mongodb class you can take online
a few videos, gets you up to speed on mongo
it has homework and everything
final exam at the end, you get graded too!

you can use mongo for everything, will work just as well as RDBMS
sometimes it's great, sometimes it's not the right decision
getting complex reports out of SQL Server may be easier. the 'aggregation pipeline' in mongo may be a bit of a mindshift and not worth it.

think of it like a "persistent application cache"
works well for that
can throw logging in there, it works great for that
analytics, logging data, etc.

cfmongodb driver for doing this in CF
on github

in onApplicationEnd()
MAKE SURE you call application.mongo.close();
if you don't close it when the app goes down, you'll have a problem

fine to have multiple collections in an app but remember
1. CAN'T join across multiple collections
2. max size of a DOCUMENT is 16MB, so you can't just shove EVERYTHING into a single document or it will break
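
A rough guard against the document size limit, as a sketch. It measures the JSON text rather than the real BSON encoding, so it is only an approximation; approxBytes() and fitsInOneDocument() are hypothetical helpers.

```javascript
const MAX_DOC_BYTES = 16 * 1024 * 1024; // MongoDB's per-document limit (16MB)

// Rough size estimate: UTF-8 bytes of the JSON text. BSON sizes differ
// somewhat, so treat this as an early warning, not an exact check.
function approxBytes(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

function fitsInOneDocument(doc) {
  return approxBytes(doc) < MAX_DOC_BYTES;
}

const small = { user: "ann", events: ["login", "logout"] };
const huge = { user: "ann", blob: "x".repeat(17 * 1024 * 1024) };

console.log(fitsInOneDocument(small)); // true
console.log(fitsInOneDocument(huge));  // false
```

When a document keeps growing (an ever-longer events array, say), that's usually the cue to move the growing part into its own collection.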