Create a JS Builder with Node.js

A very good and common practice with JS projects bigger than 100 lines of code is to split the code into different files.
The benefits are clear:

smaller pieces of code to maintain
swappable portions for experiments, improvements, or new features: for example, including magic2.js in a build and trying it out, rather than drastically changing magic.js and following the repository logs
better organization of the code, and I'll come back to this later in this post
the possibility to distribute bigger closures, for example the jQuery approach
ad hoc builds that include or exclude portions of the library, especially suitable for specific versions of the code that must be compatible with IE only

Solutions All Over The Place

There are tons of solutions that make the described build process easy to set up and use. For example, I created my own, the JavaScript Builder, and I use it in basically every project I work on.
However, this builder requires a couple of extra technologies such as Python and Java ... but aren't we simply using JavaScript?
So why not write an easy guide on how to build your code with JS only?
This is what this post is about, and I hope you'll find it useful.

How To Structure Your Project

If all files are in the same directory, it is not easy to find the right one quickly, since there could be many of them. A good solution I came up with is a folder structure whose paths mirror both the namespaces and JS keywords such as function, var, and prototype.
Here is an example of how I would structure this library (and please ignore what the library itself does):


var myLib = (function (global, undefined) {"use strict";

  // private scope function
  function query(selector) {
    return document.querySelectorAll(selector);
  }

  function Wrapper(nodeList) {
    this.length = nodeList.length;
    this._list = nodeList;
  }

  // a prototype method of the Wrapper "class"
  Wrapper.prototype.item = function item(i) {
    return this._list[i];
  };

  // public static query method
  query.asWrapper = function (selector) {
    return new Wrapper(query(selector));
  };

  var // private scope variables
    document = global.document,
    slice = [].slice
  ;

  // the actual object/namespace
  return {
    query: query,
    internals: {
      Wrapper: Wrapper
    }
  };

}(this));

The code should be easy enough to understand. The object used as the myLib namespace has a couple of methods, a few private variables and functions, and something exposed through the internals namespace.
It does not matter what the library does or how well or badly it is structured; what matters is that our folder structure should be smart enough to scale with any valid JS pattern ... OK?

The Folder

To start with, let's put our source code inside a src folder, so that we can add sibling folders for tests or builds at the same level of the hierarchy.


dist
src
tests
builder.js

We'll see builder.js later; in the meantime, let's have a look inside the src folder:


dist
src
    intro.js
    outro.js
    var.js
    function
        Wrapper.js
        query.js
        Wrapper
            prototype
                item.js
        query
            asWrapper.js
tests
builder.js

The distinction is much clearer once you look at the above tree in your editor or even your shell ... folders and files are well distributed, but bear in mind this is only a first example.
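The naming convention behind the function part of the tree can be expressed as a tiny mapping from a dotted name to a path: every dot becomes a nested folder under src/function. The pathFor helper below is hypothetical and not part of the builder; it only illustrates the rule, and it deliberately ignores intro.js, outro.js and var.js, which sit at the root of src.

```javascript
// Hypothetical helper: maps a dotted name such as "Wrapper.prototype.item"
// to the file that, following the folder convention, should contain it.
function pathFor(dottedName) {
  // every function-related entry lives under src/function/,
  // and each dot in the name becomes a nested folder
  return "src/function/" + dottedName.split(".").join("/") + ".js";
}

console.log(pathFor("query"));                  // src/function/query.js
console.log(pathFor("Wrapper.prototype.item")); // src/function/Wrapper/prototype/item.js
console.log(pathFor("query.asWrapper"));        // src/function/query/asWrapper.js
```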
Let's see what we are going to write into each file.

src/intro.js


var myLib = (function (global, undefined) {"use strict";

src/function/query.js


  // private scope function
  function query(selector) {
    return document.querySelectorAll(selector);
  }

src/function/Wrapper.js


  function Wrapper(nodeList) {
    this.length = nodeList.length;
    this._list = nodeList;
  }

src/function/Wrapper/prototype/item.js


  // a prototype method of the Wrapper "class"
  Wrapper.prototype.item = function item(i) {
    return this._list[i];
  };

src/function/query/asWrapper.js


  // public static query method
  query.asWrapper = function (selector) {
    return new Wrapper(query(selector));
  };

src/var.js


  var // private scope variables
    document = global.document,
    slice = [].slice
  ;

src/outro.js


  // the actual object/namespace
  return {
    query: query,
    internals: {
      Wrapper: Wrapper
    }
  };

}(this));

Got it?
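None of these files is meant to be valid JavaScript on its own; only their concatenation is. A quick sanity check in Node is to compile a string with new Function, which throws a SyntaxError for invalid code without running it. The parses helper and the shortened fragments below are illustrative only. Since new Function treats the string as a function body, a top-level return would be accepted there, so this is a syntax check rather than an exact parse of a script file.

```javascript
// Returns true if `source` compiles as a function body.
// new Function only compiles the code here; it is never called.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// shortened copies of src/intro.js, a couple of middle files, and src/outro.js
var intro = 'var myLib = (function (global, undefined) {"use strict";\n';
var body = [
  "  function query(selector) {",
  "    return document.querySelectorAll(selector);",
  "  }",
  "  var document = global.document;"
].join("\n");
var outro = "\n  return { query: query };\n}(this));\n";

console.log(parses(intro));                // false: the closure is never closed
console.log(parses(outro));                // false: a stray "}" with no opening
console.log(parses(intro + body + outro)); // true: the concatenation is valid
```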

Structure Rules

every part of the scope can be distributed across files
a single file may or may not be valid JavaScript on its own, because to test the library we need to build it first (possibly through automation)
function declarations should go in a dedicated folder called function, according to their nesting level
var declarations per scope could go in a var folder, according to their nesting level. Do not create a var folder for every function that defines variables, because if you need one it means the function is too complex. Split it into subtasks and do not define 100 variables in a single function: closures are the only exception.
nested closures must be named, so that their structure can be mapped to folders following the previous rules. Minifiers can remove function expression names, named closures included, while not every developer wants to dig through the whole code to understand why a nested closure was useful. A classic example is including, inside our own closure, an external library that uses its own closure: in this case, name that closure so you know where to look for the library inside your folder structure.
function prototypes should be placed inside a prototype folder, within that function's own folder.
We don't need to assign a new object when we want to add methods to a function prototype, so please stop this awkward but common practice ASAP: MyFunction.prototype = { /* THIS IS WRONG */ }. Instead, use the prototype object that ECMAScript already creates for every function declaration or expression; replacing it also throws away its default constructor property.
If your argument is that the code will be bigger, use the outer-scope variable declarations to reference the prototype once and reuse that reference within the prototype folder. This approach will make your life easier once you get used to working with structured, distributed JavaScript files.
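One concrete side effect of replacing the prototype object is that the default constructor property, which every function's own prototype gets for free, is lost. A minimal sketch with two hypothetical constructors, Good and Bad, the first one using the shortcut variable suggested above:

```javascript
// Extending the default prototype keeps prototype.constructor intact
function Good() {}
var GoodPrototype = Good.prototype; // shortcut taken once, then reused
GoodPrototype.item = function item() { return "item"; };

// Replacing the prototype with an object literal drops it
function Bad() {}
Bad.prototype = { item: function item() { return "item"; } };

console.log(new Good().constructor === Good);  // true
console.log(new Bad().constructor === Bad);    // false
console.log(new Bad().constructor === Object); // true, inherited from Object.prototype
```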

Regarding the last rule, we could have set a shortcut to the Wrapper.prototype object in the var.js file and reused that reference inside the Wrapper/prototype files.
The folder structure will always help you find references in the library, because it mirrors the same lookup that both you and the code have to do.


    // in the var.js file
    WrapperPrototype = Wrapper.prototype,

    // in the Wrapper/prototype/item.js file
    WrapperPrototype.item = function item(i) { return this._list[i]; };
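Put in context, the concatenated output could look like the sketch below once var.js is emitted before the function files, which is the order used by the builder's file list later in this post. Taking Wrapper.prototype inside the var statement works because Wrapper is a function declaration, hoisted to the top of the closure. The array passed to Wrapper is an assumption that stands in for a NodeList, so the sketch runs outside a browser.

```javascript
var myLib = (function (global, undefined) {"use strict";

  var // var.js: runs before Wrapper's declaration appears in the source
    WrapperPrototype = Wrapper.prototype;

  // function/Wrapper.js (hoisted, so already usable above)
  function Wrapper(nodeList) {
    this.length = nodeList.length;
    this._list = nodeList;
  }

  // function/Wrapper/prototype/item.js
  WrapperPrototype.item = function item(i) {
    return this._list[i];
  };

  // outro.js (trimmed to the internals only)
  return { internals: { Wrapper: Wrapper } };

}(this));

var wrapped = new myLib.internals.Wrapper(["a", "b"]);
console.log(wrapped.length);  // 2
console.log(wrapped.item(1)); // b
```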

The Order Partially Matters

In ECMAScript 3rd edition and later, function declarations are hoisted: they are available from the very beginning of their scope. I really don't know why they are so underrated in everyday code ... the fact that they are always available means we can reference their prototype at any point in our code:


var internalProto = (function () {

    // address any declaration made in this scope
    var WhateverPrototype = Whatever.prototype;
    return WhateverPrototype;

    // even if defined after a return!!!
    function Whatever() {}
}());

alert(internalProto); // [object Object]

Now, the above code is simply a demonstration of how function declarations work ... I am not suggesting a return in the middle with declarations after it; all I am saying is that the order of things in JavaScript may not matter, and function declarations are a perfect example.
Another example is variable usage ... if a function, whether a declaration or an expression, references a variable defined in the outer scope, nothing will break unless we invoke that function before the referenced variable has been assigned.
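A minimal sketch of that last point with var: calling the function too early does not throw, because the variable is hoisted, but the function reads undefined instead of the value. The names getName and name are just illustrative.

```javascript
var result = (function () {
  // a function expression that reads an outer-scope variable
  var getName = function () {
    return name;
  };

  var before = getName(); // name is hoisted, but not assigned yet
  var name = "myLib";
  var after = getName();  // now the assignment has happened

  return [before, after];
}());

console.log(result[0]); // undefined
console.log(result[1]); // myLib
```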

These are really ABC concepts we should all know about JS before even claiming that we know JavaScript ... OK?
It is really important to get these points, because to keep the builder file as simple as possible we need to rely on these assumptions.

The builder.js File

It's time to create the magic file that will do the job for us, hopefully in a smart enough way to cover all the edge cases we can think of.
This is the content of the builder.js file, in the root of our project:


// @name        builder.js
// @author      Andrea Giammarchi
// @license     Mit Style License

// list of files to include
var
  scriptName = "myLib", // the namespace/object/project name
  fileList = [
    "intro.js",    // beginning of the closure
    "var.js",      // all declared variables
    "function/*",  // all declared functions
    "function/Wrapper/prototype/*", // all methods
    "function/query/*", // all public statics
    "outro.js"     // end of the library
  ],
  fs = require("fs"),  // file system module
  out = [],            // output
  alreadyParsed = []   // parsed files for visual feedback
;

// per each file in the list ...
fileList.forEach(function addFile(file) {
  // if the entry ends with a wildcard ...
  if (file.charAt(file.length - 1) == "*") {
    // read the directory (sorted, since readdirSync does not
    // guarantee any order) and per each file found there ...
    fs.readdirSync(
      __dirname + "/src/" + file.slice(0, -2)
    ).sort().forEach(function (file) {
      // if the file type is js
      // and the file has not been defined explicitly
      // in the original list (compare the full relative path,
      // since fileList entries are paths, not bare file names)
      if (
        file.slice(-3) == ".js" &&
        fileList.indexOf(this + file) < 0
      ) {
        // call this same function providing the whole path
        addFile(this + file);
      }
       // the path is passed as context to simplify the logic
    }, file.slice(0, -1));
  // if the file has not been included yet
  } else if (alreadyParsed.indexOf(file) < 0){
    // put it into the list of already included files
    alreadyParsed.push(file);
    // add the file content to the output
    out.push(fs.readFileSync(__dirname + "/src/" + file, "utf8"));
  } else {
    // if here, we are messing up with inclusion order
    // or files ... it's a nice to know in console
    try {
      console.log("duplicated entry: " + file);
    } catch(e) {
      // shenanigans
    }
  }
});

// put all ordered content into the destination file inside the dist folder
fs.writeFileSync(__dirname + "/dist/" + scriptName + ".js", out.join("\n"));

// that's it

The reason there are so many checks when a wildcard is encountered is quite simple ... the order often does not matter, but in some cases it does.
If, for example, a prototype property is used at runtime to define other prototype methods or properties, it cannot be pushed into the output in arbitrary order; it must come first. Example:


// src/function/Wrapper/prototype/behavior.js
WrapperPrototype.behavior = "forEach" in [];

// src/function/Wrapper/prototype/forEach.js
WrapperPrototype.forEach = WrapperPrototype.behavior ?
  function (callback) {[].forEach.call(this._list, callback, this)} :
  function (callback) { /* a shim for non ES5 compatible browsers */ }
;
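To see why the order matters, the sketch below simulates the two files as functions run in build order. The buildForEach helper and the "native"/"shim" labels are hypothetical stand-ins for the real implementations; if forEach.js runs before behavior.js, the check reads undefined and the shim is picked even in an ES5 engine.

```javascript
// Hypothetical simulation: each "file" is a function executed in build order
function buildForEach(order) {
  var WrapperPrototype = {};
  var files = {
    "behavior.js": function () {
      WrapperPrototype.behavior = "forEach" in [];
    },
    "forEach.js": function () {
      // label which implementation would be picked
      WrapperPrototype.forEach = WrapperPrototype.behavior ? "native" : "shim";
    }
  };
  order.forEach(function (name) {
    files[name]();
  });
  return WrapperPrototype.forEach;
}

console.log(buildForEach(["behavior.js", "forEach.js"])); // native
console.log(buildForEach(["forEach.js", "behavior.js"])); // shim
```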

Since the second file strongly depends on the first one, the list of files could be written like this:


  fileList = [
    "intro.js",    // beginning of the closure
    "var.js",      // all declared variables
    "function/*",  // all declared functions
    "function/Wrapper/prototype/behavior.js", // precedence
    "function/Wrapper/prototype/*", // all methods
    "function/query/*", // all public statics
    "outro.js"     // end of the library
  ],

When the wildcard is encountered and behavior.js comes up again in the forEach, it is not included twice, since it has already been pushed by the explicit entry before it.
The same concept applies if a specific file must be included at the very end:


  fileList = [
    "function/Wrapper/prototype/behavior.js", // precedence
    "function/Wrapper/prototype/*", // all methods
    "function/Wrapper/prototype/doStuff.js" // after all
  ],

I believe these are edge cases most of the time, but at least now we better understand what the builder does.
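To test the ordering rules without touching the disk, the same logic can be pulled into a function that receives the directory listing from outside. orderFiles and the fake listing are hypothetical; this sketch compares the full relative path (dir + name) against the list, so an explicitly listed file is skipped by the wildcard and keeps its explicit position.

```javascript
// Hypothetical, fs-free version of the builder's ordering logic.
// readdir(dir) must return the entry names found in src/<dir>.
function orderFiles(fileList, readdir) {
  var ordered = [];
  fileList.forEach(function addFile(file) {
    if (file.charAt(file.length - 1) == "*") {
      var dir = file.slice(0, -1); // e.g. "function/Wrapper/prototype/"
      readdir(dir.slice(0, -1)).slice().sort().forEach(function (name) {
        // skip non-js entries and files listed explicitly elsewhere
        if (name.slice(-3) == ".js" && fileList.indexOf(dir + name) < 0) {
          addFile(dir + name);
        }
      });
    } else if (ordered.indexOf(file) < 0) {
      ordered.push(file);
    }
  });
  return ordered;
}

// a fake directory listing for src/function/Wrapper/prototype
var listing = {
  "function/Wrapper/prototype": ["forEach.js", "doStuff.js", "behavior.js", "item.js"]
};

var ordered = orderFiles([
  "function/Wrapper/prototype/behavior.js", // precedence
  "function/Wrapper/prototype/*",           // all methods
  "function/Wrapper/prototype/doStuff.js"   // after all
], function (dir) { return listing[dir]; });

console.log(ordered);
```

With this listing, behavior.js comes first, forEach.js and item.js follow in alphabetical order, and doStuff.js ends up last.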

How To Use The Builder

In the console, inside the project folder where builder.js is located:


node builder.js

That's pretty much it ... if you open dist/myLib.js after the above call, you will find your beautiful library all in one piece, ready to be minified, debugged, and tested.
If the process does not take long, you may bind the builder to the Control+S action, together with a sentinel that tells you if any problem occurred, for example by checking whether any duplicated entry was logged during the process.

In Summary

All these techniques may be handy for many reasons. First of all, it's always good to maintain a structure rather than a single file with thousands of lines of code; secondly, once we understand how the process works, nothing stops us from improving it, changing it, or adapting it to anything we may need, such as regular expressions that strip out some code before the output push, or whatever else might come up at some point.
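As an example of such a regular expression, here is a sketch that removes development-only blocks. The //@debug and //@end markers are a hypothetical convention, not something the builder defines; in builder.js the function would wrap each file's content right before it is pushed into out.

```javascript
// Hypothetical convention: lines between "//@debug" and "//@end"
// (markers included) are only meant for development builds.
var DEBUG_BLOCK = /^[ \t]*\/\/@debug[\s\S]*?^[ \t]*\/\/@end[ \t]*\n?/gm;

function stripDebug(source) {
  return source.replace(DEBUG_BLOCK, "");
}

var source = [
  "var a = 1;",
  "  //@debug",
  "  console.log(a);",
  "  //@end",
  "var b = 2;",
  ""
].join("\n");

console.log(stripDebug(source));
```

Inside the builder, the push would become out.push(stripDebug(fs.readFileSync(path, "utf8"))) for the path of each file.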
Minification can be done however you prefer, for example by adding this single statement at the end of the process, assuming you have a jar folder containing the Google Closure Compiler as compiler.jar.


require('child_process').exec(
  ['java -jar "',
    __dirname + "/jar/compiler.jar",
  '" --compilation_level=SIMPLE_OPTIMIZATIONS --language_in ECMASCRIPT5_STRICT --js "',
    __dirname + "/dist/" + scriptName + ".js",
  '" --js_output_file "',
    __dirname + "/dist/" + scriptName + ".min.js",
  '"'].join(""),
  function (error, stdout, stderr) {
    if (error) console.log(stderr);
  }
);
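If the project path may contain spaces or quotes, a variant could pass the arguments as an array to child_process.execFile, which does not go through a shell, so no manual quoting is needed. The flags are the same ones used above; closureArgs and minify are hypothetical helper names, and the jar location is the same assumption as before.

```javascript
var path = require("path");
var execFile = require("child_process").execFile;

// Builds the same Closure Compiler arguments as above, as an array.
function closureArgs(rootDir, scriptName) {
  return [
    "-jar", path.join(rootDir, "jar", "compiler.jar"),
    "--compilation_level=SIMPLE_OPTIMIZATIONS",
    "--language_in", "ECMASCRIPT5_STRICT",
    "--js", path.join(rootDir, "dist", scriptName + ".js"),
    "--js_output_file", path.join(rootDir, "dist", scriptName + ".min.js")
  ];
}

// Runs java directly with the argument array (no shell involved).
function minify(rootDir, scriptName) {
  execFile("java", closureArgs(rootDir, scriptName),
    function (error, stdout, stderr) {
      if (error) console.log(stderr || error.message);
    });
}
```

At the end of builder.js this would be called as minify(__dirname, scriptName).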

Enjoy your new builder :)
