Transform your codebase using codemods

This post will teach you about jscodeshift, a super-useful tool to step by step transform large codebases using small changes. From the basics, like how to traverse your code and how to build code, through some detailed look at the data you work with, like Collections and NodePaths.

At the end I'll write about ignore patterns for jscodeshift, plus some best practices we've learned and handling edge cases in code

Codemods

Codemods are small programs designed to alter large codebases with a very specific change. Historically they were using simple string replacement techniques, and later complex regular expressions.

Luckily for us these days, source code in many languages can be traversed using an Abstract Source Tree (AST) which makes these changes more safe, powerful and easy at the same time. Let’s see how useful these little programs can be!

Use case

For example, if you wish to upgrade a library you use, but it introduced a breaking change, by radically altering the signature of its methods. You could search & replace all the occurrences of the used library, but that has many downsides:

  • manual, repetitive and boring - you may miss something
  • creates a huge changeset
  • teammates working on the codebase on a different branch has to merge and resolve conflicts

If you can forge a tool to do the changes for you, automated, all these can be solved!

  • it’s automated, you just have to write the codemod
  • the changeset is basically the codemod only
  • run the change, and test it before release, test it on branches, the result of the codemod is the same everywhere
  • if a conflict happens, your teammates can ignore your codemod-enhanced version, accept theirs, and re-run your codemod again on their code - easily resolved!

jscodeshift

In 2015 at JSConf EU, Chris Pojer from Facebook introduced a tool called jscodeshift. It’s a codemod runner, wrapping around a module called recast which enables us to traverse, search and alter the source code using simple tools.

This is how installing jscodeshift, and running codemods on a codebase looks like

gist:ed8d8854f42efc530b5c27c55a3eb081#jscodeshift.sh

A codemod is a function, which receives the source code (as a string), the jscodeshift API and some options, and expected to return the source code as a string.

gist:ed8d8854f42efc530b5c27c55a3eb081#transform_module.js

Walking and Searching in the Source Tree

Finding a certain code structure can be done simply by providing a pattern or the absolute specific way, using filters, depending on exactly what you’re searching for.

If you can explicitly tell yourself what is that you are looking for, you can show a source tree structure pattern to jscodeshift to look for. For example, let's find any smoosh method calls on a library object

gist:ed8d8854f42efc530b5c27c55a3eb081#find_smoosh.js

If you want to look for some computed value, you can use a filter

gist:ed8d8854f42efc530b5c27c55a3eb081#find_computed_value.js

Builder API

After you’ve found what you’re looking for, it’s time to create nodes for replacement or insertion. Jscodeshift wraps the Builder API from recast, so let’s use it! The difference from the matcher constants, like jscodeshift.Literal is that they start lowercased, and all of them is a function call, like jscodeshift.literal('foo'). You might guess, this will create a simple string literal, with the string “foo”.

These builders have different function signatures, depending on the properties of the nodes they create. Literal needs a simple string as the value, but a CallExpression for example, needs much more, like a callee, and the arguments list:

gist:ed8d8854f42efc530b5c27c55a3eb081#builder_example.js

The documentation for these is basically nonexistent, but if you take a look at the definitions over ast-types it’s actually readable. For example in the core ES definitions the CallExpression looks like this

gist:ed8d8854f42efc530b5c27c55a3eb081#callexpression.js

In this - and every definition in there - the build contains the expected function signature, and every field is described for what type is required.

With the help of the ast-types definitions, you can create powerful builder structures, and update your code easy.

Note: Using the builder API with jscodeshift methods does not necessary ensures that the resulting code will be valid! So always test your codemod, double-check what it does.

Example:

gist:ed8d8854f42efc530b5c27c55a3eb081#invalid_build.js

Now let's take a deeper look at the data structures you can work with when creating a codemod.

Collections, NodePaths, Nodes

The data you can retrieve from jscodeshift consists Collections, NodePaths, and Nodes.

  • Collection is an array of NodePaths, spiced with some helper methods
  • NodePaths are wrappers around Nodes to provide context and helpers,
  • Nodes are the actual AST nodes

Collection

It’s an array of NodePaths, usually the result of find or filter or similar traversing and searching methods.

The basic methods are relatively well-documented but there are some useful ones missing from there, or just need an example.

paths()

Get the NodePaths from the collection in an array

closest()

Returns the closest provided node type up in the AST to each of the nodes.

gist:ed8d8854f42efc530b5c27c55a3eb081#closest_example.js

findVariableDeclarators([name])

find variable declarations in collection, optionally filtered by name

gist:ed8d8854f42efc530b5c27c55a3eb081#find_variable_declarator.js

insertAfter() insertBefore()

insert nodes after each collection item. Pass in an array of Nodes, using the Builder API - but unfortunately, usage is not that trivial. For example, if you want to simply add a new line below a selected Node, you have to traverse up to the closest ExpressionStatement and insert after that one.

gist:ed8d8854f42efc530b5c27c55a3eb081#insert_example.js

You can pass a function to insertAfter/Before, which returns the array of Nodes to be inserted.

gist:ed8d8854f42efc530b5c27c55a3eb081#insert_with_function.js

remove() remove the collection from the AST

gist:ed8d8854f42efc530b5c27c55a3eb081#remove_example.js

renameTo(newName)

renames all Identifiers in the collection of VariableDeclarators to newName

gist:ed8d8854f42efc530b5c27c55a3eb081#renameto_example.js

replaceWith([nodes|function])

replace collection with the provided array of nodes of the result of the provided function

this function receives the NodePath and the iteration index as arguments and expected to return an Array-like collection

gist:ed8d8854f42efc530b5c27c55a3eb081#replacewith_example.js

NodePath

These methods come from recast

canBeFirstInStatement(), firstInStatement()

Check a Node’s place in a statement, and if a Node can be first.

needsParens()

Check if a Node needs to be wrapped in parentheses (it’s mainly used by the recast code builder)

prune()

Remove a node and its parent, if it would leave a redundant AST node - comes from ast-types

replace()

Replace a Node with another one

Properties

  • parent - traversing up the AST, the parent NotePath
  • scope - get scope information, like these:
    • isGlobal - is it the global scope
    • bindings - get scope bindings (honestly, I could not get my head around this)
    • depth - how deep is this node in the scope chain
    • path - the NodePath of the scope in the AST
    • getGlobalScope() - access the global scope
  • node, value - return the wrapped Node

AST Node types, Best practices, and Edge cases

It's OK to not know the structure of every AST node type.

This is 100% true, no-one should say that every true JS developer should know these. To help, there is a wonderful online tool called ASTExplorer, where you can write/paste the code, and see the relevant AST nodes on the right, as you click in your code. You can test and debug your jscodeshift codemod right there, just pick the right option in the “Transform” menu at the top.

Ignore patterns

Just like in a .gitignore file, you can set paths or path fragments, that should be avoided by your codemods. We skip 3rd party libraries, node_modules, vendor modules for example. You can put these in a file, and pass it to the runner, like this

gist:ed8d8854f42efc530b5c27c55a3eb081#ignore.sh

Here’s an example file for jscodeshift ignore-config

gist:ed8d8854f42efc530b5c27c55a3eb081#ignore_config.txt

Best practices

We started to write codemods on separate workbranches/pull requests. Only the codemod is committed, and any changes it may need to run, and any dependencies it brings to, or updates in the codebase. In a nice docblock at the top of the file, we describe what this codemod should do, what it should not do, and include the exact CLI commands we’ve run it - it’s useful when your teammates need to run it, or if a custom ignore pattern was used.

Then we run the codemod on a pre-release branch, where it’s tested and released before merging the whole changeset to master.

Anyone who has a conflict after pulling master, can ignore the change, and re-run the codemod on the files they’re working with.

Edge cases

If you find a piece of code that has some unconventional approach compared to what you try to modify, do not over-engineer and over-complicate your codemod.

You can create a separate codemod for that use case, or - even better - you can refactor the unconventional approach in a way, that your codemod can handle.

I hope you've found this post useful, and it will help you take your codebase to the next level of JavaScript! Here’s a collection to update your codebase to modern JS and a small collection of other useful examples.