Manipulate Page Content During Build

I’d like to manipulate page content during the build process, for example by doing a find-and-replace that swaps the word “foo” for the word “bar” wherever it appears on the page.

I think I can do this by making an extension (which uses manipulate_resource_list) and activating it in config.rb inside the configure :build do block.

I can read the page content with render() but I can’t figure out how to write to the page. Here’s my attempt thus far…

def manipulate_resource_list(resources)
  # Replace the word "foo" with "bar" in all resources.
  resources.each do |resource|
    # This does nothing because I cannot write to the resource.
    resource.render({:layout => false}).gsub!("foo", "bar")
  end
end

Does anyone know if I’m heading in the right direction? What am I missing?

I think the best approach would be to use after_build. Walk all the rendered files, do your substitution and then write it back to file.

The problem with your approach is that the resource list and the resources in it are just metadata. The actual content of the page remains on disk, as a file. You can read it and render it, as you have done, but the build process will use the original file when building.

(You could write the result of your .render...gsub back to the file, but that would change it permanently, and would also affect what is shown in the development view.)

Huge thanks! I started looking into after_build and once I learned how to access Thor actions, a whole world of options opened up. Specifically, there’s a gsub_file method that is exactly designed to do regex text substitution in files. How perfect.

Here’s the code snippet that does the magic, for anybody trying to do the same thing:

app.after_build do |builder|
  # Looping through all files and directories in our build folder.
  Dir.glob("build/**/*").each do |path|
    # Perform the substitution on files only.
    builder.gsub_file(path, /foo/, 'bar') unless File.directory?(path)
  end
end
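For anyone who wants the same effect without going through Thor, here is a minimal plain-Ruby sketch of the same walk-and-substitute idea. The helper name substitute_in_build and its glob: option are hypothetical (they are not part of Middleman or Thor), and it assumes the build output is text that can safely be read and rewritten in place. By default it only touches .html files, so images and other binaries are left alone.

```ruby
# Hypothetical helper, not part of Middleman or Thor.
# Walks build_dir, applies gsub to each matching file, and rewrites
# only the files whose content actually changed.
# Returns the list of paths that were rewritten.
def substitute_in_build(build_dir, pattern, replacement, glob: "**/*.html")
  changed = []
  Dir.glob(File.join(build_dir, glob)).each do |path|
    next unless File.file?(path) # skip directories

    original = File.read(path)
    updated = original.gsub(pattern, replacement)
    next if updated == original # nothing to do for this file

    File.write(path, updated)
    changed << path
  end
  changed
end
```

Inside an after_build block this could be called as substitute_in_build("build", /foo/, "bar"), assuming the default build directory name.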

Just out of curiosity: Why do you want the pages to differ between development and build?

In general, it’s wise to treat development as a staging view, so that you can check exactly what you will get when you build and deploy.

The project I am working on gets source content from Markdown files in a person’s GitHub repo (like this). As such, I can’t expect people to use Middleman-specific conventions like frontmatter or the link_to helper.

Specifically, authors who make relative links between their Markdown pages (like href="mypage.md") need the links to be converted (to href="mypage.html") during build. That way, the links work pre-build on GitHub and post-build on GitHub Pages.
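The link conversion described above could be sketched roughly like this. The constant MD_HREF and the method rewrite_md_links are hypothetical names made up for illustration. The pattern is an assumption about what the generated HTML looks like: double-quoted href attributes, with anything that starts with a URL scheme or // treated as external and left alone.

```ruby
# Hypothetical pattern: a double-quoted href that is relative
# (no "scheme:" prefix and no leading "//") and whose path ends in ".md",
# optionally followed by a "#fragment" or "?query".
MD_HREF = %r{href="(?![a-z][a-z0-9+.-]*:|//)([^"#?]+)\.md((?:[#?][^"]*)?)"}i

# Returns a copy of html with relative .md links pointing at .html instead.
def rewrite_md_links(html)
  html.gsub(MD_HREF, 'href="\1.html\2"')
end
```

The same pattern and replacement string could presumably be passed to gsub_file in the after_build block instead of /foo/ and 'bar'. A regex is a blunt tool for HTML, so single-quoted or unquoted attributes would need a broader pattern or a real HTML parser.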

We have some ideas for how to make “content rewriting” extensions easier for 4.0, but in the meantime we tend to use Rack middleware to rewrite content. See some of the built-in extensions, like asset_hash, for an example.
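As a rough illustration of the Rack middleware approach, here is a minimal sketch of a middleware that rewrites HTML response bodies. The class name ReplaceText and its pattern: and replacement: options are hypothetical, and this is not how asset_hash itself is written. It relies only on the standard Rack calling convention, where call(env) returns [status, headers, body] and body responds to each.

```ruby
# Hypothetical middleware, not taken from Middleman.
# Applies a gsub to every text/html response passing through it.
class ReplaceText
  def initialize(app, pattern:, replacement:)
    @app = app
    @pattern = pattern
    @replacement = replacement
  end

  def call(env)
    status, headers, body = @app.call(env)

    # Header names may be in either case depending on the Rack version.
    type_key = headers.keys.find { |key| key.casecmp?("content-type") }
    unless type_key && headers[type_key].to_s.include?("text/html")
      return [status, headers, body] # leave non-HTML responses alone
    end

    # Collect the whole body so the substitution sees the full document.
    html = +""
    body.each { |chunk| html << chunk }
    body.close if body.respond_to?(:close)
    html = html.gsub(@pattern, @replacement)

    # The length changed, so replace any existing Content-Length header.
    headers = headers.reject { |key, _| key.casecmp?("content-length") }
    headers["Content-Length"] = html.bytesize.to_s
    [status, headers, [html]]
  end
end
```

Registering it would presumably look something like use ReplaceText, pattern: /foo/, replacement: "bar" in config.rb, or app.use inside an extension; the exact hook depends on the Middleman version, so check that against the version you run. Note that a middleware registered this way would also affect the development view unless it is activated only for the build.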