Running a large publisher website with MM (Middleman) // Performance issues with 100k+ articles

We’re evaluating MM for a huge publisher website with 100,000+ articles and 500+ pages.

Our issue is that performance degrades drastically as the number of articles grows. Rendering 100,000 articles with a very simple template (basically just outputting prototype text) takes around 2 hours on a decent setup (CPU/SSD).

Most of the time, our goal would be to render single files (e.g. from a hook that fires when an editor publishes or updates an article). This is already possible with the “--glob [filename] --no-clean” parameters, but even that takes around 5 minutes. The time goes into the file scan we can see in verbose mode and into the “== Prerendering CSS” step, which alone sits on the command line for around 3 minutes.
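For illustration, a publish hook of the kind described above might look like the minimal sketch below. Only the `--glob` and `--no-clean` flags come from the post; the project directory, the `on_article_published` entry point and the shape of `article_path` are hypothetical and would depend on your CMS and on how `--glob` matches paths in your Middleman version.

```python
"""Hypothetical publish hook that rebuilds a single article with Middleman.

Assumptions (not from the original post): the Middleman project lives in
PROJECT_DIR, and the CMS calls on_article_published() with a glob/path that
identifies the article. Only --glob and --no-clean come from the post.
"""
import subprocess
from pathlib import Path

PROJECT_DIR = Path("/srv/site")  # hypothetical project location


def single_article_command(article_path: str) -> list[str]:
    """Return the middleman command that builds only the matching file(s)."""
    # --glob restricts the build to a subset of the project;
    # --no-clean keeps every other previously built file in place.
    return ["middleman", "build", "--glob", article_path, "--no-clean"]


def on_article_published(article_path: str) -> int:
    """Run the single-file build and return the process exit code."""
    result = subprocess.run(single_article_command(article_path), cwd=PROJECT_DIR)
    return result.returncode
```

Note that, as the post describes, each such invocation still pays Middleman's full startup cost (the file scan and CSS prerendering), so this approach alone does not remove the 5-minute wait.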

Q1: Is MM a suitable tool for our approach?
Q2: Is anyone running MM with this many articles, or with a mostly single-file publishing approach?
Q3: How can performance be improved at this scale, especially when generating just a single article?

I maintain a bilingual website with around 160 English pages and a roughly equal number of French pages. While the scale is not as large as what you are describing here, I did run into performance issues that look similar.

In my case, I found that the number of pages/articles was not the culprit. The slowness came from the number of assets (images, CSS files, JavaScript files) that Middleman, or more specifically Sprockets, had to scan; that scan took a long time even when rendering a single page.

Please try removing all the images, CSS files and JavaScript files and see whether rendering a single page becomes significantly faster. If so, I will share my solution for speeding up the file scan.

@funkymusic
Out of curiosity, what did you end up using?
I’m evaluating for 200,000 articles and growing …

@hjbarraza We ended up with a custom solution: the data is held in Elasticsearch/Redis, and the Markdown is passed to a Node.js library that renders everything with the provided data. IMHO this was the better fit for a site with highly dynamic content. Details can be found here: http://insights.burda-studios.de/carrier-headless-decoupled-cms-at-bunte/

Oh wow, just around 320 pages and you are getting performance issues? That’s worrisome.

@justshipit

I have since moved on to another job that doesn’t use Middleman. From memory, the biggest issue with Middleman was that the build was single-threaded. That may or may not have changed. When I had to support about 320 pages, I had to write creative bash scripts that parallelized page generation by spawning multiple middleman build processes for different subpaths. In the end, the whole site took about 5 minutes to generate on an m4.large EC2 instance. (Yes, to me, 5 minutes was problematic!) To be fair, though, those pages had heavy layouts, such as multiple sidebars, tabbed content and accordions, and each page included many partials. There were also tons of high-resolution images, PDFs, file downloads, etc.
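The original bash scripts are not shown in the thread, so the following is only a sketch of the general idea in Python: run one `middleman build` process per subpath concurrently. The subpath patterns are hypothetical, and the `--glob`/`--no-clean` flags are the ones mentioned earlier in the thread. `--no-clean` matters here because several processes write into the same build directory, and none of them should delete the others' output. Verify how your Middleman version matches `--glob` patterns before relying on a partition like this.

```python
"""Sketch: parallelize a Middleman build with one process per subpath.

Assumptions (hypothetical, not the poster's actual scripts): each pattern in
SUBPATH_GLOBS selects a disjoint part of the site, and --glob/--no-clean
behave as described earlier in the thread.
"""
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partition of the site; adapt to your own directory layout.
SUBPATH_GLOBS = ["en/*", "fr/*"]


def build_command(pattern):
    """Command that builds only the resources matching one pattern."""
    # --no-clean: concurrent builds must not wipe each other's output.
    return ["middleman", "build", "--glob", pattern, "--no-clean"]


def run_build(pattern):
    """Run one build process and report (pattern, exit code)."""
    proc = subprocess.run(build_command(pattern), capture_output=True, text=True)
    return pattern, proc.returncode


def parallel_build(patterns, max_workers=4, runner=run_build):
    """Run builds concurrently; threads just wait on the child processes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(runner, patterns))
    failed = [p for p, code in results.items() if code != 0]
    return results, failed


if __name__ == "__main__":
    results, failed = parallel_build(SUBPATH_GLOBS)
    for pattern, code in results.items():
        print(f"{pattern}: exit code {code}")
    if failed:
        raise SystemExit(f"builds failed for: {', '.join(failed)}")
```

Threads are enough here because the real work happens in the child processes. Each process still repeats Middleman's own startup and asset scanning, so the gain comes only from rendering pages in parallel.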

I would say, based on my experience working with Middleman between 2013 and 2017, you might want to look elsewhere if you need to support thousands of pages per site and beyond. The lack of multi-threaded compilation and incremental builds means generating a large site during the build process will eventually become unbearable.

I will stick with it and see how it goes, as my single page is relatively simple. I don’t want to worry about scalability now, since this new project isn’t making money yet and I’m just validating the idea. Once I see traction and the site grows, if I hit performance issues, I will think of a better solution :slight_smile:

@justshipit

I will say that build performance was probably the only major issue I had with my Middleman setup. Because the build process was automated away in a Jenkins job, it did not, in the end, really affect most people.

Development-wise, things did not get noticeably slower as the number of pages increased. It took about 10 seconds to start the Middleman server for my project with 300+ pages, and after that it did not need to be restarted frequently.

If you don’t have to worry about scalability, then I am certain you will find Middleman to be a pleasant tool to use. Good luck with your project!