At that kind of scale, you might need to think about disk access and other non-code sources of issues. Maybe try running the perf test under a flamegraph to get a picture of where the slowdowns are occurring. Also run the perf test at different sizes and see whether it scales linearly or not.
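
For the "different sizes" part, here is a rough sketch of one way to check scaling. It assumes you can wrap your perf test in a function that takes an input size; `example_workload`, `measure`, and `scaling_exponent` are made-up names for illustration, not anything from your code. The idea is to time a few sizes and fit a slope on a log-log scale: a slope near 1 suggests roughly linear scaling, near 2 suggests quadratic.

```python
import math
import time


def measure(workload, sizes, repeats=3):
    """Return the best wall-clock time in seconds of workload(n) for each n."""
    timings = []
    for n in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            workload(n)
            best = min(best, time.perf_counter() - start)
        timings.append(best)
    return timings


def scaling_exponent(sizes, timings):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in timings]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def example_workload(n):
    # Hypothetical stand-in: replace with a call into your own perf test.
    sorted(range(n, 0, -1))


if __name__ == "__main__":
    sizes = [10_000, 50_000, 250_000]
    timings = measure(example_workload, sizes)
    for n, t in zip(sizes, timings):
        print(f"n={n:>8}  best={t:.6f}s")
    print(f"estimated exponent: {scaling_exponent(sizes, timings):.2f}")
```

Wall-clock timings are noisy, so treat the exponent as a hint rather than a measurement. If it jumps once the data stops fitting in memory or cache, that points back at disk access rather than the code itself.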