An efficient approach to optimise I/O cost in data-intensive applications using inverted indexes on HDFS splits