Mining structures from massive text data: will it help software engineering?

The real-world big data are largely unstructured, interconnected text data. One of the grand challenges is to turn such massive unstructured text data into structured, actionable knowledge. We propose a text mining approach that requires only distant or minimal supervision but relies on massive text data. We show quality phrases can be mined from such massive text data, types can be extracted from massive text data with distant supervision, and entities/attributes/values can be discovered by meta-path directed pattern discovery. We show text-rich and structure-rich networks can be constructed from massive unstructured data. Finally, we speculate whether such a paradigm could be useful for turning massive software repositories into multi-dimensional structures to help searching and mining software repositories.