DSA Thrust Seminar: Similarity Query Processing of High-dimensional Data
Large volumes of high-dimensional data are becoming increasingly common, partly due to the exponential growth of Machine/Deep Learning applications (such as multimedia databases and recommendation systems).
In this talk, we will provide an overview of our recent work on this topic. First, we will present various applications that generate and rely on high-dimensional data and the operations on them. Then, we will consider exact range query processing in Hamming space. Prior approaches mainly adopt pruning based on the pigeonhole principle. We observed several inefficiencies in these approaches, including the non-tightness of the pruning and the inability to handle data skew. We propose a general form of the pigeonhole principle that allows variable partition sizes and thresholds. Based on the new principle, we develop efficient query processing and optimization algorithms that exploit data skewness to improve query performance. Next, we will introduce several techniques to answer k-NN queries approximately, based on Locality Sensitive Hashing and Learned Hashing.
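For readers unfamiliar with pigeonhole-based pruning, the following is a minimal sketch of the basic filter with equal-width partitions and a uniform threshold. It is not the general form proposed in the talk. It assumes binary vectors stored as Python integers; the names `split`, `range_query`, `dims`, and `m` are hypothetical and chosen only for illustration. The idea: if two vectors differ in at most `tau` bits overall and are cut into `m` parts, at least one pair of corresponding parts must differ in at most `floor(tau / m)` bits, so any vector failing this test on every part can be discarded without computing its full distance.

```python
def hamming(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


def split(x, dims, m):
    """Cut a dims-bit integer into m contiguous parts of near-equal width."""
    base, extra = divmod(dims, m)
    parts = []
    shift = 0
    for i in range(m):
        width = base + (1 if i < extra else 0)
        parts.append((x >> shift) & ((1 << width) - 1))
        shift += width
    return parts


def range_query(data, q, tau, dims, m):
    """Return all x in data with hamming(x, q) <= tau.

    Filter step: keep x only if some partition is within floor(tau / m)
    of the query's corresponding partition (basic pigeonhole principle).
    Verify step: compute the full Hamming distance for the survivors.
    """
    part_tau = tau // m
    q_parts = split(q, dims, m)
    results = []
    for x in data:
        x_parts = split(x, dims, m)
        is_candidate = any(
            hamming(xp, qp) <= part_tau for xp, qp in zip(x_parts, q_parts)
        )
        if is_candidate and hamming(x, q) <= tau:
            results.append(x)
    return results
```

In a real system the filter step would be answered by an index over partition values rather than a linear scan; the scan here only demonstrates that the filter never drops a true result. The sketch also shows the non-tightness mentioned above: a vector can pass the filter on one partition and still be far from the query overall.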
Dr. Wei Wang is currently a Visiting Professor in the School of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China. He is also a Professor in the School of Computer Science and Engineering, The University of New South Wales, Australia. His current research interests include Similarity Query Processing, Artificial Intelligence, Knowledge Graphs, and Security for AI Models. He has published over a hundred research papers, most of them in premier journals (TODS, VLDB J, and TKDE) and conferences (SIGMOD, VLDB, ICDE, WWW, IJCAI, AAAI, ACL). He is currently an Associate Editor of IEEE Transactions on Knowledge and Data Engineering (TKDE) and a program committee member of various first-tier conferences (SIGMOD, VLDB, ICDE, SIGIR, SIGKDD, WSDM, etc.).
More details can be found on his homepage at: http://www.cse.unsw.edu.au/~weiw/