Recent years have witnessed the great success of reinforcement learning (RL) with deep feature representations in many challenging tasks, including computer games, robotics, smart cities, and more. Yet solely optimizing a reward proxy and learning the reward-maximizing policy is not enough. Diversity among the generated states is desirable in a wide range of important practical scenarios, such as drug discovery, recommender systems, and dialogue systems. For example, in molecule generation, the reward function used in in-silico simulations can itself be uncertain and imperfect (compared to the more expensive in-vivo experiments). It is therefore not sufficient to search only for the return-maximizing solution; instead, we would like to sample many high-reward candidates, which can be achieved by sampling terminal states proportionally to their rewards. The Generative Flow Network (GFlowNet) is a probabilistic framework, proposed by Bengio et al. in 2021, in which an agent learns a stochastic policy for object generation such that the probability of generating an object is proportional to a given reward function, i.e., a reward-matching policy. Its effectiveness has been demonstrated in discovering high-quality and diverse solutions in molecule generation, biological sequence design, and beyond.

This talk covers my recent research on tackling three important challenges in such decision-making systems. First, how can we ensure robust learning behavior and value estimation for the agent? Second, how can we improve its learning efficiency? Third, how can we successfully apply these methods to important practical problems such as computational sustainability and drug discovery?
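To make the reward-matching target concrete, here is a minimal sketch of sampling terminal states with probability proportional to their rewards, over a toy discrete set with a made-up reward table (all names and values are hypothetical). Note that a GFlowNet does not sample this distribution directly; it learns a sequential constructive policy whose marginal distribution over terminal objects matches it. The sketch only illustrates the target distribution P(x) = R(x) / Z.

```python
import random

# Hypothetical reward table over a tiny set of terminal states.
# In molecule generation, each key would be a candidate molecule and
# R(x) a (possibly imperfect) in-silico score.
rewards = {"A": 1.0, "B": 3.0, "C": 6.0}

def sample_proportional(rewards, rng=random):
    """Sample a terminal state x with probability R(x) / Z, Z = sum of rewards."""
    states = list(rewards)
    weights = [rewards[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

# Empirically, with Z = 10 here, "C" should be drawn about 60% of the time,
# so high-reward candidates dominate without discarding diverse alternatives.
random.seed(0)
counts = {s: 0 for s in rewards}
for _ in range(10_000):
    counts[sample_proportional(rewards)] += 1
```

A reward-maximizing RL policy would collapse onto "C" alone; the reward-matching distribution keeps "A" and "B" in play with frequencies reflecting their scores, which is the diversity property the talk motivates.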