Browsing by Subject "Multi-Agent Reinforcement Learning"
Results Per Page
Sort Options
Item Open Access A NEW ZEROTH-ORDER ORACLE FOR DISTRIBUTED AND NON-STATIONARY LEARNING(2021) Zhang, YanZeroth-Order (ZO) methods have been applied to solve black-box or simulation-based optimization prroblems. These problems arise in many important applications nowa- days, e.g., generating adversarial attacks on machine learning systems, learning to control the system with complicated physics structure or human in the loop. In these problem settings, the objective function to optimize does not have an explicit math- ematical form and therefore its gradient cannot be obtained. This invalidates all gradient-based optimization approaches. On the other hand, ZO methods approxi- mate the gradient by using the objective function values. Many existing ZO methods adopt the two-point feedback scheme to approximate the unknown gradient due to its low estimation variance and fast convergence speed. Specifically, two-point ZO method estimates the gradient at the current iterate of the algorithm by querying the objective function value twice at two distinct neighbor points around the current iterate. Such scheme becomes infeasible or difficult to implement when the objective function is time-varying, or when multiple agents collaboratively optimize a global objective function that depends on all agents’ decisions, because the value of the objective function can be queried only once at a single decision point. However, con- ventional ZO method based on one-point feedback is subject to large variance of the gradient estimation and therefore slows down the convergence.
In this dissertation, we propose a novel one-point ZO method based on the residual feedback. Specifically, the residual feedback scheme estimates the gradient using the residual between the values of the objective function at two consecutive iterates of the algorithm. When optimizing a deterministic Lipschitz function, we show that the query complexity of ZO with the proposed one-point residual feedback matches that of ZO with the existing two-point schemes. Moreover, the query complexity of the proposed algorithm can be improved when the objective function has Lipschitz gradient. Then, for stochastic bandit optimization problems, we show that ZO with one-point residual feedback achieves the same convergence rate as that of two-point scheme with uncontrollable data samples.
Next, we apply the proposed one-point residual-feedback gradient estimator to solve online optimizaiton problems, where the objective function varies over time. In the online setting, since each objective function can only be evaluated once at a single decision point, existing two-point ZO methods are not feasible and only one-point ZO methods can be used. We develop regret bounds for ZO with the proposed one- point residual feedback scheme for both convex and nonconvex online optimization problems. Specifically, for both deterministic and stochastic problems and for both Lipschitz and smooth objective functions, we show that using residual feedback can produce gradient estimates with much smaller variance compared to conventional one- point feedback methods. As a result, our regret bounds are much tighter compared to existing regret bounds for ZO with conventional one-point feedback, which suggests that ZO with residual feedback can better track the optimizer of online optimization problems. Additionally, our regret bounds rely on weaker assumptions than those used in conventional one-point feedback methods.
The proposed residual-feedback scheme is next decentralized to conduct dis- tributed policy optimization in the multi-agent reinforcement learning (MARL) prob- lems. Existing MARL algorithms often assume that every agent can observe the states and actions of all the other agents in the network. This can be impractical in large-scale problems, where sharing the state and action information with multi-hop neighbors may incur significant communication overhead. The advantage of the pro- posed zeroth-order policy optimization method is that it allows the agents to compute the local policy gradients needed to update their local policy functions using local estimates of the global accumulated rewards that depend on partial state and action information only and can be obtained using consensus. Specifically, the local ZO policy gradients relying on one-point residual-feedback significantly reduces the vari- ance of the local policy gradient estimates compared to the conventional one-point policy gradient estimates, improving, in this way, the learning performance. We show that the proposed distributed zeroth-order policy optimization method with constant stepsize converges to a neighborhood of the global optimal policy that depends on the number of consensus steps used to calculate the local estimates of the global accumulated rewards.Another challenge in the distributed ZO optimization problems is that the agents may conduct local updates in an asynchronous fashion when they do not have ac- cess to a global clock. To deal with this challenge, we propose an asynchronous zeroth-order distributed optimization method that relies on the proposed one-point residual feedback gradient estimator. We show that this estimator is unbiased under asynchronous updating, and theoretically analyze its convergence. We demonstrate the effectiveness of all proposed algorithms via extensive numerical experiments.
Item Open Access Security and Synthesis Solutions for Digital Microfluidic Biochips(2021) Liang, Tung-CheConsiderable effort has been devoted in recent years to the design and implementation of digital microfluidic biochips (DMFBs) for biochemical analysis procedures, such as high-throughput DNA sequencing and point-of-care clinical diagnosis. As DMFBs make the transition to the marketplace for commercial exploitation, security and IP protection are emerging as important design considerations. Recent studies have shown that DMFBs are vulnerable to threats that can be classified into two main categories: (1) malicious attacks and (2) intellectual-property (IP) theft. Malicious attacks on the integrity of a DMFB system can occur before bioassay execution begins, e.g., attacks on sample integrity. Moreover, DMFBs are supported by computer-based controllers and sensor feedback. The signals transmitted from the controllers to the laboratory devices are also vulnerable to malicious attacks. Additionally, DMFBs are vulnerable to reverse engineering aimed at stealing biomolecular protocols (IP theft). Consequently, these issues need to be addressed using interdisciplinary expertise in microfluidics, microbiology, hardware design, and cybersecurity.
Prior work has also identified a number of failure mechanisms for digital microfluidic biochips. Defects such as microelectrode degradation can occur throughout the lifetime of the system. If an electrode is degraded during bioassay execution, fluidic operations associated with this degraded electrode will fail, resulting in bioassay failure. To ensure reliable bioassay execution in digital microfluidic systems, online and adaptive synthesis solutions are needed so that fluidic operations can be dynamically allocated to healthy regions.
Motivated by the above needs, this dissertation is focused on developing trustworthy and reliable methodologies for DMFB systems. The dissertation first identifies a set of security vulnerabilities in DMFBs, namely IP theft for bio-protocols, malicious attacks, and sample forgery. The dissertation presents a one-time-programmability daisychain structure that scrambles the scan-in and scan-out data from a microelectrode-dot-array (MEDA) biochip, which is the next-generation DMFB. Several attacks are developed to evaluate the security strength of this architecture, including a SAT-based attack. Simulations show that this proposed architecture can protect important bio-protocols from IP theft. The dissertation also presents a provably secure method that exploits the inherent sensing mechanism in MEDA biochips. This method validates assay execution by reconstructing the sequencing graph (i.e., the assay specification) from the droplet-location maps and comparing it against the golden sequencing graph.
In addition, this dissertation presents a robust molecular-barcoding technique that can be used to ensure the integrity of the samples and reagents in DMFBs. Related benchtop experimental studies are reported to show that the proposed method can thwart stealthy sample-forgery attacks. A small-in-size DMFB is fabricated and used for molecular barcoding without any user intervention.
Next, the dissertation presents solutions that address the reliability concerns in DMFBs. An online droplet routing method based on deep reinforcement learning is presented. The RL-based model is first trained in a DMFB simulator, and then it is used in fabricated DMFBs. The experimental results show that even though electrodes on a DMFB degrade over time, the RL droplet router can learn the degradation behavior and transport droplets using only healthy electrodes. The dissertation also presents a novel multi-agent reinforcement learning (MARL) framework for parallel droplet routing on MEDA biochips. The experimental results show that the MARL droplet router can reliably transport multiple droplets without unintended fluidic contamination. To prolong the lifetime of MEDA biochips, the dissertation finally presents a selective-sensing method such that only a small fraction of microelectrodes are utilized for droplet sensing during bioassay execution. The proposed selective-sensing method is implemented based on a new microelectrode cell design. This design is the first attempt to dynamically enable/disable droplet-sensing operations in MEDA biochips.
In summary, the dissertation tackles important problems related to key stages of DMFB design and usage. The results emerging from this dissertation provide the first set of secure and reliable methodologies for DMFBs. It is anticipated that the emerging DMFB industry will benefit from these solutions.