The pathological primary tumor (pT) stage describes how far the primary tumor has invaded surrounding tissues and is a key factor in prognosis and treatment planning. Because pT staging relies on fields of view at multiple magnifications of gigapixel images, pixel-level annotation is impractical, so the task is typically framed as weakly supervised whole slide image (WSI) classification guided by the slide-level label. Most weakly supervised methods adopt multiple instance learning, treating patches from a single magnification as independent instances and extracting their morphological features. However, these methods cannot progressively represent contextual information across magnification levels, which is essential for pT staging. We therefore propose a structure-informed hierarchical graph-based multi-instance learning architecture (SGMF), inspired by the diagnostic protocols of pathologists. We first introduce a structure-aware hierarchical graph (SAHG), a novel graph-based instance organization for representing WSIs. Building on the SAHG, we then propose a hierarchical attention-based graph representation (HAGR) network, which identifies patterns critical for pT staging by learning cross-scale spatial features. Finally, the top nodes of the SAHG are aggregated by a global attention layer into a bag-level representation. Extensive pT staging experiments on three large multi-center datasets covering two cancer types demonstrate the superiority of SGMF, which outperforms state-of-the-art methods by up to 56% in terms of F1-score.
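The bag-level aggregation step can be illustrated with a generic attention-based MIL pooling sketch. This is not the authors' SGMF/HAGR architecture; the feature dimensions and the randomly initialized projection matrices `W` and `w` are hypothetical stand-ins, shown only to make the "instances → attention weights → bag representation" flow concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bag: 50 patch-level feature vectors of dimension 16.
instances = rng.normal(size=(50, 16))

# Generic attention-based MIL pooling (a sketch, not the authors' SGMF):
# score each instance, softmax-normalize, take the weighted sum.
W = rng.normal(size=(16, 8))   # hypothetical projection
w = rng.normal(size=(8,))      # hypothetical attention vector

scores = np.tanh(instances @ W) @ w       # one scalar score per instance
attn = np.exp(scores - scores.max())
attn /= attn.sum()                        # attention weights sum to 1
bag_repr = attn @ instances               # bag-level (slide-level) embedding

print(bag_repr.shape)                     # (16,)
```

In SGMF the analogous step is the global attention layer over the top nodes of the SAHG, with learned rather than random parameters.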
Robotic end-effector tasks are invariably accompanied by internal error noise. To suppress this noise, we present a novel fuzzy recurrent neural network (FRNN), explicitly designed for and implemented on field-programmable gate arrays (FPGAs). A pipelined implementation guarantees the ordering of all operations, and computing units are accelerated by processing data across multiple clock domains. The FRNN converges faster and is more accurate than traditional gradient-based neural networks (NNs) and zeroing neural networks (ZNNs). Practical tests on a 3-degree-of-freedom (DOF) planar robot manipulator show that the fuzzy RNN coprocessor occupies 496 LUTRAMs, 2055 BRAMs, 41,384 LUTs, and 16,743 FFs on a Xilinx XCZU9EG chip.
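As background for the ZNN baseline mentioned above, the zeroing idea can be sketched on a scalar toy problem: track x(t) such that a(t)·x(t) = b(t) by imposing the error dynamics e′ = −λe on the error e = a·x − b, then integrating with Euler steps. The functions a, b, the gain λ, and the step size are all illustrative choices, not parameters from the paper.

```python
import math

# Hypothetical scalar example: track x(t) with a(t)*x(t) = b(t),
# where a(t) = 2 + sin(t) and b(t) = cos(t).
lam, dt = 50.0, 1e-3
a = lambda t: 2.0 + math.sin(t)
b = lambda t: math.cos(t)
da = lambda t: math.cos(t)     # a'(t)
db = lambda t: -math.sin(t)    # b'(t)

x, t = 0.0, 0.0
for _ in range(5000):          # integrate to t = 5 s
    e = a(t) * x - b(t)        # zeroing-function error
    # Solve a*x' + a'*x - b' = -lam*e for x' (enforces e' = -lam*e).
    xdot = (db(t) - da(t) * x - lam * e) / a(t)
    x += dt * xdot
    t += dt

residual = abs(a(t) * x - b(t))   # should be tiny after convergence
```

The exponential error decay e(t) = e(0)·exp(−λt) is what gives ZNNs (and the FRNN built on similar principles) their fast convergence on time-varying problems.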
Single-image deraining aims to reconstruct a rain-free image from a single rainy image, a task complicated by the difficulty of disentangling rain streaks from the input. Despite substantial progress in existing works, important questions remain poorly addressed: how to distinguish rain streaks from clean image content, how to separate rain streaks from low-frequency pixels, and how to avoid blurry edges. This paper presents a single unified strategy for all three problems. We observe that rain streaks appear as bright, uniformly distributed stripes with elevated pixel values in every color channel of a rainy image, so separating out these high-frequency streaks essentially amounts to reducing the standard deviation of the rainy image's pixel distribution. We therefore introduce a self-supervised rain streak learning network that characterizes, from a macroscopic viewpoint, the similar pixel distribution patterns of rain streaks across low-frequency pixels of grayscale rainy images. It is complemented by a supervised rain streak learning network that analyzes the distinct pixel distribution of rain streaks at a microscopic level between paired rainy and clean images. Building on both, a self-attentive adversarial restoration network is proposed to suppress blurry edges. Together these components form an end-to-end network, M2RSD-Net, that learns macroscopic and microscopic rain streaks for single-image deraining. Experiments demonstrate the advantages of this method, which outperforms state-of-the-art techniques on comparative benchmarks. The code is available at https://github.com/xinjiangaohfut/MMRSD-Net.
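The standard-deviation observation can be checked on synthetic data: adding bright streaks to a clean image widens its pixel distribution, so removing them narrows it again. The image and streak pattern below are purely illustrative, not the paper's data or model.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.uniform(0.2, 0.6, size=(64, 64))   # synthetic "clean" grayscale image
rain = clean.copy()
rain[:, ::8] += 0.35                            # bright vertical streaks every 8th column
rain = np.clip(rain, 0.0, 1.0)

# The streaked image has a wider pixel distribution than the clean one,
# so deraining can be framed as reducing this standard deviation.
widened = rain.std() > clean.std()
print(widened)
```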
Multi-view Stereo (MVS) aims to reconstruct a 3D point cloud model from multiple visual viewpoints. Learning-based MVS methods have seen substantial adoption and success in recent years compared with traditional approaches. Despite their effectiveness, these methods still exhibit shortcomings, including accumulated error in the coarse-to-fine strategy and inaccurate depth hypotheses drawn from uniform sampling. This paper introduces NR-MVSNet, a coarse-to-fine architecture with depth hypotheses generated by normal consistency (DHNC) and a depth refinement with reliable attention (DRRA) module. The DHNC module produces more effective depth hypotheses by aggregating depths from neighboring pixels that share the same normal. As a result, the predicted depth is smoother and more accurate, particularly in regions with no texture or repetitive patterns. The DRRA module refines the initial depth map in the coarse stage by combining attentional reference features with cost-volume features, improving depth estimation accuracy and mitigating the accumulated error of this stage. Finally, extensive experiments on the DTU, BlendedMVS, Tanks & Temples, and ETH3D datasets show that our NR-MVSNet is more efficient and robust than state-of-the-art methods. Our implementation is available at https://github.com/wdkyh/NR-MVSNet.
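The contrast between uniform sampling and neighborhood-based hypotheses can be sketched as follows. The `neighbor_hypotheses` function is a hypothetical stand-in for DHNC: it pools depths from a 3×3 window as a proxy for "neighboring pixels sharing the same normal", which the real module determines from estimated normals.

```python
import numpy as np

def uniform_hypotheses(d_min, d_max, n):
    """Evenly spaced depth samples -- the baseline criticized as imprecise."""
    return np.linspace(d_min, d_max, n)

def neighbor_hypotheses(depth, y, x, k):
    """Hypothetical DHNC-style sampler: reuse depths from the 3x3
    neighborhood (a stand-in for 'pixels sharing the same normal')."""
    patch = depth[max(0, y - 1):y + 2, max(0, x - 1):x + 2].ravel()
    return np.sort(patch)[:k]

depth = np.full((5, 5), 2.0)   # a locally planar region at depth 2.0
depth[2, 2] = 0.0              # one unreliable estimate at the center
hyps = neighbor_hypotheses(depth, 2, 2, 4)
print(hyps)                    # candidates dominated by the consistent depth
```

Because most neighbors agree, the pooled hypotheses concentrate around the plausible local depth, which is why the predicted map comes out smoother in textureless regions.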
Video quality assessment (VQA) has attracted considerable attention recently. Most popular VQA models capture the temporal quality of videos with recurrent neural networks (RNNs). However, each long video sequence is commonly assigned a single quality score, and RNNs may struggle to learn such long-term quality variations. What, then, is the real role of RNNs in learning the visual quality of videos? Do they learn spatio-temporal representations as expected, or merely over-aggregate and duplicate spatial features? In this study, we conduct a thorough investigation of VQA models using carefully designed frame sampling strategies and spatio-temporal fusion methods. Our in-depth experiments on four publicly available real-world video quality datasets lead to two main conclusions. First, the plausible spatio-temporal modeling module (i.e., the RNN) does not learn quality-aware spatio-temporal features. Second, sparsely sampled video frames perform competitively with using all video frames as input. In short, spatial features are the decisive factor in video quality learning for VQA. To the best of our knowledge, this is the first work to explore spatio-temporal modeling in VQA.
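A sparse sampling strategy of the kind investigated above can be as simple as picking evenly spaced frames, e.g. the center of each of n equal segments. The segment-center scheme below is one common, hypothetical choice, not necessarily the exact strategy used in the study.

```python
def sparse_frame_indices(n_frames, n_samples):
    """Pick the center frame of each of n_samples equal segments --
    a simple, hypothetical sparse sampling strategy."""
    step = n_frames / n_samples
    return [int(step * i + step / 2) for i in range(n_samples)]

# 8 frames sampled from a 300-frame (e.g. 10 s at 30 fps) video:
idx = sparse_frame_indices(300, 8)
print(idx)  # [18, 56, 93, 131, 168, 206, 243, 281]
```

The finding that such a handful of frames rivals full-sequence input is what supports the conclusion that spatial features dominate.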
We further enhance the recently introduced dual-modulated QR (DMQR) codes through optimized modulation and coding. These codes carry supplementary data within the barcode image by replacing black modules with elliptical dots. Gains in embedding strength are realized through dynamic dot-size adjustment in both the intensity and orientation modulations, which carry the primary and secondary data, respectively. We also develop a model of the secondary-data coding channel that enables soft decoding with 5G NR (New Radio) codes already available on mobile devices. The performance advantages of the optimized designs are characterized through theoretical analysis, simulations, and practical smartphone experiments. Theoretical analysis and simulations inform the modulation and coding choices of our design, while the experiments quantify the gains of the optimized design over the original, unoptimized one. Importantly, the optimized designs markedly improve the usability of DMQR codes with standard QR code beautification, which reserves part of the barcode area for a logo or image. At a 15-inch capture distance, the optimized designs raised the secondary-data decoding success rate by 10% to 32%, while also improving primary-data decoding at longer capture distances. In typical beautification scenarios, the optimized designs successfully decode the secondary message where the prior, unoptimized designs consistently fail.
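The intensity-modulation idea, carrying one extra bit per black module via dot size, can be sketched in a few lines. The radius values and the hard-threshold decoder below are hypothetical simplifications; the actual DMQR design uses dynamic sizing and 5G NR soft decoding.

```python
def dot_radius(secondary_bit, base=0.30, delta=0.12):
    """Hypothetical intensity modulation: the secondary bit selects the
    size of the elliptical dot replacing a black module."""
    return base + delta if secondary_bit else base - delta

def decode_bit(measured_radius, base=0.30):
    """Hypothetical hard-decision decoder: threshold on the nominal size."""
    return 1 if measured_radius > base else 0

bits = [1, 0, 1, 1, 0]
recovered = [decode_bit(dot_radius(b)) for b in bits]
print(recovered)  # [1, 0, 1, 1, 0]
```

In the noiseless case the bits round-trip exactly; the paper's channel model exists precisely because camera capture perturbs the measured radii, motivating soft rather than hard decisions.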
Deeper insights into the brain and the widespread adoption of sophisticated machine learning methods have significantly fueled research and development of EEG-based brain-computer interfaces (BCIs). However, recent studies have shown that machine learning models are vulnerable to adversarial attacks. This paper proposes narrow-period pulses for poisoning attacks on EEG-based BCIs, which makes adversarial attacks easier to implement. Injecting deliberately deceptive samples into a machine learning model's training set can create harmful backdoors; test samples bearing the backdoor key are then classified into the attacker's predefined target class. Unlike previous approaches, our backdoor key does not need to be synchronized with EEG trials, which simplifies implementation substantially. By demonstrating the effectiveness and robustness of this backdoor attack, we highlight a critical security vulnerability of EEG-based BCIs that urgently needs remediation.
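The trigger itself can be sketched as periodic short spikes added to every channel. The amplitude, period, and pulse width below are illustrative placeholders, not the paper's attack parameters; the key property shown is that the pulse train needs no alignment with trial onsets.

```python
import numpy as np

def add_narrow_pulse_trigger(eeg, amplitude=5.0, period=200, width=3):
    """Hypothetical narrow-period pulse backdoor key: short spikes injected
    periodically into every channel, with no trial synchronization needed."""
    x = eeg.copy()
    for start in range(0, x.shape[1], period):
        x[:, start:start + width] += amplitude
    return x

rng = np.random.default_rng(1)
trial = rng.normal(0.0, 1.0, size=(32, 1000))   # 32 channels, 1000 samples
poisoned = add_narrow_pulse_trigger(trial)
print(poisoned.shape)                            # (32, 1000)
```

In a poisoning attack, such triggered trials are mislabeled with the target class and mixed into the training data; at test time, any trial carrying the pulse train is steered to that class.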