Paper-Conference

High Performance Computing Framework for Variable Selection on Genome-wide Association Studies

A high-performance computing framework for variable selection in GWAS that integrates state-of-the-art methods and employs novel optimization strategies for efficient processing of high-dimensional data with sparse characteristics.

Dec 3, 2024

UVMap-ID: A Controllable and Personalized UV Map Generative Model

A controllable and personalized UV Map generative model that fine-tunes a pre-trained text-to-image diffusion model with a face fusion module for ID-driven customized generation, addressing the challenges of personalized texture map generation and quality evaluation.

Oct 28, 2024

Neural Graphics Primitives-based Deformable Image Registration for On-the-fly Motion Extraction

In this study, we have successfully integrated NGP into DIR, a novel contribution that significantly enhances the accuracy and efficiency of medical image alignment as demonstrated on the DIR-lab dataset. The NGPDIR framework exhibits robust performance across various metrics, particularly in landmark alignment precision and the accommodation of anatomical sliding boundaries. This advancement not only propels the DIR field forward but also opens new avenues for real-time clinical applications, potentially transforming patient care with its rapid, reliable imaging capabilities.

Jul 8, 2024

Neural Clustering based Visual Representation Learning

A neural clustering framework (FEC) for visual representation learning that views feature extraction as selecting representatives from data, automatically capturing data distribution and providing interpretable cluster assignments.

Jun 17, 2024

VG4D: Vision-Language Model Goes 4D Video Recognition

A Vision-Language Model Goes 4D (VG4D) framework that transfers VLM knowledge to 4D point cloud networks for improved video recognition, achieving state-of-the-art performance on action recognition datasets.

May 13, 2024

FedLPA: One-shot Federated Learning with Layer-Wise Posterior Aggregation

A novel one-shot federated learning method that performs layer-wise posterior aggregation, achieving superior performance by aggregating posterior distributions instead of point estimates.

Jan 1, 2024

Skeleton-in-context: Unified skeleton sequence modeling with in-context learning

In this work, we propose the Skeleton-in-Context, designed to process multiple skeleton-base tasks simultaneously after just one training time. Specifically, we build a novel skeleton-based in-context benchmark covering various tasks. In particular, we propose skeleton prompts composed of TGP and TUP, which solve the overfitting problem of skeleton sequence data trained under the training framework commonly applied in previous 2D and 3D in-context models. Besides, we demonstrate that our model can generalize to different datasets and new tasks, such as motion completion. We hope our research builds the first step in the exploration of in-context learning for skeleton-based sequences, which paves the way for further research in this area.

Dec 15, 2023

Explore In-Context Learning for 3D Point Cloud Understanding

We propose Point-In-Context (PIC), the first framework adopting the in-context learning paradigm for 3D point cloud understanding. Specifically, we set up an extensive dataset of point cloud pairs with four fundamental tasks to achieve in-context ability. We propose effective designs that facilitate the training and solve the inherited information leakage problem. PIC shows its excellent learning capacity, achieves comparable results with single-task models, and outperforms multitask models on all four tasks. Besides, it shows good generalization ability to out-of-distribution samples and unseen tasks and has great potential via selecting higher-quality prompts. We hope it paves the way for further exploration of in-context learning in the 3D modalities.

Dec 10, 2023

Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video

This paper proposes the Pose and Mesh Co-Evolution network (PMCE), a new two-stage pose-to-mesh framework for recovering 3D human mesh from a monocular video. PMCE frst estimates 3D human pose motion in terms of spatial and temporal domains, then performs image-guided pose and mesh interactions by our proposed AdaLN that injects body shape information while preserving their spatial structure. Extensive experiments on popular datasets show that PMCE outperforms state-of-the-art methods in both perframe accuracy and temporal consistency. We hope that our approach will spark further research in 3D human motion estimation considering both pose and shape consistency.

Oct 2, 2023

Betrayed by captions: Joint caption grounding and generation for open vocabulary instance segmentation

This paper presents a joint Caption Grounding and Generation (CGG) framework for instance-level open vocabulary segmentation. The main contributions are: (1) using fine-grained object nouns in captions to improve grounding with object queries. (2) using captions as supervision signals to extract rich information from other words helps identify novel categories. To our knowledge, this paper is the first to unify segmentation and caption generation for open vocabulary learning. The proposed framework significantly improves OVIS and OSPS and comparable results on OVOD without pre-training on large-scale datasets.

Oct 2, 2023

Interweaved Graph and Attention Network for 3D Human Pose Estimation

An Interweaved Graph and Attention Network (IGANet) for 3D human pose estimation that enables bidirectional communication between GCNs and attentions, capturing both global and local correlations in human skeleton representations.

Jun 4, 2023

Gator: Graph-Aware Transformer with Motion-Disentangled Regression for Human Mesh Recovery from a 2D Pose

A Graph-Aware Transformer (GATOR) framework for 3D human mesh recovery from 2D pose, combining Graph-Aware Transformer encoder and Motion-Disentangled Regression decoder to capture joint-joint, joint-vertex, and vertex-vertex relations.

Jun 4, 2023

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation.

Sep 12, 2021

Quasi-dense similarity learning for multiple object tracking

We present Quasi-Dense Similarity Learning, which densely samples hundreds of region proposals on a pair of images for contrastive learning.

Feb 20, 2021

PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation

Feb 13, 2021

Is Attention Better Than Matrix Decomposition?

Self-attention is not better than the matrix decomposition~(MD) model developed 20 years ago regarding the performance and computational cost for encoding the long-distance dependencies.

Sep 28, 2020

Improving Semantic Segmentation via Decoupled Body and Edge Supervision

Jul 3, 2020

Position Constraint Loss For Fashion Landmark Estimation

A Position Constraint Loss (PCLoss) method for fashion landmark estimation that constrains error landmark locations by utilizing position relationships, applicable to both regression and heatmap-based methods without modifying network structure.

May 4, 2020

Spatial Pyramid Based Graph Reasoning for Semantic Segmentation

Feb 24, 2020

SOGNet: Scene Overlap Graph Network for Panoptic Segmentation

We leverage each object’s category, geometry and appearance features to perform relational embedding, and output a relation matrix that encodes overlap relations. In order to overcome the lack of supervision, we introduce a differentiable module to resolve the overlap between any pair of instances.

Feb 7, 2020

Dynamic System Inspired Adaptive Time Stepping Controller for Residual Networks Families

We analyze the effects of time stepping on the Euler method and ResNets. We establish a stability condition for ResNets with step sizes and weight parameters, and point out the effects of step sizes on the stability and performance. Inspired by our analyses, we develop an adaptive time stepping controller that is dependent on the parameters of the current step, and aware of previous steps.

Feb 7, 2020

Self-Refining Deep Symmetry Enhanced Network for Rain Removal

A Self-Refining Deep Symmetry Enhanced Network (DSEN) for rain removal that extracts rotation equivariant features and uses a self-refining mechanism to remove accumulated rain streaks in a coarse-to-fine manner.

Sep 22, 2019

Expectation Maximization Attention Networks for Semantic Segmentation

We formulate the attention mechanism into an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed.

Jul 22, 2019

R^2 Net Recurrent and Recursive Network for Sparse View CT Artifacts Removal

We propose a novel neural network architecture to reduce streak artifacts generated in sparse-view 2D Cone Beam Computed To-mography (CBCT) image reconstruction.

Jun 19, 2019

Recurrent Squeeze-and-Excitation Net for Single Image Deraining

We propose a novel deep network architecture based on deep convolutional and recurrent neural networksfor single image deraining.

Jul 19, 2018