
Daily Papers

by AK and the research community

Dec 25

Deep Synoptic Array Science: Searching for Long Duration Radio Transients with the DSA-110

We describe the design and commissioning tests for the DSA-110 Not-So-Fast Radio Burst (NSFRB) search pipeline, a 1.4 GHz image-plane single-pulse search sensitive to radio bursts lasting 134 ms to 160.8 s. Extending the pulse width range of the Fast Radio Burst (FRB) search by 3 orders of magnitude, the NSFRB search is sensitive to the recently discovered Galactic Long Period Radio Transients (LPRTs). The NSFRB search operates in real time, using a custom GPU-accelerated search code, cerberus, implemented in Python with JAX. We summarize successful commissioning sensitivity tests with continuum sources and pulsar B0329+54, estimating the 6σ flux (fluence) threshold to be ~290 mJy (~40 Jy ms). Future tests of recovery of longer timescale transients, e.g. CHIME J1634+44, are planned to supplement injection testing and B0329+54 observations. An offline DSA-110 NSFRB Galactic Plane Survey was conducted to search for LPRTs, covering -3.5° < b < 5.7° and 141° < l < 225° (~770 square degrees) in Galactic coordinates. We estimate an upper limit on the Poissonian burst rate of ~1 hr⁻¹ per square degree (~7 hr⁻¹ per 3° × 3° survey grid cell), maximized across the inner |b| < 0.25° of the surveyed region. By imposing the ~290 mJy flux limit on two representative models (the magnetar plastic flow model and the White Dwarf-M Dwarf binary model), we reject with 95% confidence the presence of White Dwarf-M Dwarf binary LPRTs with periods of ~10-70 s within ~95% of the surveyed region. Combined with the prevalence of LPRTs in the Galactic Plane, our results motivate further consideration of both White Dwarf-M Dwarf binary models and isolated magnetar models. We will continue to explore novel LPRT search strategies during real-time operations, such as triggered periodicity searches and additional targeted surveys.

  • 13 authors
·
Oct 20
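
The headline numbers in the DSA-110 abstract above can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the paper or from cerberus. It assumes that the quoted fluence threshold is simply the flux threshold multiplied by the shortest pulse width (134 ms), and that the survey footprint is a plain longitude/latitude rectangle on the sphere. All function and constant names are illustrative.

```python
import math

# Values quoted in the abstract
FLUX_THRESHOLD_JY = 0.290          # ~290 mJy, 6-sigma flux threshold
MIN_WIDTH_MS = 134.0               # shortest searched pulse width
MAX_WIDTH_MS = 160.8e3             # longest searched pulse width (160.8 s)
B_MIN_DEG, B_MAX_DEG = -3.5, 5.7   # Galactic latitude range
L_MIN_DEG, L_MAX_DEG = 141.0, 225.0  # Galactic longitude range


def fluence_threshold_jy_ms(flux_jy: float, width_ms: float) -> float:
    """Fluence of a boxcar pulse: flux times width (assumed relation)."""
    return flux_jy * width_ms


def width_range_decades(min_ms: float, max_ms: float) -> float:
    """Number of orders of magnitude spanned by the pulse width range."""
    return math.log10(max_ms / min_ms)


def strip_area_sq_deg(l_min: float, l_max: float,
                      b_min: float, b_max: float) -> float:
    """Solid angle of a longitude/latitude rectangle, in square degrees.

    Integrating cos(b) over latitude gives sin(b_max) - sin(b_min);
    math.degrees converts that dimensionless factor back to degrees.
    """
    dl_deg = l_max - l_min
    db_eff_deg = math.degrees(
        math.sin(math.radians(b_max)) - math.sin(math.radians(b_min))
    )
    return dl_deg * db_eff_deg


fluence = fluence_threshold_jy_ms(FLUX_THRESHOLD_JY, MIN_WIDTH_MS)
decades = width_range_decades(MIN_WIDTH_MS, MAX_WIDTH_MS)
area = strip_area_sq_deg(L_MIN_DEG, L_MAX_DEG, B_MIN_DEG, B_MAX_DEG)

print(f"fluence threshold ~ {fluence:.1f} Jy ms")
print(f"width range ~ {decades:.2f} decades")
print(f"survey area ~ {area:.0f} square degrees")
```

Under these assumptions the results come out at roughly 39 Jy ms, about 3.1 decades, and about 772 square degrees, in line with the ~40 Jy ms, 3 orders of magnitude, and ~770 square degrees quoted in the abstract.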

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Automated software engineering has been greatly empowered by recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, most of their evaluations are limited to short, self-contained algorithmic tasks. Solving challenging and practical programming tasks requires the ability to use diverse function calls as tools to efficiently implement functionality such as data analysis and web development. In addition, using multiple tools to solve a task requires compositional reasoning based on an accurate understanding of complex instructions. Meeting both requirements can pose a great challenge for LLMs. To assess how well LLMs can solve challenging and practical programming tasks, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks. To evaluate LLMs rigorously, each programming task has an average of 5.6 test cases with an average branch coverage of 99%. In addition, we propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions containing only essential information. Our extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%. The results underscore the need for further advancements in this area.

BigCode
·
Jun 22, 2024
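
To make the benchmark description above more concrete, here is a hypothetical task in the spirit of what the abstract describes: a docstring-specified function that combines calls from several libraries, checked by a handful of test cases that exercise each branch. It is not taken from BigCodeBench. The function name `mean_by_group`, the CSV column names, and the tests are all invented for illustration, and only the Python standard library is used.

```python
import csv
import io
import statistics
import unittest
from collections import defaultdict


def mean_by_group(csv_text: str) -> dict:
    """Return the mean of the 'value' column for each 'group' in CSV text.

    Raises ValueError if either required column is missing.
    (Hypothetical task specification, written for illustration only.)
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = set(reader.fieldnames or [])
    if not {"group", "value"} <= fields:
        raise ValueError("expected 'group' and 'value' columns")

    # Collect values per group, then reduce each list to its mean.
    values = defaultdict(list)
    for row in reader:
        values[row["group"]].append(float(row["value"]))
    return {group: statistics.fmean(vals) for group, vals in values.items()}


class TestMeanByGroup(unittest.TestCase):
    """Test cases covering the normal path, the empty path, and the error path."""

    def test_means(self):
        text = "group,value\na,1\na,3\nb,2\n"
        self.assertEqual(mean_by_group(text), {"a": 2.0, "b": 2.0})

    def test_header_only(self):
        self.assertEqual(mean_by_group("group,value\n"), {})

    def test_missing_column(self):
        with self.assertRaises(ValueError):
            mean_by_group("group,score\na,1\n")
```

Saved as a module, the tests can be run with `python -m unittest`. A model's solution would be judged on whether every test passes, which is why high branch coverage in the tests matters: a solution that handles only the normal path would fail the error-path case.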

Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges

Surveillance videos are an essential component of daily life with various critical applications, particularly in public security. However, current surveillance video tasks mainly focus on classifying and localizing anomalous events. Although existing methods achieve considerable performance, they are limited to detecting and classifying predefined events and offer unsatisfactory semantic understanding. To address this issue, we propose a new research direction of surveillance video-and-language understanding and construct the first multimodal surveillance video dataset. We manually annotate the real-world surveillance dataset UCF-Crime with fine-grained event content and timing. Our newly annotated dataset, UCA (UCF-Crime Annotation), contains 23,542 sentences with an average length of 20 words, and its annotated videos total 110.7 hours. Furthermore, we benchmark SOTA models for four multimodal tasks on this newly created dataset, which serve as new baselines for surveillance video-and-language understanding. Through our experiments, we find that mainstream models used on previous publicly available datasets perform poorly on surveillance video, which demonstrates the new challenges in surveillance video-and-language understanding. To validate the effectiveness of UCA, we conducted experiments on multimodal anomaly detection. The results demonstrate that our multimodal surveillance learning can improve the performance of conventional anomaly detection tasks. All the experiments highlight the necessity of constructing this dataset to advance surveillance AI. The link to our dataset is provided at: https://xuange923.github.io/Surveillance-Video-Understanding.

  • 7 authors
·
Sep 25, 2023