Patrick H. Gaughan1, En Cheng2, Taylor Burgess2, and Aine C. Bolton2, 1School of Law, University of Akron, Akron, Ohio, USA, 2Department of Computer Science, University of Akron, Akron, Ohio, USA
This study uses Latent Dirichlet Allocation (“LDA”) to explore the dimensionality of the U.S. practice of law. The dataset came from a national directory of U.S. lawyers in private practice in 2000 and consisted of 1,058,788 practice areas contained within 437,210 individual lawyer profiles. The resulting coherence scores implied a preferred eight (8) topic solution. The resulting topic allocations also made it possible to systematically bin individual practice areas into discrete practice area distributions. Although the short document lengths and the uncontrolled semantic relations in the practice area lists raise open questions, these questions will be addressed in future research. In its current state, this study makes contributions to the existing literature in at least two areas: 1) it provides support for the existence of the hypothesized law practice dimensionality, and 2) it provides an empirical basis for developing an improved measurement of the U.S. practice of law.
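The coherence-driven choice of topic count described above can be sketched as follows. This is an illustrative sketch using scikit-learn with a hand-rolled UMass-style coherence over toy practice-area documents; it is not the authors' directory data or actual pipeline, and the `umass_coherence` helper is a simplified stand-in for a full coherence measure.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy "lawyer profiles": short practice-area lists joined into documents.
docs = [
    "litigation personal_injury insurance",
    "tax estate_planning trusts",
    "corporate mergers securities",
    "litigation insurance appeals",
    "tax trusts corporate",
    "corporate securities mergers",
]

X = CountVectorizer().fit_transform(docs)

def umass_coherence(components, X, top_n=3, eps=1e-12):
    """Simplified UMass coherence over the top-n words of each topic."""
    D = (X > 0).astype(int).toarray()   # document-term incidence matrix
    doc_freq = D.sum(axis=0)            # per-word document frequencies
    co_freq = D.T @ D                   # word co-occurrence document counts
    score = 0.0
    for topic in components:
        top = np.argsort(topic)[::-1][:top_n]
        for i in range(1, len(top)):
            for j in range(i):
                score += np.log((co_freq[top[i], top[j]] + 1) /
                                (doc_freq[top[j]] + eps))
    return score

# Fit candidate models and keep the topic count with the best coherence.
scores = {}
for k in (2, 3, 4):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X)
    scores[k] = umass_coherence(lda.components_, X)

best_k = max(scores, key=scores.get)
```

On the real corpus, the same loop would simply sweep a wider range of topic counts (the study's preferred solution being eight).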
Law Practice Measure, Practice Area, Practice Grouping, Practice Dimensionality, Latent Dirichlet Allocation.
Sinan Gultekin, Achille Globo, Andrea Zugarini, Marco Ernandes, and Leonardo Rigutini, Department of Hybrid Linguistic Technologies, expert.ai, Siena, Italy
Most Machine Learning research evaluates the best solutions in terms of performance. However, in the race for the best performing model, many important aspects are often overlooked when, on the contrary, they should be carefully considered. In fact, sometimes the gaps in performance between different approaches are negligible, whereas factors such as production costs, energy consumption, and carbon footprint must be taken into consideration. Large Language Models (LLMs) are extensively adopted to address NLP problems in academia and industry. In this work, we present a detailed quantitative comparison of LLM and traditional approaches (e.g. SVM) on the LexGLUE benchmark, which takes into account both performance (standard indices) and alternative metrics such as timing, power consumption, and cost — in a word, the carbon footprint. In our analysis, we considered the prototyping phase (model selection through training-validation-test iterations) and the in-production phase separately, since they follow different implementation procedures and also require different resources. The results indicate that very often the simplest algorithms achieve performance very close to that of large LLMs, but with very low power consumption and lower resource demands. The results obtained suggest that companies should include additional evaluations in the choice of Machine Learning (ML) solutions.
NLP, text mining, green AI, green NLP, carbon footprint, energy consumption, evaluation.
Ilias Boulbarj1, Bouklouze Abdelaziz1, Yousra El Alami2, Douzi Samira3,4, and Douzi Hassan1, 1IRF-SIC Laboratory, Ibn Zohr University, Agadir, Morocco, 2Laboratory of Pharmacology and Toxicology, Faculty of Medicine and Pharmacy, Mohammed V University, Morocco, 3Faculty of Sciences, IPSS Laboratory, Mohammed V University, Rabat, Morocco, 4Faculty of Medicine and Pharmacy, Mohammed V University, Rabat, Morocco
Honey, an organic and highly esteemed dietary substance, is vulnerable to adulteration, presenting significant implications for both public health and economic welfare. The conventional techniques employed for the identification of honey adulteration are time-consuming and frequently exhibit limited sensitivity. This paper introduces a novel methodology to tackle this problem by utilizing Convolutional Neural Networks (CNNs) to classify honey samples using thermal images. Thermal imaging technology possesses a distinct edge in the identification of adulterants, as it can reveal temperature discrepancies within honey samples resulting from variances in sugar composition, moisture levels, and other adulterating agents. To develop a reliable and precise approach for categorizing honey, we gathered a comprehensive dataset of thermal images of genuine and contaminated honey samples. Multiple CNN architectures were trained and fine-tuned using this dataset. The findings of our study showcase the capacity of thermal image analysis in conjunction with CNNs as an effective instrument for promptly and accurately identifying instances of honey adulteration. The methodology presents a potentially advantageous pathway for implementing quality control measures within the honey industry, thereby guaranteeing the authenticity and safety of this valuable organic commodity.
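As an illustration of the kind of CNN classifier the abstract describes, the following is a minimal PyTorch sketch. The architecture, the 64×64 single-channel input size, and the `ThermalHoneyCNN` name are illustrative assumptions, not the authors' fine-tuned models or their thermal-image dataset.

```python
import torch
import torch.nn as nn

# Hypothetical binary classifier: genuine vs. adulterated honey,
# operating on single-channel (thermal) images.
class ThermalHoneyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 32x32 -> 16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 2),            # assumes 64x64 inputs
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ThermalHoneyCNN()
batch = torch.randn(4, 1, 64, 64)  # four fake thermal images
logits = model(batch)              # shape (4, 2): two-class logits
```

Training would then proceed with a standard cross-entropy loss over labeled genuine/adulterated samples.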
Honey adulteration, Inception Net, CNN, Thermal imaging, Quality control, human health.
Theo Zangato, Osmani, Sorbonne Paris Nord University - LIPN-UMR CNRS 7030, France
Amidst increasing energy demands and growing environmental concerns, the promotion of sustainable and energy-efficient practices has become imperative. This paper introduces a reinforcement learning-based technique for optimizing energy consumption and its associated costs, with a focus on energy management systems. A three-step approach for the efficient management of charging cycles in energy storage units within buildings is presented, combining RL with prior knowledge. A unique strategy is adopted: clustering building load curves to discern typical energy consumption patterns, embedding domain knowledge into the learning algorithm to refine the agent’s action space, and predicting future observations to make real-time decisions. We showcase the effectiveness of our method using real-world data. It enables controlled exploration and efficient training of Energy Management System (EMS) agents. Compared to the benchmark, our model reduces energy costs by up to 15%, cutting down consumption during peak periods and demonstrating adaptability across various building consumption profiles.
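The first step of the strategy, clustering building load curves into typical consumption profiles, can be sketched with k-means. The synthetic morning-peak and evening-peak curves below are illustrative stand-ins, not the paper's real-world data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic daily load curves (24 hourly readings per building):
# one group peaks in the morning, the other in the evening.
hours = np.arange(24)
morning = 1.0 + np.exp(-(hours - 8) ** 2 / 8.0)
evening = 1.0 + np.exp(-(hours - 19) ** 2 / 8.0)
curves = np.vstack(
    [morning + 0.05 * rng.standard_normal(24) for _ in range(10)] +
    [evening + 0.05 * rng.standard_normal(24) for _ in range(10)]
)

# Cluster load curves into typical profiles; buildings sharing a
# pattern share a cluster, so an EMS agent's action space can be
# refined per profile before any RL training begins.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(curves)
labels = km.labels_
```

With real data, the cluster centroids would serve as the "typical energy consumption patterns" fed into the later RL steps.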
Reinforcement Learning, Energy Management Systems, Time-Series, Clustering
Ram Sivaraman1, Joe Xiao2, 1Liberal Arts and Science Academy, Austin, Texas, USA, 2Optum/UnitedHealthCare, Minneapolis, Minnesota, USA
An electrocardiogram (ECG) is a common method used for the diagnosis of heart diseases. However, an ECG alone is not sufficient to detect heart abnormalities early. Heart sound monitoring, or phonocardiography (PCG), is a non-invasive assessment that can be performed during routine exams. PCG can provide valuable details for heart disorder diagnosis as well as perioperative cardiac monitoring. Further, heart murmurs are abnormal signals generated by turbulent blood flow in the heart and are closely associated with specific heart diseases. This paper presents a new machine learning-based evaluation of heart sounds for murmur detection with high accuracy. A random forest classifier is built using the statistical moments of the coefficients extracted from the heart sounds. The classifier can predict the location of the heart sounds with over 90% accuracy. The random forest classifier has a murmur detection accuracy of over 70% on the test dataset and over 98% on the full dataset.
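The pipeline of statistical moments feeding a random forest can be sketched as follows, assuming scikit-learn and SciPy. The synthetic signals below stand in for the actual heart-sound coefficient sequences, and `moment_features` is an illustrative feature extractor, not the authors' exact one.

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

def moment_features(signal):
    """Statistical moments used as classifier inputs."""
    return [signal.mean(), signal.std(), skew(signal), kurtosis(signal)]

# Synthetic stand-ins for heart-sound coefficient sequences:
# "murmur" recordings get extra high-variance turbulence noise.
normal = [rng.standard_normal(200) for _ in range(40)]
murmur = [rng.standard_normal(200) * 3.0 for _ in range(40)]

X = np.array([moment_features(s) for s in normal + murmur])
y = np.array([0] * 40 + [1] * 40)   # 0 = normal, 1 = murmur

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```

In practice the features would come from coefficients of real PCG recordings, with a held-out test split for the reported accuracies.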
Random Forest Network, Phonocardiogram, Heart Murmur, Sound Features
Xabier Echeberria-Barrio, Mikel Gorricho, Selene Valencia, and Francesco Zola, Vicomtech Foundation, Basque Research and Technology Alliance (BRTA); Paseo Mikeletegi, 57, Donostia 20009, Spain
The usage of Artificial Intelligence (AI) systems has increased exponentially, thanks to their ability to reduce the amount of data to be analyzed and the user effort while preserving a high rate of accuracy. However, introducing this new element in the loop has turned these systems into attack points that can compromise their reliability. This new scenario has raised crucial challenges regarding the reliability and trustworthiness of AI models, as well as the uncertainties in their response decisions, which become even more crucial when the models are applied in critical domains such as healthcare, chemical and electrical plants, etc. To contain these issues, in this paper we present NeuralSentinel (NS), a tool able to validate the reliability and trustworthiness of AI models. This tool combines attack and defence strategies and explainability concepts to stress an AI model and help non-expert staff increase their confidence in this new system by understanding the model decisions. NS provides a simple and easy-to-use interface that helps humans in the loop deal with all the needed information. This tool was deployed and used in a Hackathon event to evaluate the reliability of a skin cancer image detector. During the event, experts and non-experts attacked and defended the detector, learning which factors were the most important for model misclassification and which techniques were the most efficient. The event was also used to detect NS’s limitations and gather feedback for further improvements.
Adversarial Attack, Defence Strategy, Trustworthiness AI, Explainability, Human-AI Teaming.
Ricardo de Deijn, Aishwarya Batra, Brandon Koch, Hema Makkena and Naseef Mansoor, Department of Computer Information Science, Minnesota State University, Mankato, United States of America
The growth of generative models has expanded the capabilities of image processing and provides numerous industries with the technology to produce realistic image transformations. However, as the field is newly established, new evaluation metrics are needed to further this research. Previous research has shown the Fréchet Inception Distance (FID) to be an effective metric when testing image-to-image GANs in real-world applications. The Signed Inception Distance (SID), a metric introduced in 2023, expands on FID by allowing signed distances. This paper uses data consisting of building façades, cityscapes, and maps within Pix2Pix and CycleGAN models. After training these models, we evaluate them on both the FID and SID metrics. FID is a standalone metric in image-to-image GAN evaluation and is commonly used to assess GAN performance. Our findings indicate that SID provides a new efficient and effective metric that complements, or even exceeds, the ability shown by FID for translation GANs.
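The Fréchet distance underlying FID can be computed directly from the means and covariances of two feature sets. The sketch below uses random vectors in place of real Inception activations, so the numbers are illustrative only.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between two sets of (Inception-style) features."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):       # drop tiny numerical imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean)

rng = np.random.default_rng(0)
real = rng.standard_normal((500, 8))                   # fake Inception features
close = real + 0.1 * rng.standard_normal((500, 8))     # near-identical set
far = rng.standard_normal((500, 8)) + 3.0              # shifted distribution
```

As expected for a distribution distance, `frechet_distance(real, close)` comes out far smaller than `frechet_distance(real, far)`; in FID the features would come from a pretrained Inception network rather than random draws.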
Signed Inception Distance, Fréchet Inception Distance, Generative Adversarial Networks, Supervised Image-to-Image Translation.
Wouter Knibbe, Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
Games and agents are the two main components of Reinforcement Learning (RL). While the literature on agents is extensively detailed, little knowledge is shared about how the games they play should be designed. Through a data-science-oriented, integrative literature review, a body of RL literature is analyzed. It was found that no papers reference a method for RL environment design, game design, or a method that serves a similar purpose. In an attempt to create such a method, an evidence-based guide is synthesized which summarizes the best practices in RL environment design. The results show a methodological difference between RL environments in general and those that allow researchers to study and solve real-world systems. We define the latter as AI games with a purpose, or AI games for short. While the results of this study can inform design decisions for future AI games, they are also based on a body of literature that is fundamentally lacking and largely unreproducible. This study should be seen as a zero measurement from which the literature can start to catch up with advancements in the field and develop its methodology of modern RL environment design.
Artificial Intelligence, Reinforcement Learning, Methodology, Literature Review, Games, Real-world AI.