Research Article | Peer-Reviewed

Evaluation of Task Specific Productivity Improvements Using a Generative Artificial Intelligence Personal Assistant Tool

Received: 29 October 2024     Accepted: 14 November 2024     Published: 18 December 2024
Abstract

This study evaluates the productivity improvements achieved with a generative artificial intelligence personal assistant tool (PAT) developed by Trane Technologies. The PAT, based on OpenAI’s GPT-3.5 model, was deployed on Microsoft Azure to ensure secure access and protect intellectual property. To assess the tool’s effect on productivity, an experiment compared completion times and content quality for four common office tasks: writing an email, summarizing an article, creating instructions for a simple task, and preparing a presentation outline. Sixty-three participants were randomly divided into a test group using the PAT and a control group performing the tasks manually. Results indicated significant productivity enhancements, particularly for summarization and instruction creation, with improvements ranging from 3.3% to 69%: writing email content improved by 3.3%, summarizing text by 69%, creating instructions by 45.9%, and preparing an outline by 24.8%. The study further analyzed factors such as user age, response word count, and response quality, revealing that PAT users generated more verbose and higher-quality content. An “LLM-as-a-judge” method employing GPT-4 was used to grade response quality and effectively distinguished between high- and low-quality outputs. The findings underscore the potential of PATs to enhance workplace productivity and highlight areas for further research and optimization.
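
As a rough illustration of how a per-task improvement figure like those above could be derived, the sketch below computes the relative reduction in mean completion time between a control group and a test group. The formula is an assumption, since the abstract does not state how the percentages were calculated, and the function name `percent_improvement` and the sample times are hypothetical, not data from the study.

```python
from statistics import mean


def percent_improvement(control_times, test_times):
    """Relative reduction in mean completion time, in percent.

    Assumption: improvement = (control - test) / control * 100.
    The abstract does not state the exact formula used in the study.
    """
    control_mean = mean(control_times)
    test_mean = mean(test_times)
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return (control_mean - test_mean) / control_mean * 100.0


# Hypothetical completion times in minutes (not data from the study).
control = [10.0, 12.0, 8.0]
test = [5.0, 6.0, 4.0]
improvement = percent_improvement(control, test)
print(f"{improvement:.1f}% faster")
```

A positive value means the test group finished faster on average; a negative value would mean the tool slowed the task down. Medians could be substituted for means if the timing data contain outliers.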

Published in American Journal of Artificial Intelligence (Volume 8, Issue 2)
DOI 10.11648/j.ajai.20240802.16
Page(s) 68-80
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Generative AI, Productivity, Employee Assistant, Task Analysis, AI Integration

Cite This Article
  • APA Style

    Freeman, B. S., Arriola, K., Cottell, D., Lawlor, E., Erdman, M., et al. (2024). Evaluation of Task Specific Productivity Improvements Using a Generative Artificial Intelligence Personal Assistant Tool. American Journal of Artificial Intelligence, 8(2), 68-80. https://doi.org/10.11648/j.ajai.20240802.16

    ACS Style

    Freeman, B. S.; Arriola, K.; Cottell, D.; Lawlor, E.; Erdman, M., et al. Evaluation of Task Specific Productivity Improvements Using a Generative Artificial Intelligence Personal Assistant Tool. Am. J. Artif. Intell. 2024, 8(2), 68-80. doi: 10.11648/j.ajai.20240802.16

    AMA Style

    Freeman BS, Arriola K, Cottell D, Lawlor E, Erdman M, et al. Evaluation of Task Specific Productivity Improvements Using a Generative Artificial Intelligence Personal Assistant Tool. Am J Artif Intell. 2024;8(2):68-80. doi: 10.11648/j.ajai.20240802.16

  • @article{10.11648/j.ajai.20240802.16,
      author = {Brian S. Freeman and Kendall Arriola and Dan Cottell and Emmett Lawlor and Matt Erdman and Trevor Sutherland and Brian Wells},
      title = {Evaluation of Task Specific Productivity Improvements Using a Generative Artificial Intelligence Personal Assistant Tool},
      journal = {American Journal of Artificial Intelligence},
      volume = {8},
      number = {2},
      pages = {68-80},
      doi = {10.11648/j.ajai.20240802.16},
      url = {https://doi.org/10.11648/j.ajai.20240802.16},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajai.20240802.16},
      abstract = {This study evaluates the productivity improvements achieved using a generative artificial intelligence personal assistant tool (PAT) developed by Trane Technologies. The PAT, based on OpenAI’s GPT 3.5 model, was deployed on Microsoft Azure to ensure secure access and protection of intellectual property. To assess the tool’s productivity effectiveness, an experiment was conducted comparing the completion times and content quality of four common office tasks: writing an email, summarizing an article, creating instructions for a simple task, and preparing a presentation outline. Sixty-three (63) participants were randomly divided into a test group using the PAT and a control group performing the tasks manually. Results indicated significant productivity enhancements, particularly for tasks involving summarization and instruction creation, with improvements ranging from 3.3% to 69%. The study further analyzed factors such as the age of users, response word counts, and quality of responses, revealing that the PAT users generated more verbose and higher-quality content. Writing email content improved by 3.3%, summarizing text improved by 69%, creating instructions improved by 45.9%, and preparing an outline improved by 24.8%. An ’LLM-as-a-judge’ method employing GPT-4 was used to grade the quality of responses, which effectively distinguished between high and low-quality outputs. The findings underscore the potential of PATs in enhancing workplace productivity and highlight areas for further research and optimization.},
      year = {2024}
    }
    

  • TY  - JOUR
    T1  - Evaluation of Task Specific Productivity Improvements Using a Generative Artificial Intelligence Personal Assistant Tool
    AU  - Brian S. Freeman
    AU  - Kendall Arriola
    AU  - Dan Cottell
    AU  - Emmett Lawlor
    AU  - Matt Erdman
    AU  - Trevor Sutherland
    AU  - Brian Wells
    Y1  - 2024/12/18
    PY  - 2024
    N1  - https://doi.org/10.11648/j.ajai.20240802.16
    DO  - 10.11648/j.ajai.20240802.16
    T2  - American Journal of Artificial Intelligence
    JF  - American Journal of Artificial Intelligence
    JO  - American Journal of Artificial Intelligence
    SP  - 68
    EP  - 80
    PB  - Science Publishing Group
    SN  - 2639-9733
    UR  - https://doi.org/10.11648/j.ajai.20240802.16
    AB  - This study evaluates the productivity improvements achieved using a generative artificial intelligence personal assistant tool (PAT) developed by Trane Technologies. The PAT, based on OpenAI’s GPT 3.5 model, was deployed on Microsoft Azure to ensure secure access and protection of intellectual property. To assess the tool’s productivity effectiveness, an experiment was conducted comparing the completion times and content quality of four common office tasks: writing an email, summarizing an article, creating instructions for a simple task, and preparing a presentation outline. Sixty-three (63) participants were randomly divided into a test group using the PAT and a control group performing the tasks manually. Results indicated significant productivity enhancements, particularly for tasks involving summarization and instruction creation, with improvements ranging from 3.3% to 69%. The study further analyzed factors such as the age of users, response word counts, and quality of responses, revealing that the PAT users generated more verbose and higher-quality content. Writing email content improved by 3.3%, summarizing text improved by 69%, creating instructions improved by 45.9%, and preparing an outline improved by 24.8%. An ’LLM-as-a-judge’ method employing GPT-4 was used to grade the quality of responses, which effectively distinguished between high and low-quality outputs. The findings underscore the potential of PATs in enhancing workplace productivity and highlight areas for further research and optimization.
    VL  - 8
    IS  - 2
    ER  - 

Author Information
  • Advanced Technologies, Trane Technologies, Davidson NC, USA

  • Data & Analytics, Trane Technologies, Davidson NC, USA

  • Data & Analytics, Trane Technologies, Davidson NC, USA

  • Engineering, Trane Technologies, Galway, Ireland

  • Data & Analytics, Trane Technologies, Davidson NC, USA

  • AI Foundry, Trane Technologies, Davidson NC, USA

  • AI Foundry, Trane Technologies, Davidson NC, USA
