OmniFill: Domain-Agnostic Form Filling Suggestions Using Multi-Faceted Context arXiv Preprint, October 2023
the task involves transformation operations, additional prompts to
a lighter-weight LLM (or a semantic search using an embedding
model) could be used to handle many simpler cases. OmniFill’s full
LLM prompt would still be responsible for producing the actual suggestions,
since the full bag of context may be valuable for making
high-quality predictions (e.g., if the form structure is poorly labeled
but can be learned through prior form-filling examples), but post
hoc “attribution” may be achieved through an auxiliary system.
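As a rough illustration of such an auxiliary attribution step, the following minimal sketch matches a suggested field value against named context snippets and reports the closest one. It is not OmniFill's implementation: the function names, the snippet dictionary, and the bag-of-words cosine similarity (a crude stand-in for an embedding model) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def _bag_of_words(text):
    """Lowercased token counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def attribute_suggestion(suggestion, context_snippets):
    """Return (source_name, score) for the snippet most similar to `suggestion`.

    `context_snippets` is a hypothetical mapping from a context source name
    to its text. Returns (None, 0.0) when no snippet shares any token.
    """
    query = _bag_of_words(suggestion)
    best_name, best_score = None, 0.0
    for name, text in context_snippets.items():
        score = _cosine(query, _bag_of_words(text))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical context sources, invented for this example.
snippets = {
    "clipboard": "Order placed by Jane Doe on 2023-09-14",
    "browsing": "Shipping policy and returns information",
}
source, score = attribute_suggestion("Jane Doe", snippets)
```

In this sketch the main prompt still produces the suggestion; the similarity lookup only indicates, after the fact, which context source the value most plausibly came from. A suggestion that involves a transformation (for example, a reformatted date) may share no tokens with its source, which is where a lighter-weight LLM prompt could take over.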
Although in situ training of the system can be a low-friction
way to offer predictive suggestions, systems should provide users
with the ability to view and refine their task specifications, e.g., by
curating their set of examples to maximize system accuracy. Future
work could assist users in this process by detecting and surfacing
potentially anomalous prior examples or by engaging the user in a
dialogue to define and fine-tune task specifications as the system is
used over time.
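One simple way to surface potentially anomalous prior examples is to compare the coarse format of each example's value for a given field against the dominant format. The sketch below is only an assumption about how such a check might look, not a method described in this paper; the function names, the `min_share` threshold, and the sample data are hypothetical.

```python
import re
from collections import Counter


def value_shape(value):
    """Collapse a value to a coarse shape: digit runs -> '9', letter runs -> 'a'."""
    shape = re.sub(r"[0-9]+", "9", value)
    shape = re.sub(r"[A-Za-z]+", "a", shape)
    return shape


def flag_anomalous_examples(examples, field, min_share=0.5):
    """Return indices of examples whose `field` shape differs from the dominant shape.

    Nothing is flagged unless one shape covers more than `min_share`
    of the examples, so heterogeneous fields are left alone.
    """
    shapes = [value_shape(ex.get(field, "")) for ex in examples]
    if not shapes:
        return []
    dominant, count = Counter(shapes).most_common(1)[0]
    if count / len(shapes) <= min_share:
        return []
    return [i for i, s in enumerate(shapes) if s != dominant]


# Hypothetical prior form-filling examples, invented for this example.
prior_examples = [
    {"date": "2023-09-14"},
    {"date": "2023-09-15"},
    {"date": "Sept 16, 2023"},
    {"date": "2023-10-01"},
]
flagged = flag_anomalous_examples(prior_examples, "date")
```

Flagged examples would be shown to the user for review rather than removed automatically, since an unusual format may be intentional.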
9 CONCLUSION
Not every task calls for full automation or an elaborate specification.
Even when task definitions are fuzzy, partial automation of
the simpler tedious components of form filling tasks can prove
valuable, and LLM-backed systems like OmniFill can serve as a
“glue” between arbitrary context sources and target forms without
heavy configuration. We demonstrate opportunities for LLM-backed
systems to assist in a unique subspace of form filling tasks, then
describe our observations of users trying the prototype. We believe
this is a rich space for future system designers to explore, but care
must be taken to understand how people perceive and use such
systems, especially in a landscape of rapidly expanding capabilities
and expectations for artificial intelligence tools.
DISCLOSURE
The authors used GitHub Copilot v1.111.404 for code prediction in
the preparation of figure source code.
ACKNOWLEDGMENTS
We would like to thank Shm Garanganao Almeda, James Smith,
and Matthew Beaudouin-Lafon for their valuable insights that
contributed to the framing of this work.
REFERENCES
[1]
2023. Admidio – Free online membership management software.
https://www.admidio.org/. Accessed: 2023-09-14.
[2]
2023. EspoCRM.com: Free Self Hosted & Cloud CRM software.
https://www.espocrm.com/. Accessed: 2023-09-14.
[3]
Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes,
Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman,
Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan,
Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil Joshi, Ryan
Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao
Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao,
Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan,
Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu,
Mengyuan Yan, and Andy Zeng. 2022. Do As I Can and Not As I Say: Grounding
Language in Robotic Affordances. In arXiv preprint arXiv:2204.01691.
[4]
Kenneth C Arnold, Krysta Chauncey, and Krzysztof Z Gajos. 2020. Predictive text
encourages predictable writing. In Proceedings of the 25th International Conference
on Intelligent User Interfaces. 128–138.
[5]
Shaon Barman, Sarah Chasins, Rastislav Bodik, and Sumit Gulwani. 2016. Ringer:
Web Automation by Demonstration. In Proceedings of the 2016 ACM SIGPLAN
International Conference on Object-Oriented Programming, Systems, Languages, and
Applications (Amsterdam, Netherlands) (OOPSLA 2016). Association for Computing
Machinery, New York, NY, USA, 748–764. https://doi.org/10.1145/2983990.2984020
[6]
Holger Bast and Ingmar Weber. 2006. Type Less, Find More: Fast Autocompletion
Search with a Succinct Index. In Proceedings of the 29th Annual International ACM
SIGIR Conference on Research and Development in Information Retrieval (Seattle,
Washington, USA) (SIGIR ’06). Association for Computing Machinery, New York,
NY, USA, 364–371. https://doi.org/10.1145/1148170.1148234
[7]
Hichem Belgacem, Xiaochen Li, Domenico Bianculli, and Lionel Briand. 2023.
A Machine Learning Approach for Automated Filling of Categorical Fields in
Data Entry Forms. ACM Trans. Softw. Eng. Methodol. 32, 2, Article 47 (apr 2023),
40 pages. https://doi.org/10.1145/3533021
[8]
Eric A Bier, Edward W Ishak, and Ed Chi. 2006. Entity quick click: rapid text
copying based on automatic entity extraction. In CHI’06 Extended Abstracts on
Human Factors in Computing Systems. 562–567.
[9]
Vishwanath Bijalwan, Vinay Kumar, Pinki Kumari, and Jordan Pascual. 2014. KNN
based machine learning approach for text and document mining. International
Journal of Database Theory and Application 7, 1 (2014), 61–70.
[10]
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora,
Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma
Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon,
Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky,
Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John
Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren
Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori
Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu,
Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth
Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark
Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina
Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu
Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele
Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman,
Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut,
Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance,
Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong,
Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori
Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan
Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang,
Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan
You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui
Zhang, Lucia Zheng, Kaitlyn Zhou, and Percy Liang. 2022. On the Opportunities
and Risks of Foundation Models. arXiv:2108.07258 [cs.LG]
[11]
Raymond R. Bond, Tomas Novotny, Irena Andrsova, Lumir Koc, Martina Sisakova,
Dewar Finlay, Daniel Guldenring, James McLaughlin, Aaron Peace, Victoria
McGilligan, Stephen J. Leslie, Hui Wang, and Marek Malik. 2018. Automation
bias in medicine: The influence of automated diagnoses on interpreter accuracy
and uncertainty when reading electrocardiograms. Journal of Electrocardiology 51,
6, Supplement (2018), S6–S11. https://doi.org/10.1016/j.jelectrocard.2018.08.007
[12]
Vinayak R. Borkar, Kaustubh Deshmukh, and Sunita Sarawagi. 2000. Automatically
extracting structure from free text addresses. IEEE Data Eng. Bull. 23, 4
(2000), 27–32.
[13]
José Cambronero, Sumit Gulwani, Vu Le, Daniel Perelman, Arjun Radhakrishna,
Clint Simon, and Ashish Tiwari. 2023. FlashFill++: Scaling Programming by Example
by Cutting to the Chase. In Principles of Programming Languages. ACM SIGPLAN,
ACM. https://www.microsoft.com/en-us/research/publication/flashfill-scaling-programming-by-example-by-cutting-to-the-chase/
[14]
Sarah E. Chasins, Maria Mueller, and Rastislav Bodik. 2018. Rousillon: Scraping
Distributed Hierarchical Web Data. In Proceedings of the 31st Annual ACM
Symposium on User Interface Software and Technology (Berlin, Germany) (UIST
’18). Association for Computing Machinery, New York, NY, USA, 963–975.
https://doi.org/10.1145/3242587.3242661
[15]
John Joon Young Chung, Wooseok Kim, Kang Min Yoo, Hwaran Lee, Eytan Adar,
and Minsuk Chang. 2022. TaleBrush: Visual Sketching of Story Generation with
Pretrained Language Models. In Extended Abstracts of the 2022 CHI Conference
on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI EA ’22).
Association for Computing Machinery, New York, NY, USA, Article 172, 4 pages.
https://doi.org/10.1145/3491101.3519873
[16]
Morgan Dixon and James Fogarty. 2010. Prefab: implementing advanced behaviors
using pixel-based reverse engineering of interface structure. In Proceedings
of the SIGCHI Conference on Human Factors in Computing Systems. 1525–1534.
[17]
Nouha Dziri, Sivan Milton, Mo Yu, Osmar Zaiane, and Siva Reddy. 2022. On
the Origin of Hallucinations in Conversational Models: Is it the Datasets or the
Models?. In Proceedings of the 2022 Conference of the North American Chapter
of the Association for Computational Linguistics: Human Language Technologies.
Association for Computational Linguistics, Seattle, United States, 5271–5285.
https://doi.org/10.18653/v1/2022.naacl-main.387
[18]
Steven M. Goodman, Erin Buehler, Patrick Clary, Andy Coenen, Aaron Donsbach,
Tiffanie N. Horne, Michal Lahav, Robert MacDonald, Rain Breaw Michaels, Ajit