Training Data for Chatbots

Case Study – Creation of IT Support Questions in Text Format to Train an Artificial IT Service Desk Agent.

Thousands of Clickworkers formulate possible IT support inquiries based on given IT user problem cases. This produces a large number of query formulations that show how real users might communicate in an IT support chat. These text samples can be used to optimize a chatbot for deployment as an artificial IT service desk agent and to considerably increase its recognition rate. For more on AI data collection processes and their role in training chatbots, see our AI glossary.

The Challenge

As a rule, chatbots draw on predefined knowledge databases in which answers to a wide range of questions are stored. The principal challenge in programming a chatbot is to recognize users' questions correctly, map them to the right entry in the database, and return the correct answer, or to ask a relevant follow-up question if required. Beyond text data, audio data collection can extend a chatbot's ability to process and respond to spoken commands and queries, broadening its range of applications. Chatbots learn from each new inquiry: the more requests a chatbot has processed, the better trained it is.
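The recognition step described above (match the question to a database entry, answer it, or fall back to a follow-up question) can be illustrated with a toy matcher. This is a minimal sketch, not how any particular chatbot works: it assumes a hypothetical knowledge base keyed by intent, where each intent has a few example phrasings and one canned answer, and it scores matches by simple word overlap (Jaccard similarity) instead of a trained classifier. The intent names, answers, and threshold are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical knowledge base: intent -> (example phrasings, canned answer).
KNOWLEDGE_BASE = {
    "login_failed": (
        ["I can't log in.", "A login is not possible.", "My colleague can't log in."],
        "Please reset your password via the self-service portal.",
    ),
    "account_missing": (
        ["No user account exists.", "No user account found."],
        "I will forward your request to create an account.",
    ),
}

FOLLOW_UP = "Could you describe the problem in more detail?"


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets (0.0 = disjoint, 1.0 = identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def answer(query: str, threshold: float = 0.3) -> tuple[Optional[str], str]:
    """Return (matched intent, reply); the intent is None when a follow-up is needed."""
    query_tokens = tokenize(query)
    best_intent, best_score = None, 0.0
    for intent, (examples, _) in KNOWLEDGE_BASE.items():
        for example in examples:
            score = jaccard(query_tokens, tokenize(example))
            if score > best_score:
                best_intent, best_score = intent, score
    if best_intent is None or best_score < threshold:
        return None, FOLLOW_UP  # not recognized: ask instead of guessing
    return best_intent, KNOWLEDGE_BASE[best_intent][1]


print(answer("I cannot log in"))
# ('login_failed', 'Please reset your password via the self-service portal.')
```

Every additional phrasing stored under an intent gives the matcher another chance to recognize a differently worded question, which is why varied query formulations matter.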

The knowledge database is continually expanded, and the bot's detection patterns are refined. For handling other kinds of data input, especially images, image transcription offers further insight into improving a chatbot's recognition capabilities.

Depending on the chatbot's field of application, thousands of inquiries in a specific subject area may be required to make it ready for use. A large number of additional queries is then needed to optimize the bot, with the goal of a recognition rate approaching 100%. For an in-depth example of how voice commands and speech datasets enhance chatbot interactions, see our detailed case study.
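The case study does not define "recognition rate". One common reading, assumed here, is the share of test inquiries that the bot maps to the correct intent. A minimal sketch of that metric, where the predictions can come from any classifier and a prediction of None stands for "not recognized" (the bot fell back to a follow-up question):

```python
from typing import Optional, Sequence


def recognition_rate(predicted: Sequence[Optional[str]], expected: Sequence[str]) -> float:
    """Share of test inquiries whose predicted intent equals the expected intent.

    A None prediction never equals a (string) expected intent, so it counts
    as a miss.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("need at least one test inquiry")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


rate = recognition_rate(
    ["login_failed", None, "account_missing", "login_failed"],
    ["login_failed", "login_failed", "account_missing", "account_missing"],
)
print(rate)  # 0.5
```

Approaching 100% under this definition means almost every held-out inquiry, however it is worded, lands on the right intent.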

The Solution

Our Clickworkers have reformulated 500 existing IT support queries in seven languages, creating multiple new variations of how IT users might communicate with a support chatbot. Each predefined question is restated in three versions from different perspectives (neutral, he, she) for languages that differentiate noun genders, or in two versions for languages that don't.

Example:

Preset sentence: “I can’t log in.”
Neutral – restated: “A login is not possible.”
He/She – restated: “My colleague can’t log in.”,
“My boss cannot log in.” or
“Our new employee can’t log in.”
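For training, each restatement can be stored as a separate example labeled with the intent of the preset sentence it came from, so the model learns that all variants mean the same thing. A minimal sketch, assuming a hypothetical record layout (the class name, field names, and the intent ID "login_failed" are illustrative, not part of the project):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingExample:
    """One utterance labeled with the intent of its preset sentence (hypothetical schema)."""
    intent: str       # identifier of the predefined IT support query
    language: str     # language code as used in the project data, e.g. "EN"
    perspective: str  # "preset", "neutral", or "he/she"
    text: str


def expand(intent: str, language: str, preset: str,
           variants: dict[str, list[str]]) -> list[TrainingExample]:
    """Turn a preset sentence and its restatements into labeled examples."""
    examples = [TrainingExample(intent, language, "preset", preset)]
    for perspective, texts in variants.items():
        examples.extend(TrainingExample(intent, language, perspective, t) for t in texts)
    return examples


examples = expand(
    intent="login_failed",
    language="EN",
    preset="I can't log in.",
    variants={
        "neutral": ["A login is not possible."],
        "he/she": ["My colleague can't log in.", "My boss cannot log in.",
                   "Our new employee can't log in."],
    },
)
print(len(examples))  # 5
```

Keeping the perspective as a field makes it easy to check later that every preset sentence received the expected number of versions per language.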

In an additional job type, Clickworkers formulate completely new queries for a fictitious IT support desk, similar to audio annotation tasks, which likewise serve to improve how chatbots understand and respond to requests.

Example:

The user receives one of the following error messages:
1) “No user account exists.” or
2) “No user account found.” (etc.)

… along with sample sentences that echo inquiries from company employees to an IT support unit. Per language and issue, 20 Clickworkers each formulate six first-person questions describing the given problem, phrased the way they would ask them in an IT support chat.
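The assignment rule above (per language and issue, 20 Clickworkers, six questions each) defines a simple task grid. A minimal sketch of how such a grid could be generated, assuming one task per combination of language, issue, and worker slot; the issue IDs and dictionary keys are hypothetical:

```python
from itertools import product

LANGUAGES = ["DE", "ES", "EN", "NL", "NO", "SE", "DA"]  # as listed in the project data
WORKERS_PER_ISSUE = 20  # Clickworkers per language and issue
QUESTIONS_PER_TASK = 6  # first-person questions each Clickworker writes


def build_tasks(issue_ids: list[str]) -> list[dict]:
    """Create one task per (language, issue, worker slot) combination."""
    return [
        {
            "language": language,
            "issue": issue_id,
            "worker_slot": slot,
            "questions_required": QUESTIONS_PER_TASK,
        }
        for language, issue_id, slot in product(LANGUAGES, issue_ids, range(WORKERS_PER_ISSUE))
    ]


tasks = build_tasks(["issue_01", "issue_02"])  # hypothetical issue IDs
print(len(tasks))                       # 280 = 7 languages x 2 issues x 20 slots
print(len(tasks) * QUESTIONS_PER_TASK)  # 1680 questions in total
```

Giving each worker slot its own task means the 20 formulations per issue come from different people, which is what produces the variety of phrasings.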

With the help of these numerous query formulations, the manufacturer trains the chatbot specifically for use as an IT service desk agent and considerably increases its recognition rate and quality. Similarly, diverse image datasets are essential for training AI to recognize and process visual information accurately.

More about our service “Training data for AI systems”

Project Data

Job “Reformulation”

  • Number of tasks: 3,500
  • Number of reformulated sentences: 10,500
  • Languages: DE, ES, EN, NL, NO, SE, DA
  • Data transfer method: via API

Job “New Formulation”

  • Number of tasks: 7,000
  • Number of newly formulated sentences: 42,000
  • Languages: DE, ES, EN, NL, NO, SE, DA
  • Data transfer method: via API
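The project figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from this case study: 500 queries in 7 languages give 3,500 reformulation tasks, and the reported 10,500 sentences correspond to three versions per task; 7,000 new-formulation tasks at six questions each give 42,000 sentences. The implied number of issues is an inference (assuming each task is one Clickworker's six questions for one issue in one language, with 20 Clickworkers per language and issue); the case study does not state it.

```python
LANGUAGES = 7  # DE, ES, EN, NL, NO, SE, DA

# Job "Reformulation": one task per preset query and language.
reformulation_tasks = 500 * LANGUAGES
reformulated_sentences = reformulation_tasks * 3  # three versions per task

# Job "New Formulation": six first-person questions per task.
new_tasks = 7_000
new_sentences = new_tasks * 6

# Inferred, not stated: issues = tasks / (languages x Clickworkers per issue).
implied_issues = new_tasks // (LANGUAGES * 20)

print(reformulation_tasks, reformulated_sentences)  # 3500 10500
print(new_sentences, implied_issues)                # 42000 50
```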

Workflow

  1. The project is discussed with the customer. The resulting tasks are defined and set down in a briefing for the Clickworkers.
  2. clickworker sets up the project and publishes the text creation, query formulation, and/or reformulation tasks as individual jobs on the clickworker platform.
  3. Numerous Clickworkers who are native speakers of the required languages accept the jobs in parallel and write the texts in the Clickworker workplace according to the job guidelines.
  4. All compiled texts are delivered to the customer for approval via an API connection.
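The case study says only that the texts are delivered via an API connection; it does not describe the endpoint, schema, or batching. As a hedged illustration of step 4, the sketch below merely prepares JSON request bodies for texts awaiting customer approval. The batch size and the field names "batch", "status", and "items" are hypothetical, and the actual HTTP call is left out because the real API is not specified.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def batches(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def to_payloads(texts: Iterable[dict], batch_size: int = 100) -> list[str]:
    """Serialize texts awaiting customer approval into JSON request bodies (hypothetical schema)."""
    return [
        json.dumps(
            {"batch": index, "status": "pending_approval", "items": chunk},
            ensure_ascii=False,  # keep non-ASCII characters (e.g. å, ø, ñ) readable
        )
        for index, chunk in enumerate(batches(texts, batch_size))
    ]


payloads = to_payloads([{"language": "EN", "text": "I can't log in."}])
print(len(payloads))  # 1
```

Batching keeps individual requests small, and `ensure_ascii=False` preserves characters from languages such as Norwegian, Danish, and Spanish as-is in the payload.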

Advantages

  • Prompt receipt of training data
  • Easy access to a broad, international range of query texts / training data
  • Simple data transfer
  • Scalable throughput
  • Flexible workforce
  • Full service