Amplifying Bug Coverage through Social Forum Insights and BERT

Client Overview
The client is a leading global ride-hailing and transportation network company operating through a robust mobile application platform. With operations spanning over 900 metropolitan regions worldwide, the client has transformed urban mobility by delivering scalable, real-time alternatives to conventional taxi services. In addition to ride-hailing, the client is strategically diversifying its portfolio across adjacent logistics and mobility sectors, leveraging its digital infrastructure to optimize on-demand transportation and last-mile delivery solutions.
Business Requirements: Structuring Social Conversations for Actionable Outcomes
Social Forum Monitoring
Continuously monitor and ingest posts from Reddit and other relevant social platforms where customers share feedback, report issues, or discuss the service.
Automated Ticket Creation
Integrate the classification pipeline with Jira to automatically generate tickets based on the categorized posts, ensuring seamless handoff to relevant internal teams.
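As a rough illustration of that handoff, the sketch below maps a classified post to the JSON body that Jira's REST "create issue" endpoint accepts. The project key `SOCIAL`, the category-to-issue-type mapping, the label names, and the post fields are all hypothetical; real values depend on how the client's Jira project is configured. The plain-text `description` assumes the v2-style API.

```python
import json

# Hypothetical mapping from classifier category to Jira issue type name.
# Real issue type names depend on the Jira project configuration.
ISSUE_TYPE_BY_CATEGORY = {
    "bug_report": "Bug",
    "feature_request": "Story",
    "support_inquiry": "Task",
}


def build_jira_issue(post: dict, category: str, project_key: str = "SOCIAL") -> dict:
    """Build the JSON body of a 'create issue' request from a classified post."""
    title = post["title"].strip()
    # Keep the summary within an assumed 255-character limit.
    summary = title if len(title) <= 255 else title[:252] + "..."
    description = f"{post['body'].strip()}\n\nSource: {post['url']}"
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": summary,
            "description": description,
            "issuetype": {"name": ISSUE_TYPE_BY_CATEGORY[category]},
            "labels": ["reddit", category],
        }
    }


# Made-up post used only to show the shape of the payload.
example_post = {
    "title": "App crashes when confirming pickup",
    "body": "Since the last update the app closes as soon as I tap Confirm.",
    "url": "https://www.reddit.com/r/example/comments/abc123",
}
payload = build_jira_issue(example_post, "bug_report")
print(json.dumps(payload, indent=2))
```

The resulting dictionary would then be sent to the Jira instance by an authenticated HTTP client, which is omitted here.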
Enhanced Issue Resolution Process
Enable faster response times and more comprehensive coverage of user-reported issues by systematically capturing valuable feedback from public forums, improving overall customer satisfaction and service quality.
Post Classification Framework
Implement a multi-layer classification model:
- Primary Classification: Identify whether a post is actionable or non-actionable.
- Secondary Classification: For actionable posts, categorize them into predefined groups:
  - Bug Reports: Issues affecting the functionality of services.
  - Feature Requests: Suggestions for new or improved features.
  - Support Inquiries: Customer questions or requests for assistance.
The Roadblocks to Actionable Intelligence: Key Challenges Identified
Unstructured User Content
Reddit posts were inherently unstructured, with wide variations in tone, length, and context, making precise classification a complex task.
Blurred Category Boundaries
Many posts overlapped multiple actionable areas. For example, a single post often described a bug while simultaneously suggesting a new feature, which demanded a refined, multi-label classification approach.
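One common way to handle this kind of overlap is to score each category independently and keep every category that clears a threshold, rather than picking only the top one. The sketch below contrasts the two decisions. The logit values, the 0.5 threshold, and the category names are illustrative assumptions, not figures from the project.

```python
import math

CATEGORIES = ("bug_report", "feature_request", "support_inquiry")


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_decision(logits: dict, threshold: float = 0.5) -> list:
    """Return every category whose own probability clears the threshold."""
    return [c for c in CATEGORIES if sigmoid(logits[c]) >= threshold]


def single_label_decision(logits: dict) -> str:
    """Return only the highest-scoring category (multi-class behaviour)."""
    return max(CATEGORIES, key=lambda c: logits[c])


# Hypothetical per-category scores for a post that reports a bug
# and also suggests a feature.
logits = {"bug_report": 2.1, "feature_request": 0.4, "support_inquiry": -1.7}

print(multi_label_decision(logits))
print(single_label_decision(logits))
```

With these scores the multi-label decision keeps both the bug report and the feature request, while the single-label decision would drop the feature request.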
Massive Data Scale
The continuous influx and high volume of posts made manual review and categorization impractical, highlighting the need for a robust, scalable automation framework.
Mining Reddit Conversations: The Two-Layer NLP Engine
We designed and implemented a robust Natural Language Processing (NLP)-powered two-layer classification model to decode Reddit conversations and extract meaningful, actionable insights. This intelligent solution empowered the client to surface high-value insights, eliminate noise, and prioritize internal resources effectively, ultimately strengthening customer engagement and response efficiency.
Actionability Filter (Layer 1)
Classified incoming posts as either actionable or non-actionable, cutting through irrelevant chatter.
Deep Dive Categorization (Layer 2)
Further sorted actionable posts into bug reports, feature requests, or support inquiries, providing clear, structured data streams.
The Engine Behind the Insights: Solution Architecture & Highlights
The NLP solution was architected with flexibility at its core. Its modular structure enabled seamless expansion to new subcategories or additional social platforms without requiring a complete redesign. Designed to process high volumes of unstructured data, the system comfortably handled the traffic levels typical of platforms like Reddit, ensuring reliable performance as monitoring needs grew.
Data Sourcing and Preprocessing
- Collected data from Reddit’s API, focusing on posts related to the client’s services.
- Preprocessed the data by removing irrelevant information such as HTML tags, URLs, and user mentions.
- Tokenized and vectorized the text data using NLP techniques, including TF-IDF and word embeddings.
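The cleaning and vectorization steps above can be sketched with the standard library alone. The regular expressions, the sample posts, and the smoothed IDF formula are assumptions chosen for illustration; the project's actual rules and tooling are not described in this case study, and word embeddings are not shown.

```python
import html
import math
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")                      # HTML tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")        # links
MENTION_RE = re.compile(r"(?<!\w)(?:/?u/|@)[\w-]+")  # u/name, /u/name, @name
TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean(text: str) -> str:
    """Strip HTML tags, URLs, and user mentions, then collapse whitespace."""
    text = html.unescape(text)
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list:
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def tfidf(docs_tokens: list) -> list:
    """Return one {term: weight} dict per document using a smoothed IDF."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    vectors = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


# Made-up posts for illustration only.
posts = [
    "<p>App crashes at pickup &amp; freezes! See https://example.com/x thanks u/some_user</p>",
    "Please add a dark mode to the app",
    "The app charged me twice, who do I contact?",
]
tokens = [tokenize(clean(p)) for p in posts]
vectors = tfidf(tokens)
print(clean(posts[0]))
print(sorted(vectors[0].items(), key=lambda kv: -kv[1])[:3])
```

Terms that appear in every post, such as "app", receive the lowest weight, while terms specific to one post, such as "crashes", are weighted higher.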
Two-Layer Classification Model
- Layer 1: Implemented a binary classification model using supervised learning to classify posts into actionable and non-actionable categories. The training data for this layer was sourced from a manually curated Reddit classification sheet.
- Layer 2: Developed a multi-class classification model to classify actionable posts into subcategories (bug reports, feature requests, support inquiries). This layer was trained using data from the client’s internal Jira system to identify specific actionable types accurately.
Both layers leveraged transformer-based models (e.g., BERT) to capture the contextual meaning of posts.
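The routing between the two layers can be sketched as follows. The keyword-based functions are deliberately trivial stand-ins so the example runs on its own; in a setup like the one described, each callable would instead wrap a fine-tuned transformer model. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Each layer is modelled as a plain callable so that any model can be plugged in.
ActionabilityModel = Callable[[str], bool]
CategoryModel = Callable[[str], str]


@dataclass
class Classification:
    text: str
    actionable: bool
    category: Optional[str] = None


def classify_post(text: str, layer1: ActionabilityModel,
                  layer2: CategoryModel) -> Classification:
    """Run Layer 1 first; only actionable posts are passed on to Layer 2."""
    if not layer1(text):
        return Classification(text, actionable=False)
    return Classification(text, actionable=True, category=layer2(text))


# Keyword stand-ins for the trained models (assumption, for illustration only).
def stub_layer1(text: str) -> bool:
    lowered = text.lower()
    return any(k in lowered for k in ("crash", "please add", "how do i", "charged"))


def stub_layer2(text: str) -> str:
    lowered = text.lower()
    if "crash" in lowered:
        return "bug_report"
    if "please add" in lowered:
        return "feature_request"
    return "support_inquiry"


for post in ("The app crashes when I confirm pickup",
             "Please add a dark mode",
             "Traffic was wild today"):
    print(classify_post(post, stub_layer1, stub_layer2))
```

Keeping the layers behind simple callables is one way to get the modularity described earlier: a new subcategory or platform changes a model or its inputs, not the routing logic.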
Model Training and Optimization
- Trained the models on labeled datasets curated from the manually curated Reddit classification sheet (Layer 1) and Jira data (Layer 2).
- Validated the models using cross-validation techniques to ensure robustness.
- Tuned hyperparameters to optimize model performance and reduce misclassification rates.
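The validation and tuning steps can be illustrated with a small k-fold cross-validation and grid search. A k-nearest-neighbour classifier on synthetic 2-D points stands in for the real models purely to keep the example self-contained. With transformer models the grid would cover settings such as learning rate or number of epochs, and the data here is randomly generated, not project data.

```python
import random
from collections import Counter
from statistics import mean


def k_fold_indices(n_samples: int, n_folds: int, seed: int = 0):
    """Shuffle indices once and yield (train, validation) index lists per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def knn_predict(train_x, train_y, point, k: int):
    """Predict by majority vote among the k nearest training points."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_val_accuracy(xs, ys, k: int, n_folds: int = 5) -> float:
    """Mean validation accuracy of k-NN over all folds."""
    scores = []
    for train, validation in k_fold_indices(len(xs), n_folds):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i]
                      for i in validation)
        scores.append(correct / len(validation))
    return mean(scores)


def grid_search(xs, ys, candidates, n_folds: int = 5):
    """Evaluate each candidate hyperparameter and return the best with all scores."""
    results = {k: cross_val_accuracy(xs, ys, k, n_folds) for k in candidates}
    return max(results, key=results.get), results


# Synthetic two-cluster data standing in for post features (assumption).
rng = random.Random(42)
xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10)]
xs += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]
ys = ["non_actionable"] * 10 + ["actionable"] * 10

best_k, results = grid_search(xs, ys, candidates=(1, 3, 5))
print(best_k, results)
```

Every sample is used for validation exactly once, so the averaged score is less sensitive to one lucky or unlucky split than a single hold-out set would be.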
Feedback Loop for Continuous Improvement
- Integrated a feedback loop where misclassified posts were manually reviewed and added to the training data to improve model accuracy over time.
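A minimal sketch of such a loop is shown below: predictions flagged for review are queued, a reviewer supplies the correct label, and only genuine misclassifications are added to the training data. The class, the method names, and the dictionary-based reviewer are hypothetical; retraining on the enlarged dataset is left out.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLoop:
    # Training data kept as text -> label so that re-reviewed posts overwrite
    # earlier entries instead of creating duplicates.
    training_data: dict = field(default_factory=dict)
    review_queue: list = field(default_factory=list)

    def flag(self, text: str, predicted: str) -> None:
        """Queue a prediction for manual review."""
        self.review_queue.append((text, predicted))

    def apply_reviews(self, review_fn) -> int:
        """Drain the queue; add posts whose reviewed label differs from the prediction."""
        corrections = 0
        while self.review_queue:
            text, predicted = self.review_queue.pop(0)
            reviewed = review_fn(text, predicted)
            if reviewed != predicted:
                self.training_data[text] = reviewed
                corrections += 1
        return corrections


# Stand-in for a human reviewer: a lookup of corrected labels (assumption).
reviewer_labels = {"App freezes on the payment screen": "bug_report"}


def reviewer(text: str, predicted: str) -> str:
    return reviewer_labels.get(text, predicted)


loop = FeedbackLoop()
loop.flag("App freezes on the payment screen", "support_inquiry")
loop.flag("Please add a tipping option", "feature_request")
added = loop.apply_reviews(reviewer)
print(added, loop.training_data)
```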
From Social Buzz to Business Value: Impact Delivered

Efficient Categorization
Automated Reddit post classification reduced manual effort by 60% and accelerated response times to actionable posts by 70%.
Enhanced Customer Engagement
Enabled the client to address bugs quickly, roll out new features, and resolve support queries, leading to higher customer satisfaction.
Scalable Architecture
The model’s flexible design ensured seamless integration with new platforms and subcategories, keeping the solution future-ready.
Improved Decision-Making
Delivered structured, actionable insights that helped the client prioritize initiatives and investments based on real-time user feedback.
Reduction in Noise
Effectively filtered out irrelevant posts, boosting the efficiency of downstream workflows and support teams.
About Indium
Indium is an AI-driven digital engineering company that helps enterprises build, scale, and innovate with cutting-edge technology. We specialize in custom solutions, ensuring every engagement is tailored to business needs with a relentless customer-first approach. Our expertise spans Generative AI, Product Engineering, Intelligent Automation, Data & AI, Quality Engineering, and Gaming, delivering solutions that drive real business impact.
With 5,000+ associates globally, we partner with Fortune 500, Global 2000, and leading technology firms across Financial Services, Healthcare, Manufacturing, Retail, and Technology, driving impact in North America, India, the UK, Singapore, Australia, and Japan to keep businesses ahead in an AI-first world.
