Machine learning based Customer Segmentation
Introduction
Because it is easy for customers to watch videos and live-streamed programs on mobile phones, many customers now consume large amounts of data, helped by faster 4G networks. Moreover, the wide range of apps on mobile phones also takes up a large share of device memory. As a result, the demand for data services placed on mobile providers also increases [9]. In addition, as more apps for iOS and Android devices are developed, customers download and use them, which further increases their data usage [1]. As a result of increasing mobile data prices, large numbers of subscribers churn from one provider to another in pursuit of better rates. They also switch providers to obtain incentives for signing up with a new carrier, such as receiving a free or deeply discounted phone [10]. In addition, the lower signup fees associated with prepaid mobile services also encourage customers to churn.
“The ability of mobile customers to keep their existing mobile numbers through Wireless Local Number Portability (WLNP) reduces barriers to churning within the industry, which is a major problem for companies in the telecommunications industry [2]. Because customers are likely to change providers, the deals that telecommunication companies offer may differ based on the needs of individual customers and their willingness to pay for particular services. ID Mobile Ireland is the company that is the basis for the work performed in this paper. It is a start-up telecommunications provider in the Republic of Ireland. The company differentiates itself in the competitive Irish market by separating the mobile tariff from the handset [7]. This allows customers the flexibility to enter or leave a 12, 18 or 24-month contract without penalty, and purchase a new handset every three months, should they wish to do so once the previous handset cost is fully paid off”.
Additionally, because the customer is not tied into an extended contract in which the cost of the phone is subsidized by the tariff price, customers may change their tariff call, text and data allowances every month to suit their individual needs, which gives them more control over their account charges. They could, for example, increase their call minutes bundle for the month of December if they expect to make more calls during this peak holiday period. The company has access to a wide range of data, with the prospect of capturing even more, and the volume is growing rapidly. The data that the company can access are currently not being used to their full potential as a means of understanding the customers that are served, their sales patterns, the potential fraud risks, and churn patterns [8]. In this paper, the goal is to collect, clean, categorize, and gain insight from a large dataset spanning 16 months of Bill Pay customer account data that contains 26,717 rows and 86 columns of attributes. The data also contain an additional 11 columns of formula-derived values or classes used to categorize the data. The primary aim of this effort is to better meet customer needs, improve customer satisfaction, and develop customer loyalty to the brand as a means of improving customer retention.
“The initial step in carrying out this effort was to acquire the relevant data to generate the various reports, using the attributes available, and to cross-check the results in the production customer care system as a means of confirming the accuracy of the data. This verification step proved to be very important because, as several tables of data were combined, erroneous results occurred [4]. This meant that separate reports had to be created, because all of the data could not be held in one report: either the database tables did not contain the logic required to join them together, or the report output exceeded the system's current maximum of 70,000 cells of data”.
Literature Review
The variables generally used for market segmentation are the demographic, socioeconomic, and geographic characteristics of customers. A very useful behavior-based data mining method is RFM analysis, which extracts customer profiles using only a few criteria and thereby reduces the complexity of the analysis [5]. “In RFM analysis, customer data are classified by Recency (R), Frequency (F) and Monetary (M) variables. It has been noted that RFM enables practitioners to observe customer behavior, as well as to segment customers in order to determine immediate customer value [5]. It should also be noted that using decision rule algorithms for the purpose of customer segmentation may result in an efficient evaluation of a segmentation plan [3]. Decision trees can be converted into sets of if-then rules, which means that they can be used to solve a variety of problems, such as customer segmentation and customer churn prediction. In fact, many researchers have used this method to study customer segmentation”.
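As a rough illustration of the RFM idea and of if-then segmentation rules, the following minimal Python sketch computes recency, frequency, and monetary values from a list of transactions and then applies a few hand-written rules. The transaction layout, the thresholds, and the segment names are assumptions made for this example only; they are not taken from the cited studies.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, today):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    Each transaction is a (customer_id, purchase_date, amount) tuple;
    this layout is an assumption made for the example.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, day, amount in transactions:
        if customer_id not in last_seen or day > last_seen[customer_id]:
            last_seen[customer_id] = day
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: ((today - last_seen[cid]).days, frequency[cid], monetary[cid])
        for cid in last_seen
    }


def segment(recency_days, frequency, monetary):
    """Hypothetical if-then rules of the kind a decision tree could yield."""
    if recency_days <= 30 and frequency >= 3:
        return "loyal"
    if recency_days <= 30:
        return "recent"
    if monetary >= 100.0:
        return "lapsed high value"
    return "lapsed"


# Illustrative data only.
transactions = [
    ("A", date(2024, 1, 5), 20.0),
    ("A", date(2024, 2, 10), 25.0),
    ("A", date(2024, 3, 1), 30.0),
    ("B", date(2023, 11, 20), 150.0),
    ("C", date(2024, 2, 25), 10.0),
]
rfm = rfm_table(transactions, today=date(2024, 3, 10))
segments = {cid: segment(*values) for cid, values in rfm.items()}
print(segments)
```

In practice the thresholds would be learned from data, for example by fitting a decision tree, rather than fixed by hand as they are here.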
A customer satisfaction survey can be used to construct a customer segmentation system based on demographic variables and even customer reviews [1]. Researchers have proposed modelling customer satisfaction from unstructured data using a Bayesian approach. They describe how unstructured data taken from customers’ reviews are transformed into a semi-structured form in which each aspect is associated with frequency counts of positive, neutral, and negative sentiments. One assumption of this model is that the rating of each aspect is based on a particular combination of the positive, neutral, and negative sentiments of that particular aspect. As a result, the overall aspect rating depends on how many times an aspect has been associated with positive, neutral, and negative sentiments in a single customer review. Furthermore, there is also an overall rating that is assigned to each review by the contributor [6].
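The semi-structured form described above can be pictured with a short Python sketch. It assumes that each review has already been reduced to (aspect, sentiment) mentions by some upstream step; that step and the Bayesian rating model itself are not shown, and the aspect names are illustrative.

```python
from collections import Counter

# Fixed ordering of the sentiment classes used for the count tuples.
SENTIMENTS = ("positive", "neutral", "negative")


def aspect_counts(mentions):
    """Map each aspect to its (positive, neutral, negative) frequency counts.

    `mentions` is a list of (aspect, sentiment) pairs extracted from a
    single review; the extraction itself is assumed to happen elsewhere.
    """
    counts = {}
    for aspect, sentiment in mentions:
        counts.setdefault(aspect, Counter())[sentiment] += 1
    return {
        aspect: tuple(counter[s] for s in SENTIMENTS)
        for aspect, counter in counts.items()
    }


# Illustrative mentions from one hypothetical review.
review_mentions = [
    ("network", "positive"),
    ("network", "negative"),
    ("price", "negative"),
    ("network", "positive"),
    ("support", "neutral"),
]
review_counts = aspect_counts(review_mentions)
print(review_counts)
```

The resulting count tuples are the kind of input on which an aspect-level rating model could then be fitted.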
Research Methodology
This project aims to identify potential customers for a particular product. It will be implemented in the Python programming language. For the machine learning component, we will use K-means clustering together with ANFIS. Segmentation is the process of dividing customers into groups for targeted selling, so the choice of algorithm is central to the project. This data analytics project can help sellers in several ways: by understanding how their customers think and behave, sellers can target them more effectively and expand their market. The algorithm is custom developed for the features it is trained on, and the data are managed through SQL.
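One possible way to connect SQL-managed data to the clustering step is sketched below using Python's built-in sqlite3 module. The table name monthly_usage, its columns, and the choice of per-customer averages as features are hypothetical and serve only to illustrate the flow from stored records to a feature table.

```python
import sqlite3


def build_feature_rows(usage_rows):
    """Load usage records into an in-memory database and return one
    (customer_id, avg_data_gb, avg_call_minutes) row per customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE monthly_usage ("
        "customer_id TEXT, month TEXT, data_gb REAL, call_minutes REAL)"
    )
    conn.executemany(
        "INSERT INTO monthly_usage VALUES (?, ?, ?, ?)", usage_rows
    )
    rows = conn.execute(
        "SELECT customer_id, AVG(data_gb), AVG(call_minutes) "
        "FROM monthly_usage GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return rows


# Illustrative records only.
usage_rows = [
    ("A", "2024-01", 2.0, 100.0),
    ("A", "2024-02", 4.0, 300.0),
    ("B", "2024-01", 10.0, 50.0),
]
features = build_feature_rows(usage_rows)
print(features)
```

Each returned row could then be passed, after suitable scaling, to the clustering algorithm described in the next subsection.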
A. K-Means Clustering
“The K-means clustering algorithm is one of the clustering algorithms based on division. It adopts a heuristic iterative process to re-divide data objects and re-update cluster centers. The basic idea of the algorithm is: suppose a set with n element objects and k clusters to be generated [2]. In the first round, k sample elements are randomly selected as the initial cluster centers [6], the distance between each of the other sample elements and each center point is computed, and the elements are assigned to clusters according to these distances. In each of the following rounds, the above steps are repeated, and the average value of the element objects in each cluster is taken as the center point for the next round of clustering, until the cluster center points no longer change between iterations. The specific processing steps are as follows”:
Fig. 1 K-Means Algorithm
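The iterative procedure described above can be sketched in plain Python as follows. This is a minimal illustration, assuming Euclidean distance, k distinct initial centers drawn at random from the data, and a stopping rule based on unchanged centers; the function name, the seed parameter, and the sample points are choices made for this example.

```python
import math
import random


def k_means(points, k, seed=0, max_rounds=100):
    """Cluster `points` (tuples of floats) into k groups.

    Returns (centers, labels), where labels[i] is the index of the
    cluster assigned to points[i].
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct points as initial centers
    labels = []
    for _ in range(max_rounds):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # centers no longer change: converged
            break
        centers = new_centers
    return centers, labels


# Two visibly separated groups of illustrative 2-D feature vectors.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = k_means(points, k=2)
print(centers, labels)
```

Because the result depends on the random initial centers, it is common to run the procedure several times with different seeds and to scale the features beforehand so that no single attribute dominates the distance.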
B. Adaptive Neuro-Fuzzy Inference System (ANFIS)
“Fuzzy logic and neural networks are widely used in prediction problems. ANFIS is a useful technique based on fuzzy logic and neural network approaches [27]. It takes advantage of both fuzzy set theory, in applying rule-based systems, and neural networks, in learning automatically from data. A fuzzy inference system in the ANFIS technique consists of if–then rules, membership functions, an inference mechanism (called fuzzy reasoning), and input–output pairs. In Figure 2, a structure of a fuzzy inference system is presented. As seen from this figure, in the first step, the inputs are fuzzified to produce their degrees of truth. In the second step, the degree of truth of the consequents is obtained by combining this information through inference rules. In the last step, the final output is obtained by defuzzification”.
Fig. 2 A structure of a fuzzy inference system for revealing customer satisfaction
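The three steps of fuzzy inference (fuzzification, rule evaluation, defuzzification) can be illustrated with a small Python sketch. It assumes a zero-order Sugeno-style system with triangular membership functions, the product as the AND operator, and a weighted average for defuzzification. The two inputs, the rule outputs, and all parameters are fixed by hand for illustration; in ANFIS such parameters would be learned from data, and that training step is not shown.

```python
def triangle(x, left, peak, right):
    """Triangular membership function; returns a degree of truth in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def satisfaction(quality, price_fairness):
    """Estimate a satisfaction score from two ratings on a 0-10 scale."""
    # Step 1: fuzzify each input into "low" and "high" degrees of truth.
    q_low, q_high = triangle(quality, -10, 0, 10), triangle(quality, 0, 10, 20)
    p_low = triangle(price_fairness, -10, 0, 10)
    p_high = triangle(price_fairness, 0, 10, 20)

    # Step 2: fire the rules. Each rule pairs a firing strength (product
    # of its antecedents) with a constant output; values are hypothetical.
    rules = [
        (q_high * p_high, 9.0),  # good service at a fair price
        (q_high * p_low, 6.0),   # good service, poor price
        (q_low * p_high, 4.0),   # poor service, fair price
        (q_low * p_low, 1.0),    # poor service at a poor price
    ]

    # Step 3: defuzzify with the firing-strength-weighted average.
    total = sum(strength for strength, _ in rules)
    return sum(strength * output for strength, output in rules) / total


score = satisfaction(quality=8, price_fairness=5)
print(score)
```

With these hand-set parameters, ratings of 8 and 5 yield a score of about 6.5, between the "good service, poor price" and "good service at a fair price" rule outputs.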
References
1. Dullaghan, C., & Rozaki, E. (2017). Integration of machine learning techniques to evaluate dynamic customer segmentation analysis for mobile customers. arXiv preprint arXiv:1702.02215.
2. Ezenkwu, C. P., Ozuomba, S., & Kalu, C. (2015). Application of K-Means algorithm for efficient customer segmentation: a strategy for targeted customer services.
3. Smeureanu, I., Ruxanda, G., & Badea, L. M. (2013). Customer segmentation in private banking sector using machine learning techniques.
4. Monil, P., Darshan, P., Jecky, R., Vimarsh, C., & Bhatt, B. R. (2020). Customer Segmentation Using Machine Learning. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 8(6), 2104-2108.
5. Kansal, T., Bahuguna, S., Singh, V., & Choudhury, T. (2018, December). Customer segmentation using K-means clustering. In 2018 international conference on computational techniques, electronics and mechanical systems (CTEMS) (pp. 135-139). IEEE.
6. Hiziroglu, A. (2013). Soft computing applications in customer segmentation: State-of-art review and critique. Expert Systems with Applications, 40(16), 6491-6507.
7. Tsiptsis, K. K., & Chorianopoulos, A. (2011). Data mining techniques in CRM: inside customer segmentation. John Wiley & Sons.
8. Hung, P. D., Lien, N. T. T., & Ngoc, N. D. (2019, March). Customer segmentation using hierarchical agglomerative clustering. In Proceedings of the 2019 2nd International Conference on Information Science and Systems (pp. 33-37).
9. Ozan, Ş. (2018, September). A case study on customer segmentation by using machine learning methods. In 2018 International Conference on Artificial Intelligence and Data Processing (IDAP) (pp. 1-6). IEEE.
10. Wu, S., Yau, W. C., Ong, T. S., & Chong, S. C. (2021). Integrated churn prediction and customer segmentation framework for telco business. IEEE Access, 9, 62118-62136.