A short and simplified guide to vehicle data analytics
Digitalization is an important driver of innovation in the automotive industry. Vehicles have become computers on four wheels and are equipped with a variety of sensors that generate huge amounts of vehicle data, such as vehicle position, rotation, acceleration, vehicle speed, wheel speed, engine oil temperature, or steering wheel angle, to name just a few examples. While these vehicle data are mainly used to ensure the functionality of vehicles and the safety of driving, they can also enable novel products and services that extend beyond the vehicle sector. If these vehicle data are linked with other contextual, geo-spatial data, such as the weather during driving or the traffic situation, even more interesting and potentially valuable services become possible.
Generating value from vehicle data is a challenging task. Vehicle data analytics has therefore become an important technique for identifying the value of generated vehicle data. However, to exploit this value in products and services, several steps must be performed, and a number of (not only technical) challenges have to be solved.
First, an appropriate analytics question must be identified, for example: identify the driving style of the driver from vehicle data, detect the road surface quality, identify potholes on roads, or predict the engine’s wear.
Then, vehicle data must be captured. Three different approaches to data capturing are possible: installing or using one's own sensors within the vehicle to record vehicle movements and other contextual information; connecting a vehicle data logger to the vehicle’s on-board diagnostics (OBD) interface to capture vehicle data such as vehicle speed or RPM; or installing a professional Controller Area Network (CAN) logger to obtain even more vehicle data from the vehicle’s electronic control units (ECUs), for example the state of vehicle assistance systems or the steering wheel angle. The first option is probably the simplest one, but it can only record contextual data and track the movement of the vehicle; it does not allow access to vehicle sensors. The second option already provides access to some vehicle sensor data, such as vehicle speed or engine temperature, that is, signals relevant for testing whether the vehicle's emissions are still within tolerance. The third option in theory provides access to all vehicle sensor signals, but only if the device listening to the CAN bus can decode the streamed raw CAN bus data into readable data. This requires either the vehicle manufacturer or the manufacturer of the respective ECU to provide the necessary decoding information (usually referred to as CAN DBC files).
Figure 1: CAN DBC files (Source: CSS electronics)
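To illustrate what the decoding information contributes, here is a minimal sketch of turning raw CAN payload bytes into a physical value. The SignalDef class, the signal layout, and the scaling values are made up for illustration and are not taken from any real DBC file; the sketch also assumes an unsigned signal in little-endian bit order, whereas real DBC files additionally cover big-endian and signed signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalDef:
    """Hypothetical, DBC-like description of one signal inside a CAN frame."""
    name: str
    start_bit: int  # position of the least significant bit (little-endian layout)
    length: int     # number of bits used by the signal
    factor: float   # scaling applied to the raw integer
    offset: float   # offset added after scaling
    unit: str


def decode_signal(payload: bytes, sig: SignalDef) -> float:
    """Extract an unsigned little-endian signal and convert it to a physical value."""
    raw_frame = int.from_bytes(payload, byteorder="little")
    raw_value = (raw_frame >> sig.start_bit) & ((1 << sig.length) - 1)
    return raw_value * sig.factor + sig.offset


# Made-up example: a 16-bit speed signal starting at bit 8, 0.01 km/h per bit.
speed = SignalDef("vehicle_speed", start_bit=8, length=16,
                  factor=0.01, offset=0.0, unit="km/h")
payload = bytes([0x00, 0x10, 0x27, 0x00, 0x00, 0x00, 0x00, 0x00])
decoded_speed = decode_signal(payload, speed)
print(decoded_speed, speed.unit)
```

Without the start bit, length, factor, and offset, the eight payload bytes are just numbers, which is why the third capturing option depends on the decoding information being available.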
Different data loggers may store the data in different formats. Typically, they can collect multiple signals at once, which are all transmitted on the same wire. Thus, the logger needs to know and save at least three different properties of the data: What was measured, what was its value and when was it measured. This naturally leads to a tabular format very similar to the example depicted in figure 2.
While this format is convenient for the logger to store data, it is much less suited to analysing the data. There are three main difficulties. First, several signals are mixed together in one column, creating the need for grouping and filtering even before very simple operations. Second, multiple signals can be measured at the same time, requiring the analyst to inspect multiple rows at once to check a single instant in time. The third difficulty lies in the varying sampling rates of the signals: each signal may have been captured at a different rate, and even within a single signal, small deviations of the sampling rate are possible and common. Clearly, pre-processing of the captured vehicle data is needed to make it easier for data analysts to explore.
Figure 2: Vehicle raw data structure (example)
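The difficulties of this long format can be made concrete with a small sketch. The records below are made up, and the helper names group_by_signal and sampling_intervals are hypothetical; the point is only that the rows must be grouped per signal before even a simple question, such as how regular the sampling is, can be answered.

```python
from collections import defaultdict

# Made-up log records in long format: (timestamp in seconds, signal name, value).
records = [
    (0.00, "speed", 50.0),
    (0.00, "rpm", 2000.0),
    (0.10, "speed", 50.5),
    (0.25, "rpm", 2050.0),
    (0.21, "speed", 51.0),
    (0.50, "rpm", 2100.0),
]


def group_by_signal(rows):
    """Split the mixed rows into one time-sorted series per signal."""
    series = defaultdict(list)
    for timestamp, name, value in rows:
        series[name].append((timestamp, value))
    return {name: sorted(samples) for name, samples in series.items()}


def sampling_intervals(samples):
    """Time differences between consecutive samples of one signal."""
    times = [t for t, _ in samples]
    return [round(b - a, 6) for a, b in zip(times, times[1:])]


by_signal = group_by_signal(records)
for name, samples in by_signal.items():
    print(name, sampling_intervals(samples))
```

In this toy data the two signals are sampled at different rates, and the speed signal additionally shows a small jitter in its own sampling interval.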
After the required vehicle data is stored, a series of further steps must be performed to prepare the data for analysis. This data (pre-)processing can be quite comprehensive and depends very much on the respective analysis question to be solved, e.g. the detection of potholes from vehicle data. A crucial step in this process is the alignment of the coordinate axes of data logger and vehicle. Many signals are vector valued, with acceleration as perhaps the most prominent example. To simplify analyses and interpretations, it is highly desirable to express these vectors in the reference frame of the car, i.e. x-acceleration should be the component in the x-direction of the car, which is the driving direction. In general, one cannot assume that the logger was mounted such that its internal coordinate system corresponds to that of the vehicle. This is especially true for cheap devices that are mounted by end-users. Any misalignment of the reference frames needs to be detected and corrected prior to analysis.
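One possible way to correct part of such a misalignment is sketched below. It assumes the vehicle stands still on level ground, so that the averaged accelerometer reading points along the vehicle's vertical axis, and it uses Rodrigues' rotation formula to rotate the logger frame accordingly. The function name rotation_between and the sample values are illustrative only. Note that gravity alone fixes only the tilt of the logger; the heading around the vertical axis remains undetermined and needs additional information, for example accelerations recorded while braking in a straight line.

```python
import numpy as np


def rotation_between(a, b):
    """Rotation matrix turning the direction of a into that of b (Rodrigues' formula)."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    v = np.cross(a, b)       # rotation axis, scaled by the sine of the angle
    c = float(np.dot(a, b))  # cosine of the angle
    if np.isclose(c, -1.0):
        raise ValueError("vectors are opposite; rotation axis is ambiguous")
    k = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + k + k @ k / (1.0 + c)


# Made-up standstill readings (m/s^2) from a logger tilted by 30 degrees about its x-axis.
g = 9.81
tilt = np.radians(30.0)
standstill = np.tile([0.0, g * np.sin(tilt), g * np.cos(tilt)], (5, 1))

# Rotate so that the mean standstill reading points along the vehicle's z-axis.
R = rotation_between(standstill.mean(axis=0), [0.0, 0.0, 1.0])
aligned = standstill @ R.T
print(aligned.mean(axis=0))
```

The same matrix R would then be applied to all acceleration and rotation-rate vectors recorded during the trip.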
As with most other analyses, vehicle data signals should be searched for missing values, wrong values, and outliers, and these should be removed. Some signals may contain a lot of noise and must be smoothed. To separate the signals into different columns, the data should be transformed using the ‘signal name’ as pivot. At the same time, it makes sense to resample each signal to a common sampling rate that suits the analysis. The “right” sampling rate again depends on the question to be answered. The result then has a form similar to the one depicted in figure 3. Now each row corresponds to exactly one point in time and the time interval between the rows is constant, in this example 0.1 s (10 Hz).
Figure 3: Vehicle pre-processed data structure
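The pivot and resampling steps could look roughly as follows with pandas. The column names timestamp, signal, and value, as well as the sample values, are assumptions made for this sketch; linear interpolation in time is one possible resampling choice among several, and outlier removal and smoothing are left out for brevity.

```python
import pandas as pd

# Made-up raw data in long format; timestamps are given in milliseconds.
raw = pd.DataFrame({
    "timestamp": pd.to_datetime([0, 0, 100, 150, 210, 300, 300], unit="ms"),
    "signal": ["speed", "rpm", "speed", "rpm", "speed", "speed", "rpm"],
    "value": [50.0, 2000.0, 50.5, 2150.0, 51.05, 51.5, 2300.0],
})

# Pivot: one column per signal, one row per timestamp that occurs in the log.
wide = raw.pivot_table(index="timestamp", columns="signal",
                       values="value", aggfunc="mean")

# Common 10 Hz grid covering the recording.
grid = pd.date_range(wide.index.min().floor("100ms"), wide.index.max(), freq="100ms")

# Interpolate each signal linearly in time, then keep only the grid points.
resampled = (
    wide.reindex(wide.index.union(grid))
        .interpolate(method="time")
        .reindex(grid)
)
print(resampled)
```

Each row of resampled now belongs to exactly one point in time on the 0.1 s grid, and every signal has its own column.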
The data prepared in this way can now be used to work on the vehicle data analysis question and/or to search for interesting events (such as potholes). Depending on the type of event, multiple signals can be relevant. Detected events should usually be post-processed so that separate events which are only divided by a short interruption are combined into a single event. The recorded events may be linked with weather and position data, so that for each event the time and place of occurrence as well as the prevailing weather are known.
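The post-processing step of combining events that are separated only by a short interruption can be sketched as a simple interval merge. The function name merge_events, the max_gap parameter, and the example intervals are hypothetical; events are assumed to be (start, end) pairs in seconds.

```python
def merge_events(events, max_gap):
    """Merge (start, end) intervals that are separated by at most max_gap seconds."""
    merged = []
    for start, end in sorted(events):
        if merged and start - merged[-1][1] <= max_gap:
            # Gap to the previous event is short: extend the previous event.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Made-up detections: the first two are only about 0.1 s apart.
events = [(10.0, 10.4), (10.5, 10.9), (25.0, 25.3)]
merged_events = merge_events(events, max_gap=0.2)
print(merged_events)
```

A suitable value for max_gap depends on the type of event and on the sampling rate of the prepared data.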
For different types of events, different detection methods need to be employed. One can detect a pothole event (driving over a pothole) by investigating acceleration values and rotation rates as follows. Consider the acceleration normal to the road, as well as the vehicle’s rotation around its lateral axis (‘pitch’). The acceleration readings will exhibit a distinct spike, while a certain pattern is simultaneously visible in the rotation rate: when the front tyres are in the pothole, the front of the vehicle is lower than the rear; when the rear tyres are in the pothole, it is the other way round. This causes a rotation around the lateral axis and results in a typical "pitch" movement that can be detected. In a last step, the results of the analysis, in this case the detected potholes, can be visualised on a map. In our case this supports drivers in avoiding bad roads, or supports road operators in better maintaining roads.
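The detection idea described above can be expressed as a very simple rule-based sketch. It assumes evenly sampled signals, a vertical acceleration with gravity already removed (in m/s^2), and a pitch rate in rad/s. The function name detect_potholes, the thresholds, and the window size are made-up values for illustration, not tuned parameters, and a real detector would likely need to be more robust.

```python
def detect_potholes(vertical_acc, pitch_rate,
                    acc_threshold=3.0, pitch_threshold=0.1, window=2):
    """Return sample indices where an acceleration spike coincides with a pitch swing."""
    hits = []
    for i, acc in enumerate(vertical_acc):
        if abs(acc) < acc_threshold:
            continue  # no distinct spike in the acceleration normal to the road
        nearby = pitch_rate[max(0, i - window): i + window + 1]
        # Front axle dips first, then the rear: the pitch rate swings in both directions.
        if max(nearby) > pitch_threshold and min(nearby) < -pitch_threshold:
            hits.append(i)
    return hits


# Made-up signals: a spike with a pitch swing at index 3, a spike without one at index 6.
vertical_acc = [0.1, -0.2, 0.3, 4.5, -0.4, 0.2, 3.8, 0.1]
pitch_rate = [0.0, 0.01, 0.15, -0.2, 0.05, 0.0, 0.02, 0.01]
pothole_indices = detect_potholes(vertical_acc, pitch_rate)
print(pothole_indices)
```

Requiring both conditions is what separates a pothole candidate from an isolated acceleration spike, such as the one at index 6 in the toy data.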
Literature
Alexander Stocker, Christian Kaiser, Michael Fellmann (2017), Quantified Vehicles - Novel Services for Vehicle Lifecycle Data, Business & Information Systems Engineering: Vol. 59: Iss. 2, 125-130. (Link: https://aisel.aisnet.org/bise/vol59/iss2/5/)
Christian Kaiser, Alexander Stocker, Andreas Festl, Gernot Lechner, Michael Fellmann (2018), A Research Agenda for Vehicle Information Systems, European Conference on Information Systems, Research-in-Progress Papers. 33. (Link: https://aisel.aisnet.org/ecis2018_rip/33/)
Christian Kaiser, Alexander Stocker, Michael Fellmann (2019), Understanding Data-driven Service Ecosystems in the Automotive Domain, Proceedings of Americas Conference on Information Systems. (Link: https://aisel.aisnet.org/amcis2019/org_transformation_is/org_transformation_is/14/)
Christian Kaiser, Andreas Festl, Gernot Pucher, Michael Fellmann, Alexander Stocker (2019), The Vehicle Data Value Chain as a Lightweight Model to Describe Digital Vehicle Services, Proceedings of the 15th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, 68-79, Vienna, Austria. (Link: https://www.scitepress.org/PublicationsDetail.aspx?ID=+/eR5km7R90=&t=1)
1. CAN DBC File - Convert Data in Real Time (Wireshark, J1939), https://www.csselectronics.com/screen/page/dbc-database-can-bus-conversion-wireshark-j1939-example/language/en