Human Recognition Using 'PSO-OFA' in Low Resolution Videos
K. Ranga

Abstract: Tracking a target in low resolution videos is an important and difficult task. The difficulty arises from the lack of discriminative data, since the moving objects have low visual visibility. Earlier detection methods typically extract descriptors around spatial interest points or compute statistical features over moving regions, and therefore have limited ability to capture useful video features. To overcome this problem, this paper proposes a novel method that recognizes a person in low resolution videos. A three-step process is implemented. In the first step, low-resolution video data is acquired from three different datasets; each video is divided into frames, which are converted from RGB to gray scale. In the second step, background subtraction is performed using the local binary pattern (LBP), after which Histogram of Optical Flow (HOF) descriptors are extracted from the optical flow images for motion estimation. In the third step, eigen features are extracted and optimized with a particle swarm optimization (PSO) model to eliminate redundant information and obtain an optimized feature set from the processed video. Finally, the features are classified with a Support Vector Machine (SVM) to recognize the person, and the performance parameters are evaluated. Experiments on the VIRAT, Soccer and KTH datasets demonstrate that the proposed detection approach is superior to previous methods.


Introduction
Over the past few decades, many efforts have been made in the area of motion detection and tracking to make applications such as video surveillance, robotics, authentication systems, media production and biological research reliable, robust and effective [1]. There are, however, many challenges that create barriers for these applications, including illumination change, dynamic backgrounds, occlusion, shadow and similar effects, as noted in [2]. For object tracking in low resolution videos these barriers become more severe. In a low-resolution video it is very difficult to locate the target accurately because much of the discriminative information, such as visual appearance and texture detail, is lost. This leads to incorrect tracking, which in turn leads to faulty event detection. Nevertheless, there are some advantages to using low-resolution video: it requires less storage, transmission time and processing time [3]. Most common tracking methods are based on high resolution (HR) video, from which discriminative target features are extracted as in [4] and [5]. These methods incur additional computational cost because they operate on high-resolution frames. Other methods in the literature take lower-resolution videos as input but upscale them over time with super-resolution techniques, so they are not truly inexpensive either. In the abnormal event detection literature, many methods such as [6], [7] use trained classifiers to identify events and do not accept low-resolution video input. These classifiers require training time and a carefully prepared training database. Other methods such as [8] require manual initialization at the start of the event detection procedure and have high computational cost. From the above literature it is observed that a new, effective algorithm needs to be developed for detecting objects in low resolution videos, which would help in building a fully automated monitoring system.

Related Work
Unattended field monitoring systems are required to process images accurately and in real time to detect and track moving objects. For a tracking system, accurate acquisition of the moving objects is one of the essential requirements. Even when monitoring accuracy is guaranteed, it is challenging to meet real-time requirements in highly cluttered scenes with a large field of view. In addition, under active field conditions, complex backgrounds, illumination changes, local motion such as swaying trees, dust, occlusion and similar effects make the system problematic.
For the acquisition of moving objects, existing approaches mainly include optical flow [9], [10], background subtraction [11], frame differencing [12] and deep learning methods [13]. In many low resolution video surveillance systems these methods can only detect moving objects; they cannot assign a category or a direct association to each moving object. In [14], feature extraction and classification techniques were combined to address this problem. However, a critical drawback of this approach cannot be overlooked: the object regions obtained by motion acquisition cannot be classified as a single object or a group of objects. The separation of foreground and background is also disturbed because there is no direct access to the moving objects.
Typically, each video frame is split into foreground and background to obtain the moving objects, depending on the pixel or color distribution. In the literature [15], [16], researchers proposed a variety of foreground extraction methods. In the widely used frame differencing method, motion is judged from the gray level difference between adjacent video frames. Since it does not model the background, it is easy to use, but local motion in the background makes it prone to noise in complex scenes. Meanwhile, the Gaussian mixture model (GMM) [17] is very powerful, but it requires many frames for background modeling and updating, which implies high computational complexity and difficulty handling video frames with varying brightness, frequent motion and occlusion.
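The frame differencing idea described above can be sketched in a few lines. This is a minimal NumPy illustration with a hypothetical threshold of 25 gray levels; real systems add filtering and morphological cleanup:

```python
import numpy as np

def frame_difference_mask(prev_gray, curr_gray, threshold=25):
    """Flag pixels as moving where the absolute gray-level
    difference between adjacent frames exceeds the threshold."""
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return diff > threshold

# toy example: a bright 2x2 block "moves" one pixel to the right
prev = np.zeros((6, 6), dtype=np.uint8)
curr = np.zeros((6, 6), dtype=np.uint8)
prev[2:4, 1:3] = 200
curr[2:4, 2:4] = 200
mask = frame_difference_mask(prev, curr)
```

Note that the overlapping column of the block produces no difference, which is exactly the aperture weakness of frame differencing noted above: only the leading and trailing edges of a uniformly colored object are detected.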
After the acquisition process, region associations can be obtained with connected component labeling algorithms. By scanning the input image several times, labeling techniques [18] assign a label to each pixel and separate the targets according to their labels. For faster detection, accelerated labeling methods [19] were introduced. However, these algorithms still consume considerable computational resources and are unable to merge a fragmented object in the presence of noise.
In [23], Ranganarayana et al. presented two efficient algorithms for motion detection in low resolution videos captured from surveillance data, so as to provide privacy. These algorithms, however, have their own drawbacks. First, the standard FCM algorithm does not consider any spatial information in the image context, which makes it sensitive to noise and other imaging artifacts. Second, the mean shift algorithm is based only on the colour feature, so its performance is limited under partial occlusion.
In [24], the authors made a detailed comparative study of various approaches, datasets and applications for human identification in low resolution videos, which reveals a number of facts and open challenges to be addressed. Methods such as spatio-temporal analysis and optical flow analysis, which play a vital role and can be used in various applications, were also discussed.

Proposed Methodology
The proposed framework comprises the following stages: dividing the video into frames, background subtraction, feature extraction, feature processing, and finally classification and human recognition. The stages and the associated techniques are presented in figure 1.
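As a small illustration of the first stage, the RGB-to-gray conversion applied to each frame can be sketched as follows. This is a minimal NumPy version using the standard ITU-R BT.601 luma weights; in practice the frames would be read from the dataset video files with a library such as OpenCV:

```python
import numpy as np

def rgb_to_gray(frame_rgb):
    """Convert an RGB frame of shape (H, W, 3) to gray scale using
    the ITU-R BT.601 luma weights, as done before background
    subtraction."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(frame_rgb.astype(np.float64) @ weights).astype(np.uint8)

# dummy 2-frame "video": a pure-red frame and a pure-white frame
video = [
    np.full((4, 4, 3), [255, 0, 0], dtype=np.uint8),     # red
    np.full((4, 4, 3), [255, 255, 255], dtype=np.uint8),  # white
]
gray_frames = [rgb_to_gray(f) for f in video]
```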

A. Dataset
One of the most important datasets for advancing the vision community is the VIRAT video dataset. The dataset is designed to be realistic and challenging for video processing, as it provides low resolution videos on which the person recognition task can be performed. The VIRAT dataset provides video sequences of length from 0.5 s to 5 min for experimental evaluation. In this paper, we have considered 10000 video sequences with a frame rate of 30 frames/sec. The frame size is 1280 pixels in width and 720 pixels in height [22].
The KTH database [23] is one of the largest and most commonly used benchmarks for the classification of human actions. The database contains activities such as walking, running and jogging, performed by several subjects in different situations. The dataset consists of 600 videos covering six different actions in the four situations mentioned. Every sequence has a uniform background and was recorded with a static camera at 25 fps (frames per second). Each video sequence is reduced to 5 s in length. The frame size is 160 pixels in width and 120 pixels in height.
The Soccer dataset consists of four activities: kicking, running, walking and dribbling. The dataset contains 255 videos. Each video sequence is reduced to 3 s in length at 28 frames per second. The frame size is 150 pixels in width and 150 pixels in height.

B. Background Subtraction (BS)
The videos obtained from the datasets are divided into frames, on which background subtraction is performed. The construction of the background model is one of the important steps in background subtraction, and the performance of BS depends strongly on the features selected for background modeling. Here, background subtraction is done using the local binary pattern (LBP) operator.
The LBP operator takes the difference between the center pixel and its neighbouring pixels, which helps in extracting visual descriptors. Mathematically,

LBP_{P,R}(x_c, y_c) = sum_{p=0}^{P-1} s(g_p - g_c) * 2^p,  where s(x) = 1 if x >= 0 and 0 otherwise   (1)

Here R is the radius, P is the number of sampling points, g_p is the gray value of the p-th neighbour pixel and g_c is the gray value of the center pixel. The center coordinates (x_c, y_c) are taken as (0, 0), so the p-th neighbour lies at (R cos(2*pi*p/P), -R sin(2*pi*p/P)). In the process of BS, the first frame of the video is used as the background model, and for every pixel in a frame the LBP is calculated using equation 1. The LBP of each subsequent frame is compared with the LBP of the background model: if the value matches the background model LBP, the pixel is classified as background; otherwise it is classified as foreground.
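A minimal sketch of this LBP-based background/foreground labelling for the common P = 8, R = 1 configuration is given below. It is a simplified illustration of equation 1 with pixel-aligned neighbours; the actual implementation may differ in neighbour interpolation and comparison rule:

```python
import numpy as np

def lbp_8_1(gray):
    """8-neighbour, radius-1 LBP code for every interior pixel:
    each neighbour >= centre contributes one bit of an 8-bit code."""
    g = gray.astype(np.int16)
    h, w = g.shape
    c = g[1:-1, 1:-1]
    # neighbour offsets, ordered so each gets a fixed bit weight 2**p
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for p, (dy, dx) in enumerate(offsets):
        neigh = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code += ((neigh - c) >= 0).astype(np.int16) << p
    return code

def foreground_mask(background_gray, frame_gray):
    """Pixels whose LBP code differs from the background model's
    code are labelled foreground."""
    return lbp_8_1(background_gray) != lbp_8_1(frame_gray)

# toy example: flat background, one bright pixel appears in the frame
background = np.zeros((5, 5), dtype=np.uint8)
frame = background.copy()
frame[2, 2] = 200
mask = foreground_mask(background, frame)
```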

C. Feature Extraction
The motion of objects in a video sequence is characterized using optical flow estimation, a widely used technique in tracking systems and object detection. Here, HOF features are extracted from the motion range images, which capture the dynamic information contained in the sequence. The algorithm also evaluates the flow vector fields, which are very useful for identifying the spatial movement of image content over time and provide the necessary information for motion analysis [20].
In order to identify the required object in the low resolution videos, the optical flow constraint is evaluated. Assuming brightness constancy, I(x, y, t) = I(x + dx, y + dy, t + dt), and taking a first-order expansion gives

I_x u + I_y v + I_t = 0

where I_x, I_y and I_t are the partial derivatives of the image intensity and (u, v) is the flow vector at a pixel.

Fig2. Process of Optical Flow motion Estimation
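The HOF binning step can be sketched as follows, assuming a dense flow field is already available (in practice it would come from an optical flow estimator such as OpenCV's Farneback method; the bin count of 8 is an illustrative choice):

```python
import numpy as np

def hof_descriptor(flow, n_bins=8):
    """Histogram of Optical Flow: bin each flow vector's orientation
    into n_bins sectors, weighted by its magnitude, then L1-normalise."""
    fx, fy = flow[..., 0], flow[..., 1]
    mag = np.hypot(fx, fy)
    ang = np.arctan2(fy, fx) % (2 * np.pi)           # range 0 .. 2*pi
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

# toy flow field: every pixel moves one unit to the right
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
h = hof_descriptor(flow)
```

Because every vector points along the positive x axis, all the magnitude mass falls into the first orientation bin.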
The eigenvalues of the features are then determined, based on the number of blocks per frame and the number of frames per volume. The changes observed in the pixels along a particular direction are used to compute the eigenvalues; the direction of greatest change corresponds to the dominant eigenvector. These eigenvalues and eigenvectors are calculated in every direction for all the frames in the video, and the resulting eigen features help in identifying the person.
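The eigen analysis of the motion vectors within one block can be sketched as below. This is a simplified NumPy illustration; the block size and the covariance-based construction are assumptions made for the sketch:

```python
import numpy as np

def block_eigen_features(vectors):
    """Eigenvalues/eigenvectors of the 2x2 covariance of the motion
    vectors inside one block; the eigenvector of the largest
    eigenvalue gives the dominant motion direction."""
    v = vectors.reshape(-1, 2)
    cov = np.cov(v, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return vals, vecs

# toy block where motion varies only along the x axis
rng = np.random.default_rng(0)
block = np.zeros((16, 2))
block[:, 0] = rng.normal(size=16)      # x components vary, y stays 0
vals, vecs = block_eigen_features(block)
```

Here the larger eigenvalue captures the variation along x, and its eigenvector (the last column returned by `eigh`) points along the x axis, i.e. the direction of greatest change described above.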

D. Feature Processing
In this preprocessing step, a modified particle swarm optimization (MPSO) is used to optimize the features obtained from the feature extraction process. Feature subsets are selected randomly using the modified PSO, and each randomly selected subset is evaluated with the fitness function given in equation 2.
K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2),  gamma > 0   (2)

where K is the radial basis function (RBF) SVM kernel, x_i and x_j are training vectors (x_i != x_j) and gamma is the kernel parameter. The velocity and position of each particle are updated as

v_i(t+1) = b * v_i(t) + c1 * r1 * (P_i - x_i(t)) + c2 * r2 * (G - x_i(t))   (3)
x_i(t+1) = x_i(t) + v_i(t+1)   (4)

where b is linearly decreased from 2 to 0 over the course of the iterations, r1 and r2 are random vectors, P_i is the best position found by particle i, and G is the global best position. To generate a better position, the information of the best position P is used in PSO; here the best position corresponds to the best feature subset.

E. Classification Technique
The features obtained with the optimization technique are divided into training and testing data, and the data is processed for n iterations. The use of a machine learning technique in the final stage helps to improve the recognition system. In this paper, a support vector machine (SVM) classifier is proposed. SVM classifiers use a hyperplane to perform their operation: the dataset frames are trained with both positive and negative samples, which are separated by the hyperplane, and the SVM then makes a decision on the test dataset. Parameters are selected based on K-fold cross-validation. The dataset is divided into 70% for training and 30% for testing using an SVM with a non-linear RBF kernel, and person identification is thereby performed. The performance metrics are evaluated from the SVM output.
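The feature-optimization loop can be sketched as follows. This is a simplified, self-contained illustration: the RBF kernel follows equation 2, but the swarm update is reduced to a binary attraction toward the global best, and a class-separability score with a hypothetical per-feature penalty stands in for the SVM cross-validation fitness used in the paper:

```python
import numpy as np

def rbf_kernel(xi, xj, gamma=0.5):
    """RBF kernel K(xi, xj) = exp(-gamma * ||xi - xj||^2), gamma > 0."""
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def fitness(X, y, mask):
    """Toy fitness: squared distance between class means over the
    selected features, minus a penalty per selected feature (a
    stand-in for the SVM cross-validation fitness)."""
    if mask.sum() == 0:
        return -np.inf
    Xs = X[:, mask]
    mu0, mu1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    return float(np.sum((mu0 - mu1) ** 2)) - 0.5 * mask.sum()

def pso_select(X, y, n_particles=8, n_iters=20, seed=0):
    """Binary-PSO-flavoured search: each particle is a feature mask;
    particles drift toward the global best mask by copying its bits
    with a probability that grows as the iterations proceed."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    particles = rng.random((n_particles, d)) < 0.5
    gbest, gfit = particles[0].copy(), fitness(X, y, particles[0])
    for t in range(n_iters):
        attract = 0.3 + 0.5 * t / n_iters   # pull toward gbest over time
        for i in range(n_particles):
            copy = rng.random(d) < attract
            particles[i] = np.where(copy, gbest, rng.random(d) < 0.5)
            f = fitness(X, y, particles[i])
            if f > gfit:
                gbest, gfit = particles[i].copy(), f
    return gbest

# toy data: features 0 and 1 separate the classes, 2 and 3 are noise
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, 0] += 3.0
X[y == 1, 1] += 3.0
mask = pso_select(X, y)
```

With this setup the search retains the informative features while the penalty discourages carrying the noise features along, mirroring the goal of eliminating redundant information before the SVM stage.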

Results and Discussion
The proposed methodology is evaluated on the KTH, Soccer and VIRAT datasets. The accuracy obtained for the KTH dataset using the optimization technique is 90.51%, which is 5% higher than without the optimization technique.

Conclusion
The problem of person recognition in low resolution videos is studied in this paper as part of our research work. By using optical flow motion estimation, eigen features and particle swarm optimization, the underlying person recognition problem has been improved. Feature extraction plays a key role in identifying the person, and the optimization technique helps in obtaining the globally best features. The improvement is shown through the calculated parameters. The proposed methodology was executed on the KTH, VIRAT and Soccer datasets, and its accuracy was evaluated on each of them; on the Soccer dataset, which contains action videos, person recognition was performed as well. The proposed PSO-OFA technique outperforms the earlier method. In future work, more features can be combined with the existing ones to further improve the performance of human recognition.