Browsing by Author "Naidoo, Bashan."
Now showing 1 - 13 of 13
Item Achieving diversity through collaborative planning in mixed use precincts: a case study of Florida Road, Durban. (2015) Cele, Desiree Noshipo.; Naidoo, Bashan.
Entertainment precincts are typically packed with a mix of retail, art galleries, nightclubs, restaurants and even places of worship. The mix of people and land uses creates a diverse and dynamic area which has economic and social benefits. However, for an entertainment precinct like Florida Road in Durban there seems to be difficulty in communicating some of the resulting social issues. The unclear role of stakeholder input renders communication in the precinct inefficient. This study therefore sought to understand the social impact of mixed use development using Florida Road as the case study. Both quantitative and qualitative research approaches were employed, using a sample of 66 respondents, firstly to capture the everyday life perspectives of the residents and the visitors/users of Florida Road; secondly to examine and clarify the extent of the social impact resulting from the changes in patterns of land use; and finally to understand the processes followed to redevelop Florida Road. A land use survey of the Florida Road corridor, when compared with the land use pattern in 2007, revealed some changes in the land use pattern and the introduction of high-intensity land use activity, such as nightclubs, in close proximity to residential land uses. The results from the surveys and interviews with property owners, business management, the precinct manager and municipal officials showed that while precinct management has made commendable physical progress since its inception in 2012, there are underlying challenges. This study argues for the need for closer collaboration and examines available knowledge that could assist in guiding and analysing stakeholder engagement, bureaucratic fragmentation and citizen participation in South African spatial planning.
The case study appraises consistent collaborative planning in decision-making processes in order to enable communities and local government to communicate effectively without squandering opportunities to diversify.

Item Adaptive segmentation and patch optimization for multi-view stereo reconstruction. (2015) Khuboni, Ray Leroy.; Naidoo, Bashan.
This dissertation presents two main contributions towards the Patch-based Multi-View Stereo (PMVS) algorithm. Firstly, we present an adaptive segmentation method for preprocessing input data to the PMVS algorithm. This method applies a specially developed grayscale transformation to the input to redefine the intensity histogram. The Nelder-Mead (NM) simplex method is used to adaptively locate an optimised segmentation threshold point in the modified histogram. The transformed input image is then segmented, using the acquired threshold value, into foreground and background data. This segmentation information is applied to the patch-based method to exclude background artefacts. The results acquired indicated a reduction in cumulative error whilst achieving comparable reconstructions with reduced time and space complexity. Secondly, two improvements are made to the patch optimisation stage: both the optimisation method and the photometric discrepancy function are changed. A classical quasi-Newton BFGS method with stochastic objectives is used to incorporate curvature information into the stochastic optimisation method. The BFGS method is modified to introduce stochastic gradient differences, whilst regularising the Hessian approximation matrix to ensure a well-conditioned matrix. The proposed method is employed to solve the optimisation of newly generated patches, refining the 3D geometric orientation and depth information with respect to each patch's visible set of images.
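The adaptive threshold search can be illustrated with a short sketch. Here SciPy's Nelder-Mead simplex method minimises a generic within-class-variance objective over a toy bimodal intensity distribution; both the objective and the synthetic data are illustrative stand-ins, not the thesis's specially developed grayscale transformation.

```python
import numpy as np
from scipy.optimize import minimize

def within_class_variance(t, pixels):
    # Illustrative objective: weighted intra-class variance at threshold t.
    fg = pixels[pixels >= t[0]]
    bg = pixels[pixels < t[0]]
    if fg.size == 0 or bg.size == 0:
        return np.inf
    return (fg.size * fg.var() + bg.size * bg.var()) / pixels.size

def adaptive_threshold(image):
    # Nelder-Mead simplex search for a segmentation threshold.
    pixels = image.astype(np.float64).ravel()
    result = minimize(within_class_variance, x0=[pixels.mean()],
                      args=(pixels,), method="Nelder-Mead")
    return float(result.x[0])

# Synthetic bimodal "image": dark background and bright foreground.
rng = np.random.default_rng(0)
image = np.concatenate([rng.normal(60, 5, 500), rng.normal(180, 5, 500)])
threshold = adaptive_threshold(image)
foreground = image >= threshold   # mask handed on to patch generation
```

The resulting foreground mask is what would exclude background artefacts from subsequent patch generation.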
We redefine the photometric discrepancy function to incorporate a specially developed feature space in order to address the problem of specular highlights in image datasets. With this modification, we are able to incorporate curvature information from those patches which were previously discarded from the refinement process due to their low correlation scores. With those patches contributing towards the refinement algorithm, we are able to represent the surface of the reconstructed object or scene more accurately. This new feature space is also used in image feature detection to realise more features. From the results, we noticed a reduction in the cumulative error and obtained reconstructions that are denser and more complete than the baseline.

Item Artificial intelligence based design optimization for improving diversity in wireless links. (2021) Solwa, Shaheen.; Naidoo, Bashan.; Quazi, Tahmid Al-Mumit.
Abstract available in PDF.

Item Gaussian mixture model classifiers for detection and tracking in UAV video streams. (2017) Pillay, Treshan.; Naidoo, Bashan.
Manual visual surveillance systems are subject to a high degree of human error and operator fatigue. The automation of such systems often employs detectors, trackers and classifiers as fundamental building blocks. Detection, tracking and classification are especially useful, and challenging, in Unmanned Aerial Vehicle (UAV) based surveillance systems. Previous solutions have addressed these challenges via complex classification methods. This dissertation proposes less complex Gaussian Mixture Model (GMM) based classifiers that simplify the process: data is represented as a reduced set of model parameters, and classification is performed in the low-dimensionality parameter space. The specification and adoption of GMM based classifiers on the UAV visual tracking feature space forms the principal contribution of the work. This methodology can be generalised to other feature spaces.
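A minimal sketch of such a classifier fits one Gaussian mixture per class and labels samples by maximum log-likelihood. This is an illustrative stand-in using scikit-learn and synthetic four-dimensional features, not the dissertation's actual HoG pipeline:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class GMMClassifier:
    # One Gaussian mixture per class; predicts the class whose mixture
    # assigns the highest log-likelihood to a sample.
    def __init__(self, n_components=2):
        self.n_components = n_components
        self.models = {}

    def fit(self, X, y):
        for label in np.unique(y):
            self.models[label] = GaussianMixture(
                n_components=self.n_components, random_state=0).fit(X[y == label])
        return self

    def predict(self, X):
        labels = sorted(self.models)
        scores = np.column_stack(
            [self.models[l].score_samples(X) for l in labels])
        return np.asarray(labels)[scores.argmax(axis=1)]

# Synthetic stand-in for a descriptor space (e.g. HoG-like vectors).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)), rng.normal(5.0, 1.0, (100, 4))])
y = np.array([0] * 100 + [1] * 100)
accuracy = (GMMClassifier().fit(X, y).predict(X) == y).mean()
```

Because each class is reduced to a handful of mixture parameters, classification stays cheap even when the underlying descriptors are high-dimensional.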
This dissertation presents two main contributions in the form of submissions to ISI-accredited journals. In the first paper, the objectives are demonstrated with a vehicle detector incorporating a two-stage GMM classifier applied to a single feature space, namely the Histogram of Oriented Gradients (HoG). The second paper demonstrates the objectives with a vehicle tracker using colour histograms (in RGB and HSV), GMM classifiers and a Kalman filter. The proposed works are compared to related works, with testing performed on benchmark datasets. In the tracking domain for such platforms, tracking alone is insufficient: adaptive detection and classification can assist in search-space reduction, the building of knowledge priors and improved target representations. Results show that the proposed approach improves performance and robustness. Findings also indicate potential further enhancements, such as a multi-mode tracker with global and local tracking based on a combination of both papers.

Item Human motion reconstruction from video sequences with MPEG-4 compliant animation parameters. (2005) Carsky, Dan.; Naidoo, Bashan.; McDonald, Stephen A.
The ability to track articulated human motion in video sequences is essential for applications ranging from biometrics and virtual reality to human-computer interfaces and surveillance. The work presented in this thesis focuses on tracking and analysing human motion in terms of MPEG-4 Body Animation Parameters, in the context of a model-based coding scheme. Model-based coding has emerged as a potential technique for very low bit-rate video compression. This study emphasises motion reconstruction rather than photorealistic human body modelling; consequently a 3-D skeleton with 31 degrees-of-freedom was used to model the human body. Compression is achieved by analysing the input images in terms of the known 3-D model and extracting parameters that describe the relative pose of each segment.
These parameters are transmitted to the decoder, which synthesises the output by transforming the default model into the correct posture. The problem comprises two main aspects: 3-D human motion capture and pose description. The goal of the 3-D human motion capture component is to generate 3-D locations of key joints on the human body without the use of special markers or sensors placed on the subject. The input sequence is acquired by three synchronised and calibrated CCD cameras. Digital image matching techniques, including cross-correlation and least squares matching, are used to find spatial correspondences between the multiple views, as well as temporal correspondences in subsequent frames, with sub-pixel accuracy. The tracking algorithm automates the matching process by examining each matching result and adaptively modifying the matching parameters. Key points must be manually selected in the first frame, following which the tracking proceeds without user intervention, employing the recovered 3-D motion of the skeleton model to predict future states. Epipolar geometry is exploited to verify spatial correspondences in each frame before the 3-D locations of all joints are computed through triangulation to construct the 3-D skeleton. The pose of the skeleton is described by the MPEG-4 Body Animation Parameters. The subject's motion is reconstructed by applying the animation parameters to a simplified version of the default MPEG-4 skeleton. The tracking algorithm may be adapted to 2-D tracking in monocular sequences; an example of 2-D tracking of facial expressions demonstrates the flexibility of the algorithm. Further results involving tracking separate body parts demonstrate the advantage of multiple views and the benefit of camera calibration, which simplifies the generation of 3-D trajectories and the estimation of epipolar geometry.
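The triangulation step, recovering a 3-D joint location from matched 2-D points in calibrated views, can be sketched with the standard linear (DLT) method. The toy projection matrices below are illustrative, not the thesis's calibration:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    # Linear (DLT) triangulation of one point from two calibrated views.
    # P1, P2: 3x4 projection matrices; x1, x2: pixel coordinates (u, v).
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)     # null-space vector is the homogeneous point
    X = Vt[-1]
    return X[:3] / X[3]

# Two toy cameras: identity view and a camera translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X_est = triangulate(P1, P2, x1, x2)
```

With three cameras, as in the thesis, each extra view simply contributes two more rows to the same linear system.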
The overall system is tested on a walking sequence where full-body motion capture is performed and all 31 degrees-of-freedom of the tracked model are extracted. Results show adequate motion reconstruction (i.e. convincing to most human observers), with slight deviations due to a lack of knowledge of the volumetric properties of the human body.

Item An improved randomization of a multi-blocking JPEG based steganographic system. (2010) Dawoud, Peter Dawoud Shenouda.; Peplow, Roger Charles Samuel.; Naidoo, Bashan.
Steganography is classified as the art of hiding information. In a digital context, this refers to our ability to hide secret messages within innocent digital cover data. The digital domain offers many possible cover mediums, such as cloud based hiding (saving secret information within the internet and its structure), image based hiding, video and audio based hiding, text based documents, as well as the potential of hiding within any set of compressed data. This dissertation focuses on the image based domain and investigates currently available image based steganographic techniques. After a review of the history of the field, and a detailed survey of currently available JPEG based steganographic systems, the dissertation focuses on the systems currently considered to be secure and introduces the mechanisms that have been developed to detect them. The dissertation presents a newly developed system that is designed to counteract the current weakness in the YASS JPEG based steganographic system. By introducing two new levels of randomization to the embedding process, the proposed system offers security benefits over YASS. The introduction of randomization to both the B-block sizes and the E-block sizes used in the embedding process aids in increasing security, and the potential for new, larger E-block sizes also provides an increased set of candidate coefficients to be used for embedding.
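As an illustration of embedding in quantised DCT coefficients of a block, the sketch below hides bits in coefficient parity. The block size, band positions and quantisation step are hypothetical and do not reproduce YASS's or the proposed system's actual embedding rule:

```python
import numpy as np
from scipy.fft import dctn, idctn

# Hypothetical mid-frequency positions inside an 8x8 block.
MID_BAND = [(1, 2), (2, 1), (2, 2), (1, 3), (3, 1)]

def embed_bits(block, bits, q=16.0):
    # Hide bits in the parity of quantised mid-frequency DCT coefficients.
    coeffs = dctn(block, norm="ortho")
    for (i, j), bit in zip(MID_BAND, bits):
        level = int(np.round(coeffs[i, j] / q))
        if (level & 1) != bit:              # force parity to match the bit
            level += 1 if level >= 0 else -1
        coeffs[i, j] = level * q
    return idctn(coeffs, norm="ortho")

def extract_bits(block, n, q=16.0):
    coeffs = dctn(block, norm="ortho")
    return [int(np.round(coeffs[i, j] / q)) & 1
            for (i, j), _ in zip(MID_BAND, range(n))]

rng = np.random.default_rng(2)
cover = rng.uniform(0, 255, (8, 8))
bits = [1, 0, 1, 1, 0]
stego = embed_bits(cover, bits)
recovered = extract_bits(stego, len(bits))
```

In a real JPEG pipeline the quantisation step and subsequent compression introduce the bit errors the dissertation measures per B-block.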
The dissertation also introduces a new embedding scheme which focuses on hiding in medium-frequency coefficients. By hiding in these medium-frequency coefficients, we allow for more aggressive embedding without risking more visual distortion, but trade this off against a risk of higher error rates due to compression losses. Finally, the dissertation presents simulations aimed at testing the proposed system's performance compared to other JPEG based steganographic systems with similar embedding properties. We show that the new system achieves an embedding capacity of 1.6, around a seven-fold improvement over YASS. We also show that the new system, although introducing more bits in error per B-block, successfully allows for the embedding of up to 2 bits per B-block more than YASS at a similar error rate per B-block. We conclude the results by demonstrating the new system's ability to resist detection, both by human observation (via a survey) and by computer-aided analysis.

Item Investigating the combined appearance model for statistical modelling of facial images. (2007) Allen, Nicholas Peter Legh.; Naidoo, Bashan.; McDonald, Stephen A.
The combined appearance model is a linear, parameterized and flexible model which has emerged as a powerful tool for representing, interpreting and synthesizing the complex, non-rigid structure of the human face. The inherent strength of this model arises from the utilization of a representative training set which provides a priori knowledge of the allowable appearance variation of the face. The model was introduced by Edwards et al. in 1998 as part of the Active Appearance Model framework, a template alignment algorithm which used the model to automatically locate deformable objects within images. Since this debut, the model has been utilized within a plethora of applications relating to facial image processing.
In essence, the appearance model combines individual statistical models of shape and texture variation in order to produce a single model of the correlations between shape and texture. In the context of facial modelling, this approach produces a model which is flexible in that it can accommodate the range of variation found in the face, specific in that it is restricted to facial instances only, and compact in that a new facial instance may be synthesized using a small set of parameters. It is additionally this compactness which makes it a candidate for model-based video coding. Methods used in the past to model faces are reviewed and the capabilities of the statistical model in general are investigated. Various approaches to building the intermediate linear Point Distribution Models (PDMs) and grey-level models are outlined and an approach decided upon for implementation. The respective statistical models for the Informatics and Modelling (IMM) and Extended Multi-Modal Verification for Teleservices and Securities (XM2VTS) facial databases are built using MATLAB in an approach incorporating Procrustes Analysis, affine transform warping and Principal Components Analysis. The MATLAB implementation's integrity was validated against a similar approach encountered in the literature and found to produce results within 0.59%, 0.69% and 0.69% of those published for the shape, texture and combined models respectively. The models are consequently assessed with regard to their flexibility, specificity and compactness.
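The core of the shape and grey-level models, PCA applied to a centred training set so a new instance can be synthesised from a small parameter vector, can be sketched as follows. The toy training data stands in for Procrustes-aligned landmark shapes:

```python
import numpy as np

def build_pca_model(shapes, var_fraction=0.98):
    # Fit a linear model x ~ mean + P @ b keeping var_fraction of variance.
    mean = shapes.mean(axis=0)
    _, S, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
    explained = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(explained, var_fraction)) + 1
    return mean, Vt[:k].T

def synthesise(mean, P, b):
    # Generate a new instance from a small parameter vector b.
    return mean + P @ b

# Toy training set: 50 "shapes" of 20 coordinates varying in 2 directions.
rng = np.random.default_rng(3)
basis = rng.normal(size=(2, 20))
shapes = rng.normal(size=(50, 2)) @ basis + 100.0
mean, P = build_pca_model(shapes)
new_shape = synthesise(mean, P, np.ones(P.shape[1]))
```

The combined appearance model applies this same reduction a second time, to concatenated shape and texture parameters, which is where the cross-correlations are captured.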
The results demonstrate the model's ability to be successfully constrained to the synthesis of "legal" faces, to successfully parameterize and re-synthesize new unseen images from outside the training sets, and to significantly reduce the high dimensionality of input facial images, producing a powerful, compact model.

Item Markerless pose tracking of a human subject. (2012) Hendry, Neil.; Naidoo, Bashan.
High-capacity wireless and fixed-line broadband services have a relatively small footprint over South Africa's vast expanse. As a result, many rural areas, as well as deployed military communications, rely on low-bandwidth communication networks instead, making live video communication over these links impractical. Traditional and advanced data compression methods cannot produce the payload reduction required for video use over these bandwidths. Instead, a model-based vision system is used to address this problem. This is not video compression but rather image understanding and representation in the context of prior models of the observed object. Markerless human tracking and pose recovery are the specific interests of this research. Markerless human pose tracking is a relatively new and growing field of image processing. It has many potential areas of application apart from low-bandwidth video communication, including the medical field, the sporting arena, security and surveillance, and human-machine interaction. As multimedia technologies continue to grow and improve, pose tracking systems have the potential to be used more and more. While a few markerless tracking devices are beginning to emerge, many currently available commercial motion capture systems require the use of a special suit and markers or sensors, making them impractical for everyday, anywhere use. Current research in computer vision and image processing places significant focus on the development of markerless approaches to human motion capture.
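Model-based pose recovery of this kind typically scores a candidate pose by comparing the model's projection against observed image cues. A minimal sketch of one such score, a silhouette-overlap cost on binary masks, is given below; it is illustrative, not the dissertation's actual objective function:

```python
import numpy as np

def silhouette_cost(observed, projected):
    # Fraction of pixels where the projected model silhouette disagrees
    # with the observed silhouette; 0 means perfect overlap.
    return np.logical_xor(observed.astype(bool), projected.astype(bool)).mean()

# Toy silhouettes: the model projected at the correct and at a shifted pose.
observed = np.zeros((32, 32), bool); observed[8:24, 8:24] = True
correct_pose = np.zeros((32, 32), bool); correct_pose[8:24, 8:24] = True
shifted_pose = np.zeros((32, 32), bool); shifted_pose[12:28, 12:28] = True
```

An optimiser would then search the body-model parameters to minimise this cost across all camera views.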
This dissertation looks at a complete markerless human pose tracking system which can be split into four distinct but interlinked stages: image capture, image processing, body modelling and optimisation. After video data from multiple camera views is captured, the processing stage extracts image cues such as silhouettes, 2-D edges and a 3-D colour volumetric reconstruction. Following the basic principle of a model-based approach, a 24 degree-of-freedom superellipsoid body model is fitted to the observed image cue data, with an objective function used to measure the closeness of the match. A number of different optimisation approaches are examined for refining and finding the best-fitting body pose for each image frame. These approaches are all based on Stochastic Meta-Descent (SMD) optimisation: SMD by itself, SMD in a hierarchical approach, SMD with pose prediction, and Smart Particle Filtering (SMD inside a particle filter framework) are all explored. The performance of the system with the various optimisation approaches is tested using the HumanEvaII datasets. These datasets contain a number of different subjects performing a variety of actions while wearing ordinary clothes. They include marker-based ground-truth data obtained using a ViconPeak motion capture system, which allows a relative error measurement of the predicted poses to be calculated. With its robustness to clutter and occlusion, the Smart Particle Filter approach is shown to give the best results.

Item Multimodal enhancement-fusion technique for natural images. (2018) Maharaj, Rivania.; Naidoo, Bashan.
This dissertation presents a multimodal enhancement-fusion (MEF) technique for natural images. The MEF is expected to contribute value to machine vision applications and personal image collections for the human user. Image enhancement techniques, and the metrics used to assess their performance, are prolific, and each is usually optimised for a specific objective.
The MEF proposes a framework that adaptively fuses multiple enhancement objectives into a seamless pipeline. Given a segmented input image and a set of enhancement methods, the MEF applies all the enhancers to the image in parallel. The most appropriate enhancement in each image segment is identified, and finally the differentially enhanced segments are seamlessly fused. To begin with, this dissertation studies targeted contrast enhancement methods and performance metrics that can be utilised in the proposed MEF. It addresses a selection of objective assessment metrics for contrast-enhanced images and determines their relationship with the subjective assessment of human visual systems, in order to identify which objective metrics best approximate human assessment and may therefore be used as an effective replacement for tedious human assessment surveys. A human visual assessment survey is conducted on the same dataset to ascertain image quality as perceived by a human observer. The interrelated concepts of naturalness and detail were found to be key motivators of human visual assessment. Findings show that when assessing the quality or accuracy of these methods, no single quantitative metric correlates well with human perception of naturalness and detail; however, a combination of two or more metrics may be used to approximate the complex human visual response. Thereafter, this dissertation proposes the multimodal enhancer that adaptively selects the optimal enhancer for each image segment. MEF focuses on improving chromatic irregularities such as poor contrast distribution. It deploys a concurrent enhancement pathway that subjects an image to multiple image enhancers in parallel, followed by a fusion algorithm that creates a composite image combining the strengths of each enhancement path.
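The concurrent enhance-assess-fuse pipeline can be sketched as below. The enhancers, the two-segment "segmentation" and the standard-deviation contrast metric are all illustrative placeholders for the dissertation's actual components:

```python
import numpy as np

def fuse_enhancements(image, segments, enhancers, metric=np.std):
    # Apply every enhancer to the whole image, score each enhanced version
    # per segment, and keep the best-scoring version in each segment.
    enhanced = [f(image) for f in enhancers]          # concurrent pathway
    output = np.empty_like(image)
    for mask in segments:
        scores = [metric(version[mask]) for version in enhanced]
        output[mask] = enhanced[int(np.argmax(scores))][mask]
    return output

# Toy image: a dark left half and a bright right half, plus a mild gradient.
image = np.hstack([np.full((4, 4), 20.0), np.full((4, 4), 200.0)])
image += np.tile(np.linspace(0.0, 9.0, 8), (4, 1))
segments = [np.hstack([np.ones((4, 4), bool), np.zeros((4, 4), bool)]),
            np.hstack([np.zeros((4, 4), bool), np.ones((4, 4), bool)])]
enhancers = [lambda im: np.clip(im * 3.0, 0.0, 255.0),                    # lifts shadows
             lambda im: 255.0 - np.clip((255.0 - im) * 3.0, 0.0, 255.0)]  # lifts highlights
fused = fuse_enhancements(image, segments, enhancers)
```

Each enhancer wins in the segment it suits (the shadow-lifter in the dark half, the highlight-lifter in the bright half), which is exactly the per-segment selection the MEF describes.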
The study develops a framework for parallel image enhancement, followed by parallel image assessment and selection, leading to a final merging of selected regions from the enhanced set. The output combines desirable attributes from each enhancement pathway to produce a result that is superior to each path taken alone. The study showed that the proposed MEF technique performs well for most image types. MEF is subjectively favoured by a human panel and achieves better objective image quality assessment scores than other enhancement methods.

Item Parallel patch-based volumetric reconstruction from images. (2014) Jermy, Robert Sydney.; Naidoo, Bashan.; Tapamo, Jules-Raymond.
Three Dimensional (3D) reconstruction refers to the creation of 3D computer models from sets of Two Dimensional (2D) images. 3D reconstruction algorithms tend to have long execution times, meaning they are ill suited to real-time 3D reconstruction tasks. This is a significant limitation which this dissertation attempts to address. Modern Graphics Processing Units (GPUs) have become fully programmable and have spawned the field known as General Purpose GPU (GPGPU) processing. Using this technology it is possible to offload certain types of tasks from the Central Processing Unit (CPU) to the GPU. GPGPU processing is designed for problems with data parallelism: a task that can be split into many smaller tasks that run in parallel, the results of which are not dependent upon the order in which the tasks are completed. Therefore, to properly make use of both CPU parallelism and GPGPU processing, a 3D reconstruction algorithm with data parallelism was required. The selected algorithm was the Patch-Based Multi-View Stereopsis (PMVS) method, proposed and implemented by Yasutaka Furukawa and Jean Ponce.
This algorithm uses small oriented rectangular patches to model a surface and is broken into four major steps: feature detection, feature matching, expansion and filtering. The reconstructed patches are independent, and as such the algorithm is data parallel. Some segments of the PMVS algorithm were programmed for GPGPU and others for CPU parallelism. Results show that the feature detection stage runs 10 times faster on the GPU than the equivalent CPU implementation. The patch creation and expansion stages also benefited from GPU implementation, which brought a two-times improvement in execution time for large images, and equivalent execution times for small images, compared to the CPU implementation. These results show that the use of GPGPU and CPU parallelism can indeed improve the performance of this 3D reconstruction algorithm.

Item A structure from motion solution to head pose recovery for model-based video coding. (2005) Heathcote, Jonathan Michael.; Naidoo, Bashan.
Current hybrid coders such as H.261/263/264 or MPEG-1/-2 cannot always offer high quality-to-compression ratios for video transfer over the (low-bandwidth) wireless channels typical of handheld devices (such as smartphones and PDAs). Often these devices are utilised in videophone and teleconferencing scenarios, where the subjects of interest in the scene are people's faces. In these cases, an alternative coding scheme known as Model-Based Video Coding (MBVC) can be employed. MBVC systems for face scenes utilise geometrically and photorealistically accurate computer graphic models to represent head-and-shoulder views of people in a scene. High compression ratios are achieved at the encoder by extracting and transmitting only the parameters which represent the explicit shape and motion changes occurring on the face in the scene.
With some a priori knowledge (such as the MPEG-4 standard for facial animation parameters), the transmitted parameters can be used at the decoder to accurately animate the graphical model, and a synthesised version of the scene (originally appearing at the encoder) can be output. The primary components for facial re-animation at the decoder are a set of local and global motion parameters extracted from the video sequence appearing at the encoder. Local motion describes the changes in facial expression occurring on the face. Global motion describes the three-dimensional motion of the entire head as a rigid object; extraction of this three-dimensional global motion is often called head tracking. This thesis focuses on the tracking of rigid head pose in a monocular video sequence. The system framework utilises the recursive Structure from Motion (SfM) method of Azarbayejani and Pentland. Integral to the SfM solution are a large number of manually selected two-dimensional feature points, which are tracked throughout the sequence using an efficient image registration technique. The trajectories of the feature points are simultaneously processed by an extended Kalman filter (EKF) to stably recover the camera geometry and the rigid three-dimensional structure and pose of the head. To improve estimation accuracy and stability, adaptive estimation is harnessed within the Kalman filter by dynamically varying the noise associated with each of the feature measurements. A closed-loop approach is used to constrain feature tracking in each frame: the Kalman filter's estimates of the motion and structure of the face are used to predict the trajectories of the features, thereby constraining the search space for the next frame in the video sequence. Further robustness in feature tracking is achieved through the integration of a linear appearance basis to accommodate variations in illumination or changes in aspect on the face.
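The closed-loop predict-then-measure idea can be sketched with a small linear Kalman filter on a single feature point. The constant-velocity model, the noise levels and the 2-D state are simplifications of the full EKF over camera geometry and head structure:

```python
import numpy as np

class FeatureKalman:
    # Constant-velocity Kalman filter for one 2-D feature point.
    # State: [x, y, vx, vy]; the prediction centres the next search window.
    def __init__(self, x0, y0, q=1e-3, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4)
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0
        self.H = np.eye(2, 4)
        self.Q = q * np.eye(4)
        self.R = r * np.eye(2)   # measurement noise, can be varied adaptively

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]        # predicted feature position

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P

# Track a feature moving at a constant (1, 0.5) pixels/frame velocity.
kf = FeatureKalman(0.0, 0.0)
for t in range(1, 30):
    kf.predict()
    kf.update([t * 1.0, t * 0.5])
pred = kf.predict()   # search-window centre for the next frame
```

In the full system the predicted position constrains the image registration search in the next frame, and R is varied per feature to realise the adaptive estimation described above.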
Synthetic experiments are performed for both the SfM and the feature tracking algorithms. The accuracy of the SfM solution is evaluated against synthetic ground truth, and further experimentation demonstrates the stability of the framework under significant noise corruption of the arriving measurement data. The accuracy of the pixel measurements obtained by the feature tracking algorithm is also evaluated against known ground truth, and additional experiments confirm feature tracking stability despite significant changes in target appearance. Experiments with real video sequences illustrate the robustness of the complete head tracker to partial occlusions of the face. The SfM solution (including two-dimensional tracking) runs near real time at 12 Hz. The limits of pitch, yaw and roll (rotational) recovery are 45°, 45° and 90° respectively. Large translational recovery (especially in depth) is also demonstrated. The estimated motion trajectories are validated against (publicly available) ground-truth motion captured using a commercial magnetic orientation tracking system. Rigid re-animation of an overlaid wireframe face model is further used as a visually subjective analysis technique. These combined results confirm the suitability of the proposed head tracker as the global (rigid) motion estimator in an MBVC system.

Item Towards the development of an electronic nose. (2003) Naidoo, Bashan.; Broadhurst, Anthony D.
Electronic noses are targeted at determining odour character in a fashion that emulates conscious odour perception in mammals. The intention of this study was to develop an organisational framework for electronic noses and deploy a sample cheese odour discriminator within this framework. Biological olfactory systems are reviewed with the purpose of extracting the organisational principles that result in successful olfaction. Principles of gas handling, chemoreception and neural processing are considered in the formulation of an organisational framework.
An electronic nose is then developed in accordance with the biologically inspired framework. Gas sensing is implemented by an array of six commercially available (tin oxide) semiconductor sensors. These popular gas sensors are known to lack stability, necessitating hardware and signal processing measures to limit or compensate for instability. An odorant auto-sampler was developed to deliver measured amounts of odorant to the sensors in a synthetic air medium. Each measurement event encodes a simulated sniff, and is captured across six sensor channels over a period of 256 seconds at a sampling rate of 1 Hz. The simulated sniff captures sensor base references and responses to odorant introduction and removal. A technique is presented for representing and processing sensor-array data as a two-dimensional (2D) image where one dimension encodes time and the other encodes the multi-channel sensory outputs. The near-optimal, computationally efficient 2D Discrete Cosine Transform (DCT) is used to represent the 2D signal in a decorrelated frequency domain. Several coefficient selection strategies are proposed and tested, and a heuristic technique is developed for the selection of transform-domain coefficients as inputs to a non-linear neural network based classifier. The benefits of using the selection heuristic, as compared to standard variance-based selection, are evident in the results. Benefits include: significant dimensionality reduction with a concomitant reduction in classifier size and training time, improved generalisation by the neural network, and improved classification performance. The electronic nose produced a 99.1% classification rate across a set of seven different cheeses.

Item Volumetric reconstruction of rigid objects from image sequences. (2012) Ramchunder, Naren.; Naidoo, Bashan.
Live video communication over bandwidth-constrained ad-hoc radio networks necessitates high compression rates.
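Returning to the electronic nose's sensor-array representation, a brief sketch of 2-D DCT feature extraction is given below. The low-frequency selection rule here is a simple illustrative heuristic and the synthetic sniff is a stand-in; neither reproduces the thesis's developed heuristic or measured data:

```python
import numpy as np
from scipy.fft import dctn

def dct_features(sniff, k=10):
    # Decorrelate a (time x channel) sensor image with the 2-D DCT and keep
    # the k lowest-frequency coefficients (ordered by row + column index).
    coeffs = dctn(sniff, norm="ortho")
    order = sorted(np.ndindex(coeffs.shape), key=lambda ij: (ij[0] + ij[1], ij))
    return np.array([coeffs[ij] for ij in order[:k]])

# Synthetic "sniff": six channels rising at t=0 then decaying at different rates.
t = np.arange(256)
sniff = np.stack([np.exp(-t / (30.0 + 10.0 * c)) for c in range(6)], axis=1)
features = dct_features(sniff, k=10)   # compact input for a neural classifier
```

The 256x6 = 1536-sample sniff is reduced to a ten-element vector, illustrating the dimensionality reduction that shrinks classifier size and training time.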
To this end, a model-based video communication system that incorporates flexible and accurate 3D modelling and reconstruction is proposed in part. Model-based video coding (MBVC) is known to provide the highest compression rates, but usually compromises photorealism and object detail. High compression ratios are achieved at the encoder by extracting and transmitting only the parameters which describe changes to object orientation and motion within the scene. The decoder uses the received parameters to animate reconstructed objects within the synthesised scene. This is scene understanding rather than video compression. The 3D reconstruction of objects and scenes present at the encoder is the focus of this research. 3D reconstruction is accomplished by utilizing the Patch-based Multi-view Stereo (PMVS) framework of Yasutaka Furukawa and Jean Ponce. Surface geometry is initially represented as a sparse set of oriented rectangular patches obtained from matching feature correspondences in the input images. To increase reconstruction density these patches are iteratively expanded, and filtered using visibility constraints to remove outliers. Depending on the availability of segmentation information, there are two methods for initialising a mesh model from the reconstructed patches: the first initialises the mesh from the object's visual hull; the second initialises the mesh directly from the reconstructed patches. The resulting mesh is then refined by enforcing patch reconstruction consistency and regularisation constraints for each vertex on the mesh. To improve robustness to outliers, two enhancements to the above framework are proposed. The first uses photometric consistency during feature matching to increase the probability of selecting the correct matching point first. The second estimates the orientation of each patch such that its photometric discrepancy score across its visible set of images is minimised prior to optimisation.
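The photometric consistency idea, scoring how similarly a patch projects into its visible images, can be sketched with a normalised cross-correlation (NCC) discrepancy. The 5x5 sampled textures below are illustrative:

```python
import numpy as np

def photometric_discrepancy(patch_a, patch_b):
    # 1 - normalised cross-correlation of two sampled patch textures.
    # Low values mean the patch projects consistently into both views.
    a = patch_a.ravel() - patch_a.mean()
    b = patch_b.ravel() - patch_b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 1.0 if denom == 0 else 1.0 - float(a @ b) / denom

rng = np.random.default_rng(4)
texture = rng.uniform(0.0, 1.0, (5, 5))
consistent = photometric_discrepancy(texture, 0.5 * texture + 10.0)  # gain/offset change
inconsistent = photometric_discrepancy(texture, rng.uniform(0.0, 1.0, (5, 5)))
```

NCC's invariance to affine intensity changes is what makes such a score robust to lighting differences between views; minimising it over patch orientation is the second enhancement described above.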
The overall reconstruction algorithm is shown to be flexible and robust in that it can reconstruct 3D models of both objects and scenes. It is able to automatically detect and discard outliers and may be initialised by simple visual hulls. The demonstrated ability to account for the surface orientation of patches during photometric consistency computations is a key performance criterion. Final results show that the algorithm is capable of accurately reconstructing objects containing fine surface details, deep concavities and regions without salient textures.