How Big is Big Data?

There has been a lot of buzz around big data lately. The volume of data we’re handling is growing exponentially with the popularity of social media, digital pictures and videos, and data from sources such as sensors, legacy documents, and weather and traffic systems, to name a few. According to a report from IBM, we create 2.5 quintillion bytes of data every day, so much that 90% of the data in the world today has been created in the last two years alone. According to analyst firm IDG, 70% of enterprises have either deployed or are planning to deploy big data projects and programs this year because of the increase in the amount of data they need to manage.

Let me ask you: what is the largest database you have come across so far? A database professional may be able to answer this question, but if you know nothing about databases, you probably cannot.

Let us take the example of the Aadhaar card. UIDAI (the Unique Identification Authority of India) has issued about 25 crore (250 million) Aadhaar cards in India so far. Each card’s record is around 5 MB (it includes the photo, fingerprints and scanned documents), so the existing database size could be

5 MB × 250,000,000 = 1,250,000,000 MB = 1,250 TB ≈ 1.25 PB (1,000 TB ≈ 1 petabyte)

On average, UIDAI issues about 1 million Aadhaar cards each day, so the database grows by roughly 5 TB per day. The population of India in 2014 is 1,270,272,105 (1.27 billion), so the minimum database size required to store Aadhaar data for everyone would be over 6 PB. Is this database big enough? Probably not.
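For readers who want to check the arithmetic, here is a quick back-of-the-envelope sketch in Python. It simply reuses the figures quoted above (5 MB per card, roughly 1 million cards issued per day, the 2014 population); the decimal TB/PB conversions are an assumption.

    MB_PER_CARD = 5                      # approximate size of one Aadhaar record, from the text
    CARDS_ISSUED = 250_000_000           # 25 crore cards issued so far
    CARDS_PER_DAY = 1_000_000            # roughly 1 million cards issued per day
    POPULATION = 1_270_272_105           # India's population in 2014

    MB_PER_TB = 1_000_000                # decimal units, as used in the text
    MB_PER_PB = 1_000 * MB_PER_TB

    print(f"Issued so far : {CARDS_ISSUED * MB_PER_CARD / MB_PER_PB:.2f} PB")   # ~1.25 PB
    print(f"Daily growth  : {CARDS_PER_DAY * MB_PER_CARD / MB_PER_TB:.1f} TB")  # ~5 TB per day
    print(f"Full coverage : {POPULATION * MB_PER_CARD / MB_PER_PB:.2f} PB")     # ~6.35 PB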

Facebook has 680 million monthly active users. Is this big?

Google receives about 6,000 million (6 billion) searches per day. Is this big? Maybe.

Storing 6,000 million records is not a big deal; you can use a conventional database like Oracle to store that many records. But it gets more interesting than that. What if I asked you to store the 6,000 million search phrases entered into Google every day for two years, and at the end of it produce a report on the 25 most searched keywords related to “cricket”? This might sound insane: 6,000 million × 365 × 2 = 4,380 billion records! Even if you were able to store that many records, how would you analyse this data and create reports?

That is where big data technologies help. Big data platforms typically do not rely on a conventional RDBMS and SQL queries alone; instead, they use tools like Hadoop, Hive and MapReduce. MapReduce is a programming paradigm that allows massive job-execution scalability across thousands of servers or clusters of servers. Hadoop is by far the most popular implementation of MapReduce. It aggregates multiple sources of data for large-scale processing and can also read data from a database to run processor-intensive machine learning jobs. Hive is a SQL-like bridge that lets conventional BI applications run queries against a Hadoop cluster; it has increased Hadoop’s reach by making it more familiar to BI users.
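To make the MapReduce idea concrete, here is a toy, single-machine sketch in Python (an illustration of the paradigm, not Hadoop itself). The mapper turns each search phrase into (word, 1) pairs and the reducer sums the counts per word; the sample phrases are made up.

    from collections import Counter
    from itertools import chain

    def mapper(phrase):
        # Emit one (word, 1) pair per word in the search phrase.
        return [(word.lower(), 1) for word in phrase.split()]

    def reducer(pairs):
        # Sum the counts for each distinct word (the shuffle step is implicit here).
        counts = Counter()
        for word, n in pairs:
            counts[word] += n
        return counts

    search_log = [                      # hypothetical search phrases
        "cricket world cup schedule",
        "live cricket score",
        "cricket world cup tickets",
        "weather today",
    ]

    # Keep only the phrases related to "cricket", then map and reduce.
    cricket_phrases = [p for p in search_log if "cricket" in p.lower()]
    counts = reducer(chain.from_iterable(mapper(p) for p in cricket_phrases))

    # The most searched keywords related to "cricket" (top 25 in the real case).
    print(counts.most_common(25))

On a Hadoop cluster the same mapper and reducer logic runs in parallel across thousands of nodes, which is what makes the 4,380-billion-record scenario tractable.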

While big data represents all kinds of opportunities for businesses, collecting, cleaning and storing it can be a nightmare. Not only is it difficult to know whether the data is being transmitted properly, but also whether the best possible data is being used. Here are some key points to keep in mind while testing big data:

  • Test every entry point in the system (feeds, database, internal messaging, and front end transactions) to provide rapid localization of data issues between entry points.
  • Compare source data with the data landed on the Hadoop system to ensure they match (see the sketch after this list).
  • Verify the right data is extracted and loaded into the correct HDFS location.
  • Verify the output data: validate that processed data remains the same even when the job is executed in a distributed environment.
  • Verify the batch processes designed for data transformation.
  • Verify more data faster.
  • Automate testing efforts.
  • Test across different platforms.
  • Test data management is the key to effective testing.
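To make the source-versus-Hadoop comparison point concrete, here is a minimal sketch in Python. It assumes both the source data and the data landed on Hadoop have been exported to delimited files (the file paths and the key column are placeholders, and keys are assumed unique), and it compares row counts plus a per-record checksum.

    import csv
    import hashlib

    def profile(path, key_field):
        # Build {key: md5-of-row} for a delimited extract.
        hashes = {}
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                record = "|".join(row[field] or "" for field in sorted(row))
                hashes[row[key_field]] = hashlib.md5(record.encode()).hexdigest()
        return hashes

    # Placeholder paths: an extract of the source system and of the data landed on Hadoop.
    src = profile("source_extract.csv", key_field="id")
    hdp = profile("hadoop_extract.csv", key_field="id")

    print("row counts match:", len(src) == len(hdp))
    diffs = [k for k, digest in src.items() if hdp.get(k) != digest]
    print("records missing or different on the Hadoop side:", len(diffs))

The same idea scales up to Hive queries or a Spark job when the volumes are too large for a single machine.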

The list above is not static and will keep growing as big data gets bigger by the day. Big data amplifies testing concerns intrinsic to databases of any size and also poses some new challenges. Therefore, a testing strategy is critical for success with big data. Companies that get this right will be able to realize the power of big data for business expansion and growth.

Manoj Pandey | Zen Test Labs

Mistakes Still Prevalent in the 3-Decade-Old Testing Industry

‘I make one mistake only once’ –the dream statement of most people, processes and organizations.

Although the software testing industry is more than three decades old, it is still stuck in a spiral of mistakes that should have been eliminated long before they resulted in grave mishaps. A lot has been written about the ‘classic mistakes in software testing’, but I want to go ahead and highlight the mistakes that are still being overlooked at each phase of testing:

1. Requirements Phase:
We fail to acquire clear, well-articulated information. We do not explore the requirements using a requirements traceability matrix (RTM), which is the ideal mechanism for maintaining the relationship of requirements to the design, development, testing and release of software. An RTM also charts out which requirements are testable or non-testable, and which are frozen (finalized and signed off) versus still open to change.
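As a rough illustration of what an RTM captures, requirements can be traced in something as simple as a dictionary keyed by requirement ID; the IDs, artifact names and statuses below are hypothetical.

    # Hypothetical traceability matrix: each requirement is traced to design
    # and test artifacts, along with its testability and freeze status.
    rtm = {
        "REQ-101": {"design": "DD-07", "tests": ["TC-210", "TC-211"],
                    "testable": True,  "status": "frozen"},   # finalized and signed off
        "REQ-102": {"design": "DD-08", "tests": [],
                    "testable": False, "status": "open"},     # still subject to change
    }

    # A simple traceability check: every frozen, testable requirement needs tests.
    for req_id, row in rtm.items():
        if row["testable"] and row["status"] == "frozen" and not row["tests"]:
            print(f"{req_id} is frozen and testable but has no test cases mapped")

In practice this lives in a test management tool or a spreadsheet, but the checks it enables are the same.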

2. Test Plan
Anybody who has been in software testing knows the importance a test plan holds. It details the scope of testing, the schedule, test deliverables, the risks involved and the release criteria. However, a number of things can go wrong: unclear out-of-scope items and misplaced or missing assumptions can result in misjudged estimates. Many a time, under business pressure to release, there is a strong tendency to estimate fewer testing cycles, which adversely affects the quality of testing.

Test design and test data design are also an integral part of the test plan. Faults in them can bring the test plan down, for example if the test design is too detailed or too brief, if the wrong template is used, or if the expected results are missing. Test data design can cause a major glitch if the data is not centralized, rationalized and automatable, or if it is hard coded. Taking care of these points makes the process efficient and delivers high-quality testing.

3. Test Environment
Not establishing a separate test environment results in missed critical defects and an inability to cover business flows. Missing preconditions in the environment further prevent the creation of a production-like environment, which means vital bugs get overlooked.

4. Test Execution
Mistakes still occurring at the test execution stage include not optimizing test cases for execution (resulting in unnecessary effort and therefore delayed releases), a lack of smoke testing, and not prioritizing test cases. Prioritizing test cases guarantees maximum coverage and depth in testing, both of which are compromised if it is not done.
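As one generic way to prioritize (not specific to any tool or to our process), test cases can be ranked with a simple risk score; the fields and weights below are hypothetical.

    # Hypothetical risk-based prioritization: score = business impact x likelihood
    # of failure, so the riskiest cases are executed (and smoke tested) first.
    test_cases = [
        {"id": "TC-01", "name": "login",         "impact": 5, "likelihood": 4},
        {"id": "TC-02", "name": "export report", "impact": 2, "likelihood": 3},
        {"id": "TC-03", "name": "checkout",      "impact": 5, "likelihood": 5},
    ]

    for tc in sorted(test_cases, key=lambda t: t["impact"] * t["likelihood"], reverse=True):
        print(tc["id"], tc["name"], "score:", tc["impact"] * tc["likelihood"])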

These are some of the points that I think are very crucial. My next post will cover more on defect analysis and bug reporting.

Zen Test Labs

Building a Test Centre of Excellence: Experiences, Insights and Failures

As organizations mature in their testing processes, the perennial quest for excellence has led them to attempt to establish a “Test Centre of Excellence”, better known as a TCoE. Many such initiatives have been plagued with issues ranging from partial implementations to complete abandonment midway. Additionally, most TCoE initiatives meet heavy resistance and inertia within teams, as they are perceived as a threat to their independence and way of doing things.

At the heart of many of these issues lie poor alignment with business goals, poor ROI analysis prior to investing, poor communication and an incorrect choice of centralization model, among many others. Drawing from their experience of consulting with organizations on TCoE initiatives and building one of their own, Krishna and Mukesh have written a whitepaper to share insights, experiences and lessons learnt from both successes and failures.

Download the whitepaper to learn how to go about creating your own TCoE while overcoming the common and not so common challenges you will face along the way. Draw on their experience to troubleshoot some of your unique problems.

Communication: A Key to Excelling at Testing

Enabling better communication is not a one-time activity; it requires continuous effort across the company. Good internal and external communication is extremely important to a business’s success. In order to work together effectively, there must be clear and coherent communication among all departments.

Here are a few scenarios wherein communication gaps may arise and lead to poor quality:

1. Continuously Changing Requirements:
At times, requirement changes are implemented directly without updating the specification document. In such cases, there is a chance that the changed requirements remain untested or are tested incorrectly.
Any change in the requirements should be communicated correctly to all stakeholders, and the specification document should be updated in a timely manner.

2. Configurations:
Lack of clarity from the stakeholders on the configurations to be tested can lead to wasted effort and extra work. Configuration testing can be expensive and time consuming. Investment in hardware and software is required to test the different permutations and combinations. There is also the cost of test execution, test reporting and managing the infrastructure.
Increasing communication between development, QA and stakeholders can help deal with these challenges.

3. Team Size:
When team sizes are large, some members of the team may miss changes in requirements or may not receive updates on project activities. This can lead to severe problems in the project, or even project failure. Each team member should be kept abreast of project activities through a log or other means.

4. Changes in Application Behavior Not Communicated:
Continuous changes in the application behavior may lead to requirements being tested incorrectly.  All the functionality implemented in the application should be frozen while testing. If any changes are made to the functionality, they should be communicated to the testing team on a timely basis.

5. Unclear Requirements:
Complex requirements that contain insufficient data may be difficult to understand and therefore, may lead to improper testing. The functional/technical specification documents should be clear and easy to understand; they should contain a glossary, screenshots and examples wherever necessary.

The path to project success is through ensuring that small communication problems are eliminated completely before they build up, so that the message is delivered correctly and completely. Instead of discovering problems, we should figure out how to stop them from appearing in the first place.

Poonam Rathi | Test Consultant | Zen Test Labs

Major Checkpoints for Database Migration Testing

I recently had the opportunity to work on my very first data migration project. The objective was to ensure the correctness and completeness of data migrated from a source system with Oracle as its database to a target system with MS SQL as its database. I learned a lot while working on this project: I got to hone my SQL query-writing skills and became well versed in the major checkpoints and activities performed during database migration testing. I want to share my learnings and experience in this post.

About 23 million data sets had to be migrated from Oracle to MS SQL. In order to do this, we divided the activities into two phases:

[Figure: the two phases of migration activities]

Our job was to test and verify the correctness and completeness of around 23 million data sets at the end of each activity. The real challenge was to perform this job manually, without automation. What made it even more interesting was that there was a chance of up to 5% data loss during the migration.

To achieve our goal, we divided our tasks into four major phases:

[Figure: the four major phases of testing tasks]

The experience I gained while working on this project can be divided into 3 parts:

  1. Basic skills required for database migration testing
  2. Major checkpoints for database migration testing
  3. Major activities performed during database testing

Basic skills required for database migration testing

Other than the testing skills required for all types of testing, one needs the following skills for data migration testing:

  1. SQL query writing for the legacy and target databases
  2. Knowledge of the data migration testing tool (if any) being used, or advanced knowledge of Excel features for data comparison

Major checkpoints for database migration testing

  1. Data profiling (a minimal profiling-comparison sketch follows this list)
  • Table count
  • Data type matching of columns
  • Identifying different classes of records from a business point of view
  • Performing a checksum on columns holding numerical data
  • Row count matching of legacy and target database
  • Column count matching of legacy and target database
  • Matching of total Null values per column of legacy and target database
  • Matching of Not Null count per column
  • Checking the Distinct count per column
  • Checking the Group By count per column
  • Matching of Summation of numeric fields
  • Checking the Min value per numeric column
  • Checking of Max value per numeric column
  • Matching of Average data value in numeric columns
  • Checking the longest values in non-numeric columns
  • Checking the shortest values in non-numeric columns

  2. Checking data redundancy
  3. Matching control tables to verify the exact transfer of relationships among tables
  4. Functional testing on migrated data
  5. Random sampling: picking and matching data from corresponding tables at random
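To show how these checkpoints translate into practice, here is a minimal sketch in Python that runs the same profiling queries against the legacy and target databases and compares the results. The connection objects are assumed to be ordinary DB-API connections (for example via cx_Oracle and pyodbc), and the table and column names are placeholders.

    # Placeholder profiling queries; in a real project there is one set per table.
    PROFILE_QUERIES = {
        "row_count":   "SELECT COUNT(*) FROM customers",
        "null_emails": "SELECT COUNT(*) FROM customers WHERE email IS NULL",
        "sum_balance": "SELECT SUM(balance) FROM customers",
        "min_balance": "SELECT MIN(balance) FROM customers",
        "max_balance": "SELECT MAX(balance) FROM customers",
    }

    def profile(conn):
        # Run each profiling query and return a {check: value} dictionary.
        cur = conn.cursor()
        results = {}
        for name, sql in PROFILE_QUERIES.items():
            cur.execute(sql)
            results[name] = cur.fetchone()[0]
        cur.close()
        return results

    def compare_profiles(legacy_conn, target_conn):
        legacy, target = profile(legacy_conn), profile(target_conn)
        for name in PROFILE_QUERIES:
            status = "OK" if legacy[name] == target[name] else "MISMATCH"
            print(f"{name:12s} legacy={legacy[name]} target={target[name]} -> {status}")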

Major activities performed during database testing

  1. Identification and matching of the number of tables in the legacy and target databases.
  2. Identification and matching of the data types of columns in corresponding tables.
  3. Identification and matching of the relationships between tables in the legacy and target databases.
  4. Identification and matching of business rules in the legacy database.
  5. Identification of the primary key attribute for each table.
  6. Identification of columns with the Not Null property.
  7. Identification of numeric fields for data profiling.
  8. Query writing for both databases to compute all the data profiling checkpoints:
    1. Row count
    2. Column count
    3. Null values count per column
    4. Not Null count per column
    5. Distinct values count per column
    6. Group By count per column
    7. Summation of numeric fields
    8. Min value per numeric column
    9. Max value per numeric column
    10. Average value of numeric fields
    11. Longest values in non-numeric columns
    12. Shortest values in non-numeric columns
  9. Matching the data profiling results of both databases
  10. Query writing for random sampling to ensure that data has been migrated correctly and completely (see the sketch after this list)
  11. Using migrated data with the related application to test its functional use
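As an example of the random-sampling step (item 10 above), here is a hedged sketch: pick a handful of key values at random, fetch the corresponding rows from both databases over DB-API connections, and compare them field by field. The table name, key column, connections and bind-parameter styles are placeholders; selecting explicit columns rather than * is advisable so the column order matches on both sides.

    import random

    def fetch_row(conn, table, key_col, key, placeholder):
        # NOTE: the bind-parameter placeholder differs by driver
        # (for example ":1" for cx_Oracle and "?" for pyodbc).
        cur = conn.cursor()
        cur.execute(f"SELECT * FROM {table} WHERE {key_col} = {placeholder}", (key,))
        row = cur.fetchone()
        cur.close()
        return tuple(row) if row is not None else None

    def sample_and_compare(legacy_conn, target_conn, all_keys, sample_size=100):
        mismatches = 0
        for key in random.sample(all_keys, min(sample_size, len(all_keys))):
            legacy = fetch_row(legacy_conn, "customers", "customer_id", key, ":1")
            target = fetch_row(target_conn, "customers", "customer_id", key, "?")
            if legacy != target:
                mismatches += 1
                print(f"customer_id={key}: legacy and target rows differ")
        print(f"{mismatches} mismatching rows in the sample")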

Migrating data is a challenging activity. Users expect it to be fast and efficient, without loss of data.  Thorough data migration testing has to be performed to reduce risk and ensure that data has been migrated as per requirements. I believe that using the points outlined above, you can refine your test approach and ensure a better data migration testing process.

Mayank Raj | Trainee Test Analyst | Zen Test Labs

The Art of Test Automation

Test automation has evolved to become a strategic and integral part of the software development process. Most of us start our test automation careers with record and playback. Over time, some of us move to data-driven test automation, but very few of us move towards the core, where the principles of design and development are applied to test automation. Test automation is like developing a system in which the test cases are the requirements. The depth of thinking and planning that goes into test automation before hitting the record button is similar to that of developing software.

Over the last 10 years, I have seen multiple Fortune clients struggle with automation, and some of them eventually get it right. Some of the projects that failed had the best test automation resources and a very stable manual testing practice, yet in spite of all this there was a huge gap between what was envisioned and what actually got realized. Over this period, we realized that the planning process is a key component of successful test automation. In 65% of the projects that failed, the planning process and the sequence of steps followed were the reasons.

Based on my experience, the automation process is:

  • Why (Purpose)
  • When (Stable Setup and Manual Process)
  • Which (Tool Selection)
  • What (Test Case Selection)
  • How (Design)

We have written a detailed whitepaper, “The Art of Test Automation”, based on the test automation process above. Through this whitepaper, we have attempted to outline how to actually go about automating, planning, prioritizing and using better practices to ensure a lower risk of complete failure in automation projects.

Some of the important test automation questions that this paper attempts to address:

  • Why does automation fail in spite of having the technical resources?
  • Is there a standard process to be followed for test automation?
  • When should you start and stop automation?
  • Test case selection criteria

Download “The Art of Test Automation” to read more about the ideal automation process.

Poonam Rathi | Test Consultant | Zen Test Labs

Using Mind Maps in Testing

Mind maps are an excellent tool and can be used in a variety of testing activities such as requirements analysis, test design, test planning, session reports and measuring test coverage. Testing relies heavily on communicating stories about what should be tested, how it should be tested, what the risky areas are and so on. Making this process visual can help testing teams articulate their thoughts & ideas better. Drawing mind maps also makes generating new ideas much easier.

Take a look at the simple “Replace” dialogue box below:

[Figure: the “Replace” dialogue box]

We can easily create a mind map for testing this functionality using the following steps (a small sketch of the resulting tree structure follows the figure below):

  1. Draw the main theme in the centre.
  2. Draw the module name/features of the application branching out from the main theme.
  3. Draw the sub module/feature branching out from each module/feature.
  4. Add colors to your mind map to make it easier for your brain to group things together.
  5. Write test cases for each feature and sub feature.
  6. Include only testable items in your mind map.
  7. Try not to use full sentences in your mind map.

[Figure: mind map for testing the Replace dialogue box]
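As a rough illustration of the structure behind such a map, the same information can be held as a simple tree (a nested dictionary in Python); the branches listed for the Replace dialogue below are assumed examples, not taken from the figure.

    # A mind map is essentially a tree: theme -> features -> test ideas.
    replace_dialog_mind_map = {
        "Replace dialogue": {
            "Find what":    ["empty value", "special characters", "very long string"],
            "Replace with": ["empty value", "same as the search text"],
            "Options":      ["Match case on/off", "Wrap around on/off"],
            "Buttons":      ["Find Next", "Replace", "Replace All", "Cancel"],
        }
    }

    def print_map(node, depth=0):
        # Walk the tree and print each branch, one keyword per line.
        if isinstance(node, dict):
            for name, child in node.items():
                print("  " * depth + name)
                print_map(child, depth + 1)
        else:
            for leaf in node:
                print("  " * depth + leaf)

    print_map(replace_dialog_mind_map)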

Some examples of creating dedicated mind maps, or branches in existing mind maps, are:

  • Mind maps for field level validation of all fields on the screen
  • Identify fields that are common to all screens and create a ‘Common Fields’ mind map, e.g. a Date field that is the same on all screens
  • Mind maps that include business rules
  • Mind maps for events like Mouse Over, Click, etc.
  • Mind maps based on Screen Names
  • Mind maps based on Functionality

An example mind map for validating a subscription form:

[Figure: mind map for validating a subscription form]

Ideas for using mind maps in testing:

  • Mind Map Jamming: All the testing team members read /analyze a particular requirement/feature and create a mind map for it together.
  • Using Mind Maps for Defect/ Execution Summary: Create a mind map of test cases. After execution, you can mark (tick or cross) the mind map as per the Actual Result, thus using it to provide Defect/Execution Summary.
  • Smoke/Sanity Testing: Create a mind map for all the flows that are to be smoke tested or sanity tested.
  • Scope: Create a mind map to show what is in Scope and what is not in Scope.

You can use mind maps anywhere and everywhere! Mind maps exist to make your life easy, so if a mind map is getting too big or complicated, try splitting it. The great thing about mind maps is that all test cases are visible in one view; you don’t need to scroll up and down. This also makes it simpler to add new points whenever you want. Mind maps provide more coverage, and the likelihood of missing important points is lower. You cannot use long, detailed sentences in mind maps; using one word per line improves clarity and understanding and makes recollection easier. Single keywords will make your mind maps more powerful and flexible.


Mind mapping skills improve over time, and with practice your mind maps will become more extensive and wide-ranging. Although mind maps help you simplify information and make it easily understandable, you must not forget that they are ultimately models and may therefore leave out important aspects. So make sure you question what might be missing from the map and add those things. This is quite simple, as all you have to do is add another node to the map!

Satish Tilokchandani | Lead Consultant | Zen Test Labs