Convert Unsymmetrical PDF Data to Google Sheets

Using a combination of formulas, you can efficiently clean, format, and organize data imported from a PDF into a spreadsheet, ensuring the results align with your requirements.

Here, we discuss the process of converting unsymmetrical data from a PDF file into a well-organized format in a spreadsheet or Excel. When copying and pasting data into Google Sheets or Excel, we often expect it to align with the desired format. However, in many cases, the data appears disorganized, as illustrated in the accompanying image.

Organizing Unsymmetrical PDF Data in Google Sheets

To resolve this, we apply a combination of Google Sheets formulas to clean, format, and align the data correctly. These formulas help transform the raw, pasted content into a structured format that meets our requirements.

For instance, splitting data using a custom delimiter, such as a space, may result in unsymmetrical output. The values can become misaligned, failing to correspond to the appropriate headers in each row. This inconsistency often requires additional manual adjustments to structure the data properly. Below, we explore formula-based methods to address such issues effectively.

If we copy data and paste in sheet it may be not similar as your preferences.
countLen/result appears unsymmetrical

Here’s a practice sheet to help you explore and understand the process better. Click the link below to access it. To edit, make a copy and save it for your use.

https://docs.google.com/spreadsheets/d/1XQuuIK7zoMmlKbd7wGxv4KKZ5RAkQ7eefTt9lbB-FJA/edit?gid=0#gid=0

To convert PDF file data into a well-formatted table in Google Sheets without losing formatting, especially for table data, here’s a structured approach combining built-in formulas, advanced tools, and manual adjustments:

How to Convert Unsymmetrical PDF Data to Google Sheets

When you paste copied data from a PDF into Google Sheets (e.g., column Q), you may encounter unsymmetrical results where the data appears unformatted or crammed into single cells like below.

Q1:Q4Q
1👇 | Paste PDF Data here | 👇
21 2021110848G Kajol Gobar ASSISTANT 70000 38700 108700 51861 56839 300.00 0 0 56539
33 2021311149S Alkesh Dindral SENIOR ASSISTANT 68000 37700 105700 50412 55288 300.00 0 0 54988
42 2021110866L Raj Rani Devi Supdt Gd-II 83600 60548 144148 54060 90088 300.00 0 0 89788
PDF data copy/paste result shown in the Google Sheets, which is not in a suitable formats, one row table data paste in single cell.

Step-by-Step Guide Using Google Sheets Formulas

The following formulas combination can be used to manipulate, clean, and structure data transferred from a PDF to a spreadsheet, ensuring the final result is well-organized and meets your desired format:

  • IF(logical_expression, value_if_true, value_if_false)
  • LEN(text)
  • FIND(search_for, text_to_search, [starting_at])
  • TRIM(text)
  • SPLIT(text, delimiter)
  • RIGHT(text, [num_chars])
  • QUERY(data, query, [headers])
  • LOWER(text)
  • IFERROR(value, [value_if_error])
  • TEXTJOIN(delimiter, ignore_empty, text1, [text2], …)
  • SUBSTITUTE(text_to_search, search_for, replace_with, [occurrence_number])
  • REGEXREPLACE(text, regular_expression, replacement)
  • REGEXEXTRACT(text, regular_expression)

Step 1: REGEXEXTRACT, REGEXREPLACE and Combination of IF, LEN, SPLIT and LOWER Function

Syntax:

=REGEXEXTRACT(text, regular_expression)

=REGEXREPLACE(text, regular_expression, replacement)

=LEN(text)
=SPLIT(text, delimiter)

=lower(text)

Formula

=REGEXEXTRACT(Q3,"[[:alpha:]]+")
=IFERROR( REGEXEXTRACT(Q3,"[[:alpha:]]+"))

=REGEXREPLACE(Q3,"[^a-zA-Z]", "")
=IFERROR( IF(LEN(T3)=1,REGEXREPLACE(Q3,"[^a-zA-Z]", ""),""))

=IFERROR( IF( LEN(T3)=1, SPLIT( lower(Q3),"qwertyuiopasdfghjklzxcvbnm`-=[]\;',./!@#$%^&*()"),""))

PDF to Sheets data converter

To extract only letter sequences from a text string, the pattern "[[:alpha:]]+" captures those sequences, while the pattern [^a-zA-Z] matches non-alphabetic characters. The LOWER function converts all letters in the text to lowercase. The LEN function counts the total characters in a text string, including spaces. The SPLIT function divides the text into parts based on a chosen delimiter, such as a comma or space.

Step 2: IF, RIGHT, LEN, FIND & SUBSTITUTE, (IF, RIGHT & LEN) and (TRIM &TEXTJOIN)

=IF(V3="","",( RIGHT(V3,LEN(V3)-FIND("*",SUBSTITUTE(V3," ","*",LEN(V3)-LEN(SUBSTITUTE(V3," ",""))))))&""&T3)

=IF(U3="","", RIGHT(U3, LEN(U3)-1))

=TRIM( TEXTJOIN(" ",1,W3:AH3))

PDF to Sheets data convert step 3
  • The IF formula checks a condition and returns one value if the condition is true and another if it’s false.
  • FIND locates the position of a substring within a text string, returning the starting character’s position. It’s case-sensitive and often used to extract or manipulate parts of a string.
  • The RIGHT function extracts a specified number of characters from the end of a text string. It’s helpful for isolating portions of a string, such as the last few digits of a code or ID.
  • SUBSTITUTE replaces occurrences of a specific substring within a text string with another substring. It’s useful for modifying text, such as correcting or updating specific parts of a string.
  • TRIM removes all leading, trailing, and excess spaces from a text string, leaving only single spaces between words.

Step 3: TEXTJOIN

TEXTJOIN combines multiple text strings into one, using a specified delimiter to separate them. Here cells in the range AM3:AO3 are simplifies the process of combining text, especially when working with ranges that may contain empty cells.

=TEXTJOIN(" ",1,AM3:AO3)

Convert PDF to Spreadsheets

The formula would output "2021110848G KajolGobarASSISTANT 70000 38700 108700 51861 56839 300 00 0 0 56539" in the cell where the formula is placed.

Q1:Q12QAQ
31 2021110848G Kajol Gobar ASSISTANT 70000 38700 108700 51861 56839 300.00 0 0 565392021110848G KajolGobarASSISTANT 70000 38700 108700 51861 56839 300 00 0 0 56539
Results shown in cell AQ3

Step 4: IFERROR and SPLIT

The formula uses ArrayFormula to split text from the range AQ3:AQ into columns wherever there’s a space, handling multiple rows at once. IFERROR ensures errors from empty or invalid cells display as blank. Sufficient output space is required to avoid truncation or overwriting.

=ArrayFormula(IFERROR( SPLIT('Convert PDF to Sheet'!AQ3:AQ," ",0,0)))

Convert PDF to Spreadsheets

The formula is valid and effectively splits text into separate parts, provided the output range is adequately prepared.

Step 5: QUERY

The final output process involves converting data from the PDF into Google Sheets and utilizing the QUERY function for advanced operations. This function simplifies filtering, sorting, and summarizing the extracted data, organizing it into a clean and structured format for analysis.

Syntax:

=QUERY(data, query, [headers])

Formula:

=QUERY({'Converted Data Sheet'!E3:Q},"Select * where Col3 is not null",0)

A2:M3ABCDEFGHIJKLMN
2Sr.no.emp-idName and RankBasic PayAlwn.GrossDeductSub NetNG RecO. GpfO. Gise- Salary
312021110848GKajolGobarASSISTANT7000038700108700518615683930000056539
The final results or output, process to convert the pdf data in the Google Sheets

For simple PDFs, data can often be copied directly into Google Sheets without losing formatting. However, complex PDFs with intricate layouts may not translate well through copy-pasting, resulting in altered or lost formatting.

In such cases, the methods and functions in Google Sheets can help with structured data, while specialized online tools are recommended for handling complex tables and ensuring precise extraction.

If you have any questions, need assistance, or encounter challenges, feel free to share your concerns in the comments. The CountLen team is dedicated to providing quick and effective solutions. If you notice any inaccuracies or misleading information, please provide feedback—we’re here to support you!

CountLen Team
CountLen Team

CountLen Team is known for making complex topics accessible. Aiming to bridge the Excel and Google Sheets knowledge gap.

guest
0 Comments
Inline Feedbacks
View all comments